CN118116402A - Bilinear filtering-based multichannel voice noise reduction method
- Publication number: CN118116402A
- Application number: CN202410241360.4A
- Authority: CN (China)
- Prior art keywords: signal, noise, bilinear, vector, filter
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G - PHYSICS
- G10 - MUSICAL INSTRUMENTS; ACOUSTICS
- G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208 - Noise filtering
- G10L21/0216 - Noise filtering characterised by the method used for estimating noise
- G10L21/0224 - Processing in the time domain
Abstract
The invention discloses a multichannel speech noise reduction method based on bilinear filtering. The method comprises: acquiring time-domain noisy speech signals and preprocessing them; estimating the statistics of the time-domain noisy speech signals and of the additive noise signals; estimating a bilinear Wiener noise reduction filter from those statistics; and filtering the noisy speech signals with the bilinear Wiener noise reduction filter to obtain an estimate of the clean speech signal. The invention separates the filter coefficients that act along the temporal dimension from those that act along the spatial dimension, which turns the estimation of one long filter into the estimation of two shorter sub-filters. Shorter filters mean fewer parameters to estimate. Compared with conventional methods, the algorithmic complexity is significantly lower and fewer observations are needed to estimate the filter coefficients, which improves the ability to track nonstationary noise. Compared with the frequency-domain noise reduction methods used in current practical systems, the method produces no musical noise.
Description
Technical Field
The invention belongs to the field of speech noise reduction and relates in particular to a multichannel speech noise reduction method based on bilinear filtering.
Background
Noise is ubiquitous in everyday environments. It degrades the quality and intelligibility of speech signals and causes listening fatigue. Speech noise reduction aims to suppress the effect of noise and to extract the clean speech signal from the noisy observation, thereby improving speech quality and intelligibility. It plays an important role in voice communication.
Speech noise reduction algorithms can be divided, according to the domain in which they operate, into time-domain algorithms and transform-domain algorithms (frequency domain, wavelet domain, and so on). Frequency-domain methods are currently the most widely used. Compared with time-domain methods they have lower complexity and can be integrated into embedded systems for real-time noise reduction. Their drawback is that they tend to produce musical noise. Studies indicate that listeners tolerate musical noise less well than they tolerate the original noise, so reducing the musical noise produced by frequency-domain algorithms has long been an active research topic. The advantage of time-domain methods is precisely that they do not produce musical noise. However, time-domain noise reduction filters are usually long, which makes the complexity too high; this is the main bottleneck limiting practical deployment. The problem is most severe for multichannel algorithms, whose complexity grows quickly with the number of channels, making it difficult to deploy time-domain algorithms for real-time processing of noisy speech in practical systems.
In the present invention, to reduce the complexity of time-domain speech noise reduction, a bilinear noise reduction scheme is proposed by changing how the multichannel signal vector is organized.
Summary of the Invention
The purpose of the invention is to provide a multichannel speech noise reduction method based on bilinear filtering that solves the problems of the prior art described above.
To achieve this purpose, the invention provides a multichannel speech noise reduction method based on bilinear filtering, comprising:
acquiring time-domain noisy speech signals and preprocessing them;
estimating the statistics of the time-domain noisy speech signals and of the additive noise signals;
estimating a bilinear Wiener noise reduction filter from those statistics;
filtering the time-domain noisy speech signals with the bilinear Wiener noise reduction filter to obtain an estimate of the clean speech signal.
Optionally, acquiring and preprocessing the time-domain noisy speech signals comprises the following.
The time-domain signal model is:
$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$
where $t$ is the discrete time index, the subscript $m$ denotes the signal received by the $m$-th microphone of an array of $M$ microphones, $x_m(t)$ and $v_m(t)$ are the clean speech signal and the additive noise signal received by the $m$-th microphone, and $y_m(t)$ is the noisy speech signal received by that microphone. $x_m(t)$ and $v_m(t)$ are mutually uncorrelated. The first microphone of the array is chosen as the reference microphone, so $x_1(t)$ is the desired signal.
Grouping $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:
$$\mathbf{y}_m(t) = [\,y_m(t)\ \ y_m(t-1)\ \ \cdots\ \ y_m(t-L+1)\,]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \tag{2}$$
where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined in the same way as $\mathbf{y}_m(t)$:
$$\mathbf{x}_m(t) = [\,x_m(t)\ \ x_m(t-1)\ \ \cdots\ \ x_m(t-L+1)\,]^T$$
$$\mathbf{v}_m(t) = [\,v_m(t)\ \ v_m(t-1)\ \ \cdots\ \ v_m(t-L+1)\,]^T$$
$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are the desired signal vector and the noise signal vector of the $m$-th channel, $\mathbf{y}_m(t)$ is the noisy signal vector of the $m$-th channel, and the superscript $T$ denotes transposition.
Concatenating the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$), each of length $L$, gives:
$$\mathbf{y}(t) = [\,\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \ \cdots\ \ \mathbf{y}_M^T(t)\,]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$
where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$:
$$\mathbf{x}(t) = [\,\mathbf{x}_1^T(t)\ \ \cdots\ \ \mathbf{x}_M^T(t)\,]^T, \qquad \mathbf{v}(t) = [\,\mathbf{v}_1^T(t)\ \ \cdots\ \ \mathbf{v}_M^T(t)\,]^T$$
$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are the overall noisy signal vector, the overall clean speech signal vector and the overall noise signal vector, each of length $ML$.
Optionally, estimating the statistics of the time-domain noisy speech signals and of the additive noise signals comprises:
estimating the correlation matrix $\mathbf{R}_{\mathbf{v}}(t)$ of the overall noise signal vector $\mathbf{v}(t)$ with an existing noise estimation algorithm; estimating the correlation matrix $\mathbf{R}_{\mathbf{y}}(t)$ of the overall noisy signal vector $\mathbf{y}(t)$ recursively as $\mathbf{R}_{\mathbf{y}}(t) = \alpha\,\mathbf{R}_{\mathbf{y}}(t-1) + (1-\alpha)\,\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); estimating the correlation matrix of the overall clean speech signal vector $\mathbf{x}(t)$ as $\mathbf{R}_{\mathbf{x}}(t) = \mathbf{R}_{\mathbf{y}}(t) - \mathbf{R}_{\mathbf{v}}(t)$; and determining the vector $\boldsymbol{\rho}(t)$ from $\mathbf{R}_{\mathbf{x}}(t)$, which completes the statistics.
Optionally, determining the vector $\boldsymbol{\rho}(t)$ from the speech correlation matrix $\mathbf{R}_{\mathbf{x}}(t)$ comprises:
extracting the element in the first row and first column of $\mathbf{R}_{\mathbf{x}}(t)$ and the first column of $\mathbf{R}_{\mathbf{x}}(t)$, and dividing the first column by that element to obtain $\boldsymbol{\rho}(t)$.
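The extraction of the vector ρ(t) can be illustrated with a minimal NumPy sketch. The function name, the positivity check, and the toy 3×3 matrix are assumptions made for this example and are not part of the patent text.

```python
import numpy as np

def rho_from_speech_correlation(R_x):
    """Return (rho, sigma2_x1) from a speech correlation matrix R_x.

    sigma2_x1 is the element in the first row and first column of R_x;
    rho is the first column of R_x divided by that element, so rho[0] is 1.
    """
    R_x = np.asarray(R_x, dtype=float)
    sigma2_x1 = R_x[0, 0]
    if sigma2_x1 <= 0.0:
        # Guard added for the sketch: R_y - R_v can be a poor estimate in practice.
        raise ValueError("R_x[0, 0] must be positive to normalize the first column")
    return R_x[:, 0] / sigma2_x1, sigma2_x1

# Toy matrix, chosen only to make the arithmetic easy to follow.
R_x_example = np.array([[2.0, 1.0, 0.5],
                        [1.0, 2.0, 1.0],
                        [0.5, 1.0, 2.0]])
rho, sigma2_x1 = rho_from_speech_correlation(R_x_example)
```

Because the estimate of the speech correlation matrix is a difference of two estimated matrices, a practical system would likely need some safeguard when its first element is close to zero; the exception above is only one possible choice.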
Optionally, estimating the bilinear Wiener noise reduction filter from the statistics comprises:
rearranging the overall noisy signal vector into a noisy signal matrix $\mathbf{Y}(t)$ and, based on that matrix, splitting the filter of the conventional noise reduction scheme into two sub-filters, which gives a bilinear noise reduction scheme; then rewriting that scheme in an equivalent form using the vectorization operation and the Kronecker product;
defining the mean-square error of the desired-signal estimate from the statistics of the noisy speech signal, the Kronecker product of the two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$, and the noisy signal matrix $\mathbf{Y}(t)$, then rewriting the definition using the relation between the two sub-filters to obtain the final expression of the mean-square error;
estimating the bilinear Wiener noise reduction filter from that final expression.
Optionally, obtaining the bilinear noise reduction scheme based on two sub-filters comprises the following.
The overall noisy signal vector is rearranged as:
$$\mathbf{Y}(t) = [\,\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \ \cdots\ \ \mathbf{y}_M(t)\,] = \mathbf{X}(t) + \mathbf{V}(t) \tag{5}$$
where $\mathbf{y}(t) = \mathrm{vec}[\mathbf{Y}(t)]$, the clean speech signal matrix is $\mathbf{X}(t) = [\,\mathbf{x}_1(t)\ \ \mathbf{x}_2(t)\ \ \cdots\ \ \mathbf{x}_M(t)\,]$ with $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$, the noise signal matrix is $\mathbf{V}(t) = [\,\mathbf{v}_1(t)\ \ \mathbf{v}_2(t)\ \ \cdots\ \ \mathbf{v}_M(t)\,]$ with $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$, $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix, and $\mathbf{Y}(t)$, $\mathbf{X}(t)$ and $\mathbf{V}(t)$ all have dimension $L \times M$.
Based on (5), the conventional noise reduction scheme is modified into the following bilinear scheme:
$$z(t) = \mathbf{h}_1^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_2(t) = \mathbf{h}_1^T(t)\,\mathbf{X}(t)\,\mathbf{h}_2(t) + \mathbf{h}_1^T(t)\,\mathbf{V}(t)\,\mathbf{h}_2(t) \tag{6}$$
where the sub-filters $\mathbf{h}_1(t)$, of length $L$, and $\mathbf{h}_2(t)$, of length $M$, reduce noise along the temporal and the spatial dimension respectively, $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_1^T(t)\mathbf{X}(t)\mathbf{h}_2(t)$ is the filtered speech signal, and $\mathbf{h}_1^T(t)\mathbf{V}(t)\mathbf{h}_2(t)$ is the residual noise after filtering.
Equation (6) can be transformed as follows:
$$\mathbf{h}_1^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_2(t) = \mathrm{tr}[\mathbf{h}_2(t)\mathbf{h}_1^T(t)\mathbf{Y}(t)] = \mathrm{vec}^T[\mathbf{h}_1(t)\mathbf{h}_2^T(t)]\ \mathrm{vec}[\mathbf{Y}(t)] = [\mathbf{h}_2(t) \otimes \mathbf{h}_1(t)]^T\,\mathbf{y}(t) = \mathbf{h}_b^T(t)\,\mathbf{y}(t) \tag{7}$$
where $\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t)$, $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix, $\otimes$ denotes the Kronecker product, and $\mathrm{tr}[\cdot]$ denotes the trace of a matrix.
Using (7), equation (6) can be written as
$$z(t) = \mathbf{h}_b^T(t)\,\mathbf{y}(t) = \mathbf{h}_b^T(t)\,\mathbf{x}(t) + \mathbf{h}_b^T(t)\,\mathbf{v}(t) \tag{8}$$
where $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$ and $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$.
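Optionally, obtaining the final expression of the mean-square error of the desired-signal estimate comprises the following.
The two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ satisfy:
$$\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]\,\mathbf{h}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]\,\mathbf{h}_2(t) \tag{10}$$
where $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimension $L \times L$ and $M \times M$.
From the statistics of the noisy speech signal and of the noise signal, the Kronecker product of the two sub-filters, and the noisy signal matrix $\mathbf{Y}(t)$, the mean-square error of the desired-signal estimate is defined as:
$$J[\mathbf{h}_b(t)] = E\{[z(t) - x_1(t)]^2\} = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_b^T(t)\,\boldsymbol{\rho}(t) + \mathbf{h}_b^T(t)\,\mathbf{R}_{\mathbf{y}}(t)\,\mathbf{h}_b(t) \tag{12}$$
where $\sigma_{x_1}^2(t)$ is the element in the first row and first column of $\mathbf{R}_{\mathbf{x}}(t)$.
Using (10), equation (12) is rewritten as:
$$J[\mathbf{h}_1(t), \mathbf{h}_2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_2^T(t)\,\boldsymbol{\rho}_1(t) + \mathbf{h}_2^T(t)\,\mathbf{R}_{\mathbf{y},1}(t)\,\mathbf{h}_2(t) = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_1^T(t)\,\boldsymbol{\rho}_2(t) + \mathbf{h}_1^T(t)\,\mathbf{R}_{\mathbf{y},2}(t)\,\mathbf{h}_1(t) \tag{13}$$
where
$$\boldsymbol{\rho}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T \boldsymbol{\rho}(t), \qquad \mathbf{R}_{\mathbf{y},1}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\,\mathbf{R}_{\mathbf{y}}(t)\,[\mathbf{I}_M \otimes \mathbf{h}_1(t)]$$
$$\boldsymbol{\rho}_2(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T \boldsymbol{\rho}(t), \qquad \mathbf{R}_{\mathbf{y},2}(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\,\mathbf{R}_{\mathbf{y}}(t)\,[\mathbf{h}_2(t) \otimes \mathbf{I}_L]$$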
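Optionally, estimating the bilinear Wiener noise reduction filter from the final expression of the mean-square error proceeds as follows:
Step 1: initialize the sub-filter $\mathbf{h}_1(t)$ as $\mathbf{h}_1^{(0)}(t) = \sigma_{x_1}^2(t)\,\mathbf{R}_{\mathbf{y}_1}^{-1}(t)\,\boldsymbol{\rho}_{x_1}(t)$, where $\mathbf{R}_{\mathbf{y}_1}(t)$ is the autocorrelation matrix of the noisy speech signal vector of the first channel, that is, the first $L$ rows and first $L$ columns of $\mathbf{R}_{\mathbf{y}}(t)$, $\boldsymbol{\rho}_{x_1}(t)$ is the vector formed by the first $L$ elements of $\boldsymbol{\rho}(t)$, and the superscript $(n)$ denotes the result of the $n$-th iteration;
Step 2: substitute $\mathbf{h}_1^{(n-1)}(t)$ into $\boldsymbol{\rho}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T \boldsymbol{\rho}(t)$ and $\mathbf{R}_{\mathbf{y},1}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T \mathbf{R}_{\mathbf{y}}(t) [\mathbf{I}_M \otimes \mathbf{h}_1(t)]$ to obtain $\boldsymbol{\rho}_1^{(n-1)}(t)$ and $\mathbf{R}_{\mathbf{y},1}^{(n-1)}(t)$;
Step 3: substitute $\boldsymbol{\rho}_1^{(n-1)}(t)$ and $\mathbf{R}_{\mathbf{y},1}^{(n-1)}(t)$ into $\mathbf{h}_2^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{\mathbf{y},1}^{(n-1)}(t)]^{-1} \boldsymbol{\rho}_1^{(n-1)}(t)$ to obtain $\mathbf{h}_2^{(n)}(t)$;
Step 4: substitute $\mathbf{h}_2^{(n)}(t)$ into $\boldsymbol{\rho}_2(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T \boldsymbol{\rho}(t)$ and $\mathbf{R}_{\mathbf{y},2}(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T \mathbf{R}_{\mathbf{y}}(t) [\mathbf{h}_2(t) \otimes \mathbf{I}_L]$ to obtain $\boldsymbol{\rho}_2^{(n)}(t)$ and $\mathbf{R}_{\mathbf{y},2}^{(n)}(t)$;
Step 5: substitute $\boldsymbol{\rho}_2^{(n)}(t)$ and $\mathbf{R}_{\mathbf{y},2}^{(n)}(t)$ into $\mathbf{h}_1^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{\mathbf{y},2}^{(n)}(t)]^{-1} \boldsymbol{\rho}_2^{(n)}(t)$ to obtain $\mathbf{h}_1^{(n)}(t)$;
Repeat steps 2 to 5 $N$ times to obtain $\mathbf{h}_1^{(N)}(t)$ and $\mathbf{h}_2^{(N)}(t)$.
The bilinear Wiener noise reduction filter is then $\mathbf{h}_{bW}(t) = \mathbf{h}_2^{(N)}(t) \otimes \mathbf{h}_1^{(N)}(t)$.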
The technical effects of the invention are as follows.
The invention separates the filter coefficients that act along the temporal dimension from those that act along the spatial dimension, turning the estimation of one long filter into the estimation of two shorter sub-filters. The result is a multichannel bilinear speech noise reduction filter with low computational complexity and a strong ability to handle nonstationary noise. The method applies in the time domain, and its core idea extends straightforwardly to frequency-domain noise reduction frameworks. It can be used in smart voice and human-computer interaction systems as well as in audio and video conferencing, in-vehicle and immersive communication systems. It can be used alone or together with modules such as echo cancellation, sound source localization, dereverberation and speech separation. The method has the following advantages: 1) the algorithmic complexity is significantly reduced; 2) fewer observations are needed to estimate the filter coefficients, which improves the ability to track nonstationary noise. In addition, because the proposed method is a low-complexity time-domain method, it has a further advantage over the frequency-domain methods used in current practical systems: it produces no musical noise.
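To give a feel for the size of the reduction, the following sketch counts the coefficients to estimate and the order of the largest matrix to invert in each scheme. The function names and the configuration (4 microphones, 32 taps) are hypothetical. The count is deliberately rough: it ignores the number of iterations and the cost of forming the reduced matrices.

```python
def coefficient_counts(L, M):
    """Number of filter coefficients to estimate in each scheme."""
    # Conventional: one filter of length M*L. Bilinear: h1 (length L) plus h2 (length M).
    return {"conventional": M * L, "bilinear": L + M}

def largest_matrix_orders(L, M):
    """Order of the largest matrix that has to be inverted in each scheme."""
    # Conventional: the ML x ML matrix R_y. Bilinear: an L x L and an M x M matrix.
    return {"conventional": M * L, "bilinear": max(L, M)}

# Hypothetical configuration: 4 microphones, 32 taps per channel.
counts = coefficient_counts(L=32, M=4)
orders = largest_matrix_orders(L=32, M=4)
print(counts)  # {'conventional': 128, 'bilinear': 36}
print(orders)  # {'conventional': 128, 'bilinear': 32}
```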
Brief Description of the Drawings
The drawings, which form part of this application, provide a further understanding of the application. The illustrative embodiments and their descriptions explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a flow chart of the method in an embodiment of the invention;
FIG. 2 is a system structure diagram in an embodiment of the invention.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another. The application is described in detail below with reference to the drawings and the embodiments.
It should also be noted that the steps shown in the flow charts may be executed in a computer system, for example as a set of computer-executable instructions, and that although the flow charts show a logical order, in some cases the steps may be executed in a different order from the one shown or described.
Embodiment 1
As shown in FIGS. 1 and 2, this embodiment provides a multichannel speech noise reduction method based on bilinear filtering, comprising:
Step 1: acquire the noisy speech signals;
Step 2: estimate the statistics of the noisy speech signals and of the noise signals;
Step 3: estimate the bilinear Wiener noise reduction filter;
Step 4: filter the noisy speech signals to obtain an estimate of the clean speech signal.
In speech noise reduction, the time-domain signal model is:
$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$
Here $t$ is the discrete time index, the subscript $m$ denotes the signal received by the $m$-th microphone (the microphone array is assumed to have $M$ microphones), $x_m(t)$ and $v_m(t)$ are the clean speech signal and the additive noise signal received by the $m$-th microphone, and $y_m(t)$ is the noisy speech signal received by that microphone. All signals are assumed to be zero-mean, broadband and real-valued. The first microphone of the array is chosen as the reference microphone, so $x_1(t)$ is the desired signal, that is, the signal to be recovered. In theory, any microphone could serve as the reference.
Grouping $L$ consecutive samples, the signal received by the $m$-th microphone can be written as a vector of length $L$:
$$\mathbf{y}_m(t) = [\,y_m(t)\ \ y_m(t-1)\ \ \cdots\ \ y_m(t-L+1)\,]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t) \tag{2}$$
where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined in the same way as $\mathbf{y}_m(t)$:
$$\mathbf{x}_m(t) = [\,x_m(t)\ \ x_m(t-1)\ \ \cdots\ \ x_m(t-L+1)\,]^T$$
$$\mathbf{v}_m(t) = [\,v_m(t)\ \ v_m(t-1)\ \ \cdots\ \ v_m(t-L+1)\,]^T$$
$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are the desired signal vector and the noise signal vector of the $m$-th channel, $\mathbf{y}_m(t)$ is the noisy signal vector of the $m$-th channel, and the superscript $T$ denotes transposition.
In conventional time-domain multichannel speech enhancement, the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$), each of length $L$, are usually concatenated and written as:
$$\mathbf{y}(t) = [\,\mathbf{y}_1^T(t)\ \ \mathbf{y}_2^T(t)\ \ \cdots\ \ \mathbf{y}_M^T(t)\,]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$
where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$:
$$\mathbf{x}(t) = [\,\mathbf{x}_1^T(t)\ \ \cdots\ \ \mathbf{x}_M^T(t)\,]^T, \qquad \mathbf{v}(t) = [\,\mathbf{v}_1^T(t)\ \ \cdots\ \ \mathbf{v}_M^T(t)\,]^T$$
$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are the overall noisy signal vector, the overall clean speech signal vector and the overall noise signal vector, each of length $ML$.
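The construction of the per-channel frame vectors and of the stacked vector can be sketched in NumPy as follows. The function names, the array sizes and the random placeholder signals are assumptions made for this example and do not come from the patent.

```python
import numpy as np

def frame_vector(signal, t, L):
    """Return [s(t), s(t-1), ..., s(t-L+1)] for one channel (requires t >= L-1)."""
    if t < L - 1:
        raise ValueError("need at least L samples up to time t")
    return signal[t - L + 1 : t + 1][::-1]

def stacked_vector(signals, t, L):
    """Concatenate the M per-channel frame vectors into one vector of length M*L."""
    return np.concatenate([frame_vector(channel, t, L) for channel in signals])

rng = np.random.default_rng(0)
M, L, T = 3, 4, 16                     # microphones, taps per channel, samples
x = rng.standard_normal((M, T))        # placeholder clean signals
v = 0.1 * rng.standard_normal((M, T))  # placeholder additive noise
y = x + v                              # per-channel model: noisy = clean + noise

t = 10
y_vec = stacked_vector(y, t, L)        # overall noisy signal vector, length M*L
x_vec = stacked_vector(x, t, L)
v_vec = stacked_vector(v, t, L)
```

The newest sample of each channel comes first in its block, so `y_vec[0]` is the current sample of the reference microphone and `y_vec[L]` is the current sample of the second microphone.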
In step 2, the statistics of the noisy speech signal and of the noise signal are estimated as follows. The correlation matrix $\mathbf{R}_{\mathbf{v}}(t)$ of the noise signal vector $\mathbf{v}(t)$ is estimated with an existing noise estimation algorithm. The correlation matrix $\mathbf{R}_{\mathbf{y}}(t)$ of the noisy speech signal vector $\mathbf{y}(t)$ is estimated recursively as $\mathbf{R}_{\mathbf{y}}(t) = \alpha\,\mathbf{R}_{\mathbf{y}}(t-1) + (1-\alpha)\,\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$). The correlation matrix of the clean speech signal vector $\mathbf{x}(t)$ is estimated as $\mathbf{R}_{\mathbf{x}}(t) = \mathbf{R}_{\mathbf{y}}(t) - \mathbf{R}_{\mathbf{v}}(t)$. The vector $\boldsymbol{\rho}(t)$ is then determined from $\mathbf{R}_{\mathbf{x}}(t)$: the element in the first row and first column of $\mathbf{R}_{\mathbf{x}}(t)$ is $\sigma_{x_1}^2(t)$, and dividing the first column of $\mathbf{R}_{\mathbf{x}}(t)$ by $\sigma_{x_1}^2(t)$ gives $\boldsymbol{\rho}(t)$.
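The recursive estimate of the noisy-signal correlation matrix can be sketched as below. The function names, the default forgetting factor of 0.95, the zero initialization, and the fixed placeholder noise matrix are assumptions for the example; the patent only requires 0 < α < 1 and leaves the noise estimator to existing algorithms.

```python
import numpy as np

def update_noisy_correlation(R_y_prev, y_vec, alpha=0.95):
    """One recursive update: R_y(t) = alpha * R_y(t-1) + (1 - alpha) * y(t) y(t)^T."""
    y_vec = np.asarray(y_vec, dtype=float)
    return alpha * R_y_prev + (1.0 - alpha) * np.outer(y_vec, y_vec)

def speech_correlation(R_y, R_v):
    """Estimate of the clean-speech correlation matrix: R_x = R_y - R_v."""
    return R_y - R_v

rng = np.random.default_rng(0)
dim = 6                                  # stands in for M*L
R_y = np.zeros((dim, dim))               # assumed initialization
for _ in range(200):
    R_y = update_noisy_correlation(R_y, rng.standard_normal(dim))

R_v = 0.1 * np.eye(dim)                  # placeholder; a real system would use a noise estimator
R_x = speech_correlation(R_y, R_v)
```

A smaller α forgets old frames faster, which helps tracking but makes the estimate noisier. Since the speech correlation matrix is obtained by subtraction, nothing in this sketch guarantees that it is positive semidefinite.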
Step 3 proceeds as follows.
In conventional methods, the noisy signal vector $\mathbf{y}(t)$ of length $ML$ is usually passed through a single linear filter $\mathbf{h}(t)$:
$$z(t) = \mathbf{h}^T(t)\,\mathbf{y}(t) \tag{4}$$
where $z(t)$ is the estimate of the desired signal $x_1(t)$. Conventional methods therefore have to estimate a filter $\mathbf{h}(t)$ of length $ML$.
In the present invention, to construct the bilinear noise reduction scheme, the noisy signal vector is rearranged as:
$$\mathbf{Y}(t) = [\,\mathbf{y}_1(t)\ \ \mathbf{y}_2(t)\ \ \cdots\ \ \mathbf{y}_M(t)\,] = \mathbf{X}(t) + \mathbf{V}(t) \tag{5}$$
where $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are defined in the same way as $\mathbf{Y}(t)$: $\mathbf{X}(t) = [\,\mathbf{x}_1(t)\ \ \mathbf{x}_2(t)\ \ \cdots\ \ \mathbf{x}_M(t)\,]$ with $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$, and $\mathbf{V}(t) = [\,\mathbf{v}_1(t)\ \ \mathbf{v}_2(t)\ \ \cdots\ \ \mathbf{v}_M(t)\,]$ with $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$. The matrices $\mathbf{Y}(t)$, $\mathbf{X}(t)$ and $\mathbf{V}(t)$ all have dimension $L \times M$; $\mathbf{Y}(t)$ is the noisy signal matrix, $\mathbf{X}(t)$ the clean speech signal matrix and $\mathbf{V}(t)$ the noise signal matrix. The symbol $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix.
Based on (5), the conventional noise reduction scheme can be modified into the following bilinear scheme:
$$z(t) = \mathbf{h}_1^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_2(t) = \mathbf{h}_1^T(t)\,\mathbf{X}(t)\,\mathbf{h}_2(t) + \mathbf{h}_1^T(t)\,\mathbf{V}(t)\,\mathbf{h}_2(t) \tag{6}$$
where the two shorter sub-filters $\mathbf{h}_1(t)$ (length $L$) and $\mathbf{h}_2(t)$ (length $M$) reduce noise along the temporal and the spatial dimension respectively, $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_1^T(t)\mathbf{X}(t)\mathbf{h}_2(t)$ is the filtered speech signal, and $\mathbf{h}_1^T(t)\mathbf{V}(t)\mathbf{h}_2(t)$ is the residual noise after filtering.
Equation (6) can be transformed as follows:
$$\mathbf{h}_1^T(t)\,\mathbf{Y}(t)\,\mathbf{h}_2(t) = \mathrm{tr}[\mathbf{h}_2(t)\mathbf{h}_1^T(t)\mathbf{Y}(t)] = \mathrm{vec}^T[\mathbf{h}_1(t)\mathbf{h}_2^T(t)]\ \mathrm{vec}[\mathbf{Y}(t)] = [\mathbf{h}_2(t) \otimes \mathbf{h}_1(t)]^T\,\mathbf{y}(t) = \mathbf{h}_b^T(t)\,\mathbf{y}(t) \tag{7}$$
where $\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t)$, $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix (for example, $\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$), $\otimes$ denotes the Kronecker product, and $\mathrm{tr}[\cdot]$ denotes the trace. Note that the filter $\mathbf{h}_b(t)$ here differs from the filter $\mathbf{h}(t)$ derived directly with the conventional method.
Using (7), equation (6) can be written as
$$z(t) = \mathbf{h}_b^T(t)\,\mathbf{y}(t) = \mathbf{h}_b^T(t)\,\mathbf{x}(t) + \mathbf{h}_b^T(t)\,\mathbf{v}(t) \tag{8}$$
where $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$ and $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$.
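The equivalence between the bilinear form and the linear form with a Kronecker-structured filter can be checked numerically. In this sketch the sizes and the random test data are arbitrary assumptions; `flatten(order="F")` stacks the columns of the matrix, which is the vec operation used in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
L, M = 5, 3                       # arbitrary sizes for the check
Y = rng.standard_normal((L, M))   # stands in for the noisy signal matrix
h1 = rng.standard_normal(L)       # temporal sub-filter
h2 = rng.standard_normal(M)       # spatial sub-filter

# Bilinear form: z = h1^T Y h2.
z_bilinear = h1 @ Y @ h2

# Linear form: z = (h2 kron h1)^T vec(Y), with vec stacking columns.
y_vec = Y.flatten(order="F")
h_b = np.kron(h2, h1)
z_linear = h_b @ y_vec
```

The order of the Kronecker factors matters: with column stacking, the matching filter is `np.kron(h2, h1)`, not `np.kron(h1, h2)`.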
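Because the speech signal and the noise signal are uncorrelated, the two parts of $z(t)$ are uncorrelated, and the variance of $z(t)$ can be written as
$$\sigma_z^2(t) = \mathbf{h}_b^T(t)\,\mathbf{R}_{\mathbf{y}}(t)\,\mathbf{h}_b(t) = \mathbf{h}_b^T(t)\,\mathbf{R}_{\mathbf{x}}(t)\,\mathbf{h}_b(t) + \mathbf{h}_b^T(t)\,\mathbf{R}_{\mathbf{v}}(t)\,\mathbf{h}_b(t) \tag{9}$$
where
$\mathbf{R}_{\mathbf{y}}(t) = E[\mathbf{y}(t)\mathbf{y}^T(t)]$, $\mathbf{R}_{\mathbf{x}}(t) = E[\mathbf{x}(t)\mathbf{x}^T(t)]$ and $\mathbf{R}_{\mathbf{v}}(t) = E[\mathbf{v}(t)\mathbf{v}^T(t)]$ ($\mathbf{R}_{\mathbf{v}}(t)$ is a full-rank matrix).
In the bilinear noise reduction scheme, the two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ satisfy:
$$\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]\,\mathbf{h}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]\,\mathbf{h}_2(t) \tag{10}$$
where $\mathbf{I}_L$ and $\mathbf{I}_M$ are identity matrices of dimension $L \times L$ and $M \times M$.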
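The relation between the two sub-filters, in which the Kronecker product is written as a matrix times either sub-filter, can be verified with a short check. The sizes and random vectors are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M = 4, 3
h1 = rng.standard_normal(L)
h2 = rng.standard_normal(M)

h_b = np.kron(h2, h1)                       # combined filter of length M*L

# (h2 kron I_L) has shape (M*L, L): it maps h1 to the combined filter.
H2 = np.kron(h2[:, None], np.eye(L))
# (I_M kron h1) has shape (M*L, M): it maps h2 to the combined filter.
H1 = np.kron(np.eye(M), h1[:, None])

via_h1 = H2 @ h1
via_h2 = H1 @ h2
```

These two factorizations are what allow the cost function to be treated as an ordinary quadratic in one sub-filter while the other is held fixed.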
To derive the bilinear iterative noise reduction filter, the mean-square error of the desired-signal estimate $z(t)$ is needed. The error of $z(t)$ is defined as
$$\varepsilon(t) = z(t) - x_1(t) \tag{11}$$
Based on (11), the mean-square error of $z(t)$ can be defined as
$$J[\mathbf{h}_b(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_b^T(t)\,\boldsymbol{\rho}(t) + \mathbf{h}_b^T(t)\,\mathbf{R}_{\mathbf{y}}(t)\,\mathbf{h}_b(t) \tag{12}$$
where $\sigma_{x_1}^2(t) = E[x_1^2(t)]$.
To take full advantage of the bilinear scheme, (10) is applied to rewrite (12) as
$$J[\mathbf{h}_1(t), \mathbf{h}_2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_2^T(t)\,\boldsymbol{\rho}_1(t) + \mathbf{h}_2^T(t)\,\mathbf{R}_{\mathbf{y},1}(t)\,\mathbf{h}_2(t) = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_1^T(t)\,\boldsymbol{\rho}_2(t) + \mathbf{h}_1^T(t)\,\mathbf{R}_{\mathbf{y},2}(t)\,\mathbf{h}_1(t) \tag{13}$$
where
$$\boldsymbol{\rho}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T \boldsymbol{\rho}(t) \tag{14}$$
$$\mathbf{R}_{\mathbf{y},1}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\,\mathbf{R}_{\mathbf{y}}(t)\,[\mathbf{I}_M \otimes \mathbf{h}_1(t)] \tag{15}$$
$$\boldsymbol{\rho}_2(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T \boldsymbol{\rho}(t) \tag{16}$$
$$\mathbf{R}_{\mathbf{y},2}(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\,\mathbf{R}_{\mathbf{y}}(t)\,[\mathbf{h}_2(t) \otimes \mathbf{I}_L] \tag{17}$$
As these expressions show, the matrices required in the bilinear scheme, $\mathbf{R}_{\mathbf{y},1}(t)$ (dimension $M \times M$) and $\mathbf{R}_{\mathbf{y},2}(t)$ (dimension $L \times L$), are much smaller than the matrix $\mathbf{R}_{\mathbf{y}}(t)$ (dimension $ML \times ML$) required in the conventional scheme. Fewer observations are therefore usually needed to estimate $\mathbf{R}_{\mathbf{y},1}(t)$ and $\mathbf{R}_{\mathbf{y},2}(t)$, so the bilinear filter can better track changes in the signal statistics and is better suited to nonstationary noise. In addition, solving for the sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ requires inverting $\mathbf{R}_{\mathbf{y},1}(t)$ and $\mathbf{R}_{\mathbf{y},2}(t)$, whereas solving for the conventional filter $\mathbf{h}(t)$ requires inverting $\mathbf{R}_{\mathbf{y}}(t)$. Because $\mathbf{R}_{\mathbf{y},1}(t)$ and $\mathbf{R}_{\mathbf{y},2}(t)$ are much smaller than $\mathbf{R}_{\mathbf{y}}(t)$, solving for the two sub-filters is far less complex than solving for the conventional filter.
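The rewriting of the mean-square error in terms of the small matrices can be checked numerically. The sketch below uses random stand-ins for the statistics; the names `rho_1`, `rho_2`, `R_y1` and `R_y2` are labels chosen here for the reduced quantities, and the way the test matrix is generated is an assumption made only so that it is symmetric positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)
L, M = 4, 3
A = rng.standard_normal((M * L, M * L))
R_y = A @ A.T + np.eye(M * L)          # symmetric positive definite stand-in
rho = rng.standard_normal(M * L)
rho[0] = 1.0                           # first element of rho is 1 by construction
sigma2 = 1.5                           # stand-in for the desired-signal variance

h1 = rng.standard_normal(L)
h2 = rng.standard_normal(M)
h_b = np.kron(h2, h1)

H1 = np.kron(np.eye(M), h1[:, None])   # (I_M kron h1), shape (M*L, M)
H2 = np.kron(h2[:, None], np.eye(L))   # (h2 kron I_L), shape (M*L, L)

# Reduced quantities: M-dimensional when h1 is fixed, L-dimensional when h2 is fixed.
rho_1, R_y1 = H1.T @ rho, H1.T @ R_y @ H1
rho_2, R_y2 = H2.T @ rho, H2.T @ R_y @ H2

J_full = sigma2 - 2 * sigma2 * h_b @ rho + h_b @ R_y @ h_b
J_in_h2 = sigma2 - 2 * sigma2 * h2 @ rho_1 + h2 @ R_y1 @ h2
J_in_h1 = sigma2 - 2 * sigma2 * h1 @ rho_2 + h1 @ R_y2 @ h1
```

All three expressions give the same value; only the size of the matrix that appears in the quadratic term changes.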
To derive the bilinear iterative Wiener filter, fix $h_2(t)$ and $h_1(t)$ in turn and write equation (13) as
Set the initial value of $h_1(t)$ to
which is the Wiener filter of the first channel and has length $L$. Substituting this initial value into equations (14) and (15) gives
Then, substituting equations (20) and (21) into equation (19) gives
where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration. Differentiating equation (22) and setting the result to zero gives
Substituting equation (23) into equations (16) and (17) gives
Using these two results, equation (18) can be written as
Differentiating equation (26) and setting the result to zero gives
Following the same procedure, after $n$ iterations we obtain
where
Based on equations (28) and (29), the bilinear iterative Wiener filter after the $N$-th iteration is:
Passing the noisy signal through the designed bilinear Wiener noise reduction filter $h_{bW}(t)$ yields the noise-reduced speech signal. Alternatively, the noise-reduced speech signal $z(t)$ can be obtained from a second formula; the two methods are equivalent.
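One common form of such an equivalence can be checked numerically. The sketch below assumes that the combined filter is the Kronecker product `kron(h2, h1)`, with `h1` of length `L` and `h2` of length `M`, and that the observation vector stacks the `M` channels one after another, each contributing `L` samples. These conventions and all variable names are assumptions for illustration; the application's own formulas are not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L = 3, 5  # hypothetical: 3 channels, 5 taps per channel

h1 = rng.standard_normal(L)  # temporal sub-filter (length L)
h2 = rng.standard_normal(M)  # cross-channel sub-filter (length M)

# Column m of Y holds the L most recent samples of channel m.
Y = rng.standard_normal((L, M))
y = Y.flatten(order="F")  # stack the channels into one vector of length M*L

# Method 1: build the long filter of length M*L and apply it once.
h_bw = np.kron(h2, h1)
z_long = h_bw @ y

# Method 2: apply the two short sub-filters directly.
z_bilinear = h1 @ Y @ h2

print(np.isclose(z_long, z_bilinear))
```

The second form never builds the length-`ML` filter, which is convenient when the sub-filters are updated often.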
The derivation of the bilinear iterative Wiener filter can be summarized as follows:
Step 1: Initialize the sub-filter $h_1(t)$ according to the initialization formula. The autocorrelation matrix of the noisy speech signal vector of the first channel is formed by the first $L$ rows and first $L$ columns of $R_y(t)$, and the corresponding vector is formed by the first $L$ elements of $\rho(t)$. The superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;
Step 2: Substitute the current $h_1(t)$ into the two formulas that depend on it (equations (14) and (15) in the derivation above) to obtain the corresponding two quantities;
Step 3: Substitute these two quantities into the update formula to obtain the updated $h_2(t)$;
Step 4: Substitute the updated $h_2(t)$ into the two formulas that depend on it (equations (16) and (17) in the derivation above) to obtain the corresponding two quantities;
Step 5: Substitute these two quantities into the update formula to obtain the updated $h_1(t)$;
Repeat Steps 2 to 5 $N$ times to obtain the final sub-filters $h_1(t)$ and $h_2(t)$.
Obtain the bilinear Wiener noise reduction filter $h_{bW}(t)$ from the two sub-filters according to the combining formula.
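The steps above can be sketched as an alternating least-squares loop. This is a minimal sketch under stated assumptions, not the application's exact formulas, which are not reproduced in this text. It assumes that the combined filter is `kron(h2, h1)`, that `rho` is the cross-correlation vector between the stacked observation and the desired signal, and that each update is the closed-form minimizer of the mean-squared error with the other sub-filter held fixed. The function name and the helper matrices `H1` and `H2` are hypothetical.

```python
import numpy as np


def bilinear_iterative_wiener(R_y, rho, M, L, n_iter=5):
    """Alternating Wiener updates for h_bW = kron(h2, h1) (assumed convention).

    R_y : (M*L, M*L) correlation matrix of the stacked noisy observation.
    rho : (M*L,) cross-correlation vector (assumed meaning of rho(t)).
    """
    # Step 1: first-channel Wiener filter from the top-left L x L block
    # of R_y and the first L elements of rho.
    h1 = np.linalg.solve(R_y[:L, :L], rho[:L])
    for _ in range(n_iter):
        # Step 2: with h1 fixed, kron(h2, h1) = H1 @ h2, so the problem
        # reduces to an M x M correlation matrix and a length-M vector.
        H1 = np.kron(np.eye(M), h1[:, None])  # shape (M*L, M)
        R_y1 = H1.T @ R_y @ H1
        rho_1 = H1.T @ rho
        # Step 3: closed-form update of h2.
        h2 = np.linalg.solve(R_y1, rho_1)
        # Step 4: with h2 fixed, kron(h2, h1) = H2 @ h1, so the problem
        # reduces to an L x L correlation matrix and a length-L vector.
        H2 = np.kron(h2[:, None], np.eye(L))  # shape (M*L, L)
        R_y2 = H2.T @ R_y @ H2
        rho_2 = H2.T @ rho
        # Step 5: closed-form update of h1.
        h1 = np.linalg.solve(R_y2, rho_2)
    return h1, h2, np.kron(h2, h1)


# Small synthetic example with hypothetical sizes and random statistics.
rng = np.random.default_rng(1)
M, L = 3, 4
A = rng.standard_normal((M * L, 200))
R_y = A @ A.T / 200  # positive-definite sample correlation matrix
rho = rng.standard_normal(M * L)
h1, h2, h_bw = bilinear_iterative_wiener(R_y, rho, M, L, n_iter=3)
print(h1.shape, h2.shape, h_bw.shape)
```

For clarity the sketch forms the small matrices from the full `R_y`. An implementation aiming for the sample-efficiency benefit described earlier would more likely estimate the $M \times M$ and $L \times L$ matrices directly from the transformed observations.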
The above describes only preferred specific embodiments of the present application, and the protection scope of the present application is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. The protection scope of the present application shall therefore be determined by the protection scope of the claims.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410241360.4A CN118116402A (en) | 2024-03-04 | 2024-03-04 | Bilinear filtering-based multichannel voice noise reduction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410241360.4A CN118116402A (en) | 2024-03-04 | 2024-03-04 | Bilinear filtering-based multichannel voice noise reduction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118116402A (en) | 2024-05-31 |
Family
ID=91215861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410241360.4A Pending CN118116402A (en) | 2024-03-04 | 2024-03-04 | Bilinear filtering-based multichannel voice noise reduction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118116402A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||