CN101778322B - Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic - Google Patents
- Publication number: CN101778322B (application CN2009102503930A)
- Authority: CN (China)
- Prior art keywords: noise, signal, power spectrum, speech signal, matrix
- Legal status: Expired - Fee Related (the legal status is Google's assumption, not a legal conclusion)
Description
Technical Field
The invention relates to the signal subspace method for microphone arrays, the auditory masking effect, and the design of post-filters.
Background
Speech in everyday environments is often corrupted by ambient noise, and multi-channel speech enhancement has received wide attention in recent years. Compared with single-channel methods, microphone-array speech enhancement can exploit the correlation between channels to estimate signal characteristics more accurately and thus achieve better enhancement. Among array methods, post-filtering has become especially popular because of its strong noise-reduction performance. Simmer et al. (Reference 1: K. Uwe Simmer et al., "Post-filtering techniques," in Microphone Arrays, M. Brandstein and D. Ward, Eds. New York: Springer, 2001, ch. 3, pp. 36-60) proved that the optimal multi-channel speech enhancement solution in the minimum mean-square error sense factors into a minimum variance distortionless response (MVDR) beamformer followed by a single-channel Wiener post-filter. Although post-filtering is optimal in theory, in practice the power spectra of the speech and the noise are hard to estimate accurately, so the ideal post-filter cannot be obtained and performance suffers. A well-designed post-filter together with accurate power spectrum estimation can therefore improve enhancement performance substantially. Zelinski (Reference 2: R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms," in Proc. ICASSP-88, 1988, vol. 5, pp. 2578-2581) proposed a post-filter design that assumes the noise at the individual array elements is mutually uncorrelated. Because element noises in real environments are partly correlated, this method performs poorly. McCowan (Reference 3: Iain A. McCowan and Hervé Bourlard, "Microphone array post-filter based on noise field coherence," IEEE Transactions on Speech and Audio Processing, vol. 11, pp. 709-715, Nov. 2003) accounted for the correlation between noises by exploiting the properties of a diffuse noise field and obtained better enhancement. Because that method rests on the diffuse-field assumption, however, its performance degrades noticeably when the actual noise field is not diffuse.
The invention exploits the auditory masking effect of the human ear to design a post-filter based on auditory perception. To estimate the noise power spectrum more accurately, it decomposes the noisy signal space into a signal subspace and a noise subspace, estimates the dimensions of the two subspaces by maximizing the probability that the target speech is present, and estimates the noise power spectrum on the noise subspace using conditional probabilities. Experiments show that the proposed noise estimator is more accurate than earlier ones and that the proposed perception-based post-filter is more effective than conventional post-filters.
Consider an array of L microphones, and let the frequency-domain noisy speech vector received by the array be $\mathbf{X} = [X_1, \ldots, X_L]^T$. The enhanced speech signal, obtained as a weighted sum of the array inputs, is expressed in the frequency domain as:
$$Y = \mathbf{w}^H \mathbf{X} = \mathbf{w}^H \left(S\mathbf{d} + \mathbf{N}\right) \qquad (1)$$
where $\mathbf{w}$ is the vector of array weights, $S$ is the target signal, $\mathbf{d} = [d_1, \ldots, d_L]^T$ is the propagation vector, $\mathbf{N} = [N_1, \ldots, N_L]^T$ is the noise vector, and $(\cdot)^H$ denotes the conjugate transpose.
The power of the error signal $e = S - \mathbf{w}^H \mathbf{X}$ is:
$$\phi_{ee} = E\{|e|^2\} = \phi_{SS} - \mathbf{w}^H \boldsymbol{\phi}_{XS} - \boldsymbol{\phi}_{XS}^H \mathbf{w} + \mathbf{w}^H \mathbf{\Phi}_{XX} \mathbf{w} \qquad (2)$$
where $\mathbf{\Phi}_{XX}$ is the cross-power spectral matrix of the multi-channel noisy speech signal $\mathbf{X}$, $\boldsymbol{\phi}_{XS}$ is the cross-power spectrum between $\mathbf{X}$ and the single-channel target signal $S$, and $\phi_{SS}$ is the power spectrum of the single-channel target speech signal $S$.
Setting the derivative of $\phi_{ee}$ with respect to $\mathbf{w}$ to zero gives the optimal weights:
$$\mathbf{w}_{\mathrm{opt}} = \mathbf{\Phi}_{XX}^{-1} \boldsymbol{\phi}_{XS} \qquad (3)$$
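Under the assumption that the target speech is uncorrelated with the noise, $\mathbf{\Phi}_{XX} = \phi_{SS}\mathbf{d}\mathbf{d}^H + \mathbf{\Phi}_{NN}$ and $\boldsymbol{\phi}_{XS} = \phi_{SS}\mathbf{d}$, so (3) becomes:
$$\mathbf{w}_{\mathrm{opt}} = \left(\phi_{SS}\mathbf{d}\mathbf{d}^H + \mathbf{\Phi}_{NN}\right)^{-1} \phi_{SS}\mathbf{d} \qquad (4)$$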
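Applying the Sherman-Morrison-Woodbury identity, this can be rewritten as:
$$\mathbf{w}_{\mathrm{opt}} = \frac{\phi_{SS}}{\phi_{SS} + \phi_{NN}} \cdot \frac{\mathbf{\Phi}_{NN}^{-1}\mathbf{d}}{\mathbf{d}^H \mathbf{\Phi}_{NN}^{-1}\mathbf{d}} \qquad (5)$$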
where $\phi_{NN} = (\mathbf{d}^H \mathbf{\Phi}_{NN}^{-1}\mathbf{d})^{-1}$ is the auto-power spectrum of the single-channel noise at the beamformer output, and $\mathbf{\Phi}_{NN}$ is the multi-channel noise cross-power spectral matrix. Equation (5) can thus be viewed as an MVDR beamformer $\mathbf{\Phi}_{NN}^{-1}\mathbf{d}/(\mathbf{d}^H\mathbf{\Phi}_{NN}^{-1}\mathbf{d})$ followed by a single-channel Wiener post-filter $\phi_{SS}/(\phi_{SS}+\phi_{NN})$.
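The factorization from (4) to (5) can be checked numerically. The sketch below is an illustration only: the steering vector `d`, the Hermitian positive-definite noise matrix `phi_nn_mat` and the scalar `phi_ss` are assumed values for a single frequency bin, not data from the patent, and the small `solve` helper is a plain Gauss-Jordan solver written so the example needs only the standard library. It solves (4) directly and compares the result with the MVDR-then-Wiener form of (5).

```python
import cmath

def solve(a, b):
    """Solve the complex linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def hdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

# Assumed single-bin example values (illustrative, not from the patent).
L = 3
d = [cmath.exp(-0.7j * i) for i in range(L)]   # propagation (steering) vector
phi_ss = 2.0                                     # target speech PSD
phi_nn_mat = [[1.0, 0.3 + 0.1j, 0.1],            # Hermitian positive-definite noise matrix
              [0.3 - 0.1j, 1.2, 0.2j],
              [0.1, -0.2j, 0.9]]

# Eq. (4): w = (phi_ss d d^H + Phi_NN)^{-1} phi_ss d
phi_xx = [[phi_ss * d[i] * d[j].conjugate() + phi_nn_mat[i][j] for j in range(L)]
          for i in range(L)]
w_direct = solve(phi_xx, [phi_ss * x for x in d])

# Eq. (5): MVDR beamformer followed by a single-channel Wiener post-filter
a = solve(phi_nn_mat, d)        # Phi_NN^{-1} d
g = hdot(d, a).real             # d^H Phi_NN^{-1} d (real because Phi_NN is Hermitian)
w_mvdr = [x / g for x in a]
phi_nn_out = 1.0 / g            # noise PSD at the beamformer output
wiener = phi_ss / (phi_ss + phi_nn_out)
w_decomp = [wiener * x for x in w_mvdr]

# Largest element-wise difference between the two solutions (rounding-level)
print(max(abs(x - y) for x, y in zip(w_direct, w_decomp)))
```

The MVDR part also satisfies the distortionless constraint $\mathbf{w}_{\mathrm{MVDR}}^H \mathbf{d} = 1$, which is why all of the frequency-dependent speech/noise trade-off ends up in the scalar Wiener factor.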
Summary of the Invention
To address the problems of the prior art, the invention designs a new single-channel post-filter using an adaptive multi-distribution-model selection method and auditory characteristics. A single-channel post-filter design must balance two goals: good noise reduction and low distortion of the target speech. In general, a post-filter that reduces noise may also increase the distortion of the target speech, so a reasonable trade-off between the two is essential to post-filter design.
To this end, the invention provides a microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics, comprising the following steps:
Step a: Capture noisy multi-channel speech with an array of L microphones, time-align the channels, convert each aligned channel into a complex-valued frequency-domain signal with the short-time discrete Fourier transform, compute the power spectral matrix of the array signals, and perform an eigenvalue decomposition of this matrix to obtain the eigenvalue matrix and the eigenvector matrix;
Step b: Determine the signal subspace dimension Q, with Q ≤ L, by maximizing the probability that the target speech is present in the noisy speech signal;
Step c: Adaptively select the distribution model of the noise power spectrum in the noisy speech signal based on the stationarity of the spectrum;
Step d: Estimate the noise power spectrum using conditional probabilities;
Step e: From the signal subspace dimension and the noise power spectrum estimate, use the auditory masking effect to obtain the auditory masking threshold at each frequency bin from the signal subspace estimate;
Step f: From the noise power spectrum and the auditory masking thresholds, estimate the post-filter with Lagrange multipliers so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear. This makes the residual noise inaudible while keeping the distortion of the target speech as small as possible, and completes the microphone array post-filtering speech enhancement.
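Step a leaves open how the power spectral matrix is estimated from the STFT frames. A common choice, assumed here rather than taken from the patent, is the sample average of the outer products $\mathbf{X}(k,t)\mathbf{X}(k,t)^H$ over a block of frames at each bin k (recursive smoothing over t is an equally common alternative). The function name `cross_power_matrix` and the two-frame input are hypothetical.

```python
def cross_power_matrix(frames):
    """Sample estimate of Phi_XX(k) = E{X X^H} at one frequency bin k.

    frames: list of length-L complex vectors X(k, t), one per STFT frame t,
    taken from the time-aligned channels.
    """
    L = len(frames[0])
    phi = [[0j] * L for _ in range(L)]
    for x in frames:
        for i in range(L):
            for j in range(L):
                phi[i][j] += x[i] * x[j].conjugate()  # accumulate the outer product X X^H
    n = len(frames)
    return [[v / n for v in row] for row in phi]      # average over frames

# Two frames from a hypothetical 2-microphone array at one bin
phi = cross_power_matrix([[1 + 0j, 1j], [1 + 0j, -1j]])
```

Here the inter-channel phase flips between the two frames, so the off-diagonal terms average out and the estimate is the identity matrix; the result is Hermitian by construction, which is what the eigenvalue decomposition in step a relies on.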
The eigenvalue decomposition of the power spectral matrix comprises:
Eigenvalue decomposition divides the noisy speech signal space into two subspaces: the signal subspace, which contains both target speech and noise, and the noise subspace, which contains only noise. The power spectral matrix $\mathbf{\Phi}_{XX}(k,t)$ of the noisy speech signal X at time frame t and frequency k is decomposed as:
$$\mathbf{\Phi}_{XX}(k,t) = \mathbf{U}\mathbf{\Lambda}_{XX}\mathbf{U}^H = \mathbf{U}\left(\mathbf{\Lambda}_{SS} + \phi_{NN}(k,t)\mathbf{I}\right)\mathbf{U}^H$$
where X = S + N, with X the noisy speech signal, S the target speech signal and N the noise; $\mathbf{\Lambda}_{XX}$ is the eigenvalue matrix of the noisy speech power spectrum with eigenvalues in descending order; $\mathbf{\Lambda}_{SS}$ is the eigenvalue matrix of the target speech power spectrum, also in descending order; $\mathbf{U}$ is the eigenvector matrix; $\phi_{NN}(k,t)$ is the noise power at time frame t and frequency k; $\mathbf{I}$ is the L×L identity matrix; and $(\cdot)^H$ denotes the conjugate transpose.
Determining the signal subspace dimension means choosing the value of Q that maximizes the probability that target speech is present in the noisy speech. Using conditional probabilities, the steps are:
Define the mutually exclusive events $H_0$ and $H_1$:
Event $H_0$: the noisy speech signal contains only noise and no target speech;
Event $H_1$: the noisy speech signal contains both target speech and noise.
The signal subspace dimension Q is defined as:
where $S(k,t)$ is the power spectrum of the target speech signal at the k-th frequency bin of frame t, $P(\cdot)$ is the distribution function of the target speech spectrum, and $\arg\max(\cdot)$ returns the argument value that maximizes its objective.
Adaptively selecting the distribution model of the noise power spectrum in the noisy speech signal based on spectral stationarity comprises the following steps:
Step c1: Define a discriminant function Ω that describes the stationarity of the power spectrum:
$$\Omega = \frac{\left(\prod_{i=Q+1}^{L} \lambda_i\right)^{1/(L-Q)}}{\frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_i}$$
That is, Ω is the ratio of the geometric mean to the arithmetic mean of the noise-subspace eigenvalues, where $\lambda_i$ is the i-th eigenvalue of the noisy speech power spectrum eigenvalue matrix $\mathbf{\Lambda}_{XX}$ and $i \in \{Q+1, \ldots, L\}$ is the eigenvalue index. By the inequality of arithmetic and geometric means, Ω lies between 0 and 1;
Step c2: Compare the value of the discriminant function with preset thresholds to determine the noise power spectrum distribution model for the noisy speech signal.
The comparison with the preset thresholds comprises:
Step c21: Set two preset thresholds $\Omega_1$ and $\Omega_2$, with $\Omega_1 < \Omega_2$;
Step c22: Compare the discriminant function with the thresholds: if Ω is less than $\Omega_1$, use a zero-mean Gaussian distribution; if Ω is greater than $\Omega_2$, use a Gamma distribution; otherwise, use a Laplace distribution.
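Steps c1 to c22 can be sketched as follows. The numeric thresholds `OMEGA1` and `OMEGA2` are hypothetical, since no values are given in this section, and the eigenvalues are assumed to be strictly positive (so the logarithm is defined) and sorted in descending order, as for $\mathbf{\Lambda}_{XX}$. The geometric mean is computed through logarithms to avoid overflow in the product.

```python
import math

def flatness(eigvals, q):
    """Discriminant Omega: geometric mean / arithmetic mean of the noise-subspace
    eigenvalues lambda_{Q+1..L}. eigvals must be positive and sorted descending."""
    noise = eigvals[q:]                      # 0-based slice = indices Q+1..L
    geo = math.exp(sum(math.log(v) for v in noise) / len(noise))
    arith = sum(noise) / len(noise)
    return geo / arith

def select_noise_model(omega, omega1, omega2):
    """Step c22: pick the noise PSD distribution model from Omega (omega1 < omega2)."""
    if omega < omega1:
        return "gaussian"   # zero-mean Gaussian
    if omega > omega2:
        return "gamma"
    return "laplace"

# Hypothetical thresholds; the patent text here gives no numeric values.
OMEGA1, OMEGA2 = 0.5, 0.9
omega = flatness([9.0, 4.0, 1.0], q=1)       # noise eigenvalues [4, 1] -> Omega = 2 / 2.5
model = select_noise_model(omega, OMEGA1, OMEGA2)
```

With noise-subspace eigenvalues 4 and 1, the geometric mean is 2 and the arithmetic mean 2.5, so Ω is 0.8 and, under these assumed thresholds, the Laplace model is chosen. Equal noise eigenvalues give Ω = 1.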
Estimating the noise power spectrum with conditional probabilities comprises:
For each frame of the noisy speech signal, the probability that it contains only noise is $P(H_0|X)$, and the probability that it contains both noise and target speech is $P(H_1|X)$. The noise power spectrum is estimated separately for these two cases as follows:
where $\phi_{NN}^0$ and $\phi_{NN}^1$ are the noise power spectra when the mutually exclusive events $H_0$ and $H_1$ occur, respectively, and $i \in \{1, \ldots, L\}$ is the eigenvalue index;
According to the conditional probability formula, the overall noise power spectrum estimate combines $\phi_{NN}^0$ and $\phi_{NN}^1$, weighted by $P(H_0|X)$ and $P(H_1|X)$.
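The per-case estimators are not reproduced in this copy, so the sketch below fills them in with assumptions consistent with Eq. (9) and should not be read as the patent's exact formulas: under $H_0$ all L eigenvalues are treated as noise (averaged), and under $H_1$ only the noise-subspace eigenvalues $\lambda_{Q+1}, \ldots, \lambda_L$ are. The speech presence probability `p_h1` is taken as an input; how it is computed is outside this sketch. The function name is hypothetical.

```python
def noise_psd_estimate(eigvals, q, p_h1):
    """phi_NN = P(H0|X) * phi0 + P(H1|X) * phi1 for one bin and frame.

    eigvals: eigenvalues of Phi_XX(k, t) in descending order (length L).
    q:       signal subspace dimension Q from step b.
    p_h1:    speech presence probability P(H1|X), assumed to be supplied.
    """
    # Assumption: under H0 every eigenvalue lambda_1..lambda_L is pure noise.
    phi0 = sum(eigvals) / len(eigvals)
    # Assumption: under H1 only the noise-subspace eigenvalues lambda_{Q+1..L} are pure noise.
    phi1 = sum(eigvals[q:]) / (len(eigvals) - q)
    return (1.0 - p_h1) * phi0 + p_h1 * phi1

print(noise_psd_estimate([5.0, 1.0, 1.0, 1.0], q=1, p_h1=0.75))  # 0.25*2 + 0.75*1 = 1.25
```

Weighting by the posterior probabilities lets the estimate follow the noise in every frame, instead of freezing it between detected noise-only frames as a VAD-based estimator would.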
The estimation of the auditory masking threshold comprises:
Step e1: Divide the auditory frequency range of 0-15500 Hz into critical sub-bands;
Step e2: Compute the auditory masking threshold in each sub-band.
Computing the masking threshold in each sub-band consists of computing the energy of each frequency bin in each sub-band and the transmission coefficient of the basilar membrane of the human ear for sound in each band, multiplying the two to obtain the excitation energy on the basilar membrane, and then computing the masking threshold from the functional relationship between this excitation energy and the auditory masking threshold.
The estimation of the post-filter G with Lagrange multipliers proceeds as follows:
Step fa: Formulate an optimization problem that minimizes the distortion of the target speech subject to the constraint that the residual noise power stays below the masking threshold;
Step fb: Solve it with Lagrange multipliers to obtain the optimal post-filter estimate;
Step fc: Substitute the auditory masking thresholds and the noise power spectrum estimate to complete the post-filter design.
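Steps fa to fc can be illustrated with a textbook per-bin formulation, assumed here because the patent's own error expression is only introduced at the end of this section: with a real gain G applied to the beamformer output, the speech distortion is $(1-G)^2\phi_{SS}$ and the residual noise is $G^2\phi_{NN}$. Minimizing the former subject to $G^2\phi_{NN} \le T$ (T being the masking threshold at that bin) gives G = 1 when the noise is already masked, and otherwise the largest gain that puts the residual noise exactly on the threshold. The function name is hypothetical.

```python
import math

def perceptual_gain(phi_ss, phi_nn, threshold):
    """Solve  min_G (1 - G)^2 * phi_ss  subject to  G^2 * phi_nn <= threshold,  0 <= G <= 1.

    Returns (G, mu), where mu is the Lagrange multiplier of the masking constraint.
    """
    if phi_nn <= threshold:
        return 1.0, 0.0                 # residual noise already masked: constraint inactive
    g = math.sqrt(threshold / phi_nn)   # constraint active: G^2 * phi_nn == threshold
    if g == 0.0:
        return 0.0, math.inf            # zero threshold: full suppression
    # Stationarity of (1-G)^2 phi_ss + mu (G^2 phi_nn - T): -2(1-G) phi_ss + 2 mu G phi_nn = 0
    mu = phi_ss * (1.0 - g) / (g * phi_nn)
    return g, mu

g, mu = perceptual_gain(phi_ss=1.0, phi_nn=4.0, threshold=1.0)
print(g, mu)  # 0.5 0.25
```

The same gain can be written in Wiener-like form as $G = \phi_{SS}/(\phi_{SS} + \mu\phi_{NN})$; here $1/(1 + 0.25 \cdot 4) = 0.5$. A larger multiplier μ means the masking constraint is pulling harder against the distortion term.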
Beneficial effects of the invention: the invention uses the auditory masking effect of the human ear to strike a reasonable trade-off between noise reduction and speech distortion, and designs a new post-filter based on auditory perception. Conventional noise estimation is based on voice activity detection (VAD): it detects the noise-only frames in the noisy speech and uses their average power spectrum as the noise power spectrum estimate for frames that contain both speech and noise. Because the noise varies over time, it actually differs from frame to frame, so applying the average over noise-only frames to all frames produces large estimation errors. To address this, the invention estimates the noise power spectrum in every frame using a subspace decomposition of the noisy signal, which greatly reduces the noise estimation error. It then designs the post-filter using the auditory masking effect so that the residual noise in the enhanced speech is masked by the target speech, reducing noise while also reducing target speech distortion.
Description of Drawings
Further features and advantages of the invention are described below with reference to the illustrative drawings.
Fig. 1 is an example flowchart of the microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics;
Fig. 2 is a flowchart of the method for determining the signal subspace dimension;
Fig. 3 is a flowchart of determining the noise power spectrum distribution model in the noisy speech signal;
Fig. 4 is a flowchart of estimating the noise power spectrum with conditional probabilities;
Fig. 5 is a flowchart of computing the auditory masking threshold;
Fig. 6 is a flowchart of designing the post-filter.
Detailed Description
It should be understood that the following detailed description of the examples and drawings is not intended to limit the invention to the particular illustrative embodiments; the described embodiments merely exemplify the steps of the invention, whose scope is defined by the appended claims.
The invention uses the auditory masking effect of the human ear to strike a reasonable trade-off and designs a new post-filter based on auditory perception. The auditory masking effect works as follows: normally the target speech is a strong signal and the background noise is relatively weak, so the auditory system sets a frequency-domain masking threshold determined by the target speech. If the residual noise after filtering is kept below this masking threshold, the noise is not perceived, and the noisy speech is thereby enhanced. The specific steps are as follows:
A new microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics comprises the following steps:
Step a: Capture noisy multi-channel speech with an array of L microphones, time-align the channels, convert each aligned channel into a complex-valued frequency-domain signal with the short-time discrete Fourier transform, compute the power spectral matrix of the array signals, and perform an eigenvalue decomposition of this matrix to obtain the eigenvalue matrix and the eigenvector matrix;
Step b: Determine the signal subspace dimension Q by maximizing the probability that the target speech is present in the noisy speech signal;
Step c: Adaptively select the distribution model of the noise power spectrum in the noisy speech signal based on the stationarity of the spectrum;
Step d: Estimate the noise power spectrum using conditional probabilities;
Step e: From the signal subspace dimension and the noise power spectrum estimate, use the auditory masking effect to obtain the auditory masking threshold at each frequency bin from the signal subspace estimate;
Step f: From the noise power spectrum and the auditory masking thresholds, estimate the post-filter with Lagrange multipliers so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear. This makes the residual noise inaudible while keeping the distortion of the target speech as small as possible, and completes the microphone array post-filtering speech enhancement.
A commonly used noise estimation method is VAD-based: it detects the noise-only frames in the noisy speech and uses their average power spectrum to estimate the noise power spectrum in frames containing both speech and noise. Because the noise varies, it actually differs from frame to frame, so estimating the noise power spectrum of all frames from the average over noise-only frames leads to large estimation errors.
To address this, steps b) and d) of the invention use a subspace decomposition of the noisy signal to estimate the dimension of the noise subspace and the noise power spectrum. Estimating the noise power spectrum in every frame greatly reduces the noise estimation error.
Under the assumption that the target speech is uncorrelated with the noise, the power spectral matrix $\mathbf{\Phi}_{XX}(k,t)$ of the noisy speech at time frame t and frequency k can be written as the sum of the target speech power spectral matrix $\mathbf{\Phi}_{SS}(k,t)$ and the noise power spectral matrix $\mathbf{\Phi}_{NN}(k,t)$:
$$\mathbf{\Phi}_{XX}(k,t) = \mathbf{\Phi}_{SS}(k,t) + \mathbf{\Phi}_{NN}(k,t) \qquad (6)$$
For microphone array signals, if the noise auto-power spectra at all array elements are assumed equal and the noise is uncorrelated across elements, then:
$$\mathbf{\Phi}_{NN}(k,t) = \phi_{NN}(k,t)\mathbf{I} \qquad (7)$$
where $\mathbf{I}$ is the L×L identity matrix and $\phi_{NN}(k,t)$ is the auto-power spectrum of the single-channel noise.
Let the eigenvalue decomposition of the target speech power spectral matrix be:
$$\mathbf{\Phi}_{SS}(k,t) = \mathbf{U}\mathbf{\Lambda}_{SS}\mathbf{U}^H \qquad (8)$$
where $\mathbf{\Lambda}_{SS}$ is the eigenvalue matrix with eigenvalues in descending order, $\mathbf{U}$ is the corresponding eigenvector matrix, and Q ≤ L is the rank of $\mathbf{\Phi}_{SS}(k,t)$.
Eigenvalue decomposition divides the noisy signal space into two subspaces: the signal subspace (containing target speech and noise) and the noise subspace (containing only noise). The eigenvalue decomposition of the noisy signal power spectral matrix is:
$$\mathbf{\Phi}_{XX}(k,t) = \mathbf{U}\mathbf{\Lambda}_{XX}\mathbf{U}^H = \mathbf{U}\left(\mathbf{\Lambda}_{SS} + \phi_{NN}(k,t)\mathbf{I}\right)\mathbf{U}^H \qquad (9)$$
where $\mathbf{\Lambda}_{XX}$ is the eigenvalue matrix of the noisy speech power spectrum with eigenvalues in descending order and $\mathbf{I}$ is the L×L identity matrix.
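Equation (9) says that, under the white-noise model (7), the noise-subspace eigenvalues of $\mathbf{\Phi}_{XX}$ equal $\phi_{NN}$, while the signal-subspace eigenvalues are shifted up by the speech eigenvalues. A minimal check for an assumed two-microphone case with one speech source (Q = 1) uses the closed-form eigenvalues of a 2×2 Hermitian matrix; the values of `phi_ss`, `phi_nn` and `d` are illustrative.

```python
import cmath
import math

def hermitian_eig2(a, b, c):
    """Eigenvalues (descending) of the 2x2 Hermitian matrix [[a, b], [conj(b), c]], a and c real."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius

# Assumed example: L = 2 microphones, one speech source (Q = 1), white noise per Eq. (7).
phi_ss, phi_nn = 3.0, 0.5
d = [1.0 + 0j, cmath.exp(-0.9j)]       # unit-magnitude propagation vector
# Phi_XX = phi_ss d d^H + phi_nn I  (Eqs. (6)-(7))
a = phi_ss * abs(d[0]) ** 2 + phi_nn
c = phi_ss * abs(d[1]) ** 2 + phi_nn
b = phi_ss * d[0] * d[1].conjugate()
lam1, lam2 = hermitian_eig2(a, b, c)
print(lam1, lam2)  # ~6.5 (= phi_ss * ||d||^2 + phi_nn) and ~0.5 (= phi_nn)
```

The smallest eigenvalue recovers the noise power exactly, which is the property the noise-subspace estimator in steps b) and d) builds on.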
The invention estimates the noise auto-power spectrum $\phi_{NN}$ from the noise subspace. This first requires determining the signal subspace dimension Q and the noise subspace dimension P = L − Q.
Step b) determines Q by maximizing the probability that target speech is present in the noisy speech signal, i.e., it selects the value of Q that makes the presence of target speech most probable.
Using conditional probabilities, define the mutually exclusive events $H_0$ and $H_1$:
Event $H_0$: the noisy speech signal contains only noise and no target speech;
Event $H_1$: the noisy speech signal contains both target speech and noise.
The signal subspace dimension Q is defined as:
where $S(k,t)$ is the power spectrum of the target speech signal at the k-th frequency bin of frame t, $P(\cdot)$ is the distribution function of the target speech spectrum, and $\arg\max(\cdot)$ returns the argument value that maximizes its objective.
Step c) provides an adaptive method for selecting the distribution model of the noise power spectrum in the noisy speech signal based on spectral stationarity. It comprises the following steps:
First, define the discriminant function Ω:
$$\Omega = \frac{\left(\prod_{i=Q+1}^{L} \lambda_i\right)^{1/(L-Q)}}{\frac{1}{L-Q}\sum_{i=Q+1}^{L} \lambda_i}$$
That is, Ω is the ratio of the geometric mean to the arithmetic mean of the noise-subspace eigenvalues, where $\lambda_i$ is the i-th eigenvalue of the noisy speech power spectrum eigenvalue matrix $\mathbf{\Lambda}_{XX}$ and $i \in \{Q+1, \ldots, L\}$ is the eigenvalue index; Ω lies between 0 and 1.
Then set two preset thresholds $\Omega_1$ and $\Omega_2$ ($\Omega_1 < \Omega_2$) and compare the discriminant function with them: if Ω is less than $\Omega_1$, use a zero-mean Gaussian distribution; if it is greater than $\Omega_2$, use a Gamma distribution; otherwise, use a Laplace distribution.
在步骤d)中,提供了一种利用条件概率估计噪声功率谱的方法。对于每一帧带噪语音信号,它只含有噪声的概率是P(H0|X),即含有噪声又含有目标语音信号的概率是P(H1|X);针对这两种情况,分别估计噪声功率谱如下:In step d), a method for estimating the noise power spectrum using conditional probability is provided. For each frame of noisy speech signal, the probability that it only contains noise is P(H 0 |X), that is, the probability that it contains both noise and target speech signal is P(H 1 |X); for these two cases, respectively The estimated noise power spectrum is as follows:
其中,i∈{1,…,L}是特征值的下标,φNN 0和φNN 1分别是噪声在互斥事件H0和H1发生情况下的功率谱。where i∈{1,...,L} is the subscript of the eigenvalues, and φ NN 0 and φ NN 1 are the power spectra of the noise when mutually exclusive events H 0 and H 1 occur, respectively.
According to the conditional probability formula, the noise power spectrum is estimated as follows:
Step e) provides a method for obtaining the auditory masking threshold of each frequency bin from the signal subspace estimate, using the auditory masking effect together with the signal subspace dimension and the noise power spectrum estimate.
The auditory frequency range of 0 to 15500 Hz covers 24 critical sub-bands, and the auditory masking threshold must be computed in each sub-band. First, the energy of each frequency bin in each sub-band is computed; next, the propagation coefficient of the human basilar membrane for sound in each band is computed; the two are then multiplied to obtain the excitation energy on the basilar membrane. Finally, the masking threshold is computed from the functional relationship between the basilar-membrane excitation energy and the auditory masking threshold.
Step f) provides a method for estimating the post-filter G(e^jω) from the noise power spectrum and the auditory masking threshold using a Lagrange multiplier. The filter keeps the residual noise in the enhanced speech below the auditory masking threshold of the human ear, which removes the audible effect of the residual noise, while keeping the distortion of the target speech signal as small as possible. This completes the microphone array post-filtering speech enhancement.
Consider the output signal of the minimum variance distortionless response (MVDR) beamformer and let the target speech signal be S(e^jω). The error between the post-filtered enhanced speech and the target speech signal can be expressed as follows:
where the noise term is the noise contained in the MVDR beamformer output.
The first term of Eq. (14) describes the distortion of the target speech signal in the enhanced speech, and the second term describes the magnitude of the residual noise in the enhanced speech. A suitable post-filter G(e^jω) can be computed so that the residual noise in the enhanced speech stays below the auditory masking threshold of the human ear, which eliminates its audible effect. For Eq. (14), the invention proposes the following objective and constraint:
Constraint:
where C_thr is the auditory masking threshold.
Solving with the Lagrange multiplier method, let:
where μ is the Lagrange multiplier.
Taking the derivative of J with respect to G(e^jω) and setting it to zero gives:
Eq. (18) shows that, under the objective and constraint of the invention, the post-filter based on auditory perception characteristics takes the form of a Wiener filter in which the noise is estimated more reasonably.
Taking the derivative of J with respect to μ and setting it to zero gives:
Equating the expressions for G(e^jω) in (18) and (19) gives:
Substituting (20) into (18), and replacing the noise power spectrum with its estimate from Eq. (13), yields the proposed post-filter based on auditory perception characteristics:
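Because the equations of this derivation are not reproduced in this text, the block below is a reconstruction under an explicit assumption rather than the patent's own formulas: the distortion term of Eq. (14) is taken as |1 − G|²φ_SS, the residual-noise term as |G|²φ_NN, G is treated as real and non-negative, and the constraint is assumed active. Under that assumption it reproduces both statements above: the stationarity condition in G has a Wiener form, and the condition in μ pins the residual noise to the masking threshold.

```latex
\begin{aligned}
&\min_{G}\; (1-G)^2\,\phi_{SS} \quad \text{s.t.}\quad G^2\,\phi_{NN} \le C_{thr},\\
&J = (1-G)^2\,\phi_{SS} + \mu\left(G^2\,\phi_{NN} - C_{thr}\right),\\
&\frac{\partial J}{\partial G} = -2(1-G)\,\phi_{SS} + 2\mu G\,\phi_{NN} = 0
  \;\Rightarrow\; G = \frac{\phi_{SS}}{\phi_{SS} + \mu\,\phi_{NN}},\\
&\frac{\partial J}{\partial \mu} = G^2\,\phi_{NN} - C_{thr} = 0
  \;\Rightarrow\; G = \sqrt{C_{thr}/\phi_{NN}},\\
&\mu = \frac{\phi_{SS}}{\phi_{NN}}\left(\sqrt{\phi_{NN}/C_{thr}} - 1\right).
\end{aligned}
```

If φ_NN ≤ C_thr, the constraint is inactive, μ = 0, and the gain reduces to G = 1.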
FIG. 1 shows a flowchart of the microphone array post-filtering speech enhancement method based on multiple models and auditory characteristics. The system comprises a microphone array of at least two microphones 101.
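The microphones of the array may be arranged in different ways. In particular, the microphones 101 are placed in a row, each at a preset distance from its neighbor; for example, the distance between two microphones may be about 5 cm. Depending on the application environment and technical requirements, the microphone array may be installed at an appropriate position.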
The speech signals captured by the microphones 101 are sent to the signal processing unit 102. Before that, they may be pre-processed with a low-pass filter.
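The signal processing unit 102 applies delay compensation to the speech signals captured by the different microphones to align them in time. The aligned microphone signals are then represented as complex-valued frequency-domain signals using the short-time discrete Fourier transform, the power spectrum matrix Φ_XX(k,t) of the multichannel noisy speech signal at frame t and frequency k is computed, and this matrix is eigendecomposed to obtain the eigenvalue matrix Λ_XX and the eigenvector matrix U.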
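The front end of the signal processing unit 102 can be sketched as follows, assuming the STFT of each microphone is already available and the per-microphone delays are known. The specification does not say how Φ_XX(k,t) is estimated, so this sketch assumes first-order recursive averaging of the snapshot outer products with a hypothetical smoothing factor. NumPy's `eigh` returns eigenvalues in ascending order; they are reversed here into the descending order that Λ_XX is assumed to have in the later steps.

```python
import numpy as np

def align_stft(X, delays_s, freqs_hz):
    """Delay compensation in the STFT domain.

    X: complex STFT, shape (M mics, K bins, T frames).
    delays_s: arrival delay of each microphone relative to the reference, in seconds.
    A delay tau multiplies bin f by exp(-j*2*pi*f*tau); this undoes it.
    """
    phase = np.exp(1j * 2 * np.pi * np.outer(delays_s, freqs_hz))  # (M, K)
    return X * phase[:, :, None]

def covariance_eig(X, smoothing=0.9):
    """Recursive estimate of Phi_XX(k, t) and its eigendecomposition.

    Returns eigenvalues (K, T, M) in descending order and eigenvectors
    (K, T, M, M) whose columns follow the same order.
    """
    M, K, T = X.shape
    phi = np.zeros((K, M, M), dtype=complex)
    eigvals = np.empty((K, T, M))
    eigvecs = np.empty((K, T, M, M), dtype=complex)
    for t in range(T):
        x = X[:, :, t].T                               # (K, M) snapshot per bin
        outer = x[:, :, None] * x.conj()[:, None, :]   # x x^H for every bin
        phi = smoothing * phi + (1 - smoothing) * outer
        w, v = np.linalg.eigh(phi)                     # batched Hermitian solver, ascending
        eigvals[:, t] = w[:, ::-1]
        eigvecs[:, t] = v[:, :, ::-1]
    return eigvals, eigvecs
```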
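In the next step 103, the dimension Q of the signal subspace is determined from the eigenvalue matrix Λ_XX by maximizing the presence probability of the target speech signal in the noisy speech signal.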
Next, step 104 uses the signal subspace dimension Q to select the noise power spectrum distribution model of the noisy speech signal adaptively, based on spectral stationarity.
Step 105 uses the signal subspace dimension Q and the noise power spectrum distribution model to estimate the noise power spectrum from conditional probabilities.
Step 106 uses the signal subspace dimension and the noise power spectrum estimate to obtain the auditory masking threshold of each frequency bin from the signal subspace estimate, according to the auditory masking effect.
Finally, step 107 designs the post-filter from the noise power spectrum estimate and the auditory masking threshold using a Lagrange multiplier.
FIG. 2 illustrates the flow of a method for determining the dimension of the signal subspace. This method corresponds to step 103 in FIG. 1.
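After steps 101 and 102, the speech signals captured by the microphone array have been time-aligned and transformed with the short-time Fourier transform, and the power spectrum Φ_XX of the multichannel noisy speech signal has been eigendecomposed into the eigenvalue matrix Λ_XX and the eigenvector matrix U. By Eq. (9), the eigenvalue matrix of the noisy signal power spectrum is decomposed into the sum of the signal power spectrum eigenvalues and the noise power spectrum eigenvalues, where Q is the dimension of the signal subspace.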
In the first step 201, the signal subspace dimension Q is initialized to 1.
Next, step 202 updates the noise power spectrum and the target speech power spectrum. Because the eigenvalues in Λ_XX are sorted in descending order and the signal is assumed to be stronger than the noise, when the signal subspace dimension is Q the noise power is
where i ∈ {Q+1,…,L} is the eigenvalue index.
and the power of the target speech signal is
where i ∈ {1,…,Q} is the eigenvalue index.
The variance of the target speech signal is then
where i ∈ {1,…,Q} is the eigenvalue index.
Step 203 selects any one of the Gaussian, Laplacian, and Gamma models to describe the spectral distribution of the target speech signal and computes the conditional probability P_G(S(k,t)|H1) of the target speech signal. For example, when the Gaussian model is selected,
Step 204 increments the variables Q and j:

Q = Q + 1
Step 205 then checks the loop termination condition Q > L: if it is not met, the method returns to step 202; otherwise, it proceeds to step 206.
Step 206 uses Eq. (10) of the invention to determine the final signal subspace dimension Q, namely
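The loop of steps 201 to 206 can be sketched as below. The noise and signal power formulas of step 202 and the likelihood of step 203 are not reproduced in this text, so this sketch assumes that the noise power is the mean of eigenvalues Q+1..L and that the signal eigenvalues are the excess over that noise power, and it takes the scoring function as a parameter instead of guessing the model likelihood. It also stops at Q = L − 1 so that the noise subspace is never empty. Function names are hypothetical.

```python
import numpy as np

def subspace_powers(eigenvalues, q):
    """Step 202 under an assumed rule: noise power = mean of eigenvalues Q+1..L,
    signal eigenvalues = excess of eigenvalues 1..Q over the noise power (floored at 0)."""
    lam = np.asarray(eigenvalues, dtype=float)  # descending order, as in Lambda_XX
    noise_power = lam[q:].mean()
    signal_eigs = np.maximum(lam[:q] - noise_power, 0.0)
    return signal_eigs, noise_power

def choose_subspace_dim(eigenvalues, score):
    """Steps 201-206: evaluate each candidate Q and keep the highest-scoring one.

    score(signal_eigs, noise_power) stands in for the model likelihood
    P(S(k,t) | H1) of step 203, whose exact form is not reproduced here.
    """
    L = len(eigenvalues)
    best_q, best_score = None, -np.inf
    for q in range(1, L):  # Q = L would leave an empty noise subspace
        signal_eigs, noise_power = subspace_powers(eigenvalues, q)
        s = score(signal_eigs, noise_power)
        if s > best_score:
            best_q, best_score = q, s
    return best_q
```

The toy score used in the test (the SNR of the weakest signal eigenvalue) is purely illustrative; it is not the Gaussian, Laplacian, or Gamma likelihood of the method.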
FIG. 3 illustrates a flowchart for determining the noise power spectrum distribution model of the noisy speech signal. This method corresponds to step 104 in FIG. 1.
The Gaussian, Laplacian, and Gamma models can all describe the spectral coefficients of speech and noise signals, but noise characteristics differ between noise types, so model selection should be tailored to the characteristics of the target noise. In this example, a model selection method based on spectral stationarity is given, derived from statistics of computer fan noise.
In step 301, the discriminant function value Ω is computed by Eq. (11).
Step 302 checks whether Ω is less than Ω1; if so, the Gaussian model is selected. Otherwise, step 303 checks whether Ω is less than Ω2; if so, the Laplacian model is selected; otherwise, the Gamma model is selected.
The adaptive model selection algorithm of the invention is based on statistics from a large amount of experimental computer fan noise data. The experiments showed that the Gaussian model is optimal when Ω is small, the Laplacian model is optimal when Ω is larger, and the Gamma model has the smallest overall average noise estimation error. Accordingly, the invention selects the model as follows:
FIG. 4 illustrates a flowchart of a method for estimating the noise power spectrum using conditional probabilities. This method corresponds to step 105 in FIG. 1.
Step 401 computes the average power spectrum φ_NN^pre of the noise-only frames at the beginning of the noisy speech signal.
Step 402 computes the power spectrum of the current frame
where i ∈ {1,…,L} is the eigenvalue index.
Next, step 403 computes the ratio r of the current-frame power spectrum to the noise-only power spectrum
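Steps 403 to 408 together compute the conditional probability P(H0|X). First, r is compared with a preset threshold α, chosen as a small value slightly greater than 1; in particular, α = 1.2. When r < α, the current frame is more likely to be a noise-only frame, so P(H0|X) should take a large value; the invention sets its lower bound to 0.8. When r > α, the current frame is more likely to be a speech frame, and P(H0|X) should take a suitable value. Because the signal energy is distributed unevenly across frequency, different values of P(H0|X) are used for different frequencies: at low frequencies, P(H0|X) should be larger than at high frequencies, because most of the signal energy is concentrated in the low-frequency region. That is,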
where f_thr is the boundary frequency between low and high frequencies, and β1 and β2 are weighting coefficients.
Step 409 computes the conditional probability P(H1|X) = 1 − P(H0|X).
After P(H0|X) and P(H1|X) are obtained, step 410 uses Eq. (13) to obtain the noise power spectrum estimate.
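A sketch of steps 403 to 410 under stated assumptions. Only α = 1.2, the 0.8 lower bound for r < α, the low/high-frequency split, and P(H1|X) = 1 − P(H0|X) come from the text. The form 1/r clipped to [0.8, 1] below α, the constant weights β1 and β2 above α, the placeholder values f_thr = 1000 Hz, β1 = 0.5, β2 = 0.2, and the total-probability combination P(H0|X)·φ_NN^0 + P(H1|X)·φ_NN^1 used for Eq. (13) are all assumptions. Inputs are per-bin arrays, and φ_NN^0 and φ_NN^1 are taken as given; the function names are hypothetical.

```python
import numpy as np

def prob_noise_only(r, freqs_hz, alpha=1.2, p_floor=0.8,
                    f_thr=1000.0, beta1=0.5, beta2=0.2):
    """P(H0|X) per frequency bin.

    From the text: alpha = 1.2, lower bound 0.8 when r < alpha, and a larger value
    at low than at high frequencies when r >= alpha.
    Assumed: clip(1/r, p_floor, 1) below alpha; constants beta1 > beta2 above it.
    """
    r = np.asarray(r, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    noise_like = np.clip(1.0 / np.maximum(r, 1e-12), p_floor, 1.0)
    speech_like = np.where(freqs_hz < f_thr, beta1, beta2)
    return np.where(r < alpha, noise_like, speech_like)

def estimate_noise_psd(phi_cur, phi_pre, phi_nn0, phi_nn1, freqs_hz, **kwargs):
    """Steps 403-410: ratio r, P(H0|X), P(H1|X) = 1 - P(H0|X), then the
    assumed total-probability form of Eq. (13)."""
    r = np.asarray(phi_cur, dtype=float) / np.asarray(phi_pre, dtype=float)
    p0 = prob_noise_only(r, freqs_hz, **kwargs)
    p1 = 1.0 - p0
    return p0 * np.asarray(phi_nn0, dtype=float) + p1 * np.asarray(phi_nn1, dtype=float)
```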
FIG. 5 illustrates a flowchart of a method for calculating the auditory masking threshold. This method corresponds to step 106 in FIG. 1. To mask the noise in the signal and thereby enhance the target speech signal, the noise must be kept below this threshold.
Step 501 divides the human hearing range of 0 to 15500 Hz into 24 sub-bands so that the auditory masking threshold can be computed in each sub-band.
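In step 502, the energy of each frequency bin is computed using the signal subspace dimension obtained in step 206. H(j,b) denotes the energy at the b-th frequency bin of the j-th sub-band and can be computed from the signal subspace eigenvalues and eigenvectors.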
where SF(j) is a function expressing the propagation characteristics of the human basilar membrane in the j-th sub-band, j ∈ {1,…,24}.
In step 503, the propagation function of each sub-band is computed
Next, step 504 computes the excitation energy value that characterizes the energy on the basilar membrane of the human ear:
C(j,b) = SF(j) * H(j,b), j ∈ {1,…,24}    (29)
Step 505 computes the auditory masking threshold
where O(j) is the offset and j ∈ {1,…,24} denotes the j-th sub-band.
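Steps 501 to 505 closely resemble the classic Johnston perceptual-model recipe (critical-band energies, a spreading function, a tonality-dependent offset). The sketch below uses that well-known recipe as a stand-in, not as the patent's own SF(j) and O(j): the 24 Zwicker critical bands spanning 0 to 15500 Hz, the Schroeder spreading function, and the Johnston offset O(j) = τ(14.5 + j) + (1 − τ)·5.5 dB with a tonality parameter τ. It treats "*" in Eq. (29) as a convolution across bands, sums band energies from a per-bin power input rather than from the subspace eigenvectors, and omits Johnston's spreading renormalization and absolute-threshold comparison. Function names are hypothetical.

```python
import numpy as np

# Critical-band (Bark) edges in Hz after Zwicker: 25 edges -> 24 bands, 0-15500 Hz.
BAND_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                          1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                          6400, 7700, 9500, 12000, 15500], dtype=float)
N_BANDS = len(BAND_EDGES_HZ) - 1

def band_energies(power, freqs_hz):
    """Steps 501-502 (simplified): sum per-bin power into the 24 sub-bands."""
    power = np.asarray(power, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    idx = np.searchsorted(BAND_EDGES_HZ, freqs_hz, side="right") - 1
    valid = (idx >= 0) & (idx < N_BANDS)  # bins at or above 15500 Hz are ignored
    energies = np.zeros(N_BANDS)
    np.add.at(energies, idx[valid], power[valid])
    return energies

def spreading_matrix():
    """Step 503: Schroeder spreading function; sf[i, j] weights masker band j on band i."""
    dz = np.arange(N_BANDS)[:, None] - np.arange(N_BANDS)[None, :]  # maskee - masker, Bark
    sf_db = 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)
    return 10.0 ** (sf_db / 10.0)

def masking_threshold(energies, tonality=0.5):
    """Steps 504-505: excitation C = spreading applied to band energies, minus offset O(j) dB."""
    excitation = spreading_matrix() @ np.asarray(energies, dtype=float)
    j = np.arange(1, N_BANDS + 1)
    offset_db = tonality * (14.5 + j) + (1.0 - tonality) * 5.5  # Johnston-style O(j)
    return excitation * 10.0 ** (-offset_db / 10.0)
```

The spreading function decays at roughly 10 dB per Bark toward higher bands and 25 dB per Bark toward lower bands, which is why a masker spreads more easily upward in frequency.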
FIG. 6 illustrates a flowchart for designing the post-filter. This method corresponds to step 107 in FIG. 1.
The goal is to minimize the distortion of the target speech signal while keeping the power of the residual noise in the enhanced speech below the auditory masking threshold.
Step 601 formulates the constrained optimization problem as follows:
Objective:
Constraint:
Step 602 solves the problem with the Lagrange multiplier method, letting:
Taking the derivatives of J with respect to G(e^jω) and μ and setting them to zero gives:
Step 603 solves these equations to obtain the optimal estimate of the post-filter:
Substituting the noise power spectrum estimate from step 410 and the auditory masking threshold C_thr from step 505, step 604 completes the post-filter design.
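Step 604 can be sketched per frequency bin as follows. Because the equations of steps 601 to 603 are not reproduced in this text, the sketch assumes the distortion term is |1 − G|²φ_SS and the residual-noise constraint is |G|²φ_NN ≤ C_thr; under that assumption the stationarity conditions give G = √(C_thr/φ_NN) where the constraint is active and G = 1 (μ = 0) where the noise is already below the threshold. The function names are hypothetical; φ_NN is meant to be the estimate from step 410 and C_thr the per-bin threshold from step 505. The multiplier helper only checks that such a gain matches the Wiener form G = φ_SS / (φ_SS + μφ_NN).

```python
import numpy as np

def postfilter_gain(phi_nn, c_thr):
    """Per-bin gain that keeps the residual noise G**2 * phi_nn at or below c_thr.

    Active constraint (phi_nn > c_thr): G = sqrt(c_thr / phi_nn).
    Inactive constraint: mu = 0 and G = 1, i.e. no attenuation is needed.
    """
    phi_nn = np.asarray(phi_nn, dtype=float)
    c_thr = np.asarray(c_thr, dtype=float)
    return np.minimum(1.0, np.sqrt(c_thr / np.maximum(phi_nn, 1e-12)))

def lagrange_multiplier(phi_ss, phi_nn, gain):
    """mu obtained by equating the Wiener form phi_ss / (phi_ss + mu * phi_nn) with gain."""
    return phi_ss * (1.0 / gain - 1.0) / phi_nn
```

In use, the enhanced spectrum in each bin would be this gain multiplied by the MVDR beamformer output.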
Further modifications and variations of the invention will be apparent to those skilled in the art from this description. Accordingly, this description is to be regarded as illustrative, and its purpose is to teach those skilled in the art the general manner of carrying out the invention. It should be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102503930A CN101778322B (en) | 2009-12-07 | 2009-12-07 | Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102503930A CN101778322B (en) | 2009-12-07 | 2009-12-07 | Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101778322A CN101778322A (en) | 2010-07-14 |
CN101778322B true CN101778322B (en) | 2013-09-25 |
Family
ID=42514612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009102503930A Expired - Fee Related CN101778322B (en) | 2009-12-07 | 2009-12-07 | Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101778322B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102157156B (en) * | 2011-03-21 | 2012-10-10 | 清华大学 | Single-channel voice enhancement method and system |
US20140114650A1 (en) * | 2012-10-22 | 2014-04-24 | Mitsubishi Electric Research Labs, Inc. | Method for Transforming Non-Stationary Signals Using a Dynamic Model |
CN102945674A (en) * | 2012-12-03 | 2013-02-27 | 上海理工大学 | Method for realizing noise reduction processing on speech signal by using digital noise reduction algorithm |
CN104575511B (en) * | 2013-10-22 | 2019-05-10 | 陈卓 | Sound enhancement method and device |
EP2876900A1 (en) * | 2013-11-25 | 2015-05-27 | Oticon A/S | Spatial filter bank for hearing system |
CN105244036A (en) * | 2014-06-27 | 2016-01-13 | 中兴通讯股份有限公司 | Microphone speech enhancement method and microphone speech enhancement device |
US9401158B1 (en) * | 2015-09-14 | 2016-07-26 | Knowles Electronics, Llc | Microphone signal fusion |
EP3360250B1 (en) | 2015-11-18 | 2020-09-02 | Huawei Technologies Co., Ltd. | A sound signal processing apparatus and method for enhancing a sound signal |
CN105792074B (en) * | 2016-02-26 | 2019-02-05 | 西北工业大学 | A kind of voice signal processing method and device |
CN107370898B (en) * | 2016-05-11 | 2020-07-07 | 华为终端有限公司 | Ring tone playing method, terminal and storage medium thereof |
CN110858485B (en) * | 2018-08-23 | 2023-06-30 | 阿里巴巴集团控股有限公司 | Voice enhancement method, device, equipment and storage medium |
CN110875052A (en) * | 2018-08-31 | 2020-03-10 | 深圳市优必选科技有限公司 | Robot speech denoising method, robot device and storage device |
CN109979478A (en) * | 2019-04-08 | 2019-07-05 | 网易(杭州)网络有限公司 | Voice de-noising method and device, storage medium and electronic equipment |
CN115249484A (en) * | 2021-04-27 | 2022-10-28 | 大众问问(北京)信息科技有限公司 | Voice signal processing method, apparatus, computer device and storage medium |
CN113362856A (en) * | 2021-06-21 | 2021-09-07 | 国网上海市电力公司 | Sound fault detection method and device applied to power Internet of things |
CN113658605B (en) * | 2021-10-18 | 2021-12-17 | 成都启英泰伦科技有限公司 | Speech enhancement method based on deep learning assisted RLS filtering processing |
EP4307298A4 (en) * | 2021-12-20 | 2024-04-03 | Shenzhen Shokz Co., Ltd. | METHOD AND SYSTEM FOR RECOGNIZING SPEECH ACTIVITIES AND METHOD AND SYSTEM FOR SPEECH IMPROVEMENT |
- 2009-12-07: CN2009102503930A (CN101778322B), not active, Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
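Signal subspace speech enhancement algorithm based on the Gaussian-Laplacian-Gamma model and the auditory masking effect of the human ear; Cheng Ning et al.; Acta Acustica; 2009-11-30; Vol. 34, No. 6; pp. 555-561 * |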
Also Published As
Publication number | Publication date |
---|---|
CN101778322A (en) | 2010-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101778322B (en) | Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic | |
CN101777349B (en) | Signal subspace microphone array voice enhancement method based on auditory perception characteristics | |
US20230421973A1 (en) | Electronic device using a compound metric for sound enhancement | |
EP1509065B1 (en) | Method for processing audio-signals | |
Hu et al. | A generalized subspace approach for enhancing speech corrupted by colored noise | |
EP3899936B1 (en) | Source separation using an estimation and control of sound quality | |
CN106504763A (en) | Multi-target Speech Enhancement Method Based on Microphone Array Based on Blind Source Separation and Spectral Subtraction | |
US8880396B1 (en) | Spectrum reconstruction for automatic speech recognition | |
US20090304203A1 (en) | Method and device for binaural signal enhancement | |
US20120082322A1 (en) | Sound scene manipulation | |
CN107316648A (en) | A kind of sound enhancement method based on coloured noise | |
CN102903368A (en) | Method and equipment for separating convoluted blind sources | |
WO2007083814A1 (en) | Sound source separation device and sound source separation method | |
CN113257270B (en) | Multi-channel voice enhancement method based on reference microphone optimization | |
CN105390142A (en) | Digital hearing aid voice noise elimination method | |
Yee et al. | A noise reduction postfilter for binaurally linked single-microphone hearing aids utilizing a nearby external microphone | |
Saadoune et al. | Perceptual subspace speech enhancement using variance of the reconstruction error | |
Çolak et al. | A novel voice activity detection for multi-channel noise reduction | |
Saleem et al. | On improvement of speech intelligibility and quality: A survey of unsupervised single channel speech enhancement algorithms | |
Jabloun et al. | A multi-microphone signal subspace approach for speech enhancement | |
Miyazaki et al. | Theoretical analysis of parametric blind spatial subtraction array and its application to speech recognition performance prediction | |
Bavkar et al. | PCA based single channel speech enhancement method for highly noisy environment | |
Li et al. | Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments | |
Meng et al. | Fully Automatic Balance between Directivity Factor and White Noise Gain for Large-scale Microphone Arrays in Diffuse Noise Fields. | |
Rhodes | Real-Time Wind Noise Detection and Suppression with Neural-Based Signal Reconstruction for Mult-Channel, Low-Power Devices |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130925 |