CN114724574A

CN114724574A - Double-microphone noise reduction method with adjustable expected sound source direction

Info

Publication number: CN114724574A
Application number: CN202210157383.8A
Authority: CN
Inventors: 赵清颖; 陈喆; 殷福亮
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2022-07-08
Anticipated expiration: 2042-02-21
Also published as: CN114724574B

Abstract

The invention discloses a double-microphone noise reduction method with adjustable expected sound source direction, which comprises the following steps: preprocessing the noisy signal x received by the dual microphones₁(t) and x₂(t) discrete sampling, pre-emphasis, framing and windowing, and performing short-time Fourier transform to obtain frequency domain signal X₁(omega) and X₂(ω); a beam forming process, introducing a virtual microphone at the middle point of the connection line of the two microphones, and performing frequency domain signal X according to a central difference format₁(omega) and X₂(omega) performing a differential transformation to construct a differential signal Y₁(omega) and Y₂(ω). Calculating a difference signal Y₁(omega) and Y₂(ω) and the ratio of the statistical averages is recorded as a directivity function Γ (ω, θ), the properties of the directivity function Γ (ω, θ) are analyzed,it is directly mapped to the noise masking value λ (ω) by the normalization function. Mixing X₁Multiplying (omega) by lambda (omega) to obtain a signal R with the competing direction noise eliminated₁(ω); post-wiener filtering process, for R₁Estimating signal energy and noise energy in (omega) to obtain channel signal-to-noise ratio and calculating gain function to further eliminate R₁Residual noise in (ω).

Description

A dual-microphone noise reduction method with adjustable desired sound source direction

技术领域technical field

本发明涉及语音信号降噪技术领域，尤其涉及一种期望声源方向可调的双麦克风降噪方法。The present invention relates to the technical field of noise reduction of speech signals, and in particular, to a method for noise reduction of dual microphones with adjustable desired sound source direction.

背景技术Background technique

蓝牙耳机等便携设备自从问世以来逐渐成为人们日常生活中提升效率的好工具，但是用户在使用其拨打或接听电话时，如果受到背景噪声及非目标方向语音等的干扰，通话质量会急剧下降。在这种情况下，保留靠近用户说话方向的语音，在保证语音不失真的前提下尽可能地抑制背景噪声及非目标方向语音成为迫切需求。Portable devices such as Bluetooth headsets have gradually become a good tool for improving efficiency in people's daily lives since their introduction. However, when users use them to make or receive calls, if they are disturbed by background noise and non-target voices, the call quality will drop sharply. In this case, it is an urgent need to retain the voice close to the user's speaking direction, and to suppress background noise and non-target voice as much as possible on the premise of ensuring that the voice is not distorted.

现有的广义旁瓣抵消器(generalized sidelobe canceller，GSC)和延迟波束形成器使用多个麦克风记录信号进行空间滤波。对于蓝牙耳机等便携设备来说，GSC过于复杂，超出了微型器件的能力范围。延迟波束形成技术如一阶差分麦克风(the first-orderdifferential microphone，FDM)和自适应零形成方法(adaptive null-forming，ANF)只需要两个麦克风，是适合于尺寸限制和实时处理的设置。但这种固定波束形成器在0°处有一个最大增益，在180°处有一个零点，不能消除零点以外方向的噪声。基于输入信号之间的相干函数的算法，讨论了相干函数中实部和虚部的性质，以产生不同的掩蔽噪声的手段。基于相干函数的方法不依赖于噪声统计量，但目标方向不可调整。用于助听器的竞争方向噪声消除方法将频谱估计和阵列波束形成相结合来抑制噪声。在纯噪声区间估计指向性系数，并对其进行更新以适应移动噪声。类似地，此方法只能将期望方向设置为0°附近的有限范围。由于声源的位置有时是不恒定的，因此在实际应用中设计声源方向可调的降噪算法是很重要的。Existing generalized sidelobe cancellers (GSCs) and delayed beamformers use multiple microphone recording signals for spatial filtering. For portable devices such as Bluetooth headsets, GSCs are too complex and beyond the capabilities of tiny devices. Delayed beamforming techniques such as the first-order differential microphone (FDM) and adaptive null-forming (ANF) require only two microphones and are suitable for size-constrained and real-time processing settings. But this fixed beamformer has a maximum gain at 0° and a zero at 180°, which cannot eliminate noise in directions other than the zero. Based on the algorithm of the coherence function between the input signals, the properties of the real and imaginary parts of the coherence function are discussed to generate different means of masking noise. Methods based on coherence functions do not rely on noise statistics, but the target orientation is not adjustable. Competitive directional noise cancellation methods for hearing aids combine spectral estimation and array beamforming to suppress noise. Directivity coefficients are estimated in the pure noise interval and updated to accommodate moving noise. Similarly, this method can only set the desired direction to a limited range around 0°. Since the position of the sound source is sometimes not constant, it is important to design a noise reduction algorithm with adjustable sound source direction in practical applications.

为了解决波束形成直接应用于近距离双麦克风系统时无法准确消除非目标方向声音及期望声源方向不能根据需求设定等问题，提出一种基于波束形成与维纳滤波的两步去噪方法。测试结果表明：在低信噪比、多种类型噪声源共同存在的情境下，此方法可以有效地恢复原始信号的能量分布特点，降低背景噪声和非目标方向语音，明显提升信噪比，另外，本发明可预置期望声源方向，以适应声源变化的场景。In order to solve the problem that the non-target direction sound cannot be accurately eliminated when beamforming is directly applied to the short-range dual-microphone system and the desired sound source direction cannot be set according to the requirements, a two-step denoising method based on beamforming and Wiener filtering is proposed. The test results show that this method can effectively restore the energy distribution characteristics of the original signal, reduce the background noise and non-target direction speech, and significantly improve the signal-to-noise ratio in the case of low signal-to-noise ratio and the coexistence of various types of noise sources. , the present invention can preset the desired sound source direction to adapt to the scene where the sound source changes.

发明内容SUMMARY OF THE INVENTION

根据现有技术存在的问题，本发明公开了一种期望声源方向可调的双麦克风降噪方法，具体包括：According to the problems existing in the prior art, the present invention discloses a dual-microphone noise reduction method with adjustable desired sound source direction, which specifically includes:

预处理过程：对双麦克风接收到的带噪信号x₁(t)和x₂(t)进行离散采样、预加重、分帧及加窗处理，再经过短时傅里叶变换得到频域信号X₁(ω)和X₂(ω)；Preprocessing process: Discrete sampling, pre-emphasis, framing and windowing of the noisy signals x ₁ (t) and x ₂ (t) received by the dual microphones, and then short-time Fourier transform to obtain frequency domain signals X ₁ (ω) and X ₂ (ω);

波束形成过程：在双麦克风连线的中点处引入虚拟麦克风，根据中心差分格式对频域信号X₁(ω)和X₂(ω)进行差分变换，构造差分信号Y₁(ω)和Y₂(ω)，计算差分信号Y₁(ω)和Y₂(ω)的功率谱的统计平均值，并将统计平均值的比值记为方向性函数Γ(ω,θ)，分析方向性函数Γ(ω,θ)的性质，通过归一化函数将其直接映射为噪声掩蔽值λ(ω)，将频域信号X₁(ω)与噪声掩蔽值λ(ω)相乘得到消除掉竞争方向噪声后的信号R₁(ω)；Beamforming process: introduce a virtual microphone at the midpoint of the two-microphone connection, perform differential transformation on the frequency domain signals X ₁ (ω) and X ₂ (ω) according to the center differential format, and construct the differential signals Y ₁ (ω) and Y ₂ (ω), calculate the statistical mean of the power spectra of the differential signals Y ₁ (ω) and Y ₂ (ω), and record the ratio of the statistical mean as the directional function Γ(ω, θ), and analyze the directional function The properties of Γ(ω, θ) are directly mapped to the noise masking value λ(ω) through the normalization function, and the frequency domain signal X ₁ (ω) is multiplied by the noise masking value λ(ω) to eliminate competition. Signal R ₁ (ω) after directional noise;

后置维纳滤波过程：对R₁(ω)中的信号能量和噪声能量进行估计得到通道信噪比并计算增益函数，从而消除信号R₁(ω)中的残余噪声。Post-Wiener filtering process: Estimate the signal energy and noise energy in R ₁ (ω) to obtain the channel SNR and calculate the gain function, thereby eliminating the residual noise in the signal R ₁ (ω).

进一步的，所述预处理过程为：Further, the preprocessing process is:

将带噪信号x₁(t)和x₂(t)进行离散采样，再对语音的高频部分进行预加重处理；The noisy signals x ₁ (t) and x ₂ (t) are discretely sampled, and then the high-frequency part of the speech is pre-emphasized;

将采样信号x₁(n)和x₂(n)分为长度为10ms的帧后加等长的汉明窗w(n)，加窗后的信号通入缓存区留待处理，经过短时傅里叶变换得到当前帧频域信号，根据实数序列傅里叶变换的共轭对称性，输出前1/2个频点的信号进行波束形成处理。Divide the sampled signals x ₁ (n) and x ₂ (n) into frames with a length of 10ms and then add an equal-length Hamming window w(n), and the windowed signals pass into the buffer area for processing. The Lie transform obtains the current frame frequency domain signal, and according to the conjugate symmetry of the Fourier transform of the real number sequence, the signal of the first 1/2 frequency points is output for beamforming processing.

进一步的，所述波束形成过程包括幅度对齐、计算功率谱、计算方向性函数值、计算阈值和归一化映射；Further, the beamforming process includes amplitude alignment, calculation of power spectrum, calculation of directivity function value, calculation of threshold value and normalization mapping;

幅度对齐方式为对频域信号X₁(ω)和X₂(ω)分别乘以比例因子进行幅度对齐；The amplitude alignment method is to multiply the frequency domain signals X ₁ (ω) and X ₂ (ω) by the scale factor respectively to perform amplitude alignment;

计算功率谱时：假设期望波束为S(ω)，其方向预置为α，在双路麦克风中点处引入虚拟麦克风接收期望波束S(ω)，根据中心差分格式和频域信号X₁(ω)、X₂(ω)与期望波束S(ω)的空间关系构造差分信号Y₁(ω)和Y₂(ω)，计算差分信号Y₁(ω)和Y₂(ω)的功率谱；When calculating the power spectrum: Assuming that the desired beam is S(ω) and its direction is preset to α, introduce a virtual microphone at the midpoint of the two-way microphone to receive the desired beam S(ω), according to the central difference format and the frequency domain signal X ₁ ( The spatial relationship between ω), X ₂ (ω) and the desired beam S(ω) constructs the differential signals Y ₁ (ω) and Y ₂ (ω), and calculates the power spectrum of the differential signals Y ₁ (ω) and Y ₂ (ω) ;

计算方向性函数值时：其中差分信号Y₁(ω)和Y₂(ω)功率谱的统计平均值的比值为方向性函数Γ(ω,θ)的值，讨论Γ(ω,θ)的性质，当声源实际入射方向θ等于给定的期望声源入射方向α时，Γ(ω,θ)趋于无穷，并且在θ＝α轴的两边，函数值单调且近似对称；When calculating the value of the directional function: the ratio of the statistical average of the power spectrum of the differential signal Y ₁ (ω) and Y ₂ (ω) is the value of the directional function Γ(ω, θ). Properties, when the actual incident direction θ of the sound source is equal to the given desired sound source incident direction α, Γ(ω, θ) tends to infinity, and on both sides of the θ=α axis, the function value is monotonic and approximately symmetrical;

计算阈值及归一化映射时：由于Γ(ω,θ)趋于无穷大，根据预先设定的主瓣宽度Θ计算一个阈值Ω，通过sigmoid函数的归一化映射，直接将Γ(ω,θ)映射成对应频点的噪声掩蔽值λ(ω)，将X₁(ω)与λ(ω)相乘得到消除掉竞争方向噪声后的信号R₁(ω)。When calculating the threshold and normalized mapping: Since Γ(ω, θ) tends to infinity, a threshold Ω is calculated according to the preset main lobe width Θ, and the Γ(ω, θ is directly converted by the normalized mapping of the sigmoid function. ) is mapped to the noise masking value λ(ω) of the corresponding frequency point, and X ₁ (ω) is multiplied by λ(ω) to obtain the signal R ₁ (ω) after eliminating the noise in the competing direction.

进一步的，所述后置维纳滤波过程包括计算信噪比指数、计算对数谱偏差、修改或重置噪声标志和计算增益函数值；Further, the post-Wiener filtering process includes calculating the signal-to-noise ratio index, calculating the logarithmic spectral deviation, modifying or resetting the noise flag and calculating the gain function value;

计算信噪比指数时，将信号R₁(ω)根据临界带宽准则分为若干个通道，估计每个通道的能量，将通道噪声能量估计初始化为前四帧的通道能量，据此计算通道信噪比指数；When calculating the signal-to-noise ratio index, the signal R ₁ (ω) is divided into several channels according to the critical bandwidth criterion, the energy of each channel is estimated, the channel noise energy estimation is initialized as the channel energy of the first four frames, and the channel signal is calculated accordingly. noise ratio index;

计算对数谱偏差时，设计非线性数据表作为语音指标表，将信噪比指数映射成度量语音质量的一组数字，将一定频率范围内的语音指标总和作为当前通道的语音质量评估结果，对当前通道的信号能量取对数，并计算长时对数谱能量与短时对数谱能量的偏差；When calculating the logarithmic spectral deviation, a non-linear data table is designed as a speech index table, the signal-to-noise ratio index is mapped to a set of numbers to measure the speech quality, and the sum of the speech indexes in a certain frequency range is used as the speech quality evaluation result of the current channel. Take the logarithm of the signal energy of the current channel, and calculate the deviation of the long-term logarithmic spectral energy and the short-term logarithmic spectral energy;

修改或重置噪声标志，根据计算所得的语音指标总和、信噪比指数、对数谱偏差参数信息，判断当前帧是语音帧还是噪声帧，重置噪声更新标志，检查前几帧的更新标志，如果噪声长时间得不到更新，认为结果已不可靠，则强制更新信噪比指数；Modify or reset the noise flag, judge whether the current frame is a speech frame or a noise frame, reset the noise update flag, and check the update flags of the previous frames according to the calculated sum of speech indicators, signal-to-noise ratio index, and logarithmic spectral deviation parameter information , if the noise cannot be updated for a long time and the result is considered unreliable, the SNR index will be forced to update;

计算增益函数值时，利用通道信噪比指数计算通道增益值去除残余的背景噪声，根据噪声更新标志的结果，更新下一帧的噪声能量估计。When calculating the gain function value, use the channel signal-to-noise ratio index to calculate the channel gain value to remove the residual background noise, and update the noise energy estimate of the next frame according to the result of the noise update flag.

由于采用了上述技术方案，本发明提供的一种期望声源方向可调的双麦克风降噪方法，该方法在对双麦克风信号进行预处理后，首先计算构造的差分信号功率谱统计平均值的比值作为方向性函数的值，然后通过归一化函数的映射得到噪声的掩蔽值。同时，在下一步安装维纳滤波器，通过估计信噪比指数及计算增益函数来减少残余噪声；本发明提出的算法简单高效，不同的噪声场景中受非目标声音干扰的语音被增强后，其信噪比和质量均有显著提高。Due to the adoption of the above technical solution, the present invention provides a dual-microphone noise reduction method with adjustable desired sound source direction. After preprocessing the dual-microphone signals, the method first calculates the statistical average value of the power spectrum of the constructed differential signal. The ratio is used as the value of the directional function, and then the masking value of the noise is obtained by the mapping of the normalized function. At the same time, a Wiener filter is installed in the next step to reduce residual noise by estimating the signal-to-noise ratio index and calculating the gain function; the algorithm proposed by the present invention is simple and efficient. The signal-to-noise ratio and quality are significantly improved.

附图说明Description of drawings

为了更清楚地说明本申请实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请中记载的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the following briefly introduces the accompanying drawings required for the description of the embodiments or the prior art. Obviously, the drawings in the following description are only These are some embodiments described in this application. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without any creative effort.

图1为本发明方法的整体示意图；Fig. 1 is the overall schematic diagram of the method of the present invention;

图2为本发明中预处理过程的示意图；Fig. 2 is the schematic diagram of pretreatment process in the present invention;

图3为本发明中波束形成过程的示意图；3 is a schematic diagram of a beamforming process in the present invention;

图4为本发明中声源传播示意图；4 is a schematic diagram of sound source propagation in the present invention;

图5为本发明中方向性函数示意图；5 is a schematic diagram of the directional function in the present invention;

图6为本发明中后置维纳滤波过程的示意图；6 is a schematic diagram of a post-Wiener filtering process in the present invention;

图7为单噪声源不同信噪比时本发明与其他降噪方法PESQ对比结果图；7 is a graph showing the comparison results of the present invention and other noise reduction methods PESQ when a single noise source has different signal-to-noise ratios;

图8为多噪声源不同信噪比时本发明与其他降噪方法PESQ对比结果图；8 is a graph showing the comparison results of the present invention and other noise reduction methods PESQ when there are multiple noise sources with different signal-to-noise ratios;

图9为单噪声源不同信噪比时本发明与其他降噪方法SegSNR对比结果图；FIG. 9 is a graph showing the comparison result of SegSNR between the present invention and other noise reduction methods when a single noise source has different signal-to-noise ratios;

图10为多噪声源不同信噪比时本发明与其他降噪方法SegSNR对比结果图；FIG. 10 is a graph showing the comparison result of SegSNR between the present invention and other noise reduction methods when there are multiple noise sources and different signal-to-noise ratios;

图11为本发明中期望声源方向不同时结果示意图。FIG. 11 is a schematic diagram of the result when the desired sound source directions are different in the present invention.

具体实施方式Detailed ways

为使本发明的技术方案和优点更加清楚，下面结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚完整的描述：In order to make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention:

如图1所示的一种期望声源方向可调的双麦克风降噪方法，在实施过程中包括：预处理过程、波束形成过程和后置维纳滤波过程。本发明公开的方法具体步骤如下：As shown in FIG. 1 , a dual-microphone noise reduction method with adjustable sound source directions is included in the implementation process: a preprocessing process, a beamforming process, and a post-Wiener filtering process. The specific steps of the method disclosed in the present invention are as follows:

S1：预处理过程如图2，对双麦克风接收的带噪信号x₁(t)和x₂(t)进行离散采样、预加重、分帧及加窗后，通过短时傅里叶变换得到频域信号X₁(ω)和X₂(ω)，具体采用如下方式：S1: The preprocessing process is shown in Figure 2. After discrete sampling, pre-emphasis, framing and windowing of the noisy signals x ₁ (t) and x ₂ (t) received by the dual microphones, they are obtained by short-time Fourier transform. The frequency domain signals X ₁ (ω) and X ₂ (ω) are specifically adopted as follows:

S11：对带噪的连续信号x₁(t)和x₂(t)先进行离散采样，采样频率为16kHz。通过一阶FIR高通数字滤波器来实现对语音高频部分的预加重，其中EMP_FAC为预加重系数。设n时刻的语音信号采样值为x(n)，经过预加重处理后的结果为：S11: First perform discrete sampling on the noisy continuous signals x ₁ (t) and x ₂ (t), and the sampling frequency is 16 kHz. The pre-emphasis on the high-frequency part of the speech is realized by a first-order FIR high-pass digital filter, where EMP_FAC is the pre-emphasis coefficient. Assuming that the sample value of the speech signal at time n is x(n), the result after pre-emphasis processing is:

z(n)＝x(n)-EMP_FAC*x(n-1) (1)z(n)=x(n)-EMP_FAC*x(n-1) (1)

取EMP_FAC＝0.8，在经过降噪过程后，利用短时傅里叶变换合成时域信号，还需要对其进行去加重操作，还原高频部分。Taking EMP_FAC=0.8, after the noise reduction process, the short-time Fourier transform is used to synthesize the time-domain signal, and a de-emphasis operation is needed to restore the high-frequency part.

S12：将采样信号x₁(n)和x₂(n)分帧，帧长为10ms，对分帧后的信号加等长的汉明窗，窗函数公式如式(2)，加窗后的信号通入缓存区，缓存区长度是FFT点数的5倍。S12: Divide the sampled signals x ₁ (n) and x ₂ (n) into frames, the frame length is 10ms, and add a Hamming window of equal length to the framed signals. The window function formula is as formula (2), after adding the window The signal passes into the buffer area, and the length of the buffer area is 5 times the number of FFT points.

S13：经过快速傅里叶变换得到当前帧频域信号，根据实数序列快速傅里叶变换的共轭对称性，输出前1/2个频点的信号进行后续的算法处理。S13: Obtain the current frame frequency domain signal through fast Fourier transform, and output the signal of the first 1/2 frequency points for subsequent algorithm processing according to the conjugate symmetry of the fast Fourier transform of the real number sequence.

S2：波束形成过程如图3，在双麦克风连线的中点处引入一个虚拟麦克风，将得到的频域信号X₁(ω)和X₂(ω)根据中心差分格式进行差分变换，构造差分信号Y₁(ω)和Y₂(ω)，计算Y₁(ω)和Y₂(ω)的功率谱的统计平均值，并将统计平均值的比值记为方向性函数Γ(ω,θ)。分析Γ(ω,θ)的性质，通过归一化函数可将其直接映射为噪声掩蔽值，具体采用如下方式：S2: The beamforming process is shown in Figure 3. A virtual microphone is introduced at the midpoint of the two-microphone connection, and the obtained frequency domain signals X ₁ (ω) and X ₂ (ω) are differentially transformed according to the central difference format to construct a differential Signals Y ₁ (ω) and Y ₂ (ω), calculate the statistical mean of the power spectra of Y ₁ (ω) and Y ₂ (ω), and record the ratio of the statistical mean as the directional function Γ(ω,θ ). Analyzing the properties of Γ(ω,θ), it can be directly mapped to the noise masking value through the normalization function. The specific method is as follows:

S21：幅度对齐，尽管将声场假设为远场，但双麦克风的接收信号在幅度上还是有细微的差异。为了进一步符合假设，先对两路频域信号X₁(ω)和X₂(ω)分别乘以比例因子进行幅度对齐。S21: Amplitude alignment, although the sound field is assumed to be far field, the received signals of the dual microphones have subtle differences in amplitude. In order to further meet the assumption, firstly, the two frequency domain signals X ₁ (ω) and X ₂ (ω) are multiplied by scale factors to align their amplitudes.

S22：假设期望波束为S(ω)，其方向预置为α，在双路麦克风中点处引入一个虚拟麦克风接收此信号，声源传播示意图如图4。X₁(ω)、X₂(ω)与S(ω)的空间关系为：S22: Assuming that the desired beam is S(ω), its direction is preset to α, and a virtual microphone is introduced at the midpoint of the two-way microphone to receive the signal. The schematic diagram of sound source propagation is shown in Figure 4. The spatial relationship between X ₁ (ω), X ₂ (ω) and S(ω) is:

其中，d为麦克风间距，v为声速，θ为实际的声源入射方向；根据中心差分格式和X₁(ω)、X₂(ω)与S(ω)的空间关系，构造差分信号Y₁(ω)和Y₂(ω)，并计算Y₁(ω)和Y₂(ω)的功率谱。Among them, d is the distance between the microphones, v is the speed of sound, and θ is the actual incident direction of the sound source; according to the central difference format and the spatial relationship between X ₁ (ω), X ₂ (ω) and S (ω), the differential signal Y ₁ is constructed (ω) and Y ₂ (ω), and calculate the power spectrum of Y ₁ (ω) and Y ₂ (ω).

S23：Y₁(ω)和Y₂(ω)功率谱的统计平均值的比值即为方向性函数Γ(ω,θ)的值。S23: The ratio of the statistical mean values of the power spectra of Y ₁ (ω) and Y ₂ (ω) is the value of the directional function Γ(ω, θ).

Γ(ω,θ)的图像如图5，发现当声源实际入射方向θ等于给定的期望声源入射方向α时，Γ(ω,θ)趋于无穷；并且在θ＝α轴的两边，函数值单调且近似对称；The image of Γ(ω, θ) is shown in Figure 5. It is found that when the actual incident direction θ of the sound source is equal to the given desired sound source incident direction α, Γ(ω, θ) tends to infinity; and on both sides of the θ=α axis , the function value is monotonic and approximately symmetrical;

S24：由于当θ＝α时，Γ(ω,θ)趋于的无穷大在实际计算时无法达到，需要根据预先设定的主瓣宽度Θ计算一个阈值Ω。S24: Since when θ=α, the infinity that Γ(ω, θ) tends to cannot be reached in actual calculation, a threshold Ω needs to be calculated according to the preset main lobe width Θ.

由于sigmoid函数是一条值域为(0,1)的S型函数，通过一定的形变，可直接将Γ(ω,θ)归一化映射成对应频点的噪声掩蔽值λ(ω)。Since the sigmoid function is a sigmoid function with a value range of (0, 1), through a certain deformation, Γ(ω, θ) can be directly normalized and mapped to the noise mask value λ(ω) of the corresponding frequency point.

S25：掩蔽掉竞争方向的噪声后的声音为：S25: The sound after masking the noise of the competing direction is:

R₁(ω)＝λ(ω)X₁(ω) (8)R ₁ (ω)=λ(ω)X ₁ (ω) (8)

S3：后置维纳滤波过程如图6，通过对R₁(ω)中信号能量和噪声能量的估计，得到通道信噪比并计算增益函数，进一步消除R₁(ω)中的残余噪声，具体采用如下方式：S3: The post-Wiener filtering process is shown in Figure 6. By estimating the signal energy and noise energy in R ₁ (ω), the channel SNR is obtained and the gain function is calculated to further eliminate the residual noise in R ₁ (ω), Specifically, the following methods are used:

S31：将R₁(ω)根据临界带宽准则分为NUM_CHAN个通道。由于语音能量主要集中在0.3～3.4kHz，低频时，对应通道比较窄；高频时，对应通道比较宽。针对每个通道，估计其能量，其中，β为平滑因子，M为当前通道中频点的个数，m表示当前帧通道的标号，i为当前帧的标号，k为当前通道中频点的标号。S31: Divide R ₁ (ω) into NUM_CHAN channels according to the critical bandwidth criterion. Since the speech energy is mainly concentrated in 0.3-3.4kHz, the corresponding channel is relatively narrow at low frequency; the corresponding channel is relatively wide at high frequency. For each channel, estimate its energy, where β is the smoothing factor, M is the number of intermediate frequency points in the current channel, m is the label of the current frame channel, i is the label of the current frame, and k is the current channel. The label of the intermediate frequency point.

S32：将通道噪声估计初始化为前四帧的通道能量，信噪比可以由(10)计算：S32: Initialize the channel noise estimate to the channel energy of the first four frames, and the signal-to-noise ratio can be calculated by (10):

S33：设计一个非线性数据表作为语音指标表，将信噪比指数(量化的信噪比值)映射到度量语音质量的一组数字。信噪比较高时，认为语音质量较高，计算0.3～3.4kHz频率范围内的语音指标总和。S33: Design a nonlinear data table as a speech index table, and map the SNR index (quantized SNR value) to a set of numbers that measure speech quality. When the signal-to-noise ratio is high, the voice quality is considered to be high, and the sum of the voice indicators in the frequency range of 0.3 to 3.4 kHz is calculated.

S34：计算前HI_CHAN个通道的总噪声能量估计(tne)和总能量估计(tce)，即0.3～3.4kHz频率范围内的噪声能量总和和通道能量总和。S34: Calculate the total noise energy estimation (tne) and the total energy estimation (tce) of the first HI_CHAN channels, that is, the total noise energy and the channel energy in the frequency range of 0.3-3.4 kHz.

S35：计算当前通道能量的对数谱，并将长时对数谱能量和短时对数谱能量的偏差记为ch_enrg_dev。S35: Calculate the log spectrum of the current channel energy, and record the deviation between the long-term log spectrum energy and the short-term log spectrum energy as ch_enrg_dev.

ch_enrg_db(i,m)＝10lg(ch_enrg(i,m)) (13)ch_enrg_db(i,m)=10lg(ch_enrg(i,m)) (13)

S36：计算长时积分常数alpha，alpha是关于总通道能量(tce)的函数，即高tce(-40dB)时，缓慢平滑(alpha＝0.99)；低tce(-60dB)时，快速平滑(alpha＝0.50)。S36: Calculate the long-term integral constant alpha, alpha is a function of the total channel energy (tce), that is, when the tce is high (-40dB), it is smoothed slowly (alpha=0.99); when the tce is low (-60dB), it is smoothed quickly (alpha = 0.50).

S37：计算并更新长时对数谱能量。S37: Calculate and update the long-term logarithmic spectral energy.

S38：根据计算所得的语音指标总和、信噪比、对数谱偏差等参数，通过比较，重置噪声更新标志Update_flag。“Update_flag＝TRUE”表示当前帧是噪声帧，“Update_flag＝FALSE”表示当前帧是语音帧。之后还需要检查前几帧的噪声更新标志，如果噪声长时间得不到更新，认为现结果不可靠，需要强制更新信噪比指数。S38 : Reset the noise update flag Update_flag through comparison according to the calculated speech index sum, signal-to-noise ratio, logarithmic spectral deviation and other parameters. "Update_flag=TRUE" indicates that the current frame is a noise frame, and "Update_flag=FALSE" indicates that the current frame is a speech frame. After that, it is necessary to check the noise update flags of the previous frames. If the noise cannot be updated for a long time, the results are considered unreliable, and the SNR index needs to be updated forcibly.

S39：利用得到的通道信噪比指数计算通道增益ftmp2。S39: Calculate the channel gain ftmp2 by using the obtained channel SNR index.

如果噪声更新标志Update_flag＝TRUE，即认为当前帧判定为噪声帧，此时需要更新噪声的能量估计。If the noise update flag Update_flag=TRUE, it is considered that the current frame is determined as a noise frame, and the energy estimation of the noise needs to be updated at this time.

为了验证本发明的有效性，进行了若干测试。需要说明，为了验证方法适用于多种类型的声音，用于评估的语音数据来源于TIMIT数据库，噪声包括Babble噪声和竞争方向语音。本发明中的实验结果均是处理10段语音数据平均得到的结果。To verify the effectiveness of the present invention, several tests were performed. It should be noted that in order to verify that the method is suitable for multiple types of sounds, the speech data used for evaluation comes from the TIMIT database, and the noise includes Babble noise and competing direction speech. The experimental results in the present invention are all averaged results obtained by processing 10 segments of speech data.

将本发明与Coherence和SNR-coherence两种方法进行了比较，首先设置α＝0°。图7及图8为不同方法在加入各种噪声(包括竞争语音和Babble噪声)后的PESQ得分。显然，本发明优于Coherence方法，与SNR-coherence相差不多。一般情况下，本发明的PESQ结果比未处理信号至少高0.5，且在多个噪声源条件下仍能保持该效果。The present invention is compared with two methods, Coherence and SNR-coherence, first setting α=0°. Figures 7 and 8 show the PESQ scores of different methods after adding various noises (including competing speech and Babble noise). Obviously, the present invention is superior to the Coherence method, and comparable to SNR-coherence. In general, the PESQ results of the present invention are at least 0.5 higher than the unprocessed signal, and this effect is maintained under conditions of multiple noise sources.

图9和图10显示了当干扰为非目标语音和Babble噪声时，在低信噪比(-5dB和0dB)的情况下，本发明的SegSNR值比未处理时至少提高了5dB。本发明的SegSNR结果既高于SNR-Coherence方法，又几乎等于Coherence方法。此外，在有多个噪声源的情况下，本发明保持了最优效果。Figures 9 and 10 show that when the interference is non-target speech and Babble noise, in the case of low SNR (-5dB and 0dB), the SegSNR value of the present invention is improved by at least 5dB compared to the unprocessed one. The SegSNR results of the present invention are both higher than the SNR-Coherence method and almost equal to the Coherence method. Furthermore, the present invention maintains optimum performance in the presence of multiple noise sources.

同时，将期望方向设置在其他角度时的评价结果如图11所示。可以看出，与处理前的声音相比，本发明仍能保持良好的噪声抑制能力。Meanwhile, the evaluation results when the desired direction is set at other angles are shown in FIG. 11 . It can be seen that, compared with the sound before processing, the present invention can still maintain a good noise suppression capability.

以上比较性测试均显示了本发明良好的降噪性能和不错的工作稳定性。The above comparative tests all show the good noise reduction performance and good working stability of the present invention.

以上所述，仅为本发明较佳的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，根据本发明的技术方案及其发明构思加以等同替换或改变，都应涵盖在本发明的保护范围之内。The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited to this. The equivalent replacement or change of the inventive concept thereof shall be included within the protection scope of the present invention.

Claims

1. A method for reducing noise of a dual microphone with adjustable expected sound source direction is characterized by comprising the following steps:

the pretreatment process comprises the following steps: for noisy signal x received by dual microphones₁(t) and x₂(t) discrete sampling, pre-emphasis, framing and windowing, and performing short-time Fourier transform to obtain frequency domain signal X₁(omega) and X₂(ω)；

And (3) beam forming process: introducing a virtual microphone at the midpoint of the two-microphone line, and applying the frequency domain signal X according to the central difference format₁(omega) and X₂(omega) performing differential conversion to construct a differential signal Y₁(omega) and Y₂(ω) calculating a difference signal Y₁(omega) and Y₂(ω) a statistical average of the power spectrum, and the ratio of the statistical average is recorded as a directivity function Γ (ω, θ), the properties of the directivity function Γ (ω, θ) are analyzed, it is directly mapped to a noise masking value λ (ω) by a normalization function, and the frequency domain signal X is converted into a frequency domain signal X₁Multiplying (omega) by the noise masking value lambda (omega) to obtain a signal R with the competing direction noise eliminated₁(ω)；

Post wiener filtering process: to R₁Estimating signal energy and noise energy in (omega) to obtain channel signal-to-noise ratio and calculating gain function, thereby eliminating signal R₁Residual noise in (ω).

2. The method of claim 1, wherein: the pretreatment process comprises the following steps:

will carry the noise signal x₁(t) and x₂(t) discrete sampling is carried out, and then pre-emphasis processing is carried out on the high-frequency part of the voice;

sampling signal x₁(n) and x₂(n) dividing the signals into frames with the length of 10ms, adding equal-length Hamming windows w (n), introducing the windowed signals into a buffer area for processing, obtaining the frequency domain signals of the current frame through short-time Fourier transform, and outputting the signals of the first 1/2 frequency points for beam forming processing according to the conjugate symmetry of real number sequence Fourier transform.

3. The method of claim 1, wherein: the beam forming process comprises amplitude alignment, power spectrum calculation, directivity function value calculation, threshold calculation and normalization mapping;

the amplitude alignment mode is to the frequency domain signal X₁(omega) and X₂(ω) multiplying by a scaling factor respectively for amplitude alignment;

when calculating the power spectrum: assuming that the desired beam is S (co),the direction of the two-way microphone is preset to be alpha, a virtual microphone is introduced at the midpoint of the two-way microphone to receive a desired beam S (omega), and a central difference format and a frequency domain signal X are used₁(ω)、X₂The spatial relationship of (ω) to the desired beam S (ω) constructs a differential signal Y₁(omega) and Y₂(ω) calculating a difference signal Y₁(omega) and Y₂(ω) a power spectrum;

when calculating the directional function value: wherein the differential signal Y₁(omega) and Y₂(ω) the ratio of the statistical average of the power spectra is the value of the directivity function Γ (ω, θ), which tends to infinity when the actual sound source incidence direction θ is equal to the given desired sound source incidence direction α, and which functions monotonically and approximately symmetrically on both sides of the α -axis, discussing the nature of Γ (ω, θ);

when calculating the threshold and normalizing the mapping: as gamma (omega, theta) tends to be infinite, a threshold omega is calculated according to a preset main lobe width theta, the gamma (omega, theta) is directly mapped into a noise masking value lambda (omega) of a corresponding frequency point through normalized mapping of a sigmoid function, and X is used for mapping X to the noise masking value lambda (omega) of the corresponding frequency point₁Multiplying (omega) by lambda (omega) to obtain a signal R with the competing direction noise eliminated₁(ω)

4. The method of claim 1, wherein: the post-wiener filtering process comprises the steps of calculating a signal-to-noise ratio index, calculating a logarithmic spectrum deviation, modifying or resetting a noise mark and calculating a gain function value;

when calculating the SNR index, it willSignal R₁(omega) dividing the channel into a plurality of channels according to a critical bandwidth criterion, estimating the energy of each channel, initializing the channel noise energy estimation into the channel energy of the first four frames, and calculating a channel signal-to-noise ratio index according to the channel noise energy estimation;

when calculating the logarithmic spectrum deviation, designing a nonlinear data table as a voice index table, mapping the signal-to-noise ratio index into a group of numbers for measuring the voice quality, taking the sum of the voice indexes in a certain frequency range as the voice quality evaluation result of the current channel, taking the logarithm of the signal energy of the current channel, and calculating the deviation of the long-time logarithmic spectrum energy and the short-time logarithmic spectrum energy;

modifying or resetting the noise mark, judging whether the current frame is a voice frame or a noise frame according to the calculated voice index sum, the signal-to-noise ratio index and the log spectrum deviation parameter information, resetting the noise updating mark, checking the updating marks of the previous frames, and if the noise cannot be updated for a long time and the result is unreliable, forcibly updating the signal-to-noise ratio index;

when the gain function value is calculated, the channel signal-to-noise ratio index is used for calculating the channel gain value to remove residual background noise, and the noise energy estimation of the next frame is updated according to the result of the noise updating mark.