CN104091593A - Speech endpoint detection algorithm adopting perceptual speech spectrum structure boundary parameters - Google Patents

Publication number: CN104091593A
Authority: CN (China)
Prior art keywords: speech, noise, spectrum, algorithm, dimensional
Legal status: Granted; Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Application number: CN201410175090.8A
Other languages: Chinese (zh)
Other versions: CN104091593B (en)
Inventors: 吴迪 (Wu Di), 赵鹤鸣 (Zhao Heming), 陶智 (Tao Zhi)
Current assignee: Suzhou Cheng Bang Energy Conservation Science & Technology Co Ltd (the listed assignees may be inaccurate)
Original assignee: Suzhou University
Application filed by: Suzhou University
Priority: CN201410175090.8A
Publications: CN104091593A (application), CN104091593B (grant)

Abstract

The invention belongs to the field of speech recognition and discloses a speech endpoint detection algorithm that uses the perceptual spectrogram structure boundary (PSSB) parameter. After the noisy speech has been enhanced on the basis of auditory perception characteristics, the time-frequency spectrogram of the enhanced speech is further enhanced in two dimensions. This step exploits the difference between the continuous distribution of the speech signal and the random distribution of the residual noise, so that the spectrogram structure of the continuously distributed clean speech stands out further. The PSSB parameter is derived by two-dimensional boundary detection on the enhanced spectrogram structure and is used for endpoint detection. Experimental results show that, for white noise at signal-to-noise ratios from -10 dB to 10 dB, the endpoint detection algorithm using the PSSB parameter detects speech endpoints more effectively. At the very low signal-to-noise ratio of -10 dB, the proposed method still achieves an accuracy of 75.2%.

Description

Speech Endpoint Detection Algorithm Using Perceptual Spectrogram Structure Boundary Parameters

Technical Field

The invention belongs to the field of speech recognition and relates to a speech endpoint detection algorithm, in particular to one that uses perceptual spectrogram structure boundary parameters.

Background

Endpoint detection is the basis of speech recognition and speaker recognition, and correct, effective endpoint detection can greatly improve the recognition rate of both kinds of system. In a laboratory environment with a high signal-to-noise ratio, traditional endpoint detection algorithms detect speech endpoints well. At low signal-to-noise ratios, however, the performance of most endpoint detection algorithms drops sharply.

In recent years, many researchers have studied noise-robust endpoint detection. Ganapathiraju et al. (A. Ganapathiraju, et al. Comparison of Energy-Based Endpoint Detectors for Speech Signal Processing. In Proc. IEEE Publications, 1996; 500-503) studied endpoint detection by combining short-time energy with the short-time zero-crossing rate (Energy and Zero-Crossing Rate, EZCR). Compared with the traditional energy method, this approach is more robust, but it does not work at lower signal-to-noise ratios. Chen Zhenbiao et al. (Chen Zhenbiao, Xu Bo. Research on an Optimized Speech Endpoint Detection Algorithm Based on Sub-Band Energy Features. Acta Acustica, 2005; 30(2):171-176) studied sub-band amplitude (Sub-Band Amplitude, SBA) and energy according to the frequency-domain energy distribution of speech. Their detection algorithm combines several more discriminative and noise-resistant sub-band energies with the optimal edge detection commonly used in image processing, which clearly improves endpoint detection in complex noise environments. Zhang et al. (Xueying Zhang, et al. A Speech Endpoint Detection Method Based on Wavelet Coefficient Variance and Sub-Band Amplitude Variance. In Proc. IEEE ICICIC, 2006; 105-109) proposed a method based on wavelet coefficients (Wavelet Coefficient, WC). Because wavelet analysis examines the signal at every scale, the method can distinguish speech segments from noise segments to some extent. Wu et al. (Bing-Fei Wu, Kun-Ching Wang. Robust Endpoint Detection Algorithm Based on the Adaptive Band-Partitioning Spectral Entropy in Adverse Environments. IEEE Transactions on Speech and Audio Processing, 2005; 13(5):762-775) applied adaptive band-partitioning spectral entropy (ABSE) to endpoint detection. The method separates the sub-band signals of speech from noise well and achieves good endpoint detection accuracy in noisy environments. Li et al. (Q. Li, et al. A Robust real-time endpoint detector with energy normalization for ASR in adverse environments. International Conference on Acoustics Speech and Signal Processing, 2001; 574-577) borrowed optimal edge detection from image processing for speech endpoint detection. They use a filter plus three-state decision logic, so the threshold does not need to be adjusted for different signal-to-noise ratios. The image-processing algorithm provides useful support for endpoint detection. None of these methods, however, achieves high endpoint detection accuracy at low signal-to-noise ratios.

Summary of the Invention

Technical problem to be solved: at low signal-to-noise ratios, the endpoint detection accuracy of conventional endpoint detection methods is very low.

Technical solution: the speech signal and the noise signal have different characteristics in the two-dimensional time-frequency domain at low signal-to-noise ratios. Based on these differences, and combined with a speech enhancement algorithm based on auditory perception characteristics, the invention proposes the perceptual spectrogram structure boundary parameter PSSB (Perception Spectrogram Structure Boundary) and uses it for endpoint detection. First, the low signal-to-noise-ratio speech is enhanced on the basis of the auditory masking characteristic. Compared with traditional speech enhancement algorithms, this method retains the speech components perceivable by the human ear more effectively. On this basis, the continuous distribution of the clean speech spectrogram along the time axis is taken into account in two dimensions, and the noisy speech is enhanced in two dimensions so that the spectrogram structure of the speech stands out further while the spectrogram structure of the noise is suppressed. Finally, the two-dimensional boundary of the continuously distributed clean speech spectrogram structure is found, and the PSSB parameter is proposed for endpoint detection.

1. Speech enhancement based on auditory perception characteristics

At low signal-to-noise ratios, most endpoint detection algorithms cannot detect speech endpoints well and may fail completely. Humans, however, can identify speech segments in strongly noisy environments, because the auditory perception characteristics of the human ear play an important role in noise. Using the auditory masking characteristic, one of these perception characteristics, noise can be suppressed to some extent while more of the speech components are retained. The PSSB parameter proposed by the invention therefore starts from speech enhancement based on the auditory masking characteristic, which suppresses noise as far as possible while protecting the speech. The most important step in this enhancement method is the calculation of the masking threshold. The masking threshold calculation and the speech enhancement system are as follows:

(1) Bark-domain power spectrum

The speech signal x(n) is converted into the frequency-domain signal $X(k)$ by the fast Fourier transform (FFT). The signal power spectrum is:

$$P(k) = |X(k)|^{2} \tag{1}$$

The Bark power spectrum is:

$$B_i = \sum_{k=b_{li}}^{b_{hi}} P(k) \tag{2}$$

where $B_i$ is the energy of the i-th Bark band, $b_{li}$ is the lowest frequency of the i-th band, and $b_{hi}$ is the highest frequency of the i-th band.
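The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the text does not give the band edges, so the sketch assumes the Zwicker-Terhardt approximation of the Bark scale, and the function names `hz_to_bark`, `power_spectrum` and `bark_band_energies` are hypothetical. A direct DFT is used instead of an FFT only to keep the example dependency-free.

```python
import cmath
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed band mapping)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def power_spectrum(frame):
    """P(k) = |X(k)|^2 for k = 0..N/2, computed with a direct DFT for clarity."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def bark_band_energies(p, sample_rate):
    """B_i = sum of P(k) over the bins whose frequency falls in Bark band i."""
    n_fft = 2 * (len(p) - 1)
    n_bands = int(hz_to_bark(sample_rate / 2.0)) + 1
    bands = [0.0] * n_bands
    for k, p_k in enumerate(p):
        band = int(hz_to_bark(k * sample_rate / n_fft))
        bands[band] += p_k
    return bands
```

For a 1 kHz tone sampled at 16 kHz, almost all of the energy lands in a single band, which is the behaviour formula (2) describes: each band simply accumulates the power of the bins between its lowest and highest frequency.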

(2) Spread Bark-domain power spectrum

A spreading function $S_{ij}$ is introduced. It is a matrix that satisfies the condition:

(3)

It is defined as follows:

(4)

where the variable in (4) denotes the difference between the band numbers of the two bands.

$$C_i = \sum_{j=1}^{j_{\max}} S_{ij} \cdot B_i, \qquad i = 1, 2, \ldots, i_{\max} \tag{5}$$

(3) Offset function of the masking energy and calculation of the masking threshold

(6)

$$T_i = 10^{\log_{10}(C_i) - (O_i/10)} \tag{7}$$

The coefficient in (6) takes a value between 0 and 1 and is determined by the speech content. $T_i$ is the masking threshold of the i-th Bark band. It is subsequently indexed by b, where b has the same meaning as the preceding i.

This threshold is compared with the absolute threshold of hearing in quiet:

(8)

and the larger of the two values is taken as the final fitted masking threshold. Here the threshold in quiet is represented by its corresponding Bark masking curve.

(4) Spectral subtraction and adjustment of the subtraction parameters

The gain function used by the spectral subtraction algorithm is as follows:

$$H(k) = \begin{cases} \left(1 - \alpha \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma}\right)^{1/\gamma}, & \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma} < \dfrac{1}{\alpha + \beta} \\[2ex] \left(\beta \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma}\right)^{1/\gamma}, & \text{else} \end{cases} \tag{9}$$

First, the noise masking threshold of each Bark band is calculated for every frame of speech, and the adaptive subtraction parameters $\alpha$ and $\beta$ are then obtained from the noise masking threshold. If the masking threshold is high, the residual noise is naturally masked and inaudible to the human ear, so the subtraction parameters take their minimum values. If the masking threshold is low, the residual noise strongly affects the listener and must be reduced. For each frame m, the minimum of the masking threshold corresponds to the maximum of the subtraction parameters $\alpha$ and $\beta$ for that frame. The subtraction parameters are applied according to the following relation:
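The gain function in (9) translates directly into code. The sketch below is illustrative only and assumes, as in conventional spectral subtraction, that D(k) is the estimated noise spectrum and Y(k) is the noisy-speech spectrum; the text does not define them explicitly. The function name `subtraction_gain` and the default `gamma=2.0` are assumptions made for the example.

```python
def subtraction_gain(noise_mag, noisy_mag, alpha, beta, gamma=2.0):
    """Gain H(k) of formula (9) for one frequency bin.

    noise_mag -- |D(k)|, assumed to be the estimated noise magnitude
    noisy_mag -- |Y(k)|, assumed to be the noisy-speech magnitude
    alpha     -- over-subtraction parameter
    beta      -- spectral-floor parameter
    gamma     -- exponent (2.0 corresponds to power-domain subtraction)
    """
    if noisy_mag <= 0.0:
        return 0.0  # nothing to scale in an empty bin
    ratio = (noise_mag / noisy_mag) ** gamma
    if ratio < 1.0 / (alpha + beta):
        # Normal case: subtract the scaled noise estimate.
        return (1.0 - alpha * ratio) ** (1.0 / gamma)
    # Noise-dominated bin: fall back to the spectral floor.
    return (beta * ratio) ** (1.0 / gamma)
```

The branch condition guarantees that `1 - alpha * ratio` stays positive in the first case, so the fractional power is always well defined. A larger `alpha` removes more noise at the cost of more speech distortion, which is why the text ties it to the masking threshold.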

(10)

Here each of the subtraction parameters $\alpha$ and $\beta$ is bounded by a minimum and a maximum value, and the bounds of the masking threshold are the minimum and maximum masking thresholds obtained frame by frame. When the masking threshold is at its minimum, a subtraction parameter takes its maximum value; when the masking threshold is at its maximum, the parameter takes its minimum value. The parameter values used in the experiments were set as follows:

(5) Real-time noise power spectrum estimation

Speech enhancement requires a noise spectrum estimation method with very good real-time behaviour. The invention uses a noise power spectrum estimation method based on constrained-variance spectral smoothing and minimum tracking. The core of the algorithm is a constrained-variance smoothing filter, which controls the variance of the short-time smoothed power spectrum so that the minimum is tracked more accurately. The noise spectrum estimated in this way follows sudden changes in the noise promptly, without noticeable delay, and is more accurate than the noise spectrum estimated by other methods.

(6) Speech enhancement system

The adaptive subtraction parameters $\alpha$ and $\beta$ are obtained from the masking threshold. The speech enhancement system is shown in Figure 1.

2. Two-dimensional enhancement of speech

After low signal-to-noise-ratio speech has been enhanced, spectral subtraction has attenuated both the noise and the speech. However, voiced segments contain high-energy structures such as formants, so in the two-dimensional time-frequency domain the low-frequency region of the speech spectrogram still has a high signal-to-noise ratio even under noise interference. These high-energy speech structures are also usually distributed continuously in time. Therefore, once these continuously distributed high-energy regions have been found in the two-dimensional spectrogram of the speech signal, and the adjoining unvoiced segments have been located from them, the start and end points of the speech can be obtained. In our method, boundary detection is the algorithm that finds continuously distributed two-dimensional data structures.

However, whether or not the low signal-to-noise-ratio speech has been enhanced, the noise (residual musical noise after enhancement) leaves the boundaries of its own spectrogram structure in the boundary detection. The spectrogram structure of the clean speech is then confused with that of the noise, which greatly interferes with finding the clean speech structure, as shown in Figures 2 and 3.

Figure 2 is the spectrogram of speech containing white noise at -5 dB. The continuous dark horizontal stripes are the speech signal, and the dark snowflake-like background is the white noise. In the high-frequency band the lower-energy speech has already been masked by the noise, so the formant structure of the high-frequency region can no longer be seen in the spectrogram. Figure 3 is the spectrogram after speech enhancement. The noise has been greatly weakened, but residual musical noise of varying strength remains. The invention divides this residual noise into higher-energy residual noise and lower-energy residual noise, as shown in Figure 3. Both kinds greatly interfere with locating the speech endpoints. Therefore, before the endpoints are determined, the invention enhances the speech in two dimensions on the basis of the difference between the spectrogram structure of the residual noise and that of the clean speech. This enhancement comprises a two-dimensional noise erosion algorithm and a two-dimensional speech dilation algorithm.

Two-dimensional noise erosion algorithm

Among the enhancement algorithms for two-dimensional data, erosion can weaken or eliminate specific two-dimensional structures. We found that, in the spectrogram after speech enhancement, the lower-energy residual noise (the dim snowflake-like structures) is usually randomly distributed, as shown in Figure 3, and has small size and low energy. Although these structures are not as strong as the white noise in Figure 3, they still interfere with finding the spectrogram structure boundary of the clean speech. To address this, the invention proposes a two-dimensional noise erosion algorithm that weakens such two-dimensional structures.

The two-dimensional noise erosion of the speech spectrogram is determined by the following procedure. First, a short-time Fourier transform is applied to the speech, and the spectrum of each frame is calculated as:

(11)

In (11), the input is the m-th frame of the speech signal and the output is the spectrum of that frame. N is both the frame length and the number of short-time Fourier transform points, and the window is a Hamming window. The speech signal power spectrum of each frame can be expressed as:

(12)

The power spectrum in (12), taken over all frames, is defined as the spectrogram of the speech signal.

The two-dimensional noise erosion of the spectrogram is defined as:

(13)

Here the second operand is the structuring element, and the spectrogram and the structuring element each have their own domain of definition. The translated coordinates must lie within the domain of the spectrogram, and the offsets must lie within the domain of the structuring element. Two-dimensional noise erosion has a twofold effect on the signal: (1) if all elements of the structuring element are positive, the output signal tends to be weaker than the original signal; (2) a noise spectrogram structure in the input that resembles the structuring element is weakened, to a degree that depends on the shape of the noise structure and the shape of the structuring element.

In the spectrogram structure of speech, erosion weakens noise and speech at the same time. The purpose of the two-dimensional noise erosion algorithm proposed by the invention is to weaken the noise relatively more while preserving the speech better. To match the structural shape of the lower-energy residual noise in the spectrogram, the structuring element of the two-dimensional noise erosion algorithm is defined as:

(14)

Such a structuring element is close to the spectrogram structure of the lower-energy residual noise (small dots). Eroding the spectrogram with this structuring element therefore weakens this kind of noise to some extent.
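As a rough illustration of the erosion step, the sketch below implements standard grayscale erosion on a small spectrogram stored as a list of rows (frequency bins) by columns (frames). It is an assumption-laden example: the patent's structuring element in (14) is not recoverable from the text, so a flat 3x3 element is used here purely for demonstration, and the function name `grey_erode` is hypothetical.

```python
def grey_erode(spec, selem):
    """Grayscale erosion: minimum of (spec - selem) over the element's window.

    spec  -- 2-D list, spec[k][m] = spectrogram value at bin k, frame m
    selem -- 2-D list, the structuring element (anchored at its centre)
    Positions of the window that fall outside the spectrogram are ignored.
    """
    rows, cols = len(spec), len(spec[0])
    s_rows, s_cols = len(selem), len(selem[0])
    r0, c0 = s_rows // 2, s_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = None
            for i in range(s_rows):
                for j in range(s_cols):
                    rr, cc = r + i - r0, c + j - c0
                    if 0 <= rr < rows and 0 <= cc < cols:
                        value = spec[rr][cc] - selem[i][j]
                        if best is None or value < best:
                            best = value
            out[r][c] = best
    return out


# Assumed flat 3x3 element; the patent's actual element (14) is not shown here.
FLAT_3X3 = [[0.0] * 3 for _ in range(3)]
```

With this element, an isolated one-cell speck is removed entirely, while the interior of a larger block survives. That mirrors the intent described above: small, randomly placed residual-noise dots are weakened more than extended speech structures.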

Two-dimensional speech dilation algorithm

After the two-dimensional noise erosion algorithm, the lower-energy residual noise is well suppressed. However, the higher-energy residual noise (see Figure 3) is similar in energy to the clean speech, so excessive erosion would also weaken the two-dimensional structure of the clean speech. Dilation strengthens two-dimensional spectrogram structures that resemble the structuring element and relatively weakens those that do not. The invention therefore proposes a two-dimensional speech dilation algorithm based on the difference between the higher-energy residual noise and the clean speech structure. The structuring element is defined to resemble continuously distributed clean speech, so that this noise structure is relatively suppressed.

For the result of the two-dimensional noise erosion, the two-dimensional speech dilation algorithm is defined as:

(15)

Here the second operand is the structuring element, and the eroded spectrogram and the structuring element each have their own domain of definition. Conceptually, the structuring element is translated over every position of the spectrogram, its values are added to the values of the two-dimensional signal, and the maximum is taken. Two-dimensional speech dilation has a twofold effect on the speech signal: (1) if all elements of the structuring element are positive, the output signal tends to be stronger than the original signal; (2) whether a given structure in the input spectrogram is relatively strengthened depends on the values and shape of the structuring element used for the dilation.

Dilation strengthens the speech structure, but it also strengthens the corresponding noise structure. The purpose of the two-dimensional speech dilation algorithm proposed by the invention is to strengthen the speech structure as much as possible while relatively suppressing the noise structure. The spectrogram structure of the voiced part of a clean speech signal is usually a long strip stretching along the time axis, whereas the spectrogram structure of the higher-energy residual noise is usually a square or circle of varying size, as shown in Figure 3. The structuring element is therefore defined as a long strip stretching along the time axis, which strengthens all similar structures and relatively weakens the differently shaped noise structures.

The structuring element of the two-dimensional speech dilation algorithm is therefore defined with the following shape:

(16)

This is a horizontal structuring element that stretches along the time direction, and every structure similar to it is strengthened. Because the spectrogram structure of clean speech is usually continuous in time, it resembles this element, so the clean speech structure is reinforced. The spectrogram structure of the higher-energy residual noise usually consists of large round or square dots, so it is relatively weakened.
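The dilation step with a time-oriented structuring element can be sketched as follows. This is an illustration under stated assumptions: the exact element in (16) is not recoverable from the text, so a flat, symmetric 1x3 bar along the time axis stands in for it, and the names `grey_dilate` and `TIME_BAR` are hypothetical.

```python
def grey_dilate(spec, selem):
    """Grayscale dilation: maximum of (spec + selem) over the element's window.

    spec  -- 2-D list, spec[k][m] = spectrogram value at bin k, frame m
    selem -- 2-D list, a symmetric structuring element anchored at its centre
    Positions of the window that fall outside the spectrogram are ignored.
    """
    rows, cols = len(spec), len(spec[0])
    s_rows, s_cols = len(selem), len(selem[0])
    r0, c0 = s_rows // 2, s_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = None
            for i in range(s_rows):
                for j in range(s_cols):
                    rr, cc = r + i - r0, c + j - c0
                    if 0 <= rr < rows and 0 <= cc < cols:
                        value = spec[rr][cc] + selem[i][j]
                        if best is None or value > best:
                            best = value
            out[r][c] = best
    return out


# Assumed element: one frequency bin high, three frames wide (along time).
TIME_BAR = [[0.0, 0.0, 0.0]]
```

Because the bar extends only along the time axis, a horizontal ridge with a one-frame gap is bridged, while nothing spreads into neighbouring frequency bins. This is the selective behaviour the text describes: structures that are continuous in time are reinforced, and compact blobs gain comparatively little.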

3. Perceptual spectrogram structure boundary (PSSB) parameter and endpoint detection algorithm

3.1 Perceptual spectrogram structure boundary (PSSB) parameter

The invention considers, in two dimensions, the continuous distribution of the clean speech spectrogram along the time axis, and enhances the noisy speech in two dimensions so that the spectrogram structure of the speech stands out further while that of the noise is suppressed. The invention then finds the boundary of the continuously distributed spectrogram structure of the clean speech and proposes the perceptual spectrogram structure boundary parameter PSSB for endpoint detection.

To obtain the PSSB parameter, the boundary information of the spectrogram structure must be found first. Boundary detection is an important way of finding the boundary of a two-dimensional structure. The boundary of a continuous two-dimensional signal can be represented by the gradient determined by its first derivatives. The invention uses the neighborhood model in formula (17) to approximate the gradient of the result of the two-dimensional speech enhancement.

(17)

The model is centred on the point at which the gradient is evaluated. The gradient of the central neighborhood can be expressed as:

(18)

The two terms in (18) are determined by formula (19) and formula (20):

(19)

(20)

The result of (18) is the boundary of the two-dimensionally enhanced spectrogram. It describes the boundary information of the continuously distributed speech signal in the noisy speech spectrogram.
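A gradient-based boundary map over a 3x3 neighborhood can be sketched as below. Formulas (17) to (20) are not recoverable from the text, so this example substitutes the well-known Sobel operator as a stand-in; it should be read as one plausible instance of a first-derivative neighborhood model, not as the patent's operator. The function name `sobel_boundary` is hypothetical.

```python
import math


def sobel_boundary(img):
    """Gradient magnitude from a 3x3 Sobel neighborhood (assumed operator).

    img -- 2-D list, img[k][m] = enhanced spectrogram at bin k, frame m
    Border cells are left at 0.0 because their neighborhood is incomplete.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Derivative along the time axis (columns).
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]) - (
                img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]
            )
            # Derivative along the frequency axis (rows).
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]) - (
                img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1]
            )
            out[r][c] = math.hypot(gx, gy)
    return out
```

A flat region yields zero everywhere, while the edge of a high-energy region yields a large value, so the output highlights exactly the outlines of the continuously distributed structures that the PSSB parameter later summarizes.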

Analysis of the boundary and of the speech spectrogram shows that, at low signal-to-noise ratios, the signal and the spectrogram features of the high-frequency region of speech are masked by the noise. In the low-frequency region, however, the spectrogram structure of voiced segments still has high energy relative to the noise and has a boundary that can be found, and the effect becomes more pronounced as the frequency decreases. This is because the energy of voiced segments is concentrated mainly around the first few formants in the low and middle frequencies. Therefore, once the boundary of the speech spectrogram has been obtained, all boundary values of each frame are summed along the frequency axis with weights that favour the low-frequency region, which gives the perceptual spectrogram structure boundary parameter PSSB.

The PSSB parameter is defined as follows:

(21)

where the left-hand side is the PSSB parameter of the m-th frame and M is the total number of frames.

The PSSB parameter reflects the relative content of voiced speech in a frame well and is highly robust to noise.
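The weighted summation along the frequency axis can be illustrated as follows. Formula (21) is not recoverable from the text, so both the weighting and the normalization here are assumptions: the sketch uses a hypothetical linearly decreasing weight so that low-frequency bins count more, and divides by the maximum over all frames so that the result lies in the range 0 to 1. The function name `pssb_per_frame` is also hypothetical.

```python
def pssb_per_frame(boundary):
    """Per-frame weighted sum of boundary values, normalized to [0, 1].

    boundary -- 2-D list, boundary[k][m] = boundary strength at bin k, frame m,
                with k = 0 the lowest frequency bin
    The linear weights and the max-normalization are illustrative assumptions.
    """
    n_bins, n_frames = len(boundary), len(boundary[0])
    # Assumed weighting: 1.0 at the lowest bin, decreasing linearly with k.
    weights = [1.0 - k / n_bins for k in range(n_bins)]
    raw = [
        sum(weights[k] * boundary[k][m] for k in range(n_bins))
        for m in range(n_frames)
    ]
    peak = max(raw)
    if peak <= 0.0:
        return [0.0] * n_frames
    return [value / peak for value in raw]
```

With this weighting, the same boundary strength contributes twice as much at the lowest bin as at the middle of the band, which reflects the observation above that voiced-speech structure is most reliable at low frequencies.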

3.2 Speech endpoint detection

Voiced segments in speech usually last for a relatively long continuous time. Unvoiced segments occur in two positions: (1) in the middle of a speech segment; (2) at the start of a speech segment.

Experiments show that unvoiced sounds in the middle of a speech segment are reliably identified as speech (PSSB parameter greater than the threshold 0.5). This is because an unvoiced sound in the middle of a spoken word is usually short, and the invention uses frames that overlap by 50%. With this framing, the unvoiced sound in the middle of the word is analysed together with the neighbouring voiced sound, so the unvoiced frame carries information from the adjacent voiced frames.

As the signal-to-noise ratio falls, however, and especially below 0 dB, the PSSB of unvoiced sounds at the start of a speech segment becomes less discriminative (its value is small). If the endpoints were decided with a single fixed threshold, the detection of unvoiced sounds would deteriorate sharply. Although the PSSB of unvoiced sounds is small compared with that of voiced sounds, it usually still has some discriminative power (small but not zero). The invention therefore uses a detection method based on the continuity of speech, which treats voiced segments and the unvoiced segments at the endpoints differently. The endpoint detection method is as follows:

(1) First, find a speech segment in which the PSSB parameter exceeds threshold a for m consecutive frames. This segment is the detected voiced segment.

(2) Starting from this segment, every adjoining run of frames whose PSSB parameter stays greater than or equal to threshold b is also labeled as speech. Threshold b is small: in the experiments, values of b from 0.01 to 0.05 all gave good results. This step recovers unvoiced segments with small PSSB values.

(3) The start and end of the resulting speech segment are the speech endpoints.

In experimental tests with white noise, the system performed well with a = 0.5, b = 0.01, and m = 20.
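The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `detect_endpoints` is hypothetical, the PSSB track is a toy list, and "m consecutive frames" is read here as "at least m frames", which is an assumption.

```python
def detect_endpoints(pssb, a=0.5, b=0.01, m=20):
    """Return a list of (start, end) frame indices, inclusive, for detected speech."""
    n = len(pssb)
    segments = []
    i = 0
    while i < n:
        if pssb[i] <= a:
            i += 1
            continue
        # Step (1): measure the run of consecutive frames above threshold a.
        j = i
        while j < n and pssb[j] > a:
            j += 1
        if j - i < m:
            i = j  # run too short to count as a voiced segment (assumed: needs >= m frames)
            continue
        # Step (2): grow the segment in both directions while PSSB >= b.
        start, end = i, j - 1
        while start > 0 and pssb[start - 1] >= b:
            start -= 1
        while end < n - 1 and pssb[end + 1] >= b:
            end += 1
        # Step (3): the ends of the grown segment are the speech endpoints.
        segments.append((start, end))
        i = end + 1
    return segments


# Toy PSSB track: silence, weak unvoiced onset, voiced core, weak tail, silence.
track = [0.0] * 10 + [0.02] * 3 + [0.8] * 25 + [0.03] * 2 + [0.0] * 10
print(detect_endpoints(track))
```

For the toy track this prints `[(10, 39)]`: the voiced core (frames 13 to 37) passes the high threshold, and the low threshold then pulls in the weak onset and tail.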

The block diagram of the endpoint detection algorithm of the invention is shown in Fig. 4.

Beneficial effects:

The experiments were run at several signal-to-noise ratios. The input low-SNR speech is sampled at 16 kHz with 16-bit quantization. A Hamming window is used, with a frame length of 256 samples and a frame shift of 128 samples. The speech is taken from the TIMIT speech database, and the white noise from the NOISEX-92 noise database. Fig. 5 shows the waveform of one utterance from the database (the word "artists"), and Fig. 6 shows the same utterance after white noise is added to bring the SNR to -10 dB.
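Adding noise to reach a target SNR, as done for Fig. 6, amounts to scaling the noise by a gain derived from the two signal powers. The sketch below assumes SNR is measured over the whole utterance using mean power; the source does not say how the noise weight was chosen, and the helper names and the synthetic sine "speech" are purely illustrative.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # SNR_dB = 10*log10(P_speech / (gain^2 * P_noise))  =>  solve for gain.
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, mixture)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


random.seed(0)
speech = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(16000)]  # stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]                    # white Gaussian noise
noisy = mix_at_snr(speech, noise, -10)
```

Calling `measured_snr_db(speech, noisy)` on the result recovers the requested -10 dB up to floating-point rounding.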

In Fig. 5, the speech starts at frame 40 and ends at frame 87. Once white noise is added to bring the SNR to -10 dB, the speech signal is completely buried in the noise, and conventional endpoint detection algorithms cannot reliably extract the speech endpoints from such a signal.
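To relate frame indices such as 40 and 87 to time, the framing parameters stated above (16 kHz sampling, 256-sample frames, 128-sample shift) are enough. The sketch below assumes frames are numbered from 0 and that frame i begins at sample i × 128; the source does not state its indexing convention.

```python
FS = 16000         # sampling rate in Hz
FRAME_LEN = 256    # samples per frame
FRAME_SHIFT = 128  # samples between frame starts (50% overlap)


def frame_start_time(index):
    """Time in seconds at which frame `index` begins (0-based indexing assumed)."""
    return index * FRAME_SHIFT / FS


def frame_end_time(index):
    """Time in seconds at which frame `index` ends."""
    return (index * FRAME_SHIFT + FRAME_LEN) / FS


def num_frames(num_samples):
    """Number of complete frames that fit in a signal of `num_samples` samples."""
    if num_samples < FRAME_LEN:
        return 0
    return 1 + (num_samples - FRAME_LEN) // FRAME_SHIFT
```

Under these assumptions frame 40 begins at 0.32 s and frame 87 ends at 0.712 s, so the example utterance lasts roughly 0.4 s; each frame spans 16 ms.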

Fig. 7 is the spectrogram of the clean utterance ("artists"), Fig. 8 is the spectrogram of the low-SNR speech, and Fig. 9 is the spectrogram after speech enhancement based on auditory masking.

Fig. 8 shows that at -10 dB most of the spectral structure of the speech is buried in noise; only the formant structure in the low-frequency region can still be told apart from the noise. Fig. 9 shows that speech enhancement attenuates the noise and the speech together and leaves randomly distributed musical noise behind. This is an inherent property of spectral-subtraction algorithms.

If spectral boundaries are computed directly from the spectrogram in Fig. 9, noise and speech remain hard to separate. A further two-dimensional enhancement of the spectrogram is therefore needed, as shown in Figs. 10 and 11.

Fig. 10 shows Fig. 9 after the two-dimensional noise erosion algorithm. Compared with Fig. 9, the residual noise is suppressed to some extent, apart from the higher-energy residual noise and the low-frequency formant structure of the speech. Fig. 11 shows the result of applying the two-dimensional speech dilation algorithm to the spectral structure in Fig. 10. The randomly distributed, higher-energy noise structures are relatively weakened, and the spectral structure of the speech is relatively strengthened.

Boundary detection is then applied to Fig. 11, giving Fig. 12. Between frames 40 and 85, the boundary structure of the speech spectrum in the low-frequency region is recovered well. However, because a small amount of two-dimensional noise structure remains, many mid- and high-frequency noise boundaries also appear in the non-speech regions, which is undesirable. The PSSB parameter therefore gives the boundary structure in the low-frequency region a higher weight, so that speech and noise are clearly separated, as shown in Fig. 13.

Fig. 13 shows the PSSB parameter computed from Fig. 12. Even at -10 dB, the PSSB parameter of the speech signal stands out clearly along the time axis. For endpoint detection, the PSSB parameter is checked for continuity: if a run of frames has PSSB values continuously greater than 0 and contains more than 20 consecutive frames above the threshold of 0.5, that run of frames is judged to be a speech segment.

In the experiments, the endpoint detection algorithm of the invention (PSSB) is compared with four other endpoint detection algorithms on accuracy. The four methods are: (1) energy and short-time zero-crossing rate (EZCR); (2) sub-band amplitude (SBA); (3) wavelet coefficients (WC); (4) sub-band spectral entropy (ABSE). Seventy words from the TIMIT speech database are used as test items, and endpoint detection is run three times on each word. White noise from the NOISEX-92 noise database is added with appropriate weights to produce speech at different SNRs. A detection is counted as correct when its endpoint error is less than 4 frames. Endpoint detection accuracy is defined as the number of correct results divided by the total number of speech segments used for endpoint detection. Table 1 and Fig. 14 give the accuracy of each algorithm at the different SNRs.
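The accuracy definition above can be written down directly. The sketch below assumes that a detection counts as correct only when both the start and the end point are off by fewer than 4 frames; the source does not say whether the tolerance applies to each endpoint separately, and the function name is hypothetical.

```python
def endpoint_accuracy(detected, reference, tol=4):
    """Fraction of segments whose start and end errors are both below `tol` frames."""
    correct = 0
    for (det_start, det_end), (ref_start, ref_end) in zip(detected, reference):
        if abs(det_start - ref_start) < tol and abs(det_end - ref_end) < tol:
            correct += 1
    return correct / len(reference)


# Reference endpoints from Fig. 5 (frames 40 and 87) against two made-up detections.
reference = [(40, 87), (40, 87)]
detected = [(40, 85), (30, 87)]  # first is within tolerance, second starts 10 frames early
accuracy = endpoint_accuracy(detected, reference)
```

With these two illustrative detections the accuracy is 0.5; the figures in Table 1 come from the authors' experiments and are not reproduced by this sketch.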

Table 1. Endpoint detection accuracy (%) at different signal-to-noise ratios

An asterisk (*) in Table 1 means that the algorithm fails under that condition, and its accuracy is then taken as zero. Table 1 and Fig. 14 show that at 10 dB the accuracy of the three conventional methods, EZCR, SBA, and WC, is already below 86%. When the SNR drops below 0 dB, these three methods fail completely, which shows that they are not robust to noise. The ABSE method is comparatively accurate because it also bases endpoint detection on the high-energy components of the clean speech. The PSSB-based method of the invention achieves higher endpoint detection accuracy than ABSE, and still reaches 75.2% at -10 dB.

Description of the drawings:

Fig. 1 shows the speech enhancement system based on auditory characteristics;

Fig. 2 is the spectrogram of speech with white noise added at -5 dB;

Fig. 3 is the spectrogram after speech enhancement;

Fig. 4 shows the endpoint detection algorithm using the PSSB parameter;

Fig. 5 shows the clean speech;

Fig. 6 shows the low-SNR speech at -10 dB;

Fig. 7 is the spectrogram of the clean speech signal;

Fig. 8 is the spectrogram of the low-SNR speech signal at -10 dB;

Fig. 9 shows the speech enhancement result;

Fig. 10 is the spectrogram after the two-dimensional noise erosion algorithm;

Fig. 11 is the spectrogram after the two-dimensional speech dilation algorithm;

Fig. 12 shows the spectral boundaries;

Fig. 13 shows the PSSB parameter and the detected endpoints;

Fig. 14 compares the endpoint detection results.

Detailed description

Embodiment 1

Step 1: Speech enhancement based on auditory perception. Speech enhancement based on auditory masking is used to suppress noise as far as possible while preserving the speech. The masking threshold calculation and the speech enhancement system are as follows:

ⅰ. Bark-domain power spectrum

The speech signal x(n) is transformed into the frequency domain by a fast Fourier transform (FFT), and the signal power spectrum P(k) is given by equation (1):

[Equation (1): not reproduced in the source text]

The Bark power spectrum is:

$$B_i = \sum_{k=b_{li}}^{b_{hi}} P(k) \qquad (2)$$

where $B_i$ is the energy of the i-th Bark band, $b_{li}$ is the lowest frequency of band i, and $b_{hi}$ is the highest frequency of band i;
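Equation (2) is a plain sum of power-spectrum bins over each band. The sketch below assumes the usual definition of the power spectrum as the squared magnitude of each FFT bin, since equation (1) is not available, and uses made-up band edges; real Bark band edges depend on the sampling rate and FFT size and are not given in the source.

```python
def power_spectrum(spectrum):
    """Squared magnitude of each complex FFT bin (assumed form of equation (1))."""
    return [abs(x) ** 2 for x in spectrum]


def bark_band_energies(power, band_edges):
    """Equation (2): sum P(k) from the lowest to the highest bin of each band, inclusive."""
    return [sum(power[lo:hi + 1]) for lo, hi in band_edges]


# Illustrative six-bin spectrum and three hypothetical bands of two bins each.
spectrum = [1, 2j, 3, -1j, 2, 1j]
edges = [(0, 1), (2, 3), (4, 5)]
energies = bark_band_energies(power_spectrum(spectrum), edges)
```

Here the bin powers are 1, 4, 9, 1, 4, 1, so the three band energies are 5, 10, and 5.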

ⅱ. Spread Bark-domain power spectrum

A spreading function $S_{ij}$ is introduced. It is a matrix that satisfies the condition:

[Equation (3): not reproduced in the source text]

It is defined as follows:

[Equation (4): not reproduced in the source text]

where the variable in equation (4) denotes the difference between the band numbers of the two bands;

$$C_i = \sum_{j=1}^{j_{max}} S_{ij} \cdot B_i, \quad i = 1, 2, \ldots, i_{max} \qquad (5)$$

ⅲ. Offset function of the masking energy and calculation of the masking threshold

[Equation (6): not reproduced in the source text]

$$T_i = 10^{\log_{10}(C_i) - (O_i/10)} \qquad (7)$$

The coefficient in the offset function of equation (6) takes a value between 0 and 1 and is determined by the speech content; $T_i$ is the masking threshold of the i-th Bark band, written hereafter with band index b, where b has the same meaning as i above;

This threshold is compared with the threshold of hearing in quiet:

[Equation (8): not reproduced in the source text]

and the larger of the two is taken as the final fitted masking threshold, where the threshold in quiet is expressed as its corresponding Bark masking curve;

ⅳ. Spectral subtraction and adjustment of the subtraction parameters

The gain function used by the spectral subtraction algorithm is:

$$H(k) = \begin{cases} \left(1 - \alpha \cdot \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma}\right)^{1/\gamma}, & \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma} < \dfrac{1}{\alpha + \beta} \\[2ex] \left(\beta \cdot \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma}\right)^{1/\gamma}, & \text{else} \end{cases} \qquad (9)$$

First, the noise masking threshold is computed for each Bark band of every speech frame, and the adaptive subtraction parameters $\alpha$ and $\beta$ are then derived from it. If the masking threshold is high, the residual noise is naturally masked and inaudible, so the subtraction parameters take their minimum values. If the masking threshold is low, the residual noise is clearly audible and needs to be reduced. For each frame m, the minimum of the masking threshold corresponds to the maximum of the per-frame subtraction parameters $\alpha$ and $\beta$. The subtraction parameters are applied according to the following relationship:

[Equation (10): not reproduced in the source text]

In equation (10), each of the parameters $\alpha$ and $\beta$ has a minimum and a maximum value. When the masking threshold equals its minimum, the parameter takes its maximum value; when the masking threshold equals its maximum, the parameter takes its minimum value. The minimum and maximum of the masking threshold are obtained frame by frame. Specific values were assigned to each parameter in the experiments; they are not reproduced in the source text.

ⅴ. Real-time noise power spectrum estimation: a noise power spectrum estimation method based on constrained-variance spectral smoothing and minima tracking is used.

ⅵ. Speech enhancement system: the adaptive subtraction parameters $\alpha$ and $\beta$ are obtained from the masking threshold;
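The gain function of equation (9) and the threshold-to-parameter mapping can be sketched as follows. Two things here are assumptions rather than statements from the source: the parameters are interpolated linearly between their limits (the text only fixes the two end cases), and the numeric limits used in the checks are illustrative, because the experimental values are missing. A zero-magnitude guard is also added for safety.

```python
def adapt_parameter(threshold, t_min, t_max, p_min, p_max):
    """Map a masking threshold to a subtraction parameter.

    threshold == t_min gives p_max and threshold == t_max gives p_min, as the
    text states; linear interpolation in between is an assumption.
    """
    if t_max == t_min:
        return p_max
    fraction = (t_max - threshold) / (t_max - t_min)
    return p_min + fraction * (p_max - p_min)


def subtraction_gain(noise_mag, noisy_mag, alpha, beta, gamma=2.0):
    """Equation (9): gain H(k) from noise magnitude |D(k)| and noisy magnitude |Y(k)|."""
    if noisy_mag == 0.0:
        return 0.0  # guard added for this sketch; not part of equation (9)
    ratio = (noise_mag / noisy_mag) ** gamma
    if ratio < 1.0 / (alpha + beta):
        return (1.0 - alpha * ratio) ** (1.0 / gamma)
    return (beta * ratio) ** (1.0 / gamma)  # spectral floor branch
```

With gamma = 2 the first branch is power spectral subtraction with over-subtraction factor alpha, and the second branch keeps a floor of beta times the noise power, which is what limits musical noise.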

Step 2: Two-dimensional enhancement of the speech;

2.1 Two-dimensional noise erosion algorithm

The two-dimensional noise erosion algorithm for the speech spectrogram proceeds as follows. First, a short-time Fourier transform is applied to the speech, and the spectrum of each frame is computed by:

[Equation (11): not reproduced in the source text]

Equation (11) is written in terms of the m-th frame of the speech signal, the spectrum of that frame, the frame length N (which is also the number of short-time Fourier transform points), and a Hamming window. The power spectrum of the speech signal in each frame can be expressed as:

[Equation (12): not reproduced in the source text]

This per-frame power spectrum is defined as the spectrogram of the speech signal;
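The framing, windowing, and power-spectrum steps described for equations (11) and (12) can be sketched with the standard library alone. The code assumes the textbook short-time Fourier transform (Hamming-windowed frame followed by an N-point DFT) and keeps only the non-negative frequency bins; because the equations themselves are missing, the exact form and normalization used by the authors are not known. A direct DFT is used for clarity, so it is slow for long signals.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_power_spectrum(frame):
    """Power spectrum |X(m, k)|^2 of one windowed frame, for k = 0 .. N/2."""
    n_len = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n_len))]
    power = []
    for k in range(n_len // 2 + 1):
        acc = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                  for n in range(n_len))
        power.append(abs(acc) ** 2)
    return power


def spectrogram(signal, frame_len=256, frame_shift=128):
    """List of per-frame power spectra: result[m][k] is frame m, frequency bin k."""
    starts = range(0, len(signal) - frame_len + 1, frame_shift)
    return [frame_power_spectrum(signal[s:s + frame_len]) for s in starts]


# Small demonstration: a sinusoid centered on bin 8 of a 64-point frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(192)]
spec = spectrogram(tone, frame_len=64, frame_shift=32)
```

The demonstration yields 5 frames of 33 bins each, with the energy of every frame peaking at bin 8.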

The two-dimensional noise erosion of the spectrogram is defined as:

[Equation (13): not reproduced in the source text]

Equation (13) involves a structuring element, the domain of the spectrogram, and the domain of the structuring element; the translated coordinates must lie within the domain of the spectrogram, and the offsets must lie within the domain of the structuring element;

To match the structural form of the weaker residual noise in the spectrogram, the structuring element of the two-dimensional noise erosion algorithm is defined as:

[Equation (14): not reproduced in the source text]

2.2 Two-dimensional speech dilation algorithm

Applied to the result of the two-dimensional noise erosion, the two-dimensional speech dilation algorithm is defined by:

[Equation (15): not reproduced in the source text]

Equation (15) involves a structuring element, the domain of the eroded spectrogram, and the domain of the structuring element;

The structuring element of the two-dimensional speech dilation algorithm is therefore defined with the following shape:

[Equation (16): not reproduced in the source text]
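Erosion followed by dilation can be illustrated with flat grayscale morphology, where erosion takes the minimum and dilation the maximum over the neighborhood selected by the structuring element. The structuring elements of equations (14) and (16) are not available, so the three-frame line along the time axis used below is purely an assumption, chosen because it removes isolated noise points while keeping structure that is continuous in time. Positions outside the spectrogram are ignored, and the offsets are assumed to include (0, 0).

```python
def erode(spec, offsets):
    """Flat grayscale erosion: minimum of spec over the structuring-element offsets."""
    rows, cols = len(spec), len(spec[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [spec[r + dr][c + dc] for dr, dc in offsets
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            row.append(min(vals))
        out.append(row)
    return out


def dilate(spec, offsets):
    """Flat grayscale dilation: maximum of spec over the reflected offsets."""
    rows, cols = len(spec), len(spec[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [spec[r - dr][c - dc] for dr, dc in offsets
                    if 0 <= r - dr < rows and 0 <= c - dc < cols]
            row.append(max(vals))
        out.append(row)
    return out


# spec[m][k]: 7 frames x 3 bins. Bin 1 holds a ridge over frames 1..5 ("speech");
# bin 0 holds a single strong point at frame 2 ("residual noise").
toy = [[0, 3 if 1 <= m <= 5 else 0, 0] for m in range(7)]
toy[2][0] = 5
time_line = [(-1, 0), (0, 0), (1, 0)]  # hypothetical element: 3 frames along time
cleaned = dilate(erode(toy, time_line), time_line)
```

In this toy case the isolated point disappears during erosion and the ridge is restored to its full length by the dilation, which mirrors the behavior described for Figs. 10 and 11.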

Step 3: Perceptual spectral structure boundary (PSSB) parameter and endpoint detection algorithm

3.1 Perceptual spectral structure boundary (PSSB) parameter

The invention uses the neighborhood model in equation (17) to approximate the gradient of the two-dimensionally enhanced spectrogram;

[Equation (17): not reproduced in the source text]

The neighborhood model has a designated center point, and the gradient at the center of the neighborhood is given by:

[Equation (18): not reproduced in the source text]

The two terms of equation (18) are determined by equations (19) and (20):

[Equation (19): not reproduced in the source text]

[Equation (20): not reproduced in the source text]

The result is the boundary of the enhanced spectrogram, which describes the boundaries of the continuously distributed speech signal within the noisy speech spectrogram.
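A gradient approximated from a neighborhood model can be illustrated with a 3×3 window. Equations (17) to (20) are not available, so the sketch below substitutes the Sobel operator and the sum-of-absolute-values gradient magnitude; both are assumptions and may differ from the operator the authors used.

```python
def boundary_map(spec):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel masks (assumed).

    Border cells, where the 3x3 window does not fit, are left at zero.
    """
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            z1, z2, z3 = spec[r - 1][c - 1], spec[r - 1][c], spec[r - 1][c + 1]
            z4, z6 = spec[r][c - 1], spec[r][c + 1]
            z7, z8, z9 = spec[r + 1][c - 1], spec[r + 1][c], spec[r + 1][c + 1]
            gx = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)  # change along rows
            gy = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)  # change along columns
            out[r][c] = abs(gx) + abs(gy)
    return out


# A step from 0 to 1 between rows 1 and 2 produces a response on both sides of the step.
step = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
edges = boundary_map(step)
```

A constant region gives zero everywhere, so only transitions in the enhanced spectrogram, such as the onset and offset of a formant ridge, survive in the boundary map.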

The perceptual spectral structure boundary parameter PSSB is defined as:

[Equation (21): not reproduced in the source text]

where the parameter is computed for the m-th frame and M is the total number of frames;
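Equation (21) is not available, so the exact PSSB formula cannot be shown. The sketch below is only a hypothetical stand-in that reflects two properties stated in the text: low-frequency boundary structure is weighted more heavily, and the parameter is compared against thresholds such as 0.5, which suggests a normalized per-frame value. The linearly decreasing weights and the normalization by the maximum over all M frames are assumptions.

```python
def pssb_sketch(boundary):
    """Hypothetical per-frame score from a boundary map boundary[m][k].

    Each frame's boundary magnitudes are summed with weights that fall linearly
    from 1 at the lowest bin, then scaled so the largest frame score equals 1.
    """
    num_bins = len(boundary[0])
    weights = [1.0 - k / num_bins for k in range(num_bins)]
    raw = [sum(w * e for w, e in zip(weights, frame)) for frame in boundary]
    peak = max(raw)
    if peak == 0:
        return [0.0] * len(raw)
    return [value / peak for value in raw]


# Three frames with the same boundary energy placed at low, high, and no frequency.
scores = pssb_sketch([[4, 0, 0, 0], [0, 0, 0, 4], [0, 0, 0, 0]])
```

With these weights the frame whose boundary energy sits in the lowest bin scores 1.0, the frame with the same energy in the highest bin scores 0.25, and the empty frame scores 0.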

3.2 Speech endpoint detection

A detection method based on the continuity of speech is used, which treats voiced segments and the unvoiced segments at the endpoints differently. The endpoint detection method is as follows:

(1) First, find a speech segment in which the PSSB parameter exceeds threshold a for m consecutive frames; this segment is the detected voiced segment;

(2) Starting from this segment, every adjoining run of frames whose PSSB parameter stays greater than or equal to threshold b is also labeled as speech; threshold b is small, and in the experiments values of b from 0.01 to 0.05 all gave good results. This step recovers unvoiced segments with small PSSB values;

(3) The start and end of the resulting speech segment are the speech endpoints.

The experiments were run at several signal-to-noise ratios; the input low-SNR speech is sampled at 16 kHz with 16-bit quantization; a Hamming window is used, with a frame length of 256 samples and a frame shift of 128 samples; the speech is taken from the TIMIT speech database, and the white noise from the NOISEX-92 noise database.

Claims (5)

1. A speech endpoint detection algorithm using perceptual spectral structure boundary parameters, characterized in that the algorithm comprises the following steps: (1) speech enhancement based on auditory perception; (2) two-dimensional enhancement of the speech, comprising a two-dimensional noise erosion algorithm and a two-dimensional speech dilation algorithm; (3) the perceptual spectral structure boundary (PSSB) parameter and speech endpoint detection.

2. The speech endpoint detection algorithm using perceptual spectral structure boundary parameters according to claim 1, characterized in that the algorithm steps are as follows:

Step 1: Speech enhancement based on auditory perception. Speech enhancement based on auditory masking is used to suppress noise as far as possible while preserving the speech. The masking threshold calculation and the speech enhancement system are as follows:

ⅰ. Bark-domain power spectrum

The speech signal x(n) is transformed into the frequency domain by a fast Fourier transform (FFT), and the signal power spectrum P(k) is given by equation (1):

[Equation (1): not reproduced in the source text]

The Bark power spectrum is:

$$B_i = \sum_{k=b_{li}}^{b_{hi}} P(k) \qquad (2)$$

where $B_i$ is the energy of the i-th Bark band, $b_{li}$ is the lowest frequency of band i, and $b_{hi}$ is the highest frequency of band i;

ⅱ. Spread Bark-domain power spectrum

A spreading function $S_{ij}$ is introduced. It is a matrix that satisfies the condition:

[Equation (3): not reproduced in the source text]

It is defined as follows:

[Equation (4): not reproduced in the source text]

where the variable in equation (4) denotes the difference between the band numbers of the two bands;

$$C_i = \sum_{j=1}^{j_{max}} S_{ij} \cdot B_i, \quad i = 1, 2, \ldots, i_{max} \qquad (5)$$

ⅲ. Offset function of the masking energy and calculation of the masking threshold

[Equation (6): not reproduced in the source text]

$$T_i = 10^{\log_{10}(C_i) - (O_i/10)} \qquad (7)$$

The coefficient in the offset function of equation (6) takes a value between 0 and 1 and is determined by the speech content; $T_i$ is the masking threshold of the i-th Bark band, written hereafter with band index b, where b has the same meaning as i above;

This threshold is compared with the threshold of hearing in quiet:

[Equation (8): not reproduced in the source text]

and the larger of the two is taken as the final fitted masking threshold, where the threshold in quiet is expressed as its corresponding Bark masking curve;

ⅳ. Spectral subtraction and adjustment of the subtraction parameters

The gain function used by the spectral subtraction algorithm is:

$$H(k) = \begin{cases} \left(1 - \alpha \cdot \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma}\right)^{1/\gamma}, & \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma} < \dfrac{1}{\alpha + \beta} \\[2ex] \left(\beta \cdot \left[\dfrac{|D(k)|}{|Y(k)|}\right]^{\gamma}\right)^{1/\gamma}, & \text{else} \end{cases} \qquad (9)$$

First, the noise masking threshold is computed for each Bark band of every speech frame, and the adaptive subtraction parameters $\alpha$ and $\beta$ are then derived from it. If the masking threshold is high, the residual noise is naturally masked and inaudible, so the subtraction parameters take their minimum values. If the masking threshold is low, the residual noise is clearly audible and needs to be reduced. For each frame m, the minimum of the masking threshold corresponds to the maximum of the per-frame subtraction parameters $\alpha$ and $\beta$. The subtraction parameters are applied according to the following relationship:

[Equation (10): not reproduced in the source text]

In equation (10), each of the parameters $\alpha$ and $\beta$ has a minimum and a maximum value. When the masking threshold equals its minimum, the parameter takes its maximum value; when the masking threshold equals its maximum, the parameter takes its minimum value. The minimum and maximum of the masking threshold are obtained frame by frame. Specific values were assigned to each parameter in the experiments; they are not reproduced in the source text;

ⅴ. Real-time noise power spectrum estimation: a noise power spectrum estimation method based on constrained-variance spectral smoothing and minima tracking is used;

ⅵ. Speech enhancement system: the adaptive subtraction parameters $\alpha$ and $\beta$ are obtained from the masking threshold;

Step 2: Two-dimensional enhancement of the speech;

2.1 Two-dimensional noise erosion algorithm

The two-dimensional noise erosion algorithm for the speech spectrogram proceeds as follows. First, a short-time Fourier transform is applied to the speech, and the spectrum of each frame is computed by:

[Equation (11): not reproduced in the source text]

Equation (11) is written in terms of the m-th frame of the speech signal, the spectrum of that frame, the frame length N (which is also the number of short-time Fourier transform points), and a Hamming window. The power spectrum of the speech signal in each frame can be expressed as:

[Equation (12): not reproduced in the source text]

This per-frame power spectrum is defined as the spectrogram of the speech signal;

The two-dimensional noise erosion of the spectrogram is defined as:

[Equation (13): not reproduced in the source text]

Equation (13) involves a structuring element, the domain of the spectrogram, and the domain of the structuring element; the translated coordinates must lie within the domain of the spectrogram, and the offsets must lie within the domain of the structuring element;

To match the structural form of the weaker residual noise in the spectrogram, the structuring element of the two-dimensional noise erosion algorithm is defined as:

[Equation (14): not reproduced in the source text]

2.2 Two-dimensional speech dilation algorithm

Applied to the result of the two-dimensional noise erosion, the two-dimensional speech dilation algorithm is defined by:

[Equation (15): not reproduced in the source text]

Equation (15) involves a structuring element, the domain of the eroded spectrogram, and the domain of the structuring element;

The structuring element of the two-dimensional speech dilation algorithm is therefore defined with the following shape:

[Equation (16): not reproduced in the source text]

Step 3: Perceptual spectral structure boundary (PSSB) parameter and endpoint detection algorithm

3.1 Perceptual spectral structure boundary (PSSB) parameter

The invention uses the neighborhood model in equation (17) to approximate the gradient of the two-dimensionally enhanced spectrogram;

[Equation (17): not reproduced in the source text]

The neighborhood model has a designated center point, and the gradient at the center of the neighborhood is given by:

[Equation (18): not reproduced in the source text]

The two terms of equation (18) are determined by equations (19) and (20):

[Equation (19): not reproduced in the source text]

[Equation (20): not reproduced in the source text]

The result is the boundary of the enhanced spectrogram, which describes the boundaries of the continuously distributed speech signal within the noisy speech spectrogram;

The perceptual spectral structure boundary parameter PSSB is defined as:

[Equation (21): not reproduced in the source text]

where the parameter is computed for the m-th frame and M is the total number of frames;

3.2 Speech endpoint detection

A detection method based on the continuity of speech is used, which treats voiced segments and the unvoiced segments at the endpoints differently. The endpoint detection method is as follows:

(1) First, find a speech segment in which the PSSB parameter exceeds threshold a for m consecutive frames; this segment is the detected voiced segment;

(2) Starting from this segment, every adjoining run of frames whose PSSB parameter stays greater than or equal to threshold b is also labeled as speech; threshold b is small, and in the experiments values of b from 0.01 to 0.05 all gave good results; this step recovers unvoiced segments with small PSSB values;

(3) The start and end of the resulting speech segment are the speech endpoints.

3. The speech endpoint detection algorithm using perceptual spectral structure boundary parameters according to claim 2, characterized in that: the experiments are run at different signal-to-noise ratios; the input low-SNR speech is sampled at 16 kHz with 16-bit quantization.

4. The speech endpoint detection algorithm using perceptual spectral structure boundary parameters according to claim 2, characterized in that: a Hamming window is used, with a frame length of 256 and a frame shift of 128.

5. The speech endpoint detection algorithm using perceptual spectral structure boundary parameters according to claim 2, characterized in that: the speech is taken from the TIMIT speech database, and the white noise from the NOISEX-92 noise database.
CN201410175090.8A 2014-04-29 2014-04-29 Speech endpoint detection algorithm adopting perceptual speech spectrum structure boundary parameters Expired - Fee Related CN104091593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410175090.8A CN104091593B (en) 2014-04-29 2014-04-29 Speech endpoint detection algorithm adopting perceptual speech spectrum structure boundary parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410175090.8A CN104091593B (en) 2014-04-29 2014-04-29 Speech endpoint detection algorithm adopting perceptual speech spectrum structure boundary parameters

Publications (2)

Publication Number Publication Date
CN104091593A true CN104091593A (en) 2014-10-08
CN104091593B CN104091593B (en) 2017-02-15

Family

ID=51639303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410175090.8A Expired - Fee Related CN104091593B (en) 2014-04-29 2014-04-29 Speech endpoint detection algorithm adopting perceptual speech spectrum structure boundary parameters

Country Status (1)

Country Link
CN (1) CN104091593B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104867493A (en) * 2015-04-10 2015-08-26 武汉工程大学 Multi-fractal dimension endpoint detection method based on wavelet transform
CN106653004A (en) * 2016-12-26 2017-05-10 苏州大学 Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
CN107742522A (en) * 2017-10-23 2018-02-27 科大讯飞股份有限公司 Target voice acquisition methods and device based on microphone array
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Voice mood recognition methods and device
CN109979478A (en) * 2019-04-08 2019-07-05 网易(杭州)网络有限公司 Voice de-noising method and device, storage medium and electronic equipment
CN111028858A (en) * 2019-12-31 2020-04-17 云知声智能科技股份有限公司 Method and device for detecting voice start-stop time
CN111063371A (en) * 2019-12-21 2020-04-24 华南理工大学 A method for estimating the number of speech syllables based on spectrogram time difference
CN112557510A (en) * 2020-12-11 2021-03-26 广西交科集团有限公司 Concrete pavement void intelligent detection device and detection method thereof
CN112863517A (en) * 2021-01-19 2021-05-28 苏州大学 Speech recognition method based on perceptual spectrum convergence rate

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599269A (en) * 2009-07-02 2009-12-09 中国农业大学 Speech endpoint detection method and device
CN102982801A (en) * 2012-11-12 2013-03-20 中国科学院自动化研究所 Phonetic feature extracting method for robust voice recognition
CN103489446A (en) * 2013-10-10 2014-01-01 福州大学 Twitter identification method based on self-adaption energy detection under complex environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Nima Derakhshan et al.: "Noise power spectrum estimation using constrained variance spectral smoothing and minima tracking", 《Speech Communication》 *
吴迪: "基于听觉特性及语谱特性的语音增强", 《中国优秀博硕士学位论文全文数据库(硕士)科技信息辑》 *
肖纯智 等: "一种基于语谱图分析的语音增强算法", 《电声技术》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104867493A (en) * 2015-04-10 2015-08-26 武汉工程大学 Multi-fractal dimension endpoint detection method based on wavelet transform
CN104867493B (en) * 2015-04-10 2018-08-03 武汉工程大学 Multifractal Dimension end-point detecting method based on wavelet transformation
CN106653004B (en) * 2016-12-26 2019-07-26 苏州大学 Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
CN106653004A (en) * 2016-12-26 2017-05-10 苏州大学 Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
CN107742522A (en) * 2017-10-23 2018-02-27 科大讯飞股份有限公司 Target voice acquisition methods and device based on microphone array
CN107742522B (en) * 2017-10-23 2022-01-14 科大讯飞股份有限公司 Target voice obtaining method and device based on microphone array
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Voice mood recognition methods and device
CN109979478A (en) * 2019-04-08 2019-07-05 网易(杭州)网络有限公司 Voice de-noising method and device, storage medium and electronic equipment
CN111063371A (en) * 2019-12-21 2020-04-24 华南理工大学 A method for estimating the number of speech syllables based on spectrogram time difference
CN111063371B (en) * 2019-12-21 2023-04-21 华南理工大学 Speech syllable number estimation method based on spectrogram time difference
CN111028858A (en) * 2019-12-31 2020-04-17 云知声智能科技股份有限公司 Method and device for detecting voice start-stop time
CN111028858B (en) * 2019-12-31 2022-02-18 云知声智能科技股份有限公司 Method and device for detecting voice start-stop time
CN112557510A (en) * 2020-12-11 2021-03-26 广西交科集团有限公司 Concrete pavement void intelligent detection device and detection method thereof
CN112863517A (en) * 2021-01-19 2021-05-28 苏州大学 Speech recognition method based on perceptual spectrum convergence rate
CN112863517B (en) * 2021-01-19 2023-01-06 苏州大学 Speech Recognition Method Based on Convergence Rate of Perceptual Spectrum

Also Published As

Publication number Publication date
CN104091593B (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN104091593B (en) Speech endpoint detection algorithm adopting perceptual speech spectrum structure boundary parameters
Yegnanarayana et al. Enhancement of reverberant speech using LP residual signal
CN103854662B (en) Adaptive voice detection method based on multiple domain Combined estimator
May et al. Noise-robust speaker recognition combining missing data techniques and universal background modeling
Hu et al. A perceptually motivated approach for speech enhancement
Verteletskaya et al. Noise reduction based on modified spectral subtraction method
CN105023572A (en) Noised voice end point robustness detection method
CN106653004B (en) Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
CN105575406A (en) Noise robustness detection method based on likelihood ratio test
Upadhyay et al. An improved multi-band spectral subtraction algorithm for enhancing speech in various noise environments
CN112233657B (en) Speech enhancement method based on low-frequency syllable recognition
Hsu et al. Voice activity detection based on frequency modulation of harmonics
CN109102823B (en) A Speech Enhancement Method Based on Subband Spectral Entropy
Yang et al. Voice activity detection algorithm based on long-term pitch information
Hsu et al. Modulation Wiener filter for improving speech intelligibility
Upadhyay et al. The spectral subtractive-type algorithms for enhancing speech in noisy environments
Gowda et al. AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environments.
Li et al. Robust speech endpoint detection based on improved adaptive band-partitioning spectral entropy
Upadhyay et al. Single-Channel Speech Enhancement Using Critical-Band Rate Scale Based Improved Multi-Band Spectral Subtraction
Kingsbury et al. Improving ASR performance for reverberant speech
Hsieh et al. Histogram equalization of real and imaginary modulation spectra for noise-robust speech recognition.
Jin et al. An improved speech endpoint detection based on spectral subtraction and adaptive sub-band spectral entropy
Wu et al. Robust speech recognition by selecting mel-filter banks
Krishnamoorthy et al. Modified spectral subtraction method for enhancement of noisy speech
Gouda et al. Robust Automatic Speech Recognition system based on using adaptive time-frequency masking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Huang Xujiang

Inventor after: Wu Di

Inventor before: Wu Di

Inventor before: Zhao Heming

Inventor before: Tao Zhi

TR01 Transfer of patent right

Effective date of registration: 20180321

Address after: Room 202, room two, No. 868, West Ring Road, Jiangsu, Jiangsu

Patentee after: Suzhou Cheng Bang Energy Conservation Science & Technology Co., Ltd.

Address before: 215000 Suzhou Industrial Park, Jiangsu Road, No. 199

Patentee before: Soochow University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170215

Termination date: 20180429