CN105957520A - Voice state detection method suitable for echo cancellation system

Info

Publication number
CN105957520A
Authority
CN
China
Prior art keywords
signal
gaussian
speech
voice
block
Prior art date
Legal status
Granted
Application number
CN201610519040.6A
Other languages
Chinese (zh)
Other versions
CN105957520B (en)
Inventor
王珂
明萌
纪红
李曦
张鹤立
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201610519040.6A
Publication of CN105957520A
Application granted
Publication of CN105957520B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G10L15/063 Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L15/08 Speech classification or search
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L21/0208 Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L2021/02082 Noise filtering, the noise being echo or reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention is a voice state detection method suitable for an echo cancellation system, and relates to the technical field of voice interaction over IP networks. Noise training samples and speech training samples are used to construct a support vector machine (SVM) classifier. The signals to be detected are the far-end and near-end signals after blocking. The constructed SVM classifier, which is based on a Gaussian mixture model, performs a VAD decision on the current block of the far-end signal. If the decision is that no far-end speech is present, filter updating and filtering are stopped and the near-end speech signal is output directly. If far-end speech is present, a double-talk decision is made; during double talk, filter coefficient updating is stopped and the near-end signal is filtered; otherwise, the filter coefficients are updated and filtering is performed according to the far-end signal. The invention improves the accuracy of voice activity detection, avoids misjudging the both-ends-silent state as the double-talk state, and prevents erroneous filter updating and filtering when no reference signal is present.

Description

A Voice State Detection Method Suitable for an Echo Cancellation System

Technical Field

The present invention relates to the technical field of voice interaction over IP networks, and in particular to a voice state detection method suitable for an echo cancellation system.

Background Art

Echo cancellation is widely used in IP-network-based voice interaction systems such as teleconferencing systems, in-car Bluetooth systems and IP phones. It removes the acoustic echo that arises when the sound played by the loudspeaker propagates along multiple paths, is picked up by the microphone, and is transmitted back to the far end of the system. The core idea of echo cancellation is to model the echo path with an adaptive filter and to subtract the estimated echo from the signal picked up by the microphone.

Voice state detection plays a vital role in echo cancellation. Before the sound signal enters the filter, the current voice state must be determined, and the working state of the filter is set according to the voice state of the system. Whether the voice state can be judged accurately and quickly has a great influence on the echo cancellation performance.

Existing echo cancellation systems usually apply a DTD (Double-Talk Detection) algorithm directly to decide whether the system is in the double-talk state, and stop updating the filter coefficients during double talk to prevent the filter from diverging under interference from near-end speech. The commonly used Geigel DTD algorithm decides whether near-end speech is present by comparing the amplitudes of the near-end and far-end signals: the system is considered to be in the double-talk state when the ratio ξ(g) of the near-end amplitude to the far-end amplitude exceeds a threshold T, i.e. when:

ξ(g) = |y(k)| / max{|x(k-1)|, ..., |x(k-N)|} > T

near-end speech is considered present and the system is in the double-talk state. Here |y(k)| is the near-end amplitude, and max{|x(k-1)|, ..., |x(k-N)|} is the maximum amplitude of the previous N samples of the far-end signal. The threshold T is determined by the echo-path attenuation and is usually set to 0.5; N is usually equal to the filter length.
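For illustration only (this is background, not the method of the invention), a minimal Python sketch of the Geigel decision described above might look as follows; the threshold value 0.5 and the use of the previous N far-end samples follow the description, while the function name and interface are assumptions:

```python
import numpy as np

def geigel_dtd(y_k, x_history, T=0.5):
    """Geigel double-talk decision for one near-end sample.

    y_k       : current near-end sample y(k)
    x_history : the previous N far-end samples x(k-1), ..., x(k-N)
    T         : threshold determined by the echo-path attenuation (about 0.5)

    Returns True when near-end speech (double talk) is declared.
    """
    xi_g = abs(y_k) / (np.max(np.abs(x_history)) + 1e-12)  # guard against division by zero
    return xi_g > T
```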

However, this method has the following drawbacks:

1. The Geigel algorithm assumes that the near-end speech is much stronger than the far-end echo, which does not always hold in practical echo cancellation scenarios, so it is inaccurate in some cases.

2. Performing DTD directly without first performing VAD (Voice Activity Detection) on the far-end signal may cause the both-ends-silent state to be misjudged as the double-talk state.

3. Filter coefficient updating is stopped only in the double-talk state; continuing to filter and update the coefficients when no far-end speech is present may cause the filter to diverge and erroneously subtract non-existent far-end speech from the near-end signal.

Summary of the Invention

To overcome the above three problems, the present invention proposes a voice state detection method that combines VAD and DTD, and designs a new filtering and updating strategy based on the detection result, so as to improve detection accuracy, avoid misjudgment of the voice state, and prevent erroneous filter updating and filtering.

The voice state detection method for an echo cancellation system provided by the present invention is implemented in the following steps:

Step 1: construct a support vector machine (SVM) classifier from noise training samples and speech training samples.

Feature extraction and Gaussian mixture model (GMM) training are performed separately on the noise training samples and the speech training samples, and the corresponding Gaussian supervectors are constructed. The Gaussian supervectors are used to construct the kernel function of the SVM classifier and the SVM models corresponding to the speech signal and the noise signal; the SVM classifier is then obtained from the constructed kernel function and SVM models.

Step 2: the signals to be detected are the far-end and near-end signals after blocking. The constructed GMM-based SVM classifier performs a VAD decision on the current block of the far-end signal.

Feature extraction and GMM training are performed on the current far-end block to construct its Gaussian supervector, which is fed into the constructed SVM classifier for a decision. If the block is classified as noise, meaning no speech is present, filter updating and filtering are stopped and the near-end speech signal is output directly. Otherwise far-end speech is present, and the double-talk decision of the next step is performed.

Step 3: determine whether the system is in the double-talk state.

The normalized cross-correlation ξ_XECC between the far-end signal and the error signal is computed and compared with the preset threshold T_XECC. When ξ_XECC < T_XECC, near-end speech is present and the system is in the double-talk state: filter coefficient updating is stopped and the near-end signal is filtered. When ξ_XECC ≥ T_XECC, no near-end speech is present, and the filter coefficients are updated and filtering is performed according to the far-end signal.

The advantages and beneficial effects of the present invention are:

(1) A support vector machine algorithm based on a Gaussian mixture model is used for voice activity detection on the far-end signal, which improves the accuracy of voice activity detection and overcomes the inaccuracy of the commonly used energy-based voice activity detection methods under low signal-to-noise-ratio conditions.

(2) Far-end voice activity detection is performed before double-talk detection, and double-talk detection is carried out only when far-end speech is present, which avoids misjudging the both-ends-silent state as the double-talk state. A cross-correlation-based double-talk detection algorithm is adopted, which improves the accuracy of double-talk detection.

(3) Different filtering and updating strategies are adopted according to the voice state of the system. Compared with a traditional echo cancellation system, which stops updating the filter coefficients only during double talk, coefficient updating and filtering are also stopped when no far-end speech is present, which further prevents erroneous filter updating and filtering when no reference signal is available.

Brief Description of the Drawings

Fig. 1 is an overall flow chart of the voice state detection method for an echo cancellation system according to the present invention;

Fig. 2 shows the two PCM streams used in the simulation of the embodiment of the present invention;

Fig. 3 shows the echo cancellation result of the embodiment when only energy-based DTD detection is used;

Fig. 4 shows the echo cancellation result of the embodiment when the method of the present invention is used;

Fig. 5 shows the Sipdroid echo cancellation result of the embodiment using the echo cancellation library before the improvement;

Fig. 6 shows the Sipdroid echo cancellation result of the embodiment using the improved echo cancellation library.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

The method of the present invention performs VAD on the far-end signal before DTD. When VAD detects that no far-end signal is present, filter coefficient updating and filtering are stopped immediately to prevent the filter from diverging and filtering erroneously. DTD is performed only when VAD detects far-end speech, and filter coefficient updating is stopped during double talk. The VAD algorithm used is an SVM (Support Vector Machine) algorithm based on a GMM (Gaussian Mixture Model): the GMM is used to construct a feature supervector, which serves both as the SVM feature input and in the kernel function construction, and its accuracy is higher than that of the commonly used energy-based or correlation-based VAD algorithms. The DTD algorithm used is based on the cross-correlation between the far-end signal and the error signal, whose accuracy is also higher than that of the commonly used energy-based Geigel algorithm. Combining far-end VAD with DTD improves the accuracy of voice state detection, and adopting different filtering strategies in different voice states prevents filter divergence and erroneous filtering, greatly improving the echo cancellation result.
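For orientation, the per-block control flow described above (far-end VAD first, then cross-correlation DTD, then the filtering strategy) can be sketched as follows. This is only an illustrative outline: `vad_is_speech`, `normalized_xcorr` and the `afilter` object with `filter()` and `update()` methods are hypothetical stand-ins for the components detailed in the steps below, and the threshold value 0.95 is an assumed example within the 0.9-1.0 range mentioned later.

```python
def process_block(far_block, near_block, afilter, vad_is_speech,
                  normalized_xcorr, t_xecc=0.95):
    """One block of the combined far-end VAD + DTD echo-cancellation strategy."""
    # Far-end VAD: with no far-end speech there is no echo, so do not adapt or filter.
    if not vad_is_speech(far_block):
        return near_block

    # Error signal = near-end signal minus the echo estimated by the adaptive filter.
    error = near_block - afilter.filter(far_block)

    # DTD: low cross-correlation between far-end and error means near-end speech is present.
    if normalized_xcorr(far_block, error) < t_xecc:
        return error                      # double talk: filter with frozen coefficients

    afilter.update(far_block, error)      # far-end single talk: adapt the filter
    return error
```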

The steps of the voice state detection method for an echo cancellation system of the present invention are described with reference to Fig. 1.

Step 1: construct the SVM classifier from noise training samples and speech training samples, comprising steps S101 to S103.

Step S101: extract feature values from the noise training samples and the speech training samples. The features used here are Mel-frequency cepstral coefficients (MFCC). The MFCC extraction procedure is as follows: the signal is pre-emphasised, divided into blocks and windowed; each windowed block is passed through a fast Fourier transform (FFT) to obtain its spectrum; the spectrum of each block is passed through a Mel-scale filter bank consisting of K triangular band-pass filters, numbered 0 to K-1; the output of each band is converted to a logarithm to obtain the log energy, giving K log-spectrum values per block. K is a positive integer, typically between 20 and 30. Finally, the K log-spectrum values are cosine-transformed to obtain the Mel cepstral coefficients. The formula for transforming the log spectrum to the cepstral domain by a discrete cosine transform is:

m_i(l) = Σ_{k=0}^{K-1} S_i(k)·cos(πl(k + 1/2)/K),  0 ≤ k < K, 0 ≤ l < L    (1)

where S_i(k) is the log spectrum obtained when the i-th block passes through the band-pass filter numbered k, K is the number of Mel band-pass filters, m_i(l) is the l-th order MFCC parameter of the i-th block, L is the total number of MFCC orders extracted, and i (a positive integer) in formula (1) indexes the block.
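As a rough illustration of formula (1), the following numpy-only sketch computes the MFCC of one block. The frame handling is simplified, and the parameter values (K = 26 filters, L = 13 coefficients, 256-point FFT, 0.97 pre-emphasis, Hamming window) are common choices assumed for the example, not values fixed by the invention; a practical system would normally use an existing feature-extraction library instead.

```python
import numpy as np

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_block(block, fs=8000, K=26, L=13, nfft=256):
    """MFCC of one signal block, following the procedure of step S101 and formula (1)."""
    block = np.asarray(block, dtype=float)
    block = np.append(block[0], block[1:] - 0.97 * block[:-1])            # pre-emphasis
    spec = np.abs(np.fft.rfft(block * np.hamming(len(block)), nfft)) ** 2

    # K triangular Mel-scale band-pass filters between 0 Hz and fs/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), K + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((K, nfft // 2 + 1))
    for k in range(K):
        left, centre, right = bins[k], bins[k + 1], bins[k + 2]
        fbank[k, left:centre] = (np.arange(left, centre) - left) / max(centre - left, 1)
        fbank[k, centre:right] = (right - np.arange(centre, right)) / max(right - centre, 1)

    S = np.log(fbank @ spec + 1e-12)          # the K log-spectrum values S_i(k)
    l_idx = np.arange(L)[:, None]
    k_idx = np.arange(K)[None, :]
    return (S * np.cos(np.pi * l_idx * (k_idx + 0.5) / K)).sum(axis=1)    # formula (1)
```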

Step S102: generate the Gaussian supervectors corresponding to the noise training samples and the speech training samples.

The MFCC parameters of the noise training samples and of the speech training samples are used to build the Gaussian mixture models of the noise signal and of the speech signal, respectively. A GMM is essentially a multidimensional probability density function; an N-order Gaussian mixture model g(x) describes the distribution of the frame features in feature space as a linear combination of N single Gaussian distributions. For a given block, g(x) is expressed as:

g(x) = Σ_{i=1}^{N} w_i·p_i(x)    (2)

where x is the L-dimensional feature vector formed by the MFCC parameters of the training-sample block, N is the order of the Gaussian mixture model, p_i(x) is the i-th Gaussian component of the model, and w_i is the weighting factor of component p_i(x).

p_i(x) is expressed as follows:

p_i(x) = (1 / ((2π)^{L/2} |Σ_i|^{1/2}))·exp{ -(x - μ_i)^T Σ_i^{-1} (x - μ_i) / 2 }    (3)

where Σ_i is the covariance matrix of the i-th Gaussian component and μ_i is its mean vector. The parameter set λ of the GMM can therefore be expressed as:

λ = (w_i, μ_i, Σ_i),  i = 1, 2, ..., N    (4)

and the corresponding Gaussian mixture model g(x) can be expressed as:

g(x) = Σ_{i=1}^{N} w_i·N(x; μ_i, Σ_i)    (5)

where N(·) denotes the Gaussian probability density function.

Building the GMM is in effect estimating its parameters through training. The expectation-maximization (EM) algorithm can be used to update the model parameters. The algorithm has two main steps: the expectation (E) step and the maximization (M) step. The E step uses the current parameter set to compute the expected value of the likelihood function of the complete data, and the M step obtains new parameters by maximizing that expectation. The E and M steps are iterated until convergence. Finally, the GMMs of speech and of noise are obtained, denoted g(s) and g(n), where s denotes the speech signal and n denotes the noise signal.
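A minimal sketch of this training step, assuming that scikit-learn's GaussianMixture (which performs exactly this EM iteration) may be used; the model order N = 8 and the diagonal covariance type are assumptions of the example:

```python
from sklearn.mixture import GaussianMixture

def train_gmm(mfcc_frames, n_components=8):
    """Fit an N-order GMM to a (frames x L) matrix of MFCC vectors via EM."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",   # diagonal covariances keep the supervector simple
                          max_iter=200, random_state=0)
    gmm.fit(mfcc_frames)
    return gmm

# g_s = train_gmm(speech_mfcc)   # g(s): GMM of the speech training samples
# g_n = train_gmm(noise_mfcc)    # g(n): GMM of the noise training samples
```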

The established Gaussian mixture models are then used to construct the Gaussian supervectors. A Gaussian supervector is built from the parameters of a Gaussian mixture model; the GMM supervectors m_s and m_n of speech and noise can be expressed as:

m_s = ((w_1 Σ_1^{-1/2} μ_1^s)^T, (w_2 Σ_2^{-1/2} μ_2^s)^T, ..., (w_N Σ_N^{-1/2} μ_N^s)^T)    (6)

m_n = ((w_1 Σ_1^{-1/2} μ_1^n)^T, (w_2 Σ_2^{-1/2} μ_2^n)^T, ..., (w_N Σ_N^{-1/2} μ_N^n)^T)    (7)

where μ_i^s denotes the mean vectors of the Gaussian components of g(s), and μ_i^n denotes the mean vectors of the Gaussian components of g(n).
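Following formulas (6) and (7), one possible way of assembling the Gaussian supervector from a fitted GaussianMixture with diagonal covariances (an assumption carried over from the previous sketch) is:

```python
import numpy as np

def gaussian_supervector(gmm):
    """Stack w_i * Sigma_i^(-1/2) * mu_i over all N components, as in formulas (6)/(7)."""
    w = gmm.weights_            # shape (N,)
    mu = gmm.means_             # shape (N, L)
    var = gmm.covariances_      # shape (N, L): diagonal entries of Sigma_i
    return (w[:, None] * mu / np.sqrt(var)).ravel()   # supervector of length N*L
```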

Step S103: construct the SVM classifier from the constructed Gaussian supervectors. The Gaussian supervectors m_n and m_s corresponding to the noise signal and the speech signal are used to build the SVM models of the noise signal and the speech signal, and to construct the K-L kernel function, which is built from the K-L divergence between the two GMM probability distributions.

The kernel function K(n, s) constructed from the GMM supervectors m_n and m_s of noise and speech is:

K(n, s) = Σ_{i=1}^{N} (w_i Σ_i^{-1/2} μ_i^n)^T (w_i Σ_i^{-1/2} μ_i^s)    (8)

Once the kernel function, the SVM model of the speech signal and the SVM model of the noise signal are determined, the SVM classifier is obtained.
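By the definitions in formulas (6) to (8), K(n, s) is the inner product of the two supervectors, so one way to realise the classifier is a linear-kernel SVM operating on supervectors; the sketch below assumes scikit-learn's SVC and labelled collections of supervectors for the two classes:

```python
import numpy as np
from sklearn.svm import SVC

def train_svm_classifier(speech_supervectors, noise_supervectors):
    """Train the speech/noise SVM on GMM supervectors (kernel of formula (8))."""
    X = np.vstack([speech_supervectors, noise_supervectors])
    y = np.hstack([np.ones(len(speech_supervectors)),     # 1 = speech
                   np.zeros(len(noise_supervectors))])    # 0 = noise
    clf = SVC(kernel="linear")   # linear kernel on supervectors = K(n, s) of formula (8)
    clf.fit(X, y)
    return clf

# VAD decision for one far-end block (step 2 below), assuming the helpers sketched above:
#   sv = gaussian_supervector(train_gmm(block_mfcc_frames))
#   far_end_has_speech = clf.predict(sv.reshape(1, -1))[0] == 1
```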

Step 2: use the constructed GMM-based SVM classifier to make a VAD decision on the current far-end block. The signals to be detected that are input to the SVM classifier are the far-end and near-end signals after blocking. They are first converted to the frequency domain by a Fourier transform, and the block features (MFCC, normalized cross-correlation, etc.) are then computed from the spectra. This stage comprises steps S201 to S203.

Step S201: extract the MFCC parameters of the current far-end block. The extraction procedure is the same as in step S101; the MFCC parameters of the block are finally obtained from formula (1).

Step S202: generate the Gaussian supervector of the current far-end block. A Gaussian mixture model is built from the MFCC parameters of the block, and the Gaussian supervector of the block is then constructed from that model, as in step S102 and formulas (6) and (7).

Step S203: feed the Gaussian supervector of the current far-end block into the constructed SVM classifier and perform speech/noise classification with the GMM-based SVM algorithm to obtain the VAD decision for the far-end signal. If the block is classified as noise, i.e. no speech is present, filter updating and filtering are stopped and the near-end speech signal is output directly. If it is classified as speech, far-end speech is present and the double-talk decision of the next step is performed.

Step 3: determine whether the system is in the double-talk state.

Step S301: compute the error signal.

The adaptive filter coefficients model the echo path, so convolving the current far-end block with the adaptive filter coefficients yields the estimated echo signal x^T(n)w(n); the error signal e(n) is the difference between the near-end signal d(n) of the block and the estimated echo x^T(n)w(n).

The adaptive filter coefficients are continuously updated from the error signal and the far-end signal according to an adaptive algorithm. The update formula of the commonly used LMS algorithm is:

w(n+1)=w(n)+2μe(n)x(n) (9)w(n+1)=w(n)+2μe(n)x(n) (9)

where μ is the step size, w(n) is the filter weight vector, e(n) is the error signal, x(n) is the far-end signal, and n denotes the n-th time instant (sample).
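A minimal sketch of the estimated echo, the error signal and the LMS update of formula (9); the step-size value 0.01 and the sample-by-sample interface are assumptions of the example:

```python
import numpy as np

def lms_step(w, x_vec, d_n, mu=0.01, adapt=True):
    """One LMS iteration: estimate the echo, form the error, optionally update w.

    w     : filter weight vector w(n)
    x_vec : the most recent far-end samples (same length as w)
    d_n   : current near-end (microphone) sample d(n)
    adapt : set False to freeze the coefficients (double talk or far-end silence)
    """
    echo_est = np.dot(w, x_vec)          # x^T(n) w(n), the estimated echo
    e_n = d_n - echo_est                 # error signal e(n)
    if adapt:
        w = w + 2 * mu * e_n * x_vec     # formula (9)
    return w, e_n
```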

Step S302: compute the normalized cross-correlation between the far-end signal and the error signal. Since cross-correlation in the time domain corresponds to point-wise multiplication of the two spectra in the frequency domain, the normalized cross-correlation can be obtained directly from the far-end spectrum X(k) and the error spectrum E(k), with low computational complexity. In the frequency domain the normalized cross-correlation is computed as:

ξ_XECC = max_k { E[X(k)E(k)] / √(E[X(k)²]·E[E(k)²]) }    (10)

where ξ_XECC denotes the normalized cross-correlation between the far-end signal and the error signal, and k denotes the frequency bin.
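In formula (10) the expectation E[·] has to be estimated in practice; a common choice, assumed in the sketch below, is recursive smoothing of the cross- and auto-spectra over successive blocks (the smoothing factor 0.9 is likewise an assumption):

```python
import numpy as np

class XeccDetector:
    """Normalized cross-correlation xi_XECC of formula (10), computed in the frequency domain."""

    def __init__(self, nbins, alpha=0.9):
        self.alpha = alpha
        self.sxe = np.zeros(nbins)   # smoothed estimate of E[X(k) E(k)]
        self.sxx = np.zeros(nbins)   # smoothed estimate of E[X(k)^2]
        self.see = np.zeros(nbins)   # smoothed estimate of E[E(k)^2]

    def update(self, X_mag, E_mag):
        """X_mag, E_mag: magnitude spectra of the current far-end and error blocks."""
        a = self.alpha
        self.sxe = a * self.sxe + (1 - a) * X_mag * E_mag
        self.sxx = a * self.sxx + (1 - a) * X_mag ** 2
        self.see = a * self.see + (1 - a) * E_mag ** 2
        return np.max(self.sxe / (np.sqrt(self.sxx * self.see) + 1e-12))
```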

Step S303: DTD decision. The normalized cross-correlation ξ_XECC between the far-end signal and the error signal is compared with the normalized cross-correlation threshold. When there is no near-end speech, ξ_XECC should equal 1; when near-end speech is present, ξ_XECC is smaller than 1. A constant T_XECC slightly smaller than 1 can therefore be set as the threshold; T_XECC is usually between 0.9 and 1, and the threshold is updated in real time according to the detection results, with the update rule chosen according to the actual situation. A good threshold should keep both the false-alarm probability and the miss probability relatively small. For example, one may first arbitrarily choose a constant slightly smaller than 1, then set the near-end speech to zero, compute the false-alarm and miss probabilities, and adjust T_XECC within a certain range until both probabilities are small.

When the normalized cross-correlation falls below the threshold, i.e.:

ξ_XECC < T_XECC    (11)

the system is in the double-talk state: filter coefficient updating is stopped and the near-end signal is filtered directly with the existing filter coefficients. Otherwise there is no near-end speech and only far-end speech is present, in which case both filter coefficient updating and filtering are performed.
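One simple way of carrying out the threshold adjustment described in step S303 is to scan candidate values of T_XECC and keep the one with the smallest combined error; the sketch below is only an illustration and assumes that ξ_XECC measurements are available both with the near-end speech set to zero and during genuine double talk:

```python
def tune_threshold(xecc_single_talk, xecc_double_talk, candidates=None):
    """Pick T_XECC so that the false-alarm and miss probabilities are both small.

    xecc_single_talk : xi_XECC values measured with the near-end speech set to zero
    xecc_double_talk : xi_XECC values measured during genuine double talk
    """
    if candidates is None:
        candidates = [0.90 + 0.005 * i for i in range(20)]   # scan the range 0.90 .. 0.995
    best_t, best_err = None, float("inf")
    for t in candidates:
        false_alarm = sum(v < t for v in xecc_single_talk) / len(xecc_single_talk)
        miss = sum(v >= t for v in xecc_double_talk) / len(xecc_double_talk)
        if false_alarm + miss < best_err:
            best_t, best_err = t, false_alarm + miss
    return best_t   # at run time: declare double talk iff xi_XECC < best_t
```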

The voice state detection method proposed by the present invention is applied to an actual echo cancellation system with two terminals, and the actual call quality is verified with the VoIP software Sipdroid.

First, the voice state detection method combining VAD and DTD proposed by the present invention is simulated in MATLAB. The speech signals used in the simulation are a 30-second far-end speech PCM (Pulse Code Modulation) stream and a corresponding near-end speech PCM stream, both sampled at 8000 Hz. In the echo cancellation system, the filter length is set to 128, the adaptive filtering algorithm is BFDAF (the frequency-domain NLMS algorithm), and the voice state detection algorithm is the method proposed by the present invention.

Fig. 2 shows the two PCM streams used in the simulation: from top to bottom, the far-end signal waveform and the near-end signal waveform. The horizontal axis is time in seconds and the vertical axis is amplitude. With the original voice state detection method, i.e. only energy-based DTD detection, the echo cancellation result is shown in Fig. 3. It can be seen that, without the VAD improvement, echo cancellation in the first half is fairly good although a small amount of residual echo remains; in the second half the result is less satisfactory, as too much of the original speech is removed and the signal after echo cancellation is noticeably distorted.

With the voice state detection method proposed by the present invention, the echo cancellation result is shown in Fig. 4. Comparing the two PCM streams obtained after echo cancellation before and after the improvement, the echo cancellation result improves clearly once the voice state detection method is improved: the residual echo is removed more thoroughly and the near-end speech shows almost no distortion.

To further verify the effect of the proposed voice state detection method in an actual echo cancellation system, the method is implemented as a C program and tested with the voice communication software Sipdroid.

The parts of the WebRTC echo cancellation library that perform VAD and DTD are modified according to the steps of the voice state detection method of the present invention, and the modified library is then called from Sipdroid. Actual two-party calls are made and recorded with Sipdroid in different environments, and the speech PCM streams before and after echo cancellation are saved for analysis of the echo cancellation effect.

To make observation and analysis of the extracted speech streams convenient and clear, in each test the two callers counted from 1 to 10 in turn. In different environments, multiple call tests were performed on the Sipdroid versions before and after the improvement for comparison.

First, multiple call tests were performed on Sipdroid with the original (unimproved) echo cancellation library, and the far-end, near-end and echo-cancelled PCM streams were extracted. The test result is shown in Fig. 5, where only the counting part of the PCM streams is shown. The first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. The echo cancellation result is not ideal: a little residual echo remains in the counting part, as marked by the dashed box. Most other test results are similar.

Then the echo cancellation effect of Sipdroid with the improved echo cancellation library was tested in the same way over multiple calls, and the far-end, near-end and echo-cancelled PCM streams were extracted. Fig. 6 shows a representative test result. As in Fig. 5, the first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. With the improved voice state detection method of the present invention, the echo cancellation result is close to ideal: the residual echo in the counting part is removed almost completely, as marked by the dashed box, while the original speech is preserved. Repeated tests show that the echo cancellation effect is affected to some extent by the environment and its stability still needs further improvement, but in most cases the echo cancellation result obtained with the proposed voice state detection method is clearly better than before the improvement.

Claims (5)

1. A voice state detection method suitable for an echo cancellation system, characterized in that it is implemented in the following steps:

Step 1: construct a support vector machine (SVM) classifier from noise training samples and speech training samples;

feature extraction and Gaussian mixture model (GMM) training are performed separately on the noise training samples and the speech training samples, and the corresponding Gaussian supervectors are constructed; the Gaussian supervectors are then used to construct the kernel function of the SVM classifier and the SVM models corresponding to the speech signal and the noise signal; the SVM classifier is obtained from the constructed kernel function and SVM models;

Step 2: the signals to be detected are the far-end and near-end signals after blocking; the constructed SVM classifier performs a VAD decision on the current far-end block, where VAD denotes voice activity detection;

feature extraction and GMM training are performed on the current far-end block to construct its Gaussian supervector, which is then fed into the constructed SVM classifier for a decision; if the decision result is noise, meaning no speech is present, filter updating and filtering are stopped and the near-end speech signal is output directly; otherwise far-end speech is present and the double-talk decision of the next step is performed;

Step 3: determine whether the system is in the double-talk state;

the normalized cross-correlation ξ_XECC between the far-end signal and the error signal is computed and compared with the preset threshold T_XECC; when ξ_XECC < T_XECC, the system is in the double-talk state, filter coefficient updating is stopped, and the near-end signal is filtered; otherwise there is no near-end speech, and the filter coefficients are updated and filtering is performed according to the far-end signal.

2. The voice state detection method suitable for an echo cancellation system according to claim 1, characterized in that the first step of constructing the SVM classifier comprises the following steps:

Step S101: extract feature values from the noise training samples and the speech training samples; the features used are Mel cepstral coefficients (MFCC);

the MFCC extraction procedure is: the signal is pre-emphasised, divided into blocks and windowed; each windowed block is passed through a fast Fourier transform (FFT) to obtain its spectrum; the spectrum of each block is passed through a Mel-scale filter bank consisting of K triangular band-pass filters, and the logarithm of each band output is taken to obtain the log spectrum; with the K band-pass filters numbered 0 to K-1, the log spectrum obtained when the i-th block passes through the band-pass filter numbered k is S_i(k), and the l-th order MFCC parameter m_i(l) of the i-th block is:

m_i(l) = Σ_{k=0}^{K-1} S_i(k)·cos(πl(k + 1/2)/K),  0 ≤ k < K, 0 ≤ l < L

where L is the total number of MFCC orders extracted;

Step S102: generate the Gaussian supervectors of the noise training samples and the speech training samples;

the MFCC parameters of the noise training samples and of the speech training samples are used to build the Gaussian mixture models of the noise signal and of the speech signal, respectively;

for a given block, the N-order Gaussian mixture model g(x) is expressed as:

g(x) = Σ_{i=1}^{N} w_i·p_i(x)

where x is the L-dimensional feature vector formed by the MFCC parameters of the training-sample block, p_i(x) is the i-th Gaussian component of the model, w_i is the weighting factor of the i-th Gaussian component, Σ_i is the covariance matrix of the i-th Gaussian component, and μ_i is the mean vector of the i-th Gaussian component;

the Gaussian mixture model g(x) is further expressed as g(x) = Σ_{i=1}^{N} w_i·N(x; μ_i, Σ_i), where N(·) denotes the Gaussian probability density function;

the expectation-maximization algorithm is used to update the parameters of the Gaussian mixture model; the Gaussian mixture model finally obtained for the speech training samples is g(s), in which the mean vector of each Gaussian component is μ_i^s and s denotes the speech signal; the Gaussian mixture model finally obtained for the noise training samples is g(n), in which the mean vector of each Gaussian component is μ_i^n and n denotes the noise signal; the established Gaussian mixture models are used to construct the Gaussian supervectors m_s and m_n of the speech training samples and the noise training samples:

m_s = ((w_1 Σ_1^{-1/2} μ_1^s)^T, (w_2 Σ_2^{-1/2} μ_2^s)^T, ..., (w_N Σ_N^{-1/2} μ_N^s)^T)

m_n = ((w_1 Σ_1^{-1/2} μ_1^n)^T, (w_2 Σ_2^{-1/2} μ_2^n)^T, ..., (w_N Σ_N^{-1/2} μ_N^n)^T)

Step S103: construct the SVM classifier from the constructed Gaussian supervectors;

the Gaussian supervectors m_n and m_s are used to build the SVM models corresponding to the noise signal and the speech signal;

the kernel function K(n, s) is constructed from the Gaussian supervectors m_n and m_s as:

K(n, s) = Σ_{i=1}^{N} (w_i Σ_i^{-1/2} μ_i^n)^T (w_i Σ_i^{-1/2} μ_i^s)

the kernel function, the SVM model of the speech signal and the SVM model of the noise signal are determined, and the SVM classifier is obtained.

3. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in the third step, the error signal is computed as follows: the current far-end block is convolved with the adaptive filter coefficients to obtain the estimated echo signal, and the error signal is the difference between the near-end signal of the block and the estimated echo signal.

4. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in the third step, the normalized cross-correlation ξ_XECC between the far-end signal and the error signal is computed according to the following formula:

ξ_XECC = max_k { E[X(k)E(k)] / √(E[X(k)²]·E[E(k)²]) }

where k denotes the frequency bin, X(k) is the far-end signal spectrum, and E(k) is the error signal spectrum.

5. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in the third step, the threshold T_XECC is set to a value between 0.9 and 1 and is updated in real time according to the decision results.
CN201610519040.6A 2016-07-04 2016-07-04 A Speech State Detection Method Applicable to Echo Cancellation System Active CN105957520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610519040.6A CN105957520B (en) 2016-07-04 2016-07-04 A Speech State Detection Method Applicable to Echo Cancellation System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610519040.6A CN105957520B (en) 2016-07-04 2016-07-04 A Speech State Detection Method Applicable to Echo Cancellation System

Publications (2)

Publication Number Publication Date
CN105957520A true CN105957520A (en) 2016-09-21
CN105957520B CN105957520B (en) 2019-10-11

Family

ID=56903377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610519040.6A Active CN105957520B (en) 2016-07-04 2016-07-04 A Speech State Detection Method Applicable to Echo Cancellation System

Country Status (1)

Country Link
CN (1) CN105957520B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012009047A1 (en) * 2010-07-12 2012-01-19 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
WO2013040414A1 (en) * 2011-09-16 2013-03-21 Qualcomm Incorporated Mobile device context information using speech detection
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103151039A (en) * 2013-02-07 2013-06-12 中国科学院自动化研究所 Speaker age identification method based on SVM (Support Vector Machine)
CN105657110A (en) * 2016-02-26 2016-06-08 深圳Tcl数字技术有限公司 Voice communication echo cancellation method and device

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448661B (en) * 2016-09-23 2019-07-16 South China University of Technology Audio type detection method based on two-level modeling of clean speech and background noise
CN106448661A (en) * 2016-09-23 2017-02-22 华南理工大学 Audio type detection method based on pure voice and background noise two-level modeling
CN108429994A (en) * 2017-02-15 2018-08-21 阿里巴巴集团控股有限公司 Audio identification, echo cancel method, device and equipment
CN108429994B (en) * 2017-02-15 2020-10-09 阿里巴巴集团控股有限公司 Audio identification and echo cancellation method, device and equipment
CN109215672A (en) * 2017-07-05 2019-01-15 上海谦问万答吧云计算科技有限公司 A kind of processing method of acoustic information, device and equipment
CN109309764B (en) * 2017-07-28 2021-09-03 北京搜狗科技发展有限公司 Audio data processing method and device, electronic equipment and storage medium
CN109309764A (en) * 2017-07-28 2019-02-05 北京搜狗科技发展有限公司 Audio data processing method, device, electronic equipment and storage medium
US11151976B2 (en) 2017-10-19 2021-10-19 Zhejiang Dahua Technology Co., Ltd. Methods and systems for operating a signal filter device
WO2019076328A1 (en) * 2017-10-19 2019-04-25 Zhejiang Dahua Technology Co., Ltd. Methods and systems for operating a signal filter device
CN107888792A (en) * 2017-10-19 2018-04-06 浙江大华技术股份有限公司 A kind of echo cancel method, apparatus and system
CN107888792B (en) * 2017-10-19 2019-09-17 浙江大华技术股份有限公司 A kind of echo cancel method, apparatus and system
CN109068012A (en) * 2018-07-06 2018-12-21 南京时保联信息科技有限公司 A kind of double talk detection method for audio conference system
CN109348072B (en) * 2018-08-30 2021-03-02 湖北工业大学 Double-end call detection method applied to echo cancellation system
CN109348072A (en) * 2018-08-30 2019-02-15 湖北工业大学 A double-ended talk detection method applied to echo cancellation system
CN109473123A (en) * 2018-12-05 2019-03-15 百度在线网络技术(北京)有限公司 Voice activity detection method and device
US11127416B2 (en) 2018-12-05 2021-09-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for voice activity detection
CN109379501B (en) * 2018-12-17 2021-12-21 嘉楠明芯(北京)科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109448748A (en) * 2018-12-17 2019-03-08 杭州嘉楠耘智信息科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109493878B (en) * 2018-12-17 2021-08-31 嘉楠明芯(北京)科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109448748B (en) * 2018-12-17 2021-08-03 嘉楠明芯(北京)科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109379501A (en) * 2018-12-17 2019-02-22 杭州嘉楠耘智信息科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109493878A (en) * 2018-12-17 2019-03-19 杭州嘉楠耘智信息科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109547655A (en) * 2018-12-30 2019-03-29 广东大仓机器人科技有限公司 Echo cancellation processing method for network voice call
CN111294473B (en) * 2019-01-28 2022-01-04 展讯通信(上海)有限公司 Signal processing method and device
CN111294473A (en) * 2019-01-28 2020-06-16 展讯通信(上海)有限公司 Signal processing method and device
CN112133324A (en) * 2019-06-06 2020-12-25 北京京东尚科信息技术有限公司 Call state detection method, device, computer system and medium
CN110246516A (en) * 2019-07-25 2019-09-17 福建师范大学福清分校 The processing method of small space echo signal in a kind of voice communication
CN112614500A (en) * 2019-09-18 2021-04-06 北京声智科技有限公司 Echo cancellation method, device, equipment and computer storage medium
CN110944089A (en) * 2019-11-04 2020-03-31 中移(杭州)信息技术有限公司 Double-talk detection method and electronic equipment
CN111049848A (en) * 2019-12-23 2020-04-21 腾讯科技(深圳)有限公司 Call method, device, system, server and storage medium
US11842751B2 (en) 2019-12-23 2023-12-12 Tencent Technology (Shenzhen) Company Limited Call method, apparatus, and system, server, and storage medium
CN111049848B (en) * 2019-12-23 2021-11-23 腾讯科技(深圳)有限公司 Call method, device, system, server and storage medium
CN111048118B (en) * 2019-12-24 2022-07-26 大众问问(北京)信息科技有限公司 Voice signal processing method and device and terminal
CN111048118A (en) * 2019-12-24 2020-04-21 大众问问(北京)信息科技有限公司 Voice signal processing method and device and terminal
CN111161748A (en) * 2020-02-20 2020-05-15 百度在线网络技术(北京)有限公司 Double-talk state detection method and device and electronic equipment
US11804235B2 (en) 2020-02-20 2023-10-31 Baidu Online Network Technology (Beijing) Co., Ltd. Double-talk state detection method and device, and electronic device
CN114242106A (en) * 2020-09-09 2022-03-25 中车株洲电力机车研究所有限公司 A voice processing method and device thereof
CN112637833B (en) * 2020-12-21 2022-10-11 新疆品宣生物科技有限责任公司 Communication terminal information detection method and equipment
CN112637833A (en) * 2020-12-21 2021-04-09 新疆品宣生物科技有限责任公司 Communication terminal information detection method and device
CN113223546A (en) * 2020-12-28 2021-08-06 南京愔宜智能科技有限公司 Audio and video conference system and echo cancellation device for same
CN113241085A (en) * 2021-04-29 2021-08-10 北京梧桐车联科技有限责任公司 Echo cancellation method, device, equipment and readable storage medium
CN113241085B (en) * 2021-04-29 2022-07-22 北京梧桐车联科技有限责任公司 Echo cancellation method, device, equipment and readable storage medium
CN115273909A (en) * 2022-07-28 2022-11-01 歌尔科技有限公司 Voice activity detection method, device, equipment and computer readable storage medium
CN115273909B (en) * 2022-07-28 2024-07-30 歌尔科技有限公司 Voice activity detection method, device, equipment and computer readable storage medium
CN117437929A (en) * 2023-12-21 2024-01-23 睿云联(厦门)网络通讯技术有限公司 Real-time echo cancellation method based on neural network
CN117437929B (en) * 2023-12-21 2024-03-08 睿云联(厦门)网络通讯技术有限公司 Real-time echo cancellation method based on neural network
CN118645113A (en) * 2024-08-14 2024-09-13 腾讯科技(深圳)有限公司 A method, device, equipment, medium and product for processing speech signals
CN118645113B (en) * 2024-08-14 2024-10-29 腾讯科技(深圳)有限公司 A method, device, equipment, medium and product for processing speech signals

Also Published As

Publication number Publication date
CN105957520B (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN105957520B (en) A Speech State Detection Method Applicable to Echo Cancellation System
US11017791B2 (en) Deep neural network-based method and apparatus for combining noise and echo removal
CN112735456A (en) Speech enhancement method based on DNN-CLSTM network
Carbajal et al. Multiple-input neural network-based residual echo suppression
US9633671B2 (en) Voice quality enhancement techniques, speech recognition techniques, and related systems
CN109841206A (en) A kind of echo cancel method based on deep learning
Zhao et al. Late reverberation suppression using recurrent neural networks with long short-term memory
CN105469785A (en) Voice activity detection method in communication-terminal double-microphone denoising system and apparatus thereof
CN112687276B (en) Audio signal processing method and device and storage medium
CN106033673B (en) A kind of near-end voice signals detection method and device
Hou et al. Domain adversarial training for speech enhancement
Zhang et al. A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning.
CN114530160A (en) Model training method, echo cancellation method, system, device and storage medium
CN111883154A (en) Echo cancellation method and device, computer-readable storage medium, and electronic device
CN106161820B (en) An Inter-Channel Decorrelation Method for Stereo Acoustic Echo Cancellation
CN105957536A (en) Frequency domain echo eliminating method based on channel aggregation degree
Zhang et al. LCSM: A lightweight complex spectral mapping framework for stereophonic acoustic echo cancellation
KR100844176B1 (en) Statistical Model-based Residual Echo Cancellation
CN110148421B (en) Residual echo detection method, terminal and device
CN112133324A (en) Call state detection method, device, computer system and medium
Peer et al. Reverberation matching for speaker recognition
Schmidt et al. Reduction of non-stationary noise using a non-negative latent variable decomposition
CN116453532A (en) Echo cancellation method of acoustic echo
Bavkar et al. PCA based single channel speech enhancement method for highly noisy environment
CN113345457B (en) Acoustic echo cancellation adaptive filter based on Bayes theory and filtering method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant