CN105957520A - Voice state detection method suitable for echo cancellation system

Voice state detection method suitable for echo cancellation system

Info

Publication number: CN105957520A (application CN201610519040.6A)
Authority: CN (China)
Prior art keywords: voice, signal, block, training sample, Gaussian
Legal status: Granted; active
Other languages: Chinese (zh)
Other versions: CN105957520B (en)
Inventors: 王珂 (Wang Ke), 明萌 (Ming Meng), 纪红 (Ji Hong), 李曦 (Li Xi), 张鹤立 (Zhang Heli)
Assignee (current and original): Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority/filing date: 2016-07-04
Publication dates: 2016-09-21 (CN105957520A), 2019-10-11 (CN105957520B, grant)

Classifications

    • G10L 15/063: Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/08: Speech recognition; speech classification or search
    • G10L 15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise or stress-induced speech
    • G10L 21/0208: Speech enhancement; noise filtering
    • G10L 25/84: Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L 2021/02082: Noise filtering where the noise is echo or reverberation of the speech

Abstract

The invention relates to a voice state detection method suitable for an echo cancellation system, in the field of voice interaction over IP networks. The method comprises the steps of: constructing a support vector machine (SVM) classifier from noise training samples and voice training samples, the signals to be detected being the block-partitioned far-end and near-end signals; performing a VAD decision on each far-end block with the constructed SVM classifier, which is based on a Gaussian mixture model; if the decision is that no far-end voice is present, stopping filter updating and filtering and outputting the near-end signal directly; if far-end voice is present, performing a double-talk decision; during double-talk, stopping the update of the filter coefficients while still filtering the near-end signal; otherwise updating the filter coefficients and filtering according to the far-end signal. The method improves the accuracy of voice activity detection, prevents the both-ends-silent state from being misjudged as a double-talk state, and prevents erroneous filter updating and filtering when no reference signal is present.

Description

A voice state detection method suitable for an echo cancellation system
Technical field
The present invention relates to the technical field of voice interaction over IP networks, and in particular to a voice state detection method suitable for an echo cancellation system.
Background art
Echo cancellation is widely used in IP-based voice interaction systems such as teleconferencing systems, in-vehicle Bluetooth systems and IP telephony. Its purpose is to eliminate the acoustic echo formed when the sound played by the loudspeaker is picked up by the microphone after propagating along multiple paths and is then transmitted back to the far end of the system. The core idea of echo cancellation is to model the echo path with an adaptive filter and to subtract the estimated echo signal from the signal picked up by the microphone.
Voice state detection plays a vital role in echo cancellation. Before the acoustic signal enters the filter, the current voice state must be determined, and the working mode of the filter is set according to the voice state of the system. Whether the voice state can be judged accurately and quickly has a great influence on the echo cancellation performance.
Existing echo cancellation systems usually apply a DTD (Double Talk Detection) algorithm directly to decide whether the system is in the double-talk state, and stop updating the filter coefficients in that state, so that the filter does not diverge under the interference of near-end speech. The commonly used Geigel DTD algorithm decides whether near-end speech is present from the amplitudes of the near-end and far-end signals: the system is considered to be in the double-talk state when the ratio \xi^{(g)} of the near-end amplitude to the far-end amplitude exceeds a threshold T, i.e. when

\xi^{(g)} = \frac{|y(k)|}{\max\{|x(k-1)|, \ldots, |x(k-N)|\}} > T

near-end speech is considered present and the system is judged to be in the double-talk state. Here |y(k)| is the near-end signal amplitude and \max\{|x(k-1)|, \ldots, |x(k-N)|\} is the maximum amplitude of the previous N samples of the far-end signal. The threshold T is determined by the echo-path attenuation and is typically set to 0.5; N is usually taken equal to the filter length.
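For illustration only, a minimal sketch of the Geigel decision described above, in Python with NumPy; the default T = 0.5 and the use of the previous N far-end samples follow the description above, while the function name and everything else are assumptions of this sketch rather than part of the patent:

```python
import numpy as np

def geigel_dtd(y_k, x_recent, T=0.5):
    """Geigel double-talk decision for one near-end sample.

    y_k      : current near-end sample y(k)
    x_recent : the previous N far-end samples x(k-1), ..., x(k-N)
    T        : threshold determined by the echo-path attenuation (about 0.5)

    Returns True when xi_g = |y(k)| / max|x| exceeds T, i.e. when the
    system would be declared to be in the double-talk state.
    """
    denom = np.max(np.abs(x_recent)) + 1e-12   # guard against division by zero
    xi_g = abs(y_k) / denom
    return xi_g > T
```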
However, this method has the following drawbacks:
1. The Geigel algorithm assumes that the near-end speech is much stronger than the far-end echo, which does not fully match the actual conditions of echo cancellation, so the decision is not always accurate.
2. Performing DTD directly, without first performing far-end VAD (Voice Activity Detection), may cause the both-ends-silent state to be misjudged as the double-talk state.
3. Only the coefficient update is stopped in the double-talk state; continuing to filter and to update the coefficients when no far-end speech is present may cause the filter to diverge, and a non-existent far-end speech component is then erroneously subtracted from the near-end signal.
Summary of the invention
To overcome the above three problems, the present invention proposes a voice state detection method that combines VAD and DTD, and designs a new filtering and updating policy based on the detection result, so as to improve the detection accuracy, avoid misjudgment of the voice state, and prevent erroneous filter updating and filtering.
The voice state detection method suitable for an echo cancellation system provided by the present invention is implemented in the following steps:
Step 1: construct a support vector machine (SVM) classifier from noise training samples and voice training samples.
Feature extraction and Gaussian mixture model (GMM) training are performed on the noise training samples and the voice training samples respectively, and the corresponding Gaussian supervectors are constructed. The Gaussian supervectors are used to construct the kernel function of the SVM classifier and the SVM models corresponding to the voice signal and the noise signal; the SVM classifier is obtained from the constructed kernel function and SVM models.
Step 2: the signals to be detected are the block-partitioned near-end and far-end signals. The constructed GMM-based SVM classifier performs a VAD decision on the current far-end block.
Feature extraction and GMM training are performed on the far-end block, and its Gaussian supervector is constructed. The Gaussian supervector of the block is input to the constructed SVM classifier for decision. If the block is classified as noise, the decision is that no voice is present: filter updating and filtering are stopped and the near-end signal is output directly. Otherwise far-end voice is present and the next step, the double-talk decision, is carried out.
Step 3: decide whether the system is in the double-talk state.
The normalized cross-correlation ξ_XECC between the far-end signal and the error signal is computed and compared with the set threshold T_XECC. When ξ_XECC < T_XECC, near-end voice is present and the system is in the double-talk state: the filter coefficient update is stopped and the near-end signal is filtered. When ξ_XECC ≥ T_XECC, no near-end voice is present, and the filter coefficients are updated and the filtering is performed according to the far-end signal.
The advantages and beneficial effects of the present invention are as follows:
(1) A support vector machine algorithm based on a Gaussian mixture model is used for voice activity detection on the far-end signal, which improves the accuracy of voice activity detection and overcomes the inaccuracy of conventional energy-based voice activity detection methods under low signal-to-noise-ratio conditions.
(2) Far-end voice activity detection is performed before double-talk detection, and double-talk detection is carried out only when far-end voice is present, so the both-ends-silent state cannot be misjudged as the double-talk state. A cross-correlation-based double-talk detection algorithm is used, which improves the accuracy of double-talk detection.
(3) Different filtering and updating policies are applied according to the voice state of the system. Compared with conventional echo cancellation systems, which only stop the coefficient update during double-talk, the present invention also stops both the coefficient update and the filtering when no far-end voice is present, which further prevents erroneous filter updating and filtering when no reference signal exists.
Brief description of the drawings
Fig. 1 is the overall flow diagram of the voice state detection method suitable for an echo cancellation system according to the present invention;
Fig. 2 shows the two PCM streams used in the simulation of the embodiment of the present invention;
Fig. 3 shows the echo cancellation result of the embodiment when only energy-based DTD detection is used;
Fig. 4 shows the echo cancellation result of the embodiment when the method of the present invention is used;
Fig. 5 shows the Sipdroid echo cancellation result of the embodiment using the echo cancellation library before improvement;
Fig. 6 shows the Sipdroid echo cancellation result of the embodiment using the echo cancellation library after improvement.
Detailed description of the invention
The present invention is described in further detail below with reference to the drawings and embodiments.
In the method of the present invention, VAD is first performed on the far-end signal before DTD. When VAD detects that no far-end signal is present, filter coefficient updating and filtering are stopped immediately, to prevent the filter from diverging and from filtering erroneously. DTD is performed only when VAD detects that far-end voice is present, and the filter coefficient update is stopped during double-talk. The VAD algorithm used is an SVM (Support Vector Machine) algorithm based on a GMM (Gaussian Mixture Model): the GMM is used to construct a feature supervector, and the GMM supervector serves both as the feature input of the SVM and for constructing its kernel function, which gives a higher accuracy than conventional energy-based or correlation-based VAD algorithms. The DTD algorithm used is based on the cross-correlation between the far-end signal and the error signal, and is also more accurate than the conventional energy-based Geigel algorithm. Combining far-end VAD with DTD improves the accuracy of voice state detection; applying different filtering policies in different voice states prevents filter divergence and erroneous filtering and substantially improves the echo cancellation performance.
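To make the combined policy concrete, the following Python sketch outlines the per-block control flow of Fig. 1; the function names, the callables passed in for the VAD and cross-correlation stages, and the parameter values are illustrative assumptions of this sketch, not part of the patent:

```python
import numpy as np

def process_block(x_block, d_block, w, far_end_has_voice, xi_xecc_of,
                  T_xecc=0.95, mu=1e-3):
    """Illustrative per-block control flow of Fig. 1.

    x_block, d_block  : far-end and near-end blocks of equal length
    w                 : adaptive filter coefficients, length N
    far_end_has_voice : callable implementing the GMM-based SVM VAD of step 2
    xi_xecc_of        : callable returning the normalized cross-correlation
                        of formula (10) for a far-end/error block pair
    Returns the block sent to the far end and the filter coefficients.
    """
    N = len(w)
    if not far_end_has_voice(x_block):
        # No far-end voice: stop updating and filtering, pass the near end through
        return d_block, w

    # Far-end voice present: compute the error signal with the current filter
    x_hist = np.zeros(N)
    e_block = np.empty(len(d_block))
    for n in range(len(d_block)):
        x_hist = np.roll(x_hist, 1)
        x_hist[0] = x_block[n]
        e_block[n] = d_block[n] - np.dot(w, x_hist)   # d(n) - x^T(n) w(n)

    if xi_xecc_of(x_block, e_block) < T_xecc:
        # Double talk: keep filtering but freeze the coefficient update
        return e_block, w

    # Far-end single talk: filter again while updating the coefficients
    # with the LMS step of formula (9)
    w = w.copy()
    x_hist = np.zeros(N)
    for n in range(len(d_block)):
        x_hist = np.roll(x_hist, 1)
        x_hist[0] = x_block[n]
        e_n = d_block[n] - np.dot(w, x_hist)
        e_block[n] = e_n
        w += 2.0 * mu * e_n * x_hist
    return e_block, w
```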
Each step of the voice state detection method suitable for an echo cancellation system according to the present invention is described below with reference to Fig. 1.
Step 1: construct the SVM classifier from the noise training samples and the voice training samples, comprising steps S101 to S103.
Step S101: feature extraction is performed on the noise signal training samples and the voice signal training samples. The features used here are Mel-frequency cepstral coefficients (MFCC). The MFCC extraction process is as follows: the signal is pre-emphasized, partitioned into blocks and windowed, and the spectrum parameters of each block are obtained by applying a fast Fourier transform (FFT) to the windowed block. The spectrum parameters of each block are passed through a Mel-scale filter bank consisting of K triangular band-pass filters, numbered 0 to K-1; the output of each band is taken in logarithm, giving K log-spectrum values for each block. K is a positive integer, typically 20 to 30. Finally the K log-spectrum values are transformed by a discrete cosine transform to the cepstral domain to obtain the Mel cepstral coefficients:

m_i(l) = \sum_{k=0}^{K-1} S_i(k) \cos\!\left(\frac{\pi l (k + 1/2)}{K}\right), \quad 0 \le l < L \qquad (1)

where S_i(k) is the log-spectrum value obtained by the i-th block after the band-pass filter numbered k, K is the number of Mel band-pass filters, m_i(l) is the l-th order MFCC parameter of the i-th block, L is the total order of the extracted MFCC, and i (a positive integer) indexes the blocks.
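As an illustration of step S101 and formula (1), a compact Python/NumPy sketch for one block; the Mel filter bank is assumed to be precomputed, and the default L = 13 is an illustrative choice rather than a value fixed by the patent:

```python
import numpy as np

def mfcc_block(block, mel_fb, L=13):
    """MFCC of one pre-emphasized, windowed block (step S101, formula (1)).

    block  : samples of one windowed block
    mel_fb : K x (n_fft/2 + 1) Mel filter-bank matrix of K triangular filters
             (its construction is omitted here for brevity)
    L      : total order of the extracted MFCC
    """
    spec = np.abs(np.fft.rfft(block)) ** 2           # FFT spectrum parameters
    S = np.log(mel_fb @ spec + 1e-12)                # K log-spectrum values S_i(k)
    K = len(S)
    k = np.arange(K)
    # formula (1): cosine transform of the log spectrum to the cepstral domain
    return np.array([np.sum(S * np.cos(np.pi * l * (k + 0.5) / K))
                     for l in range(L)])
```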
Step S102: generate the Gaussian supervectors corresponding to the noise signal training samples and the voice signal training samples.
The MFCC parameters of the noise signal training samples and of the voice signal training samples are used to build the Gaussian mixture models corresponding to the noise signal and the voice signal. A GMM is essentially a multi-dimensional probability density function: an N-th order GMM g(x) describes the distribution of the block features in feature space as a linear combination of N single Gaussian distributions. For a given block, g(x) is expressed as

g(x) = \sum_{i=1}^{N} w_i\, p_i(x) \qquad (2)

where x is the L-dimensional feature vector formed by the MFCC parameters of the block of the training sample, N is the order of the GMM, p_i(x) is the i-th Gaussian component of the GMM, and w_i is the weighting factor of the component p_i(x).
p_i(x) is expressed as

p_i(x) = \frac{1}{(2\pi)^{L/2} |\Sigma_i|^{1/2}} \exp\!\left\{-\frac{(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}{2}\right\} \qquad (3)

where Σ_i is the covariance matrix of the i-th Gaussian component and μ_i is the mean vector of the i-th Gaussian component. The parameter set λ of the GMM can therefore be written as

λ = (w_i, μ_i, Σ_i), i = 1, 2, ..., N \qquad (4)
and the corresponding GMM g(x) can be expressed as

g(x) = \sum_{i=1}^{N} w_i\, N(x; \mu_i, \Sigma_i) \qquad (5)

where N(·) denotes the Gaussian probability density function.
Building the GMM is in fact the process of estimating its parameters from the training data. The expectation-maximization (EM) algorithm can be used to update the model parameters. The algorithm has two main steps: the expectation (E) step and the maximization (M) step. The E step computes the expected value of the likelihood function of the complete data using the current parameter set, and the M step obtains new parameters by maximizing this expectation. The E step and the M step are iterated until convergence. In this way the GMMs of voice and noise are obtained separately and denoted g(s) and g(n), where s denotes the voice signal and n denotes the noise signal.
The established GMMs are then used to construct the Gaussian supervectors. A Gaussian supervector is built from the GMM parameters; the GMM supervectors m_s and m_n of voice and noise are respectively expressed as

m_s = ((w_1 \Sigma_1^{-1/2} \mu_1^s)^T, (w_2 \Sigma_2^{-1/2} \mu_2^s)^T, \ldots, (w_N \Sigma_N^{-1/2} \mu_N^s)^T) \qquad (6)

m_n = ((w_1 \Sigma_1^{-1/2} \mu_1^n)^T, (w_2 \Sigma_2^{-1/2} \mu_2^n)^T, \ldots, (w_N \Sigma_N^{-1/2} \mu_N^n)^T) \qquad (7)

where μ_i^s is the mean vector of the i-th Gaussian component of g(s) and μ_i^n is the mean vector of the i-th Gaussian component of g(n).
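A minimal sketch of step S102, using scikit-learn's GaussianMixture (an EM implementation) with a diagonal-covariance model; the library choice, the diagonal covariance and the default of 8 components are assumptions of this sketch, not requirements of the patent:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(mfcc_frames, n_components=8):
    """Fit an N-component GMM (formulas (2)-(5), EM training) to the MFCC
    vectors of a training sample and stack the Gaussian supervector of
    formulas (6)/(7): the concatenation of w_i * Sigma_i^{-1/2} * mu_i.

    mfcc_frames : array of shape (n_blocks, L), one MFCC vector per block
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag').fit(mfcc_frames)
    parts = []
    for w, mu, var in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        # diagonal covariance, so Sigma^{-1/2} is 1/sqrt(var) element-wise
        parts.append(w * mu / np.sqrt(var))
    return np.concatenate(parts)         # the Gaussian supervector m
```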
Step S103: construct the SVM classifier from the Gaussian supervectors. The Gaussian supervectors m_n and m_s corresponding to the noise signal and the voice signal are used to build the SVM models of the noise signal and of the voice signal, and to construct a K-L kernel function, formed from the Kullback-Leibler divergence between the two GMM probability distributions. The kernel function K(n, s) constructed from the GMM supervectors m_n and m_s of noise and voice is expressed as

K(n, s) = \sum_{i=1}^{N} (w_i \Sigma_i^{-1/2} \mu_i^n)^T (w_i \Sigma_i^{-1/2} \mu_i^s) \qquad (8)

Once the kernel function and the SVM models of the voice signal and of the noise signal are determined, the SVM classifier is obtained.
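Since the kernel of formula (8) is exactly the inner product of the two Gaussian supervectors, an SVM with a linear kernel applied to the supervectors implements it. The following sketch of step S103 uses scikit-learn's SVC; the library choice and the 0/1 labels are assumptions of this sketch:

```python
import numpy as np
from sklearn.svm import SVC

def train_vad_svm(noise_supervectors, voice_supervectors):
    """Train the voice/noise SVM classifier of step S103.

    noise_supervectors, voice_supervectors : arrays of shape
    (n_samples, N*L), one Gaussian supervector per training sample.
    """
    X = np.vstack([noise_supervectors, voice_supervectors])
    y = np.concatenate([np.zeros(len(noise_supervectors)),   # 0 = noise
                        np.ones(len(voice_supervectors))])   # 1 = voice
    # linear kernel on supervectors == the inner-product kernel of formula (8)
    return SVC(kernel='linear').fit(X, y)

# Usage (step S203): clf.predict([supervector_of_far_end_block])[0] == 1
# would indicate that far-end voice is present.
```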
Step 2: the constructed GMM-based SVM classifier performs a VAD decision on the current far-end block. The signals to be detected, which are input to the SVM classifier, are the block-partitioned near-end and far-end signals. They are first transformed to the frequency domain by a Fourier transform, and the features of the block (MFCC, normalized cross-correlation, etc.) are then computed from the signal spectrum. This step comprises steps S201 to S203.
Step S201: extract the MFCC parameters of the far-end block. The MFCC extraction process is the same as in step S101, and the MFCC parameters corresponding to the far-end block are finally obtained from formula (1).
Step S202: generate the Gaussian supervector corresponding to the far-end block. A Gaussian mixture model is built from the MFCC parameters of the far-end block, and the established GMM is used to construct the Gaussian supervector corresponding to the block. The supervector is generated as in step S102, see formulas (6) and (7).
Step S203: the Gaussian supervector corresponding to the far-end block is input to the constructed SVM classifier, and the GMM-based SVM algorithm performs the voice/noise classification, giving the VAD decision for the far-end speech. If the block is classified as noise, the decision is that no voice is present: filter updating and filtering are stopped and the near-end signal is output directly. If the block is classified as voice, far-end voice is present and the next step, the double-talk decision, is carried out.
Step 3: decide whether the system is in the double-talk state.
Step S301: compute the error signal.
The adaptive filter coefficients model the echo path, so convolving the far-end block with the adaptive filter coefficients yields the estimated echo signal x^T(n) w(n); the error signal e(n) is the difference between the near-end signal d(n) of the block and the estimated echo x^T(n) w(n).
The adaptive filter coefficients are continuously updated from the error signal and the far-end signal according to an adaptive algorithm. The update formula of a commonly used adaptive algorithm, the LMS algorithm, is

w(n+1) = w(n) + 2 μ e(n) x(n) \qquad (9)

where μ is the step size, w(n) is the filter weight vector, e(n) is the error signal, x(n) is the far-end signal, and n denotes the n-th time instant (sample).
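A minimal sketch of step S301 and of the LMS update of formula (9), in Python/NumPy; the variable names and the default step size are illustrative assumptions:

```python
import numpy as np

def lms_step(w, x_vec, d_n, mu=0.01):
    """One iteration of the adaptive echo canceller (step S301, formula (9)).

    w     : current filter weight vector w(n), length N
    x_vec : the most recent N far-end samples forming x(n)
    d_n   : current near-end (microphone) sample d(n)
    mu    : step size

    Returns the error signal e(n) and the updated weights w(n+1).
    """
    echo_hat = np.dot(w, x_vec)              # estimated echo x^T(n) w(n)
    e_n = d_n - echo_hat                     # error signal e(n)
    w_next = w + 2.0 * mu * e_n * x_vec      # formula (9)
    return e_n, w_next
```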
Step S302: compute the normalized cross-correlation between the far-end signal and the error signal. Since the cross-correlation in the time domain can be converted into a point-wise product in the frequency domain, i.e. the spectra of the two signals multiplied point by point, the value of this normalized cross-correlation can be computed directly from the far-end spectrum X(k) and the error spectrum E(k) with relatively low computational complexity. The frequency-domain computation of the normalized cross-correlation is

\xi_{XECC} = \max_k \frac{E[X(k) E(k)]}{\sqrt{E[X(k)^2]\, E[E(k)^2]}} \qquad (10)

where ξ_XECC denotes the normalized cross-correlation between the far-end signal and the error signal, and k denotes the frequency bin.
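A sketch of the frequency-domain computation of formula (10); approximating the expectation E[·] by an average over successive blocks and using magnitude spectra are assumptions of this sketch:

```python
import numpy as np

def xi_xecc(x_blocks, e_blocks):
    """Normalized cross-correlation of formula (10), computed in the
    frequency domain from far-end and error-signal blocks.

    x_blocks, e_blocks : arrays of shape (n_blocks, block_len)
    """
    X = np.abs(np.fft.rfft(x_blocks, axis=1))    # far-end spectra X(k)
    E = np.abs(np.fft.rfft(e_blocks, axis=1))    # error spectra  E(k)
    num = np.mean(X * E, axis=0)                               # E[X(k) E(k)]
    den = np.sqrt(np.mean(X ** 2, axis=0) * np.mean(E ** 2, axis=0)) + 1e-12
    return float(np.max(num / den))                            # max over k
```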
Step S303: DTD decision. The normalized cross-correlation ξ_XECC between the far-end signal and the error signal is compared with the normalized cross-correlation threshold. When no near-end voice is present, the normalized cross-correlation ξ_XECC between the far-end signal and the error signal should be equal to 1; when near-end voice is present, ξ_XECC is less than 1. A constant T_XECC slightly smaller than 1 can therefore be set as the threshold; T_XECC typically takes a value between 0.9 and 1, and the threshold is updated in real time according to the detection result, with the update algorithm chosen according to the actual situation. A good threshold should make both the false-alarm probability and the miss probability relatively small. For example, a constant slightly smaller than 1 can first be chosen arbitrarily; then, with the near-end speech set to 0, the false-alarm probability and the miss probability are computed and T_XECC is adjusted within a certain range until both probabilities are sufficiently small.

When the normalized cross-correlation is smaller than the threshold, i.e.

ξ_XECC < T_XECC \qquad (11)

the system is in the double-talk state: the filter coefficient update is stopped and the near-end signal is filtered directly with the existing filter coefficients. Otherwise no near-end speech is present and only far-end speech exists, so the filter coefficients are updated and the filtering is carried out.
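The patent leaves the exact threshold-update rule open; as an illustration of the calibration procedure described above, the following sketch picks T_XECC from recorded ξ_XECC values (the function and variable names and the candidate range are assumptions of this sketch):

```python
import numpy as np

def calibrate_t_xecc(xi_far_end_only, xi_double_talk):
    """Choose T_XECC so that false-alarm and miss probabilities are both small.

    xi_far_end_only : xi_XECC values measured with the near-end speech set to 0
                      (values below the threshold here are false alarms)
    xi_double_talk  : xi_XECC values measured during genuine double talk
                      (values at or above the threshold here are misses)
    """
    candidates = np.linspace(0.90, 0.999, 100)       # slightly smaller than 1
    best_t, best_cost = candidates[0], np.inf
    for t in candidates:
        p_false = np.mean(np.asarray(xi_far_end_only) < t)   # false-alarm prob.
        p_miss = np.mean(np.asarray(xi_double_talk) >= t)    # miss probability
        if p_false + p_miss < best_cost:
            best_t, best_cost = t, p_false + p_miss
    return best_t
```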
The voice state detection method proposed by the present invention was applied in a practical echo cancellation system comprising two terminals, and the actual communication performance was verified with the VoIP software Sipdroid.
The voice state detection method combining VAD and DTD proposed by the present invention was first simulated in Matlab. The voice signals used in the simulation comprise one 30-second far-end speech PCM (Pulse Code Modulation) stream and one corresponding near-end speech PCM stream, with a sampling frequency of 8000 Hz. In the echo cancellation system the filter length is set to 128, the adaptive filtering algorithm is the BFDAF algorithm (i.e. the frequency-domain NLMS algorithm), and the voice state detection algorithm is the method proposed by the present invention.
Fig. 2 shows the two PCM streams used in the simulation: from top to bottom, the far-end signal waveform and the near-end signal waveform. The abscissa is time in seconds and the ordinate is amplitude. With the original voice state detection method, i.e. using only energy-based DTD detection, the echo cancellation result is shown in Fig. 3. It can be seen that, without the improved VAD, the echo cancellation in the first half is acceptable although a small amount of residual echo remains, while the result in the second half is far from ideal: much of the original speech is removed and the signal after echo cancellation is considerably distorted.
With the voice state detection method proposed by the present invention, the echo cancellation result is shown in Fig. 4. Comparing the two PCM streams obtained after echo cancellation before and after the improvement, the echo cancellation performance is clearly improved after the voice state detection method is improved: the residual echo is removed more thoroughly, and the near-end speech shows almost no distortion.
To further verify the effect of the proposed voice state detection method in a practical echo cancellation system, the method was implemented as a C program and tested with the voice communication software Sipdroid.
The parts of the echo cancellation library WebRTC that perform VAD and DTD were modified according to the steps of the voice state detection method of the present invention, and the modified echo cancellation library was called from Sipdroid. Actual double-talk calls were made and recorded with Sipdroid in various environments, and the voice PCM streams before and after echo cancellation were saved for analysis of the echo cancellation performance.
To make the extracted voice streams more convenient and clearer to inspect, in every test the two callers counted from 1 to 10 in turn. In each environment several call tests were carried out with the Sipdroid versions before and after the improvement, for comparison.
First, several call tests were carried out with the Sipdroid version using the echo cancellation library before improvement, and the far-end, near-end and post-cancellation PCM streams were extracted. The test result is shown in Fig. 5; only the counting part of the PCM streams is shown. The first PCM stream is the far-end signal, the second PCM stream is the near-end signal, and the third PCM stream is the near-end signal after echo cancellation. The echo cancellation is not ideal: a little residual echo remains in the counting part (the portion circled by the dashed box). Most of the other test results are similar.
Then the same procedure was used to carry out several call tests of the echo cancellation performance of the Sipdroid version using the improved echo cancellation library, and the far-end, near-end and post-cancellation PCM streams were extracted. Fig. 6 shows a fairly representative test result. As in Fig. 5, the first PCM stream is the far-end signal, the second PCM stream is the near-end signal, and the third PCM stream is the near-end signal after echo cancellation. After the improved voice detection method of the present invention is applied, the echo cancellation is fairly good: the residual echo of the counting part (circled by the dashed box) is removed more thoroughly, while the original near-end speech is preserved essentially unaffected. Repeated tests show that in some environments the echo cancellation performance is still somewhat affected and its stability needs further improvement, but in most cases the echo cancellation with the voice state detection method of the present invention is clearly better than before the improvement.

Claims (5)

1. A voice state detection method suitable for an echo cancellation system, characterized in that it is implemented in the following steps:
Step 1: construct a support vector machine (SVM) classifier from noise training samples and voice training samples;
Feature extraction and Gaussian mixture model (GMM) training are performed on the noise training samples and the voice training samples respectively, and the corresponding Gaussian supervectors are constructed; the Gaussian supervectors are then used to construct the kernel function of the SVM classifier and the SVM models corresponding to the voice signal and the noise signal; the SVM classifier is obtained from the constructed kernel function and SVM models;
Step 2: the signals to be detected are the block-partitioned near-end and far-end signals; the constructed SVM classifier performs a VAD decision on the current far-end block; VAD denotes voice activity detection;
Feature extraction and GMM training are performed on the far-end block and its Gaussian supervector is constructed; the Gaussian supervector corresponding to the block is then input to the constructed SVM classifier for decision; if the decision is noise, indicating that no voice is present, filter updating and filtering are stopped and the near-end signal is output directly; otherwise far-end voice is present and the next step, the double-talk decision, is carried out;
Step 3: decide whether the system is in the double-talk state;
The normalized cross-correlation ξ_XECC between the far-end signal and the error signal is computed and compared with the set threshold T_XECC; when ξ_XECC < T_XECC, the system is in the double-talk state: the filter coefficient update is stopped and the near-end signal is filtered; otherwise no near-end voice is present and the filter coefficients are updated and the filtering is performed according to the far-end signal.
2. The voice state detection method suitable for an echo cancellation system according to claim 1, characterized in that the construction of the SVM classifier in step 1 comprises the following steps:
Step S101: feature extraction is performed on the noise signal training samples and the voice signal training samples; the feature used is the Mel cepstral coefficient (MFCC);
The MFCC extraction process is: the signal is pre-emphasized, partitioned into blocks and windowed; the spectrum parameters of each block are obtained by applying a fast Fourier transform (FFT) to the windowed block; the spectrum parameters of each block are passed through a Mel-scale filter bank consisting of K triangular band-pass filters, and the output of each band is taken in logarithm to obtain the log spectrum; if the K band-pass filters are numbered from 0 to K-1, the log-spectrum value obtained by the i-th block after the band-pass filter numbered k is S_i(k), and the l-th order MFCC parameter m_i(l) of the i-th block is:

m_i(l) = \sum_{k=0}^{K-1} S_i(k) \cos\!\left(\frac{\pi l (k + 1/2)}{K}\right), \quad 0 \le l < L

where L is the total order of the extracted MFCC;
Step S102: generate the Gaussian supervectors of the noise signal training samples and the voice signal training samples;
The MFCC parameters of the noise signal training samples and of the voice signal training samples are used to build the Gaussian mixture models corresponding to the noise signal and the voice signal respectively;
For a given block, the N-th order GMM g(x) is expressed as:

g(x) = \sum_{i=1}^{N} w_i\, p_i(x)

where x is the L-dimensional feature vector formed by the MFCC parameters of the block of the training sample, p_i(x) is the i-th Gaussian component of the GMM, w_i is the weighting factor of the i-th Gaussian component, Σ_i is the covariance matrix of the i-th Gaussian component, and μ_i is the mean vector of the i-th Gaussian component;
The GMM g(x) can further be expressed as g(x) = \sum_{i=1}^{N} w_i N(x; μ_i, Σ_i), where N(·) denotes the Gaussian probability density function;
The EM algorithm is used to update the GMM parameters; let the GMM finally obtained for the voice signal training samples be g(s), with μ_i^s the mean vector of each Gaussian component, where s denotes the voice signal; let the GMM finally obtained for the noise signal training samples be g(n), with μ_i^n the mean vector of each Gaussian component, where n denotes the noise signal; the established GMMs are used to construct the Gaussian supervectors m_s and m_n of the voice signal training samples and the noise signal training samples:

m_s = ((w_1 \Sigma_1^{-1/2} \mu_1^s)^T, (w_2 \Sigma_2^{-1/2} \mu_2^s)^T, \ldots, (w_N \Sigma_N^{-1/2} \mu_N^s)^T)
m_n = ((w_1 \Sigma_1^{-1/2} \mu_1^n)^T, (w_2 \Sigma_2^{-1/2} \mu_2^n)^T, \ldots, (w_N \Sigma_N^{-1/2} \mu_N^n)^T);
Step S103: construct the SVM classifier from the Gaussian supervectors;
The Gaussian supervectors m_n and m_s are used to build the SVM models corresponding to the noise signal and the voice signal;
The kernel function K(n, s) constructed from the Gaussian supervectors m_n and m_s is:

K(n, s) = \sum_{i=1}^{N} (w_i \Sigma_i^{-1/2} \mu_i^n)^T (w_i \Sigma_i^{-1/2} \mu_i^s)

With the kernel function and the SVM models of the voice signal and the noise signal determined, the SVM classifier is obtained.
3. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that in step 3 the error signal is computed as follows: the far-end block is convolved with the adaptive filter coefficients to obtain the estimated echo signal, and the error signal is the difference between the near-end signal of the block and the estimated echo signal.
4. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that in step 3 the normalized cross-correlation ξ_XECC between the far-end signal and the error signal is computed according to the following formula:

\xi_{XECC} = \max_k \frac{E[X(k) E(k)]}{\sqrt{E[X(k)^2]\, E[E(k)^2]}}

where k denotes the frequency bin, X(k) is the far-end signal spectrum, and E(k) is the error signal spectrum.
5. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that in step 3 the set threshold T_XECC is a value between 0.9 and 1 and is updated in real time according to the decision result.
CN201610519040.6A 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system Active CN105957520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610519040.6A CN105957520B (en) 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system

Publications (2)

Publication Number Publication Date
CN105957520A true CN105957520A (en) 2016-09-21
CN105957520B CN105957520B (en) 2019-10-11

Family

ID=56903377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610519040.6A Active CN105957520B (en) 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system

Country Status (1)

Country Link
CN (1) CN105957520B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012009047A1 (en) * 2010-07-12 2012-01-19 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
WO2013040414A1 (en) * 2011-09-16 2013-03-21 Qualcomm Incorporated Mobile device context information using speech detection
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103151039A (en) * 2013-02-07 2013-06-12 中国科学院自动化研究所 Speaker age identification method based on SVM (Support Vector Machine)
CN105657110A (en) * 2016-02-26 2016-06-08 深圳Tcl数字技术有限公司 Voice communication echo cancellation method and device

Also Published As

Publication number Publication date
CN105957520B (en) 2019-10-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant