CN105957520A - Voice state detection method suitable for echo cancellation system - Google Patents
- Publication number: CN105957520A (application CN201610519040.6A)
- Authority: CN (China)
- Prior art keywords: voice, signal, blocking, training sample, Gaussian
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption, not a legal conclusion, and that no legal analysis has been performed)
Classifications
- All classes below fall under G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/063—Training (under G10L15/00—Speech recognition › G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/08—Speech classification or search (under G10L15/00—Speech recognition)
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (under G10L15/00—Speech recognition)
- G10L21/0208—Noise filtering (under G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility › G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise (under G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 › G10L25/78—Detection of presence or absence of voice signals)
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech (under G10L21/0208—Noise filtering)
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
The invention relates to a voice state detection method suitable for an echo cancellation system, in the field of voice interaction technologies over IP networks. The method comprises: constructing a support vector machine (SVM) classifier from noise training samples and speech training samples, the signals to be detected being the far-end and near-end signals divided into blocks; making a VAD decision on the current far-end block with the constructed GMM-based SVM classifier; if the decision is that no speech is present, stopping filter updating and filtering and outputting the near-end speech signal directly; if speech is present at the far end, making a double-talk decision; during double-talk, stopping the filter coefficient update and filtering the near-end signal; otherwise, updating the filter coefficients and filtering according to the far-end signal. The method improves the accuracy of voice activity detection, prevents the state in which both ends are silent from being misjudged as double-talk, and prevents erroneous filter updating and filtering in the absence of a reference signal.
Description
Technical field
The present invention relates to the technical field of voice interaction over IP-based networks, and specifically to a voice state detection method suitable for an echo cancellation system.
Background art
Echo cancellation technology is widely used in voice interaction systems over IP-based networks, such as teleconferencing systems, in-vehicle Bluetooth systems and IP phones. Its purpose is to eliminate the acoustic echo formed when sound played by the loudspeaker is picked up by the microphone after multipath propagation and sent back to the far end. The core idea of echo cancellation is to model the echo path with an adaptive filter and subtract the estimated echo signal from the signal picked up by the microphone.
Voice state detection plays a vital role in echo cancellation. Before the signal enters the filter, the current voice state must first be determined, and the operating mode of the filter is then set according to that state. How accurately and quickly the system's voice state can be determined has a great influence on echo cancellation performance.
Existing echo cancellation systems typically apply a DTD (Double Talk Detection) algorithm directly to decide whether the system is in the double-talk state, and stop updating the filter coefficients during double-talk so that interference from near-end speech does not make the filter diverge. The commonly used Geigel DTD algorithm decides whether near-end speech is present by comparing the amplitude of the near-end signal with that of the far-end signal: the system is considered to be in the double-talk state when the ratio $\xi^{(g)}$ of the near-end amplitude to the far-end amplitude exceeds a fixed value $T$. That is, when

$$\xi^{(g)} = \frac{|y(k)|}{\max\{|x(k-1)|, \dots, |x(k-N)|\}} > T$$

near-end speech is considered present and the system is in the double-talk state. Here $|y(k)|$ is the near-end speech amplitude and $\max\{|x(k-1)|, \dots, |x(k-N)|\}$ is the maximum amplitude over the previous $N$ samples of the far-end speech signal. The threshold $T$ is determined by the echo path attenuation and is usually set to 0.5; $N$ is generally equal to the filter length.
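The Geigel rule above can be sketched in a few lines. This is a minimal illustration, assuming real-valued sample lists and evaluating the test as $|y(k)| > T \cdot \max\{\dots\}$, which is equivalent to the ratio form whenever the far-end window is non-zero; the function name `geigel_double_talk` is hypothetical. The last call also illustrates drawback 2 below: with a silent far end, any near-end noise is reported as double-talk.

```python
def geigel_double_talk(near, far, k, n_taps, threshold=0.5):
    """Geigel double-talk decision at sample index k (illustrative sketch).

    near:   near-end samples y
    far:    far-end samples x
    n_taps: N, the number of past far-end samples examined (about the filter length)
    Returns True when |y(k)| > T * max(|x(k-1)|, ..., |x(k-N)|).
    """
    # Past far-end samples x(k-1) ... x(k-N); indices before the start are skipped
    window = [abs(far[k - j]) for j in range(1, n_taps + 1) if k - j >= 0]
    far_peak = max(window, default=0.0)
    # Multiplying instead of dividing avoids 0/0 when both ends are silent
    return abs(near[k]) > threshold * far_peak


print(geigel_double_talk([0.2] * 16, [1.0] * 16, k=8, n_taps=4))   # False: echo-level near end
print(geigel_double_talk([0.8] * 16, [1.0] * 16, k=8, n_taps=4))   # True: loud near-end talker
print(geigel_double_talk([0.01] * 16, [0.0] * 16, k=8, n_taps=4))  # True: far end silent
```

Because the rule only compares amplitudes, it cannot tell a weak near-end talker from echo, which is the accuracy problem raised in drawback 1.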
However, this approach has the following drawbacks:
1. The Geigel algorithm assumes that the near-end speech is much stronger than the far-end echo signal. This does not fully match actual echo cancellation conditions, so its decisions are not very accurate.
2. Performing DTD directly, without far-end VAD (Voice Activity Detection), may cause the state in which both ends are silent to be misjudged as double-talk.
3. Stopping only the filter coefficient update during double-talk means that filtering and coefficient updating continue while no far-end speech is present. This may cause the filter to diverge and to wrongly subtract non-existent far-end speech from the near-end signal.
Summary of the invention
To overcome the above three problems, the present invention proposes a voice state detection method that combines VAD and DTD, and designs new filtering and update strategies according to the detection result. This improves detection accuracy, avoids misjudging the voice state, and prevents erroneous updating and filtering by the filter.
The voice state detection method suitable for an echo cancellation system provided by the present invention is implemented in the following steps:
Step 1: construct a support vector machine (SVM) classifier using noise training samples and speech training samples.
Feature extraction and Gaussian mixture model (GMM) training are performed on the noise training samples and the speech training samples respectively, and the corresponding Gaussian supervectors are constructed. The Gaussian supervectors are used to construct the kernel function of the SVM classifier and the SVM models corresponding to the speech signal and the noise signal; the SVM classifier is then obtained from the constructed kernel function and SVM models.
Step 2: the signals to be detected are the near-end and far-end signals divided into blocks. The constructed GMM-based SVM classifier is used to make a VAD decision on the current far-end block.
Feature extraction and GMM training are performed on the current far-end block to construct its Gaussian supervector, which is input into the constructed SVM classifier for a decision. If the block is classified as noise, the decision is that no speech is present: filter updating and filtering are stopped, and the near-end speech signal is output directly. Otherwise the far end contains speech, and the double-talk decision of the next step is carried out.
Step 3: determine whether the system is in the double-talk state.
Compute the normalized cross-correlation $\xi_{XECC}$ between the far-end signal and the error signal and compare it with the preset threshold $T_{XECC}$. When $\xi_{XECC} < T_{XECC}$, near-end speech is present and the system is in the double-talk state: the filter coefficient update is stopped and the near-end signal is filtered. When $\xi_{XECC} \geq T_{XECC}$, there is no near-end speech, and filter coefficient updating and filtering are performed according to the far-end signal.
The advantages and beneficial effects of the present invention are as follows:
(1) A GMM-based support vector machine algorithm is used for voice activity detection on the far-end signal. This improves VAD accuracy and overcomes the inaccuracy of conventional energy-based VAD methods under low signal-to-noise ratio conditions.
(2) Far-end voice activity detection is performed before double-talk detection, and double-talk detection is carried out only when the far end contains speech, which avoids misjudging the state in which both ends are silent as double-talk. A cross-correlation-based double-talk detection algorithm is used, which improves the accuracy of double-talk detection.
(3) Different filtering and update strategies are adopted according to the voice state of the system. Traditional echo cancellation systems only stop the filter coefficient update during double-talk; the present method additionally stops coefficient updating and filtering when the far end has no speech, which further prevents erroneous filter updates and filtering in the absence of a reference signal.
Brief description of the drawings
Fig. 1 is an overall flowchart of the voice state detection method suitable for an echo cancellation system according to the present invention;
Fig. 2 shows the two PCM streams used in the simulation of the embodiment of the present invention;
Fig. 3 shows the echo cancellation result of the embodiment when only energy-based DTD is used;
Fig. 4 shows the echo cancellation result of the embodiment when the method of the present invention is used;
Fig. 5 shows the Sipdroid echo cancellation result of the embodiment using the echo cancellation library before the improvement;
Fig. 6 shows the Sipdroid echo cancellation result of the embodiment using the echo cancellation library after the improvement.
Detailed description of the invention
The present invention is described in further detail below with reference to the drawings and embodiments.
The method of the invention first performs VAD on the far-end signal before DTD. When VAD detects no far-end speech, filter coefficient updating and filtering are stopped directly, to prevent filter divergence and erroneous filtering. DTD is performed only when VAD detects far-end speech, and the filter coefficient update is stopped during double-talk. The VAD algorithm used is an SVM (Support Vector Machine) algorithm based on a GMM (Gaussian Mixture Model): the GMM is used to construct feature supervectors, and the GMM supervectors serve both as the SVM input features and for constructing the SVM kernel function, giving higher accuracy than conventional energy- or correlation-based VAD algorithms. The DTD algorithm used is based on the cross-correlation between the far-end signal and the error signal, and its accuracy is also higher than that of the conventional energy-based Geigel algorithm. Combining far-end VAD with DTD improves the accuracy of voice state detection, and applying different filtering strategies in different voice states prevents filter divergence and erroneous filtering, substantially improving echo cancellation performance.
The steps of the voice state detection method suitable for an echo cancellation system of the present invention are described below with reference to Fig. 1.
Step 1: construct the SVM classifier using noise training samples and speech training samples, comprising steps S101 to S103.
Step S101: perform feature extraction on the noise training samples and the speech training samples. The features used are Mel-frequency cepstral coefficients (MFCC), extracted as follows. The signal is pre-emphasized, divided into blocks and windowed, and each windowed block is transformed by the fast Fourier transform (FFT) to obtain its spectrum. The spectrum of each block is passed through a Mel-scale filter bank of $K$ triangular band-pass filters, numbered from 0 to $K-1$, and the logarithm of each band output is taken to obtain its log energy, so that each block of the speech signal yields $K$ log-spectrum values. $K$ is a positive integer, generally 20 to 30. Finally, a discrete cosine transform maps the $K$ log-spectrum values into the cepstral domain, giving the Mel cepstral coefficients:

$$m_i(l) = \sum_{k=0}^{K-1} S_i(k)\cos\left(\frac{\pi l\,(k+0.5)}{K}\right), \quad l = 1, 2, \dots, L \tag{1}$$

where $S_i(k)$ is the log spectrum obtained for the $i$-th block after the band-pass filter numbered $k$, $K$ is the number of Mel band-pass filters, $m_i(l)$ is the $l$-th order MFCC of the $i$-th block of the speech signal, $L$ is the total order of the extracted MFCC, and the positive integer $i$ in formula (1) indexes the block.
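The MFCC pipeline of step S101 can be sketched end to end for one block. This is an illustration, not the patent's code; the pre-emphasis coefficient 0.97, the Hamming window, the Mel formula $2595\log_{10}(1+f/700)$, the filter-edge placement, and the direct DFT used in place of an FFT are all assumptions, and the final step assumes the DCT-II form of formula (1).

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def power_spectrum(block):
    """Hamming-window one block and return |X(f)|^2 for f = 0 .. n/2 (direct DFT)."""
    n = len(block)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    xw = [b * w for b, w in zip(block, win)]
    return [abs(sum(xw[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
            for f in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, fs):
    """K triangular band-pass filters (numbered 0..K-1) equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(fs / 2.0)
    edges = [int((n_fft + 1) * mel_to_hz(top * i / (num_filters + 1)) / fs)
             for i in range(num_filters + 2)]
    bank = []
    for k in range(num_filters):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        row = [0.0] * n_bins
        for b in range(lo, mid):          # rising edge of the triangle
            row[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):          # falling edge of the triangle
            row[b] = (hi - b) / (hi - mid)
        bank.append(row)
    return bank


def log_mel_spectrum(block, fs, num_filters=24):
    """S_i(k): log energy at the output of Mel filter k for one block."""
    spec = power_spectrum(block)
    bank = mel_filterbank(num_filters, len(block), fs)
    # A small floor avoids log(0) for empty or silent bands
    return [math.log(sum(h * p for h, p in zip(row, spec)) + 1e-12) for row in bank]


def mfcc_from_log_spectrum(S, L):
    """Formula (1): m_i(l) = sum_{k=0}^{K-1} S_i(k) cos(pi * l * (k + 0.5) / K), l = 1..L."""
    K = len(S)
    return [sum(S[k] * math.cos(math.pi * l * (k + 0.5) / K) for k in range(K))
            for l in range(1, L + 1)]


fs = 8000
block = [math.sin(2 * math.pi * 440 * t / fs) for t in range(256)]
coeffs = mfcc_from_log_spectrum(log_mel_spectrum(pre_emphasis(block), fs), L=12)
print([round(c, 2) for c in coeffs])
```

A direct DFT keeps the sketch dependency-free; in practice an FFT library would compute the same bins far faster.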
Step S102: generate the Gaussian supervectors corresponding to the noise training samples and the speech training samples.
The MFCC parameters of the noise training samples and of the speech training samples are used to build Gaussian mixture models for the noise signal and the speech signal respectively. A GMM is essentially a multidimensional probability density function: an $N$-order GMM $g(x)$ describes the distribution of frame features in feature space as a linear combination of $N$ single Gaussian distributions. For a given block, $g(x)$ is expressed as

$$g(x) = \sum_{i=1}^{N} w_i\, p_i(x) \tag{2}$$

where $x$ is the $L$-dimensional feature vector formed by the MFCC parameters of the block of training samples, $N$ is the order of the GMM, $p_i(x)$ is the $i$-th Gaussian component of the GMM, and $w_i$ is the weight of component $p_i(x)$ (the weights sum to 1). $p_i(x)$ is expressed as

$$p_i(x) = \frac{1}{(2\pi)^{L/2}\,|\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_i)^{\mathsf T}\Sigma_i^{-1}(x-\mu_i)\right) \tag{3}$$

where $\Sigma_i$ is the covariance matrix and $\mu_i$ the mean vector of the $i$-th Gaussian component. The parameter set $\lambda$ of the GMM can therefore be written as

$$\lambda = \{w_i, \mu_i, \Sigma_i\}, \quad i = 1, 2, \dots, N \tag{4}$$

and the corresponding GMM $g(x)$ can be expressed as

$$g(x) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x; \mu_i, \Sigma_i) \tag{5}$$

where $\mathcal{N}(\cdot)$ denotes the Gaussian probability density function.
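Formulas (2) to (5) can be evaluated directly once the parameters are known. The sketch below assumes diagonal covariance matrices (each $\Sigma_i$ stored as a list of variances), a common simplification for MFCC features that the text does not specify, and works in the log domain with log-sum-exp so that very small densities do not underflow. The function names are illustrative.

```python
import math


def log_gaussian_diag(x, mean, var):
    """log N(x; mu_i, Sigma_i) for a diagonal Sigma_i given as a list of variances."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xd - md) ** 2 / v
                      for xd, md, v in zip(x, mean, var))


def gmm_log_density(x, weights, means, variances):
    """log g(x) = log sum_i w_i N(x; mu_i, Sigma_i), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Two-component, 2-D example with hypothetical parameters
print(gmm_log_density([0.5, -0.2], [0.4, 0.6],
                      [[0.0, 0.0], [1.0, -1.0]], [[1.0, 1.0], [0.5, 2.0]]))
```

Working with $\log g(x)$ rather than $g(x)$ is the usual choice because per-frame densities of 12- to 20-dimensional MFCC vectors are often far below the smallest representable float.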
Building a GMM amounts to estimating its parameters by training. The expectation-maximization (EM) algorithm can be used to update the model parameters. It has two main steps, the expectation (E) step and the maximization (M) step: the E step uses the current parameter set to compute the expected value of the complete-data log-likelihood function, and the M step obtains new parameters by maximizing this expectation. The E and M steps are iterated until convergence. In this way the GMMs of speech and noise are obtained, denoted $g(s)$ and $g(n)$ respectively, where $s$ denotes the speech signal and $n$ the noise signal.
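A minimal EM loop for a diagonal-covariance GMM is sketched below. It is an illustration under stated assumptions (diagonal covariances, a fixed iteration count instead of a convergence test, and a small variance floor to keep components from collapsing), not the patent's training code; the caller supplies the initial weights, means and variances.

```python
import math


def _log_gauss_diag(x, mean, var):
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def em_gmm_diag(data, weights, means, variances, iters=50, var_floor=1e-6):
    """EM training of a diagonal-covariance GMM (sketch); updates and returns the parameters."""
    n_comp, dim = len(weights), len(data[0])
    for _ in range(iters):
        # E step: responsibility of each component for each frame (posterior probability)
        resp = []
        for x in data:
            logs = [math.log(weights[i]) + _log_gauss_diag(x, means[i], variances[i])
                    for i in range(n_comp)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M step: re-estimate weights, means and variances from the responsibilities
        for i in range(n_comp):
            ni = sum(r[i] for r in resp) + 1e-12
            weights[i] = ni / len(data)
            means[i] = [sum(r[i] * x[d] for r, x in zip(resp, data)) / ni for d in range(dim)]
            variances[i] = [max(sum(r[i] * (x[d] - means[i][d]) ** 2
                                    for r, x in zip(resp, data)) / ni, var_floor)
                            for d in range(dim)]
    return weights, means, variances


# Two well-separated 1-D clusters around -5 and +5
data = [[-5.0 + 0.1 * j] for j in range(-5, 6)] + [[5.0 + 0.1 * j] for j in range(-5, 6)]
w, m, v = em_gmm_diag(data, [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
print(w, m, v)
```

The mean vectors produced by such training are exactly what step S102 stacks into the Gaussian supervectors.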
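The trained GMMs are then used to construct Gaussian supervectors, which are formed from the GMM parameters. The GMM supervectors $m_s$ of speech and $m_n$ of noise can be written as

$$m_s = \left[\mu_1^{s\,\mathsf T}, \mu_2^{s\,\mathsf T}, \dots, \mu_N^{s\,\mathsf T}\right]^{\mathsf T} \tag{6}$$

$$m_n = \left[\mu_1^{n\,\mathsf T}, \mu_2^{n\,\mathsf T}, \dots, \mu_N^{n\,\mathsf T}\right]^{\mathsf T} \tag{7}$$

where $\mu_i^s$ is the mean vector of the $i$-th Gaussian component of $g(s)$ and $\mu_i^n$ is the mean vector of the $i$-th Gaussian component of $g(n)$.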
Step S103: construct the SVM classifier from the Gaussian supervectors. The Gaussian supervectors $m_n$ and $m_s$ of the noise signal and the speech signal are used to build the SVM models corresponding to noise and speech, and also to construct a K-L kernel function. This kernel function is built from the K-L (Kullback-Leibler) divergence between the two GMM probability distributions.
The kernel function $K(n, s)$ constructed from the GMM supervectors $m_n$ and $m_s$ of noise and speech is expressed as follows:

Once the kernel function and the SVM models of the speech signal and the noise signal have been determined, the SVM classifier is obtained.
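The kernel formula itself is not reproduced in this text. As an assumption only, the sketch below uses a widely used linear GMM-supervector kernel that follows from a bound on the K-L divergence between two GMMs sharing weights $w_i$ and diagonal covariances $\Sigma_i$, namely $K(a,b)=\sum_i w_i\,\mu_i^{a\,\mathsf T}\Sigma_i^{-1}\mu_i^{b}$, together with the standard SVM decision rule $f=\sum_j \alpha_j y_j K(\mathrm{sv}_j,\cdot)+b$. The function names, the +1 = speech / -1 = noise labelling, and the support vectors, multipliers and bias in the usage are hypothetical; training the SVM itself is not shown.

```python
def supervector(means):
    """Stack the component mean vectors mu_1 ... mu_N into one supervector."""
    return [v for mu in means for v in mu]


def kl_linear_kernel(means_a, means_b, weights, variances):
    """Assumed kernel: sum_i w_i * mu_i^a . (mu_i^b / sigma_i^2), diagonal covariances.

    Both GMMs are assumed to share the weights and covariances (for example when
    both are adapted from one background model), so only their means differ.
    """
    return sum(w * sum(a * b / v for a, b, v in zip(mu_a, mu_b, var))
               for w, mu_a, mu_b, var in zip(weights, means_a, means_b, variances))


def svm_decide(test_means, support, alphas, labels, bias, weights, variances):
    """Sign of f = sum_j alpha_j * y_j * K(sv_j, test) + b: +1 speech, -1 noise (assumed)."""
    f = sum(a * y * kl_linear_kernel(sv, test_means, weights, variances)
            for sv, a, y in zip(support, alphas, labels)) + bias
    return 1 if f >= 0 else -1


# Hypothetical one-component models with 2-D means: one speech and one noise support vector
weights, variances = [1.0], [[1.0, 1.0]]
support = [[[1.0, 1.0]], [[-1.0, -1.0]]]
print(svm_decide([[0.8, 0.9]], support, [1.0, 1.0], [1, -1], 0.0, weights, variances))    # 1
print(svm_decide([[-0.5, -0.2]], support, [1.0, 1.0], [1, -1], 0.0, weights, variances))  # -1
```

With unit weights and variances the kernel reduces to the dot product of the two supervectors, which makes the scaling by $w_i$ and $\Sigma_i^{-1}$ easy to check.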
Step 2: use the constructed GMM-based SVM classifier to make a VAD decision on the current far-end block. The signals to be detected that are input to the SVM classifier are the near-end and far-end signals divided into blocks. They are first transformed to the frequency domain by a Fourier transform, and the features of each signal block, such as the MFCC and the normalized cross-correlation, are then computed from the signal spectrum. This step comprises steps S201 to S203.
Step S201: extract the MFCC parameters of the current far-end block. The extraction process is the same as in step S101, and the MFCC parameters of the current far-end block are finally obtained by formula (1).
Step S202: generate the Gaussian supervector of the current far-end block. A GMM is built from the MFCC parameters of the current far-end block, and the trained GMM is used to construct the block's Gaussian supervector. The supervector is generated as in step S102, according to formulas (6) and (7).
Step S203: input the Gaussian supervector of the current far-end block into the constructed SVM classifier, and use the GMM-based SVM algorithm to classify it as speech or noise, which gives the far-end VAD decision. If the block is classified as noise, the decision is that no speech is present: filter updating and filtering are stopped, and the near-end speech signal is output directly. If it is classified as speech, the far end contains speech, and the double-talk decision of the next step is carried out.
Step 3: determine whether the system is in the double-talk state.
Step S301: compute the error signal.
The adaptive filter coefficients model the echo path, so convolving the current far-end block with the adaptive filter coefficients gives the estimated echo signal $x^{\mathsf T}(n)\,w(n)$. The error signal $e(n)$ is the difference between the near-end signal $d(n)$ of the block and the estimated echo:

$$e(n) = d(n) - x^{\mathsf T}(n)\,w(n)$$
The adaptive filter coefficients are continuously updated from the error signal and the far-end signal according to an adaptive algorithm. The update formula of a commonly used algorithm, LMS, is

$$w(n+1) = w(n) + 2\mu\, e(n)\, x(n) \tag{9}$$

where $\mu$ is the step size, $w(n)$ is the filter weight vector, $e(n)$ is the error signal, $x(n)$ is the far-end signal, and $n$ denotes the $n$-th time instant (sample).
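Formula (9) and the error definition can be exercised with a short time-domain sketch. It assumes a hypothetical 3-tap echo path, white far-end excitation and no near-end speech, so the weights should converge to the echo path; the `update` flag shows how coefficient adaptation can be frozen while filtering continues. The simulation described later uses a frequency-domain BFDAF/NLMS filter instead, so this only illustrates the LMS rule.

```python
import random


def lms_step(w, x_buf, d, mu, update=True):
    """One LMS iteration following formula (9).

    w:     filter weights w(n)
    x_buf: far-end vector x(n) = [x(n), x(n-1), ..., x(n-M+1)]
    d:     near-end (microphone) sample d(n)
    Returns the new weights and the error e(n) = d(n) - x^T(n) w(n).
    """
    y_hat = sum(wi * xi for wi, xi in zip(w, x_buf))   # estimated echo
    e = d - y_hat                                      # error signal
    if update:
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, x_buf)]
    return w, e


def identify_echo_path(h, num_taps=4, mu=0.05, n=5000, seed=0):
    """Adapt an LMS filter to a hypothetical echo path h with white far-end input."""
    rng = random.Random(seed)
    w = [0.0] * num_taps
    x_buf = [0.0] * num_taps
    for _ in range(n):
        x_buf = [rng.uniform(-1.0, 1.0)] + x_buf[:-1]   # shift in the newest far-end sample
        d = sum(hi * xi for hi, xi in zip(h, x_buf))     # near end carries echo only
        w, _ = lms_step(w, x_buf, d, mu)
    return w


print([round(v, 4) for v in identify_echo_path([0.5, -0.3, 0.1])])
```

With `update=False` the same call still returns the filtered error, which is the behaviour the method requires during double-talk.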
Step S302: compute the normalized cross-correlation between the far-end signal and the error signal. Because cross-correlation in the time domain becomes a product in the frequency domain (the spectrum of one signal multiplied bin by bin with the complex conjugate of the other), the normalized cross-correlation can be obtained directly from the far-end spectrum $X(k)$ and the error spectrum $E(k)$ at relatively low computational cost. It is computed in the frequency domain as follows:

where $\xi_{XECC}$ denotes the normalized cross-correlation between the far-end signal and the error signal and $k$ denotes the frequency bin.
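The frequency-domain formula itself is not reproduced in this text, so the exact normalization used by the method is unknown. As an assumption for illustration, the sketch below computes $\xi = |\sum_k X^*(k)E(k)| / \sqrt{\sum_k|X(k)|^2\sum_k|E(k)|^2}$ from the block spectra. By Parseval's theorem this equals the zero-lag normalized correlation of the time-domain blocks, and by the Cauchy-Schwarz inequality it lies in [0, 1], reaching 1 only when $E(k)$ is a constant multiple of $X(k)$; a practical detector would likely smooth the spectra over several blocks. A direct DFT stands in for the FFT, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct DFT of one block; an FFT returns the same X(k) faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def ncc_freq(far_block, err_block):
    """Assumed frequency-domain normalized cross-correlation between x and e.

    xi = |sum_k conj(X(k)) E(k)| / sqrt(sum_k |X(k)|^2 * sum_k |E(k)|^2)
    """
    X, E = dft(far_block), dft(err_block)
    cross = sum(xk.conjugate() * ek for xk, ek in zip(X, E))   # correlation as a spectral product
    px = sum(abs(xk) ** 2 for xk in X)
    pe = sum(abs(ek) ** 2 for ek in E)
    if pe == 0.0:
        return 1.0   # zero error: nothing in e that the far end does not explain
    if px == 0.0:
        return 0.0   # no far-end energy (normally excluded by the far-end VAD)
    return abs(cross) / math.sqrt(px * pe)


x = [math.sin(0.7 * t) + 0.3 * math.cos(1.9 * t) for t in range(64)]
print(round(ncc_freq(x, [0.2 * v for v in x]), 6))   # 1.0
```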
Step S303: DTD decision. The normalized cross-correlation $\xi_{XECC}$ between the far-end signal and the error signal is compared with the normalized cross-correlation threshold. When there is no near-end speech, $\xi_{XECC}$ should equal 1; when near-end speech is present, $\xi_{XECC}$ is less than 1. A constant $T_{XECC}$ slightly less than 1 can therefore be used as the threshold. $T_{XECC}$ generally lies between 0.9 and 1 and is updated in real time according to the detection results; the update algorithm is chosen according to the actual situation. A good threshold keeps both the false-alarm probability and the miss probability relatively small. For example, one can first pick an arbitrary constant slightly less than 1, set the near-end speech to 0, compute the false-alarm and miss probabilities, and adjust $T_{XECC}$ within a certain range until both probabilities are small.
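The threshold adjustment described above can be sketched as a grid search. Assumptions: labelled $\xi_{XECC}$ values are available from blocks known to have no near-end speech and from blocks known to contain it, the search covers [0.9, 1), and the criterion is the sum of the false-alarm and miss probabilities. The text leaves the update algorithm open, so this is one possible choice, and the function names are hypothetical.

```python
def error_rates(threshold, xi_silent, xi_double_talk):
    """False-alarm and miss probabilities for a candidate threshold T_XECC.

    A false alarm declares double-talk (xi < T) while the near end is silent;
    a miss fails to declare it (xi >= T) while the near end is talking.
    """
    p_fa = sum(1 for v in xi_silent if v < threshold) / len(xi_silent)
    p_miss = sum(1 for v in xi_double_talk if v >= threshold) / len(xi_double_talk)
    return p_fa, p_miss


def tune_threshold(xi_silent, xi_double_talk, lo=0.9, hi=1.0, steps=20):
    """Grid search over [lo, hi) for the T_XECC minimising P_fa + P_miss."""
    best_t, best_cost = lo, float("inf")
    for i in range(steps):
        t = lo + (hi - lo) * i / steps
        p_fa, p_miss = error_rates(t, xi_silent, xi_double_talk)
        if p_fa + p_miss < best_cost:
            best_t, best_cost = t, p_fa + p_miss
    return best_t


# Hypothetical labelled xi values
print(tune_threshold([0.99, 0.98, 0.995], [0.70, 0.85, 0.93]))
```

Weighting the two error probabilities differently would bias the detector toward freezing the filter (fewer misses) or toward adapting (fewer false alarms).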
When the normalized cross-correlation is less than the threshold, i.e.

$$\xi_{XECC} < T_{XECC} \tag{11}$$

the system is in the double-talk state: the filter coefficient update is stopped, and the near-end signal is filtered directly with the existing filter coefficients. Otherwise there is no near-end speech and only far-end speech is present, so the filter coefficients are updated and filtering is also performed.
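The three-way filtering strategy of steps 2 and 3 can be summarized in code. This is a sketch: the state names and functions are hypothetical, the far-end VAD result and $\xi_{XECC}$ are assumed to be computed elsewhere, and a single-sample LMS update stands in for the block adaptive filter.

```python
from enum import Enum


class VoiceState(Enum):
    FAR_SILENT = "far end silent"    # far-end VAD found no speech
    SINGLE_TALK = "far end only"     # far-end speech and xi >= T (no near-end speech)
    DOUBLE_TALK = "double talk"      # far-end speech and xi < T


def classify_state(far_has_speech, xi, threshold):
    """Far-end VAD first (step 2), then the cross-correlation DTD test (step 3)."""
    if not far_has_speech:
        return VoiceState.FAR_SILENT
    return VoiceState.DOUBLE_TALK if xi < threshold else VoiceState.SINGLE_TALK


def filter_policy(state):
    """Return (update_coefficients, apply_filter) for a voice state."""
    if state is VoiceState.FAR_SILENT:
        return False, False   # no reference signal: neither adapt nor filter
    if state is VoiceState.DOUBLE_TALK:
        return False, True    # freeze coefficients, keep removing the echo estimate
    return True, True         # far end only: adapt and filter


def process_sample(w, x_buf, d, mu, state):
    """Apply the policy to one near-end sample d using an LMS filter (illustrative)."""
    update, apply_filter = filter_policy(state)
    if not apply_filter:
        return w, d           # output the near-end signal directly
    e = d - sum(wi * xi for wi, xi in zip(w, x_buf))
    if update:
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, x_buf)]
    return w, e


state = classify_state(far_has_speech=True, xi=0.82, threshold=0.95)
print(state, filter_policy(state))   # VoiceState.DOUBLE_TALK (False, True)
```

Compared with a DTD-only design, the extra `FAR_SILENT` branch is what keeps the filter from adapting or subtracting anything when there is no reference signal.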
The voice state detection method proposed by the present invention was applied in a real echo cancellation system with two terminals, and the actual communication performance was verified with the VoIP software Sipdroid.
The proposed voice state detection method combining VAD and DTD was first simulated in MATLAB. The speech signals used in the simulation consist of a 30-second far-end speech PCM (Pulse Code Modulation) stream and a corresponding near-end speech PCM stream, both sampled at 8000 Hz. In the echo cancellation system, the filter length is set to 128, the adaptive filter algorithm is BFDAF (i.e. NLMS in the frequency domain), and voice state detection uses the method proposed by the present invention.
Fig. 2 shows the two PCM streams used in the simulation: from top to bottom, the far-end signal waveform and the near-end signal waveform. The abscissa is time in seconds and the ordinate is amplitude. With the original voice state detection method, i.e. energy-based DTD only, the echo cancellation result is shown in Fig. 3. Without the VAD improvement, echo cancellation in the first half is fairly good, although a small amount of residual echo remains; in the second half the result is poor: much of the original near-end sound is removed, and the signal after echo cancellation is considerably distorted.
With the voice state detection method proposed by the present invention, the echo cancellation result is shown in Fig. 4. Comparing the two PCM streams obtained after echo cancellation before and after the improvement shows that echo cancellation improves markedly with the improved voice state detection method: the residual echo is removed more thoroughly, and the near-end speech shows almost no distortion.
To further verify the effect of the proposed voice state detection method in a real echo cancellation system, the method was implemented in C and tested with the voice communication software Sipdroid.
Following the steps of the proposed voice state detection method, the VAD and DTD part of the WebRTC echo cancellation library was modified, and the modified library was then called from Sipdroid. Sipdroid was used to make and record real double-talk calls in various environments, and the speech PCM streams before and after echo cancellation were saved for analysis of the echo cancellation performance.
To make the extracted voice streams easier to observe and analyze, in each test the two callers counted from 1 to 10 in turn. In various environments, repeated speaking tests were carried out with the Sipdroid versions before and after the improvement for comparison.
First, repeated speaking tests were carried out on Sipdroid using the echo cancellation library before the improvement, and the far-end, near-end and post-echo-cancellation PCM streams were extracted. The test result is shown in Fig. 5, where only the counting portion of the PCM streams is shown. The first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. The echo cancellation is not ideal: some residual echo remains in the counting portion (circled by the dashed box). Most of the other test results are similar.
Then, the same procedure was used for repeated speaking tests on Sipdroid using the echo cancellation library after the improvement, and the far-end, near-end and post-echo-cancellation PCM streams were extracted. Fig. 6 shows a fairly representative test result. As in Fig. 5, the first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. With the improved speech detection method of the present invention, echo cancellation is fairly good: the residual echo in the counting portion is removed fairly thoroughly (see the part circled by the dashed box), while the original near-end sound is essentially unaffected. Repeated tests show that echo cancellation performance varies somewhat across environments, and its stability needs further improvement. In most cases, however, echo cancellation with the voice state detection method of the present invention is clearly better than before the improvement.
Claims (5)
1. A voice state detection method suitable for an echo cancellation system, characterized in that it is implemented by the following steps:
Step 1: construct a support vector machine (SVM) classifier using noise training samples and speech training samples;
Feature extraction and Gaussian mixture model (GMM) training are performed on the noise training samples and the speech training samples respectively to construct the corresponding Gaussian supervectors; the Gaussian supervectors are then used to construct the kernel function of the SVM classifier and the SVM models corresponding to the speech signal and the noise signal; the SVM classifier is obtained from the constructed kernel function and SVM models;
Step 2: the signals to be detected are the near-end and far-end signals divided into blocks; the constructed SVM classifier is used to make a VAD decision on the current far-end block, where VAD denotes voice activity detection;
Feature extraction and GMM training are performed on the current far-end block to construct its Gaussian supervector, which is then input into the constructed SVM classifier for a decision; if the result is noise, indicating no speech, filter updating and filtering are stopped and the near-end speech signal is output directly; otherwise the far end contains speech, and the double-talk decision of the next step is carried out;
Step 3: determine whether the system is in the double-talk state;
Calculate the normalized cross-correlation $\xi_{XECC}$ between the far-end signal and the error signal, and compare $\xi_{XECC}$ with the preset threshold $T_{XECC}$; when $\xi_{XECC} < T_{XECC}$, the system is in the double-talk state: stop updating the filter coefficients and filter the near-end signal; otherwise the near end has no speech, and the filter coefficients are updated and filtering is performed according to the far-end signal.
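Taken together, the three steps of claim 1 form a per-block decision procedure. The Python sketch below shows only that control flow; the names `State`, `decide_block`, `far_end_has_speech`, and `xcorr` are hypothetical, the SVM-based VAD and the cross-correlation are passed in as plain callables, and the default threshold 0.95 is an assumed value picked from the 0.9 to 1 range stated in claim 5.

```python
from enum import Enum
from typing import Callable, Sequence


class State(Enum):
    # Hypothetical labels for the three outcomes described in claim 1.
    FAR_END_SILENT = "bypass the filter and output the near-end signal"
    DOUBLE_TALK = "freeze the coefficients, keep filtering"
    FAR_END_ONLY = "update the coefficients and filter"


def decide_block(
    far_block: Sequence[float],
    err_block: Sequence[float],
    far_end_has_speech: Callable[[Sequence[float]], bool],  # Step 2: SVM-based VAD
    xcorr: Callable[[Sequence[float], Sequence[float]], float],  # Step 3: xi_XECC
    threshold: float = 0.95,  # T_XECC, an assumed value in [0.9, 1]
) -> State:
    """Return the processing state for one far-end block and its error block."""
    # Step 2: if the classifier says "noise", the far end has no speech.
    if not far_end_has_speech(far_block):
        return State.FAR_END_SILENT
    # Step 3: low far-end/error correlation means near-end speech is present.
    if xcorr(far_block, err_block) < threshold:
        return State.DOUBLE_TALK
    return State.FAR_END_ONLY
```

For example, `decide_block(x, e, lambda b: True, lambda a, b: 0.5)` returns `State.DOUBLE_TALK`. The error block is only examined once the VAD reports far-end speech, mirroring the order of Steps 2 and 3.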
2. The voice state detection method applicable to an echo cancellation system according to claim 1, characterized in that constructing the SVM classifier in Step 1 comprises the following steps:
Step S101: perform feature extraction on the noise signal training samples and the speech signal training samples; the features used are Mel-frequency cepstral coefficients (MFCC);
The MFCC extraction process is as follows: apply pre-emphasis, division into blocks, and windowing to the signal, and apply a fast Fourier transform (FFT) to each windowed block to obtain its spectral parameters; pass the spectral parameters of each block through a Mel-scale filter bank consisting of K triangular band-pass filters, and take the logarithm of the output of each band to obtain the log spectrum. With the K band-pass filters numbered 0 to K-1, let $S_i(k)$ denote the log spectrum of the i-th block at the output of filter k; then the l-th order MFCC parameter $m_i(l)$ of the i-th block is:
where L is the total order of the extracted MFCC;
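The formula for $m_i(l)$ did not survive extraction. The NumPy sketch below walks through the MFCC pipeline of Step S101 and assumes the common DCT-II form $m_i(l)=\sum_{k=0}^{K-1} S_i(k)\cos\left(\pi l (k+0.5)/K\right)$, $l = 1,\dots,L$, which matches the 0 to K-1 filter numbering above but is not confirmed by the source. The sample rate, frame length, hop size, Hamming window, pre-emphasis coefficient 0.97, and K = 26, L = 13 are illustrative assumptions.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(K, n_fft, sr):
    """K triangular band-pass filters spaced evenly on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), K + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((K, n_fft // 2 + 1))
    for k in range(K):
        left, center, right = bins[k], bins[k + 1], bins[k + 2]
        for j in range(left, center):  # rising edge of the triangle
            fb[k, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):  # falling edge of the triangle
            fb[k, j] = (right - j) / max(right - center, 1)
    return fb


def mfcc(samples, sr=16000, frame_len=512, hop=256, K=26, L=13, alpha=0.97):
    """Return an (n_blocks, L) array of MFCCs m_i(l), l = 1..L."""
    samples = np.asarray(samples, dtype=float)
    if len(samples) < frame_len:
        raise ValueError("signal shorter than one block")
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    x = np.append(samples[0], samples[1:] - alpha * samples[:-1])
    # Division into blocks, then Hamming windowing
    n_blocks = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_blocks)[:, None]
    blocks = x[idx] * np.hamming(frame_len)
    # FFT -> power spectrum of each block
    power = np.abs(np.fft.rfft(blocks, n=frame_len)) ** 2 / frame_len
    # Mel filter bank, then logarithm: S_i(k), k = 0..K-1
    S = np.log(power @ mel_filterbank(K, frame_len, sr).T + 1e-10)
    # Assumed DCT-II: m_i(l) = sum_k S_i(k) cos(pi * l * (k + 0.5) / K)
    l = np.arange(1, L + 1)[:, None]
    k = np.arange(K)[None, :]
    basis = np.cos(np.pi * l * (k + 0.5) / K)  # shape (L, K)
    return S @ basis.T  # shape (n_blocks, L)
```

Each row of the result is one L-dimensional feature vector x of the kind used to train the GMMs in Step S102.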
Step S102: generate the Gaussian supervectors of the noise signal training samples and the speech signal training samples;
Using the MFCC parameters of the noise signal training samples and of the speech signal training samples, respectively, build the Gaussian mixture models corresponding to the noise signal and the speech signal;
For a given block, the N-component Gaussian mixture model g(x) is expressed as:
$$g(x)=\sum_{i=1}^{N} w_i\,p_i(x)$$
where x is the L-dimensional feature vector formed by the MFCC parameters of this block of the training sample, $p_i(x)$ is the i-th Gaussian component of the GMM, $w_i$ is the weight of the i-th Gaussian component, $\Sigma_i$ is the covariance matrix of the i-th Gaussian component, and $\mu_i$ is the mean vector of the i-th Gaussian component;
The GMM g(x) can further be written as
$$g(x)=\sum_{i=1}^{N} w_i\,\mathcal{N}(x;\mu_i,\Sigma_i)$$
where $\mathcal{N}(\cdot)$ denotes the Gaussian probability density function;
Update the GMM parameters with the expectation-maximization (EM) algorithm. Let the GMM finally obtained for the speech signal training samples be g(s), with mean vector $\mu_i^{s}$ for the i-th Gaussian component, where s denotes the speech signal; let the GMM finally obtained for the noise signal training samples be g(n), with mean vector $\mu_i^{n}$ for the i-th Gaussian component, where n denotes the noise signal. Using the trained GMMs, the Gaussian supervectors $m_s$ and $m_n$ of the speech signal training samples and the noise signal training samples are constructed respectively as:
$$m_s=\left[(\mu_1^{s})^T,(\mu_2^{s})^T,\dots,(\mu_N^{s})^T\right]^T,\qquad m_n=\left[(\mu_1^{n})^T,(\mu_2^{n})^T,\dots,(\mu_N^{n})^T\right]^T$$
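As a sketch of Step S102, the NumPy code below fits a GMM to a matrix of MFCC vectors with EM and stacks the component means into a supervector. Diagonal covariances, random-row initialization, and a fixed iteration count are assumptions the patent does not state; in practice a library implementation such as scikit-learn's `GaussianMixture`, whose `means_` attribute holds the component means, would usually be used instead.

```python
import numpy as np


def fit_diag_gmm(X, N=4, n_iter=50, seed=0, eps=1e-6):
    """Fit an N-component GMM with diagonal covariances to the rows of X
    (one feature vector per row) using the EM algorithm."""
    rng = np.random.default_rng(seed)
    T = X.shape[0]
    mu = X[rng.choice(T, N, replace=False)].copy()  # initial means: random rows
    var = np.tile(X.var(axis=0) + eps, (N, 1))  # initial variances: global
    w = np.full(N, 1.0 / N)  # initial weights: uniform
    for _ in range(n_iter):
        # E-step: log(w_i * N(x_t; mu_i, diag(var_i))) for every row t and component i
        log_p = (np.log(w)
                 - 0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)
                 - 0.5 * np.sum((X[:, None, :] - mu[None]) ** 2 / var[None], axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)  # responsibilities, shape (T, N)
        # M-step: re-estimate weights, means, and variances
        Nk = gamma.sum(axis=0) + eps
        w = Nk / T
        mu = (gamma.T @ X) / Nk[:, None]
        var = np.maximum((gamma.T @ X ** 2) / Nk[:, None] - mu ** 2, eps)
    return w, mu, var


def supervector(mu):
    """Concatenate the N mean vectors (an N x D array) into one N*D vector,
    i.e. [mu_1^T, mu_2^T, ..., mu_N^T]^T."""
    return mu.reshape(-1)
```

Calling `supervector(fit_diag_gmm(X_speech)[1])` on speech MFCCs gives an $m_s$-style vector, and the same call on noise MFCCs gives $m_n$; both share the length N times D as long as N and the feature dimension match.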
Step S103: construct the SVM classifier from the constructed Gaussian supervectors;
Use the Gaussian supervectors $m_n$ and $m_s$ to build the SVM models corresponding to the noise signal and the speech signal, respectively;
Use the Gaussian supervectors $m_n$ and $m_s$ to construct the kernel function K(n, s) as follows:
Once the kernel function, the SVM model of the speech signal, and the SVM model of the noise signal are determined, the SVM classifier is obtained.
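The kernel K(n, s) is also missing from the extracted text. The sketch below uses the linear GMM-supervector kernel that is common in GMM-supervector SVM systems, $K(a,b)=\sum_{i=1}^{N} w_i\,(\mu_i^{a})^T\Sigma_i^{-1}\mu_i^{b}$; this is an assumed substitute, not the patent's own formula. It also assumes diagonal covariances and that the weights $w_i$ and covariances $\Sigma_i$ are shared by both models (for example, taken from a common background GMM), which the claim does not specify.

```python
import numpy as np


def gsv_kernel(mu_a, mu_b, w, var):
    """Assumed linear GMM-supervector kernel:
    K(a, b) = sum_i w_i * (mu_i^a)^T Sigma_i^{-1} mu_i^b, with diagonal Sigma_i.
    mu_a, mu_b: (N, D) component means; w: (N,) shared weights;
    var: (N, D) shared diagonal variances."""
    scale = np.sqrt(w)[:, None] / np.sqrt(var)  # sqrt(w_i) * Sigma_i^{-1/2}
    return float(np.sum((scale * mu_a) * (scale * mu_b)))


def gram_matrix(mus, w, var):
    """Kernel matrix over a list of (N, D) mean arrays."""
    n = len(mus)
    G = np.empty((n, n))
    for p in range(n):
        for q in range(n):
            G[p, q] = gsv_kernel(mus[p], mus[q], w, var)
    return G
```

Because the kernel is an inner product of scaled supervectors, the Gram matrix is symmetric and positive semidefinite, so it can be fed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.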
3. The voice state detection method applicable to an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the error signal is obtained as follows: convolve the current far-end signal block with the adaptive filter coefficients to obtain the estimated echo signal; the error signal is the difference between the current near-end signal block and the estimated echo signal.
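The error computation of claim 3 can be sketched in NumPy as below. The function name and the explicit `far_history` argument are assumptions: the claim only says the current far-end block is convolved with the adaptive filter coefficients, but a streaming implementation also needs the last len(h) - 1 far-end samples of the previous block so that the convolution stays continuous across block boundaries.

```python
import numpy as np


def error_block(far_history, far_block, near_block, h):
    """Return the error block e = d - y_hat for one block.

    far_history: the len(h) - 1 far-end samples preceding this block
    far_block:   current far-end block x (length B)
    near_block:  current near-end block d (length B)
    h:           adaptive FIR filter coefficients (length M)
    """
    x = np.concatenate([far_history, far_block])
    # "valid" mode yields exactly B outputs: the causal FIR output y_hat[n]
    # for each sample of the current far-end block (the estimated echo).
    echo_hat = np.convolve(x, h, mode="valid")
    return np.asarray(near_block, dtype=float) - echo_hat
```

If the near-end block contains only echo produced by exactly `h`, the returned error is zero; any near-end speech or filter mismatch shows up in the error block, which is what the double-talk test of claim 4 examines.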
4. The voice state detection method applicable to an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the normalized cross-correlation $\xi_{XECC}$ between the far-end signal and the error signal is calculated according to the following formula:
where k denotes frequency, X(k) is the far-end signal spectrum, and E(k) is the error signal spectrum.
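The $\xi_{XECC}$ formula itself is missing from the extracted claim. The sketch below assumes a common magnitude-normalized form, $\xi_{XECC}=\left|\sum_k X(k)E^{*}(k)\right| / \sqrt{\sum_k |X(k)|^2 \sum_k |E(k)|^2}$, which lies in [0, 1] by the Cauchy-Schwarz inequality; the function names are hypothetical, and the fixed threshold 0.95 is an assumed value inside the 0.9 to 1 range of claim 5, whose real-time update rule is not reproduced here.

```python
import numpy as np


def xi_xecc(far_block, err_block, eps=1e-12):
    """Assumed normalized cross-correlation between far-end and error spectra."""
    X = np.fft.rfft(far_block)  # far-end spectrum X(k)
    E = np.fft.rfft(err_block)  # error spectrum E(k)
    num = np.abs(np.sum(X * np.conj(E)))
    den = np.sqrt(np.sum(np.abs(X) ** 2) * np.sum(np.abs(E) ** 2)) + eps
    return float(num / den)


def is_double_talk(far_block, err_block, threshold=0.95):
    """Claim 1, Step 3: xi_XECC < T_XECC indicates double talk."""
    return xi_xecc(far_block, err_block) < threshold
```

The intuition behind the comparison: when only the far end is active, the error is residual echo that is strongly correlated with the far-end spectrum, so $\xi_{XECC}$ stays near 1; near-end speech adds an uncorrelated component to the error and pulls $\xi_{XECC}$ below the threshold.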
5. The voice state detection method applicable to an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the preset threshold $T_{XECC}$ is a value between 0.9 and 1 and is updated in real time according to the decision results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610519040.6A CN105957520B (en) | 2016-07-04 | 2016-07-04 | A kind of voice status detection method suitable for echo cancelling system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105957520A true CN105957520A (en) | 2016-09-21 |
CN105957520B CN105957520B (en) | 2019-10-11 |
Family ID: 56903377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610519040.6A Active CN105957520B (en) | 2016-07-04 | 2016-07-04 | A kind of voice status detection method suitable for echo cancelling system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105957520B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012009047A1 (en) * | 2010-07-12 | 2012-01-19 | Audience, Inc. | Monaural noise suppression based on computational auditory scene analysis |
WO2013040414A1 (en) * | 2011-09-16 | 2013-03-21 | Qualcomm Incorporated | Mobile device context information using speech detection |
CN103151039A (en) * | 2013-02-07 | 2013-06-12 | 中国科学院自动化研究所 | Speaker age identification method based on SVM (Support Vector Machine) |
CN103258532A (en) * | 2012-11-28 | 2013-08-21 | 河海大学常州校区 | Method for recognizing Chinese speech emotions based on fuzzy support vector machine |
CN105657110A (en) * | 2016-02-26 | 2016-06-08 | 深圳Tcl数字技术有限公司 | Voice communication echo cancellation method and device |
2016-07-04: application CN201610519040.6A (CN), granted as CN105957520B, status Active
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106448661B (en) * | 2016-09-23 | 2019-07-16 | 华南理工大学 | Audio types detection method based on clean speech and the modeling of ambient noise the two poles of the earth |
CN106448661A (en) * | 2016-09-23 | 2017-02-22 | 华南理工大学 | Audio type detection method based on pure voice and background noise two-level modeling |
CN108429994A (en) * | 2017-02-15 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Audio identification, echo cancel method, device and equipment |
CN108429994B (en) * | 2017-02-15 | 2020-10-09 | 阿里巴巴集团控股有限公司 | Audio identification and echo cancellation method, device and equipment |
CN109215672A (en) * | 2017-07-05 | 2019-01-15 | 上海谦问万答吧云计算科技有限公司 | A kind of processing method of acoustic information, device and equipment |
CN109309764B (en) * | 2017-07-28 | 2021-09-03 | 北京搜狗科技发展有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN109309764A (en) * | 2017-07-28 | 2019-02-05 | 北京搜狗科技发展有限公司 | Audio data processing method, device, electronic equipment and storage medium |
US11151976B2 (en) | 2017-10-19 | 2021-10-19 | Zhejiang Dahua Technology Co., Ltd. | Methods and systems for operating a signal filter device |
WO2019076328A1 (en) * | 2017-10-19 | 2019-04-25 | Zhejiang Dahua Technology Co., Ltd. | Methods and systems for operating a signal filter device |
CN107888792A (en) * | 2017-10-19 | 2018-04-06 | 浙江大华技术股份有限公司 | A kind of echo cancel method, apparatus and system |
CN107888792B (en) * | 2017-10-19 | 2019-09-17 | 浙江大华技术股份有限公司 | A kind of echo cancel method, apparatus and system |
CN109068012A (en) * | 2018-07-06 | 2018-12-21 | 南京时保联信息科技有限公司 | A kind of double talk detection method for audio conference system |
CN109348072B (en) * | 2018-08-30 | 2021-03-02 | 湖北工业大学 | Double-end call detection method applied to echo cancellation system |
CN109348072A (en) * | 2018-08-30 | 2019-02-15 | 湖北工业大学 | A kind of double talk detection method applied to acoustic echo cancellation system |
CN109473123A (en) * | 2018-12-05 | 2019-03-15 | 百度在线网络技术(北京)有限公司 | Voice activity detection method and device |
US11127416B2 (en) | 2018-12-05 | 2021-09-21 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for voice activity detection |
CN109379501B (en) * | 2018-12-17 | 2021-12-21 | 嘉楠明芯(北京)科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109448748A (en) * | 2018-12-17 | 2019-03-08 | 杭州嘉楠耘智信息科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109493878B (en) * | 2018-12-17 | 2021-08-31 | 嘉楠明芯(北京)科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109448748B (en) * | 2018-12-17 | 2021-08-03 | 嘉楠明芯(北京)科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109379501A (en) * | 2018-12-17 | 2019-02-22 | 杭州嘉楠耘智信息科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109493878A (en) * | 2018-12-17 | 2019-03-19 | 杭州嘉楠耘智信息科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109547655A (en) * | 2018-12-30 | 2019-03-29 | 广东大仓机器人科技有限公司 | A kind of method of the echo cancellation process of voice-over-net call |
CN111294473B (en) * | 2019-01-28 | 2022-01-04 | 展讯通信(上海)有限公司 | Signal processing method and device |
CN111294473A (en) * | 2019-01-28 | 2020-06-16 | 展讯通信(上海)有限公司 | Signal processing method and device |
CN112133324A (en) * | 2019-06-06 | 2020-12-25 | 北京京东尚科信息技术有限公司 | Call state detection method, device, computer system and medium |
CN110246516A (en) * | 2019-07-25 | 2019-09-17 | 福建师范大学福清分校 | The processing method of small space echo signal in a kind of voice communication |
CN112614500A (en) * | 2019-09-18 | 2021-04-06 | 北京声智科技有限公司 | Echo cancellation method, device, equipment and computer storage medium |
CN110944089A (en) * | 2019-11-04 | 2020-03-31 | 中移(杭州)信息技术有限公司 | Double-talk detection method and electronic equipment |
CN111049848A (en) * | 2019-12-23 | 2020-04-21 | 腾讯科技(深圳)有限公司 | Call method, device, system, server and storage medium |
US11842751B2 (en) | 2019-12-23 | 2023-12-12 | Tencent Technology (Shenzhen) Company Limited | Call method, apparatus, and system, server, and storage medium |
CN111049848B (en) * | 2019-12-23 | 2021-11-23 | 腾讯科技(深圳)有限公司 | Call method, device, system, server and storage medium |
CN111048118B (en) * | 2019-12-24 | 2022-07-26 | 大众问问(北京)信息科技有限公司 | Voice signal processing method and device and terminal |
CN111048118A (en) * | 2019-12-24 | 2020-04-21 | 大众问问(北京)信息科技有限公司 | Voice signal processing method and device and terminal |
CN111161748A (en) * | 2020-02-20 | 2020-05-15 | 百度在线网络技术(北京)有限公司 | Double-talk state detection method and device and electronic equipment |
US11804235B2 (en) | 2020-02-20 | 2023-10-31 | Baidu Online Network Technology (Beijing) Co., Ltd. | Double-talk state detection method and device, and electronic device |
CN114242106A (en) * | 2020-09-09 | 2022-03-25 | 中车株洲电力机车研究所有限公司 | Voice processing method and device |
CN112637833B (en) * | 2020-12-21 | 2022-10-11 | 新疆品宣生物科技有限责任公司 | Communication terminal information detection method and equipment |
CN112637833A (en) * | 2020-12-21 | 2021-04-09 | 新疆品宣生物科技有限责任公司 | Communication terminal information detection method and device |
CN113223546A (en) * | 2020-12-28 | 2021-08-06 | 南京愔宜智能科技有限公司 | Audio and video conference system and echo cancellation device for same |
CN113241085A (en) * | 2021-04-29 | 2021-08-10 | 北京梧桐车联科技有限责任公司 | Echo cancellation method, device, equipment and readable storage medium |
CN113241085B (en) * | 2021-04-29 | 2022-07-22 | 北京梧桐车联科技有限责任公司 | Echo cancellation method, device, equipment and readable storage medium |
CN115273909A (en) * | 2022-07-28 | 2022-11-01 | 歌尔科技有限公司 | Voice activity detection method, device, equipment and computer readable storage medium |
CN115273909B (en) * | 2022-07-28 | 2024-07-30 | 歌尔科技有限公司 | Voice activity detection method, device, equipment and computer readable storage medium |
CN117437929A (en) * | 2023-12-21 | 2024-01-23 | 睿云联(厦门)网络通讯技术有限公司 | Real-time echo cancellation method based on neural network |
CN117437929B (en) * | 2023-12-21 | 2024-03-08 | 睿云联(厦门)网络通讯技术有限公司 | Real-time echo cancellation method based on neural network |
CN118645113A (en) * | 2024-08-14 | 2024-09-13 | 腾讯科技(深圳)有限公司 | Voice signal processing method, device, equipment, medium and product |
CN118645113B (en) * | 2024-08-14 | 2024-10-29 | 腾讯科技(深圳)有限公司 | Voice signal processing method, device, equipment, medium and product |
Also Published As
Publication number | Publication date |
---|---|
CN105957520B (en) | 2019-10-11 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |