CN105810201A - Voice activity detection method and system - Google Patents


Info

Publication number
CN105810201A
Authority
CN
China
Prior art keywords
signal
noise
noise ratio
present frame
spectrum density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410853931.6A
Other languages
Chinese (zh)
Other versions
CN105810201B (en)
Inventor
孙廷玮
林福辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN201410853931.6A
Publication of CN105810201A
Application granted
Publication of CN105810201B
Legal status: Active
Anticipated expiration

Landscapes

  • Time-Division Multiplex Systems (AREA)
  • Noise Elimination (AREA)

Abstract

The invention provides a voice activity detection method and system. The method comprises: calculating the spectral density of the current frame of an audio signal; estimating the expected value of the noise spectral density; calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the noise spectral density; and generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold. The detection result is thus tied to the statistical probability distribution of the noise, which overcomes the effect of noise on the detection result. In addition, the preset threshold is a dynamic threshold that follows changes in the noise, so the detection result adapts to the noise environment of the current frame.

Description

Voice activity detection method and system
Technical field
The present invention relates to speech recognition technology, and in particular to a voice activity detection method and system.
Background
Voice activity detection (VAD), also called speech detection, detects the presence or absence of speech during speech processing so that the speech segments and non-speech segments of a signal can be separated. VAD can be used for echo cancellation, noise suppression, speaker identification, speech recognition, and the like.
Traditional VAD algorithms usually base their decision on features such as the short-time energy, spectral energy, and zero-crossing rate of the audio signal. They therefore perform well in clean-speech and high signal-to-noise-ratio environments. With a low signal-to-noise ratio or non-stationary background noise, however, the detection result is affected by the noise features and performance degrades.
As speech recognition technology develops, the demands on voice activity detection keep increasing. A VAD method is therefore needed that maintains good detection performance in harsh environments, for example with non-stationary noise or a low signal-to-noise ratio.
Summary of the invention
The problem addressed by the present invention is to keep voice activity detection performance from degrading in environments with non-stationary noise or a low signal-to-noise ratio.
To solve this problem, the invention provides a voice activity detection method, comprising: calculating the spectral density of the current frame of an audio signal; calculating the expected value of the noise spectral density; calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the expected value of the noise spectral density; and generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
Optionally, the expected value of the noise spectral density is calculated based on the statistical distribution of the noise.
Optionally, the signal-to-noise ratio is calculated based on the formula:
where SNR denotes the signal-to-noise ratio.
Optionally, the preset threshold is a dynamic threshold that changes as the signal-to-noise ratio changes.
Optionally, the dynamic threshold is calculated based on the formula:
$$\gamma = 2 \cdot D \cdot \operatorname{erfc}^{-1}\left(2 P_{FA}\right)$$
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
Optionally, when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result is that the current frame of the audio signal is a speech segment; when the signal-to-noise ratio is less than the preset threshold, the generated result is that the current frame of the audio signal is a non-speech segment.
The invention also provides a voice activity detection system, comprising: a receiving unit for receiving an audio signal; a processing unit for calculating the signal-to-noise ratio of the current frame, wherein the signal-to-noise ratio of the current frame is calculated based on the spectral density of the current frame of the audio signal and the expected value of the noise spectral density; and a judging unit configured to generate a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
Optionally, the processing unit comprises: a first computing unit for calculating the spectral density of the current frame of the audio signal; a second computing unit for calculating the expected value of the noise spectral density; and a third computing unit for calculating the signal-to-noise ratio of the current frame.
Optionally, the expected value of the noise spectral density is calculated based on the statistical distribution of the noise.
Optionally, the signal-to-noise ratio is calculated based on the formula:
where SNR denotes the signal-to-noise ratio.
Optionally, the preset threshold is a dynamic threshold that changes as the signal-to-noise ratio changes.
Optionally, the processing unit further comprises a fourth computing unit for calculating the dynamic threshold.
Optionally, the dynamic threshold is calculated based on the formula:
$$\gamma = 2 \cdot D \cdot \operatorname{erfc}^{-1}\left(2 P_{FA}\right)$$
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
Optionally, when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result is that the current frame of the audio signal is a speech segment; when the signal-to-noise ratio is less than the preset threshold, the generated result is that the current frame of the audio signal is a non-speech segment.
Compared with the prior art, the technical solution of the invention has the following advantages:
First, the VAD decision of the invention is based on the statistical distribution of the noise rather than on the statistical distribution of the speech signal. Specifically, the method gathers statistics on the probability distribution of the noise, estimates the expected value of the noise spectral density from them, and then generates the decision. Because real-world noise is a long-term stationary signal, as long as the probability distribution of the noise is properly characterized, the VAD decision achieves good detection performance whether the current frame of the signal under test lies in a stationary or a non-stationary noise environment.
Further, the VAD decision uses a dynamic threshold that is related to the variation of the noise, so the threshold changes as the noise signal changes and adapts well to the current background environment.
Brief description of the drawings
Fig. 1 is a flowchart of a voice activity detection method according to an embodiment of the invention;
Fig. 2 is a structural diagram of a voice activity detection system according to an embodiment of the invention; and
Fig. 3 is a structural diagram of the processing unit in the voice activity detection system according to an embodiment of the invention.
Detailed description
To make the above objects, features, and advantages of the invention clearer, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a voice activity detection method 100 according to an embodiment of the invention. The method comprises the following steps.
S101: calculate the spectral density of the current frame of the audio signal.
In method 100, voice activity detection is performed frame by frame. Specifically, the audio signal is divided into multiple frames, each frame is examined separately, and the speech segments and non-speech segments of the audio signal are thereby determined. The length of each frame can be set in the range of 10 ms to 30 ms. The current frame of the audio signal is thus the frame on which voice activity detection is currently to be performed.
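The frame-by-frame processing described above can be illustrated with a minimal Python sketch. The function name `split_into_frames`, the non-overlapping framing, and the 20 ms default are assumptions made for illustration; the text only states that the frame length can be set between 10 ms and 30 ms.

```python
def split_into_frames(samples, sample_rate, frame_ms=20):
    """Split a sequence of samples into consecutive, non-overlapping frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    if not 10 <= frame_ms <= 30:
        raise ValueError("frame_ms is expected to lie in the 10-30 ms range")
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]
```

At a sampling rate of 8 kHz, for example, a 20 ms frame holds 160 samples.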
In some embodiments, the spectral density of the current frame is its power spectral density (PSD), that is, the power of the current frame's audio signal per unit frequency, which serves as the feature measure of the current frame.
In some embodiments, the spectral density of the current frame can be calculated with the Bartlett spectral estimation algorithm (Bartlett's method). In other embodiments, the periodogram algorithm can be used instead. The invention places no limit on the algorithm used to calculate the spectral density of the current frame.
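As an illustration of the two estimators named above, the following sketch computes a periodogram and a Bartlett estimate (the average of the periodograms of non-overlapping segments of the frame) using only the Python standard library. The function names, the 1/N scaling, and the default of four segments are assumptions, not details taken from the text; a practical implementation would normally use an FFT rather than this direct DFT.

```python
import cmath


def periodogram(frame):
    """Periodogram PSD estimate: |DFT(x)[k]|^2 / N for k = 0 .. N // 2."""
    n = len(frame)
    psd = []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k (O(N) per bin, fine for a short sketch).
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        psd.append(abs(acc) ** 2 / n)
    return psd


def bartlett_psd(frame, n_segments=4):
    """Bartlett estimate: average the periodograms of non-overlapping segments."""
    seg_len = len(frame) // n_segments
    if seg_len == 0:
        raise ValueError("frame too short for the requested number of segments")
    segments = [frame[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
    spectra = [periodogram(seg) for seg in segments]
    # Average bin by bin across the segment periodograms.
    return [sum(bins) / n_segments for bins in zip(*spectra)]
```

Averaging over segments lowers the variance of the estimate at the cost of frequency resolution, which is the usual reason for preferring Bartlett's method over a single periodogram.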
S103: calculate the expected value of the noise spectral density.
The expected value of the noise spectral density is calculated from the statistical distribution of the noise signal, where the noise statistics are gathered in a pure-noise environment containing no speech signal.
In some embodiments, the noise spectral density can be estimated with an algorithm that has a smaller variance, such as the Bartlett algorithm mentioned above, to improve the detection performance of the VAD. This is because method 100 involves the variance of the noise when making the VAD decision for the current frame, that is, when deciding whether the current frame is a speech segment or a non-speech segment, as detailed in the steps below.
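One simple way to approximate the expected value of the noise spectral density is to average, bin by bin, PSD estimates taken from frames known to contain only noise. The sketch below does this; the sample-mean estimator and the name `expected_noise_psd` are assumptions, since the text does not specify how the expectation is computed.

```python
def expected_noise_psd(noise_psds):
    """Per-bin sample mean of PSD estimates taken from noise-only frames.

    noise_psds: list of equal-length PSD vectors, one per noise-only frame.
    Assumption: the sample mean stands in for the expected value.
    """
    if not noise_psds:
        raise ValueError("need at least one noise-only PSD estimate")
    n = len(noise_psds)
    return [sum(bins) / n for bins in zip(*noise_psds)]
```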
Note that the voice activity detection method 100 rests on several assumptions. First, the speech signal and the noise signal are assumed to be mutually independent, with no correlation between them. Second, the noise signal is assumed to be stationary over the long term, while the speech signal is stationary only over the short term. These assumptions are consistent with practical conditions, so a method and system built on them has practical value.
S105: calculate the signal-to-noise ratio (SNR) based on the spectral density of the current frame and the expected value of the noise spectral density.
The signal-to-noise ratio can be calculated based on the formula:
Here the expected value of the noise spectral density is calculated from the statistical distribution of the noise signal, so the signal-to-noise ratio of the current frame is likewise obtained from the statistical distribution of the noise.
S107: calculate the dynamic threshold γ based on the variance of the SNR.
The dynamic threshold γ can be calculated based on the formula:
$$\gamma = 2 \cdot D \cdot \operatorname{erfc}^{-1}\left(2 P_{FA}\right)$$
where D denotes the variance of the SNR and P_FA denotes the false-alarm probability. The false-alarm probability is the probability that noise is mistaken for speech, that is, the probability that the SNR is judged to exceed γ when no speech signal is present.
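The threshold computation can be sketched in Python as follows. The sketch follows the expression as printed in the text, with the factor 2·D. For comparison, if the noise-only SNR were zero-mean Gaussian with variance D, the threshold that meets a false-alarm probability P_FA would be sqrt(2·D)·erfc⁻¹(2·P_FA). The helper `erfc_inv` is an assumption introduced here because the Python standard library has no inverse complementary error function; it is derived from the normal quantile function.

```python
import math
from statistics import NormalDist


def erfc_inv(y):
    """Inverse complementary error function for 0 < y < 2."""
    if not 0.0 < y < 2.0:
        raise ValueError("erfc_inv is defined here only for 0 < y < 2")
    # erfc(x) = 2 * (1 - Phi(x * sqrt(2))), so x = -Phi^{-1}(y / 2) / sqrt(2).
    return -NormalDist().inv_cdf(y / 2.0) / math.sqrt(2.0)


def dynamic_threshold(snr_variance, p_fa):
    """Threshold as printed in the text: gamma = 2 * D * erfc^{-1}(2 * P_FA)."""
    return 2.0 * snr_variance * erfc_inv(2.0 * p_fa)
```

With this expression the threshold is positive only for P_FA below 0.5, and it rises as the permitted false-alarm probability falls or as the SNR variance grows.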
The dynamic threshold therefore depends on the variance of the SNR, that is, on how the SNR varies. When the noise is non-stationary, the variance of the SNR changes, so the value of the dynamic threshold changes with the noise. Method 100 can thus adapt to changes in the noise, and its detection performance does not degrade in environments with non-stationary noise or an unstable signal-to-noise ratio.
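The text does not say how the variance D of the SNR is obtained. One possible approach, shown here purely as an assumption, is to track the population variance of the SNR over a sliding window of recent frames, so that D (and with it the threshold) follows changes in the noise. The class name `SnrVarianceTracker` and the window length are hypothetical.

```python
from collections import deque
from statistics import pvariance


class SnrVarianceTracker:
    """Sliding-window estimate of the SNR variance (hypothetical helper)."""

    def __init__(self, window=50):
        # Only the most recent `window` SNR values are kept.
        self._history = deque(maxlen=window)

    def update(self, snr):
        """Record the SNR of a new frame and return the current variance."""
        self._history.append(snr)
        if len(self._history) < 2:
            return 0.0
        return pvariance(self._history)
```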
In addition, the dynamic threshold depends on the false-alarm probability P_FA. In practical applications, the performance of method 100 can be improved by controlling the false-alarm probability.
S109: generate the VAD decision for the current frame based on the SNR and γ.
When the SNR is greater than γ, the VAD decision is that the current frame of the audio signal is a speech segment; when the SNR is less than γ, the decision is that the current frame is a non-speech segment.
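The decision rule can be written as a short sketch. The function name `classify_frames` is hypothetical, and treating the boundary case SNR = γ as non-speech is an assumption, because the text only covers the cases "greater than" and "less than".

```python
def classify_frames(snrs, thresholds):
    """Return one boolean per frame: True for speech, False for non-speech.

    snrs and thresholds are equal-length sequences holding, for each frame,
    its SNR and the threshold in force for that frame.
    """
    # Assumption: SNR exactly equal to the threshold counts as non-speech.
    return [snr > gamma for snr, gamma in zip(snrs, thresholds)]
```

Passing the same value for every entry of `thresholds` gives a fixed-threshold decision, while passing per-frame values gives a threshold that varies over time.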
In some embodiments, the VAD decision can instead be generated from the SNR of the current frame and a fixed threshold. Specifically, when the SNR is greater than the fixed threshold, the current frame is judged to be a speech segment; when the SNR is less than the fixed threshold, the current frame is judged to be a non-speech segment.
The VAD method 100 is therefore based on the statistical distribution of the noise rather than on that of the speech. The VAD decision also depends on the real-time signal-to-noise ratio (the SNR of the current frame) and on the variation of the noise (the variance of the SNR), which overcomes the effect of non-stationary noise or a low signal-to-noise ratio on the decision.
In addition, the VAD decision only needs the SNR of the current frame and does not need the a priori and a posteriori SNR. Method 100 is therefore simpler, which improves detection efficiency.
Fig. 2 illustrates a voice activity detection system 200 according to an embodiment of the invention. The system comprises: a receiving unit 201 for receiving an audio signal; a processing unit 203 for calculating the signal-to-noise ratio of the current frame of the audio signal, where the signal-to-noise ratio is calculated from the spectral density of the current frame and the expected value of the noise spectral density; and a judging unit 205 for generating the VAD decision, where the decision is based on the signal-to-noise ratio calculated by processing unit 203 and a preset threshold.
Referring to Fig. 3, processing unit 203 comprises a first computing unit 2031 for calculating the spectral density of the current frame, a second computing unit 2033 for calculating the expected value of the noise spectral density, and a third computing unit 2035 for calculating the signal-to-noise ratio of the current frame.
The signal-to-noise ratio (SNR) calculated by the third computing unit 2035 is based on the formula:
Here the expected value of the noise spectral density is calculated from the statistical distribution of the noise signal, so the signal-to-noise ratio of the current frame is likewise obtained from the statistical distribution of the noise. In addition, the noise statistics are gathered under pure-noise conditions with no speech activity.
In some embodiments, the spectral density of the current frame and the noise spectral density can be calculated with the Bartlett spectral estimation algorithm, the periodogram algorithm, or the like; the invention places no limit on this. Note, however, that the noise spectral density is preferably estimated with an algorithm that has a smaller variance, such as the Bartlett algorithm mentioned above, to improve the detection performance of the VAD method, because the judging unit of system 200 generates the VAD decision based on the variance of the noise.
The spectral density can be the power spectral density (PSD), that is, the power of the current frame's audio signal per unit frequency, which serves as the measure of the current frame.
In some embodiments, the preset threshold is a dynamic threshold calculated by processing unit 203. Specifically, processing unit 203 further comprises a fourth computing unit 2037 for calculating the dynamic threshold γ, which can be calculated based on the formula:
$$\gamma = 2 \cdot D \cdot \operatorname{erfc}^{-1}\left(2 P_{FA}\right)$$
where D denotes the variance of the SNR and P_FA denotes the false-alarm probability. The false-alarm probability is the probability that noise is mistaken for speech, that is, the probability that the SNR is judged to exceed γ when no speech signal is present.
The dynamic threshold therefore depends on the variance of the SNR, that is, on how the SNR varies. When the noise is non-stationary, the variance of the SNR changes, so the value of the dynamic threshold changes with the noise, and the VAD decision can adapt to changes in the noise.
In addition, the dynamic threshold depends on the false-alarm probability P_FA. In practical applications, the accuracy of the VAD decision can be improved by reducing the false-alarm probability.
When judging unit 205 makes the VAD decision, the current frame is judged to be a speech segment if its SNR is greater than the dynamic threshold γ, and a non-speech segment if its SNR is less than the dynamic threshold γ.
In system 200, the VAD decision depends on the real-time signal-to-noise ratio (the SNR of the current frame) and on the variation of the noise (the variance of the SNR). This overcomes the effect of non-stationary noise or a low signal-to-noise ratio on the VAD decision. In addition, the VAD decision for each frame only needs the current SNR and not the a priori and a posteriori SNR, so the decision method is simpler.
By making the VAD decision frame by frame, system 200 determines the speech segments and non-speech segments of the audio signal.
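Turning per-frame decisions into speech and non-speech segments amounts to merging runs of identical decisions. The sketch below shows one way to do this with the standard library; the function name `frames_to_segments` and the (start, end, label) output format are assumptions made for illustration.

```python
from itertools import groupby


def frames_to_segments(labels):
    """Merge consecutive identical frame labels into (start, end, label) runs.

    start is inclusive and end is exclusive, both counted in frames.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments
```

Multiplying the frame indices by the frame length converts the segment boundaries to time.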
System 200 can further comprise an execution unit 207 configured to perform different operations, such as recognition or decoding, on the different frames (speech segments and non-speech segments) of the audio signal based on the VAD decision of judging unit 205.
Although the invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes or modifications without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be as defined by the claims.

Claims (14)

1. A voice activity detection method, characterized by comprising:
calculating the spectral density of the current frame of an audio signal;
calculating the expected value of the noise spectral density;
calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the expected value of the noise spectral density; and
generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
2. The method according to claim 1, characterized in that the expected value of the noise spectral density is calculated based on the statistical distribution of the noise.
3. The method according to claim 1, characterized in that the signal-to-noise ratio is calculated based on the formula:
where SNR denotes the signal-to-noise ratio.
4. The method according to claim 1, characterized in that the preset threshold is a dynamic threshold that changes as the signal-to-noise ratio changes.
5. The method according to claim 4, characterized in that the dynamic threshold is calculated based on the formula:
$$\gamma = 2 \cdot D \cdot \operatorname{erfc}^{-1}\left(2 P_{FA}\right)$$
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
6. The method according to claim 1, characterized in that when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result is that the current frame of the audio signal is a speech segment; and when the signal-to-noise ratio is less than the preset threshold, the generated result is that the current frame of the audio signal is a non-speech segment.
7. A voice activity detection system, characterized by comprising:
a receiving unit for receiving an audio signal;
a processing unit for calculating the signal-to-noise ratio of the current frame, wherein the signal-to-noise ratio of the current frame is calculated based on the spectral density of the current frame of the audio signal and the expected value of the noise spectral density; and
a judging unit configured to generate a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
8. The system according to claim 7, characterized in that the processing unit comprises: a first computing unit for calculating the spectral density of the current frame of the audio signal; a second computing unit for calculating the expected value of the noise spectral density; and a third computing unit for calculating the signal-to-noise ratio of the current frame.
9. The system according to claim 7, characterized in that the expected value of the noise spectral density is calculated based on the statistical distribution of the noise.
10. The system according to claim 7, characterized in that the signal-to-noise ratio is calculated based on the formula:
where SNR denotes the signal-to-noise ratio.
11. The system according to claim 7, characterized in that the preset threshold is a dynamic threshold that changes as the signal-to-noise ratio changes.
12. The system according to claim 11, characterized in that the processing unit further comprises a fourth computing unit for calculating the dynamic threshold.
13. The system according to claim 12, characterized in that the dynamic threshold is calculated based on the formula:
$$\gamma = 2 \cdot D \cdot \operatorname{erfc}^{-1}\left(2 P_{FA}\right)$$
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
14. The system according to claim 7, characterized in that when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result is that the current frame of the audio signal is a speech segment; and when the signal-to-noise ratio is less than the preset threshold, the generated result is that the current frame of the audio signal is a non-speech segment.
CN201410853931.6A 2014-12-31 2014-12-31 Voice activity detection method and its system Active CN105810201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410853931.6A CN105810201B (en) 2014-12-31 2014-12-31 Voice activity detection method and its system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410853931.6A CN105810201B (en) 2014-12-31 2014-12-31 Voice activity detection method and its system

Publications (2)

Publication Number Publication Date
CN105810201A true CN105810201A (en) 2016-07-27
CN105810201B CN105810201B (en) 2019-07-02

Family

ID=56464829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410853931.6A Active CN105810201B (en) 2014-12-31 2014-12-31 Voice activity detection method and its system

Country Status (1)

Country Link
CN (1) CN105810201B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1354455A (en) * 2000-11-18 2002-06-19 深圳市中兴通讯股份有限公司 Sound activation detection method for identifying speech and music from noise environment
CN101010722A (en) * 2004-08-30 2007-08-01 诺基亚公司 Detection of voice activity in an audio signal
CN1783211A (en) * 2004-11-25 2006-06-07 Lg电子株式会社 Speech detection method
CN101080765A (en) * 2005-05-09 2007-11-28 株式会社东芝 Voice activity detection apparatus and method
CN101197130A (en) * 2006-12-07 2008-06-11 华为技术有限公司 Sound activity detecting method and detector thereof
CN101599269A (en) * 2009-07-02 2009-12-09 中国农业大学 Sound end detecting method and device
CN102800322A (en) * 2011-05-27 2012-11-28 中国科学院声学研究所 Method for estimating noise power spectrum and voice activity
CN103903634A (en) * 2012-12-25 2014-07-02 中兴通讯股份有限公司 Voice activation detection (VAD), and method and apparatus for the VAD

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106356070B (en) * 2016-08-29 2019-10-29 广州市百果园网络科技有限公司 A kind of acoustic signal processing method and device
CN106356070A (en) * 2016-08-29 2017-01-25 广州市百果园网络科技有限公司 Audio signal processing method and device
CN106384597A (en) * 2016-08-31 2017-02-08 广州市百果园网络科技有限公司 Audio frequency data processing method and device
CN106910507A (en) * 2017-01-23 2017-06-30 中国科学院声学研究所 A kind of method and system detected with identification
CN106910507B (en) * 2017-01-23 2020-04-24 中国科学院声学研究所 Detection and identification method and system
CN107393553A (en) * 2017-07-14 2017-11-24 深圳永顺智信息科技有限公司 Aural signature extracting method for voice activity detection
CN107910016A (en) * 2017-12-19 2018-04-13 河海大学 A kind of noise containment determination methods of noisy speech
CN107910016B (en) * 2017-12-19 2021-07-27 河海大学 Noise tolerance judgment method for noisy speech
WO2019183747A1 (en) * 2018-03-26 2019-10-03 深圳市汇顶科技股份有限公司 Voice detection method and apparatus
CN110537223A (en) * 2018-03-26 2019-12-03 深圳市汇顶科技股份有限公司 The method and apparatus of speech detection
CN110537223B (en) * 2018-03-26 2022-07-05 深圳市汇顶科技股份有限公司 Voice detection method and device
CN112053702A (en) * 2020-09-30 2020-12-08 北京大米科技有限公司 Voice processing method and device and electronic equipment
CN112053702B (en) * 2020-09-30 2024-03-19 北京大米科技有限公司 Voice processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN105810201B (en) 2019-07-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant