CN105810201A - Voice activity detection method and system - Google Patents
- Publication number: CN105810201A
- Application number: CN201410853931.6A
- Authority: CN (China)
- Prior art keywords: signal, noise, noise ratio, current frame, spectral density
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Time-Division Multiplex Systems (AREA)
- Noise Elimination (AREA)
Abstract
The invention provides a voice activity detection method and system. The method comprises: calculating the spectral density of the current frame of an audio signal; estimating the expected value of the noise spectral density; calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the expected value of the noise spectral density; and generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold. The voice activity detection result is thus tied to the statistical distribution of the noise, which overcomes the effect of noise on the detection result. In addition, the preset threshold is a dynamic threshold that follows changes in the noise, so the detection result adapts to the noise environment of the current frame.
Description
Technical field
The present invention relates to speech recognition technology, and in particular to a voice activity detection method and system.
Background
Voice activity detection (VAD), also called speech detection, detects the presence or absence of speech during speech processing so that the speech segments and non-speech segments of a signal can be separated. VAD can be used in echo cancellation, noise suppression, speaker identification, speech recognition, and similar applications.
Traditional VAD algorithms usually base their decision on features such as the short-time energy, spectral energy, and zero-crossing rate of the audio signal. They therefore perform well on clean speech and at high signal-to-noise ratios. At low signal-to-noise ratios or with non-stationary background noise, however, the detection result is affected by the noise features, and performance degrades.
As speech recognition technology develops, the demands placed on voice activity detection keep rising. A VAD method is therefore needed that maintains good detection performance under adverse conditions, such as non-stationary noise or a low signal-to-noise ratio.
Summary of the invention
The problem addressed by the invention is to keep voice activity detection performance from degrading under non-stationary noise or at a low signal-to-noise ratio.
To solve this problem, the invention provides a voice activity detection method comprising: calculating the spectral density of the current frame of an audio signal; calculating the expected value of the noise spectral density; calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the expected value of the noise spectral density; and generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
Optionally, the expected value of the noise spectral density is calculated from the statistical distribution of the noise.
Optionally, the signal-to-noise ratio is calculated by the formula:
[formula not reproduced in the source text]
where SNR denotes the signal-to-noise ratio.
Optionally, the preset threshold is a dynamic threshold that varies with the signal-to-noise ratio.
Optionally, the dynamic threshold is calculated by the formula:
[formula not reproduced in the source text]
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
Optionally, when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result indicates that the current frame of the audio signal is a speech segment; when the signal-to-noise ratio is less than the preset threshold, the result indicates that the current frame is a non-speech segment.
The invention also provides a voice activity detection system comprising: a receiving unit for receiving an audio signal; a processing unit for calculating the signal-to-noise ratio of the current frame, where the signal-to-noise ratio of the current frame is calculated from the spectral density of the current frame of the audio signal and the expected value of the noise spectral density; and a judging unit configured to generate a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
Optionally, the processing unit comprises: a first computing unit for calculating the spectral density of the current frame of the audio signal; a second computing unit for calculating the expected value of the noise spectral density; and a third computing unit for calculating the signal-to-noise ratio of the current frame.
Optionally, the expected value of the noise spectral density is calculated from the statistical distribution of the noise.
Optionally, the signal-to-noise ratio is calculated by the formula:
[formula not reproduced in the source text]
where SNR denotes the signal-to-noise ratio.
Optionally, the preset threshold is a dynamic threshold that varies with the signal-to-noise ratio.
Optionally, the processing unit further comprises a fourth computing unit for calculating the dynamic threshold.
Optionally, the dynamic threshold is calculated by the formula:
[formula not reproduced in the source text]
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
Optionally, when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result indicates that the current frame of the audio signal is a speech segment; when the signal-to-noise ratio is less than the preset threshold, the result indicates that the current frame is a non-speech segment.
Compared with the prior art, the technical solution of the invention has the following advantages.
First, the VAD decision of the invention is based on the statistical distribution of the noise rather than on the statistical distribution of the speech signal. Specifically, the method gathers statistics on the probability distribution of the noise, estimates the expected value of the noise spectral density from them, and then generates the decision. Because noise in practice is a long-term stationary signal, as long as the noise probability distribution is estimated properly, the VAD decision achieves good detection performance whether the current frame of the audio signal under test lies in a stationary or a non-stationary noise environment.
Further, the VAD decision uses a dynamic threshold that is tied to changes in the noise, so the threshold follows changes in the noise signal and adapts well to the current background environment.
Brief description of the drawings
Fig. 1 is a flowchart of a voice activity detection method according to an embodiment of the invention;
Fig. 2 is a structural diagram of a voice activity detection system according to an embodiment of the invention; and
Fig. 3 is a structural diagram of the processing unit in the voice activity detection system according to an embodiment of the invention.
Detailed description
To make the above objects, features, and advantages of the invention clearer, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a voice activity detection method 100 according to an embodiment of the invention. The method comprises the following steps.
S101: calculate the spectral density of the current frame of the audio signal.
In method 100, voice activity detection is performed frame by frame. Specifically, the audio signal is divided into multiple frames, and each frame is examined in turn to determine the speech segments and non-speech segments of the audio signal. The length of each frame can be set between 10 ms and 30 ms. The current frame of the audio signal is thus the frame on which voice activity detection is currently being performed.
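The framing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_frames`, the default of 20 ms, and the choice of non-overlapping frames are assumptions; the text only states that frames last between 10 ms and 30 ms.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=20.0):
    """Split a 1-D signal into consecutive, non-overlapping frames.

    frame_ms is a hypothetical parameter; the text only gives the 10-30 ms range.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if not 10.0 <= frame_ms <= 30.0:
        raise ValueError("frame length should lie between 10 ms and 30 ms")
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    trimmed = np.asarray(signal[: n_frames * frame_len], dtype=float)
    return trimmed.reshape(n_frames, frame_len)
```

At a 16 kHz sampling rate, a 20 ms frame holds 320 samples, so each row of the returned array is one candidate "current frame".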
In some embodiments, the spectral density of the current frame is its power spectral density (PSD), that is, the power of the current frame's audio signal per unit frequency, which serves as the feature measure of the current frame.
In some embodiments, the spectral density of the current frame can be calculated with Bartlett's power spectrum estimation method (Bartlett's method). In some embodiments, it can also be calculated with the periodogram method. The invention does not restrict the algorithm used to compute the spectral density of the current frame.
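Both estimators named above are standard. The sketch below shows one common formulation of each with NumPy; the function names, the one-sided scaling convention, and the default of four segments are assumptions, since the text does not fix these details.

```python
import numpy as np

def periodogram_psd(frame, sample_rate):
    """One-sided periodogram of a single frame (power per Hz)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    psd = np.abs(np.fft.rfft(frame)) ** 2 / (sample_rate * n)
    # Double every bin except DC (and Nyquist when n is even)
    # to fold the negative frequencies into the one-sided estimate.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    return psd

def bartlett_psd(frame, sample_rate, n_segments=4):
    """Bartlett's method: average the periodograms of non-overlapping segments."""
    frame = np.asarray(frame, dtype=float)
    seg_len = len(frame) // n_segments
    segments = frame[: seg_len * n_segments].reshape(n_segments, seg_len)
    return np.mean([periodogram_psd(s, sample_rate) for s in segments], axis=0)
```

Averaging over segments lowers the variance of the estimate at the cost of frequency resolution, which is the trade-off the description refers to when it prefers a low-variance estimator for the noise.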
S103: calculate the expected value of the noise spectral density.
The expected value of the noise spectral density is calculated from the statistical distribution of the noise signal, where the noise statistics are gathered in a pure-noise environment containing no speech signal.
In some embodiments, the noise spectral density can be estimated with an algorithm that has a small variance, such as the Bartlett method mentioned above, to improve VAD detection performance. This is because method 100 involves the variance of the noise when it makes the VAD decision for the current frame, that is, when it decides whether the current frame is a speech segment or a non-speech segment; this is explained in detail in the steps below.
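One simple way to approximate the expected value is a sample mean over frames recorded without speech. The sketch below assumes exactly that; the function name `expected_noise_psd`, the unnormalised periodogram scaling, and the use of a plain average are illustrative choices, not details taken from the text.

```python
import numpy as np

def expected_noise_psd(noise_frames):
    """Estimate E[noise PSD] as the mean periodogram over noise-only frames.

    noise_frames: 2-D array, one noise-only frame per row (an assumed input
    layout). The per-frame periodogram is |FFT|^2 / n, so that white noise of
    variance s^2 yields roughly s^2 in every interior bin.
    """
    noise_frames = np.asarray(noise_frames, dtype=float)
    n = noise_frames.shape[1]
    periodograms = np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2 / n
    return periodograms.mean(axis=0)
```

Averaging across many frames plays the same variance-reducing role as the segment averaging in Bartlett's method.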
Note that voice activity detection method 100 rests on several assumptions. First, the speech signal and the noise signal are assumed to be independent of each other, with no correlation between them. Second, the noise signal is assumed to be long-term stationary, while the speech signal is short-term stationary. These assumptions agree with practical conditions, so methods and systems based on them have practical value.
S105: calculate the signal-to-noise ratio (SNR) based on the spectral density of the current frame and the expected value of the noise spectral density.
The signal-to-noise ratio can be calculated by the formula:
[formula not reproduced in the source text]
where the expected value of the noise spectral density is calculated from the statistical distribution of the noise; the signal-to-noise ratio of the current frame is therefore also derived from the statistical distribution of the noise.
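The SNR formula itself is not reproduced in this text. As a purely illustrative stand-in, the sketch below assumes the SNR is the ratio of the total power in the current frame's spectral density to the total expected noise power. The function name `frame_snr`, the `eps` guard, and the ratio form are all assumptions and should not be read as the patented formula.

```python
import numpy as np

def frame_snr(frame_psd, noise_psd_mean, eps=1e-12):
    """ASSUMED form: total frame power divided by total expected noise power.

    frame_psd and noise_psd_mean are spectral densities on the same frequency
    grid; eps (hypothetical) guards against division by zero.
    """
    frame_psd = np.asarray(frame_psd, dtype=float)
    noise_psd_mean = np.asarray(noise_psd_mean, dtype=float)
    return float(frame_psd.sum() / max(noise_psd_mean.sum(), eps))
```

Under this assumed form, a noise-only frame gives a value near 1, and added speech energy pushes the value above 1.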
S107: calculate the dynamic threshold γ based on the variance of the SNR.
The dynamic threshold γ can be calculated by the formula:
[formula not reproduced in the source text]
where D denotes the variance of the SNR and P_FA denotes the false-alarm probability. The false-alarm probability is the probability that noise is mistaken for speech, that is, the probability that the SNR is judged to exceed γ when no speech signal is present.
The dynamic threshold thus depends on the variance of the SNR, that is, on how the SNR changes. When the noise is non-stationary, the variance of the SNR changes, so the value of the dynamic threshold follows changes in the noise. Voice activity detection method 100 can therefore adapt to changing noise, and its detection performance does not degrade in environments where the noise or the signal-to-noise ratio is unstable.
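The text does not say how the variance D is obtained. One possible approach, shown here only as an assumption, is to track the population variance of the most recent SNR values in a sliding window. The class name `SnrVarianceTracker` and the window length are hypothetical.

```python
from collections import deque
from statistics import pvariance

class SnrVarianceTracker:
    """Track the variance of recent SNR values over a sliding window.

    The window length is a hypothetical tuning parameter; the text does not
    specify how the SNR variance D is estimated.
    """

    def __init__(self, window=50):
        self._values = deque(maxlen=window)

    def update(self, snr):
        """Add one SNR value and return the current variance estimate."""
        self._values.append(float(snr))
        if len(self._values) < 2:
            return 0.0
        return pvariance(self._values)
```

A short window reacts quickly to changing noise but gives a noisier estimate; a long window is smoother but slower to adapt.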
In addition, the dynamic threshold also depends on the false-alarm probability P_FA. In practice, the performance of voice activity detection method 100 can be improved by controlling the false-alarm probability.
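The threshold formula is not reproduced in this text, so the following is only a sketch of how a threshold can be tied to D and P_FA. It assumes that, with no speech present, the SNR is approximately Gaussian with variance D around a noise-only mean; the threshold is then the value exceeded with probability P_FA. The Gaussian model, the function name, and the `snr_noise_mean` parameter are assumptions, not the patented formula.

```python
from statistics import NormalDist

def dynamic_threshold(snr_variance, p_fa, snr_noise_mean=1.0):
    """ASSUMED Gaussian model: gamma = mean + sqrt(D) * Phi^{-1}(1 - P_FA).

    snr_variance is D, p_fa is the target false-alarm probability, and
    snr_noise_mean (hypothetical) is the mean SNR of noise-only frames.
    """
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie strictly between 0 and 1")
    if snr_variance < 0.0:
        raise ValueError("variance cannot be negative")
    quantile = NormalDist().inv_cdf(1.0 - p_fa)
    return snr_noise_mean + (snr_variance ** 0.5) * quantile
```

Under this model, lowering P_FA or increasing D raises the threshold, which matches the qualitative behaviour described above: noisier conditions demand a larger SNR before a frame is declared speech.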
S109: generate the VAD decision for the current frame based on the SNR and γ.
When the SNR is greater than γ, the VAD decision is that the current frame of the audio signal is a speech segment; when the SNR is less than γ, the decision is that the current frame is a non-speech segment.
In some embodiments, the VAD decision can also be generated from the SNR of the current frame and a fixed threshold. Specifically, when the SNR is greater than the fixed threshold, the current frame is judged to be a speech segment; when the SNR is less than the fixed threshold, it is judged to be a non-speech segment.
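The decision rule is the same comparison for either kind of threshold, as the following sketch shows. The function name is hypothetical, and the handling of the tie case (SNR equal to the threshold) is an assumption, because the text only covers "greater than" and "less than".

```python
def vad_decision(snr, threshold):
    """Return True (speech) when snr exceeds threshold, else False (non-speech).

    threshold may be the dynamic threshold or a fixed value. The tie case
    snr == threshold is not specified in the text and is mapped to
    non-speech here as an assumption.
    """
    return snr > threshold
```

For example, `vad_decision(2.0, 1.5)` classifies the frame as speech, while `vad_decision(1.0, 1.5)` classifies it as non-speech.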
VAD method 100 is therefore based on the statistical distribution of the noise rather than on the statistical distribution of speech. Moreover, the VAD decision depends on the real-time signal-to-noise ratio (the SNR of the current frame) and on the variation of the noise (the variance of the signal-to-noise ratio). This overcomes the effect of non-stationary noise or a low signal-to-noise ratio on the VAD decision.
In addition, generating the VAD decision requires only the SNR of the current frame, not the a priori and a posteriori SNR. Speech detection method 100 is therefore simpler, which improves detection efficiency.
Fig. 2 shows a voice activity detection system 200 according to an embodiment of the invention. The system comprises: a receiving unit 201 for receiving an audio signal; a processing unit 203 for calculating the signal-to-noise ratio of the current frame of the audio signal, where the signal-to-noise ratio is calculated from the spectral density of the current frame and the expected value of the noise spectral density; and a judging unit 205 for generating a VAD decision, where the decision is generated from the signal-to-noise ratio calculated by processing unit 203 and a preset threshold.
Referring to Fig. 3, processing unit 203 comprises a first computing unit 2031 for calculating the spectral density of the current frame, a second computing unit 2033 for calculating the expected value of the noise spectral density, and a third computing unit 2035 for calculating the signal-to-noise ratio of the current frame.
The signal-to-noise ratio (SNR) calculated by the third computing unit 2035 is based on the formula:
[formula not reproduced in the source text]
where the expected value of the noise spectral density is calculated from the statistical distribution of the noise; the signal-to-noise ratio of the current frame is therefore also derived from the statistical distribution of the noise. The noise statistics are gathered under pure-noise conditions with no speech activity.
In some embodiments, the spectral density of the current frame and the noise spectral density can be calculated with Bartlett's power spectrum estimation method (Bartlett's method), the periodogram method, and so on. The invention is not limited in this respect. Note, however, that the noise spectral density is preferably estimated with an algorithm that has a small variance, such as the Bartlett method mentioned above, to improve the detection performance of the VAD method, because the judging unit of system 200 generates the VAD decision based on the variance of the noise.
The spectral density can be the power spectral density (PSD), that is, the power of the current frame's audio signal per unit frequency, which serves as the measure of the current frame.
In some embodiments, the preset threshold is a dynamic threshold calculated by processing unit 203. Specifically, processing unit 203 further comprises a fourth computing unit 2037 for calculating the dynamic threshold, and the dynamic threshold γ can be calculated by the formula:
[formula not reproduced in the source text]
where D denotes the variance of the SNR and P_FA denotes the false-alarm probability. The false-alarm probability is the probability that noise is mistaken for speech, that is, the probability that the SNR is judged to exceed γ when no speech signal is present.
The dynamic threshold thus depends on the variance of the SNR, that is, on how the SNR changes. When the noise is non-stationary, the variance of the SNR changes, so the value of the dynamic threshold follows changes in the noise, and the VAD decision can adapt to changing noise.
In addition, the dynamic threshold also depends on the false-alarm probability P_FA. In practice, the accuracy of the VAD decision can be improved by reducing the false-alarm probability.
When judging unit 205 makes the VAD decision, the current frame is judged to be a speech segment if its SNR is greater than the dynamic threshold γ, and a non-speech segment if its SNR is less than the dynamic threshold γ.
In system 200, the VAD decision depends on the real-time signal-to-noise ratio (the SNR of the current frame) and on the variation of the noise (the variance of the signal-to-noise ratio). This overcomes the effect of non-stationary noise or a low signal-to-noise ratio on the VAD decision. In addition, the VAD decision for each frame requires only the current SNR, not the a priori and a posteriori SNR, so the decision method is simpler.
System 200 can perform the VAD decision on the audio signal frame by frame to determine its speech segments and non-speech segments.
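Turning per-frame decisions into segments can be sketched by merging runs of identical decisions. The function name `frames_to_segments`, the millisecond time units, and the assumption of equal-length, non-overlapping frames are illustrative choices not stated in the text.

```python
from itertools import groupby

def frames_to_segments(decisions, frame_ms=20.0):
    """Merge consecutive identical frame decisions into segments.

    Returns a list of (start_ms, end_ms, is_speech) tuples, assuming
    equal-length, non-overlapping frames of frame_ms milliseconds each.
    """
    segments = []
    start = 0
    for is_speech, run in groupby(decisions):
        length = sum(1 for _ in run)
        segments.append((start * frame_ms, (start + length) * frame_ms, bool(is_speech)))
        start += length
    return segments
```

With 10 ms frames, the decision sequence non-speech, speech, speech, non-speech yields three segments covering 0 to 10 ms, 10 to 30 ms, and 30 to 40 ms.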
System 200 can further include an execution unit 207 configured to perform different operations, for example recognition or decoding, on the different frames (speech segments and non-speech segments) of the audio signal based on the VAD decision of judging unit 205.
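The role of the execution unit can be illustrated as a dispatch on the VAD decision. The function and handler names below are hypothetical; the text only says that different operations, such as recognition or decoding, are applied to speech and non-speech frames.

```python
def run_execution_unit(frames, decisions, on_speech, on_non_speech):
    """Apply one of two caller-supplied handlers to each frame.

    on_speech and on_non_speech are hypothetical callbacks standing in for
    operations such as recognition or decoding.
    """
    if len(frames) != len(decisions):
        raise ValueError("need exactly one decision per frame")
    return [
        on_speech(frame) if is_speech else on_non_speech(frame)
        for frame, is_speech in zip(frames, decisions)
    ]
```

A caller might pass a recogniser as `on_speech` and a function that simply discards the frame as `on_non_speech`.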
Although the invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the claims.
Claims (14)
1. A voice activity detection method, characterized by comprising:
calculating the spectral density of the current frame of an audio signal;
calculating the expected value of the noise spectral density;
calculating the signal-to-noise ratio of the current frame based on the spectral density of the current frame and the expected value of the noise spectral density; and
generating a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
2. The method according to claim 1, characterized in that the expected value of the noise spectral density is calculated from the statistical distribution of the noise.
3. The method according to claim 1, characterized in that the signal-to-noise ratio is calculated by the formula:
[formula not reproduced in the source text]
where SNR denotes the signal-to-noise ratio.
4. The method according to claim 1, characterized in that the preset threshold is a dynamic threshold that varies with the signal-to-noise ratio.
5. The method according to claim 4, characterized in that the dynamic threshold is calculated by the formula:
[formula not reproduced in the source text]
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
6. The method according to claim 1, characterized in that when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result indicates that the current frame of the audio signal is a speech segment; and when the signal-to-noise ratio is less than the preset threshold, the result indicates that the current frame of the audio signal is a non-speech segment.
7. A voice activity detection system, characterized by comprising:
a receiving unit for receiving an audio signal;
a processing unit for calculating the signal-to-noise ratio of the current frame, where the signal-to-noise ratio of the current frame is calculated from the spectral density of the current frame of the audio signal and the expected value of the noise spectral density; and
a judging unit configured to generate a voice activity detection result based on the signal-to-noise ratio of the current frame and a preset threshold.
8. The system according to claim 7, characterized in that the processing unit comprises: a first computing unit for calculating the spectral density of the current frame of the audio signal; a second computing unit for calculating the expected value of the noise spectral density; and a third computing unit for calculating the signal-to-noise ratio of the current frame.
9. The system according to claim 7, characterized in that the expected value of the noise spectral density is calculated from the statistical distribution of the noise.
10. The system according to claim 7, characterized in that the signal-to-noise ratio is calculated by the formula:
[formula not reproduced in the source text]
where SNR denotes the signal-to-noise ratio.
11. The system according to claim 7, characterized in that the preset threshold is a dynamic threshold that varies with the signal-to-noise ratio.
12. The system according to claim 11, characterized in that the processing unit further comprises a fourth computing unit for calculating the dynamic threshold.
13. The system according to claim 12, characterized in that the dynamic threshold is calculated by the formula:
[formula not reproduced in the source text]
where γ denotes the dynamic threshold, D denotes the variance of the signal-to-noise ratio, and P_FA denotes the false-alarm probability.
14. The system according to claim 7, characterized in that when the signal-to-noise ratio is greater than the preset threshold, the generated voice activity detection result indicates that the current frame of the audio signal is a speech segment; and when the signal-to-noise ratio is less than the preset threshold, the result indicates that the current frame of the audio signal is a non-speech segment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410853931.6A CN105810201B (en) | 2014-12-31 | 2014-12-31 | Voice activity detection method and its system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105810201A | 2016-07-27 |
CN105810201B | 2019-07-02 |
Family
ID=56464829
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410853931.6A Active CN105810201B (en) | 2014-12-31 | 2014-12-31 | Voice activity detection method and its system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105810201B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1354455A (en) * | 2000-11-18 | 2002-06-19 | 深圳市中兴通讯股份有限公司 | Sound activation detection method for identifying speech and music from noise environment |
CN101010722A (en) * | 2004-08-30 | 2007-08-01 | 诺基亚公司 | Detection of voice activity in an audio signal |
CN1783211A (en) * | 2004-11-25 | 2006-06-07 | Lg电子株式会社 | Speech detection method |
CN101080765A (en) * | 2005-05-09 | 2007-11-28 | 株式会社东芝 | Voice activity detection apparatus and method |
CN101197130A (en) * | 2006-12-07 | 2008-06-11 | 华为技术有限公司 | Sound activity detecting method and detector thereof |
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN102800322A (en) * | 2011-05-27 | 2012-11-28 | 中国科学院声学研究所 | Method for estimating noise power spectrum and voice activity |
CN103903634A (en) * | 2012-12-25 | 2014-07-02 | 中兴通讯股份有限公司 | Voice activation detection (VAD), and method and apparatus for the VAD |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106356070B (en) * | 2016-08-29 | 2019-10-29 | 广州市百果园网络科技有限公司 | A kind of acoustic signal processing method and device |
CN106356070A (en) * | 2016-08-29 | 2017-01-25 | 广州市百果园网络科技有限公司 | Audio signal processing method and device |
CN106384597A (en) * | 2016-08-31 | 2017-02-08 | 广州市百果园网络科技有限公司 | Audio frequency data processing method and device |
CN106910507A (en) * | 2017-01-23 | 2017-06-30 | 中国科学院声学研究所 | A kind of method and system detected with identification |
CN106910507B (en) * | 2017-01-23 | 2020-04-24 | 中国科学院声学研究所 | Detection and identification method and system |
CN107393553A (en) * | 2017-07-14 | 2017-11-24 | 深圳永顺智信息科技有限公司 | Aural signature extracting method for voice activity detection |
CN107910016A (en) * | 2017-12-19 | 2018-04-13 | 河海大学 | A kind of noise containment determination methods of noisy speech |
CN107910016B (en) * | 2017-12-19 | 2021-07-27 | 河海大学 | Noise tolerance judgment method for noisy speech |
WO2019183747A1 (en) * | 2018-03-26 | 2019-10-03 | 深圳市汇顶科技股份有限公司 | Voice detection method and apparatus |
CN110537223A (en) * | 2018-03-26 | 2019-12-03 | 深圳市汇顶科技股份有限公司 | The method and apparatus of speech detection |
CN110537223B (en) * | 2018-03-26 | 2022-07-05 | 深圳市汇顶科技股份有限公司 | Voice detection method and device |
CN112053702A (en) * | 2020-09-30 | 2020-12-08 | 北京大米科技有限公司 | Voice processing method and device and electronic equipment |
CN112053702B (en) * | 2020-09-30 | 2024-03-19 | 北京大米科技有限公司 | Voice processing method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN105810201B | 2019-07-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||