CN106098076B - Time-frequency domain adaptive voice detection method based on dynamic noise estimation - Google Patents
- Publication number
- CN106098076B
- Authority
- CN
- China
- Prior art keywords
- energy
- frequency domain
- time
- voice
- threshold value
- Legal status
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
- G10L21/0232—Processing in the frequency domain
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
- G10L2025/786—Adaptive threshold
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to the fields of information processing technology and sensor signal processing, and in particular to a time-frequency domain adaptive automatic speech detection method based on dynamic noise estimation. The invention detects speech separately from the changes in the time-domain short-time energy of the sound and in the frequency-domain short-time energy of a given frequency band, and then selects the better of the two results according to the magnitude of the dynamically estimated background noise energy. This greatly improves the accuracy of speech recognition and its adaptability to changes in the environment.
Description
Technical field
The present invention relates to the fields of information processing technology and sensor signal processing, and in particular to a time-frequency domain adaptive voice detection method based on dynamic noise estimation.
Background art
Speech recognition is one of the hot topics in applied artificial intelligence and is now widely used in many fields. Speech detection is an essential part of any real-time speech recognition system: its purpose is to separate speech segments from non-speech segments in complex real-world environments. Studies have shown that low recognition rates in practical applications are largely caused by speech not being handled correctly; large amounts of non-speech information seriously degrade the accuracy of a speech recognition system, especially in applications with heavy noise. Correct speech detection can effectively reduce the amount of computation, shorten processing time, lower the transmit power of mobile terminals, save channel resources, and improve recognition accuracy. Under complex background noise in particular, the performance of a speech recognition system depends largely on the quality of its speech detection. A stable, accurate, real-time speech detection technique with strong adaptivity and good robustness is therefore necessary for every speech recognition system.
At present, when speech recognition is used on mobile terminals, especially mobile phones or voice remote controls, the start and end of speech are mainly determined by pressing a key. This is very inconvenient for far-field applications. For far-field use, and for smart devices and robots that support speech recognition but are not handheld, an automatic speech detection system is an indispensable component.
Mainstream automatic speech detection methods rely on one of three measures: the short-time energy in the time domain, the zero-crossing rate, or the variance (mean squared deviation) of the band energies in the frequency domain. The short-time energy, zero-crossing rate, or band-energy variance is computed with a specific formula and then compared with an empirical value. Experiments show that comparing the short-time energy or the zero-crossing rate alone adapts poorly to noisy environments, especially when the application environment changes, because the background noise of the same environment changes accordingly. The band-energy variance method, for its part, adapts poorly to quiet environments.
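For comparison, the three prior-art measures can be sketched as follows. This is an illustration, not code from the patent: the function names, the equal-width band split used for the band-energy variance, and the direct DFT (chosen only to keep the example dependency-free) are assumptions. Each feature is compared with a fixed empirical constant, which is the non-adaptive behaviour criticized above.

```python
import cmath
import math


def short_time_energy(frame):
    """Time-domain short-time energy: sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def band_energy_variance(frame, num_bands=4):
    """Variance of the energies of equal-width bands of the DFT power spectrum.

    A direct O(N^2) DFT keeps the sketch free of third-party dependencies.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    width = half // num_bands
    bands = [sum(power[i * width:(i + 1) * width]) for i in range(num_bands)]
    mean = sum(bands) / num_bands
    return sum((b - mean) ** 2 for b in bands) / num_bands


def fixed_threshold_decision(feature, empirical_threshold):
    """Prior-art style rule: speech if the feature exceeds a fixed constant."""
    return feature > empirical_threshold
```

Because `empirical_threshold` never changes, a rise in background noise shifts every feature value while the decision boundary stays put.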
To solve these problems, a method is needed that detects speech separately from the changes in the average speech energy in the time domain and in the spectral domain, and then selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by providing a speech detection method that greatly improves the accuracy of speech recognition and its adaptability to environmental change.
To achieve this object, the present invention provides the following technical solution.
A time-frequency domain adaptive voice detection method based on dynamic noise estimation comprises the following steps:
Step 1: Load the current frame data; the current frame data is time-domain voice data.
Step 2: Compute the sum of the energy of each frame of the time-domain voice data as the time-domain short-time energy, and transform each frame of the time-domain voice data into frequency-domain data by FFT.
Step 3: Select the subband data of a certain frequency range from the frequency-domain data, and compute and accumulate the energy of that subband data as the frequency-domain short-time energy.
Step 4: A background noise estimation unit computes the background noise energy, and a frequency-domain background energy computing unit computes the frequency-domain background energy.
Step 5: Compare the time-domain short-time energy with the background noise energy. If it is greater than the background noise energy, the frame is speech; if it is less than or equal to the background noise energy, the frame is non-speech.
Step 6: Compare the frequency-domain short-time energy with the frequency-domain background energy. If it is greater than the frequency-domain background energy, the frame is speech; if it is less than or equal to the frequency-domain background energy, the frame is non-speech.
Step 7: Compare the background noise energy with a preset threshold one. If it is greater than threshold one, the result of step 6 takes precedence; if it is less than or equal to threshold one, the result of step 5 takes precedence.
Step 8: If the current frame is detected as non-speech, send its time-domain short-time energy to the background noise estimation unit for accumulation; once a first number of frames has been accumulated, divide the accumulated value by the first frame number to obtain a new background noise energy as output. Likewise, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computing unit for accumulation; once a second number of frames has been accumulated, divide the accumulated value by the second frame number to obtain a new frequency-domain background energy as output.
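The eight steps can be condensed into the per-frame relations below. The symbols are introduced here for illustration only and do not appear in the patent: $x_m(n)$ is sample $n$ of the $N$-sample frame $m$, $X_m(k)$ its FFT, $k_{lo}$ to $k_{hi}$ the FFT bins covering the chosen band, $B_t$ and $B_f$ the current background noise energy and frequency-domain background energy, $T_1$ threshold one, and $N_1$ and $N_2$ the first and second frame numbers. Reading step 7 as selecting one of the two decisions outright is an interpretation.

```latex
\begin{aligned}
E_t(m) &= \sum_{n=0}^{N-1} x_m(n)^2,
  \qquad E_f(m) = \sum_{k=k_{lo}}^{k_{hi}} \lvert X_m(k) \rvert^2 \\
d_t(m) &= \mathbf{1}\!\left[E_t(m) > B_t\right],
  \qquad d_f(m) = \mathbf{1}\!\left[E_f(m) > B_f\right] \\
d(m) &= \begin{cases} d_f(m), & B_t > T_1 \\ d_t(m), & B_t \le T_1 \end{cases} \\
B_t &\leftarrow \frac{1}{N_1} \sum_{j=1}^{N_1} E_t(m_j),
  \qquad B_f \leftarrow \frac{1}{N_2} \sum_{j=1}^{N_2} E_f(m'_j)
\end{aligned}
```

Here $m_j$ and $m'_j$ index the non-speech frames ($d(m) = 0$) accumulated since the previous update of $B_t$ and $B_f$, respectively.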
Speech energy is usually stable only over short periods, whereas the background noise energy is stable over long periods. The result of comparing the time-domain short-time energy with the background noise energy serves as the time-domain probability that the current moment contains speech. Non-speech periods are usually much longer than speech periods, so the time-domain short-time energy can be regarded as acoustic energy that may contain both speech and background noise, while the long-term time-domain energy consists mainly of background noise energy. The more the time-domain short-time energy exceeds the long-term time-domain energy, the higher the probability of speech. Because the long-term time-domain energy is computed dynamically, it adapts well to changes in the background noise. Comparing the time-domain short-time energy with the background noise energy is, however, better suited to quiet environments. To improve detection accuracy, the invention therefore combines this comparison with a comparison of the frequency-domain short-time energy against the frequency-domain background energy.
As a preferred solution of the present invention, in step 5 the time-domain short-time energy is compared with the background noise energy by subtracting the background noise energy from the time-domain short-time energy and comparing the difference with a preset threshold two: if the difference is greater than threshold two, the frame is speech; if it is less than or equal to threshold two, the frame is non-speech.
In step 6 the frequency-domain short-time energy is compared with the frequency-domain background energy by subtracting the frequency-domain background energy from the frequency-domain short-time energy and comparing the difference with a preset threshold three: if the difference is greater than threshold three, the frame is speech; if it is less than or equal to threshold three, the frame is non-speech.
As a preferred solution of the present invention, in step 5 the time-domain short-time energy is compared with the background noise energy by comparing the ratio of the time-domain short-time energy to the background noise energy with a preset threshold four: if the ratio is greater than threshold four, the frame is speech; if it is less than or equal to threshold four, the frame is non-speech.
In step 6 the frequency-domain short-time energy is compared with the frequency-domain background energy by comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold five: if the ratio is greater than threshold five, the frame is speech; if it is less than or equal to threshold five, the frame is non-speech.
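The difference and ratio rules above can be sketched as two small predicates. The function names are hypothetical, and the `eps` guard against a zero background estimate is an added assumption that the patent does not discuss. The same predicate serves both domains; only the threshold changes (two or four for the time domain, three or five for the frequency domain).

```python
def speech_by_difference(energy, background, threshold):
    """Difference rule: speech if (energy - background) > threshold."""
    return (energy - background) > threshold


def speech_by_ratio(energy, background, threshold, eps=1e-12):
    """Ratio rule: speech if energy / background > threshold.

    eps (an assumption) avoids division by zero before any background
    energy has been measured.
    """
    return energy / max(background, eps) > threshold
```

With threshold two or three set to 0, or threshold four or five set to 1 (and a positive background), both rules reduce to the plain "greater than the background" comparisons of steps 5 and 6.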
As a preferred solution of the present invention, the frequency range is the range in which human speech energy is mainly distributed. The spectrum of the human voice is wide, so the speech band is set by two parameters: an upper frequency threshold and a lower frequency threshold. Sound outside this range is usually background noise or other non-speech, and restricting the measurement to this band strongly suppresses environmental noise power. In general, speech energy is concentrated between 300 Hz and 4000 Hz, while background noise energy lies mainly below 300 Hz, so the comparison uses the energy of the band in which speech is mainly distributed. Within this band the frequency-domain short-time energy increases markedly when speech is present. Therefore, as with the time-domain comparison, when the frequency-domain short-time energy, compared with the frequency-domain background energy, exceeds threshold three or threshold five set by the system, the period is very likely speech.
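A minimal sketch of step 3 with the 300 Hz to 4000 Hz band follows, assuming a sampling rate of at least 8 kHz so that 4000 Hz does not exceed the Nyquist frequency. Zero-padding to a power of two and the plain recursive radix-2 FFT are choices made for a dependency-free example, and the function names are hypothetical; a production implementation would more likely call a library routine such as `numpy.fft.rfft`.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def band_bins(sample_rate, fft_size, f_low=300.0, f_high=4000.0):
    """FFT bin indices whose frequencies lie in [f_low, f_high], capped at Nyquist."""
    first = math.ceil(f_low * fft_size / sample_rate)
    last = min(math.floor(f_high * fft_size / sample_rate), fft_size // 2)
    return range(first, last + 1)


def frequency_domain_short_time_energy(frame, sample_rate, f_low=300.0, f_high=4000.0):
    """Zero-pad to a power of two, FFT, and sum |X(k)|^2 over the speech band."""
    size = 1
    while size < len(frame):
        size *= 2
    spectrum = fft(list(frame) + [0.0] * (size - len(frame)))
    bins = band_bins(sample_rate, size, f_low, f_high)
    return sum(abs(spectrum[k]) ** 2 for k in bins)
```

At 8 kHz with a 256-sample frame the band covers bins 10 to 128, so a 1 kHz tone (bin 32) contributes essentially all of its energy, while a 100 Hz hum contributes only spectral leakage.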
As a preferred solution of the present invention, the duration of a frame is between 10 ms and 50 ms, and the first frame number and the second frame number are configured by the system.
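The frame-duration range translates into sample counts as sketched below. The helper names are hypothetical, and non-overlapping frames are an assumption, since the text does not mention frame overlap. At 16 kHz, for example, 10 ms and 50 ms correspond to 160 and 800 samples.

```python
def frame_length_samples(sample_rate, frame_ms):
    """Samples per frame for a frame duration of frame_ms milliseconds (10 to 50 ms)."""
    if not 10 <= frame_ms <= 50:
        raise ValueError("frame duration should be between 10 ms and 50 ms")
    return int(sample_rate * frame_ms / 1000)


def split_frames(samples, frame_len):
    """Cut a signal into non-overlapping frames, dropping a trailing partial frame."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]
```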
As a preferred solution of the present invention, the background noise energy is obtained by accumulating and then averaging the time-domain short-time energies of the periods judged to be non-speech.
As a preferred solution of the present invention, the frequency-domain background energy is obtained by accumulating and then averaging the frequency-domain short-time energies of the periods judged to be non-speech.
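The accumulate-then-average estimators can be sketched as one small class used once per domain. This is a sketch under stated assumptions: the class name and the initial value are hypothetical (the patent does not say how the estimate is initialized), and restarting the accumulator after each update is one reading of "after accumulating the first (second) frame number".

```python
class BackgroundEnergyEstimator:
    """Block-averaging background energy estimator (steps 4 and 8).

    Energies of frames judged non-speech are accumulated; once block_frames
    of them have been collected, their mean becomes the new estimate and the
    accumulator restarts. initial_energy is an assumed starting value.
    """

    def __init__(self, block_frames, initial_energy):
        self.block_frames = block_frames
        self.energy = initial_energy
        self._total = 0.0
        self._count = 0

    def add_non_speech(self, frame_energy):
        """Accumulate one non-speech frame energy and return the current estimate."""
        self._total += frame_energy
        self._count += 1
        if self._count == self.block_frames:
            self.energy = self._total / self._count
            self._total = 0.0
            self._count = 0
        return self.energy
```

The estimate changes only once per block, so a single loud non-speech frame cannot move it by more than 1/N of its energy.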
Compared with the prior art, the present invention has the following beneficial effects:
The present invention detects speech separately from the changes in the average speech energy in the time domain and in the spectral domain, and then selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention;
Fig. 2 is an operational block diagram of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to embodiments and specific implementations. This should not be understood as limiting the scope of the above subject matter of the present invention to the following embodiments; all techniques implemented on the basis of the content of the present invention fall within the scope of the present invention.
As shown in Fig. 1, the time-frequency domain adaptive voice detection method based on dynamic noise estimation comprises the following steps:
Step 1: Load the current frame data; the current frame data is time-domain voice data.
Step 2: Compute the sum of the energy of each frame of the time-domain voice data as the time-domain short-time energy, and transform each frame of time-domain voice data into frequency-domain data by FFT.
Step 3: Select the subband data of a certain frequency range from the frequency-domain data, and compute and accumulate the energy of that subband data as the frequency-domain short-time energy.
Step 4: The background noise estimation unit computes the background noise energy, and the frequency-domain background energy computing unit computes the frequency-domain background energy.
Step 5: Compare the time-domain short-time energy with the background noise energy. If it is greater than the background noise energy, the frame is speech; if it is less than or equal to the background noise energy, the frame is non-speech.
Step 6: Compare the frequency-domain short-time energy with the frequency-domain background energy. If it is greater than the frequency-domain background energy, the frame is speech; if it is less than or equal to the frequency-domain background energy, the frame is non-speech.
Step 7: Compare the background noise energy with a preset threshold one. If it is greater than threshold one, the result of step 6 takes precedence; if it is less than or equal to threshold one, the result of step 5 takes precedence.
Step 8: If the current frame is detected as non-speech, send its time-domain short-time energy to the background noise estimation unit for accumulation; once the first number of frames has been accumulated, divide the accumulated value by the first frame number to obtain a new background noise energy as output. At the same time, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computing unit for accumulation; once the second number of frames has been accumulated, divide the accumulated value by the second frame number to obtain a new frequency-domain background energy as output.
As shown in Fig. 1 and Fig. 2, the current frame data, which is time-domain voice data, is loaded first. After loading, the time-domain short-time energy is computed; at the same time, the time-domain voice data is transformed into frequency-domain data by FFT and the frequency-domain short-time energy is computed. The background noise estimation unit computes the background noise energy, and the frequency-domain background energy computing unit computes the frequency-domain background energy. The time-domain short-time energy is then compared with the background noise energy, and the frequency-domain short-time energy with the frequency-domain background energy. This embodiment compares the difference between the time-domain short-time energy and the background noise energy with a preset threshold two, and the difference between the frequency-domain short-time energy and the frequency-domain background energy with a preset threshold three. If the time-domain short-time energy minus the background noise energy is greater than threshold two, the frame is speech; otherwise it is non-speech. If the frequency-domain short-time energy minus the frequency-domain background energy is greater than threshold three, the frame is speech; otherwise it is non-speech. Both comparison results are output, and the background noise energy is compared with threshold one set by the system: if it is greater than threshold one, the result of step 6 takes precedence; if it is less than or equal to threshold one, the result of step 5 takes precedence. Where the comparison in step 5 or step 6 yields non-speech, the corresponding energy is passed to the background noise estimation unit or the frequency-domain background energy computing unit, respectively, which compute a new background noise energy and a new frequency-domain background energy. In this embodiment, the frequency range in which human speech energy is mainly distributed is taken as 300 Hz to 4000 Hz, and the frame duration is between 10 ms and 50 ms.
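The difference-rule embodiment can be sketched end to end as follows. This is a hedged illustration rather than the patent's implementation: the class and parameter names, the initial background values, zero-padding to a power of two, the accumulator reset, and the reading of step 7 as "use the frequency-domain decision when the background noise energy exceeds threshold one, otherwise the time-domain decision" are all assumptions. Following claim 1, both estimators are updated when the final decision is non-speech; the embodiment text instead feeds each estimator from its own domain's decision, which would change only the update condition at the end of `process`.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


class BlockAverage:
    """Replaces `value` with the mean of every `block` accumulated energies (step 8)."""

    def __init__(self, block, initial):
        self.block, self.value = block, initial
        self.total, self.count = 0.0, 0

    def add(self, energy):
        self.total += energy
        self.count += 1
        if self.count == self.block:
            self.value = self.total / self.block
            self.total, self.count = 0.0, 0


class TimeFrequencyVAD:
    """Hypothetical sketch of the difference-rule embodiment (thresholds one to three)."""

    def __init__(self, sample_rate, thr1, thr2, thr3, n1, n2,
                 init_time_bg, init_freq_bg, f_low=300.0, f_high=4000.0):
        self.fs, self.f_low, self.f_high = sample_rate, f_low, f_high
        self.thr1, self.thr2, self.thr3 = thr1, thr2, thr3
        self.time_bg = BlockAverage(n1, init_time_bg)  # background noise energy
        self.freq_bg = BlockAverage(n2, init_freq_bg)  # frequency-domain background energy

    def process(self, frame):
        """Classify one frame; return True for speech."""
        # Step 2: time-domain short-time energy.
        e_time = sum(s * s for s in frame)
        # Steps 2-3: zero-pad to a power of two, FFT, sum |X(k)|^2 over the band.
        size = 1
        while size < len(frame):
            size *= 2
        spec = fft(list(frame) + [0.0] * (size - len(frame)))
        k_lo = math.ceil(self.f_low * size / self.fs)
        k_hi = min(math.floor(self.f_high * size / self.fs), size // 2)
        e_freq = sum(abs(spec[k]) ** 2 for k in range(k_lo, k_hi + 1))
        # Steps 5-6: difference rules with thresholds two and three.
        time_speech = e_time - self.time_bg.value > self.thr2
        freq_speech = e_freq - self.freq_bg.value > self.thr3
        # Step 7: noisy background -> frequency-domain decision, quiet -> time-domain.
        is_speech = freq_speech if self.time_bg.value > self.thr1 else time_speech
        # Step 8 (as in claim 1): update both estimates after a non-speech decision.
        if not is_speech:
            self.time_bg.add(e_time)
            self.freq_bg.add(e_freq)
        return is_speech
```

In a quiet setting the time-domain rule decides; once the background energy exceeds threshold one (for example, under a strong low-frequency hum), the band-limited frequency-domain rule takes over, because the hum inflates the time-domain energy but contributes little inside 300 Hz to 4000 Hz.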
In another embodiment, the ratio of the time-domain short-time energy to the background noise energy is compared with a preset threshold four, and the ratio of the frequency-domain short-time energy to the frequency-domain background energy is compared with a preset threshold five. If the ratio of the time-domain short-time energy to the background noise energy is greater than threshold four, the frame is speech; otherwise it is non-speech. If the ratio of the frequency-domain short-time energy to the frequency-domain background energy is greater than threshold five, the frame is speech; otherwise it is non-speech. The rest of the computation is the same as in the previous embodiment and is not repeated here.
Other embodiments may compare the difference between the time-domain short-time energy and the background noise energy with a system-set threshold six while comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold seven, among other combinations.
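Because the text leaves the rule for each domain open, one way to keep the choice configurable is a small table of rule pairs. The names below are hypothetical, and the "difference/ratio" entry corresponds to thresholds six and seven above. The ratio rule is written as energy > threshold × background, which is equivalent to the ratio comparison whenever the background energy is positive and avoids dividing by zero.

```python
def difference_rule(energy, background, threshold):
    """Speech if the energy exceeds the background by more than threshold."""
    return energy - background > threshold


def ratio_rule(energy, background, threshold):
    """Speech if energy / background > threshold (for a positive background)."""
    return energy > threshold * background


# Hypothetical registry: (time-domain rule, frequency-domain rule).
VARIANTS = {
    "difference/difference": (difference_rule, difference_rule),  # thresholds two, three
    "ratio/ratio": (ratio_rule, ratio_rule),                      # thresholds four, five
    "difference/ratio": (difference_rule, ratio_rule),            # thresholds six, seven
}


def domain_decisions(variant, e_time, time_bg, thr_time, e_freq, freq_bg, thr_freq):
    """Return the (step 5, step 6) decisions for the chosen rule pair."""
    time_rule, freq_rule = VARIANTS[variant]
    return time_rule(e_time, time_bg, thr_time), freq_rule(e_freq, freq_bg, thr_freq)
```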
Claims (7)
1. A time-frequency domain adaptive voice detection method based on dynamic noise estimation, comprising the following steps:
Step 1: loading current frame data, the current frame data being time-domain voice data;
Step 2: computing the sum of the energy of each frame of the time-domain voice data as a time-domain short-time energy, and transforming each frame of the time-domain voice data into frequency-domain data by FFT;
Step 3: selecting subband data of a certain frequency range from the frequency-domain data, and computing and accumulating the energy of the subband data as a frequency-domain short-time energy;
Step 4: computing a background noise energy by a background noise energy estimation unit, and computing a frequency-domain background energy by a frequency-domain background energy computing unit;
Step 5: comparing the time-domain short-time energy with the background noise energy, the frame being speech if the time-domain short-time energy is greater than the background noise energy and non-speech if it is less than or equal to the background noise energy;
Step 6: comparing the frequency-domain short-time energy with the frequency-domain background energy, the frame being speech if the frequency-domain short-time energy is greater than the frequency-domain background energy and non-speech if it is less than or equal to the frequency-domain background energy;
Step 7: comparing the background noise energy with a preset threshold one, giving precedence to the result of step 6 if the background noise energy is greater than threshold one, and to the result of step 5 if it is less than or equal to threshold one;
Step 8: if the current frame is detected as non-speech, sending the time-domain short-time energy of the current frame to the background noise energy estimation unit for accumulation and, after a first number of frames has been accumulated, dividing the accumulated value by the first frame number to obtain a new background noise energy as output; and sending the frequency-domain short-time energy of the current frame to the frequency-domain background energy computing unit for accumulation and, after a second number of frames has been accumulated, dividing the accumulated value by the second frame number to obtain a new frequency-domain background energy as output.
2. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
in step 5, the time-domain short-time energy is compared with the background noise energy by comparing the difference obtained by subtracting the background noise energy from the time-domain short-time energy with a preset threshold two, the frame being speech if the difference is greater than threshold two and non-speech if it is less than or equal to threshold two;
in step 6, the frequency-domain short-time energy is compared with the frequency-domain background energy by comparing the difference obtained by subtracting the frequency-domain background energy from the frequency-domain short-time energy with a preset threshold three, the frame being speech if the difference is greater than threshold three and non-speech if it is less than or equal to threshold three.
3. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
in step 5, the time-domain short-time energy is compared with the background noise energy by comparing the ratio of the time-domain short-time energy to the background noise energy with a preset threshold four, the frame being speech if the ratio is greater than threshold four and non-speech if it is less than or equal to threshold four;
in step 6, the frequency-domain short-time energy is compared with the frequency-domain background energy by comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold five, the frame being speech if the ratio is greater than threshold five and non-speech if it is less than or equal to threshold five.
4. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the frequency range is the range in which human speech energy is mainly distributed, and the frequency range is determined by an upper frequency threshold and a lower frequency threshold.
5. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the duration of the frame is between 10 ms and 50 ms, and the first frame number and the second frame number are configured by the system.
6. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the background noise energy is obtained by accumulating and then averaging the time-domain short-time energies of the periods judged to be non-speech.
7. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the frequency-domain background energy is obtained by accumulating and then averaging the frequency-domain short-time energies of the periods judged to be non-speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610393406.XA CN106098076B (en) | 2016-06-06 | 2016-06-06 | Time-frequency domain adaptive voice detection method based on dynamic noise estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610393406.XA CN106098076B (en) | 2016-06-06 | 2016-06-06 | Time-frequency domain adaptive voice detection method based on dynamic noise estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106098076A CN106098076A (en) | 2016-11-09 |
CN106098076B true CN106098076B (en) | 2019-05-21 |
Family
ID=57447624
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610393406.XA Active CN106098076B (en) | 2016-06-06 | 2016-06-06 | Time-frequency domain adaptive voice detection method based on dynamic noise estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106098076B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106531180B (en) * | 2016-12-10 | 2019-09-20 | 广州酷狗计算机科技有限公司 | Noise detecting method and device |
CN106448696A (en) * | 2016-12-20 | 2017-02-22 | 成都启英泰伦科技有限公司 | Adaptive high-pass filtering speech noise reduction method based on background noise estimation |
CN107833579B (en) * | 2017-10-30 | 2021-06-11 | 广州酷狗计算机科技有限公司 | Noise elimination method, device and computer readable storage medium |
CN108986830B (en) * | 2018-08-28 | 2021-02-09 | 安徽淘云科技有限公司 | Audio corpus screening method and device |
CN111261143B (en) * | 2018-12-03 | 2024-03-22 | 嘉楠明芯(北京)科技有限公司 | Voice wakeup method and device and computer readable storage medium |
CN110021305B (en) * | 2019-01-16 | 2021-08-20 | 上海惠芽信息技术有限公司 | Audio filtering method, audio filtering device and wearable equipment |
CN109616098B (en) * | 2019-02-15 | 2022-04-01 | 嘉楠明芯(北京)科技有限公司 | Voice endpoint detection method and device based on frequency domain energy |
WO2020252782A1 (en) * | 2019-06-21 | 2020-12-24 | 深圳市汇顶科技股份有限公司 | Voice detection method, voice detection device, voice processing chip and electronic apparatus |
CN111883182B (en) * | 2020-07-24 | 2024-03-19 | 平安科技(深圳)有限公司 | Human voice detection method, device, equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101635865B (en) * | 2008-07-22 | 2012-07-04 | 中兴通讯股份有限公司 | System and method for preventing error detection of dual-tone multi-frequency signals |
CN101968957B (en) * | 2010-10-28 | 2012-02-01 | 哈尔滨工程大学 | Voice detection method under noise condition |
WO2015085532A1 (en) * | 2013-12-12 | 2015-06-18 | Spreadtrum Communications (Shanghai) Co., Ltd. | Signal noise reduction |
CN105575405A (en) * | 2014-10-08 | 2016-05-11 | 展讯通信(上海)有限公司 | Double-microphone voice active detection method and voice acquisition device |
CN105118502B (en) * | 2015-07-14 | 2017-05-10 | 百度在线网络技术(北京)有限公司 | End point detection method and system of voice identification system |
2016
- 2016-06-06: CN application CN201610393406.XA (CN106098076B), status Active
Also Published As
Publication number | Publication date |
---|---|
CN106098076A (en) | 2016-11-09 |
Similar Documents
Publication | Title |
---|---|
CN106098076B (en) | Time-frequency domain adaptive voice detection method based on dynamic noise estimation |
Moattar et al. | A simple but efficient real-time voice activity detection algorithm |
US8515097B2 (en) | Single microphone wind noise suppression |
CN104464722B (en) | Voice activity detection method and apparatus based on time domain and frequency domain |
WO2016049611A1 (en) | Neural network voice activity detection employing running range normalization |
CN103325386A (en) | Method and system for signal transmission control |
CN105810201B (en) | Voice activity detection method and its system |
CN107863099B (en) | Novel double-microphone voice detection and enhancement method |
CN105810214B (en) | Voice-activation detecting method and device |
WO2004075167A2 (en) | Log-likelihood ratio method for detecting voice activity and apparatus |
WO2008011319A3 (en) | Method and system for near-end detection |
CN106504760B (en) | Broadband ambient noise and speech Separation detection system and method |
GB2426368A (en) | Using input signal quality in speeech recognition |
CN102760444A (en) | Support vector machine based classification method of base-band time-domain voice-frequency signal |
CN110349598A (en) | An endpoint detection method in a low signal-to-noise ratio environment |
US11341988B1 (en) | Hybrid learning-based and statistical processing techniques for voice activity detection |
CN103617801A (en) | Voice detection method and device and electronic equipment |
CN103905656A (en) | Residual echo detection method and apparatus |
WO2017128910A1 (en) | Method, apparatus and electronic device for determining speech presence probability |
CN100369113C (en) | Method for adaptively improving speech recognition rate by means of gain |
CN112216285B (en) | Multi-user session detection method, system, mobile terminal and storage medium |
Sakhnov et al. | Dynamical energy-based speech/silence detector for speech enhancement applications |
CN106486133B (en) | Howling scene recognition method and device |
US20230247350A1 (en) | Processor |
CN111755028A (en) | Near-field remote controller voice endpoint detection method and system based on fundamental tone characteristics |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |