CN106098076A - A time-frequency-domain adaptive voice detection method based on dynamic noise estimation - Google Patents
- Publication number
- CN106098076A CN106098076A CN201610393406.XA CN201610393406A CN106098076A CN 106098076 A CN106098076 A CN 106098076A CN 201610393406 A CN201610393406 A CN 201610393406A CN 106098076 A CN106098076 A CN 106098076A
- Authority
- CN
- China
- Prior art keywords
- energy
- frequency domain
- time
- voice
- threshold value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
- G10L2025/786—Adaptive threshold
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to the fields of information processing and transduced-signal processing, and in particular to a time-frequency-domain adaptive automatic voice detection method based on dynamic noise estimation. The method detects voice separately from the time-domain short-time energy of the sound and from the change of the short-time energy within a certain frequency range, and finally selects the better of the two results according to the magnitude of the dynamically estimated background noise energy, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Description
Technical field
The present invention relates to the fields of information processing and transduced-signal processing, and in particular to a time-frequency-domain adaptive voice detection method based on dynamic noise estimation.
Background art
Speech recognition is a focus of the artificial-intelligence field and is now widely applied in many areas. Voice detection is an essential part of any real-time speech recognition system; its purpose is to distinguish speech segments from non-speech segments in a complex real environment. Studies have shown that in practical applications a large part of the recognition errors stems from incorrect handling of the voice signal: large amounts of non-speech content severely degrade the accuracy of the speech recognition system, especially in application environments with much noise. Correct voice detection effectively reduces the amount of system computation, shortens processing time, lowers the transmit power of mobile terminals, saves channel resources, and improves recognition accuracy. Under complex background noise in particular, the performance of a speech recognition system depends largely on the quality of its voice detection, so a robust, accurate, real-time, and highly adaptive voice detection technique is indispensable to every speech recognition system.
When speech recognition is currently applied on mobile terminals, in particular mobile phones or voice remote controls, the start and end of speech are determined mainly by pressing a button. This is very inconvenient for far-field use, and for far-field or hands-free smart devices and robots that support speech recognition, an automatic voice detection system is an indispensable component.
The mainstream approaches to automatic voice detection rely on three quantities: the short-time energy in the time domain, the zero-crossing rate, and the mean square deviation of the frequency-band energy in the frequency domain. In each case the quantity is computed per frame and compared with an empirical threshold. Experiments show that comparing the short-time energy or the zero-crossing rate alone adapts poorly to noisy environments, particularly because the background noise of a given environment also changes when the application environment changes, while the frequency-band energy mean-square-deviation method adapts poorly to quiet environments.
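The three mainstream per-frame quantities named above can be sketched in Python. This is an illustrative sketch only: the band-splitting scheme in `band_energy_std` is one common variant rather than a definition from this patent, and the sampling rate and band count are assumed values.

```python
import numpy as np

def short_time_energy(frame):
    """Sum of squared samples of one frame."""
    return np.sum(np.asarray(frame, dtype=float) ** 2)

def zero_crossing_rate(frame):
    """Fraction of sample-to-sample transitions that cross zero."""
    x = np.asarray(frame, dtype=float)
    return np.mean(np.abs(np.diff(np.sign(x)))) / 2.0

def band_energy_std(frame, fs=16000, n_bands=8):
    """Spread of per-band spectral energies (one common variant of the
    frequency-band energy mean-square-deviation measure)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.std([b.sum() for b in bands])
```

Each quantity would then be compared against an empirical threshold, which is exactly the fragility the patent targets: a fixed threshold cannot track a changing noise floor.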
To solve the above problems, a method is needed that detects voice separately from the changes of the average voice energy in the time domain and in the frequency domain, and finally selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Summary of the invention
The object of the present invention is to overcome the above shortcomings of the prior art and to provide a voice detection method that greatly improves the accuracy of speech recognition and adapts to environmental change.
In order to achieve the above object, the invention provides the following technical scheme.
A time-frequency-domain adaptive voice detection method based on dynamic noise estimation, comprising the following steps:
Step 1: load the current frame data; the current frame data is time-domain speech data.
Step 2: compute the total energy of each frame of the time-domain speech data as the time-domain short-time energy, and transform each frame of the time-domain speech data into frequency-domain data by FFT.
Step 3: select the sub-band data of a certain frequency range from the frequency-domain data, and accumulate the energy of that sub-band data as the frequency-domain short-time energy.
Step 4: compute the background noise energy in a background noise estimation unit, and compute the frequency-domain background energy in a frequency-domain background energy computation unit.
Step 5: compare the time-domain short-time energy with the background noise energy; if it is greater than the background noise energy the frame is voice, and if it is less than or equal to the background noise energy the frame is non-voice.
Step 6: compare the frequency-domain short-time energy with the frequency-domain background energy; if it is greater than the frequency-domain background energy the frame is voice, and if it is less than or equal to the frequency-domain background energy the frame is non-voice.
Step 7: compare the background noise energy with a preset threshold one; if it is greater than threshold one, select the result of step 6, and if it is less than or equal to threshold one, select the result of step 5.
Step 8: if the current frame is detected as non-voice, send its time-domain short-time energy to the background noise estimation unit for accumulation; after a first frame number has been accumulated, divide the accumulated value by the first frame number to obtain the new background noise energy as output. At the same time, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computation unit for accumulation; after a second frame number has been accumulated, divide the accumulated value by the second frame number to obtain the new frequency-domain background energy as output.
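The eight steps above can be sketched end-to-end in Python. This is a minimal illustrative sketch, not the patented implementation: the sampling rate, frame length, initial background estimates, the value of threshold one, and the first and second frame numbers are all assumed values.

```python
import numpy as np

FS = 16000            # sampling rate (assumed)
FRAME = 320           # 20 ms frames, within the 10-50 ms range in the description
BAND = (300, 4000)    # speech band from the description, in Hz
THRESH_ONE = 1e-2     # threshold one: switches between the two detectors (assumed)
N1 = N2 = 50          # first and second frame numbers (assumed)

def subband_energy(frame):
    """Steps 2-3: FFT the frame and accumulate energy inside the speech band."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    mask = (freqs >= BAND[0]) & (freqs <= BAND[1])
    return spec[mask].sum()

def detect(frames):
    noise_e, bg_f = 1e-3, 1e-3          # initial background estimates (assumed)
    acc_t = acc_f = 0.0
    n_t = n_f = 0
    decisions = []
    for frame in frames:                 # step 1: load each frame
        e_t = np.sum(frame ** 2)         # step 2: time-domain short-time energy
        e_f = subband_energy(frame)      # step 3: frequency-domain short-time energy
        is_voice_t = e_t > noise_e       # step 5
        is_voice_f = e_f > bg_f          # step 6
        # step 7: in loud environments trust the frequency-domain detector
        is_voice = is_voice_f if noise_e > THRESH_ONE else is_voice_t
        decisions.append(is_voice)
        if not is_voice:                 # step 8: update the background estimates
            acc_t += e_t; n_t += 1
            acc_f += e_f; n_f += 1
            if n_t == N1:
                noise_e, acc_t, n_t = acc_t / N1, 0.0, 0
            if n_f == N2:
                bg_f, acc_f, n_f = acc_f / N2, 0.0, 0
    return decisions
```

A short run on synthetic data shows the expected behaviour: silent frames are classified as non-voice, while a loud tone frame is classified as voice.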
Ordinary speech energy is stationary only over short periods, whereas the background noise energy is stationary over long periods. The time-domain short-time energy is compared with the background noise energy, and the comparison result serves as the probability that the current moment is voice. Non-voice periods are generally much longer than voice periods, because the time-domain short-time energy can be regarded as sound energy that may contain both voice and background noise, while the long-term time-domain energy consists mainly of background noise energy. The more the time-domain short-time energy exceeds the long-term energy, the greater the probability of voice; and because the long-term energy is computed dynamically, the method adapts well to changes in environmental noise. Comparing the time-domain short-time energy with the background noise energy is better suited to quiet environments, so in order to improve detection accuracy, voice detection is performed by combining this comparison with the comparison of the frequency-domain short-time energy against the frequency-domain background energy, which improves the accuracy of voice detection.
As a preferred version of the present invention, the time-domain short-time energy is compared with the background noise energy in step 5 by subtracting the background noise energy from the time-domain short-time energy and comparing the difference with a preset threshold two: if the difference exceeds threshold two the result is voice, and if it is less than or equal to threshold two the result is non-voice.
The frequency-domain short-time energy is compared with the frequency-domain background energy in step 6 by subtracting the frequency-domain background energy from the frequency-domain short-time energy and comparing the difference with a preset threshold three: if the difference exceeds threshold three the result is voice, and if it is less than or equal to threshold three the result is non-voice.
As a preferred version of the present invention, the time-domain short-time energy is compared with the background noise energy in step 5 by comparing the ratio of the time-domain short-time energy to the background noise energy with a preset threshold four: if the ratio exceeds threshold four the result is voice, and if it is less than or equal to threshold four the result is non-voice.
The frequency-domain short-time energy is compared with the frequency-domain background energy in step 6 by comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold five: if the ratio exceeds threshold five the result is voice, and if it is less than or equal to threshold five the result is non-voice.
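Both preferred comparison rules reduce to one-line predicates. A sketch, with the threshold values chosen arbitrarily for illustration:

```python
THRESH_TWO = 0.5    # difference threshold (assumed value)
THRESH_FOUR = 3.0   # ratio threshold (assumed value)

def is_voice_by_difference(short_energy, background_energy, thresh=THRESH_TWO):
    """Preferred version 1: short-time energy minus background vs. a preset threshold."""
    return (short_energy - background_energy) > thresh

def is_voice_by_ratio(short_energy, background_energy, thresh=THRESH_FOUR):
    """Preferred version 2: short-time energy over background vs. a preset threshold."""
    return short_energy / background_energy > thresh
```

The same two predicates apply unchanged to the frequency-domain pair (with thresholds three and five in place of two and four).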
As a preferred version of the present invention, the frequency range is the band over which human speech energy is mainly distributed. The human voice spectrum is relatively wide, and the band can be set by two parameters: an upper frequency threshold and an lower frequency threshold. Sound outside this frequency range is usually environmental noise or other non-voice, so within this band the environmental noise energy is strongly suppressed. In general, human speech energy is concentrated mainly between 300 Hz and 4000 Hz, while background noise energy is distributed mainly below 300 Hz, so the energy of the band over which voice is mainly distributed is used for the comparison. Within this band the frequency-domain short-time energy rises markedly when voice is present; therefore, analogously to the time-domain comparison, the frequency-domain short-time energy is compared with the frequency-domain background energy, and if it exceeds the system-set threshold three or threshold five, the period is very likely voice.
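Restricting the frequency-domain short-time energy to the band between the lower and upper frequency thresholds amounts to summing the FFT bins that fall inside it. A sketch, assuming a 16 kHz sampling rate:

```python
import numpy as np

def band_energy(frame, fs=16000, f_lo=300.0, f_hi=4000.0):
    """Accumulate spectral energy between the lower and upper frequency thresholds."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return spec[(freqs >= f_lo) & (freqs <= f_hi)].sum()
```

A 1 kHz tone (inside the band) yields far more band energy than a 100 Hz tone (below the lower threshold), which is exactly the suppression of low-frequency background noise that the description relies on.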
As a preferred version of the present invention, the duration of each frame is between 10 and 50 milliseconds, and the first frame number and the second frame number are configured by the system.
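Splitting the input into frames of this duration is straightforward. A sketch using non-overlapping 20 ms frames at an assumed 16 kHz sampling rate:

```python
import numpy as np

def split_frames(signal, fs=16000, frame_ms=20):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000)            # samples per frame
    usable = (len(signal) // n) * n          # drop the incomplete tail frame
    return np.asarray(signal[:usable]).reshape(-1, n)
```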
As a preferred version of the present invention, the background noise energy is the average of the accumulated time-domain short-time energies of frames judged to be non-voice.
As a preferred version of the present invention, the frequency-domain background energy is the average of the accumulated frequency-domain short-time energies of frames judged to be non-voice.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention detects voice separately from the changes of the average voice energy in the time domain and in the frequency domain, and finally selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is an operating block diagram of the present invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the embodiments and specific implementations, but this should not be understood as limiting the scope of the above subject matter of the present invention to the examples below; all techniques realized on the basis of the present invention fall within the scope of the invention.
As shown in Fig. 1, a time-frequency-domain adaptive voice detection method based on dynamic noise estimation comprises the following steps:
Step 1: load the current frame data; the current frame data is time-domain speech data.
Step 2: compute the total energy of each frame of the time-domain speech data as the time-domain short-time energy, and transform each frame of the time-domain speech data into frequency-domain data by FFT.
Step 3: select the sub-band data of a certain frequency range from the frequency-domain data, and accumulate the energy of that sub-band data as the frequency-domain short-time energy.
Step 4: compute the background noise energy in a background noise estimation unit, and compute the frequency-domain background energy in a frequency-domain background energy computation unit.
Step 5: compare the time-domain short-time energy with the background noise energy; if it is greater than the background noise energy the frame is voice, and if it is less than or equal to the background noise energy the frame is non-voice.
Step 6: compare the frequency-domain short-time energy with the frequency-domain background energy; if it is greater than the frequency-domain background energy the frame is voice, and if it is less than or equal to the frequency-domain background energy the frame is non-voice.
Step 7: compare the background noise energy with a preset threshold one; if it is greater than threshold one, select the result of step 6, and if it is less than or equal to threshold one, select the result of step 5.
Step 8: if the current frame is detected as non-voice, send its time-domain short-time energy to the background noise estimation unit for accumulation; after a first frame number has been accumulated, divide the accumulated value by the first frame number to obtain the new background noise energy as output. At the same time, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computation unit for accumulation; after a second frame number has been accumulated, divide the accumulated value by the second frame number to obtain the new frequency-domain background energy as output.
As shown in Fig. 1 and Fig. 2, the current frame of time-domain speech data is loaded first. The time-domain short-time energy is then computed; while it is being computed, the time-domain speech data is transformed into frequency-domain data by FFT and the frequency-domain short-time energy is computed. The background noise estimation unit computes the background noise energy and the frequency-domain background energy computation unit computes the frequency-domain background energy. The time-domain short-time energy is compared with the background noise energy, and the frequency-domain short-time energy with the frequency-domain background energy. In this embodiment the difference method is used for both: the difference of the time-domain short-time energy minus the background noise energy is compared with the preset threshold two (voice if greater than threshold two, non-voice if less than or equal), and the difference of the frequency-domain short-time energy minus the frequency-domain background energy is compared with the preset threshold three (voice if greater than threshold three, non-voice if less than or equal). Both comparison results are output, and the background noise energy is compared with the threshold one set by the system: if it exceeds threshold one, the result of step 6 is selected, and if it is less than or equal to threshold one, the result of step 5 is selected. Whenever the comparison result of step 5 or step 6 is non-voice, the corresponding energy is sent to the background noise estimation unit or the frequency-domain background energy computation unit to compute the new background noise energy and the new frequency-domain background energy. In this embodiment the band over which human speech energy is mainly distributed is taken as 300 Hz to 4000 Hz, and the duration of each frame is between 10 and 50 milliseconds.
In another embodiment the ratio method is used: the ratio of the time-domain short-time energy to the background noise energy is compared with the preset threshold four (voice if greater than threshold four, non-voice if less than or equal), and the ratio of the frequency-domain short-time energy to the frequency-domain background energy is compared with the preset threshold five (voice if greater than threshold five, non-voice if less than or equal). The remaining computations are identical to those of the previous embodiment and are not repeated here.
In still other embodiments, mixed methods may be used, for example comparing the difference of the time-domain short-time energy and the background noise energy with a system-set threshold six while comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold seven.
Claims (7)
1. A time-frequency-domain adaptive voice detection method based on dynamic noise estimation, comprising the following steps:
Step 1: loading current frame data, the current frame data being time-domain speech data;
Step 2: computing the total energy of each frame of the time-domain speech data as a time-domain short-time energy, and transforming each frame of the time-domain speech data into frequency-domain data by FFT;
Step 3: selecting sub-band data of a certain frequency range from the frequency-domain data, and accumulating the energy of the sub-band data as a frequency-domain short-time energy;
Step 4: computing a background noise energy in a background noise energy estimation unit, and computing a frequency-domain background energy in a frequency-domain background energy computation unit;
Step 5: comparing the time-domain short-time energy with the background noise energy, the result being voice if it is greater than the background noise energy and non-voice if it is less than or equal to the background noise energy;
Step 6: comparing the frequency-domain short-time energy with the frequency-domain background energy, the result being voice if it is greater than the frequency-domain background energy and non-voice if it is less than or equal to the frequency-domain background energy;
Step 7: comparing the background noise energy with a preset threshold one, selecting the result of step 6 if it is greater than threshold one, and selecting the result of step 5 if it is less than or equal to threshold one;
Step 8: if the current frame is detected as non-voice, sending the time-domain short-time energy of the current frame to the background noise estimation unit for accumulation and, after a first frame number has been accumulated, dividing the accumulated value by the first frame number to obtain a new background noise energy as output; simultaneously sending the frequency-domain short-time energy of the current frame to the frequency-domain background energy computation unit for accumulation and, after a second frame number has been accumulated, dividing the accumulated value by the second frame number to obtain a new frequency-domain background energy as output.
2. The time-frequency-domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterised in that:
the time-domain short-time energy is compared with the background noise energy in step 5 by subtracting the background noise energy from the time-domain short-time energy and comparing the difference with a preset threshold two, the result being voice if greater than threshold two and non-voice if less than or equal to threshold two;
the frequency-domain short-time energy is compared with the frequency-domain background energy in step 6 by subtracting the frequency-domain background energy from the frequency-domain short-time energy and comparing the difference with a preset threshold three, the result being voice if greater than threshold three and non-voice if less than or equal to threshold three.
3. The time-frequency-domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterised in that:
the time-domain short-time energy is compared with the background noise energy in step 5 by comparing the ratio of the time-domain short-time energy to the background noise energy with a preset threshold four, the result being voice if greater than threshold four and non-voice if less than or equal to threshold four;
the frequency-domain short-time energy is compared with the frequency-domain background energy in step 6 by comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold five, the result being voice if greater than threshold five and non-voice if less than or equal to threshold five.
4. The time-frequency-domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterised in that:
the frequency range is the band over which human speech energy is mainly distributed, and is determined by an upper frequency threshold and a lower frequency threshold.
5. The time-frequency-domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterised in that:
the duration of each frame is between 10 and 50 milliseconds, and the first frame number and the second frame number are configured by the system.
6. The time-frequency-domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterised in that:
the background noise energy is the average of the accumulated time-domain short-time energies of frames judged to be non-voice.
7. The time-frequency-domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterised in that:
the frequency-domain background energy is the average of the accumulated frequency-domain short-time energies of frames judged to be non-voice.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610393406.XA CN106098076B (en) | 2016-06-06 | 2016-06-06 | A time-frequency-domain adaptive voice detection method based on dynamic noise estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610393406.XA CN106098076B (en) | 2016-06-06 | 2016-06-06 | A time-frequency-domain adaptive voice detection method based on dynamic noise estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106098076A true CN106098076A (en) | 2016-11-09 |
CN106098076B CN106098076B (en) | 2019-05-21 |
Family
ID=57447624
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610393406.XA Active CN106098076B (en) | 2016-06-06 | A time-frequency-domain adaptive voice detection method based on dynamic noise estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106098076B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101635865A (en) * | 2008-07-22 | 2010-01-27 | 中兴通讯股份有限公司 | System and method for preventing error detection of dual-tone multi-frequency signals |
CN101968957A (en) * | 2010-10-28 | 2011-02-09 | 哈尔滨工程大学 | Voice detection method under noise condition |
WO2015085532A1 (en) * | 2013-12-12 | 2015-06-18 | Spreadtrum Communications (Shanghai) Co., Ltd. | Signal noise reduction |
CN105575405A (en) * | 2014-10-08 | 2016-05-11 | 展讯通信(上海)有限公司 | Double-microphone voice active detection method and voice acquisition device |
CN105118502A (en) * | 2015-07-14 | 2015-12-02 | 百度在线网络技术(北京)有限公司 | End point detection method and system of voice identification system |
Non-Patent Citations (1)
Title |
---|
李灵光 (Li Lingguang): "一种时频结合的抗噪性端点检测算法" (A noise-robust endpoint detection algorithm combining time and frequency domains), 《计算机与现代化》 (Computer and Modernization) *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106531180B (en) * | 2016-12-10 | 2019-09-20 | 广州酷狗计算机科技有限公司 | Noise detecting method and device |
CN106531180A (en) * | 2016-12-10 | 2017-03-22 | 广州酷狗计算机科技有限公司 | Noise detection method and device |
CN106448696A (en) * | 2016-12-20 | 2017-02-22 | 成都启英泰伦科技有限公司 | Adaptive high-pass filtering speech noise reduction method based on background noise estimation |
CN107833579A (en) * | 2017-10-30 | 2018-03-23 | 广州酷狗计算机科技有限公司 | Noise cancellation method, device and computer-readable recording medium |
CN107833579B (en) * | 2017-10-30 | 2021-06-11 | 广州酷狗计算机科技有限公司 | Noise elimination method, device and computer readable storage medium |
CN108986830A (en) * | 2018-08-28 | 2018-12-11 | 安徽淘云科技有限公司 | A kind of audio corpus screening technique and device |
CN108986830B (en) * | 2018-08-28 | 2021-02-09 | 安徽淘云科技有限公司 | Audio corpus screening method and device |
CN111261143A (en) * | 2018-12-03 | 2020-06-09 | 杭州嘉楠耘智信息科技有限公司 | Voice wake-up method and device and computer readable storage medium |
CN111261143B (en) * | 2018-12-03 | 2024-03-22 | 嘉楠明芯(北京)科技有限公司 | Voice wakeup method and device and computer readable storage medium |
CN110021305A (en) * | 2019-01-16 | 2019-07-16 | 上海惠芽信息技术有限公司 | A kind of audio filtering method, audio filter and wearable device |
CN110021305B (en) * | 2019-01-16 | 2021-08-20 | 上海惠芽信息技术有限公司 | Audio filtering method, audio filtering device and wearable equipment |
CN109616098A (en) * | 2019-02-15 | 2019-04-12 | 北京嘉楠捷思信息技术有限公司 | Voice endpoint detection method and device based on frequency domain energy |
WO2020252782A1 (en) * | 2019-06-21 | 2020-12-24 | 深圳市汇顶科技股份有限公司 | Voice detection method, voice detection device, voice processing chip and electronic apparatus |
US11322174B2 (en) | 2019-06-21 | 2022-05-03 | Shenzhen GOODIX Technology Co., Ltd. | Voice detection from sub-band time-domain signals |
WO2021135547A1 (en) * | 2020-07-24 | 2021-07-08 | 平安科技(深圳)有限公司 | Human voice detection method, apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106098076B (en) | 2019-05-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106098076A (en) | A kind of based on dynamic noise estimation time-frequency domain adaptive voice detection method | |
CN106373587B (en) | Automatic acoustic feedback detection and removing method in a kind of real-time communication system | |
Moattar et al. | A simple but efficient real-time voice activity detection algorithm | |
CN105118502B (en) | End point detection method and system of voice identification system | |
CN103366739B (en) | Towards self-adaptation end-point detecting method and the system thereof of alone word voice identification | |
CN105810201B (en) | Voice activity detection method and its system | |
CN103325386A (en) | Method and system for signal transmission control | |
CN102314884B (en) | Voice-activation detecting method and device | |
JP6635440B2 (en) | Acquisition method of voice section correction frame number, voice section detection method and apparatus | |
CN103886871A (en) | Detection method of speech endpoint and device thereof | |
CN110047470A (en) | A kind of sound end detecting method | |
US20160077792A1 (en) | Methods and apparatus for unsupervised wakeup | |
WO2004075167A2 (en) | Log-likelihood ratio method for detecting voice activity and apparatus | |
CN106303878A (en) | One is uttered long and high-pitched sounds and is detected and suppressing method | |
CN104867499A (en) | Frequency-band-divided wiener filtering and de-noising method used for hearing aid and system thereof | |
CN105810214B (en) | Voice-activation detecting method and device | |
CN106504760B (en) | Broadband ambient noise and speech Separation detection system and method | |
CN110265058A (en) | Estimate the ambient noise in audio signal | |
US11341988B1 (en) | Hybrid learning-based and statistical processing techniques for voice activity detection | |
Hirszhorn et al. | Transient interference suppression in speech signals based on the OM-LSA algorithm | |
CN108847218B (en) | Self-adaptive threshold setting voice endpoint detection method, equipment and readable storage medium | |
CN106486133B (en) | One kind is uttered long and high-pitched sounds scene recognition method and equipment | |
EP3195314B1 (en) | Methods and apparatus for unsupervised wakeup | |
CN111755028A (en) | Near-field remote controller voice endpoint detection method and system based on fundamental tone characteristics | |
CN111128244A (en) | Short wave communication voice activation detection method based on zero crossing rate detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||