CN106098076A - Time-frequency domain adaptive voice detection method based on dynamic noise estimation - Google Patents

Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Info

Publication number
CN106098076A
CN106098076A, CN201610393406.XA
Authority
CN
China
Prior art keywords
energy
frequency domain
time
voice
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610393406.XA
Other languages
Chinese (zh)
Other versions
CN106098076B (en)
Inventor
何云鹏 (He Yunpeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Leader Technology Co Ltd
Chipintelli Technology Co Ltd
Original Assignee
Chengdu Leader Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Leader Technology Co Ltd filed Critical Chengdu Leader Technology Co Ltd
Priority to CN201610393406.XA
Publication of CN106098076A
Application granted
Publication of CN106098076B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786 Adaptive threshold

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the fields of information processing and sensor signal processing, and in particular to a time-frequency domain adaptive automatic voice detection method based on dynamic noise estimation. The method detects voice separately from the time-domain short-time energy of the sound and from the change in short-time energy within a given frequency range, and then selects the better of the two results according to the magnitude of the dynamically estimated background noise energy, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental changes.

Description

Time-frequency domain adaptive voice detection method based on dynamic noise estimation
Technical field
The present invention relates to the fields of information processing and sensor signal processing, and in particular to a time-frequency domain adaptive voice detection method based on dynamic noise estimation.
Background art
Speech recognition is a focal point of applied artificial intelligence and is now widely used in many fields. Voice detection is an essential part of any real-time speech recognition system; its purpose is to distinguish speech segments from non-speech segments in complex real-world environments. Published work shows that in practical applications a large share of recognition errors comes from speech that has not been correctly segmented: large amounts of non-speech material seriously degrade the accuracy of a speech recognition system, especially in noisy application environments. Correct voice detection can significantly reduce the system's computational load, shorten processing time, lower the transmit power of mobile terminals, save channel resources, and improve recognition accuracy. Under complex background noise in particular, system performance depends to a large extent on the quality of the voice detection, so a robust, accurate, real-time, and highly adaptive voice detection technique is necessary for every speech recognition system.
At present, when speech recognition is used on mobile terminals, in particular mobile phones and voice remote controls, the start and end of speech are determined mainly by pressing a button. This is very inconvenient for far-field use, and for far-field or hands-free smart devices and robots that support speech recognition, an automatic voice detection system is an indispensable component.
Mainstream automatic voice detection currently relies on three measures: short-time energy in the time domain, zero-crossing rate, and the mean square deviation of frequency-band energy in the frequency domain. In each case the measure is computed and compared against an empirical threshold. Experiments show that comparing short-time energy or zero-crossing rate alone adapts poorly to noisy environments, especially when the application environment changes and the background noise of a given environment changes with it, while the band-energy mean-square-deviation method adapts poorly to quiet environments.
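For reference, the following minimal NumPy sketch (an illustration, not part of the patent text) shows how the conventional per-frame short-time energy and zero-crossing rate mentioned above are typically computed; the 16 kHz sampling rate and 20 ms frame length are assumed values.

```python
# Conventional time-domain features used by mainstream voice detection.
import numpy as np

def short_time_energy(frame: np.ndarray) -> float:
    """Sum of squared samples of one frame."""
    frame = np.asarray(frame, dtype=np.float64)
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(np.asarray(frame, dtype=np.float64))
    return float(np.mean(signs[:-1] != signs[1:]))

# Example: a 20 ms frame at an assumed 16 kHz rate (320 samples) of synthetic noise.
rng = np.random.default_rng(0)
frame = rng.normal(scale=0.01, size=320)
print(short_time_energy(frame), zero_crossing_rate(frame))
```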
To solve the above problems, what is needed is a method that detects voice separately from changes in the time-domain and frequency-domain average energy of the signal and then selects the better result according to the dynamically estimated background noise level, thereby greatly improving recognition accuracy and adaptability to environmental changes.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and to provide a voice detection method that greatly improves the accuracy of speech recognition and adapts to environmental changes.
To achieve the above object, the invention provides the following technical solution.
A time-frequency domain adaptive voice detection method based on dynamic noise estimation comprises the following steps (a code sketch of these steps is given after step eight):
Step one: load the current frame of data, the current frame being time-domain speech data;
Step two: compute the total energy of each frame of the time-domain speech data as the time-domain short-time energy, and convert each frame of the time-domain speech data into frequency-domain data by FFT;
Step three: select the sub-band of the frequency-domain data within a given frequency range, and accumulate the energy of that sub-band as the frequency-domain short-time energy;
Step four: a background noise estimation unit computes the background noise energy, and a frequency-domain background energy computation unit computes the frequency-domain background energy;
Step five: compare the time-domain short-time energy with the background noise energy; if it is greater than the background noise energy the frame is speech, and if it is less than or equal to the background noise energy the frame is non-speech;
Step six: compare the frequency-domain short-time energy with the frequency-domain background energy; if it is greater than the frequency-domain background energy the frame is speech, and if it is less than or equal to the frequency-domain background energy the frame is non-speech;
Step seven: compare the background noise energy with a preset threshold one; if it is greater than threshold one, take the comparison result of step six as the voice decision, and if it is less than or equal to threshold one, take the comparison result of step five as the voice decision;
Step eight: if the current frame is detected as non-speech, send its time-domain short-time energy to the background noise estimation unit for accumulation; after a first frame count has been accumulated, divide the accumulated value by the first frame count to obtain a new background noise energy as output; at the same time, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computation unit for accumulation, and after a second frame count has been accumulated, divide the accumulated value by the second frame count to obtain a new frequency-domain background energy as output.
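The following Python/NumPy sketch illustrates one possible reading of steps one through eight, using the difference-based comparisons of the preferred version described further below. The band limits, thresholds, frame counts, and all parameter names are assumed example values, not figures taken from the patent.

```python
# Minimal sketch of the time-frequency domain adaptive voice detection pipeline.
import numpy as np

class TimeFreqVAD:
    def __init__(self, sample_rate=16000, frame_ms=20,
                 band=(300.0, 4000.0),          # assumed voice band (step three)
                 noise_switch_threshold=1e-3,   # "threshold one" (step seven), assumed
                 time_margin=1e-3,              # "threshold two", assumed
                 freq_margin=1e-2,              # "threshold three", assumed
                 n1=50, n2=50):                 # first / second frame counts (step eight)
        self.fs = sample_rate
        self.frame_len = int(sample_rate * frame_ms / 1000)
        self.band = band
        self.thr1, self.thr2, self.thr3 = noise_switch_threshold, time_margin, freq_margin
        self.n1, self.n2 = n1, n2
        # Running background estimates and their accumulators.
        self.noise_energy = 0.0
        self.freq_background = 0.0
        self._acc_t, self._cnt_t = 0.0, 0
        self._acc_f, self._cnt_f = 0.0, 0

    def process_frame(self, frame: np.ndarray) -> bool:
        """Return True if the frame is classified as speech."""
        frame = np.asarray(frame, dtype=np.float64)
        # Step two: time-domain short-time energy and FFT of the frame.
        time_energy = float(np.sum(frame ** 2))
        spectrum = np.fft.rfft(frame)
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / self.fs)
        # Step three: accumulate the energy of the sub-band within the chosen range.
        in_band = (freqs >= self.band[0]) & (freqs <= self.band[1])
        freq_energy = float(np.sum(np.abs(spectrum[in_band]) ** 2))
        # Steps five and six: difference-style comparisons (first preferred version).
        time_says_speech = (time_energy - self.noise_energy) > self.thr2
        freq_says_speech = (freq_energy - self.freq_background) > self.thr3
        # Step seven: use the frequency-domain decision in noisy conditions,
        # the time-domain decision in quiet conditions.
        is_speech = freq_says_speech if self.noise_energy > self.thr1 else time_says_speech
        # Step eight: update both background estimates only on non-speech frames.
        if not is_speech:
            self._acc_t += time_energy
            self._cnt_t += 1
            if self._cnt_t == self.n1:
                self.noise_energy = self._acc_t / self.n1
                self._acc_t, self._cnt_t = 0.0, 0
            self._acc_f += freq_energy
            self._cnt_f += 1
            if self._cnt_f == self.n2:
                self.freq_background = self._acc_f / self.n2
                self._acc_f, self._cnt_f = 0.0, 0
        return is_speech
```

A caller splits the input signal into frames of frame_len samples and calls process_frame on each; a usage example with a synthetic signal is given in the detailed description below.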
Ordinary speech energy is stable only over short intervals, whereas the background noise energy is stable over long intervals. The time-domain short-time energy is therefore compared with the background noise energy, and the comparison serves as the time-domain probability that the current instant contains speech; non-speech periods are in general much longer than speech periods. The time-domain short-time energy can be regarded as the energy of sound that may contain both speech and background noise, while the long-term time-domain energy consists mainly of background noise energy, so the more the time-domain short-time energy exceeds the long-term energy, the higher the probability of speech. Because the long-term energy is computed dynamically, the method adapts well to changes in environmental noise. Comparing the time-domain short-time energy with the background noise energy is better suited to quiet environments, so to improve the accuracy of voice detection, this comparison is combined with the comparison of the frequency-domain short-time energy against the frequency-domain background energy, and the combined method improves detection accuracy.
In a preferred version of the invention, the comparison of the time-domain short-time energy with the background noise energy in step five is carried out by subtracting the background noise energy from the time-domain short-time energy and comparing the difference with a preset threshold two; if the difference is greater than threshold two the frame is speech, and if it is less than or equal to threshold two the frame is non-speech;
the comparison of the frequency-domain short-time energy with the frequency-domain background energy in step six is carried out by subtracting the frequency-domain background energy from the frequency-domain short-time energy and comparing the difference with a preset threshold three; if the difference is greater than threshold three the frame is speech, and if it is less than or equal to threshold three the frame is non-speech.
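As a small illustration of this difference-based preferred version, the two decisions can be written as below; threshold two and threshold three are assumed example values to be tuned per system, not values given in the patent.

```python
# Difference-based speech/non-speech decisions (thresholds are assumed examples).
THRESHOLD_2 = 1e-3   # "threshold two", time domain
THRESHOLD_3 = 1e-2   # "threshold three", frequency domain

def time_domain_decision(time_energy: float, noise_energy: float) -> bool:
    """Speech if the time-domain short-time energy exceeds the background
    noise energy by more than threshold two."""
    return (time_energy - noise_energy) > THRESHOLD_2

def freq_domain_decision(freq_energy: float, freq_background: float) -> bool:
    """Speech if the frequency-domain short-time energy exceeds the
    frequency-domain background energy by more than threshold three."""
    return (freq_energy - freq_background) > THRESHOLD_3
```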
In another preferred version of the invention, the comparison of the time-domain short-time energy with the background noise energy in step five is carried out by comparing the ratio of the time-domain short-time energy to the background noise energy with a preset threshold four; if the ratio is greater than threshold four the frame is speech, and if it is less than or equal to threshold four the frame is non-speech;
the comparison of the frequency-domain short-time energy with the frequency-domain background energy in step six is carried out by comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold five; if the ratio is greater than threshold five the frame is speech, and if it is less than or equal to threshold five the frame is non-speech.
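The ratio-based preferred version can be sketched the same way; threshold four and threshold five are assumed example values, and the small epsilon guarding against a zero background estimate is an implementation assumption, not part of the patent.

```python
# Ratio-based speech/non-speech decisions (thresholds are assumed examples).
THRESHOLD_4 = 2.0    # "threshold four", time domain
THRESHOLD_5 = 2.0    # "threshold five", frequency domain
EPS = 1e-12          # guard against division by a zero background estimate

def time_domain_decision_ratio(time_energy: float, noise_energy: float) -> bool:
    """Speech if the ratio of time-domain short-time energy to background
    noise energy exceeds threshold four."""
    return time_energy / (noise_energy + EPS) > THRESHOLD_4

def freq_domain_decision_ratio(freq_energy: float, freq_background: float) -> bool:
    """Speech if the ratio of frequency-domain short-time energy to
    frequency-domain background energy exceeds threshold five."""
    return freq_energy / (freq_background + EPS) > THRESHOLD_5
```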
In a preferred version of the invention, the frequency range is the band in which human speech energy is mainly distributed. The spectrum of the human voice is relatively wide, and the voice band is set by two parameters, an upper frequency threshold and a lower frequency threshold; sound outside this range is usually environmental noise or other non-speech, and within the band the power of environmental noise is strongly suppressed. In general, voice energy is concentrated between 300 Hz and 4000 Hz, while background noise energy is mainly distributed below 300 Hz, so the comparison uses the energy of the band in which voice is mainly distributed. Within this band the frequency-domain short-time energy increases markedly when speech is present, so, similarly to the time-domain comparison, the frequency-domain short-time energy is compared with the frequency-domain background energy, and whenever it exceeds the threshold three or threshold five set by the system, the period in question is speech with high probability.
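To make the band selection concrete, the sketch below converts a 300 Hz to 4000 Hz voice band into FFT bin indices and sums the corresponding bin energies; the 16 kHz sampling rate and 20 ms frame length are assumed values.

```python
# Band energy of one frame for an assumed 16 kHz sampling rate.
import numpy as np

def band_energy(frame: np.ndarray, fs: int = 16000,
                f_low: float = 300.0, f_high: float = 4000.0) -> float:
    frame = np.asarray(frame, dtype=np.float64)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    mask = (freqs >= f_low) & (freqs <= f_high)
    return float(np.sum(np.abs(spectrum[mask]) ** 2))

# For a 320-sample (20 ms) frame, rfft bins are spaced fs/N = 50 Hz apart,
# so the 300 Hz to 4000 Hz band covers roughly bins 6 through 80.
```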
In a preferred version of the invention, the frame length is between 10 milliseconds and 50 milliseconds, and the first frame count and the second frame count are configured by the system.
In a preferred version of the invention, the background noise energy is the average of the accumulated time-domain short-time energies of the frames judged to be non-speech.
In a preferred version of the invention, the frequency-domain background energy is the average of the accumulated frequency-domain short-time energies of the frames judged to be non-speech.
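A minimal sketch of this accumulate-and-average background update follows; the form is identical for the time-domain and frequency-domain estimates, and the frame count of 50 is an assumed system configuration rather than a value from the patent.

```python
# Blockwise background update: accumulate the short-time energies of non-speech
# frames and emit a new average every `frame_count` frames.
class BackgroundEstimator:
    def __init__(self, frame_count: int = 50):
        self.frame_count = frame_count
        self.background = 0.0
        self._acc = 0.0
        self._n = 0

    def update(self, non_speech_energy: float) -> float:
        """Feed the short-time energy of a frame judged to be non-speech;
        returns the current background estimate."""
        self._acc += non_speech_energy
        self._n += 1
        if self._n == self.frame_count:
            self.background = self._acc / self.frame_count
            self._acc, self._n = 0.0, 0
        return self.background
```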
Compared with the prior art, the beneficial effects of the present invention are as follows.
The invention detects voice separately from changes in the time-domain and frequency-domain average energy of the signal, and then selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental changes.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is an operational block diagram of the present invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the embodiments and specific implementations, but this should not be understood as limiting the scope of the invention to the examples below; all techniques realized on the basis of the content of the present invention fall within the scope of the invention.
As shown in Fig. 1, a time-frequency domain adaptive voice detection method based on dynamic noise estimation comprises the following steps:
Step one: load the current frame of data, the current frame being time-domain speech data;
Step two: compute the total energy of each frame of the time-domain speech data as the time-domain short-time energy, and convert each frame of the time-domain speech data into frequency-domain data by FFT;
Step three: select the sub-band of the frequency-domain data within a given frequency range, compute the energy of that sub-band, and accumulate it as the frequency-domain short-time energy;
Step four: the background noise estimation unit computes the background noise energy, and the frequency-domain background energy computation unit computes the frequency-domain background energy;
Step five: compare the time-domain short-time energy with the background noise energy; greater than the background noise energy means speech, less than or equal to the background noise energy means non-speech;
Step six: compare the frequency-domain short-time energy with the frequency-domain background energy; greater than the frequency-domain background energy means speech, less than or equal to the frequency-domain background energy means non-speech;
Step seven: compare the background noise energy with a preset threshold one; if it is greater than threshold one, take the comparison result of step six as the voice decision, and if it is less than or equal to threshold one, take the comparison result of step five;
Step eight: if the current frame is detected as non-speech, send its time-domain short-time energy to the background noise estimation unit for accumulation; after the first frame count has been reached, divide the accumulated value by the first frame count to obtain a new background noise energy as output; at the same time, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computation unit for accumulation, and after the second frame count has been reached, divide the accumulated value by the second frame count to obtain a new frequency-domain background energy as output.
As shown in Fig. 1 and Fig. 2, the current frame of data, which is time-domain speech data, is first loaded. After the current frame has been loaded, the time-domain short-time energy is computed, and at the same time the time-domain speech data is converted into frequency-domain data by FFT, from which the frequency-domain short-time energy is computed. The background noise estimation unit computes the background noise energy and the frequency-domain background energy computation unit computes the frequency-domain background energy. The time-domain short-time energy is then compared with the background noise energy, and the frequency-domain short-time energy with the frequency-domain background energy. This embodiment uses the difference-based comparisons: the difference obtained by subtracting the background noise energy from the time-domain short-time energy is compared with the preset threshold two, where a difference greater than threshold two means speech and a difference less than or equal to threshold two means non-speech, and the difference obtained by subtracting the frequency-domain background energy from the frequency-domain short-time energy is compared with the preset threshold three, where a difference greater than threshold three means speech and a difference less than or equal to threshold three means non-speech. Both comparison results are output, and the background noise energy is compared with the threshold one set by the system: if it is greater than threshold one, the comparison result of step six is taken as the voice decision, and if it is less than or equal to threshold one, the comparison result of step five is taken. The frames judged as non-speech in step five and step six are sent to the background noise energy estimation unit and the frequency-domain background energy computation unit, respectively, to compute a new background noise energy and a new frequency-domain background energy. In this embodiment the band in which human speech energy is mainly distributed is taken as 300 Hz to 4000 Hz, and the frame length is between 10 milliseconds and 50 milliseconds.
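Continuing the pipeline sketch given after step eight (with the same assumed parameters), this embodiment could be exercised frame by frame as follows; the synthetic signal and the thresholds tuned to it are placeholders standing in for real microphone input.

```python
# Driving the TimeFreqVAD sketch over a two-second synthetic signal.
import numpy as np

fs = 16000
vad = TimeFreqVAD(sample_rate=fs, frame_ms=20, band=(300.0, 4000.0),
                  time_margin=0.05, freq_margin=0.5)  # thresholds tuned to this signal

rng = np.random.default_rng(1)
signal = rng.normal(scale=0.005, size=fs * 2)                 # 2 s of background noise
t = np.arange(int(0.5 * fs)) / fs
signal[fs:fs + len(t)] += 0.2 * np.sin(2 * np.pi * 440 * t)   # 0.5 s louder tone burst

frame_len = vad.frame_len
decisions = []
for start in range(0, len(signal) - frame_len + 1, frame_len):
    decisions.append(vad.process_frame(signal[start:start + frame_len]))

print(f"{sum(decisions)} of {len(decisions)} frames flagged as speech")
```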
Another embodiment uses the ratio-based method: the ratio of the time-domain short-time energy to the background noise energy is compared with the preset threshold four, where a ratio greater than threshold four means speech and a ratio less than or equal to threshold four means non-speech, and the ratio of the frequency-domain short-time energy to the frequency-domain background energy is compared with the preset threshold five, where a ratio greater than threshold five means speech and a ratio less than or equal to threshold five means non-speech. The rest of the computation is identical to the previous embodiment and is not repeated here.
In other embodiments, mixed schemes can also be used, for example comparing the difference between the time-domain short-time energy and the background noise energy with a system-set threshold six while comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold seven.

Claims (7)

1. A time-frequency domain adaptive voice detection method based on dynamic noise estimation, comprising the following steps:
Step one: load the current frame of data, the current frame being time-domain speech data;
Step two: compute the total energy of each frame of the time-domain speech data as the time-domain short-time energy, and convert each frame of the time-domain speech data into frequency-domain data by FFT;
Step three: select the sub-band of the frequency-domain data within a given frequency range, and accumulate the energy of that sub-band as the frequency-domain short-time energy;
Step four: a background noise energy estimation unit computes the background noise energy, and a frequency-domain background energy computation unit computes the frequency-domain background energy;
Step five: compare the time-domain short-time energy with the background noise energy; if it is greater than the background noise energy the frame is speech, and if it is less than or equal to the background noise energy the frame is non-speech;
Step six: compare the frequency-domain short-time energy with the frequency-domain background energy; if it is greater than the frequency-domain background energy the frame is speech, and if it is less than or equal to the frequency-domain background energy the frame is non-speech;
Step seven: compare the background noise energy with a preset threshold one; if it is greater than threshold one, take the comparison result of step six as the voice decision, and if it is less than or equal to threshold one, take the comparison result of step five as the voice decision;
Step eight: if the current frame is detected as non-speech, send its time-domain short-time energy to the background noise energy estimation unit for accumulation; after a first frame count has been accumulated, divide the accumulated value by the first frame count to obtain a new background noise energy as output; at the same time, send the frequency-domain short-time energy of the current frame to the frequency-domain background energy computation unit for accumulation, and after a second frame count has been accumulated, divide the accumulated value by the second frame count to obtain a new frequency-domain background energy as output.
2. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the comparison of the time-domain short-time energy with the background noise energy in step five is carried out by subtracting the background noise energy from the time-domain short-time energy and comparing the difference with a preset threshold two; if the difference is greater than threshold two the frame is speech, and if it is less than or equal to threshold two the frame is non-speech;
the comparison of the frequency-domain short-time energy with the frequency-domain background energy in step six is carried out by subtracting the frequency-domain background energy from the frequency-domain short-time energy and comparing the difference with a preset threshold three; if the difference is greater than threshold three the frame is speech, and if it is less than or equal to threshold three the frame is non-speech.
3. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the comparison of the time-domain short-time energy with the background noise energy in step five is carried out by comparing the ratio of the time-domain short-time energy to the background noise energy with a preset threshold four; if the ratio is greater than threshold four the frame is speech, and if it is less than or equal to threshold four the frame is non-speech;
the comparison of the frequency-domain short-time energy with the frequency-domain background energy in step six is carried out by comparing the ratio of the frequency-domain short-time energy to the frequency-domain background energy with a preset threshold five; if the ratio is greater than threshold five the frame is speech, and if it is less than or equal to threshold five the frame is non-speech.
4. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the frequency range is the band in which human speech energy is mainly distributed, and the frequency range is determined by an upper frequency threshold and a lower frequency threshold.
5. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the frame length is between 10 milliseconds and 50 milliseconds, and the first frame count and the second frame count are configured by the system.
6. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the background noise energy is the average of the accumulated time-domain short-time energies of the frames judged to be non-speech.
7. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the frequency-domain background energy is the average of the accumulated frequency-domain short-time energies of the frames judged to be non-speech.
CN201610393406.XA 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation Active CN106098076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610393406.XA CN106098076B (en) 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610393406.XA CN106098076B (en) 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Publications (2)

Publication Number Publication Date
CN106098076A true CN106098076A (en) 2016-11-09
CN106098076B CN106098076B (en) 2019-05-21

Family

ID=57447624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610393406.XA Active CN106098076B (en) 2016-06-06 2016-06-06 One kind estimating time-frequency domain adaptive voice detection method based on dynamic noise

Country Status (1)

Country Link
CN (1) CN106098076B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448696A (en) * 2016-12-20 2017-02-22 成都启英泰伦科技有限公司 Adaptive high-pass filtering speech noise reduction method based on background noise estimation
CN106531180A (en) * 2016-12-10 2017-03-22 广州酷狗计算机科技有限公司 Noise detection method and device
CN107833579A (en) * 2017-10-30 2018-03-23 广州酷狗计算机科技有限公司 Noise cancellation method, device and computer-readable recording medium
CN108986830A (en) * 2018-08-28 2018-12-11 安徽淘云科技有限公司 A kind of audio corpus screening technique and device
CN109616098A (en) * 2019-02-15 2019-04-12 北京嘉楠捷思信息技术有限公司 Voice endpoint detection method and device based on frequency domain energy
CN110021305A (en) * 2019-01-16 2019-07-16 上海惠芽信息技术有限公司 A kind of audio filtering method, audio filter and wearable device
CN111261143A (en) * 2018-12-03 2020-06-09 杭州嘉楠耘智信息科技有限公司 Voice wake-up method and device and computer readable storage medium
WO2020252782A1 (en) * 2019-06-21 2020-12-24 深圳市汇顶科技股份有限公司 Voice detection method, voice detection device, voice processing chip and electronic apparatus
WO2021135547A1 (en) * 2020-07-24 2021-07-08 平安科技(深圳)有限公司 Human voice detection method, apparatus, device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635865A (en) * 2008-07-22 2010-01-27 中兴通讯股份有限公司 System and method for preventing error detection of dual-tone multi-frequency signals
CN101968957A (en) * 2010-10-28 2011-02-09 哈尔滨工程大学 Voice detection method under noise condition
WO2015085532A1 (en) * 2013-12-12 2015-06-18 Spreadtrum Communications (Shanghai) Co., Ltd. Signal noise reduction
CN105118502A (en) * 2015-07-14 2015-12-02 百度在线网络技术(北京)有限公司 End point detection method and system of voice identification system
CN105575405A (en) * 2014-10-08 2016-05-11 展讯通信(上海)有限公司 Double-microphone voice active detection method and voice acquisition device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635865A (en) * 2008-07-22 2010-01-27 中兴通讯股份有限公司 System and method for preventing error detection of dual-tone multi-frequency signals
CN101968957A (en) * 2010-10-28 2011-02-09 哈尔滨工程大学 Voice detection method under noise condition
WO2015085532A1 (en) * 2013-12-12 2015-06-18 Spreadtrum Communications (Shanghai) Co., Ltd. Signal noise reduction
CN105575405A (en) * 2014-10-08 2016-05-11 展讯通信(上海)有限公司 Double-microphone voice active detection method and voice acquisition device
CN105118502A (en) * 2015-07-14 2015-12-02 百度在线网络技术(北京)有限公司 End point detection method and system of voice identification system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李灵光 (Li Lingguang): "一种时频结合的抗噪性端点检测算法" (A noise-robust endpoint detection algorithm combining time and frequency domains), 《计算机与现代化》 (Computer and Modernization) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531180B (en) * 2016-12-10 2019-09-20 广州酷狗计算机科技有限公司 Noise detecting method and device
CN106531180A (en) * 2016-12-10 2017-03-22 广州酷狗计算机科技有限公司 Noise detection method and device
CN106448696A (en) * 2016-12-20 2017-02-22 成都启英泰伦科技有限公司 Adaptive high-pass filtering speech noise reduction method based on background noise estimation
CN107833579A (en) * 2017-10-30 2018-03-23 广州酷狗计算机科技有限公司 Noise cancellation method, device and computer-readable recording medium
CN107833579B (en) * 2017-10-30 2021-06-11 广州酷狗计算机科技有限公司 Noise elimination method, device and computer readable storage medium
CN108986830A (en) * 2018-08-28 2018-12-11 安徽淘云科技有限公司 A kind of audio corpus screening technique and device
CN108986830B (en) * 2018-08-28 2021-02-09 安徽淘云科技有限公司 Audio corpus screening method and device
CN111261143A (en) * 2018-12-03 2020-06-09 杭州嘉楠耘智信息科技有限公司 Voice wake-up method and device and computer readable storage medium
CN111261143B (en) * 2018-12-03 2024-03-22 嘉楠明芯(北京)科技有限公司 Voice wakeup method and device and computer readable storage medium
CN110021305A (en) * 2019-01-16 2019-07-16 上海惠芽信息技术有限公司 A kind of audio filtering method, audio filter and wearable device
CN110021305B (en) * 2019-01-16 2021-08-20 上海惠芽信息技术有限公司 Audio filtering method, audio filtering device and wearable equipment
CN109616098A (en) * 2019-02-15 2019-04-12 北京嘉楠捷思信息技术有限公司 Voice endpoint detection method and device based on frequency domain energy
WO2020252782A1 (en) * 2019-06-21 2020-12-24 深圳市汇顶科技股份有限公司 Voice detection method, voice detection device, voice processing chip and electronic apparatus
US11322174B2 (en) 2019-06-21 2022-05-03 Shenzhen GOODIX Technology Co., Ltd. Voice detection from sub-band time-domain signals
WO2021135547A1 (en) * 2020-07-24 2021-07-08 平安科技(深圳)有限公司 Human voice detection method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN106098076B (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN106098076A (en) A kind of based on dynamic noise estimation time-frequency domain adaptive voice detection method
CN106373587B (en) Automatic acoustic feedback detection and removing method in a kind of real-time communication system
Moattar et al. A simple but efficient real-time voice activity detection algorithm
CN105118502B (en) End point detection method and system of voice identification system
CN103366739B (en) Towards self-adaptation end-point detecting method and the system thereof of alone word voice identification
CN105810201B (en) Voice activity detection method and its system
CN103325386A (en) Method and system for signal transmission control
CN102314884B (en) Voice-activation detecting method and device
JP6635440B2 (en) Acquisition method of voice section correction frame number, voice section detection method and apparatus
CN103886871A (en) Detection method of speech endpoint and device thereof
CN110047470A (en) A kind of sound end detecting method
US20160077792A1 (en) Methods and apparatus for unsupervised wakeup
WO2004075167A2 (en) Log-likelihood ratio method for detecting voice activity and apparatus
CN106303878A (en) One is uttered long and high-pitched sounds and is detected and suppressing method
CN104867499A (en) Frequency-band-divided wiener filtering and de-noising method used for hearing aid and system thereof
CN105810214B (en) Voice-activation detecting method and device
CN106504760B (en) Broadband ambient noise and speech Separation detection system and method
CN110265058A (en) Estimate the ambient noise in audio signal
US11341988B1 (en) Hybrid learning-based and statistical processing techniques for voice activity detection
Hirszhorn et al. Transient interference suppression in speech signals based on the OM-LSA algorithm
CN108847218B (en) Self-adaptive threshold setting voice endpoint detection method, equipment and readable storage medium
CN106486133B (en) One kind is uttered long and high-pitched sounds scene recognition method and equipment
EP3195314B1 (en) Methods and apparatus for unsupervised wakeup
CN111755028A (en) Near-field remote controller voice endpoint detection method and system based on fundamental tone characteristics
CN111128244A (en) Short wave communication voice activation detection method based on zero crossing rate detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant