CN106098076B - Time-frequency domain adaptive voice detection method based on dynamic noise estimation - Google Patents

Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Info

Publication number
CN106098076B
Authority
CN
China
Prior art keywords
energy
frequency domain
time
voice
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610393406.XA
Other languages
Chinese (zh)
Other versions
CN106098076A (en)
Inventor
何云鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Leader Technology Co Ltd
Original Assignee
Chengdu Leader Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Leader Technology Co Ltd
Priority to CN201610393406.XA
Publication of CN106098076A
Application granted
Publication of CN106098076B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0224: Processing in the time domain
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L2025/783: Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786: Adaptive threshold

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the fields of information processing and sensor signal processing, and in particular to a time-frequency domain adaptive automatic voice detection method based on dynamic noise estimation. The method detects voice separately from changes in the time domain short-time energy of the sound and from changes in the frequency domain short-time energy within a certain frequency range, and then selects the better of the two results according to the dynamically estimated background noise energy. This greatly improves the accuracy of speech recognition and its adaptability to environmental change.

Description

Time-frequency domain adaptive voice detection method based on dynamic noise estimation
Technical field
The present invention relates to the fields of information processing and sensor signal processing, and in particular to a time-frequency domain adaptive voice detection method based on dynamic noise estimation.
Background art
Speech recognition is a hot topic in applied artificial intelligence and is already widely used in many fields. Voice detection is an important part of any real-time speech recognition system; its purpose is to distinguish voice segments from non-voice segments in complex real environments. Some literature indicates that low recognition rates in practical applications are largely caused by voice that is not handled correctly: a large amount of non-voice information seriously degrades the accuracy of a speech recognition system, especially when the application environment is very noisy. Correct voice detection effectively reduces the system's computational load, shortens processing time, lowers the transmission power of mobile terminals, saves channel resources, and improves recognition accuracy. Under complex background noise in particular, the performance of a speech recognition system depends to a large extent on the quality of its voice detection. A robust, accurate, real-time and highly adaptive voice detection technique is therefore necessary for every speech recognition system.
When speech recognition is used on mobile terminals, especially mobile phones and voice remote controls, the start and end of voice are currently determined mainly by pressing a key. This approach is very inconvenient for the large number of far-field applications. For smart devices and robots that support speech recognition and are used far-field or are not hand-held, an automatic voice detection system is an essential component.
The mainstream methods for automatic voice detection currently rely on three measures: short-time energy in the time domain, zero-crossing rate, and the mean square deviation of frequency band energy in the frequency domain. In each case the short-time energy, zero-crossing rate or band energy mean square deviation is computed and then compared with an empirical threshold. Experiments show that comparing short-time energy or zero-crossing rate alone adapts poorly to noisy environments, especially when the application environment changes, because the background noise changes with it even within the same environment. The band energy mean square deviation method, for its part, adapts poorly to quiet environments.
To solve these problems, a method is needed that detects voice separately from the changes in average voice energy in the time domain and in the frequency domain, and then selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art and to provide a voice detection method that greatly improves the accuracy of speech recognition and its adaptability to environmental change.
To achieve this object, the present invention provides the following technical solution.
A time-frequency domain adaptive voice detection method based on dynamic noise estimation, comprising the following steps:
Step 1: load the current frame data, the current frame data being time domain voice data;
Step 2: calculate the total energy of each frame of the time domain voice data as the time domain short-time energy, and transform each frame of the time domain voice data into frequency domain data by FFT;
Step 3: select the subband data of the frequency domain data within a certain frequency range, calculate the energy of that subband data and accumulate it as the frequency domain short-time energy;
Step 4: a background noise estimation unit calculates the background noise energy, and a frequency domain background energy calculation unit calculates the frequency domain background energy;
Step 5: compare the time domain short-time energy with the background noise energy; if it is greater than the background noise energy the result is voice, and if it is less than or equal to the background noise energy the result is non-voice;
Step 6: compare the frequency domain short-time energy with the frequency domain background energy; if it is greater than the frequency domain background energy the result is voice, and if it is less than or equal to the frequency domain background energy the result is non-voice;
Step 7: compare the background noise energy with a system-set threshold one; if it is greater than threshold one, select the comparison result of step 6 as the voice decision, and if it is less than or equal to threshold one, select the comparison result of step 5 as the voice decision;
Step 8: if the current frame is detected as non-voice, send the time domain short-time energy of the current frame to the background noise estimation unit for accumulation, and after a first number of frames has been accumulated, divide the accumulated value by the first number of frames to obtain the new background noise energy as output; at the same time, send the frequency domain short-time energy of the current frame to the frequency domain background energy calculation unit for accumulation, and after a second number of frames has been accumulated, divide the accumulated value by the second number of frames to obtain the new frequency domain background energy as output.
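The eight steps can be summarised in formulas. The notation below is an assumption introduced for illustration and does not appear in the patent: x_n[i] are the N samples of frame n, X_n[k] is its FFT, k_lo and k_hi are the bin indices bounding the selected frequency range, B_t and B_f are the background noise energy and the frequency domain background energy, T_1 is threshold one, and M_1 and M_2 are the first and second numbers of frames. "Energy" is taken here to mean the sum of squared magnitudes, which the text does not state explicitly.

```latex
\begin{align*}
E_t(n) &= \sum_{i=0}^{N-1} x_n[i]^2, &
E_f(n) &= \sum_{k=k_{\mathrm{lo}}}^{k_{\mathrm{hi}}} \lvert X_n[k] \rvert^2, \\
v_t(n) &= \mathbf{1}\left[ E_t(n) > B_t \right], &
v_f(n) &= \mathbf{1}\left[ E_f(n) > B_f \right], \\
v(n) &=
\begin{cases}
v_f(n), & B_t > T_1, \\
v_t(n), & B_t \le T_1,
\end{cases} \\
B_t &\leftarrow \frac{1}{M_1} \sum_{m \in \mathcal{N}_1} E_t(m), &
B_f &\leftarrow \frac{1}{M_2} \sum_{m \in \mathcal{N}_2} E_f(m).
\end{align*}
```

Here v(n) = 1 marks frame n as voice, and the sets N_1 and N_2 hold the most recently accumulated M_1 and M_2 frames that were detected as non-voice.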
Speech energy is generally stable only over short intervals, whereas background noise energy is stable over long intervals. The time domain short-time energy is compared with the background noise energy, and the comparison result is taken as the time domain likelihood that the current moment is voice. Non-voice periods are usually much longer than voice periods. The time domain short-time energy can be regarded as sound energy that may contain both voice and background noise, whereas the time domain long-time energy consists mainly of background noise energy; when the short-time energy is larger than the long-time energy, the frame is more likely to be voice. Because the long-time energy is computed dynamically, it adapts well to changes in background noise. Comparing the time domain short-time energy with the background noise energy is best suited to quiet environments. To improve detection accuracy, the invention therefore combines this comparison with the comparison of the frequency domain short-time energy against the frequency domain background energy.
As a preferred solution of the present invention, the comparison in step 5 is performed by subtracting the background noise energy from the time domain short-time energy and comparing the difference with a system-set threshold two: if the difference is greater than threshold two the result is voice, and if it is less than or equal to threshold two the result is non-voice;
The comparison in step 6 is performed by subtracting the frequency domain background energy from the frequency domain short-time energy and comparing the difference with a system-set threshold three: if the difference is greater than threshold three the result is voice, and if it is less than or equal to threshold three the result is non-voice.
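The difference rule can be sketched as a single function used for both domains. This is a minimal illustration; the function name and the numeric values are hypothetical and were chosen only to show both outcomes.

```python
def is_voice_by_difference(short_time_energy: float,
                           background_energy: float,
                           threshold: float) -> bool:
    """Voice when the energy exceeds the background by more than the threshold."""
    return (short_time_energy - background_energy) > threshold


# Hypothetical values: threshold two and threshold three are both set to 5.0 here.
time_decision = is_voice_by_difference(9.0, 2.0, threshold=5.0)  # difference 7.0
freq_decision = is_voice_by_difference(4.0, 2.0, threshold=5.0)  # difference 2.0
```

A difference equal to the threshold counts as non-voice, matching the "less than or equal to" wording above.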
As a preferred solution of the present invention, the comparison in step 5 is performed by comparing the ratio of the time domain short-time energy to the background noise energy with a system-set threshold four: if the ratio is greater than threshold four the result is voice, and if it is less than or equal to threshold four the result is non-voice;
The comparison in step 6 is performed by comparing the ratio of the frequency domain short-time energy to the frequency domain background energy with a system-set threshold five: if the ratio is greater than threshold five the result is voice, and if it is less than or equal to threshold five the result is non-voice.
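The ratio rule can be sketched in the same way. The `floor` parameter is an assumption of this sketch, not something the patent describes; it only prevents division by zero before a background estimate exists.

```python
def is_voice_by_ratio(short_time_energy: float,
                      background_energy: float,
                      threshold: float,
                      floor: float = 1e-12) -> bool:
    """Voice when short-time energy / background energy exceeds the threshold."""
    # Assumed safeguard: clamp the denominator so a zero background is usable.
    ratio = short_time_energy / max(background_energy, floor)
    return ratio > threshold
```

Compared with the difference rule, a ratio threshold is independent of the absolute signal scale, so the same value can be reused when the input gain changes.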
As a preferred solution of the present invention, the frequency range is the range in which the energy of human speech is mainly distributed. The spectrum of the human voice is fairly wide, and the voice band can be set with two parameters: an upper frequency threshold and a lower frequency threshold. Sound outside this range is usually background noise or other non-voice sound, so within the band the environmental noise energy is substantially suppressed. In general, voice energy is concentrated between 300 Hz and 4000 Hz, while background noise energy is mainly distributed below 300 Hz. Because the comparison uses the energy of the band in which voice is mainly distributed, the frequency domain short-time energy rises markedly when voice is present. As with the time domain comparison, the frequency domain short-time energy is compared with the frequency domain background energy, and if the system-set threshold three or threshold five is exceeded, the period is very likely voice.
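A minimal sketch of the band-limited energy follows, assuming that "energy" means the sum of squared spectral magnitudes over the bins whose frequency lies between the two thresholds. To stay dependency-free it evaluates the DFT directly, one bin at a time, rather than calling an FFT routine; the function name and the test tones are hypothetical.

```python
import cmath
import math


def band_energy(frame, sample_rate, low_hz=300.0, high_hz=4000.0):
    """Sum |X[k]|^2 over DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(frame)
    energy = 0.0
    # For a real-valued frame only bins 0 .. n // 2 carry distinct information.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            # Direct DFT of bin k; an FFT would produce all bins more efficiently.
            x_k = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                      for i in range(n))
            energy += abs(x_k) ** 2
    return energy


sample_rate = 8000
n = 160  # 20 ms at 8 kHz, so bins are spaced 50 Hz apart
tone_1k = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]
hum_100 = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(n)]
in_band = band_energy(tone_1k, sample_rate)   # 1 kHz lies inside the band
out_band = band_energy(hum_100, sample_rate)  # 100 Hz lies below the band
```

With these inputs the 1 kHz tone contributes essentially all of its energy to the band, while the 100 Hz hum contributes almost none, which is the suppression effect the paragraph describes.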
As a preferred solution of the present invention, the duration of each frame is between 10 milliseconds and 50 milliseconds, and the first number of frames and the second number of frames are configured by the system.
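As an illustration of the frame length constraint, the sketch below splits a sample sequence into frames and computes the time domain short-time energy of a frame. Non-overlapping frames, dropping a trailing partial frame, and the function names are assumptions of this sketch; the text only fixes the 10 ms to 50 ms range.

```python
def split_into_frames(samples, sample_rate, frame_ms=20):
    """Split samples into consecutive frames of frame_ms milliseconds."""
    if not 10 <= frame_ms <= 50:
        raise ValueError("frame_ms should lie between 10 and 50 milliseconds")
    frame_len = sample_rate * frame_ms // 1000
    # Assumption: frames do not overlap and a trailing partial frame is dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def time_domain_energy(frame):
    """Time domain short-time energy, taken here as the sum of squared samples."""
    return sum(s * s for s in frame)


frames = split_into_frames(list(range(1000)), sample_rate=16000, frame_ms=20)
```

At 16 kHz a 20 ms frame holds 320 samples, so 1000 samples yield three full frames.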
As a preferred solution of the present invention, the background noise energy is the result of accumulating and then averaging the time domain short-time energy of the periods judged to be non-voice.
As a preferred solution of the present invention, the frequency domain background energy is the result of accumulating and then averaging the frequency domain short-time energy of the periods judged to be non-voice.
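The accumulate-then-average update can be sketched as a small class that serves for either background estimate. The class name, the initial estimate, and the choice to clear the accumulator after each output are assumptions of this sketch; the text does not say what happens to the accumulator once a block is complete.

```python
class BlockAverageEstimator:
    """Average the energies of non-voice frames in blocks of block_frames."""

    def __init__(self, block_frames: int, initial: float = 0.0):
        if block_frames <= 0:
            raise ValueError("block_frames must be positive")
        self.block_frames = block_frames
        self.estimate = initial  # value used until the first block completes
        self._sum = 0.0
        self._count = 0

    def add_non_voice(self, energy: float) -> float:
        """Accumulate one non-voice frame and return the current estimate."""
        self._sum += energy
        self._count += 1
        if self._count == self.block_frames:
            self.estimate = self._sum / self.block_frames
            # Assumed behaviour: start a fresh block after each output.
            self._sum = 0.0
            self._count = 0
        return self.estimate


estimator = BlockAverageEstimator(block_frames=3, initial=1.0)
history = [estimator.add_non_voice(e) for e in (2.0, 4.0, 6.0, 10.0)]
```

The estimate changes only when a block completes, so a larger block size gives a smoother but slower-reacting background estimate.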
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention detects voice separately from the changes in average voice energy in the time domain and in the frequency domain, and finally selects the better result according to the dynamically estimated background noise level, thereby greatly improving the accuracy of speech recognition and its adaptability to environmental change.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is an operational block diagram of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the present invention fall within its scope.
As shown in Fig. 1, a time-frequency domain adaptive voice detection method based on dynamic noise estimation comprises the following steps:
Step 1: load the current frame data, the current frame data being time domain voice data;
Step 2: calculate the total energy of each frame of the time domain voice data as the time domain short-time energy, and transform each frame of the time domain voice data into frequency domain data by FFT;
Step 3: select the subband data of the frequency domain data within a certain frequency range, calculate the energy of that subband data and accumulate it as the frequency domain short-time energy;
Step 4: the background noise estimation unit calculates the background noise energy, and the frequency domain background energy calculation unit calculates the frequency domain background energy;
Step 5: compare the time domain short-time energy with the background noise energy; if it is greater than the background noise energy the result is voice, and if it is less than or equal to the background noise energy the result is non-voice;
Step 6: compare the frequency domain short-time energy with the frequency domain background energy; if it is greater than the frequency domain background energy the result is voice, and if it is less than or equal to the frequency domain background energy the result is non-voice;
Step 7: compare the background noise energy with a system-set threshold one; if it is greater than threshold one, select the comparison result of step 6 as the voice decision, and if it is less than or equal to threshold one, select the comparison result of step 5 as the voice decision;
Step 8: if the current frame is detected as non-voice, send the time domain short-time energy of the current frame to the background noise estimation unit for accumulation, and after the first number of frames has been accumulated, divide the accumulated value by the first number of frames to obtain the new background noise energy as output; at the same time, send the frequency domain short-time energy of the current frame to the frequency domain background energy calculation unit for accumulation, and after the second number of frames has been accumulated, divide the accumulated value by the second number of frames to obtain the new frequency domain background energy as output.
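Step 7 is the part that makes the method adaptive: the background noise level decides which of the two per-domain decisions is used. A minimal sketch, with a hypothetical function name:

```python
def select_decision(background_noise_energy: float,
                    threshold_one: float,
                    time_is_voice: bool,
                    freq_is_voice: bool) -> bool:
    """Pick the frequency domain decision in noisy conditions, else the time domain one."""
    if background_noise_energy > threshold_one:
        return freq_is_voice
    return time_is_voice
```

For example, with threshold one set to 0.5, a background noise energy of 0.8 selects the step 6 result, while a background noise energy of 0.5 or less selects the step 5 result.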
As shown in Fig. 1 and Fig. 2, the current frame data, which is time domain voice data, is loaded first. The time domain short-time energy is then calculated, and at the same time the time domain voice data is transformed into frequency domain data by FFT, from which the frequency domain short-time energy is calculated. The background noise estimation unit calculates the background noise energy, and the frequency domain background energy calculation unit calculates the frequency domain background energy. The time domain short-time energy is compared with the background noise energy, and the frequency domain short-time energy is compared with the frequency domain background energy. This embodiment uses the difference method. The difference obtained by subtracting the background noise energy from the time domain short-time energy is compared with the system-set threshold two: if it is greater than threshold two the result is voice, and if it is less than or equal to threshold two the result is non-voice. The difference obtained by subtracting the frequency domain background energy from the frequency domain short-time energy is compared with the system-set threshold three: if it is greater than threshold three the result is voice, and if it is less than or equal to threshold three the result is non-voice. Both comparison results are output. The background noise energy is then compared with the system-set threshold one: if it is greater than threshold one, the comparison result of step 6 is selected as the voice decision, and if it is less than or equal to threshold one, the comparison result of step 5 is selected as the voice decision.
Results judged to be non-voice in step 5 and step 6 are delivered to the background noise estimation unit and the frequency domain background energy calculation unit respectively, where the new background noise energy and the new frequency domain background energy are calculated. In this embodiment the frequency range in which human speech energy is mainly distributed is taken as 300 Hz to 4000 Hz, and the frame duration is between 10 milliseconds and 50 milliseconds.
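Putting the pieces of this embodiment together, a per-frame detector might look like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the class and parameter names are hypothetical, energies are sums of squared magnitudes, the spectrum is computed with a direct DFT instead of an FFT, the initial background values are constructor parameters, and both background estimates are updated when the final decision is non-voice, as step 8 states.

```python
import cmath
import math


class DualDomainVad:
    """Per-frame voice detector following steps 1 to 8 (difference variant)."""

    def __init__(self, sample_rate, threshold_one, threshold_two, threshold_three,
                 first_frames=10, second_frames=10, low_hz=300.0, high_hz=4000.0,
                 initial_noise=0.0, initial_freq_background=0.0):
        self.sample_rate = sample_rate
        self.threshold_one = threshold_one      # selects which decision to use
        self.threshold_two = threshold_two      # time domain difference threshold
        self.threshold_three = threshold_three  # frequency domain difference threshold
        self.first_frames = first_frames
        self.second_frames = second_frames
        self.low_hz, self.high_hz = low_hz, high_hz
        self.noise = initial_noise
        self.freq_background = initial_freq_background
        self._time_acc = [0.0, 0]  # [running sum, frame count]
        self._freq_acc = [0.0, 0]

    def _band_energy(self, frame):
        # Step 3: sum |X[k]|^2 over bins inside the band (direct DFT, not an FFT).
        n = len(frame)
        total = 0.0
        for k in range(n // 2 + 1):
            if self.low_hz <= k * self.sample_rate / n <= self.high_hz:
                x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                          for i, s in enumerate(frame))
                total += abs(x_k) ** 2
        return total

    @staticmethod
    def _accumulate(acc, energy, block, current):
        # Step 8: average a block of non-voice energies, then start a new block.
        acc[0] += energy
        acc[1] += 1
        if acc[1] < block:
            return current
        average = acc[0] / block
        acc[0], acc[1] = 0.0, 0
        return average

    def process(self, frame):
        time_energy = sum(s * s for s in frame)   # step 2
        freq_energy = self._band_energy(frame)    # steps 2 and 3
        time_voice = time_energy - self.noise > self.threshold_two              # step 5
        freq_voice = freq_energy - self.freq_background > self.threshold_three  # step 6
        # Step 7: use the frequency domain decision when the noise level is high.
        voice = freq_voice if self.noise > self.threshold_one else time_voice
        if not voice:
            self.noise = self._accumulate(
                self._time_acc, time_energy, self.first_frames, self.noise)
            self.freq_background = self._accumulate(
                self._freq_acc, freq_energy, self.second_frames, self.freq_background)
        return voice


def tone(freq_hz, amplitude, n=160, sample_rate=8000):
    """Hypothetical test signal: one 20 ms frame of a sine wave at 8 kHz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
```

One practical consequence of this design is that the detector needs a sensible starting point: with a zero initial background and a loud environment, the time domain rule may label every frame as voice, so the background estimate would never be updated. Seeding `initial_noise` from a short calibration period is one possible workaround, again as an assumption rather than something the patent prescribes.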
Another embodiment uses the ratio method. The ratio of the time domain short-time energy to the background noise energy is compared with the system-set threshold four: if it is greater than threshold four the result is voice, and if it is less than or equal to threshold four the result is non-voice. The ratio of the frequency domain short-time energy to the frequency domain background energy is compared with the system-set threshold five: if it is greater than threshold five the result is voice, and if it is less than or equal to threshold five the result is non-voice. The rest of the calculation is the same as in the previous embodiment and is not repeated here.
Other embodiments may mix the two methods, for example by comparing the difference between the time domain short-time energy and the background noise energy with a system-set threshold six, and comparing the ratio of the frequency domain short-time energy to the frequency domain background energy with a system-set threshold seven.

Claims (7)

1. A time-frequency domain adaptive voice detection method based on dynamic noise estimation, comprising the following steps:
Step 1: loading current frame data, the current frame data being time domain voice data;
Step 2: calculating the total energy of each frame of the time domain voice data as a time domain short-time energy, and transforming each frame of the time domain voice data into frequency domain data by FFT;
Step 3: selecting subband data of the frequency domain data within a certain frequency range, calculating the energy of the subband data and accumulating it as a frequency domain short-time energy;
Step 4: calculating a background noise energy with a background noise estimation unit, and calculating a frequency domain background energy with a frequency domain background energy calculation unit;
Step 5: comparing the time domain short-time energy with the background noise energy, the result being voice if it is greater than the background noise energy and non-voice if it is less than or equal to the background noise energy;
Step 6: comparing the frequency domain short-time energy with the frequency domain background energy, the result being voice if it is greater than the frequency domain background energy and non-voice if it is less than or equal to the frequency domain background energy;
Step 7: comparing the background noise energy with a system-set threshold one, selecting the comparison result of step 6 as the voice decision if it is greater than threshold one, and selecting the comparison result of step 5 as the voice decision if it is less than or equal to threshold one;
Step 8: if the current frame is detected as non-voice, sending the time domain short-time energy of the current frame to the background noise estimation unit for accumulation and, after a first number of frames has been accumulated, dividing the accumulated value by the first number of frames to obtain a new background noise energy as output; and at the same time sending the frequency domain short-time energy of the current frame to the frequency domain background energy calculation unit for accumulation and, after a second number of frames has been accumulated, dividing the accumulated value by the second number of frames to obtain a new frequency domain background energy as output.
2. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the comparison of the time domain short-time energy with the background noise energy in step 5 is performed by subtracting the background noise energy from the time domain short-time energy and comparing the difference with a system-set threshold two, the result being voice if the difference is greater than threshold two and non-voice if it is less than or equal to threshold two;
the comparison of the frequency domain short-time energy with the frequency domain background energy in step 6 is performed by subtracting the frequency domain background energy from the frequency domain short-time energy and comparing the difference with a system-set threshold three, the result being voice if the difference is greater than threshold three and non-voice if it is less than or equal to threshold three.
3. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that:
the comparison of the time domain short-time energy with the background noise energy in step 5 is performed by comparing the ratio of the time domain short-time energy to the background noise energy with a system-set threshold four, the result being voice if the ratio is greater than threshold four and non-voice if it is less than or equal to threshold four;
the comparison of the frequency domain short-time energy with the frequency domain background energy in step 6 is performed by comparing the ratio of the frequency domain short-time energy to the frequency domain background energy with a system-set threshold five, the result being voice if the ratio is greater than threshold five and non-voice if it is less than or equal to threshold five.
4. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the frequency range is the range in which the energy of human speech is mainly distributed, and the frequency range is determined by an upper frequency threshold and a lower frequency threshold.
5. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the duration of each frame is between 10 milliseconds and 50 milliseconds, and the first number of frames and the second number of frames are configured by the system.
6. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the background noise energy is the result of accumulating and then averaging the time domain short-time energy of the periods judged to be non-voice.
7. The time-frequency domain adaptive voice detection method based on dynamic noise estimation according to claim 1, characterized in that: the frequency domain background energy is the result of accumulating and then averaging the frequency domain short-time energy of the periods judged to be non-voice.
CN201610393406.XA 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation Active CN106098076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610393406.XA CN106098076B (en) 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610393406.XA CN106098076B (en) 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Publications (2)

Publication Number Publication Date
CN106098076A CN106098076A (en) 2016-11-09
CN106098076B CN106098076B (en) 2019-05-21

Family

ID=57447624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610393406.XA Active CN106098076B (en) 2016-06-06 2016-06-06 Time-frequency domain adaptive voice detection method based on dynamic noise estimation

Country Status (1)

Country Link
CN (1) CN106098076B (en)

Also Published As

Publication number Publication date
CN106098076A (en) 2016-11-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant