CN106504760B - Broadband ambient noise and speech Separation detection system and method - Google Patents

Info

Publication number
CN106504760B
Authority
CN
China
Prior art keywords
speech
time
voice
energy
frequency domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610947596.5A
Other languages
Chinese (zh)
Other versions
CN106504760A (en)
Inventor
何云鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Leader Technology Co Ltd
Original Assignee
Chengdu Leader Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Leader Technology Co Ltd
Priority to CN201610947596.5A
Publication of CN106504760A
Application granted
Publication of CN106504760B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G10L21/0232 Processing in the frequency domain
    • G10L21/0272 Voice signal separating

Abstract

The invention relates to the fields of information processing and sensor signal processing, and in particular to a broadband ambient noise and speech separation detection system. The system comprises a current-frame time- and frequency-domain energy calculation circuit, an ambient noise calculation circuit, a time-domain speech detection circuit that compares long-term and short-term average energy, a frequency-domain speech detection circuit that compares long-term and short-term frequency-domain energy, an ambient noise comparison circuit, a subband energy distribution uniformity speech detection circuit, and a speech frame count circuit. The invention also discloses a broadband ambient noise and speech separation detection method. By using three stages of speech detection, the invention detects both low-frequency and high-frequency ambient noise well, also handles occasional, non-continuous noise well, and substantially improves the accuracy of speech detection in complex noise environments.

Description

Broadband ambient noise and speech separation detection system and method
Technical field
The invention relates to the fields of information processing and sensor signal processing, and in particular to a broadband ambient noise and speech separation detection system and method.
Background art
Speech recognition is one of the most active areas of applied artificial intelligence and is now widely used across many fields. Speech detection is an important part of any real-time speech recognition system: its purpose is to distinguish speech segments from non-speech segments in complex real-world environments. The literature indicates that, in practical applications, low recognition rates are largely due to speech not being detected correctly. Large amounts of non-speech noise seriously degrade recognition accuracy, especially in noisy application environments. Correct speech detection reduces the system's computational load, shortens processing time, lowers mobile-terminal transmit power, saves channel resources, and improves recognition accuracy. Under complex background noise in particular, the performance of a speech recognition system depends to a large extent on the quality of its speech detection. A speech detection technique that is robust, accurate, real-time, and strongly adaptive is therefore necessary for every speech recognition system.
The mainstream approaches to automatic speech endpoint detection rely on three measures: short-term energy in the time domain, zero-crossing rate in the time domain, and the mean square deviation of band energies in the frequency domain. In each case the measure is computed and then compared with an empirical threshold. Experiments show that comparing short-term energy or zero-crossing rate alone adapts poorly to noisy environments, particularly when the application environment changes or when the background noise within a single environment varies over time, while the band-energy mean-square-deviation method adapts poorly to quiet environments.
Speech can also be detected separately from the changes in average speech energy in the time domain and in the spectral domain, with the better result selected according to a dynamically estimated background noise level. This greatly improves recognition accuracy and adaptability to environmental change. Because the energy of most stationary background noise is concentrated at low frequencies, this approach is very effective against most low-frequency noise. Sounds produced by objects or animals, however, such as birdsong, car horns, and pianos and other musical instruments, occupy a wider band that overlaps the band of human speech, and the approach above easily misclassifies such noise as speech. Distinguishing this type of noise is both very important and one of the difficult problems for speech detection, speech noise reduction, and speech recognition.
To solve these problems, a broadband ambient noise and speech separation detection system and method is needed, one developed through extensive experimental analysis and theoretical study of the frequency-domain and time-domain characteristics of broadband non-speech noise.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art and to provide a broadband ambient noise and speech separation detection system and method that greatly improve both adaptability to all kinds of ambient noise and the accuracy of automatic speech detection.
To achieve this object, the invention provides the following technical solutions.
A broadband ambient noise and speech separation detection system comprises: a current-frame time- and frequency-domain energy calculation circuit; an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, and a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, each connected to the current-frame energy calculation circuit; an ambient noise comparison circuit connected to the ambient noise calculation circuit and to both energy comparison circuits; a subband energy distribution uniformity speech detection circuit connected to each of the two energy comparison circuits; and a speech frame count circuit connected to the subband energy distribution uniformity speech detection circuit. The ambient noise calculation circuit is additionally connected to the subband energy distribution uniformity speech detection circuit, the speech frame count circuit, and both energy comparison circuits.
In a preferred embodiment, the speech frame count circuit consists of time-width filters, each of which counts speech frames; the number of time-width filters is greater than or equal to 1.
The invention also discloses a broadband ambient noise and speech separation detection method comprising the following steps:
Step 1: load the speech data and process it frame by frame. The speech data is time-domain speech data, and the frame duration is configurable, usually between 10 and 50 milliseconds;
Step 2: calculate the time-domain short-term energy and the time-domain long-term average energy. The time-domain short-term energy is the total energy of the current frame of time-domain speech data; the time-domain long-term average energy is obtained by accumulating the time-domain short-term energy over multiple frames and dividing by the number of frames;
Step 3: apply an FFT (fast Fourier transform) to the current frame of time-domain speech data, transforming it into frequency-domain subband speech data;
Step 4: calculate the frequency-domain short-term energy and the frequency-domain long-term average energy. The frequency-domain short-term energy is obtained by summing the subband energies of the current frame over the frequency range in which speech energy is mainly distributed; the frequency-domain long-term average energy is obtained by accumulating the frequency-domain short-term energy over multiple frames and dividing by the number of frames;
Step 5: accumulate the ambient noise. The time-domain short-term energy of each non-speech frame is fed to the ambient noise estimation unit and accumulated; each time a set number of frames has been accumulated, a new ambient noise value is output;
Step 6: compare the ambient noise with a preset threshold one. If it is greater than threshold one, go to step 7; if it is less than threshold one, go to step 8;
Step 7: perform frequency-domain speech detection. If the frame is speech, go to step 9; if not, go to step 5 and step 11;
Step 8: perform time-domain speech detection. If the frame is speech, go to step 9; if not, go to step 5 and step 11;
Step 9: perform frequency-domain subband energy distribution uniformity detection. If the frame is speech, go to step 10; if not, go to step 5 and step 11;
Step 10: the time-width filter counts the speech frames produced by step 9 and compares the count with a preset threshold two. If the frame count is greater than threshold two, go directly to step 11; if it is less than threshold two, go to step 5 and step 11;
Step 11: output the detection result; detection ends.
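Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the patented circuit: the function and class names are hypothetical, energy is assumed to be the sum of squared samples, frames are assumed not to overlap, and the long-term average is assumed to run over a sliding window whose length (`window`) the source does not specify.

```python
from collections import deque


def split_frames(samples, sample_rate, frame_ms=20):
    """Step 1: cut the signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_term_energy(frame):
    """Step 2: energy of one frame, assumed here to be the sum of squared samples."""
    return sum(x * x for x in frame)


class LongTermAverage:
    """Step 2: mean of the most recent short-term energies.

    The window length is an assumption; the source only says that the
    energies of multiple frames are accumulated and divided by the frame count.
    """

    def __init__(self, window=50):
        self.history = deque(maxlen=window)

    def update(self, energy):
        self.history.append(energy)
        return sum(self.history) / len(self.history)
```

The same `LongTermAverage` helper could equally track the frequency-domain long-term average of step 4, since both averages are defined the same way.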
In a preferred embodiment, frequency-domain speech detection compares the frequency-domain short-term energy with the frequency-domain long-term average energy: if the short-term energy exceeds the long-term average by a set margin, the frame is speech; otherwise it is non-speech. When a frame is judged to be non-speech, the result is output and detection ends.
In a preferred embodiment, time-domain speech detection compares the time-domain short-term energy with the time-domain long-term average energy: if the short-term energy exceeds the long-term average by a set margin, the frame is speech; otherwise it is non-speech. When a frame is judged to be non-speech, the result is output and detection ends.
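The first detection stage (steps 6 to 8) might look like the sketch below. The source says only that the short-term energy must exceed the long-term average "by a set margin"; expressing that margin as a multiplicative `ratio` is an assumption, as are all function and parameter names.

```python
def exceeds_long_term(short_energy, long_avg, ratio=2.0):
    """True when the short-term energy exceeds the long-term average by the margin.

    The multiplicative margin `ratio` is an assumption, not a value from the source.
    """
    return short_energy > ratio * long_avg


def first_stage_is_speech(noise, threshold_one,
                          td_short, td_long, fd_short, fd_long, ratio=2.0):
    """Steps 6-8: pick the detector according to the ambient noise level."""
    if noise > threshold_one:
        # Noisy environment: use frequency-domain detection (step 7).
        return exceeds_long_term(fd_short, fd_long, ratio)
    # Quiet environment: use time-domain detection (step 8).
    return exceeds_long_term(td_short, td_long, ratio)
```

The source does not say which branch applies when the noise equals threshold one; this sketch routes that case to the time-domain detector.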
In a preferred embodiment, in step 9, the frame is speech if the detected uniformity is high and non-speech if it is low. When a frame is judged to be non-speech, the result is output and detection ends.
In a preferred embodiment, the time-width filter counts the number of consecutive frames of the speech data that are speech. If the frame count is greater than threshold two, the frames are speech; if it is less than threshold two, they are judged to be non-speech. When frames are judged to be non-speech, the result is output and detection ends.
In a preferred embodiment, whenever any of steps 7 to 9 judges a frame to be non-speech, the non-speech data is passed to step 5 to generate a new ambient noise value.
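The ambient noise update of step 5 can be sketched as a small accumulator. The source states that non-speech energies are accumulated and that a new noise value is output after a set number of frames; reporting the mean of the accumulated energies, and the names `NoiseEstimator` and `frames_per_update`, are assumptions made for this illustration.

```python
class NoiseEstimator:
    """Step 5: accumulate non-speech frame energies and refresh the noise estimate."""

    def __init__(self, frames_per_update=4, initial_noise=0.0):
        self.frames_per_update = frames_per_update
        self.noise = initial_noise
        self._accumulated = 0.0
        self._count = 0

    def add_non_speech(self, energy):
        """Feed one non-speech frame energy; return the current noise estimate."""
        self._accumulated += energy
        self._count += 1
        if self._count == self.frames_per_update:
            # Assumption: the new estimate is the mean of the accumulated energies.
            self.noise = self._accumulated / self._count
            self._accumulated = 0.0
            self._count = 0
        return self.noise
```

Because only frames rejected by steps 7 to 10 are fed in, the estimate follows changes in the environment without being inflated by speech.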
The invention applies three stages of speech detection to the time-domain speech data: first the time-domain or frequency-domain speech detection, then the frequency-domain subband energy distribution uniformity detection, and finally the time-width filter, which counts the speech frames produced by step 9 and compares the count with the preset threshold two. Filtering stage by stage, the method finally screens out the genuine, valid speech data.
Compared with the prior art, the invention has the following beneficial effects:
The invention uses three stages of speech detection. It detects both low-frequency and high-frequency ambient noise well, also handles occasional, non-continuous noise well, and substantially improves the accuracy of speech detection in complex noise environments.
Brief description of the drawings
Fig. 1 is a circuit block diagram of the invention;
Fig. 2 is a flowchart of the invention.
Detailed description of embodiments
The invention is described in further detail below with reference to embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the invention fall within its scope.
As shown in Fig. 1, a broadband ambient noise and speech separation detection system comprises: a current-frame time- and frequency-domain energy calculation circuit; an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, and a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, each connected to the current-frame energy calculation circuit; an ambient noise comparison circuit connected to the ambient noise calculation circuit and to both energy comparison circuits; a subband energy distribution uniformity speech detection circuit connected to each of the two energy comparison circuits; and a speech frame count circuit connected to the subband energy distribution uniformity speech detection circuit. The ambient noise calculation circuit is additionally connected to the subband energy distribution uniformity speech detection circuit, the speech frame count circuit, and both energy comparison circuits. The speech frame count circuit consists of a time-width filter, which counts speech frames. In this embodiment there is one time-width filter, implemented as a speech frame counter.
As shown in Fig. 2, a broadband ambient noise and speech separation detection method comprises the following 11 steps:
Step 1: load the speech data and process it frame by frame. The speech data is time-domain speech data, and the frame duration is configurable, usually between 10 and 50 milliseconds;
Step 2: calculate the time-domain short-term energy and the time-domain long-term average energy. The time-domain short-term energy is the total energy of the current frame of time-domain speech data; the time-domain long-term average energy is obtained by accumulating the time-domain short-term energy over multiple frames and dividing by the number of frames;
Step 3: apply an FFT (fast Fourier transform) to the current frame of time-domain speech data, transforming it into frequency-domain subband speech data;
Step 4: calculate the frequency-domain short-term energy and the frequency-domain long-term average energy. The frequency-domain short-term energy is obtained by summing the subband energies of the current frame over the frequency range in which speech energy is mainly distributed; the frequency-domain long-term average energy is obtained by accumulating the frequency-domain short-term energy over multiple frames and dividing by the number of frames;
Step 5: accumulate the ambient noise. The time-domain short-term energy of each non-speech frame is fed to the ambient noise estimation unit and accumulated; each time a set number of frames has been accumulated, a new ambient noise value is output;
Step 6: compare the ambient noise with a preset threshold one. If it is greater than threshold one, go to step 7; if it is less than threshold one, go to step 8;
Step 7: perform frequency-domain speech detection, which compares the frequency-domain short-term energy with the frequency-domain long-term average energy. If the short-term energy exceeds the long-term average by a set margin, the frame is speech; otherwise it is non-speech. If the frame is speech, go to step 9; if not, go to step 5 and step 11;
Step 8: perform time-domain speech detection, which compares the time-domain short-term energy with the time-domain long-term average energy. If the short-term energy exceeds the long-term average by a set margin, the frame is speech; otherwise it is non-speech. If the frame is speech, go to step 9; if not, go to step 5 and step 11;
Step 9: perform frequency-domain subband energy distribution uniformity detection. The frame is speech if the detected uniformity is high and non-speech if it is low. If the frame is speech, go to step 10; if not, go to step 5 and step 11;
Step 10: the time-width filter counts the speech frames produced by step 9, that is, the number of consecutive frames of the speech data that are speech, and compares the count with a preset threshold two. If the frame count is greater than threshold two, the frames are speech and the method goes directly to step 11; if it is less than threshold two, the frames are non-speech and the method goes to step 5 and step 11;
Step 11: output the detection result; detection ends.
Whenever any of steps 7 to 9 judges a frame to be non-speech, the non-speech data is passed to step 5 to generate a new ambient noise value.
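The routing between steps 6 and 9 can be summarized in a few lines. The sketch assumes the per-frame decisions of steps 7, 8 and 9 have already been computed; the `FrameDecisions` container and the function names are hypothetical and serve only to show which frames continue to step 10 and which are handed to the noise estimator of step 5.

```python
from typing import NamedTuple


class FrameDecisions(NamedTuple):
    """Pre-computed per-frame results (hypothetical container)."""
    time_domain_speech: bool   # result of step 8
    freq_domain_speech: bool   # result of step 7
    uniform_speech: bool       # result of step 9


def candidate_speech(noise, threshold_one, d):
    """Steps 6-9: a frame stays a speech candidate only if both stages agree."""
    first = d.freq_domain_speech if noise > threshold_one else d.time_domain_speech
    return first and d.uniform_speech


def route_frames(noise, threshold_one, decisions):
    """Return the candidate flags and the indices of frames sent back to step 5."""
    flags = [candidate_speech(noise, threshold_one, d) for d in decisions]
    to_noise_estimator = [i for i, is_speech in enumerate(flags) if not is_speech]
    return flags, to_noise_estimator
```

The candidate flags would then be passed to the time-width filter of step 10 for the final decision.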
In this embodiment, the calculation in step 3 is as follows:
Assume the number of frequency-domain subbands is N. The average subband energy is then $E_{avg} = E_{total} / N = \frac{1}{N}\sum_{i=1}^{N} E_i$, where Eavg is the average subband energy, Etotal is the sum of all subband energies, and Ei is the energy of the i-th subband, i = 1, 2, ..., N. In the frequency domain, the energy of a subband equals the square of its real part plus the square of its imaginary part.
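A small sketch of the subband energy calculation follows. To stay free of third-party dependencies it uses a naive O(n^2) discrete Fourier transform in place of the FFT named in step 3 (the two give the same spectrum); treating each frequency bin as one subband and keeping only the non-redundant half of the spectrum are assumptions, as are the function names and the band limits passed to `band_energy`.

```python
import cmath


def dft(frame):
    """Naive DFT, standing in for the FFT of step 3 (same result, slower)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def subband_energies(frame):
    """Energy per bin: real part squared plus imaginary part squared."""
    spectrum = dft(frame)
    half = len(frame) // 2 + 1  # non-redundant bins of a real-valued signal
    return [z.real ** 2 + z.imag ** 2 for z in spectrum[:half]]


def average_subband_energy(energies):
    """Eavg = Etotal / N."""
    return sum(energies) / len(energies)


def band_energy(energies, lo_bin, hi_bin):
    """Step 4: frequency-domain short-term energy over bins lo_bin..hi_bin inclusive."""
    return sum(energies[lo_bin:hi_bin + 1])
```

For a real implementation the bins covering the main speech band would be derived from the sample rate and frame length; the source does not give the band limits.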
In this embodiment, the calculation in step 9 is as follows:
Non-uniformity is obtained by the mean-square-deviation method: given the subband energies Ei, the non-uniformity nU is the mean square deviation of the Ei about the average subband energy. With Th_nu as the non-uniformity threshold, the frame is provisionally judged to be speech when nU < Th_nu, and non-speech otherwise.
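One plausible reading of the mean-square-deviation measure is sketched below. The exact formula is not present in the source text, so the unnormalized form nU = (1/N) * sum((Ei - Eavg)^2) is an assumption; a square-rooted or energy-normalized variant would fit the description equally well. The function names are hypothetical.

```python
def non_uniformity_msd(energies):
    """Assumed form: mean of squared deviations of Ei from Eavg."""
    avg = sum(energies) / len(energies)
    return sum((e - avg) ** 2 for e in energies) / len(energies)


def is_speech_by_msd(energies, th_nu):
    """Provisionally speech when the non-uniformity is below the threshold Th_nu."""
    return non_uniformity_msd(energies) < th_nu
```

Because this form scales with the square of the signal level, a practical threshold Th_nu would have to be tuned to the input gain, or the measure normalized by Eavg squared.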
Other embodiments may use either of the following two calculations:
One: take the absolute value of each difference and average, so that the non-uniformity nU is the mean of the absolute differences between the subband energies Ei and the average subband energy. With Th_nu as the non-uniformity threshold, the frame is provisionally judged to be speech when nU < Th_nu, and non-speech otherwise;
Two: count the subbands whose energy is close to the average subband energy. If many subband energies lie near the average energy, the frame is speech; otherwise it is non-speech. Specifically, for each subband with |Ei - Eavg| < k*Eavg, set U = U + 1, where k is a configuration parameter between 0 and 1 with a typical value of 0.5, and U represents uniformity. With Th_u as the threshold, the frame is judged to be speech if U > Th_u, and non-speech otherwise.
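The two alternative measures can be sketched as follows. The counting rule |Ei - Eavg| < k*Eavg and the typical k = 0.5 come from the text; the mean-absolute-difference form nU = (1/N) * sum(|Ei - Eavg|) is inferred from the wording, and the function names are hypothetical.

```python
def non_uniformity_mad(energies):
    """Alternative one (inferred form): mean absolute difference from Eavg."""
    avg = sum(energies) / len(energies)
    return sum(abs(e - avg) for e in energies) / len(energies)


def uniformity_count(energies, k=0.5):
    """Alternative two: number of subbands with |Ei - Eavg| < k * Eavg."""
    avg = sum(energies) / len(energies)
    return sum(1 for e in energies if abs(e - avg) < k * avg)


def is_speech_by_count(energies, th_u, k=0.5):
    """Speech when the uniformity count U exceeds the threshold Th_u."""
    return uniformity_count(energies, k) > th_u
```

Note the opposite threshold directions: the first measure is a non-uniformity (speech when below Th_nu), the second a uniformity (speech when above Th_u).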
The detailed calculation in step 10 in this embodiment is as follows:
Set up a speech frame counter, initially 0. The counter is cleared whenever a non-speech frame is encountered and incremented by 1 whenever a speech frame is encountered. At each transition from a non-speech frame to a speech frame, the index of the first speech frame is recorded as the speech start address. Once the counter value exceeds threshold two, all consecutive frames from that first speech frame onward are speech frames, until a non-speech frame occurs. If, at a transition from a speech frame to a non-speech frame, the counter value is less than the threshold, the preceding speech frames are also judged to be non-speech frames.
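The counter logic above can be sketched as an offline pass over the per-frame candidate flags from step 9. The function name and the list-based interface are hypothetical; a run whose length exactly equals threshold two is treated as non-speech here, a case the source leaves unspecified.

```python
def time_width_filter(candidate_flags, threshold_two):
    """Step 10: keep only runs of more than threshold_two consecutive speech frames."""
    confirmed = [False] * len(candidate_flags)
    count = 0   # speech frame counter
    start = 0   # index of the first speech frame of the current run
    for i, is_speech in enumerate(candidate_flags):
        if not is_speech:
            count = 0          # a non-speech frame clears the counter
            continue
        if count == 0:
            start = i          # transition from non-speech to speech
        count += 1
        if count > threshold_two:
            # The run is long enough: every frame from its start is speech.
            for j in range(start, i + 1):
                confirmed[j] = True
    return confirmed
```

Short bursts that pass the first two stages, such as a single horn blast lasting fewer frames than the threshold, are discarded at this stage.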

Claims (8)

1. A broadband ambient noise and speech separation detection system, comprising: a current-frame time- and frequency-domain energy calculation circuit; an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, and a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, each connected to the current-frame energy calculation circuit; an ambient noise comparison circuit connected to the ambient noise calculation circuit and to both energy comparison circuits; a subband energy distribution uniformity speech detection circuit connected to each of the two energy comparison circuits; and a speech frame count circuit connected to the subband energy distribution uniformity speech detection circuit; wherein the ambient noise calculation circuit is additionally connected to the subband energy distribution uniformity speech detection circuit, the speech frame count circuit, and both energy comparison circuits.
2. The broadband ambient noise and speech separation detection system according to claim 1, characterized in that: the speech frame count circuit consists of time-width filters, each time-width filter counts speech frames, and the number of time-width filters is greater than or equal to 1.
3. A broadband ambient noise and speech separation detection method, comprising the following steps:
Step 1: load the speech data and process it frame by frame, the speech data being time-domain speech data;
Step 2: calculate the time-domain short-term energy and the time-domain long-term average energy, the time-domain short-term energy being the total energy of the current frame of time-domain speech data, and the time-domain long-term average energy being obtained by accumulating the time-domain short-term energy over multiple frames and dividing by the number of frames;
Step 3: apply an FFT (fast Fourier transform) to the current frame of time-domain speech data, transforming it into frequency-domain subband speech data;
Step 4: calculate the frequency-domain short-term energy and the frequency-domain long-term average energy, the frequency-domain short-term energy being obtained by summing the subband energies of the current frame over the frequency range in which speech energy is mainly distributed, and the frequency-domain long-term average energy being obtained by accumulating the frequency-domain short-term energy over multiple frames and dividing by the number of frames;
Step 5: accumulate the ambient noise;
Step 6: compare the ambient noise with a preset threshold one; if it is greater than threshold one, go to step 7; if it is less than threshold one, go to step 8;
Step 7: perform frequency-domain speech detection; if the frame is speech, go to step 9; if not, go to step 5 and step 11;
Step 8: perform time-domain speech detection; if the frame is speech, go to step 9; if not, go to step 5 and step 11;
Step 9: perform frequency-domain subband energy distribution uniformity detection; if the frame is speech, go to step 10; if not, go to step 5 and step 11;
Step 10: the time-width filter counts the speech frames produced by step 9 and compares the count with a preset threshold two; if the frame count is greater than threshold two, go directly to step 11; if it is less than threshold two, go to step 5 and step 11;
Step 11: output the detection result; detection ends.
4. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: the frequency-domain speech detection compares the frequency-domain short-term energy with the frequency-domain long-term average energy; if the short-term energy exceeds the long-term average by a set margin, the frame is speech, otherwise it is non-speech; when a frame is judged to be non-speech, the result is output and detection ends.
5. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: the time-domain speech detection compares the time-domain short-term energy with the time-domain long-term average energy; if the short-term energy exceeds the long-term average by a set margin, the frame is speech, otherwise it is non-speech; when a frame is judged to be non-speech, the result is output and detection ends.
6. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: in step 9, the frame is speech if the detected uniformity is high and non-speech if it is low; when a frame is judged to be non-speech, the result is output and detection ends.
7. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: the time-width filter counts the number of consecutive frames of the speech data that are speech; if the frame count is greater than threshold two, the frames are speech; if it is less than threshold two, they are judged to be non-speech; when frames are judged to be non-speech, the result is output and detection ends.
8. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: whenever any of steps 7 to 9 judges a frame to be non-speech, the non-speech data is passed to step 5 to generate a new ambient noise value.
CN201610947596.5A 2016-10-26 2016-10-26 Broadband ambient noise and speech Separation detection system and method Active CN106504760B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610947596.5A CN106504760B (en) 2016-10-26 2016-10-26 Broadband ambient noise and speech Separation detection system and method

Publications (2)

Publication Number Publication Date
CN106504760A CN106504760A (en) 2017-03-15
CN106504760B CN106504760B (en) 2019-04-26

Family

ID=58322976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610947596.5A Active CN106504760B (en) 2016-10-26 2016-10-26 Broadband ambient noise and speech Separation detection system and method

Country Status (1)

Country Link
CN (1) CN106504760B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109327633B (en) * 2017-07-31 2020-09-22 苏州谦问万答吧教育科技有限公司 Sound mixing method, device, equipment and storage medium
CN108064007A (en) * 2017-11-07 2018-05-22 苏宁云商集团股份有限公司 Enhanced voice recognition method for a smart speaker, microcontroller, and smart speaker
CN109639904B (en) * 2019-01-25 2021-02-02 努比亚技术有限公司 Mobile phone mode adjusting method, system and computer storage medium
CN112992167A (en) * 2021-02-08 2021-06-18 歌尔科技有限公司 Audio signal processing method and device and electronic equipment
CN113470623B (en) * 2021-08-12 2023-05-16 成都启英泰伦科技有限公司 Self-adaptive voice endpoint detection method and detection circuit

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101452698B (en) * 2007-11-29 2011-06-22 中国科学院声学研究所 Voice HNR automatic analytical method
CN101826327B (en) * 2009-03-03 2013-06-05 中兴通讯股份有限公司 Method and system for judging transient state based on time domain masking
CN101631102B (en) * 2009-04-10 2011-09-21 北京理工大学 Interference pattern recognition technology of frequency hopping system
CN104575498B (en) * 2015-01-30 2018-08-17 深圳市云之讯网络技术有限公司 Efficient voice recognition methods and system
CN105118522B (en) * 2015-08-27 2021-02-12 广州市百果园网络科技有限公司 Noise detection method and device

Also Published As

Publication number Publication date
CN106504760A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN106504760B (en) Broadband ambient noise and speech Separation detection system and method
CN103646649B (en) An efficient speech detection method
CN103578468B (en) Method for adjusting the confidence threshold of voice recognition, and electronic device
CN104464722B (en) Voice activity detection method and apparatus based on time domain and frequency domain
US7508948B2 (en) Reverberation removal
CN106098076B (en) Time-frequency domain adaptive voice detection method based on dynamic noise estimation
CN104900238B (en) Real-time audio comparison method based on perceptual filtering
CN104143341B (en) Sonic boom detection method and device
CN106885971B (en) Intelligent background noise reduction method for cable fault detection pointing instrument
CN104681038A (en) Audio signal quality detecting method and device
CN105427859A (en) Front voice enhancement method for identifying speaker
CN105118522A (en) Noise detection method and device
CN105785324B (en) Linear frequency-modulated parameter estimating method based on MGCSTFT
US20060100866A1 (en) Influencing automatic speech recognition signal-to-noise levels
CN106303878A (en) Howling detection and suppression method
CN110085259B (en) Audio comparison method, device and equipment
CN108962285B (en) Voice endpoint detection method for dividing sub-bands based on human ear masking effect
CN109741760B (en) Noise estimation method and system
CN111540342B (en) Energy threshold adjusting method, device, equipment and medium
WO2021248522A1 (en) Current noise detection method and apparatus, terminal, and storage medium
CN111951834A (en) Method and device for detecting voice existence based on ultralow computational power of zero crossing rate calculation
CN105810201A (en) Voice activity detection method and system
CN105336344A (en) Noise detection method and apparatus thereof
Chu et al. A noise-robust FFT-based auditory spectrum with application in audio classification
CN103310800B (en) Noise-resistant voiced speech detection method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant