CN106504760B - Broadband ambient noise and speech Separation detection system and method - Google Patents
- Publication number: CN106504760B
- Application number: CN201610947596.5A
- Authority: CN (China)
- Prior art keywords: speech, time, voice, energy, frequency domain
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
- G10L21/0232—Processing in the frequency domain
- G10L21/0272—Voice signal separating
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to the fields of information processing and sensor signal processing, and in particular to a broadband ambient noise and speech separation detection system. The system comprises a current-frame time-frequency-domain energy calculation circuit, an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, an ambient noise comparison circuit, a sub-band energy distribution uniformity speech detection circuit, and a speech frame counting circuit. The invention also discloses a broadband ambient noise and speech separation detection method. The invention uses three-stage speech detection, which detects well against both low-frequency and high-frequency ambient noise and also against occasional, non-continuous noise, and substantially improves the accuracy of speech detection in complex noise environments.
Description
Technical field
The present invention relates to the fields of information processing and sensor signal processing, and in particular to a broadband ambient noise and speech separation detection system and method.
Background art
Speech recognition is a major focus of applied artificial intelligence and is now widely used in many fields. Speech detection is an important part of any real-time speech recognition system; its purpose is to distinguish speech segments from non-speech segments in complex real-world environments. Published work indicates that, in practice, a large share of low recognition rates is caused by speech that was not detected correctly. Large amounts of non-speech noise seriously degrade the accuracy of a speech recognition system, especially when the application environment is noisy. Correct speech detection can effectively reduce the system's computational load, shorten processing time, lower mobile-terminal transmit power, save channel resources, and improve recognition accuracy. Under complex ambient noise in particular, the performance of a speech recognition system depends to a large extent on the quality of its speech detection. A speech detection technique that is robust, accurate, real-time, and strongly adaptive is therefore necessary for every speech recognition system.
Current mainstream methods for automatic speech endpoint detection rely on one of three measures: short-time energy in the time domain, zero-crossing rate in the time domain, or the mean square deviation of band energy in the frequency domain. In each case the measure is computed and then compared with an empirical threshold. Experiments show that comparing short-time energy or zero-crossing rate alone adapts poorly to noisy environments, particularly because the application environment can change and the ambient noise within a single environment can also change over time, while the band-energy mean square deviation method adapts poorly to quiet environments.
Speech can also be detected separately from changes in average speech energy in the time domain and in the spectral domain, with the better result then selected according to a dynamically estimated ambient noise level; this greatly improves recognition accuracy and adaptability to environmental change. Because the energy of most stationary ambient noise is concentrated in the low-frequency range, this method is very effective against most low-frequency noise. Sounds produced by objects or animals, such as birdsong, car horns, and pianos or other musical instruments being played, have a wider frequency distribution that overlaps the frequency band of the human voice, and the above method easily misjudges such noise as speech. Distinguishing this type of noise is both very important and one of the difficult problems in speech detection, speech noise reduction, and speech recognition.
To solve the above problems, a broadband ambient noise and speech separation detection system and method is needed, one developed through extensive experimental analysis and theoretical study of the frequency-domain and time-domain characteristics of broadband non-speech noise.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art and to provide a broadband ambient noise and speech separation detection system and method that greatly improves both adaptability to all kinds of ambient noise and the accuracy of automatic speech detection.
To achieve the above object, the invention provides the following technical solutions.
A broadband ambient noise and speech separation detection system comprises: a current-frame time-frequency-domain energy calculation circuit; an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, and a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, each connected to the current-frame time-frequency-domain energy calculation circuit; an ambient noise comparison circuit connected to the ambient noise calculation circuit, the time-domain speech detection long/short-term average energy comparison circuit, and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit; a sub-band energy distribution uniformity speech detection circuit connected to both the time-domain speech detection long/short-term average energy comparison circuit and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit; and a speech frame counting circuit connected to the sub-band energy distribution uniformity speech detection circuit. The ambient noise calculation circuit is additionally connected to the sub-band energy distribution uniformity speech detection circuit, the speech frame counting circuit, the time-domain speech detection long/short-term average energy comparison circuit, and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit.
In a preferred embodiment of the invention, the speech frame counting circuit consists of time-width filters, which count speech frames; the number of time-width filters is greater than or equal to 1.
The invention also discloses a broadband ambient noise and speech separation detection method comprising the following steps:
Step 1: load the speech data and process it frame by frame. The speech data is time-domain speech data, and the frame duration is configurable, usually between 10 and 50 milliseconds.
Step 2: calculate the time-domain short-time energy and the time-domain long-term average energy. The time-domain short-time energy is the sum of the energy of the current frame of time-domain speech data; the time-domain long-term average energy is obtained by accumulating the time-domain short-time energy over multiple frames and dividing by the number of frames accumulated.
Step 3: apply an FFT (fast Fourier transform) to the current frame of time-domain speech data, converting it into frequency-domain sub-band speech data.
Step 4: calculate the frequency-domain short-time energy and the frequency-domain long-term average energy. The frequency-domain short-time energy is obtained by summing the sub-band energies of the current frame over the frequency range where speech energy is mainly distributed; the frequency-domain long-term average energy is obtained by accumulating the frequency-domain short-time energy over multiple frames and dividing by the number of frames accumulated.
Step 5: accumulate the ambient noise. The time-domain short-time energy of each non-speech frame is fed to the ambient noise estimation unit and accumulated; each time a certain number of frames has been accumulated, a new ambient noise value is output.
Step 6: compare the ambient noise with a preset threshold one. If the ambient noise is greater than threshold one, go to Step 7; if it is less than threshold one, go to Step 8.
Step 7: perform frequency-domain speech detection. If the frame is speech, go to Step 9; otherwise go to Step 5 and Step 11.
Step 8: perform time-domain speech detection. If the frame is speech, go to Step 9; otherwise go to Step 5 and Step 11.
Step 9: perform frequency-domain sub-band energy distribution uniformity detection. If the frame is speech, go to Step 10; otherwise go to Step 5 and Step 11.
Step 10: the time-width filter counts the speech frames produced by Step 9 and compares the count with a preset threshold two. If the count is greater than threshold two, go directly to Step 11; if it is less than threshold two, go to Step 5 and Step 11.
Step 11: output the detection result; detection ends.
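Steps 1 and 2 can be sketched in Python as below. This is a minimal illustration, not the patented implementation: it assumes non-overlapping frames, takes "energy" to mean the sum of squared samples, and averages over a sliding window whose length (`window`) is a hypothetical parameter, since the text only says that several frames are accumulated.

```python
from collections import deque


def split_frames(samples, frame_len):
    """Step 1: cut the samples into consecutive, non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_time_energy(frame):
    """Step 2: energy of the current frame, taken here as the sum of squares."""
    return sum(s * s for s in frame)


class LongTermAverage:
    """Step 2: mean of the most recent `window` short-time energies."""

    def __init__(self, window):
        # deque(maxlen=...) discards the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, energy):
        """Add one short-time energy and return the current long-term average."""
        self.values.append(energy)
        return sum(self.values) / len(self.values)


frames = split_frames([1, -1, 2, -2, 3, 3, 9], frame_len=2)
energies = [short_time_energy(f) for f in frames]
tracker = LongTermAverage(window=2)
averages = [tracker.update(e) for e in energies]
```

At a 16 kHz sampling rate, a 20 ms frame corresponds to `frame_len = 320` samples, which falls inside the 10 to 50 millisecond range given in Step 1.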
In a preferred embodiment of the invention, the frequency-domain speech detection compares the frequency-domain short-time energy with the frequency-domain long-term average energy. If the short-time energy exceeds the long-term average energy by a certain margin, the frame is speech; otherwise it is non-speech. When the frame is judged to be non-speech, the result is output and detection ends.
In a preferred embodiment of the invention, the time-domain speech detection compares the time-domain short-time energy with the time-domain long-term average energy. If the short-time energy exceeds the long-term average energy by a certain margin, the frame is speech; otherwise it is non-speech. When the frame is judged to be non-speech, the result is output and detection ends.
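The first detection stage (Step 6 choosing between Step 7 and Step 8, followed by the short-time versus long-term comparison) might be sketched as follows. The multiplicative `margin` and its default value are assumptions: the text does not say whether the required excess is a ratio or an offset, and it does not say what happens when the ambient noise exactly equals threshold one.

```python
def exceeds_long_term(short_energy, long_avg, margin=2.0):
    """Return True when the short-time energy exceeds the long-term average.

    `margin` is a hypothetical multiplicative factor (an assumption).
    """
    return short_energy > margin * long_avg


def stage_one_is_speech(noise, threshold_one,
                        time_short, time_long,
                        freq_short, freq_long,
                        margin=2.0):
    """Steps 6 to 8: select a detector from the noise level, then compare."""
    if noise > threshold_one:
        # Step 7: ambient noise above threshold one, use frequency-domain energies.
        return exceeds_long_term(freq_short, freq_long, margin)
    # Step 8: ambient noise below threshold one, use time-domain energies.
    # The case noise == threshold_one is routed here by assumption.
    return exceeds_long_term(time_short, time_long, margin)
```

A frame that fails this stage is treated as non-speech and, per the method, its energy is passed to the ambient noise accumulation of Step 5.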
In a preferred embodiment of the invention, when Step 8 is carried out, the frame is speech if the detected uniformity is high and non-speech if the detected uniformity is low. When the frame is judged to be non-speech, the result is output and detection ends.
In a preferred embodiment of the invention, the time-width filter counts the number of consecutive frames of the speech data that are speech. If the count is greater than threshold two, the frames are speech; if the count is less than threshold two, they are judged to be non-speech. When the frames are judged to be non-speech, the result is output and detection ends.
In a preferred embodiment of the invention, whenever Steps 7 through 9 judge the result to be non-speech, the non-speech data is passed to Step 5 to generate a new ambient noise value.
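The ambient noise accumulation of Step 5 can be illustrated with a small accumulator. This is a sketch under assumptions: the class name, `frames_per_update`, and `initial_noise` are hypothetical, and publishing the average of the accumulated energies (rather than the raw sum) is a guess, since the text only says that a new ambient noise value is output after a certain number of frames.

```python
class AmbientNoiseEstimator:
    """Accumulates non-speech frame energies and periodically refreshes the estimate."""

    def __init__(self, frames_per_update, initial_noise=0.0):
        self.frames_per_update = frames_per_update  # hypothetical parameter
        self.noise = initial_noise                  # current ambient noise estimate
        self._acc = 0.0
        self._count = 0

    def add_non_speech_frame(self, short_energy):
        """Feed the time-domain short-time energy of one non-speech frame."""
        self._acc += short_energy
        self._count += 1
        if self._count == self.frames_per_update:
            # Assumption: the new estimate is the average of the accumulated energies.
            self.noise = self._acc / self._count
            self._acc = 0.0
            self._count = 0
        return self.noise


estimator = AmbientNoiseEstimator(frames_per_update=3)
history = [estimator.add_non_speech_frame(e) for e in (3.0, 6.0, 9.0, 100.0)]
```

The estimate stays at its previous value until a full batch of non-speech frames has been seen, so a single loud non-speech frame does not move the threshold-one comparison in Step 6 immediately.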
The invention applies three-stage speech detection to the time-domain speech data: first the time-domain or frequency-domain speech detection, then the frequency-domain sub-band energy distribution uniformity detection, and finally the time-width filter, which counts the speech frames produced by Step 8 and compares the count with preset threshold two. The data is filtered stage by stage, so that only genuine, valid speech data is finally retained.
Compared with the prior art, the invention has the following beneficial effects:
The invention uses three-stage speech detection, which detects well against both low-frequency and high-frequency ambient noise and also against occasional, non-continuous noise, and substantially improves the accuracy of speech detection in complex noise environments.
Brief description of the drawings
Fig. 1 is a circuit block diagram of the invention;
Fig. 2 is a flow chart of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following embodiments; all techniques realized on the basis of the content of the invention fall within its scope.
As shown in Fig. 1, a broadband ambient noise and speech separation detection system comprises: a current-frame time-frequency-domain energy calculation circuit; an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, and a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, each connected to the current-frame time-frequency-domain energy calculation circuit; an ambient noise comparison circuit connected to the ambient noise calculation circuit, the time-domain speech detection long/short-term average energy comparison circuit, and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit; a sub-band energy distribution uniformity speech detection circuit connected to both the time-domain speech detection long/short-term average energy comparison circuit and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit; and a speech frame counting circuit connected to the sub-band energy distribution uniformity speech detection circuit. The ambient noise calculation circuit is additionally connected to the sub-band energy distribution uniformity speech detection circuit, the speech frame counting circuit, the time-domain speech detection long/short-term average energy comparison circuit, and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit. The speech frame counting circuit consists of time-width filters, which count speech frames. In this embodiment there is one time-width filter, and it is a speech frame counter.
As shown in Fig. 2, a broadband ambient noise and speech separation detection method comprises the following 11 steps:
Step 1: load the speech data and process it frame by frame. The speech data is time-domain speech data, and the frame duration is configurable, usually between 10 and 50 milliseconds.
Step 2: calculate the time-domain short-time energy and the time-domain long-term average energy. The time-domain short-time energy is the sum of the energy of the current frame of time-domain speech data; the time-domain long-term average energy is obtained by accumulating the time-domain short-time energy over multiple frames and dividing by the number of frames accumulated.
Step 3: apply an FFT (fast Fourier transform) to the current frame of time-domain speech data, converting it into frequency-domain sub-band speech data.
Step 4: calculate the frequency-domain short-time energy and the frequency-domain long-term average energy. The frequency-domain short-time energy is obtained by summing the sub-band energies of the current frame over the frequency range where speech energy is mainly distributed; the frequency-domain long-term average energy is obtained by accumulating the frequency-domain short-time energy over multiple frames and dividing by the number of frames accumulated.
Step 5: accumulate the ambient noise. The time-domain short-time energy of each non-speech frame is fed to the ambient noise estimation unit and accumulated; each time a certain number of frames has been accumulated, a new ambient noise value is output.
Step 6: compare the ambient noise with a preset threshold one. If the ambient noise is greater than threshold one, go to Step 7; if it is less than threshold one, go to Step 8.
Step 7: perform frequency-domain speech detection, which compares the frequency-domain short-time energy with the frequency-domain long-term average energy. If the short-time energy exceeds the long-term average energy by a certain margin, the frame is speech; otherwise it is non-speech. If the frame is speech, go to Step 9; otherwise go to Step 5 and Step 11.
Step 8: perform time-domain speech detection, which compares the time-domain short-time energy with the time-domain long-term average energy. If the short-time energy exceeds the long-term average energy by a certain margin, the frame is speech; otherwise it is non-speech. If the frame is speech, go to Step 9; otherwise go to Step 5 and Step 11.
Step 9: perform frequency-domain sub-band energy distribution uniformity detection. The frame is speech if the detected uniformity is high and non-speech if it is low. If the frame is speech, go to Step 10; otherwise go to Step 5 and Step 11.
Step 10: the time-width filter counts the speech frames produced by Step 9, that is, the number of consecutive frames of the speech data that are speech, and compares the count with a preset threshold two. If the count is greater than threshold two, the frames are speech and the method goes directly to Step 11; if the count is less than threshold two, the frames are non-speech and the method goes to Step 5 and Step 11.
Step 11: output the detection result; detection ends.
Whenever Steps 7 through 9 judge the result to be non-speech, the non-speech data is passed to Step 5 to generate a new ambient noise value.
In this embodiment, the calculation in Step 3 is as follows:
Assume the number of frequency-domain sub-bands is N. The average sub-band energy is then $E_{avg} = E_{total}/N$, where $E_{avg}$ is the average sub-band energy, $E_{total} = \sum_{i=1}^{N} E_i$ is the sum of all sub-band energies, and $E_i$ is the energy of the i-th sub-band, i = 1, 2, ..., N. In the frequency domain, the energy of a sub-band equals the square of its real part plus the square of its imaginary part.
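The sub-band energy calculation can be sketched as below. The sketch assumes that each FFT bin is treated as one sub-band and that only the first half of the spectrum (the non-negative frequencies of a real-valued frame) is used; neither detail is stated in the text. A small recursive radix-2 FFT is included only to keep the example self-contained, so the frame length must be a power of two.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def subband_energies(frame):
    """Energy per sub-band: real part squared plus imaginary part squared."""
    spectrum = fft(frame)
    half = len(frame) // 2  # assumption: keep non-negative frequencies only
    return [c.real ** 2 + c.imag ** 2 for c in spectrum[:half]]


def average_subband_energy(energies):
    """Eavg = Etotal / N."""
    return sum(energies) / len(energies)


# A 4-sample frame holding one cycle of a cosine: all energy lands in bin 1.
energies = subband_energies([1.0, 0.0, -1.0, 0.0])
e_avg = average_subband_energy(energies)
```

The frequency-domain short-time energy of Step 4 would then be the sum of a slice of `energies` covering the band where speech energy is concentrated; the text does not give the band limits.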
In this embodiment, the calculation in Step 9 is as follows:
Non-uniformity is measured by the mean square deviation method. With each sub-band energy denoted Ei, the non-uniformity nU is the mean square deviation of the sub-band energies. With Th_nu as the non-uniformity threshold, the frame is provisionally judged to be speech when nU < Th_nu and non-speech otherwise.
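The formula for nU is not reproduced in this text. Read literally, a mean square deviation of the sub-band energies about their average would take the form below. This is an assumed reconstruction: whether the original normalizes by $E_{avg}$ or takes a square root cannot be recovered from the surrounding description.

```latex
nU = \frac{1}{N}\sum_{i=1}^{N}\left(E_i - E_{avg}\right)^2,
\qquad
E_{avg} = \frac{1}{N}\sum_{i=1}^{N} E_i
```

Under this reading, nU is zero when all sub-band energies are equal and grows as energy concentrates in a few sub-bands, which matches the rule that nU < Th_nu indicates speech.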
In other embodiments, the calculation can be done in either of the following two ways:
Method one: take the absolute value of the difference between each sub-band energy and the average sub-band energy, and average these values to obtain the non-uniformity nU. With Th_nu as the non-uniformity threshold, the frame is provisionally judged to be speech when nU < Th_nu and non-speech otherwise.
Method two: count the sub-bands whose energy is close to the average sub-band energy. If many sub-band energies lie near the average energy, the frame is speech; otherwise it is non-speech. Specifically, for each sub-band with |Ei - Eavg| < k*Eavg, set U = U + 1, where k is a configuration parameter between 0 and 1 with a typical value of 0.5, and U characterizes uniformity. With Th_u as the threshold, the frame is judged to be speech if U > Th_u and non-speech otherwise.
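The three uniformity measures described for Step 9 can be compared side by side in a short sketch. The function names are hypothetical, and the mean square deviation is written in its plain, unnormalized form, which is an assumption because the original formula is not available; the counting rule follows the stated condition |Ei - Eavg| < k*Eavg directly.

```python
def mean_square_deviation(energies):
    """Non-uniformity nU as the mean of squared deviations from the average."""
    avg = sum(energies) / len(energies)
    return sum((e - avg) ** 2 for e in energies) / len(energies)


def mean_absolute_deviation(energies):
    """Method one: nU as the mean of absolute deviations from the average."""
    avg = sum(energies) / len(energies)
    return sum(abs(e - avg) for e in energies) / len(energies)


def count_near_average(energies, k=0.5):
    """Method two: U counts sub-bands with |Ei - Eavg| < k * Eavg."""
    avg = sum(energies) / len(energies)
    return sum(1 for e in energies if abs(e - avg) < k * avg)


def is_speech_by_non_uniformity(n_u, th_nu):
    """Speech when the non-uniformity is below its threshold."""
    return n_u < th_nu


def is_speech_by_uniformity(u, th_u):
    """Speech when the uniformity count is above its threshold."""
    return u > th_u


energies = [1.0, 2.0, 3.0, 6.0]          # Eavg = 3.0
msd = mean_square_deviation(energies)     # (4 + 1 + 0 + 9) / 4
mad = mean_absolute_deviation(energies)   # (2 + 1 + 0 + 3) / 4
u = count_near_average(energies, k=0.5)   # sub-bands within 1.5 of 3.0
```

Note that the two non-uniformity measures depend on the absolute signal level, so a fixed Th_nu would need to be tuned to the input scale, whereas the counting rule is scale-free because its tolerance is proportional to Eavg.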
In this embodiment, the detailed calculation in Step 10 is as follows:
A speech frame counter is set up with an initial value of 0. It is cleared when a non-speech frame is encountered and incremented by 1 when a speech frame is encountered. On a transition from a non-speech frame to a speech frame, the index of the first speech frame is recorded as the speech start address. Once the counter value exceeds threshold two, all consecutive speech frames from that first speech frame onward are judged to be speech frames, until a non-speech frame occurs. If, on a transition from a speech frame to a non-speech frame, the counter value is less than threshold two, the preceding speech frames are also judged to be non-speech frames.
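The counter logic above amounts to keeping only runs of consecutive candidate speech frames that are long enough. A minimal offline sketch, operating on a list of per-frame decisions from the earlier stages, is shown below. The function name is hypothetical, and a run whose length exactly equals threshold two is rejected here by assumption, since the text only covers the greater-than and less-than cases.

```python
def time_width_filter(candidates, threshold_two):
    """Keep only runs of consecutive speech frames longer than threshold_two.

    candidates: list of booleans, True where earlier stages judged speech.
    Returns a list of final per-frame labels of the same length.
    """
    labels = [False] * len(candidates)
    count = 0   # speech frame counter, cleared on every non-speech frame
    start = 0   # index of the first speech frame in the current run
    for i, is_speech in enumerate(candidates):
        if is_speech:
            if count == 0:
                start = i  # transition from non-speech to speech
            count += 1
        else:
            if count > threshold_two:
                for j in range(start, i):
                    labels[j] = True
            count = 0
    # Close a run that extends to the end of the data.
    if count > threshold_two:
        for j in range(start, len(candidates)):
            labels[j] = True
    return labels


decisions = [True, False, True, True, True, False, True, True]
final = time_width_filter(decisions, threshold_two=2)
```

In the example, the isolated frame and the two-frame run are discarded while the three-frame run is kept, which is how the final stage suppresses occasional, short bursts of noise that passed the first two stages.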
Claims (8)
1. A broadband ambient noise and speech separation detection system, comprising: a current-frame time-frequency-domain energy calculation circuit; an ambient noise calculation circuit, a time-domain speech detection long/short-term average energy comparison circuit, and a frequency-domain speech detection long/short-term frequency-domain energy comparison circuit, each connected to the current-frame time-frequency-domain energy calculation circuit; an ambient noise comparison circuit connected to the ambient noise calculation circuit, the time-domain speech detection long/short-term average energy comparison circuit, and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit; a sub-band energy distribution uniformity speech detection circuit connected to both the time-domain speech detection long/short-term average energy comparison circuit and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit; and a speech frame counting circuit connected to the sub-band energy distribution uniformity speech detection circuit; wherein the ambient noise calculation circuit is additionally connected to the sub-band energy distribution uniformity speech detection circuit, the speech frame counting circuit, the time-domain speech detection long/short-term average energy comparison circuit, and the frequency-domain speech detection long/short-term frequency-domain energy comparison circuit.
2. The broadband ambient noise and speech separation detection system according to claim 1, characterized in that: the speech frame counting circuit consists of time-width filters, the time-width filters count speech frames, and the number of time-width filters is greater than or equal to 1.
3. A broadband ambient noise and speech separation detection method, comprising the following steps:
Step 1: load the speech data and process it frame by frame, the speech data being time-domain speech data;
Step 2: calculate the time-domain short-time energy and the time-domain long-term average energy, the time-domain short-time energy being the sum of the energy of the current frame of time-domain speech data, and the time-domain long-term average energy being obtained by accumulating the time-domain short-time energy over multiple frames and dividing by the number of frames accumulated;
Step 3: apply an FFT (fast Fourier transform) to the current frame of time-domain speech data, converting it into frequency-domain sub-band speech data;
Step 4: calculate the frequency-domain short-time energy and the frequency-domain long-term average energy, the frequency-domain short-time energy being obtained by summing the sub-band energies of the current frame over the frequency range where speech energy is mainly distributed, and the frequency-domain long-term average energy being obtained by accumulating the frequency-domain short-time energy over multiple frames and dividing by the number of frames accumulated;
Step 5: accumulate the ambient noise;
Step 6: compare the ambient noise with a preset threshold one; if the ambient noise is greater than threshold one, go to Step 7; if it is less than threshold one, go to Step 8;
Step 7: perform frequency-domain speech detection; if the frame is speech, go to Step 9; otherwise go to Step 5 and Step 11;
Step 8: perform time-domain speech detection; if the frame is speech, go to Step 9; otherwise go to Step 5 and Step 11;
Step 9: perform frequency-domain sub-band energy distribution uniformity detection; if the frame is speech, go to Step 10; otherwise go to Step 5 and Step 11;
Step 10: the time-width filter counts the speech frames produced by Step 9 and compares the count with a preset threshold two; if the count is greater than threshold two, go directly to Step 11; if it is less than threshold two, go to Step 5 and Step 11;
Step 11: output the detection result; detection ends.
4. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: the frequency-domain speech detection compares the frequency-domain short-time energy with the frequency-domain long-term average energy; if the frequency-domain short-time energy exceeds the frequency-domain long-term average energy by a certain margin, the frame is speech, otherwise it is non-speech; when the frame is judged to be non-speech, the result is output and detection ends.
5. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: the time-domain speech detection compares the time-domain short-time energy with the time-domain long-term average energy; if the time-domain short-time energy exceeds the time-domain long-term average energy by a certain margin, the frame is speech, otherwise it is non-speech; when the frame is judged to be non-speech, the result is output and detection ends.
6. broadband ambient noise according to claim 3 and speech Separation detection method, it is characterised in that: walked
When rapid eight, if testing result uniformity compared with Gao Zewei voice, is non-voice if testing result uniformity is lower, is judged as
Output when non-voice is as a result, detection terminates.
7. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: the time-width filter counts the number of consecutive frames of the voice data that are speech; if the frame count is greater than threshold two, the result is speech; if the frame count is less than threshold two, the result is judged to be non-speech; when the result is judged to be non-speech, the result is output and detection ends.
8. The broadband ambient noise and speech separation detection method according to claim 3, characterized in that: when steps seven to nine are run and the result of a step is judged to be non-speech, step five is run on the non-speech data to generate the new ambient noise.
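The energy comparison of claims 4 and 5, together with the noise update of claim 8, can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `EnergyDetector`, the multiplicative `margin` standing in for "a certain degree", and the exponential moving average used to refresh the long-term average are all hypothetical choices that the claims do not specify.

```python
from typing import Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (time-domain short-time energy)."""
    return sum(x * x for x in frame) / len(frame)


class EnergyDetector:
    """Compare a short-time energy with a long-term average energy.

    The same comparison can be applied to frequency-domain energies by
    passing energies computed from a spectrum instead of from samples.
    """

    def __init__(self, initial_long_term: float, margin: float = 2.0, smoothing: float = 0.9):
        self.long_term = initial_long_term  # long-term average (ambient noise) energy
        self.margin = margin                # assumed meaning of "a certain degree"
        self.smoothing = smoothing          # assumed averaging weight, 0 < smoothing < 1

    def is_speech(self, energy: float) -> bool:
        if energy > self.margin * self.long_term:
            return True
        # Non-speech: fold this energy into the long-term average
        # (assumed stand-in for the ambient-noise update of step five).
        self.long_term = self.smoothing * self.long_term + (1.0 - self.smoothing) * energy
        return False


detector = EnergyDetector(initial_long_term=1.0, margin=2.0, smoothing=0.5)
loud = detector.is_speech(short_time_energy([2.0, -2.0, 2.0, -2.0]))
quiet = detector.is_speech(short_time_energy([0.0, 0.0, 0.0, 0.0]))
print(loud, quiet, detector.long_term)
```

Updating the long-term average only on non-speech frames keeps speech energy from inflating the noise estimate, which matches the intent of regenerating the ambient noise from non-speech data.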
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610947596.5A CN106504760B (en) | 2016-10-26 | 2016-10-26 | Broadband ambient noise and speech Separation detection system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610947596.5A CN106504760B (en) | 2016-10-26 | 2016-10-26 | Broadband ambient noise and speech Separation detection system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106504760A (en) | 2017-03-15 |
CN106504760B (en) | 2019-04-26 |
Family
ID=58322976
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610947596.5A Active CN106504760B (en) | Broadband ambient noise and speech Separation detection system and method | 2016-10-26 | 2016-10-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106504760B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109327633B (en) * | 2017-07-31 | 2020-09-22 | 苏州谦问万答吧教育科技有限公司 | Sound mixing method, device, equipment and storage medium |
CN108064007A (en) * | 2017-11-07 | 2018-05-22 | 苏宁云商集团股份有限公司 | Enhanced voice recognition method for intelligent sound box, microcontroller, and intelligent sound box |
CN109639904B (en) * | 2019-01-25 | 2021-02-02 | 努比亚技术有限公司 | Mobile phone mode adjusting method, system and computer storage medium |
CN112992167A (en) * | 2021-02-08 | 2021-06-18 | 歌尔科技有限公司 | Audio signal processing method and device and electronic equipment |
CN113470623B (en) * | 2021-08-12 | 2023-05-16 | 成都启英泰伦科技有限公司 | Self-adaptive voice endpoint detection method and detection circuit |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101452698B (en) * | 2007-11-29 | 2011-06-22 | 中国科学院声学研究所 | Voice HNR automatic analytical method |
CN101826327B (en) * | 2009-03-03 | 2013-06-05 | 中兴通讯股份有限公司 | Method and system for judging transient state based on time domain masking |
CN101631102B (en) * | 2009-04-10 | 2011-09-21 | 北京理工大学 | Interference pattern recognition technology of frequency hopping system |
CN104575498B (en) * | 2015-01-30 | 2018-08-17 | 深圳市云之讯网络技术有限公司 | Efficient voice recognition methods and system |
CN105118522B (en) * | 2015-08-27 | 2021-02-12 | 广州市百果园网络科技有限公司 | Noise detection method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106504760A (en) | 2017-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106504760B (en) | Broadband ambient noise and speech Separation detection system and method | |
CN103646649B (en) | An efficient speech detection method | |
CN104464722B (en) | Voice activity detection method and apparatus based on time domain and frequency domain | |
US7508948B2 (en) | Reverberation removal | |
CN106098076B (en) | Time-frequency domain adaptive voice detection method based on dynamic noise estimation | |
CN106885971B (en) | Intelligent background noise reduction method for cable fault detection pointing instrument | |
CN105785324B (en) | Linear frequency-modulated parameter estimating method based on MGCSTFT | |
CN104681038A (en) | Audio signal quality detecting method and device | |
CN105427859A (en) | Front voice enhancement method for identifying speaker | |
CN105118522A (en) | Noise detection method and device | |
CN110085259B (en) | Audio comparison method, device and equipment | |
US20060100866A1 (en) | Influencing automatic speech recognition signal-to-noise levels | |
CN104143341A (en) | Sonic boom detection method and device | |
CN104900238A (en) | Audio real-time comparison method based on sensing filtering | |
CN106303878A (en) | Howling detection and suppression method | |
CN108962285B (en) | Voice endpoint detection method for dividing sub-bands based on human ear masking effect | |
CN111540342B (en) | Energy threshold adjusting method, device, equipment and medium | |
CN111951834A (en) | Method and device for detecting voice existence based on ultralow computational power of zero crossing rate calculation | |
CN105810201A (en) | Voice activity detection method and system | |
CN111797708A (en) | Airflow noise detection method and device, terminal and storage medium | |
CN105336344A (en) | Noise detection method and apparatus thereof | |
Chu et al. | A noise-robust FFT-based auditory spectrum with application in audio classification | |
CN103310800B (en) | Noise-resistant voiced speech detection method and system | |
Zhang et al. | Fast nonstationary noise tracking based on log-spectral power mmse estimator and temporal recursive averaging | |
CN103745726A (en) | Self-adaptive variable-sampling rate audio frequency sampling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||