CN105023572A - Noised voice end point robustness detection method - Google Patents

Noised voice end point robustness detection method

Info

Publication number
CN105023572A
Authority
CN
China
Prior art keywords
noise
frame
spectrum
voice
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410152461.0A
Other languages
Chinese (zh)
Inventor
王景芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410152461.0A priority Critical patent/CN105023572A/en
Publication of CN105023572A publication Critical patent/CN105023572A/en
Pending legal-status Critical Current

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a robust endpoint detection method for noisy speech. The method constructs an estimate of the noise power spectrum of each frame of the acoustic signal during filtering and provides a time-varying update mechanism for the noise spectrum. First, iterative Wiener filtering is applied to the spectrum of each speech frame. The spectrum is then divided into several subbands and the spectral entropy of each subband is calculated. Next, the subband spectral entropies of several successive frames are passed through a bank of median filters to obtain the spectral entropy of each frame. Finally, the input speech is classified according to the spectral entropy values. The algorithm effectively distinguishes speech from noise, and speech-present from speech-absent states, and it is robust under different noise conditions. It has low computational cost, is simple and easy to implement, and is suitable for real-time speech signal processing in the various systems that require speech endpoint detection. The method is a real-time speech endpoint detection algorithm that adapts to complex environments, and it completes endpoint detection and speech filtering enhancement together in a single pass.

Description

A robust endpoint detection method for noisy speech
Technical field
The invention belongs to the field of speech processing technology and relates in particular to a robust endpoint detection method for noisy speech.
Background art
Speech endpoint detection (also called voice activity detection) is an important step in digital speech processing. Its purpose is to identify the speech segments and the noise segments in a sampled digital signal. Accurately detecting the start and end points of speech against background noise is useful in many ways. In speech recognition, noise segments that contain no speech can be removed and feature parameters extracted only from the speech segments, which both improves recognition accuracy and reduces processing time. In speech coding, the bit rate of noise segments can be reduced without affecting the quality of the received speech, which improves coding efficiency. Embedded speech recognition systems benefit as follows: 1. Non-speech frames are discarded rather than sent to the back-end recognizer, which reduces the recognizer's computational load; in embedded speech recognition systems such as mobile phones and PDAs (personal digital assistants), this shortens the system response time and improves real-time performance. 2. In a distributed speech recognition system, transmitting only speech frames significantly reduces the amount of data transmitted. 3. Insertion errors caused by sending the features of large numbers of non-speech frames to the back-end recognizer are reduced.
Short-time energy is the most commonly used feature in voice activity detection algorithms. It separates speech from noise effectively in high-SNR environments, but a large body of experimental results shows that the performance of methods based on short-time energy declines markedly at low SNR and in non-stationary noise. Some algorithms do maintain stable performance at low SNR, but their computational complexity is too high for real-time speech recognition systems. Information entropy was proposed early on for speech/noise classification: the difference between human speech and noise shows up in their spectral entropy. Experimental results show that algorithms based on speech spectral entropy outperform energy-based methods at low SNR.
The present invention proposes a subband spectral entropy voice activity detection algorithm based on iterative Wiener filtering. The spectrum of each speech frame is first passed through iterative Wiener filtering, then divided into several subbands whose spectral entropies are calculated; the subband spectral entropies of several successive frames are then passed through a bank of median filters to obtain the spectral entropy of each frame. In non-stationary noise the spectral entropy contour fluctuates considerably, which makes threshold selection difficult, so the subband spectral entropy is smoothed by a bank of median filters. Because the smoothing uses the subband spectral entropies of preceding and following frames, it greatly improves the detection accuracy and effectiveness of the algorithm. The algorithm maintains good performance across a variety of environments and SNR conditions and can meet the real-time and low-power requirements of embedded systems.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main purpose of the present invention is to propose a robust endpoint detection method for noisy speech that identifies the speech segments and noise segments in a sampled digital signal, accurately detects the start and end points of speech against background noise, and provides a highly robust voice activity detection method for speech signals corrupted by additive noise. This covers the subband spectral entropy of speech in non-stationary noise, and the dual-threshold decision thresholds and decision rule for separating noisy speech from noise.
 
(2) Technical solution
To achieve the above object, the invention provides a robust endpoint detection method for noisy speech, the method comprising:
1) Wiener filtering
Wiener filtering is the optimal estimate of the time-domain waveform under the minimum mean square error criterion, assuming stationarity. Its main method is as follows. For a noisy speech signal:
$$y(n) = s(n) + v(n) \tag{1}$$
where y(n), s(n), and v(n) denote the noisy speech, the clean speech, and the noise, respectively. Wiener filtering estimates s(n) under the minimum mean square error criterion; that is, it chooses an estimate $\hat{s}(n)$ of s(n). Let the impulse response of the filter be h(n); then
$$\hat{s}(n) = y(n) * h(n) \tag{2}$$
where * denotes convolution, such that $\varepsilon = E\{[s(n) - \hat{s}(n)]^2\}$ is minimized. Substituting $\hat{s}(n)$ into the error expression ε and applying the orthogonality principle, the Fourier transform, and the fact that s(n) and v(n) are uncorrelated yields the Wiener filtering equations:
(3)
(4)
H(k) is the frequency-domain transfer function, k is the frequency index, P_s(k) is the power spectral density of the speech signal s(n), and P_v(k) is the power spectral density of the noise v(n). In practice, determining P_s(k) and P_v(k) is the crux of the problem; P_snr(k) is obtained iteratively.
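For reference, the textbook frequency-domain Wiener filter can be written with the symbols defined above as follows. This is the standard result and is only assumed to correspond to equations (3) and (4); in particular, reading P_snr(k) as the ratio P_s(k)/P_v(k) is an assumption.

```latex
H(k) = \frac{P_s(k)}{P_s(k) + P_v(k)}
     = \frac{P_{snr}(k)}{1 + P_{snr}(k)},
\qquad
P_{snr}(k) = \frac{P_s(k)}{P_v(k)}
```

In this form the gain approaches 1 where speech dominates and 0 where noise dominates, which is why an iterative estimate of P_snr(k) is enough to drive the filter.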
2) Iterative update of the noise power spectrum
The signal of frame l is transformed by the fast Fourier transform (FFT) to obtain its N_FFT spectral points YF(i, l) (0 ≤ i ≤ N_FFT). The first N frames are taken as noise, and the initial value of the noise spectrum at each point is:
(5)
The initial value of the noise power spectrum at each point is:
(6)
Update of the noise spectrum and noise power spectrum: if the current frame is frame l, compute for the current signal
(7)
where MN(i, l-1) is the noise estimate from the previous frame (0 ≤ i ≤ N_FFT);
The signal-to-noise ratio SNR of the frame is defined as the mean of the values SNR(i) that satisfy SNR(i) > 0 (0 ≤ i ≤ N_FFT);
If SNR < 3 dB, the count of consecutive noise frames is incremented, NoiseCounter = NoiseCounter + 1. The noise spectrum and noise power spectrum are updated only when NoiseCounter > HangOver (for example, HangOver = 8); otherwise they are left unchanged. The update equations (for example, with α = 0.9) are:
(8)
(9)
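The gated update described above can be sketched as follows. This is a minimal illustration, not the patented formulas: the per-bin SNR in dB, the exponential smoothing with α, and the reset of the counter on a speech-like frame are all assumptions, and the function name `update_noise` is hypothetical.

```python
import math
from typing import List, Tuple


def update_noise(noise_mag: List[float], frame_mag: List[float], counter: int,
                 alpha: float = 0.9, hang_over: int = 8,
                 snr_threshold_db: float = 3.0
                 ) -> Tuple[List[float], List[float], int]:
    """Process one frame; return (noise magnitude, noise power, counter)."""
    eps = 1e-12  # guards against division by zero and log of zero
    # Assumed form of the per-bin SNR: current magnitude over previous noise, in dB.
    snr_db = [20.0 * math.log10((y + eps) / (n + eps))
              for y, n in zip(frame_mag, noise_mag)]
    # Frame SNR: mean of the per-bin values that are greater than zero.
    positive = [s for s in snr_db if s > 0.0]
    frame_snr = sum(positive) / len(positive) if positive else 0.0

    if frame_snr < snr_threshold_db:
        counter += 1   # one more consecutive noise-like frame
    else:
        counter = 0    # assumption: a speech-like frame resets the run

    if counter > hang_over:
        # Assumed update rule: exponential smoothing with weight alpha.
        noise_mag = [alpha * n + (1.0 - alpha) * y
                     for n, y in zip(noise_mag, frame_mag)]
    noise_pow = [n * n for n in noise_mag]
    return noise_mag, noise_pow, counter
```

The hangover keeps short pauses inside an utterance from being absorbed into the noise estimate: only a run of more than HangOver low-SNR frames triggers an update.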
3) Implementation of iterative Wiener filtering
Set the initial value of the SNR iteration, SNRpre(i, 0) = 1 (0 ≤ i ≤ N_FFT);
If the current frame is frame l, compute the SNR gain:
(10)
Update the information quantity, where max takes the maximum (for example, β = 0.95):
(11)
Compute the filter coefficients (the frequency-domain filter transfer function):
(12)
Compute the filtered spectrum:
(13)
Update the SNR estimate SNRpre(i, l) for use in filtering the next frame:
(14)
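One common way to realize a frame-recursive Wiener filter of this kind is the decision-directed scheme sketched below. It is offered as an assumption about the general shape of the five steps above, not as a reconstruction of equations (10) to (14); the function name `wiener_frame` and the exact formulas are hypothetical.

```python
from typing import List, Tuple


def wiener_frame(frame_spec: List[complex], noise_pow: List[float],
                 snr_pre: List[float], beta: float = 0.95
                 ) -> Tuple[List[complex], List[float]]:
    """Filter one frame; return (filtered spectrum, SNR values for the next frame)."""
    eps = 1e-12  # avoids division by zero for empty noise bins
    filtered: List[complex] = []
    snr_next: List[float] = []
    for y, pn, prev in zip(frame_spec, noise_pow, snr_pre):
        post = abs(y) ** 2 / (pn + eps)                  # a posteriori SNR of this bin
        # Assumed update: blend the previous estimate with the floored excess SNR.
        prior = beta * prev + (1.0 - beta) * max(post - 1.0, 0.0)
        h = prior / (1.0 + prior)                        # Wiener gain for this bin
        yn = h * y                                       # filtered spectral value
        filtered.append(yn)
        snr_next.append(abs(yn) ** 2 / (pn + eps))       # carried to the next frame
    return filtered, snr_next
```

A typical loop would start with `snr_pre = [1.0] * len(frame_spec)`, matching the stated initial value SNRpre(i, 0) = 1, and feed each frame's `snr_next` into the following call.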
4) Subband spectral entropy calculation
The speech signal of each frame is transformed by the fast Fourier transform (FFT) to obtain its N_FFT spectral points YF_i (0 ≤ i ≤ N_FFT). The iterative Wiener filtering above yields N_FFT points YN_i (0 ≤ i ≤ N_FFT), from which the N_FFT-point power spectrum Y_i = YN_i * YN_i (0 ≤ i ≤ N_FFT) is computed. Because the clean speech spectrum lies within [250 Hz, 3500 Hz], find the corresponding index interval [Nd, Ng] (0 ≤ Nd < Ng ≤ N_FFT) and divide the points in the frequency range [Nd, Ng] into a number of non-overlapping frequency bands, called subbands. Because the noise in some environments is concentrated in particular subbands, the subband approach improves the accuracy of the algorithm in narrowband noise. The probability of each point in the spectrum of frame l is computed according to equation (15):
(15)
where Y_i is a point in the i-th subband and Q is a positive number. Q is added so that the spectral entropies of different noise signals at the same SNR are close to one another, which makes speech and noise easier to distinguish. In the experiments, Q is a linear function of STD, the mean over the first N frames of each frame's time-domain standard deviation: Q = a*STD + b (for example, a = 500, b = 1). To eliminate the influence of a power spectrum that is too concentrated at a single point, if … .9, then … ;
By the definition of information entropy, the spectral entropy of the k-th subband of frame l is
( ) (16)
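A minimal sketch of the subband entropy computation is given below. Several details are assumptions, because the source equations are not available: the probabilities are normalized over the whole speech band, a probability above 0.9 is set to zero (one plausible reading of the single-point rule), and the natural logarithm is used. The function name `subband_entropies` is hypothetical.

```python
import math
from typing import List


def subband_entropies(power: List[float], n_lo: int, n_hi: int,
                      n_bands: int, q: float) -> List[float]:
    """Return one spectral entropy per subband for a single frame.

    power  : power spectrum Y_i of the Wiener-filtered frame
    n_lo, n_hi : inclusive bin range [Nd, Ng] covering the speech band
    n_bands: number of non-overlapping subbands
    q      : positive constant Q added to every bin
    """
    band = power[n_lo:n_hi + 1]
    total = sum(y + q for y in band)
    # Assumption: probabilities are normalized over the whole speech band.
    probs = [(y + q) / total for y in band]
    # Assumption: a single over-concentrated point (p > 0.9) is zeroed.
    probs = [0.0 if p > 0.9 else p for p in probs]

    size = len(band) // n_bands  # leftover bins at the top of the band are dropped
    entropies = []
    for k in range(n_bands):
        chunk = probs[k * size:(k + 1) * size]
        # Shannon entropy contribution of this subband; zero terms are skipped.
        entropies.append(-sum(p * math.log(p) for p in chunk if p > 0.0))
    return entropies
```

With a flat spectrum every bin gets the same probability and the entropy is at its maximum; a larger Q pushes any spectrum toward that flat case, which is how Q damps differences between noise types.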
5) Median filtering and dual-threshold voice activity detection
According to the principle of information entropy, classifier accuracy suffers when the noise in some environments is relatively regular. Therefore, when computing the spectral entropy of the current frame, the filter uses information from the L frames before and after it; the algorithm smooths the spectral entropy of each subband with its own filter from a bank of median filters;
A group of L = 2N_1 + 1 subband entropies Es[l-N_1, k], ..., Es[l, k], ..., Es[l+N_1, k] is median filtered, where l is the speech frame currently being analyzed;
The spectral negentropy of frame l is then computed by the following formula:
After subband spectral entropy estimation and median filtering, each frame yields one spectral negentropy value H_l;
VAD (Voice Activity Detection) uses a dual-threshold algorithm, whose principle (see Fig. 2) is as follows:
a) Compute the subband spectral entropy E_Sil of the background noise (for example, by averaging the first 3 frames), and from it compute the low and high thresholds E_Low and E_High;
b) Find the front endpoint of the speech. A position is the front endpoint when, within the l frames after it, m consecutive frames exceed E_Low, and within the M frames after it, N consecutive frames exceed E_High;
c) Find the back endpoint of the speech. A position is the back endpoint when it is a point below E_Low and, within the B frames after that point, there are no C consecutive frames above E_High.
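The median smoothing and the dual-threshold rules a) to c) can be sketched as follows. The sketch works on a per-frame score sequence `h` (standing in for H_l) and takes the thresholds as inputs; truncating the median window at the edges and counting the candidate frame itself as part of the look-ahead window are assumptions, and all function and parameter names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def median_smooth(values: List[float], n1: int) -> List[float]:
    """Median over a window of 2*n1 + 1 frames, truncated at the sequence edges."""
    return [median(values[max(0, i - n1): i + n1 + 1]) for i in range(len(values))]


def has_run(flags: List[bool], run: int) -> bool:
    """True if `flags` contains at least `run` consecutive True values."""
    count = 0
    for flag in flags:
        count = count + 1 if flag else 0
        if count >= run:
            return True
    return False


def find_endpoints(h: List[float], e_low: float, e_high: float,
                   low_span: int, low_run: int,
                   high_span: int, high_run: int,
                   back_span: int, back_run: int) -> Optional[Tuple[int, int]]:
    """Return (front, back) frame indices of the first utterance, or None."""
    start = None
    for i in range(len(h)):
        # Rule b): enough consecutive frames above each threshold in its window.
        low_ok = has_run([v > e_low for v in h[i:i + low_span]], low_run)
        high_ok = has_run([v > e_high for v in h[i:i + high_span]], high_run)
        if low_ok and high_ok:
            start = i
            break
    if start is None:
        return None
    for j in range(start + 1, len(h)):
        if h[j] < e_low:
            # Rule c): no run above E_High in the frames after the candidate.
            after = [v > e_high for v in h[j + 1:j + 1 + back_span]]
            if not has_run(after, back_run):
                return start, j
    return start, len(h) - 1  # speech runs to the end of the sequence
```

The look-ahead in rule c) is what keeps a brief dip inside a word from being reported as the end of the utterance.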
Preferably, the parameter initialization is as follows: the noisy speech signal is divided into frames of length N = [0.25fs] points, where fs is the signal sampling frequency, with a frame shift of N/2; the initial noise spectrum is determined from the first few frames, which contain no speech.
Preferably, for the Wiener filtering: the characteristics of a speech signal as a whole, and the parameters that describe them, vary with time, so speech is a typical non-stationary process. Over a short interval (10 to 30 ms), however, its characteristics remain relatively stable, so it can be treated as a quasi-stationary process; this is the short-time stationarity of speech signals. The great majority of current speech processing techniques work on this short-time basis: the signal is divided into segments whose characteristic parameters are analyzed one by one. Each segment is called a frame, and the segmentation, called framing, is implemented by applying a window function to the speech signal; the frame length is generally 10 to 30 ms. Frames may be contiguous, but overlapping segmentation with a sliding window is more common, because it gives a smooth transition between frames and preserves the continuity of the signal. As the window function, overlapping Hamming-window segmentation is chosen in order to obtain high frequency resolution and reduce the Gibbs phenomenon.
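Overlapping Hamming-window framing can be sketched as below. The 25 ms default frame duration and the half-frame shift are taken from values mentioned elsewhere in this document; the function name `frame_signal` is hypothetical, and samples that do not fill a final frame are simply dropped.

```python
import math
from typing import List


def frame_signal(x: List[float], fs: float, frame_ms: float = 25.0) -> List[List[float]]:
    """Split x into half-overlapping frames and apply a Hamming window to each."""
    n = int(round(frame_ms * fs / 1000.0))  # frame length in samples
    hop = n // 2                            # frame shift of half a frame
    # Standard Hamming window coefficients.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        frames.append([x[start + i] * window[i] for i in range(n)])
    return frames
```

For example, 1000 samples at fs = 8000 Hz give 200-sample frames with a 100-sample shift, so nine full frames are produced.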
Preferably, the implementation of the invention is shown in Fig. 1, and the dual-threshold voice activity detection process is shown in Fig. 2.
Preferably, the noisy speech signal is processed frame by frame in real time.
(3) Beneficial effects
1. The robust endpoint detection method for noisy speech provided by the invention proposes real-time speech endpoint detection based on subband power spectral entropy under iterative Wiener filtering. The algorithm is computationally simple, has good real-time performance and strong noise immunity, and is robust. It is general and adapts to a wide range of environments: even at very low SNR, speech frames still contain subbands with relatively high SNR, so the algorithm is suitable for embedded real-time speech recognition systems;
2. Advantages and characteristics of the robust endpoint detection method for noisy speech provided by the invention:
1) Iterative Wiener filtering is applied to the spectrum of every frame;
2) The low and high frequencies outside the speech band are trimmed in the subband power spectrum calculation;
3) A noise-suppressing term Q is introduced in the subband spectral entropy calculation, which attenuates the interference of noise with the spectral entropy;
4) The subband spectral entropy is smoothed by a bank of median filters, which facilitates threshold selection;
3. The robust endpoint detection method for noisy speech provided by the invention targets non-stationary environmental noise: it identifies the speech segments and noise segments in a sampled digital signal, accurately detects the start and end points of speech against background noise, and provides a highly robust voice activity detection method for speech signals corrupted by additive noise.
Brief description of the drawings
Fig. 1 is a flowchart of the robust endpoint detection method for noisy speech provided by the invention;
Fig. 2 is a schematic diagram of the dual-threshold voice activity detection provided by the invention;
Fig. 3 compares the endpoint detection results and spectral entropy curves for the original speech mixed with different noises, as provided by the invention;
Fig. 3(a): endpoint detection for the original speech; (a1) spectral entropy of the original speech under the present algorithm; (a2) conventional spectral entropy of the original speech;
Fig. 3(b): endpoint detection for speech mixed with white noise (white) (SNR = 16.5 dB); (b1) spectral entropy with white noise under the present algorithm; (b2) conventional spectral entropy with white noise;
Fig. 3(c): endpoint detection for speech mixed with pink noise (pink) (SNR = 13.4 dB); (c1) spectral entropy with pink noise under the present algorithm; (c2) conventional spectral entropy with pink noise;
Fig. 3(d): endpoint detection for speech mixed with fighter cockpit noise (f16_cockpit) (SNR = 8.8 dB); (d1) spectral entropy with fighter cockpit noise under the present algorithm; (d2) conventional spectral entropy with fighter cockpit noise;
Fig. 3(e): endpoint detection for speech mixed with babble noise (babble) (SNR = 7.76 dB); (e1) spectral entropy with babble noise under the present algorithm; (e2) conventional spectral entropy with babble noise;
Fig. 4 compares the endpoint detection results and spectral entropy curves for speech mixed with babble noise (SNR = 5 dB, 0 dB, -5 dB), as provided by the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The core of the invention is as follows: a time-varying update principle for the noise spectrum is proposed, iterative Wiener filtering is implemented, the subband spectral entropy is calculated, and median filtering and dual-threshold voice activity detection are designed, thereby achieving speech endpoint detection.
As shown in Fig. 1, which is a flowchart of the robust endpoint detection method for noisy speech provided by the invention, the method comprises the following steps:
Step 101: Parameter initialization: frame the noisy speech signal with frame length N = [0.25fs] points, where fs is the signal sampling frequency, and frame shift N/2; set the initial noise spectrum;
Step 102: Framing: take the noisy speech at frame m and apply the Fourier transform;
Step 103: Time-varying update of the noise power spectrum for frame m;
Step 104: Iterative Wiener filtering with the SNR of frame m;
Step 105: Subband spectral entropy calculation, median filtering, and dual-threshold endpoint detection for frame m;
Step 106: Proceed to the next frame of the real-time signal and return to step 102.
The time-varying update of the noise power spectrum in step 103 comprises:
The signal of frame l is transformed by the fast Fourier transform (FFT) to obtain its N_FFT spectral points YF(i, l) (0 ≤ i ≤ N_FFT). The first N frames are taken as noise, and the initial value of the noise spectrum at each point is:
The initial value of the noise power spectrum at each point is:
Update of the noise spectrum and noise power spectrum: if the current frame is frame l, compute for the current signal
where MN(i, l-1) is the noise estimate from the previous frame (0 ≤ i ≤ N_FFT);
The signal-to-noise ratio SNR of the frame is defined as the mean of the values SNR(i) that satisfy SNR(i) > 0 (0 ≤ i ≤ N_FFT);
If SNR < 3 dB, the count of consecutive noise frames is incremented, NoiseCounter = NoiseCounter + 1. The noise spectrum and noise power spectrum are updated only when NoiseCounter > HangOver (for example, HangOver = 8); otherwise they are left unchanged. The update equations (for example, with α = 0.9) are:
    
The SNR iterative Wiener filtering in step 104 comprises:
Set the initial value of the SNR iteration, SNRpre(i, 0) = 1 (0 ≤ i ≤ N_FFT);
If the current frame is frame l, compute the SNR gain:
Update the information quantity, where max takes the maximum (for example, β = 0.95):
Compute the filter coefficients (the frequency-domain filter transfer function):
Compute the filtered spectrum:
Update the SNR estimate SNRpre(i, l) for use in filtering the next frame:
The subband spectral entropy calculation, median filtering, and dual-threshold endpoint detection in step 105 comprise:
The probability of each point in the spectrum of frame l is
if … .9, then … .
By the definition of information entropy, the spectral entropy of the k-th subband of frame l is
( )
A group of L = 2N_1 + 1 subband entropies Es[l-N_1, k], ..., Es[l, k], ..., Es[l+N_1, k] is median filtered, where l is the speech frame currently being analyzed.
(17)
The spectral negentropy of frame l is then computed by the following formula:
The principle of the dual-threshold algorithm (see Fig. 2) is as follows:
a) Compute the subband spectral entropy E_Sil of the background noise (for example, by averaging the first 3 frames), and from it compute the low and high thresholds E_Low and E_High;
b) Find the front endpoint of the speech. A position is the front endpoint when, within the l frames after it, m consecutive frames exceed E_Low, and within the M frames after it, N consecutive frames exceed E_High;
c) Find the back endpoint of the speech. A position is the back endpoint when it is a point below E_Low and, within the B frames after that point, there are no C consecutive frames above E_High.
Building on the flowchart of the robust endpoint detection method shown in Fig. 1, Fig. 2 further illustrates the dual-threshold speech endpoint detection process.
 
The robust endpoint detection method for noisy speech provided by the invention is further described below with reference to a specific embodiment. The experimental background noise is taken from the Noisex-92 database, whose sampling frequency is fs = 19.98 kHz. Using the same sampling frequency fs, the utterance "language, sound, end, point", recorded under computer noise and room noise, is shown in Fig. 3(a), where the rectangular (door-frame) polyline marks the endpoints found by the present method. In framing, each frame is 25 ms long, that is, the frame length is M = [0.025fs] points, with frame shift …; the first N = 20 frames are taken as the initial noise frames, the speech band is divided into K = 8 subbands, and median filtering uses L = 9 adjacent frames;
The original speech, and the original speech mixed with white noise (white), pink noise (pink), babble noise (babble), and fighter cockpit noise (f16_cockpit) from the Noisex-92 database, are processed with the iterative-Wiener-filtering spectral entropy method presented here and with the conventional spectral entropy method; the results are shown in Fig. 3. In the left column of Fig. 3 the abscissa is time (seconds) and the ordinate is amplitude; in the middle and right columns the abscissa is the frame number and the ordinate is negentropy. The left column shows the speech, the speech mixed with the different noises, and their endpoint detection; the middle column shows the spectral entropy and endpoint dividing lines of the present algorithm; the right column shows the corresponding conventional spectral entropy. With the present algorithm the entropy curve changes little across the various noise mixtures, endpoint segmentation is accurate, and adaptivity is good. The conventional spectral entropy method performs well with white noise but poorly without added noise and with the other noises, and worst with babble noise;
When the mixed noise is made stronger, the present method still detects speech endpoints well for babble noise (babble) at SNRs of 5 dB, 0 dB, and -5 dB, whereas the conventional spectral entropy method fails;
The test results are measured by three indices:
,
where N_1 and N_0 are the total numbers of manually labeled speech frames and noise frames in the test speech, N_{1,0} is the number of manually labeled speech frames misidentified as noise frames, and N_{0,1} is the number of manually labeled noise frames misidentified as speech frames. P(A/S) is the speech-frame detection accuracy, P(A/N) is the noise-frame detection accuracy, and P(A) is the overall detection accuracy;
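The three indices can be computed from the four counts as sketched below. The formulas are an assumption inferred from the definitions of the counts (correct frames divided by labeled frames), since the source equations are not available; the function name `detection_accuracy` is hypothetical.

```python
from typing import Tuple


def detection_accuracy(n1: int, n0: int, n1_0: int, n0_1: int) -> Tuple[float, float, float]:
    """Return (P(A/S), P(A/N), P(A)) from labeled frame counts and error counts.

    n1, n0 : manually labeled speech and noise frame totals
    n1_0   : labeled speech frames misidentified as noise
    n0_1   : labeled noise frames misidentified as speech
    """
    p_speech = (n1 - n1_0) / n1                      # speech-frame accuracy
    p_noise = (n0 - n0_1) / n0                       # noise-frame accuracy
    p_total = (n1 - n1_0 + n0 - n0_1) / (n1 + n0)    # overall accuracy
    return p_speech, p_noise, p_total
```

Reporting the speech and noise accuracies separately matters because a detector that labels every frame as speech would score perfectly on P(A/S) while failing completely on P(A/N).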
Table 1 summarizes the experimental results at different signal-to-noise ratios (SNR) in a babble noise (babble) environment.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the invention. It should be understood that the foregoing is merely specific embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A robust endpoint detection method for noisy speech, characterized in that the method comprises:
designing a method for estimating the noise power spectrum of the acoustic signal during filtering and providing a time-varying update mechanism for the noise spectrum; applying an iterative Wiener filtering algorithm to the speech spectrum; dividing the spectrum into several subbands and calculating the spectral entropy of each subband; passing the subband spectral entropies of several successive frames through a bank of median filters to obtain the spectral entropy of each frame; and classifying the input speech according to the spectral entropy values; the algorithm effectively distinguishes speech from noise, and speech-present from speech-absent states.
2. The robust endpoint detection method for noisy speech according to claim 1, characterized in that the time-varying update mechanism for the noise spectrum comprises:
1) The signal of frame l is transformed by the fast Fourier transform (FFT) to obtain its N_FFT spectral points YF(i, l) (0 ≤ i ≤ N_FFT). The first N frames are taken as noise, and the initial value of the noise spectrum at each point is:
(1)
The initial value of the noise power spectrum at each point is:
(2)
2) Update of the noise spectrum and noise power spectrum: if the current frame is frame l, compute for the current signal
(3)
where MN(i, l-1) is the noise estimate from the previous frame (0 ≤ i ≤ N_FFT);
The signal-to-noise ratio SNR of the frame is defined as the mean of the values SNR(i) that satisfy SNR(i) > 0 (0 ≤ i ≤ N_FFT);
If SNR < 3 dB, the count of consecutive noise frames is incremented, NoiseCounter = NoiseCounter + 1. The noise spectrum and noise power spectrum are updated only when NoiseCounter > HangOver (for example, HangOver = 8); otherwise they are left unchanged. The update equations (for example, with α = 0.9) are:
(4)
(5).
3. The robust endpoint detection method for noisy speech according to claim 1, characterized in that the iterative Wiener filtering algorithm comprises:
Set the initial value of the SNR iteration, SNRpre(i, 0) = 1 (0 ≤ i ≤ N_FFT);
If the current frame is frame l, compute the SNR gain:
(6)
Update the information quantity, where max takes the maximum (for example, β = 0.95):
(7)
Compute the filter coefficients (the frequency-domain filter transfer function):
(8)
Compute the filtered spectrum:
(9)
Update the SNR estimate SNRpre(i, l) for use in filtering the next frame:
(10).
4. The robust endpoint detection method for noisy speech according to claim 1, characterized in that the subband spectral entropy calculation comprises:
The speech signal of each frame is transformed by the fast Fourier transform (FFT) to obtain its N_FFT spectral points YF_i (0 ≤ i ≤ N_FFT). The iterative Wiener filtering above yields N_FFT points YN_i (0 ≤ i ≤ N_FFT), from which the N_FFT-point power spectrum Y_i = YN_i * YN_i (0 ≤ i ≤ N_FFT) is computed. Because the clean speech spectrum lies within [250 Hz, 3500 Hz], find the corresponding index interval [Nd, Ng] (0 ≤ Nd < Ng ≤ N_FFT) and divide the points in the frequency range [Nd, Ng] into a number of non-overlapping frequency bands, called subbands;
Because the noise in some environments is concentrated in particular subbands, the subband approach improves the accuracy of the algorithm in narrowband noise. The probability of each point in the spectrum of frame l is computed according to equation (11):
(11)
where Y_i is a point in the i-th subband and Q is a positive number. Q is added so that the spectral entropies of different noise signals at the same SNR are close to one another, which makes speech and noise easier to distinguish. In the experiments, Q is a linear function of STD, the mean over the first N frames of each frame's time-domain standard deviation: Q = a*STD + b (for example, a = 500, b = 1). To eliminate the influence of a power spectrum that is too concentrated at a single point, if … .9, then … ;
By the definition of information entropy, the spectral entropy of the k-th subband of frame l is
( ) (12)
According to the principle of information entropy, classifier accuracy suffers when the noise in some environments is relatively regular. Therefore, when computing the spectral entropy of the current frame, the filter uses information from the L frames before and after it; the algorithm smooths the spectral entropy of each subband with its own filter from a bank of median filters.
5. The robust endpoint detection method for noisy speech according to claim 1, characterized in that the median filtering and endpoint segmentation threshold comprise:
A group of L = 2N_1 + 1 subband entropies Es[l-N_1, k], ..., Es[l, k], ..., Es[l+N_1, k] is median filtered, where l is the speech frame currently being analyzed;
(13)
The spectral negentropy of frame l is then computed by the following formula:
(14)
After subband spectral entropy estimation and median filtering, each frame yields one spectral negentropy value H_l;
When the value of H_l exceeds a preset threshold, frame l is judged to be a speech frame; otherwise it is judged to be a non-speech frame. The first N frames of the speech are assumed to be pure noise and are used to estimate the noise parameters and initialize the threshold. The threshold T is defined as follows:
(15)
T = avg + c (16)
where … denotes the median, and avg is the noise estimate from the first N frames of the input signal. Based on experimental results, c is chosen as c = 0.01 (max(H_l) - min(H_l)) or as a constant of about 0.004.
CN201410152461.0A 2014-04-16 2014-04-16 Noised voice end point robustness detection method Pending CN105023572A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410152461.0A CN105023572A (en) 2014-04-16 2014-04-16 Noised voice end point robustness detection method

Publications (1)

Publication Number Publication Date
CN105023572A (en) 2015-11-04

Family

ID=54413491

Country Status (1)

Country Link
CN (1) CN105023572A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1763844A (en) * 2004-10-18 2006-04-26 中国科学院声学研究所 End-point detecting method, device and speech recognition system based on moving window
CN101872616A (en) * 2009-04-22 2010-10-27 索尼株式会社 Endpoint detection method and system using same
CN103594094A (en) * 2012-08-15 2014-02-19 王景芳 Self-adaptive spectral subtraction real-time speech enhancement
CN103594093A (en) * 2012-08-15 2014-02-19 王景芳 Method for enhancing voice based on signal to noise ratio soft masking
CN103595414A (en) * 2012-08-15 2014-02-19 王景芳 Sparse sampling and signal compressive sensing reconstruction method
CN203288240U (en) * 2013-03-04 2013-11-13 安徽理工大学 Speech endpoint detection system based on DSP

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王景芳: ""实时语音端点鲁棒检测"", 《计算机工程与应用》 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106816157A (en) * 2015-11-30 2017-06-09 展讯通信(上海)有限公司 Audio recognition method and device
CN105390142B (en) * 2015-12-17 2019-04-05 广州大学 A kind of digital deaf-aid voice noise removing method
CN105390142A (en) * 2015-12-17 2016-03-09 广州大学 Digital hearing aid voice noise elimination method
CN105575406A (en) * 2016-01-07 2016-05-11 深圳市音加密科技有限公司 Noise robustness detection method based on likelihood ratio test
CN107665711A (en) * 2016-07-28 2018-02-06 展讯通信(上海)有限公司 Voice activity detection method and device
CN106531180A (en) * 2016-12-10 2017-03-22 广州酷狗计算机科技有限公司 Noise detection method and device
CN106782608A (en) * 2016-12-10 2017-05-31 广州酷狗计算机科技有限公司 noise detecting method and device
CN106782608B (en) * 2016-12-10 2019-11-05 广州酷狗计算机科技有限公司 Noise detecting method and device
CN106531180B (en) * 2016-12-10 2019-09-20 广州酷狗计算机科技有限公司 Noise detecting method and device
CN107331386A (en) * 2017-06-26 2017-11-07 上海智臻智能网络科技股份有限公司 End-point detecting method, device, processing system and the computer equipment of audio signal
CN107331386B (en) * 2017-06-26 2020-07-21 上海智臻智能网络科技股份有限公司 Audio signal endpoint detection method and device, processing system and computer equipment
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Voice mood recognition methods and device
CN107910017A (en) * 2017-12-19 2018-04-13 河海大学 A kind of method that threshold value is set in noisy speech end-point detection
CN108198547A (en) * 2018-01-18 2018-06-22 深圳市北科瑞声科技股份有限公司 Sound end detecting method, device, computer equipment and storage medium
CN108962285A (en) * 2018-07-20 2018-12-07 浙江万里学院 A kind of sound end detecting method dividing subband based on human ear masking effect
CN108962285B (en) * 2018-07-20 2023-04-14 浙江万里学院 Voice endpoint detection method for dividing sub-bands based on human ear masking effect
CN109102823A (en) * 2018-09-05 2018-12-28 河海大学 A kind of sound enhancement method based on subband spectrum entropy
CN109102823B (en) * 2018-09-05 2022-12-06 河海大学 Speech enhancement method based on subband spectral entropy
CN112955951A (en) * 2018-11-15 2021-06-11 深圳市欢太科技有限公司 Voice endpoint detection method and device, storage medium and electronic equipment
CN109319351A (en) * 2018-11-28 2019-02-12 广州市煌子辉贸易有限公司 A kind of intelligent garbage bin with sound identifying function
CN109360585A (en) * 2018-12-19 2019-02-19 晶晨半导体(上海)股份有限公司 A kind of voice-activation detecting method
CN110706693A (en) * 2019-10-18 2020-01-17 浙江大华技术股份有限公司 Method and device for determining voice endpoint, storage medium and electronic device
CN110706693B (en) * 2019-10-18 2022-04-19 浙江大华技术股份有限公司 Method and device for determining voice endpoint, storage medium and electronic device
CN110634497B (en) * 2019-10-28 2022-02-18 普联技术有限公司 Noise reduction method and device, terminal equipment and storage medium
CN110634497A (en) * 2019-10-28 2019-12-31 普联技术有限公司 Noise reduction method and device, terminal equipment and storage medium
CN113539300A (en) * 2020-04-10 2021-10-22 宇龙计算机通信科技(深圳)有限公司 Voice detection method and device based on noise suppression, storage medium and terminal
WO2021218591A1 (en) * 2020-04-27 2021-11-04 佛山市顺德区美的洗涤电器制造有限公司 Voice processing method and apparatus, household appliance, and readable storage medium
CN112102818A (en) * 2020-11-19 2020-12-18 成都启英泰伦科技有限公司 Signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation
CN112365897A (en) * 2020-11-26 2021-02-12 北京百瑞互联技术有限公司 Method, device and medium for self-adaptively adjusting interframe transmission code rate of LC3 encoder
CN115376548A (en) * 2022-07-06 2022-11-22 华南理工大学 Audio signal voiced section endpoint detection method and system

Similar Documents

Publication Publication Date Title
CN105023572A (en) Noised voice end point robustness detection method
CN109378013B (en) Voice noise reduction method
CN103646649A (en) High-efficiency voice detecting method
CN110047470A (en) A kind of sound end detecting method
CN103440869A (en) Audio-reverberation inhibiting device and inhibiting method thereof
CN110349598A (en) A kind of end-point detecting method under low signal-to-noise ratio environment
Chenchah et al. Speech emotion recognition in noisy environment
Gerkmann et al. Empirical distributions of DFT-domain speech coefficients based on estimated speech variances
CN105575406A (en) Noise robustness detection method based on likelihood ratio test
Jaiswal et al. Implicit wiener filtering for speech enhancement in non-stationary noise
Shrawankar et al. Noise estimation and noise removal techniques for speech recognition in adverse environment
CN109102823A (en) A kind of sound enhancement method based on subband spectrum entropy
May et al. Generalization of supervised learning for binary mask estimation
CN110689905B (en) Voice activity detection system for video conference system
Thakare Voice activity detector and noise trackers for speech recognition system in noisy environment
Goel et al. Developments in spectral subtraction for speech enhancement
Elshamy et al. Two-stage speech enhancement with manipulation of the cepstral excitation
Tang et al. Speech Recognition in High Noise Environment.
Shao et al. A versatile speech enhancement system based on perceptual wavelet denoising
Kurpukdee et al. Improving voice activity detection by using denoising-based techniques with convolutional lstm
Heese et al. Speech-codebook based soft voice activity detection
Arakawa et al. Model-based wiener filter for noise robust speech recognition
Yoon et al. Speech enhancement based on speech/noise-dominant decision
Li et al. Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition
Park et al. Preprocessing of dysarthric speech in noise based on CV–dependent wiener filtering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2015-11-04