CN102737645A

CN102737645A - Algorithm for estimating pitch period of voice signal

Info

Publication number: CN102737645A
Application number: CN2012101969831A
Authority: CN
Inventors: 管晏; 付斌
Original assignee: Wuhan Tianyu Information Industry Co Ltd
Current assignee: Wuhan Tianyu Information Industry Co Ltd
Priority date: 2012-06-15
Filing date: 2012-06-15
Publication date: 2012-10-17

Abstract

The invention discloses an algorithm for estimating the pitch period of a speech signal and relates to the field of speech signal processing. The algorithm comprises the following steps: (1) denoising a noisy speech signal with an adaptive filter; (2) computing the autocorrelation function and the circular average magnitude difference function of the denoised speech signal; and (3) obtaining a weighted squared feature K(k) = [αR(k)]² / [(D(k) + β)/γ]², where α, β and γ are constants each greater than 1, R(k) is the autocorrelation function and D(k) is the circular average magnitude difference function. The algorithm detects the pitch period effectively at low signal-to-noise ratios, reduces extraction errors including frequency-doubling and frequency-halving (octave) errors, improves pitch estimation accuracy when the amplitude or frequency of the speech signal varies, and is highly robust.

Description

A pitch period estimation algorithm for speech signals

Technical field

The present invention relates to the field of speech signal processing, and in particular to a pitch period estimation algorithm for speech signals.

Background art

Speech signal analysis is the prerequisite and foundation of speech signal processing. Only once parameters that characterize the essential features of a speech signal have been extracted can they be used for efficient processing such as speech synthesis, speech recognition and speech compression coding, and the pitch period is one of the most important of these parameters. The pitch period is the period of vocal-cord vibration during voiced speech, and its estimation is called pitch detection. The goal of pitch detection is to extract a pitch-period trajectory that matches the vibration frequency of the vocal cords as closely as possible, so its accuracy is critical.

Accurate pitch detection is difficult for several reasons. A speech signal can be regarded as a dynamic, non-stationary random process; the speech waveform and vocal-cord vibration span a wide and complex frequency range; the vocal tract varies over time and its characteristics differ from speaker to speaker; and the pitch range is very wide. Even the same speaker produces different pitch periods in different emotional states, and the pitch period is further affected by lexical tone. In particular, the beginning and end of an utterance lack the periodicity of vocal-cord vibration, and for transition frames between voiced and unvoiced speech it is very hard to decide whether a frame is periodic or aperiodic. Even when the signal is quasi-periodic, formant structure and noise distort its peaks and zero-crossing rate, so the exact start and end of a pitch period are sometimes hard to locate. Finally, the pitch ranges from about 50 Hz for a low-pitched male voice to about 500 Hz for a high-pitched female or child voice, a tenfold range of roughly 3.3 octaves. All of these factors make pitch detection difficult.
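The 50 Hz to 500 Hz pitch range quoted above translates directly into the range of lags that a period search must cover. The sketch below is illustrative only: the sampling rate (8 kHz in the example) is an assumption, since the description does not specify one, and the helper names `pitch_lag_range` and `octave_span` are hypothetical.

```python
import math


def pitch_lag_range(fs_hz, f0_min_hz=50.0, f0_max_hz=500.0):
    """Lag search range (in samples) for pitches between f0_min_hz and f0_max_hz.

    A pitch of f0 Hz has a period of fs/f0 samples, so the highest pitch
    gives the shortest lag and the lowest pitch gives the longest lag.
    """
    min_lag = math.floor(fs_hz / f0_max_hz)  # shortest period (highest pitch)
    max_lag = math.ceil(fs_hz / f0_min_hz)   # longest period (lowest pitch)
    return min_lag, max_lag


def octave_span(f_low_hz, f_high_hz):
    """Number of octaves between two frequencies (one octave = a factor of 2)."""
    return math.log2(f_high_hz / f_low_hz)


if __name__ == "__main__":
    # Assumed sampling rate of 8 kHz (telephone-band speech).
    print(pitch_lag_range(8000))           # (16, 160)
    print(round(octave_span(50, 500), 2))  # 3.32
```

At the assumed 8 kHz rate, a search therefore needs to cover lags of roughly 16 to 160 samples; a higher sampling rate scales both bounds proportionally.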

Among existing pitch detection methods, the most classical are the ACF (AutoCorrelation Function) method and the AMDF (Average Magnitude Difference Function) method. The ACF method computes the autocorrelation function of the speech signal and estimates the pitch from the large peaks that the ACF curve exhibits at integer multiples of the pitch period; however, as the signal-to-noise ratio (SNR) drops, it often produces frequency-doubling or frequency-halving errors. The AMDF method computes the average magnitude difference function of the speech signal and estimates the pitch from the valleys that the AMDF curve exhibits at integer multiples of the pitch period; because the method is sensitive to changes in the amplitude or frequency of the speech signal, its detection accuracy drops markedly when these change.

Summary of the invention

To address these shortcomings of the prior art, the object of the present invention is to provide a pitch period estimation algorithm for speech signals that detects the pitch period effectively at low SNR, reduces extraction errors, reduces frequency-doubling and frequency-halving errors, improves pitch estimation accuracy when the amplitude or frequency of the speech signal varies, and offers better robustness.

To achieve the above object, the present invention adopts the following technical solution: a pitch period estimation method for a speech signal, comprising the steps of: S1. denoising the noisy speech signal with an adaptive filter; S2. computing the autocorrelation function and the circular average magnitude difference function of the denoised speech signal; and S3. obtaining the weighted squared feature by the formula

K(k) = \frac{[\alpha R(k)]^{2}}{\left[\frac{D(k) + \beta}{\gamma}\right]^{2}}

where α, β and γ are constants greater than 1, R(k) is the autocorrelation function, and D(k) is the circular average magnitude difference function.

On the basis of the above technical solution, the adaptive filter is a least-mean-square (LMS) adaptive filter.

On the basis of the above technical solution, in S2 the circular average magnitude difference function is

D(k) = \sum_{n=0}^{N-1} \left| S_{\omega}(\operatorname{mod}(n+k, N)) - S_{\omega}(n) \right|, \quad k = 0, 1, \ldots, N-1

where N is the length of the speech analysis frame, S_ω(n) is the denoised windowed speech, and mod(n+k, N) denotes the remainder of n + k modulo N.

On the basis of the above technical solution, when the circular average magnitude difference function is computed, every sample in the current windowed speech frame is used, none is discarded, and the number of difference terms summed is the same for every lag.

On the basis of the above technical solution, the autocorrelation function of S_ω(n) is

R(k) = \sum_{n=0}^{N-k-1} S_{\omega}(n)\, S_{\omega}(n+k)

where N is the length of the speech analysis frame and k is the lag.

On the basis of the above technical solution, the autocorrelation function R(k) exhibits peaks at integer multiples of the pitch period, and the pitch is estimated from the first peak other than R(0).

On the basis of the above technical solution, the autocorrelation function exhibits a peak at the pitch period, while the circular average magnitude difference function exhibits a valley there.

On the basis of the above technical solution, in S3

K(k) = \frac{[\alpha R(k)]^{2}}{\left[\frac{D(k) + \beta}{\gamma}\right]^{2}} = \frac{\left[\alpha \sum_{n=0}^{N-k-1} S_{\omega}(n)\, S_{\omega}(n+k)\right]^{2}}{\left[\frac{\sum_{n=0}^{N-1} \left| S_{\omega}(\operatorname{mod}(n+k, N)) - S_{\omega}(n) \right| + \beta}{\gamma}\right]^{2}},

R(k) and K(k) have the same period. Weighting and squaring R(k) and D(k) sharpens their waveforms, and dividing the former by the latter yields the weighted squared feature.

The beneficial effects of the present invention are as follows: the pitch period estimation algorithm effectively suppresses the influence of formants, detects the pitch period effectively at low SNR, locates the pitch period more accurately, reduces extraction errors and improves pitch estimation accuracy, all at relatively low algorithmic complexity.

Description of drawings

Fig. 1 is a flowchart of the pitch period estimation algorithm for speech signals of the present invention;

Fig. 2 is a schematic diagram of the LMS adaptive filter in an embodiment of the present invention.

Embodiment

The present invention is described in further detail below with reference to the drawings and embodiments.

As shown in Fig. 1, the pitch period estimation algorithm for speech signals of the present invention comprises the following steps:

S1. Denoise the noisy speech signal with an adaptive filter.

In this embodiment, the noisy speech signal is enhanced by an LMS (Least Mean Square) adaptive filter in order to recover the clean original speech signal as far as possible. An LMS adaptive filter is an adaptive system with feedback: it automatically adjusts the current filter parameters based on those obtained at the previous instant so as to minimize the mean square error between the filter output and the desired signal, thereby achieving optimal filtering. Other adaptive filters may of course be used in other embodiments.

Fig. 2 is a schematic diagram of the LMS adaptive filter used in this embodiment, where X(n) is the input signal vector at time n, D(n) is the desired signal, W(n) is the weight vector of the adaptive filter, Y(n) is the output signal and E(n) is the error signal. The input signal vector at time n is X(n) = [x(n), x(n-1), ..., x(n-M+1)]^T, where M is the order of the adaptive filter.

The output signal is Y(n) = X(n)^T W(n), and the error signal is E(n) = D(n) - Y(n). The weight vector of the adaptive filter is updated iteratively by:

W(n+1) = W(n) + \mu E(n) X(n)

Formula 1

In Formula 1, μ is the convergence factor of the adaptive filter: the weight vector at the next iteration is obtained by adding to the current weight vector the input vector scaled by μ and by the error signal. The convergence factor μ is the step size, and the filtering result is very sensitive to it, because it governs the convergence speed of the algorithm. When μ is small, the algorithm converges slowly but the steady-state misadjustment is small; when μ is large, the algorithm converges quickly but the steady-state misadjustment is large. The step size μ therefore has a decisive influence on performance. The filter order M also directly affects the performance of the adaptive filter. When the mean square error E[E²(n)] reaches its minimum, the filter has automatically adjusted itself to the optimal weight vector W(n) for the current environment, so that Y(n) best approximates D(n).
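Formula 1 can be sketched as a sample-by-sample loop. This is a minimal sketch assuming real-valued signals and zero initial weights, using the notation of Fig. 2 (X(n), D(n), W(n), Y(n), E(n), order M, step size μ); the function name `lms_filter` is hypothetical, and the description does not say how the desired signal D(n) is formed for speech denoising.

```python
import numpy as np


def lms_filter(x, d, order, mu):
    """Real-valued LMS adaptive FIR filter following Formula 1.

    x     : input samples x(n)
    d     : desired signal D(n), same length as x
    order : filter order M
    mu    : convergence factor (step size)
    Returns the output Y(n), the error E(n) and the final weight vector W.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(order)          # W(0) = 0 (assumed initialization)
    x_vec = np.zeros(order)      # X(n) = [x(n), x(n-1), ..., x(n-M+1)]^T
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_vec = np.roll(x_vec, 1)   # shift older samples down by one position
        x_vec[0] = x[n]             # newest sample goes first
        y[n] = x_vec @ w            # Y(n) = X(n)^T W(n)
        e[n] = d[n] - y[n]          # E(n) = D(n) - Y(n)
        w = w + mu * e[n] * x_vec   # W(n+1) = W(n) + mu E(n) X(n)
    return y, e, w


if __name__ == "__main__":
    # System identification: D(n) is x(n) passed through a known 3-tap FIR filter,
    # so the LMS weights should converge towards those taps.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(5000)
    h = np.array([0.5, -0.3, 0.1])
    d = np.convolve(x, h)[: len(x)]
    _, e, w = lms_filter(x, d, order=3, mu=0.01)
    print(np.round(w, 3))
```

The system-identification check only confirms that the update rule converges. For enhancing a periodic signal in noise, one common arrangement (offered here as an assumption, not as the configuration of this embodiment) is the adaptive line enhancer, in which D(n) is the noisy speech and x(n) is the same signal delayed by a few samples, so that Y(n) tracks the predictable, periodic part.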

S2. Compute the ACF and the CAMDF (Circular Average Magnitude Difference Function) of the denoised speech signal.

The CAMDF is defined as:

D(k) = \sum_{n=0}^{N-1} \left| S_{\omega}(\operatorname{mod}(n+k, N)) - S_{\omega}(n) \right|, \quad k = 0, 1, \ldots, N-1

Formula 2

where S_ω(n) is the denoised windowed speech, N is the length of the speech analysis frame, and mod(n+k, N) denotes the remainder of n + k modulo N. D(0) = 0, and over its domain D(k) is symmetric about k = N/2, i.e. D(k) = D(N - k). In addition, for a strictly periodic signal with minimum period T, the CAMDF has the following properties:

D(aT) \le D(aT + b), \quad 0 \le aT + b \le N/2, \; 0 < b < T, \; a = 0, 1, 2, \ldots

k = aT \text{ is a local minimum of } D(k), \quad 0 \le aT \le N/2, \; a = 0, 1, 2, \ldots

D(aT) \le D(aT + T), \quad 0 \le aT < aT + T \le N/2, \; a = 0, 1, 2, \ldots
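Formula 2 and the two general properties stated above, D(0) = 0 and D(k) = D(N - k), can be checked numerically. This is a sketch assuming NumPy and a frame that has already been windowed; `camdf` is a hypothetical helper name.

```python
import numpy as np


def camdf(s_w):
    """Circular AMDF of one windowed frame (Formula 2).

    D(k) = sum_{n=0}^{N-1} |s_w((n + k) mod N) - s_w(n)|,  k = 0, 1, ..., N-1
    """
    s_w = np.asarray(s_w, dtype=float)
    N = len(s_w)
    # np.roll(s_w, -k)[n] equals s_w[(n + k) % N]: a circular shift by k samples,
    # so every lag sums exactly N difference terms.
    return np.array([np.abs(np.roll(s_w, -k) - s_w).sum() for k in range(N)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(64) * np.hamming(64)
    D = camdf(frame)
    print(D[0])                             # 0.0
    print(np.allclose(D[1:], D[1:][::-1]))  # True: D(k) == D(N - k)
```

The symmetry holds because D(N - k) sums the same N absolute differences as D(k), only in a different order.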

When the magnitude difference function D(k) is computed, every sample in the current windowed speech frame is used, none is discarded, and the number of difference terms summed is the same for every lag. The symmetry of the function, together with the property that successive CAMDF valleys increase (D(aT) ≤ D(aT + T)), also helps overcome pitch-period doubling errors and makes pitch detection considerably easier: the overall trend of the function is flat, so valleys are easier to detect; the estimated pitch period can be located in a single pass, which simplifies the detection procedure; and because every D(k) is computed from the same samples, the magnitude difference function better reflects the differences between lags k.
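Because of the symmetry D(k) = D(N - k), a valley search over the CAMDF only needs lags up to N/2. The sketch below picks the deepest valley within an assumed pitch-lag range. The test signal (a 200 Hz tone with slowly decaying amplitude at an assumed 8 kHz sampling rate, left unwindowed for simplicity) and the names `camdf` and `camdf_pitch_period` are illustrative, not taken from the description.

```python
import numpy as np


def camdf(s_w):
    """Circular AMDF (Formula 2) for lags k = 0 .. N-1."""
    s_w = np.asarray(s_w, dtype=float)
    return np.array([np.abs(np.roll(s_w, -k) - s_w).sum() for k in range(len(s_w))])


def camdf_pitch_period(s_w, min_lag, max_lag):
    """Lag of the deepest CAMDF valley in [min_lag, min(max_lag, N/2)]."""
    D = camdf(s_w)
    hi = min(max_lag, len(s_w) // 2)  # D(k) = D(N - k): lags above N/2 add nothing new
    return min_lag + int(np.argmin(D[min_lag:hi + 1]))


if __name__ == "__main__":
    fs, f0, N = 8000, 200.0, 400  # assumed sampling rate, tone frequency, frame length
    n = np.arange(N)
    frame = np.exp(-n / 300.0) * np.sin(2 * np.pi * f0 * n / fs)
    print(camdf_pitch_period(frame, 16, 160))  # 40 samples, i.e. 8000 / 200
```

With the decaying amplitude, the valleys at multiples of the period deepen towards the shortest lag (D(40) < D(80) < ...), so the global minimum in the search range falls on the true period rather than on a multiple of it.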

The ACF expresses the degree of correlation between the values of a random signal at any two different instants, and the autocorrelation function of a periodic signal has the same period as the signal. The ACF R(k) of the denoised windowed speech S_ω(n) is:

R(k) = \sum_{n=0}^{N-k-1} S_{\omega}(n)\, S_{\omega}(n+k)

Formula 3

Wherein N is the length of speech analysis frame, and k is the delay degree.The autocorrelation function R (k) of voice signal peak feature will occur at fundamental frequency integral multiple place, estimate fundamental tone according to first peak point (except the R (0)) usually.

S3. Obtain the weighted squared ACF/CAMDF feature by the formula

K(k) = \frac{[\alpha R(k)]^{2}}{\left[\frac{D(k) + \beta}{\gamma}\right]^{2}}

where α, β and γ are constants greater than 1, R(k) is the autocorrelation function given by Formula 3, and D(k) is the circular average magnitude difference function given by Formula 2. Substituting both, the formula becomes:

K(k) = \frac{[\alpha R(k)]^{2}}{\left[\frac{D(k) + \beta}{\gamma}\right]^{2}} = \frac{\left[\alpha \sum_{n=0}^{N-k-1} S_{\omega}(n)\, S_{\omega}(n+k)\right]^{2}}{\left[\frac{\sum_{n=0}^{N-1} \left| S_{\omega}(\operatorname{mod}(n+k, N)) - S_{\omega}(n) \right| + \beta}{\gamma}\right]^{2}}

Formula 4

As step S2 shows, the ACF is searched for its highest peak, whereas the CAMDF is searched for its deepest valley: the ACF exhibits a peak at the pitch period and the CAMDF exhibits a valley there. The sharper the first peak of R(k), or the more pronounced the global valley of D(k), the more accurate the pitch period estimate. Formula 4 shows that R(k) and K(k) have the same period. Weighting and squaring R(k) sharpens its peaks, and weighting and squaring D(k) makes its valleys more pronounced, so the weighted squared feature obtained by dividing the two has especially distinct peaks at integer multiples of the pitch period. Because the pitch period is located from these peaks, the weighted squared feature locates the pitch period more accurately.
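Formula 4 combines the two functions. The sketch below evaluates K(k) and takes its largest value inside an assumed pitch-lag range as the pitch period. Lag 0 must be excluded: D(0) = 0 and R(0) is the frame energy, so K(0) is always the global maximum. The description only requires α, β, γ > 1, so the defaults α = β = γ = 2 are placeholders, not values from the source; the helper names are hypothetical, and the test tone is left unwindowed for simplicity.

```python
import numpy as np


def acf(s_w):
    """R(k) = sum_{n=0}^{N-k-1} s_w(n) s_w(n+k), k = 0 .. N-1 (Formula 3)."""
    s_w = np.asarray(s_w, dtype=float)
    return np.correlate(s_w, s_w, mode="full")[len(s_w) - 1:]


def camdf(s_w):
    """D(k) = sum_n |s_w((n+k) mod N) - s_w(n)|, k = 0 .. N-1 (Formula 2)."""
    s_w = np.asarray(s_w, dtype=float)
    return np.array([np.abs(np.roll(s_w, -k) - s_w).sum() for k in range(len(s_w))])


def weighted_squared_feature(s_w, alpha=2.0, beta=2.0, gamma=2.0):
    """K(k) = [alpha R(k)]^2 / [(D(k) + beta) / gamma]^2 (Formula 4).

    alpha, beta, gamma > 1 are placeholder values; beta > 0 also keeps the
    denominator away from zero where D(k) vanishes.
    """
    R = acf(s_w)
    D = camdf(s_w)
    return (alpha * R) ** 2 / ((D + beta) / gamma) ** 2


def estimate_pitch_period(s_w, min_lag, max_lag, **weights):
    """Lag of the largest K(k) within [min_lag, max_lag]; lag 0 is never searched."""
    K = weighted_squared_feature(s_w, **weights)
    hi = min(max_lag, len(K) - 1)
    return min_lag + int(np.argmax(K[min_lag:hi + 1]))


if __name__ == "__main__":
    fs, f0, N = 8000, 200.0, 400  # assumed sampling rate, tone frequency, frame length
    frame = np.sin(2 * np.pi * f0 * np.arange(N) / fs)
    print(estimate_pitch_period(frame, 16, 160))  # 40 samples -> 200 Hz
```

Because R(k) is squared, its sign is discarded; at anti-correlated lags such as half the pitch period, it is typically the large value of D(k) in the denominator that keeps K(k) small.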

The present invention is not limited to the above embodiment. Those skilled in the art may make improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also regarded as falling within the protection scope of the present invention. Content not described in detail in this specification belongs to prior art known to those skilled in the art.

Claims

1. A pitch period estimation method for a speech signal, characterized by comprising the following steps:

S1. denoising the noisy speech signal with an adaptive filter;

S2. computing the autocorrelation function and the circular average magnitude difference function of the denoised speech signal;

S3. obtaining the weighted squared feature by the formula

K(k) = \frac{[\alpha R(k)]^{2}}{\left[\frac{D(k) + \beta}{\gamma}\right]^{2}}

wherein α, β and γ are constants greater than 1, R(k) is the autocorrelation function, and D(k) is the circular average magnitude difference function.

2. The pitch period estimation method for a speech signal according to claim 1, characterized in that the adaptive filter is a least-mean-square adaptive filter.

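3. The pitch period estimation method for a speech signal according to claim 1, characterized in that, in S2, the circular average magnitude difference function is

D(k) = \sum_{n=0}^{N-1} \left| S_{\omega}(\operatorname{mod}(n+k, N)) - S_{\omega}(n) \right|, \quad k = 0, 1, \ldots, N-1

wherein N is the length of the speech analysis frame, S_ω(n) is the denoised windowed speech, and mod(n+k, N) denotes the remainder of n + k modulo N.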

4. The pitch period estimation method for a speech signal according to claim 3, characterized in that, when the circular average magnitude difference function is computed, every sample in the current windowed speech frame is used, none is discarded, and the number of difference terms summed is the same for every lag.

5. The pitch period estimation method for a speech signal according to claim 3, characterized in that the autocorrelation function of S_ω(n) is

R(k) = \sum_{n=0}^{N-k-1} S_{\omega}(n)\, S_{\omega}(n+k)

wherein N is the length of the speech analysis frame and k is the lag.

6. The pitch period estimation method for a speech signal according to claim 5, characterized in that the autocorrelation function R(k) exhibits peaks at integer multiples of the pitch period, and the pitch is estimated from the first peak other than R(0).

7. The pitch period estimation method for a speech signal according to claim 5, characterized in that the autocorrelation function exhibits a peak at the pitch period, and the circular average magnitude difference function exhibits a valley at the pitch period.

8. The pitch period estimation method for a speech signal according to claim 7, characterized in that, in S3,

K(k) = \frac{[\alpha R(k)]^{2}}{\left[\frac{D(k) + \beta}{\gamma}\right]^{2}} = \frac{\left[\alpha \sum_{n=0}^{N-k-1} S_{\omega}(n)\, S_{\omega}(n+k)\right]^{2}}{\left[\frac{\sum_{n=0}^{N-1} \left| S_{\omega}(\operatorname{mod}(n+k, N)) - S_{\omega}(n) \right| + \beta}{\gamma}\right]^{2}},