CN109346106A - Cepstral-domain pitch period estimation method based on subband signal-to-noise ratio weighting

Info

Publication number: CN109346106A
Application number: CN201811035434.XA
Authority: CN
Inventors: 吕勇
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2018-09-06
Filing date: 2018-09-06
Publication date: 2019-02-15
Anticipated expiration: 2038-09-06
Also published as: CN109346106B

Abstract

The invention discloses a cepstral-domain pitch period estimation method based on subband signal-to-noise ratio (SNR) weighting. Subband weighting coefficients are calculated from the Mel spectrum of the noisy speech; in the log-spectral domain, all noisy-speech log-spectrum values on each Mel subband are weighted by that subband's coefficient; peak detection is then performed in the cepstral domain to estimate the pitch period of the noisy speech signal. The technical solution suppresses background noise and vocal tract formants at the same time and yields an accurate pitch period estimate, which makes it particularly suitable for pitch estimation in low-SNR environments.

Description

Cepstral-domain pitch period estimation method based on subband signal-to-noise ratio weighting

Technical field

The invention belongs to the field of speech processing technology, and specifically relates to a pitch period estimation method that applies subband SNR weighting to a noisy speech signal in the log-spectral domain and performs peak detection in the cepstral domain.

Background art

When a person produces a voiced sound, airflow through the glottis makes the vocal cords vibrate, generating a quasi-periodic pulsed airflow that excites the vocal tract to produce sound. The frequency of this vocal cord vibration is called the fundamental frequency, and its reciprocal is called the pitch period. The pitch period is one of the important parameters of a speech signal: it describes an important characteristic of the excitation source and is widely used in speaker recognition, speech synthesis, speech coding and many other fields.

Because the glottal excitation of speech is only quasi-periodic, and the vocal tract formants affect the harmonic structure of the excitation signal, accurately extracting the pitch period from a speech signal is relatively difficult. Common pitch period estimation methods include the autocorrelation method, the average magnitude difference function (AMDF) method, the parallel processing method and the cepstrum method. These methods work well on clean speech or on noisy speech with a high SNR. During transmission, however, speech is inevitably corrupted by background noise, which can make the pitch period extracted in a noisy environment differ greatly from the true value.

Summary of the invention

Purpose of the invention: in view of the problems in the prior art, the present invention provides a cepstral-domain pitch period estimation method based on subband SNR weighting. For pitch period estimation in noisy environments, the method takes into account the influence of both background noise and vocal tract formants on the excitation signal, thereby increasing the robustness of the pitch estimation algorithm.

Technical solution: a cepstral-domain pitch period estimation method based on subband SNR weighting, in which subband weighting coefficients are calculated from the Mel spectrum of the noisy speech, subband weighting is applied to the feature parameters of the noisy speech in the log-spectral domain, and peak detection is performed in the cepstral domain to estimate the pitch period of the noisy speech signal.

The specific steps of the present invention are as follows:

(1) Interpolate or decimate the input digital speech so that its sampling frequency is fixed at 8000 Hz;

(2) Low-pass filter the standard digital speech obtained after interpolation or decimation, retaining only the frequency components below 1000 Hz, then apply windowing and framing to obtain frame signals;

(3) Apply the fast Fourier transform (FFT) to each frame of the speech signal to obtain the magnitude spectrum of each frame;

(4) Apply Mel filtering to the magnitude spectrum of each frame, take the logarithm, and calculate the subband weighting coefficients from the SNR of each Mel subband;

(5) Take the logarithm of the magnitude spectrum of each frame to obtain the log spectrum, and apply subband weighting to the log spectrum to reduce the influence of additive noise on pitch period estimation;

(6) Apply the discrete cosine transform (DCT) to the subband-weighted log spectrum to obtain the cepstral parameters of the speech signal;

(7) Apply peak detection and smoothing filtering to the cepstral parameters of the speech signal to obtain the pitch period estimate of the input speech.

By adopting the above technical solution, the present invention has the following beneficial effects:

The technical solution of the present invention suppresses background noise and vocal tract formants at the same time and yields an accurate pitch period estimate, which makes it particularly suitable for pitch estimation in low-SNR environments.

Brief description of the drawings

Fig. 1 is the overall block diagram of the cepstral-domain pitch period estimation method based on subband SNR weighting according to an embodiment of the present invention.

Detailed description of the embodiments

The present invention is further explained below with reference to specific embodiments. It should be understood that these embodiments only illustrate the present invention and do not limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims of this application.

As shown in Fig. 1, the cepstral-domain pitch period estimation method based on subband SNR weighting mainly comprises the following parts: interpolation and decimation, preprocessing, FFT, Mel filtering, subband weighting, DCT and pitch estimation.

1. Interpolation and decimation

To simplify back-end processing, the sampling frequency of the input speech is fixed at 8000 Hz. If the original sampling frequency of the input speech is higher than 8000 Hz, the speech is decimated to 8000 Hz; if it is lower than 8000 Hz, the speech is interpolated to 8000 Hz.
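The rate conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `resample_ratio` and `resample_linear` are hypothetical, and linear interpolation is an assumed stand-in for whatever interpolation or decimation filter the embodiment actually uses. When decimating, a practical system would normally apply an anti-aliasing filter first.

```python
from fractions import Fraction

TARGET_FS = 8000  # fixed sampling frequency used by the method (Hz)


def resample_ratio(fs_in, fs_out=TARGET_FS):
    """Return (up, down) such that fs_in * up / down == fs_out."""
    ratio = Fraction(fs_out, fs_in)
    return ratio.numerator, ratio.denominator


def resample_linear(samples, fs_in, fs_out=TARGET_FS):
    """Resample by linear interpolation (illustrative assumption only).

    No anti-aliasing filter is applied here, so this is only a sketch of
    the rate change itself.
    """
    if fs_in == fs_out or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * fs_out / fs_in)
    step = fs_in / fs_out  # input samples advanced per output sample
    out = []
    for m in range(n_out):
        pos = m * step
        k = int(pos)
        frac = pos - k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[k] + frac * nxt)
    return out
```

For example, a 16000 Hz input gives the ratio (1, 2), i.e. decimation by 2, while a 4000 Hz input gives (2, 1), i.e. interpolation by 2.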

2. Preprocessing

The energy of speech is concentrated mainly in the low-frequency region; the energy in the high-frequency region is smaller and is easily affected by noise. Preprocessing therefore first low-pass filters the standard digital speech obtained after interpolation or decimation, retaining only the frequency components below 1000 Hz. The filtered digital speech is then windowed and framed to obtain frame signals. The window length is 256 samples and the frame shift is 128 samples.
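The preprocessing stage can be sketched as below. The text fixes only the 1000 Hz cutoff, the window length of 256 and the frame shift of 128; the windowed-sinc FIR design, the 101-tap filter length and the Hamming window are assumptions made for illustration, and all function names are hypothetical.

```python
import math

FS = 8000          # sampling frequency after rate conversion (Hz)
CUTOFF = 1000      # low-pass cutoff from the description (Hz)
FRAME_LEN = 256    # window length in samples
FRAME_SHIFT = 128  # frame shift in samples


def hamming(n_points):
    """Hamming window (assumed; the window type is not specified)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def lowpass_fir(num_taps=101, cutoff=CUTOFF, fs=FS):
    """Windowed-sinc low-pass FIR filter, normalised to unit DC gain."""
    fc = cutoff / fs  # normalised cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    win = hamming(num_taps)
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * t) / (math.pi * t)
        taps.append(ideal * win[n])
    gain = sum(taps)
    return [h / gain for h in taps]


def fir_filter(signal, taps):
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def split_frames(signal, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Cut the signal into overlapping, windowed frames."""
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frames.append([signal[start + n] * win[n] for n in range(frame_len)])
    return frames
```

With a frame length of 256 and a shift of 128, consecutive frames overlap by 50%, so a 1024-sample signal yields 7 frames.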

3. FFT

The fast Fourier transform (FFT) is applied to each frame of the speech signal, and the modulus of the resulting complex spectrum is taken to obtain the magnitude spectrum of each frame.

4. Mel filtering

First, Mel filtering is applied to the magnitude spectrum of each frame to obtain the Mel spectrum. Then the logarithm of the Mel spectrum is taken to obtain the log spectrum of the speech signal. Finally, the weighting coefficient of each Mel subband is calculated according to the following formula:

where SNR(i) is the signal-to-noise ratio of the i-th Mel subband; SNR_max and SNR_min are the maximum and minimum subband SNR of this segment of speech, respectively; and α(i) is the weighting coefficient of the i-th Mel subband. The subband SNR(i) is estimated from the energy of the noisy speech in that Mel subband and the noise energy, where the noise energy is estimated during silent segments.
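The following sketch shows one way the quantities named above could be computed. It rests on several assumptions that are not taken from the source: the weighting formula itself is not reproduced in this text, so `minmax_weights` uses a plain min-max normalisation of SNR(i) between SNR_min and SNR_max as a stand-in; the number of subbands, the triangular filter shape and the SNR definition in `subband_snr_db` are likewise illustrative, and all function names are hypothetical. Only the Mel scale conversion is the standard formula.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_bands=8, n_fft=256, fs=8000, f_max=1000.0):
    """Triangular Mel filters over 0..f_max (band count is an assumption)."""
    top = hz_to_mel(f_max)
    hz_edges = [mel_to_hz(top * i / (num_bands + 1))
                for i in range(num_bands + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(1, num_bands + 1):
        lo, mid, hi = hz_edges[i - 1], hz_edges[i], hz_edges[i + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_energies(magnitude, bank):
    """Energy of one frame in each Mel subband."""
    return [sum(w * a * a for w, a in zip(row, magnitude)) for row in bank]


def subband_snr_db(noisy_energy, noise_energy, floor=1e-12):
    """Per-band SNR in dB; speech energy is taken as noisy minus noise.

    noise_energy would be averaged over silent frames (assumption).
    """
    snr = []
    for y, n in zip(noisy_energy, noise_energy):
        speech = max(y - n, floor)
        snr.append(10.0 * math.log10(speech / max(n, floor)))
    return snr


def minmax_weights(snr):
    """ASSUMED mapping, not the patent's formula: scale SNR to [0, 1]."""
    lo, hi = min(snr), max(snr)
    if hi == lo:
        return [1.0] * len(snr)
    return [(s - lo) / (hi - lo) for s in snr]
```

Under this assumed mapping, the subband with the highest SNR receives a weight of 1 and the subband with the lowest SNR a weight of 0, so noisier subbands contribute less to the later cepstral analysis.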

5. Subband weighting

First, the logarithm of the magnitude spectrum of each frame is taken to obtain the log spectrum. Then the estimated weighting coefficient α(i) is used to weight all noisy-speech log-spectrum values x(k) on the i-th Mel subband:

The result is the subband-weighted log spectrum, which serves as the estimate of the clean-speech log spectrum.
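A minimal sketch of this weighting step follows. The weighting formula is not reproduced in this text, so the sketch assumes the simplest reading, in which every log-spectrum value x(k) in subband i is multiplied by α(i). The `band_edges` list that assigns FFT bins to subbands, the handling of bins outside every subband, and the function names are all hypothetical.

```python
import math


def log_spectrum(magnitude, floor=1e-10):
    """Natural log of the magnitude spectrum, floored to avoid log(0)."""
    return [math.log(max(a, floor)) for a in magnitude]


def weight_log_spectrum(log_spec, band_edges, alpha):
    """Scale bins of subband i by alpha[i] (assumed multiplicative form).

    Subband i covers bins band_edges[i] <= k < band_edges[i + 1].
    Bins outside every subband are left unchanged in this sketch.
    """
    weighted = list(log_spec)
    for i, a in enumerate(alpha):
        for k in range(band_edges[i], band_edges[i + 1]):
            weighted[k] = a * log_spec[k]
    return weighted
```

For example, `weight_log_spectrum([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 4], [0.5, 0.0])` halves the first two bins, zeroes the next two and leaves the last bin untouched.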

6. DCT

The discrete cosine transform (DCT) is applied to the subband-weighted log spectrum to obtain the cepstral parameters of the speech signal.
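The DCT step can be illustrated with a direct implementation. The text does not state which DCT type or normalisation is used, so the unnormalised DCT-II below is an assumption, and the function name `dct2` is hypothetical.

```python
import math


def dct2(x):
    """Unnormalised DCT-II: C(m) = sum_k x(k) * cos(pi * (k + 0.5) * m / N)."""
    n = len(x)
    return [
        sum(x[k] * math.cos(math.pi * (k + 0.5) * m / n) for k in range(n))
        for m in range(n)
    ]
```

A log spectrum that ripples periodically along the frequency axis, as the harmonics of voiced speech do, concentrates into a single coefficient: for an input equal to the m-th cosine basis function, only output index m is significantly nonzero. This is the property that the later peak detection relies on.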

7. Pitch estimation

Clean speech s(n) can be regarded as generated by filtering the glottal excitation signal e(n) with the vocal tract response v(n), i.e.,

s(n) = e(n) * v(n)    (3)

where the symbol "*" denotes convolution.

After the clean speech s(n) has passed through the FFT, the logarithm, subband weighting and the DCT, the excitation signal and the vocal tract response are separated in the cepstral domain:

where the two terms are the cepstral parameters of the excitation signal and of the vocal tract response, respectively.

In the cepstral domain, the excitation cepstrum is impulse-like: it is nonzero only at integer multiples of the discrete pitch period N_p and equals 0 at all other values of the index n. Because the human pitch period T_p varies between roughly 2 ms and 20 ms, and the sampling frequency of this system is 8000 Hz, N_p varies between roughly 16 and 160. The vocal tract cepstrum, by contrast, usually decays rapidly; its values outside the interval [-16, 16] are very small and can be assumed to be 0. Pitch estimation therefore only needs to detect the peak of the cepstral parameters in the interval [16, 160] to obtain the pitch period estimate:

In this formula, one symbol denotes the value of n at which the cepstral parameters attain their first peak, and the other denotes the estimated pitch period.
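The search range and the peak detection can be sketched as follows. The bounds follow directly from the text: 2 ms × 8000 Hz = 16 samples and 20 ms × 8000 Hz = 160 samples. The rest is assumed for illustration: the sketch takes the largest value in the range as the peak, converts the peak index to seconds by dividing by the sampling frequency, and uses a 3-point median filter for the smoothing step, whose type is not specified. All function names are hypothetical.

```python
FS = 8000  # sampling frequency of the system (Hz)


def lag_range(t_min=0.002, t_max=0.020, fs=FS):
    """Pitch period bounds in samples: 2 ms..20 ms gives 16..160 at 8 kHz."""
    return round(t_min * fs), round(t_max * fs)


def estimate_pitch(cepstrum, fs=FS):
    """Return (peak index, pitch period in seconds) for one frame.

    Indices below the lower bound are skipped because the vocal tract
    cepstrum dominates there.
    """
    lo, hi = lag_range(fs=fs)
    hi = min(hi, len(cepstrum) - 1)
    n_hat = max(range(lo, hi + 1), key=lambda n: cepstrum[n])
    return n_hat, n_hat / fs


def median_smooth(track, width=3):
    """Median filter over the per-frame estimates (assumed smoother)."""
    half = width // 2
    out = []
    for i in range(len(track)):
        window = sorted(track[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out
```

A peak at index 40 corresponds to a pitch period of 5 ms, i.e. a fundamental frequency of 200 Hz. The median filter removes isolated outliers such as a single frame in which the detected period doubles.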

Claims

1. A cepstral-domain pitch period estimation method based on subband signal-to-noise ratio (SNR) weighting, characterized in that: subband weighting coefficients are calculated from the Mel spectrum of the noisy speech, subband weighting is applied to the feature parameters of the noisy speech in the log-spectral domain, and peak detection is performed in the cepstral domain to estimate the pitch period of the noisy speech signal.

2. The cepstral-domain pitch period estimation method based on subband SNR weighting according to claim 1, characterized in that subband weighting coefficients are calculated from the Mel spectrum of the noisy speech, subband weighting is applied to the feature parameters of the noisy speech in the log-spectral domain, and peak detection is performed in the cepstral domain to estimate the pitch period of the noisy speech signal; the method specifically comprises:

(1) interpolating or decimating the input digital speech so that its sampling frequency is fixed at 8000 Hz;

(2) low-pass filtering the standard digital speech obtained after interpolation or decimation, retaining only the frequency components below 1000 Hz, and applying windowing and framing to obtain frame signals;

(3) applying the fast Fourier transform to each frame of the speech signal to obtain the magnitude spectrum of each frame;

(4) applying Mel filtering to the magnitude spectrum of each frame, taking the logarithm, and calculating the subband weighting coefficients from the SNR of each Mel subband;

(5) taking the logarithm of the magnitude spectrum of each frame to obtain the log spectrum, and applying subband weighting to reduce the influence of additive noise on pitch period estimation;

(6) applying the discrete cosine transform to the subband-weighted log spectrum to obtain the cepstral parameters of the speech signal;

(7) applying peak detection and smoothing filtering to the cepstral parameters of the speech signal to obtain the pitch period estimate of the input speech.

3. The cepstral-domain pitch period estimation method based on subband SNR weighting according to claim 2, characterized in that the weighting coefficient α(i) of each Mel subband is calculated by the following formula:

where SNR(i) is the signal-to-noise ratio of the i-th Mel subband; SNR_max and SNR_min are the maximum and minimum subband SNR of this segment of speech, respectively; α(i) is the weighting coefficient of the i-th Mel subband; and the subband SNR(i) is estimated from the energy of the noisy speech in that Mel subband and the noise energy, where the noise energy is estimated during silent segments.

4. The cepstral-domain pitch period estimation method based on subband SNR weighting according to claim 2, characterized in that: the sampling frequency of the input digital speech is fixed at 8000 Hz; if the original sampling frequency of the input speech is higher than 8000 Hz, the speech is decimated to 8000 Hz; if the original sampling frequency of the input speech is lower than 8000 Hz, the speech is interpolated to 8000 Hz; the standard digital speech obtained after interpolation or decimation is low-pass filtered, retaining only the frequency components below 1000 Hz; the filtered digital speech is then windowed and framed to obtain frame signals; the window length is 256 and the frame shift is 128.

5. The cepstral-domain pitch period estimation method based on subband SNR weighting according to claim 2, characterized in that: the logarithm of the magnitude spectrum of each frame is first taken to obtain the log spectrum; the estimated weighting coefficient α(i) is then used to weight all noisy-speech log-spectrum values x(k) on the i-th Mel subband: