CN109346106A - Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting - Google Patents

Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting

Info

Publication number
CN109346106A
Authority
CN
China
Prior art keywords
subband
signal
pitch period
noise ratio
mel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811035434.XA
Other languages
Chinese (zh)
Other versions
CN109346106B (en)
Inventor
吕勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN201811035434.XA
Publication of CN109346106A
Application granted
Publication of CN109346106B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use

Abstract

The invention discloses a cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting. Sub-band weighting coefficients are calculated from the Mel spectrum of the noisy speech; in the log-spectral domain, all noisy speech log-spectral values within each Mel sub-band are weighted by that sub-band's coefficient; peak detection is then performed in the cepstrum domain to estimate the pitch period of the noisy speech signal. The technical solution suppresses background noise and vocal tract formants at the same time and yields an accurate pitch period estimate. It is particularly suitable for pitch estimation in low signal-to-noise ratio environments.

Description

Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting
Technical field
The invention belongs to the field of speech processing technology, and in particular relates to a pitch period estimation method that applies sub-band signal-to-noise ratio weighting to a noisy speech signal in the log-spectral domain and performs peak detection in the cepstrum domain.
Background art
When a person produces a voiced sound, airflow through the glottis makes the vocal cords vibrate, producing a quasi-periodic pulsed airflow that excites the vocal tract to produce sound. The frequency of this vocal cord vibration is called the fundamental frequency, and its reciprocal is called the pitch period. The pitch period is one of the important parameters of a speech signal: it describes an important feature of the excitation source and is widely used in speaker recognition, speech synthesis, speech coding, and many other fields.
Because the glottal excitation signal of speech is only quasi-periodic, and the vocal tract formants affect the harmonic structure of the excitation signal, it is difficult to extract the pitch period accurately from a speech signal. Common pitch period estimation methods include the autocorrelation method, the average magnitude difference function method, the parallel processing method, and the cepstrum method. These methods work well on clean speech or on noisy speech with a high signal-to-noise ratio. However, speech is inevitably corrupted by background noise during transmission, which can make the pitch period extracted in a noisy environment differ greatly from its actual value.
Summary of the invention
Object of the invention: in view of the problems in the prior art, the invention provides a cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting. When estimating the pitch period in a noisy environment, the method accounts for the influence of both background noise and vocal tract formants on the excitation signal, which increases the robustness of the pitch estimation algorithm.
Technical solution: a cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting, which calculates sub-band weighting coefficients from the Mel spectrum of the noisy speech, applies sub-band weighting to the feature parameters of the noisy speech in the log-spectral domain, and performs peak detection in the cepstrum domain to estimate the pitch period of the noisy speech signal.
The specific steps of the invention are as follows:
(1) interpolate or decimate the input digital speech so that its sampling frequency is fixed at 8000 Hz;
(2) low-pass filter the rate-converted digital speech, retaining only frequency components below 1000 Hz, then window and frame it to obtain frame signals;
(3) apply the fast Fourier transform (FFT) to each speech frame to obtain the magnitude spectrum of each frame;
(4) apply Mel filtering to the magnitude spectrum of each frame, take the logarithm, and calculate the sub-band weighting coefficients from the signal-to-noise ratio of each Mel sub-band;
(5) take the logarithm of the magnitude spectrum of each frame to obtain the log spectrum, and apply sub-band weighting to the log spectrum to reduce the influence of additive noise on pitch period estimation;
(6) apply the discrete cosine transform (DCT) to the sub-band-weighted log spectrum to obtain the cepstral parameters of the speech signal;
(7) apply peak detection and smoothing filtering to the cepstral parameters of the speech signal to obtain the pitch period estimate of the input speech.
By adopting the above technical solution, the invention has the following beneficial effects:
The technical solution suppresses background noise and vocal tract formants at the same time and yields an accurate pitch period estimate. It is particularly suitable for pitch estimation in low signal-to-noise ratio environments.
Brief description of the drawings
Fig. 1 is the overall block diagram of the cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to an embodiment of the invention.
Detailed description of the embodiments
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading this disclosure, modifications of the invention in various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims of this application.
As shown in Fig. 1, the cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting mainly comprises interpolation and decimation, pre-processing, FFT, Mel filtering, sub-band weighting, DCT, and pitch estimation.
1. Interpolation and decimation
To simplify back-end processing, the sampling frequency of the input speech is fixed at 8000 Hz. If the original sampling frequency of the input speech is higher than 8000 Hz, it is decimated to 8000 Hz; if it is lower than 8000 Hz, it is interpolated to 8000 Hz.
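As a small illustration of the rate conversion described here, the sketch below reduces the ratio 8000/fs to lowest terms to obtain integer interpolation and decimation factors. The function name `resample_factors` is hypothetical; the patent text does not say how the conversion is implemented.

```python
from fractions import Fraction

TARGET_FS = 8000  # fixed sampling frequency stated in the text, in Hz

def resample_factors(fs_in):
    """Return (up, down) such that fs_in * up / down == TARGET_FS."""
    ratio = Fraction(TARGET_FS, fs_in)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator

print(resample_factors(16000))  # (1, 2): decimate by 2
print(resample_factors(44100))  # (80, 441): interpolate by 80, decimate by 441
print(resample_factors(6000))   # (4, 3): interpolate by 4, decimate by 3
```

A polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, up, down)` accepts these two factors and applies the anti-aliasing filter itself.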
2. Pre-processing
Because the energy of speech is concentrated mainly in the low-frequency region, while the high-frequency region carries less energy and is easily affected by noise, pre-processing first low-pass filters the rate-converted digital speech, retaining only frequency components below 1000 Hz. The filtered digital speech is then windowed and framed to obtain frame signals. The window length is 256 samples and the frame shift is 128 samples.
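The framing step can be sketched as follows. At 8000 Hz, a 256-sample window spans 32 ms and a 128-sample shift is 16 ms. The Hamming window is an assumption (the text does not name the window type), the helper names are hypothetical, and the 1000 Hz low-pass filter is left out to keep the example short.

```python
import math

FRAME_LEN = 256    # window length from the text, in samples
FRAME_SHIFT = 128  # frame shift from the text, in samples

def hamming(n_points):
    """Hamming window (assumed; the text does not specify the window type)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def split_into_frames(signal, frame_len=FRAME_LEN, frame_shift=FRAME_SHIFT):
    """Cut `signal` into overlapping windowed frames; a trailing partial frame is dropped."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames

frames = split_into_frames([1.0] * 1024)
print(len(frames), len(frames[0]))  # 7 256
```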
3. FFT
The fast Fourier transform (FFT) is applied to each speech frame, and the modulus of the resulting complex spectrum is taken to obtain the magnitude spectrum of each frame.
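A minimal sketch of the magnitude spectrum computation follows. To stay self-contained it evaluates the DFT directly with the standard library instead of calling an FFT routine; an FFT such as `numpy.fft.rfft` yields the same bins 0 to N/2 much faster. The function name and the test tone are illustrative choices, not part of the patent.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes |X(k)| for k = 0 .. N/2 of a real frame, via a direct DFT."""
    n_points = len(frame)
    spectrum = []
    for k in range(n_points // 2 + 1):
        acc = 0j
        for n, sample in enumerate(frame):
            acc += sample * cmath.exp(-2j * cmath.pi * k * n / n_points)
        spectrum.append(abs(acc))
    return spectrum

N = 256    # frame length from the text
FS = 8000  # sampling frequency from the text, in Hz
# Test tone placed exactly on bin 8, i.e. 8 * FS / N = 250 Hz.
tone = [math.cos(2.0 * math.pi * 8 * n / N) for n in range(N)]
mag = magnitude_spectrum(tone)
peak_bin = max(range(len(mag)), key=mag.__getitem__)
print(peak_bin, peak_bin * FS / N)  # 8 250.0
```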
4. Mel filtering
First, Mel filtering is applied to the magnitude spectrum of each frame to obtain the Mel spectrum. Then the logarithm of the Mel spectrum is taken to obtain the log spectrum of the speech signal. Finally, the weighting coefficient of each Mel sub-band is calculated according to the following formula:
where SNR(i) is the signal-to-noise ratio of the i-th Mel sub-band; SNR_max and SNR_min are the maximum and minimum sub-band signal-to-noise ratios over this speech segment; and α(i) is the weighting coefficient of the i-th Mel sub-band. The sub-band signal-to-noise ratio SNR(i) is estimated from the energy of the noisy speech in that Mel sub-band and the noise energy, where the noise energy is estimated during silent segments.
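The weighting formula itself is not reproduced in this text, so the sketch below is only one plausible reading of the description, not the patent's formula. It assumes the sub-band SNR is the ratio of estimated speech energy (noisy energy minus noise energy) to noise energy in dB, and that α(i) is a min-max normalisation of SNR(i) between SNR_min and SNR_max. Both function names and both formulas are assumptions.

```python
import math

def subband_snr_db(noisy_energy, noise_energy, floor=1e-10):
    """Assumed SNR estimate per sub-band: (noisy - noise) / noise, in dB."""
    snr = []
    for e_noisy, e_noise in zip(noisy_energy, noise_energy):
        speech = max(e_noisy - e_noise, floor)  # keep the log argument positive
        snr.append(10.0 * math.log10(speech / max(e_noise, floor)))
    return snr

def snr_to_weights(snr_db):
    """Assumed min-max mapping of sub-band SNR onto [0, 1] (not the patent's formula)."""
    snr_min, snr_max = min(snr_db), max(snr_db)
    if snr_max == snr_min:
        return [1.0] * len(snr_db)  # all sub-bands equally reliable
    return [(s - snr_min) / (snr_max - snr_min) for s in snr_db]

noisy = [11.0, 101.0, 2.0]  # hypothetical noisy-speech energy per Mel sub-band
noise = [1.0, 1.0, 1.0]     # hypothetical noise energy estimated in silence
alpha = snr_to_weights(subband_snr_db(noisy, noise))
print(alpha)
```

With these inputs the assumed sub-band SNRs are 10 dB, 20 dB, and 0 dB, so the sub-band with the highest SNR receives the largest weight and the one with the lowest SNR the smallest.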
5. Sub-band weighting
First, the logarithm of the magnitude spectrum of each frame is taken to obtain the log spectrum. Then all noisy speech log-spectral values x(k) within the i-th Mel sub-band are weighted with the estimated weighting coefficient α(i):
where the result is the log spectrum after sub-band weighting, i.e. the estimate of the clean speech log spectrum.
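One way to picture the weighting step is shown below. It assumes the weighting is a plain multiplication of x(k) by α(i) and that each FFT bin belongs to exactly one sub-band; the text does not state either point explicitly, and the band edges and weights are made-up values.

```python
def apply_subband_weights(log_spectrum, band_edges, alpha):
    """Scale every log-spectral bin by the weight of the sub-band that contains it.

    Bins band_edges[i] .. band_edges[i + 1] - 1 are assigned to sub-band i.
    """
    weighted = list(log_spectrum)
    for i, a in enumerate(alpha):
        for k in range(band_edges[i], band_edges[i + 1]):
            weighted[k] = a * log_spectrum[k]
    return weighted

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edges = [0, 2, 4, 6]     # three hypothetical sub-bands of two bins each
alpha = [1.0, 0.5, 0.0]  # hypothetical weights
print(apply_subband_weights(x, edges, alpha))  # [1.0, 2.0, 1.5, 2.0, 0.0, 0.0]
```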
6. DCT
The discrete cosine transform (DCT) is applied to the sub-band-weighted log spectrum to obtain the cepstral parameters of the speech signal.
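The text does not say which DCT variant is used; the sketch below assumes the common unnormalised DCT-II. It shows why a periodic ripple in the log spectrum, such as the harmonic structure of voiced speech, turns into a single peak in the transformed domain.

```python
import math

def dct2(x):
    """Unnormalised DCT-II: C[n] = sum_k x[k] * cos(pi * n * (k + 0.5) / M)."""
    m = len(x)
    return [sum(x[k] * math.cos(math.pi * n * (k + 0.5) / m) for k in range(m))
            for n in range(m)]

M = 32
# A toy log spectrum whose ripple matches DCT basis function number 5.
ripple = [math.cos(math.pi * 5 * (k + 0.5) / M) for k in range(M)]
cepstrum = dct2(ripple)
peak = max(range(M), key=lambda n: abs(cepstrum[n]))
print(peak)  # 5
```

In practice a library routine such as `scipy.fft.dct` would replace the quadratic-time loop.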
7. Pitch estimation
Clean speech s(n) can be regarded as being generated by filtering the glottal excitation signal e(n) with the vocal tract response v(n), i.e.
s(n) = e(n) * v(n)    (3)
where the symbol "*" denotes convolution.
After the clean speech s(n) passes through the FFT, the logarithm, sub-band weighting, and the DCT, the excitation signal and the vocal tract response are separated in the cepstrum domain:
Here the cepstrum splits into two components: the cepstral parameters of the excitation signal and the cepstral parameters of the vocal tract response.
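The separation follows from the standard homomorphic argument, sketched below. The symbols S(k), E(k), V(k) for the spectra and c_s(n), c_e(n), c_v(n) for the cepstra are notation introduced here for illustration, and the chain ignores windowing effects and the sub-band weighting, which rescales the log spectrum band by band.

```latex
\begin{align}
s(n) &= e(n) * v(n)
  && \text{convolution in the time domain} \\
S(k) &= E(k)\,V(k)
  && \text{the Fourier transform turns convolution into a product} \\
\log\lvert S(k)\rvert &= \log\lvert E(k)\rvert + \log\lvert V(k)\rvert
  && \text{the logarithm turns the product into a sum} \\
c_s(n) &= c_e(n) + c_v(n)
  && \text{the DCT is linear, so the sum is preserved}
\end{align}
```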
In the cepstrum domain, the excitation component is impulsive: it is nonzero only at integer multiples of the discrete pitch period N_p and equals 0 at all other values of the index n. Because the human pitch period T_p ranges roughly from 2 ms to 20 ms and the sampling frequency of this system is 8000 Hz, N_p ranges roughly from 16 to 160. The vocal tract component usually decays rapidly; its values outside the interval [-16, 16] are very small and can be assumed to be 0. Therefore, for pitch estimation it suffices to detect the peak of the cepstral parameters within [16, 160] to obtain the pitch period estimate:
where the first quantity is the value of n at which the cepstral parameters reach their first peak, and the second is the estimate of the pitch period.
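The search range follows from the arithmetic 2 ms × 8000 Hz = 16 samples and 20 ms × 8000 Hz = 160 samples. The sketch below picks the largest cepstral value in that range as a simple stand-in for the peak detection; the function name and the synthetic cepstrum are hypothetical, and the smoothing filter applied across frames is omitted.

```python
FS = 8000               # sampling frequency from the text, in Hz
N_MIN, N_MAX = 16, 160  # search range from the text, in samples

def estimate_pitch_period(cepstrum, n_min=N_MIN, n_max=N_MAX):
    """Return the index of the largest cepstral value within [n_min, n_max]."""
    upper = min(n_max, len(cepstrum) - 1)
    return max(range(n_min, upper + 1), key=lambda n: cepstrum[n])

# Synthetic cepstrum: large vocal-tract values at low indices and an
# excitation peak at n = 40 that the search range should isolate.
cepstrum = [0.0] * 200
cepstrum[0:4] = [5.0, 3.0, 2.0, 1.0]
cepstrum[40] = 1.5
cepstrum[80] = 0.7  # weaker peak at twice the pitch period

n_hat = estimate_pitch_period(cepstrum)
print(n_hat, 1000.0 * n_hat / FS, FS / n_hat)  # 40 5.0 200.0
```

Here the estimate of 40 samples corresponds to a pitch period of 5 ms, i.e. a fundamental frequency of 200 Hz; the large values below index 16 are ignored because they lie outside the search range.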

Claims (5)

1. A cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting, characterized in that: sub-band weighting coefficients are calculated from the Mel spectrum of the noisy speech, sub-band weighting is applied to the feature parameters of the noisy speech in the log-spectral domain, and peak detection is performed in the cepstrum domain to estimate the pitch period of the noisy speech signal.
2. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 1, characterized in that sub-band weighting coefficients are calculated from the Mel spectrum of the noisy speech, sub-band weighting is applied to the feature parameters of the noisy speech in the log-spectral domain, and peak detection is performed in the cepstrum domain to estimate the pitch period of the noisy speech signal; the method specifically comprises:
(1) interpolating or decimating the input digital speech so that its sampling frequency is fixed at 8000 Hz;
(2) low-pass filtering the rate-converted digital speech, retaining only frequency components below 1000 Hz, then windowing and framing it to obtain frame signals;
(3) applying the fast Fourier transform to each speech frame to obtain the magnitude spectrum of each frame;
(4) applying Mel filtering to the magnitude spectrum of each frame, taking the logarithm, and calculating the sub-band weighting coefficients from the signal-to-noise ratio of each Mel sub-band;
(5) taking the logarithm of the magnitude spectrum of each frame to obtain the log spectrum, and applying sub-band weighting to reduce the influence of additive noise on pitch period estimation;
(6) applying the discrete cosine transform to the sub-band-weighted log spectrum to obtain the cepstral parameters of the speech signal;
(7) applying peak detection and smoothing filtering to the cepstral parameters of the speech signal to obtain the pitch period estimate of the input speech.
3. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 2, characterized in that the weighting coefficient α(i) of each Mel sub-band is calculated by the following formula:
where SNR(i) is the signal-to-noise ratio of the i-th Mel sub-band; SNR_max and SNR_min are the maximum and minimum sub-band signal-to-noise ratios over this speech segment; α(i) is the weighting coefficient of the i-th Mel sub-band; and the sub-band signal-to-noise ratio SNR(i) is estimated from the energy of the noisy speech in that Mel sub-band and the noise energy, where the noise energy is estimated during silent segments.
4. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 2, characterized in that: the sampling frequency of the input digital speech is fixed at 8000 Hz; if the original sampling frequency of the input speech is higher than 8000 Hz, it is decimated to 8000 Hz; if the original sampling frequency of the input speech is lower than 8000 Hz, it is interpolated to 8000 Hz; the rate-converted digital speech is low-pass filtered, retaining only frequency components below 1000 Hz; the filtered digital speech is then windowed and framed to obtain frame signals; the window length is 256 and the frame shift is 128.
5. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 2, characterized in that: the logarithm of the magnitude spectrum of each frame is first taken to obtain the log spectrum; then all noisy speech log-spectral values x(k) within the i-th Mel sub-band are weighted with the estimated weighting coefficient α(i):
where the result is the log spectrum after sub-band weighting, i.e. the estimate of the clean speech log spectrum.
CN201811035434.XA 2018-09-06 2018-09-06 Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting Active CN109346106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811035434.XA CN109346106B (en) 2018-09-06 2018-09-06 Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811035434.XA CN109346106B (en) 2018-09-06 2018-09-06 Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting

Publications (2)

Publication Number Publication Date
CN109346106A (en) 2019-02-15
CN109346106B (en) 2022-12-06

Family

ID=65292452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811035434.XA Active CN109346106B (en) 2018-09-06 2018-09-06 Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting

Country Status (1)

Country Link
CN (1) CN109346106B (en)

Also Published As

Publication number Publication date
CN109346106B (en) 2022-12-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant