CN109346106A - Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting - Google Patents
Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting
- Publication number
- CN109346106A
- Authority
- CN
- China
- Prior art keywords
- subband
- signal
- pitch period
- noise ratio
- mel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The invention discloses a cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio (SNR) weighting. Subband weighting coefficients are calculated from the Mel spectrum of the noisy speech; in the log-spectral domain, all noisy-speech log-spectrum values within each Mel subband are weighted by that subband's coefficient; and peak detection is performed in the cepstrum domain to estimate the pitch period of the noisy speech signal. The technical solution suppresses background noise and vocal tract formants at the same time and yields accurate pitch period estimates, which makes it particularly suitable for pitch estimation in low-SNR environments.
Description
Technical field
The invention belongs to the field of speech processing technology, and in particular relates to a pitch period estimation method that applies subband signal-to-noise ratio weighting to a noisy speech signal in the log-spectral domain and performs peak detection in the cepstrum domain.
Background
When a person produces a voiced sound, the airflow passing through the glottis sets the vocal cords vibrating, which produces a quasi-periodic pulsed airflow that excites the vocal tract to produce sound. The frequency of this vocal cord vibration is called the fundamental frequency, and its reciprocal is called the pitch period. The pitch period is one of the important parameters of a speech signal: it describes an important characteristic of the excitation source and is widely used in speaker recognition, speech synthesis, speech coding and many other fields.
Because the glottal excitation of speech is only quasi-periodic, and the vocal tract formants affect the harmonic structure of the excitation signal, accurately extracting the pitch period from a speech signal is relatively difficult. Common pitch period estimation methods include the autocorrelation method, the average magnitude difference function method, the parallel processing method and the cepstrum method. These methods work well on clean speech or on noisy speech with a high signal-to-noise ratio. However, speech is inevitably corrupted by background noise during transmission, which can make the pitch period extracted in a noisy environment differ greatly from the true value.
Summary of the invention
Purpose of the invention: to address the problems in the prior art, the present invention provides a cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting. When estimating the pitch period in a noisy environment, it accounts for the influence of both background noise and vocal tract formants on the excitation signal, which increases the robustness of the pitch estimation algorithm.
Technical solution: a cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting, in which subband weighting coefficients are calculated from the Mel spectrum of the noisy speech, the characteristic parameters of the noisy speech are subband-weighted in the log-spectral domain, and peak detection is performed in the cepstrum domain to estimate the pitch period of the noisy speech signal.
The specific steps of the present invention are as follows:
(1) Interpolate or decimate the input digital speech so that its sampling frequency is fixed at 8000 Hz;
(2) Low-pass filter the resampled digital speech, retaining only the frequency components below 1000 Hz, then window and frame it to obtain frame signals;
(3) Apply the fast Fourier transform (FFT) to each frame of the speech signal to obtain the magnitude spectrum of each frame;
(4) Apply Mel filtering to the magnitude spectrum of each frame, take the logarithm, and calculate the subband weighting coefficients from the signal-to-noise ratio of each Mel subband;
(5) Take the logarithm of the magnitude spectrum of each frame to obtain the log spectrum, and apply subband weighting to it to reduce the influence of additive noise on pitch period estimation;
(6) Apply the discrete cosine transform (DCT) to the subband-weighted log spectrum to obtain the cepstral parameters of the speech signal;
(7) Apply peak detection and smoothing filtering to the cepstral parameters of the speech signal to obtain the pitch period estimate of the input speech.
By adopting the above technical solution, the present invention has the following advantage:
The technical solution suppresses background noise and vocal tract formants at the same time and obtains accurate pitch period estimates, which makes it particularly suitable for pitch estimation in low signal-to-noise ratio environments.
Brief description of the drawings
Fig. 1 is the overall block diagram of the cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.
As shown in Fig. 1, the cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting mainly comprises interpolation and decimation, preprocessing, FFT, Mel filtering, subband weighting, DCT and pitch estimation.
1. Interpolation and decimation
To simplify back-end processing, the sampling frequency of the input speech is fixed at 8000 Hz. If the original sampling frequency of the input speech is higher than 8000 Hz, the speech is decimated to 8000 Hz; if it is lower than 8000 Hz, the speech is interpolated to 8000 Hz.
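The resampling step could be sketched as follows. This is only an illustration under stated assumptions: the source does not say how interpolation or decimation is carried out, so the windowed-sinc anti-aliasing filter, its length and cutoff, the use of linear interpolation, and the names `lowpass_fir` and `to_8k` are all hypothetical choices.

```python
import numpy as np

TARGET_FS = 8000  # fixed sampling frequency stated in the text


def lowpass_fir(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc low-pass FIR filter (Hamming window), unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(num_taps)
    return h / h.sum()


def to_8k(x, fs_in):
    """Bring x to 8000 Hz: decimate if fs_in is higher, interpolate if lower."""
    x = np.asarray(x, dtype=float)
    if fs_in == TARGET_FS:
        return x
    if fs_in > TARGET_FS:
        # Assumed anti-aliasing step: attenuate content near and above the
        # new Nyquist frequency (4000 Hz) before dropping samples.
        x = np.convolve(x, lowpass_fir(0.45 * TARGET_FS, fs_in), mode="same")
    n_out = int(round(len(x) * TARGET_FS / fs_in))
    t_in = np.arange(len(x)) / fs_in
    t_out = np.arange(n_out) / TARGET_FS
    # Linear interpolation onto the 8000 Hz time grid (a simplification).
    return np.interp(t_out, t_in, x)
```

A polyphase resampler would normally give better quality than linear interpolation; the simple version is used here only to keep the example dependency-free.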
2. Preprocessing
Because the energy of speech is concentrated mainly in the low-frequency region, while the high-frequency region carries less energy and is easily affected by noise, preprocessing first low-pass filters the resampled digital speech, retaining only the frequency components below 1000 Hz. The filtered digital speech is then windowed and framed to obtain frame signals. The window length is 256 samples and the frame shift is 128 samples.
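A minimal sketch of this preprocessing stage is given below. The 1000 Hz cutoff, the window length of 256 and the frame shift of 128 come from the text; the filter design (a 101-tap windowed-sinc FIR), the Hamming analysis window and the function names are assumptions, since the source does not specify them.

```python
import numpy as np

FS = 8000          # sampling frequency after resampling (from the text)
FRAME_LEN = 256    # window length in samples (from the text)
FRAME_SHIFT = 128  # frame shift in samples (from the text)


def lowpass_1k(x, fs=FS, cutoff_hz=1000.0, num_taps=101):
    """Keep components below about 1000 Hz with a windowed-sinc FIR filter."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(num_taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(np.asarray(x, dtype=float), h, mode="same")


def frame_signal(x, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Split x into overlapping frames and apply a Hamming window (assumed)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    num_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(num_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

At 8000 Hz, a 256-sample window spans 32 ms and a 128-sample shift gives 50% overlap between consecutive frames.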
3. FFT
The fast Fourier transform (FFT) is applied to each frame of the speech signal, and the modulus of the resulting complex spectrum is taken to obtain the magnitude spectrum of each frame.
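In NumPy this step is short. The sketch below assumes an FFT size equal to the 256-sample frame length and keeps only the one-sided spectrum (bins 0 to 128), which is sufficient for a real-valued signal; neither choice is stated in the source, and the function name is hypothetical.

```python
import numpy as np


def magnitude_spectrum(frames, n_fft=256):
    """Modulus of the FFT of each frame, one-sided (n_fft // 2 + 1 bins)."""
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=-1))
```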
4. Mel filtering
First, Mel filtering is applied to the magnitude spectrum of each frame to obtain the Mel spectrum. Next, the logarithm of the Mel spectrum is taken to obtain the log spectrum of the speech signal. Finally, the weighting coefficient of each Mel subband is calculated according to the following formula:
[Formula (1)]
where SNR(i) is the signal-to-noise ratio of the i-th Mel subband; SNR_max and SNR_min are the maximum and minimum subband signal-to-noise ratio of the speech segment; and α(i) is the weighting coefficient of the i-th Mel subband. The subband signal-to-noise ratio SNR(i) is estimated from the energy of the noisy speech in that Mel subband and the noise energy, where the noise energy is estimated during silent segments.
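The sketch below illustrates one way this step could look. The weighting formula itself is not reproduced in this text, so the min-max normalisation of the subband SNR used here is an assumption suggested by the roles of SNR_max and SNR_min, not the patent's formula. The number of Mel subbands (8), the 1000 Hz upper edge, the triangular filter shape, the averaging over frames and all function names are likewise hypothetical.

```python
import numpy as np


def mel_filterbank(num_bands=8, n_fft=256, fs=8000, f_max=1000.0):
    """Triangular Mel filters over 0..f_max, shape (num_bands, n_fft//2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), num_bands + 2))
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    fb = np.zeros((num_bands, len(freqs)))
    for i in range(num_bands):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def subband_weights(mel_energy, noise_energy, eps=1e-12):
    """Assumed weighting: subband SNR in dB, min-max normalised to [0, 1].

    mel_energy:   (num_frames, num_bands) noisy-speech energy per Mel subband
    noise_energy: (num_bands,) noise energy estimated from silent frames
    """
    noisy = np.asarray(mel_energy, dtype=float).mean(axis=0)
    noise = np.maximum(np.asarray(noise_energy, dtype=float), eps)
    speech = np.maximum(noisy - noise, eps)   # energy left after removing noise
    snr_db = 10.0 * np.log10(speech / noise)  # SNR(i)
    snr_min, snr_max = snr_db.min(), snr_db.max()
    if snr_max - snr_min < eps:
        return np.ones_like(snr_db)           # all subbands equally reliable
    return (snr_db - snr_min) / (snr_max - snr_min)
```

With a magnitude spectrum `mag` of shape `(num_frames, 129)`, the Mel spectrum would be `mag @ mel_filterbank().T`, and its square could serve as the subband energy passed to `subband_weights`.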
5. Subband weighting
First, the logarithm of the magnitude spectrum of each frame is taken to obtain the log spectrum. Then the estimated weighting coefficient α(i) is used to weight all noisy-speech log-spectrum values x(k) in the i-th Mel subband:
[Formula (2)]
where the result is the subband-weighted log spectrum, i.e. an estimate of the clean-speech log spectrum.
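A possible reading of this step is sketched below. The weighting formula is not reproduced in this text, so the sketch assumes a simple multiplicative form in which every log-spectrum value x(k) in subband i is scaled by α(i). The mapping from FFT bins to subbands through `band_edges` (bin indices delimiting each subband) and the zero weight for bins outside all subbands are also assumptions, as is the function name.

```python
import numpy as np


def weight_log_spectrum(mag, alpha, band_edges, eps=1e-12):
    """Scale the log spectrum of each frame by the weight of its Mel subband.

    mag:        (num_frames, num_bins) magnitude spectrum
    alpha:      weights alpha(i), one per subband
    band_edges: bin indices; subband i covers bins band_edges[i]:band_edges[i+1]
    """
    log_spec = np.log(np.asarray(mag, dtype=float) + eps)  # x(k)
    weights = np.zeros(log_spec.shape[-1])  # bins outside all subbands get 0
    for i, a in enumerate(alpha):
        weights[band_edges[i]:band_edges[i + 1]] = a
    return log_spec * weights               # assumed form: alpha(i) * x(k)
```

Under this assumption, subbands with a low signal-to-noise ratio contribute less to the cepstrum computed in the next step.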
6. DCT
The discrete cosine transform (DCT) is applied to the subband-weighted log spectrum to obtain the cepstral parameters of the speech signal.
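The transform could be written directly from its definition, as in the sketch below. The source does not state which DCT type or normalisation is used; an unnormalised DCT-II is assumed here, and the function name is hypothetical.

```python
import numpy as np


def dct_ii(x):
    """Unnormalised DCT-II along the last axis.

    c(n) = sum_k x(k) * cos(pi * n * (2k + 1) / (2K)),  n = 0 .. K-1
    """
    x = np.asarray(x, dtype=float)
    K = x.shape[-1]
    n = np.arange(K)[:, None]
    k = np.arange(K)[None, :]
    basis = np.cos(np.pi * n * (2 * k + 1) / (2 * K))  # row n, column k
    return x @ basis.T
```

Applied to a `(num_frames, K)` array of weighted log spectra, the function returns one cepstral vector per frame.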
7. Pitch estimation
Clean speech s(n) can be regarded as the glottal excitation signal e(n) filtered by the vocal tract response v(n), i.e.
s(n) = e(n) * v(n)    (3)
where the symbol "*" denotes convolution.
After the clean speech s(n) has passed through the FFT, the logarithm, subband weighting and the DCT, the excitation signal and the vocal tract response are separated in the cepstrum domain:
[Formula (4)]
where the two terms are the cepstral parameters of the excitation signal and of the vocal tract response, respectively.
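The reason the two components separate can be sketched with the standard cepstral argument. The hatted symbols and the capital-letter spectra below are assumed notation (the original symbols are not legible in this text), windowing effects are ignored, and the subband weight is taken to be multiplicative, which is an assumption:

```latex
\begin{aligned}
s(n) = e(n) * v(n)
  \;&\Longrightarrow\; |S(k)| = |E(k)|\,|V(k)| \\
  &\Longrightarrow\; \log|S(k)| = \log|E(k)| + \log|V(k)| \\
  &\Longrightarrow\; \alpha(i)\log|S(k)| = \alpha(i)\log|E(k)| + \alpha(i)\log|V(k)| \\
  &\Longrightarrow\; \hat{s}(n) = \hat{e}(n) + \hat{v}(n)
\end{aligned}
```

The last line follows because the DCT is linear: convolution in the time domain becomes a product of spectra, the logarithm turns the product into a sum, and a linear transform of a sum is the sum of the transforms.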
In the cepstrum domain, the excitation signal is impulsive: its cepstrum is nonzero only at integer multiples of the discrete pitch period N_p and equals 0 at all other values of the index n. Because the human pitch period T_p ranges from about 2 ms to 20 ms and the sampling frequency of this system is 8000 Hz, N_p ranges from about 16 to 160. The cepstrum of the vocal tract response, by contrast, usually decays rapidly; its values outside the interval [-16, 16] are very small and can be assumed to be 0. Therefore, for pitch estimation it suffices to detect the peak of the cepstral parameters in the interval [16, 160] to obtain the pitch period estimate:
[Formula (5)]
where the discrete pitch period estimate is the value of n at the first peak of the cepstral parameters, from which the estimate of the pitch period T_p is obtained.
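The search could be sketched as follows. The range bounds come from the arithmetic in the text (2 ms × 8000 Hz = 16 and 20 ms × 8000 Hz = 160). Taking the largest value in the range as "the first peak", clipping the range to the cepstrum length, treating the cepstral index as a lag in samples, and using a 3-point median filter for the smoothing step are all assumptions, as are the function names.

```python
import numpy as np

FS = 8000
N_MIN = int(round(0.002 * FS))  # 2 ms  -> 16 samples
N_MAX = int(round(0.020 * FS))  # 20 ms -> 160 samples


def estimate_pitch(cepstrum, fs=FS):
    """Return (discrete pitch period in samples, pitch period in seconds)."""
    c = np.asarray(cepstrum, dtype=float)
    hi = min(N_MAX, len(c) - 1)  # clip the range to the available cepstrum
    # Assumed peak rule: the largest value in [N_MIN, hi].
    n_hat = N_MIN + int(np.argmax(c[N_MIN:hi + 1]))
    return n_hat, n_hat / fs


def median_smooth(track, width=3):
    """Median-filter a per-frame pitch track to remove isolated outliers."""
    track = np.asarray(track, dtype=float)
    padded = np.pad(track, width // 2, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(track))])
```

Indices below 16 are skipped on purpose: that region is dominated by the vocal tract response, so a large value there does not indicate the pitch period.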
Claims (5)
1. A cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting, characterized in that: subband weighting coefficients are calculated from the Mel spectrum of the noisy speech, the characteristic parameters of the noisy speech are subband-weighted in the log-spectral domain, and peak detection is performed in the cepstrum domain to estimate the pitch period of the noisy speech signal.
2. The cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting according to claim 1, characterized in that subband weighting coefficients are calculated from the Mel spectrum of the noisy speech, the characteristic parameters of the noisy speech are subband-weighted in the log-spectral domain, and peak detection is performed in the cepstrum domain to estimate the pitch period of the noisy speech signal; the method specifically comprises:
(1) interpolating or decimating the input digital speech so that its sampling frequency is fixed at 8000 Hz;
(2) low-pass filtering the resampled digital speech, retaining only the frequency components below 1000 Hz, then windowing and framing it to obtain frame signals;
(3) applying the fast Fourier transform to each frame of the speech signal to obtain the magnitude spectrum of each frame;
(4) applying Mel filtering to the magnitude spectrum of each frame, taking the logarithm, and calculating the subband weighting coefficients from the signal-to-noise ratio of each Mel subband;
(5) taking the logarithm of the magnitude spectrum of each frame to obtain the log spectrum, and applying subband weighting to reduce the influence of additive noise on pitch period estimation;
(6) applying the discrete cosine transform to the subband-weighted log spectrum to obtain the cepstral parameters of the speech signal;
(7) applying peak detection and smoothing filtering to the cepstral parameters of the speech signal to obtain the pitch period estimate of the input speech.
3. The cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting according to claim 2, characterized in that the weighting coefficient α(i) of each Mel subband is calculated by the following formula:
[Formula]
where SNR(i) is the signal-to-noise ratio of the i-th Mel subband; SNR_max and SNR_min are the maximum and minimum subband signal-to-noise ratio of the speech segment; α(i) is the weighting coefficient of the i-th Mel subband; and the subband signal-to-noise ratio SNR(i) is estimated from the energy of the noisy speech in that Mel subband and the noise energy, where the noise energy is estimated during silent segments.
4. The cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting according to claim 2, characterized in that: the sampling frequency of the input digital speech is fixed at 8000 Hz; if the original sampling frequency of the input speech is higher than 8000 Hz, the speech is decimated to 8000 Hz; if it is lower than 8000 Hz, the speech is interpolated to 8000 Hz; the resampled digital speech is low-pass filtered, retaining only the frequency components below 1000 Hz; the filtered digital speech is then windowed and framed to obtain frame signals; the window length is 256 and the frame shift is 128.
5. The cepstrum-domain pitch period estimation method based on subband signal-to-noise ratio weighting according to claim 2, characterized in that: the logarithm of the magnitude spectrum of each frame is first taken to obtain the log spectrum; the estimated weighting coefficient α(i) is then used to weight all noisy-speech log-spectrum values x(k) in the i-th Mel subband:
[Formula]
where the result is the subband-weighted log spectrum, i.e. an estimate of the clean-speech log spectrum.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811035434.XA CN109346106B (en) | 2018-09-06 | 2018-09-06 | Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109346106A (en) | 2019-02-15 |
CN109346106B (en) | 2022-12-06 |
Family
ID=65292452
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811035434.XA Active CN109346106B (en) | 2018-09-06 | 2018-09-06 | Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109346106B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1397929A (en) * | 2002-07-12 | 2003-02-19 | 清华大学 | Speech intensifying-characteristic weighing-logrithmic spectrum addition method for anti-noise speech recognization |
CN1431650A (en) * | 2003-02-21 | 2003-07-23 | 清华大学 | Antinoise voice recognition method based on weighted local energy |
CN1702736A (en) * | 2001-08-31 | 2005-11-30 | 株式会社建伍 | Apparatus and method for generating pitch waveform signal and apparatus and method for compressing/decomprising and synthesizing speech signal using the same |
CN103021405A (en) * | 2012-12-05 | 2013-04-03 | 渤海大学 | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter |
CN105448297A (en) * | 2014-08-28 | 2016-03-30 | 中国移动通信集团公司 | Method and device for acquiring pitch period |
CN106373559A (en) * | 2016-09-08 | 2017-02-01 | 河海大学 | Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting |
Non-Patent Citations (2)
Title |
---|
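李娟 (Li Juan): "Performance comparison of several pitch period algorithms" (几种基音周期算法性能比较), Journal of Yuncheng University (《运城学院学报》) * |
王蕾 (Wang Lei): "Feature extraction for speaker recognition systems in noisy environments" (噪声环境下话者识别系统的特征提取), Computer Knowledge and Technology (《电脑知识与技术》) * |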
Also Published As
Publication number | Publication date |
---|---|
CN109346106B (en) | 2022-12-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||