CN109346106B - Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting - Google Patents
- Publication number
- CN109346106B (application CN201811035434.XA; earlier publication CN109346106A)
- Authority
- CN
- China
- Prior art keywords
- signal
- sub
- weighting
- subband
- noise ratio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio (SNR) weighting. Sub-band weighting coefficients are calculated from the Mel spectrum of the noisy speech; in the log-spectral domain, every log-spectrum value of the noisy speech within each Mel sub-band is weighted by that sub-band's coefficient; peak detection is then performed in the cepstrum domain to estimate the pitch period of the noisy speech signal. The technical scheme suppresses environmental noise and vocal tract formants at the same time, yields a more accurate pitch period estimate, and is particularly suitable for pitch estimation in low-SNR environments.
Description
Technical Field
The invention belongs to the technical field of speech processing, and in particular relates to a pitch period estimation method that applies sub-band SNR weighting to a noisy speech signal in the log-spectral domain and performs peak detection in the cepstrum domain.
Background
When a person produces voiced sound, airflow through the glottis sets the vocal cords vibrating, producing a quasi-periodic pulsed airflow that excites the vocal tract to generate sound. The frequency of this vocal cord vibration is called the fundamental frequency, and its reciprocal is called the pitch period. The pitch period is one of the important parameters of a speech signal: it describes a key feature of the excitation source and is widely used in speaker recognition, speech synthesis, speech coding and other fields.
Because the glottal excitation of speech is only quasi-periodic, and the vocal tract formants affect the harmonic structure of the excitation signal, it is difficult to extract the pitch period accurately from the speech signal. Common pitch period estimation methods include the autocorrelation method, the average magnitude difference function (AMDF) method, the parallel processing method and the cepstrum method. These methods work well on clean speech or on noisy speech with a high SNR. During transmission, however, speech is inevitably corrupted by environmental noise, which can make the pitch period extracted in a noisy environment deviate substantially from its actual value.
Disclosure of Invention
Purpose of the invention: in view of the problems in the prior art, the invention provides a cepstrum domain pitch period estimation method based on sub-band SNR weighting. When estimating the pitch period in a noisy environment, it takes into account the influence of both environmental noise and vocal tract formants on the excitation signal, which increases the robustness of the pitch estimation algorithm.
Technical scheme: a cepstrum domain pitch period estimation method based on sub-band SNR weighting calculates sub-band weighting coefficients from the Mel spectrum of the noisy speech, applies sub-band weighting to the characteristic parameters of the noisy speech in the log-spectral domain, performs peak detection in the cepstrum domain, and estimates the pitch period of the noisy speech signal.
The method comprises the following specific steps:
(1) Interpolating or decimating the input digital speech to fix its sampling frequency at 8000 Hz;
(2) Low-pass filtering the interpolated or decimated digital speech to retain only frequency components below 1000 Hz, then windowing and framing to obtain frame signals;
(3) Applying a fast Fourier transform (FFT) to each frame of the speech signal to obtain the magnitude spectrum of each frame;
(4) Mel filtering the magnitude spectrum of each frame, taking the logarithm, and calculating a sub-band weighting coefficient from the signal-to-noise ratio of each Mel sub-band;
(5) Taking the logarithm of the magnitude spectrum of each frame to obtain a log spectrum, and applying sub-band weighting to the log spectrum to reduce the influence of additive noise on pitch period estimation;
(6) Applying a discrete cosine transform (DCT) to the sub-band-weighted log spectrum to obtain the cepstrum parameters of the speech signal;
(7) Performing peak detection on the cepstrum parameters of the speech signal, followed by smoothing filtering, to obtain the pitch period estimate of the input speech.
By adopting the technical scheme, the invention has the following beneficial effects:
The technical scheme of the invention suppresses environmental noise and vocal tract formants at the same time, yields a more accurate pitch period estimate, and is particularly suitable for pitch estimation in low-SNR environments.
Drawings
FIG. 1 is a general block diagram of the cepstrum domain pitch period estimation method based on sub-band SNR weighting according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
As shown in FIG. 1, the cepstrum domain pitch period estimation method based on sub-band SNR weighting mainly comprises interpolation and decimation, preprocessing, FFT, Mel filtering, sub-band weighting, DCT and pitch estimation.
1. Interpolation and decimation
To simplify back-end processing, the sampling frequency of the input speech is fixed to 8000 Hz. If the original sampling frequency is higher than 8000 Hz, the input speech is decimated to 8000 Hz; if it is lower than 8000 Hz, the input speech is interpolated to 8000 Hz.
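The source does not say how the interpolation or decimation is carried out. As a minimal sketch, assuming a rational resampler built from zero-stuffing, a Hamming-windowed sinc low-pass filter and sample dropping (the function name `resample_to_8k` and the parameter `taps_per_phase` are hypothetical), the step could look like this:

```python
import math
import numpy as np

TARGET_FS = 8000  # fixed sampling frequency named in the text


def resample_to_8k(x, fs, taps_per_phase=16):
    """Resample x from fs Hz to 8000 Hz (illustrative, not optimised)."""
    x = np.asarray(x, dtype=float)
    if int(fs) == TARGET_FS:
        return x
    g = math.gcd(int(fs), TARGET_FS)
    up, down = TARGET_FS // g, int(fs) // g
    # Interpolation: insert up-1 zeros between samples.
    stuffed = np.zeros(len(x) * up)
    stuffed[::up] = x
    # One low-pass filter serves as both the anti-imaging and the
    # anti-aliasing filter; its cutoff is the lower Nyquist frequency.
    m = max(up, down)
    n_taps = 2 * taps_per_phase * m + 1
    t = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(t / m) / m * np.hamming(n_taps)
    h *= up / h.sum()  # restore the amplitude lost by zero-stuffing
    # Decimation: keep every down-th filtered sample.
    return np.convolve(stuffed, h, mode="same")[::down]
```

For example, a 16 kHz recording gives `up = 1, down = 2` (pure decimation), while a 4 kHz recording gives `up = 2, down = 1` (pure interpolation). For awkward ratios such as 44.1 kHz the filter becomes long, so a polyphase implementation would be preferable in practice.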
2. Preprocessing
Because the energy of speech is concentrated mainly in the low-frequency region, while the high-frequency region carries little energy and is easily affected by noise, preprocessing first low-pass filters the interpolated or decimated digital speech, retaining only frequency components below 1000 Hz. The filtered speech is then windowed and divided into frames. The window length is 256 samples and the frame shift is 128 samples.
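A minimal sketch of this preprocessing step follows. The filter design (a 101-tap Hamming-windowed sinc FIR) and the choice of a Hamming analysis window are assumptions; the source specifies only the 1000 Hz cutoff, the window length of 256 and the frame shift of 128. The function names are hypothetical.

```python
import numpy as np

FS = 8000           # sampling frequency after resampling
CUTOFF_HZ = 1000    # cutoff named in the text
FRAME_LEN = 256     # window length named in the text
FRAME_SHIFT = 128   # frame shift named in the text


def lowpass_1k(x, n_taps=101):
    """Low-pass filter x, keeping components below about 1000 Hz."""
    t = np.arange(n_taps) - (n_taps - 1) / 2
    fc = CUTOFF_HZ / FS  # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * t) * np.hamming(n_taps)
    h /= h.sum()         # unity gain at DC
    return np.convolve(np.asarray(x, dtype=float), h, mode="same")


def frame_signal(x):
    """Split x into overlapping frames and apply a Hamming window."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    starts = FRAME_SHIFT * np.arange(n_frames)[:, None]
    idx = starts + np.arange(FRAME_LEN)[None, :]
    return x[idx] * np.hamming(FRAME_LEN)
```

At 8000 Hz these settings correspond to 32 ms frames advanced in 16 ms steps, so consecutive frames overlap by half.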
3. FFT
A fast Fourier transform (FFT) is applied to each frame of the speech signal, and the modulus of the resulting complex spectrum is taken to obtain the magnitude spectrum of each frame.
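As a brief illustration, the magnitude spectrum of real-valued frames can be computed with NumPy's real FFT. The helper name `magnitude_spectrum` and the optional zero-padding parameter `n_fft` are assumptions; the source does not state an FFT length.

```python
import numpy as np


def magnitude_spectrum(frames, n_fft=None):
    """Return |FFT| of each frame, keeping the non-negative frequencies.

    frames: array of shape (n_frames, frame_len).
    n_fft:  optional FFT length; frames are zero-padded if it is larger
            than frame_len (an assumption, not stated in the source).
    """
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=-1))
```

For a 256-sample frame with no padding this returns 129 bins spaced 31.25 Hz apart at a sampling frequency of 8000 Hz.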
4. Mel filtering
First, Mel filtering is applied to the magnitude spectrum of each frame to obtain the Mel spectrum; then the logarithm of the Mel spectrum is taken to obtain the log Mel spectrum of the speech signal; finally, the weighting coefficient of each Mel sub-band is calculated according to the following formula:
[Equation (1), the expression for α(i), was an image in the source and is not reproduced in this text.]
where SNR(i) is the signal-to-noise ratio of the i-th Mel sub-band; SNR_max and SNR_min are the maximum and minimum sub-band SNRs of the current speech frame; and α(i) is the weighting coefficient of the i-th Mel sub-band. The sub-band SNR(i) is estimated from the energy of the noisy speech in the Mel sub-band and the noise energy, the latter being estimated during silence periods.
5. Subband weighting
First, the logarithm of the magnitude spectrum of each frame is taken to obtain the log spectrum; then every noisy-speech log-spectrum value $x(k)$ in the $i$-th Mel sub-band is weighted with the estimated coefficient $\alpha(i)$:
[Equation (2), the sub-band weighting of $x(k)$ by $\alpha(i)$, was an image in the source and is not reproduced in this text.]
where $\hat{x}(k)$ is the log spectrum after sub-band weighting, i.e. the estimate of the clean-speech log spectrum.
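Because the weighting formula itself is missing from this text, the sketch below makes two assumptions that should be read as illustrative only: the weighting is multiplicative, and each FFT bin is assigned to the Mel sub-band whose filter responds most strongly at that bin (Mel filters overlap, so some assignment rule is needed). The function name and arguments are hypothetical.

```python
import numpy as np


def weight_log_spectrum(mag, fb, alpha, eps=1e-12):
    """Weight the log spectrum bin by bin with per-sub-band coefficients.

    mag:   magnitude spectrum, shape (n_bins,) or (n_frames, n_bins).
    fb:    Mel filterbank, shape (n_bands, n_bins).
    alpha: weighting coefficient per sub-band, shape (n_bands,).
    """
    log_spec = np.log(np.maximum(mag, eps))   # x(k), floored to avoid log(0)
    # ASSUMPTION: bin k belongs to the sub-band with the largest filter
    # response at k; bins covered by no filter fall back to sub-band 0.
    band_of_bin = np.argmax(fb, axis=0)
    # ASSUMPTION: multiplicative weighting of x(k) by alpha(i).
    return np.asarray(alpha)[band_of_bin] * log_spec
```

A usage example: with `alpha` produced per frame by an SNR-based rule, `weight_log_spectrum(mag, fb, alpha)` returns the weighted log spectrum that is passed on to the DCT step.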
6. DCT
A discrete cosine transform (DCT) is applied to the sub-band-weighted log spectrum $\hat{x}(k)$ to obtain the cepstrum parameters $c(n)$ of the speech signal.
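The source does not state which DCT variant is used. A minimal sketch, assuming the unnormalised type-II DCT and a direct matrix implementation (the name `dct2` is hypothetical):

```python
import numpy as np


def dct2(x):
    """Unnormalised DCT-II along the last axis.

    c(n) = sum_k x(k) * cos(pi * n * (2k + 1) / (2K)),  n = 0 .. K-1
    """
    x = np.asarray(x, dtype=float)
    K = x.shape[-1]
    n = np.arange(K)[:, None]   # cepstral (quefrency) index
    k = np.arange(K)[None, :]   # frequency-bin index
    basis = np.cos(np.pi * n * (2 * k + 1) / (2 * K))
    return x @ basis.T
```

The matrix form costs O(K²) per frame, which is acceptable for a few hundred bins; a fast DCT routine would be the usual choice in production code.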
7. Pitch estimation
The clean speech s (n) can be regarded as resulting from the filtering of the glottal excitation signal e (n) by the vocal tract response v (n), i.e.
s(n)=e(n)*v(n) (3)
where the symbol "*" denotes convolution.
After FFT, logarithm, sub-band weighting and DCT are applied to the clean speech $s(n)$, the excitation signal and the vocal tract response are separated in the cepstrum domain:
$$c(n) = \hat{e}(n) + \hat{v}(n) \qquad (4)$$
where $\hat{e}(n)$ and $\hat{v}(n)$ are the cepstrum parameters of the excitation signal and the vocal tract response, respectively.
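The separation described above follows from standard cepstral analysis. A brief derivation, ignoring the sub-band weighting step and treating the convolution theorem as exact for a single frame (in practice windowing makes it approximate), is given below. The symbols $S(k)$, $E(k)$, $V(k)$ for the Fourier transforms and $\mathcal{D}$ for the DCT are notation introduced here, not taken from the source.

```latex
\begin{aligned}
s(n) &= e(n) * v(n) \\
\Rightarrow\quad |S(k)| &= |E(k)|\,|V(k)| \\
\Rightarrow\quad \log|S(k)| &= \log|E(k)| + \log|V(k)| \\
\Rightarrow\quad \mathcal{D}\{\log|S|\}(n) &= \mathcal{D}\{\log|E|\}(n) + \mathcal{D}\{\log|V|\}(n)
\end{aligned}
```

The last step uses only the linearity of the DCT, so the convolution in the time domain becomes a sum in the cepstrum domain.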
In the cepstrum domain, the excitation component $\hat{e}(n)$ is impulse-like: it is non-zero only at integer multiples of the discrete pitch period $N_p$ and equals 0 at every other index $n$. The human pitch period $T_p$ lies between about 2 ms and 20 ms, and the sampling frequency of the system is 8000 Hz, so $N_p$ ranges from about 16 to 160. The vocal tract component $\hat{v}(n)$ usually decays rapidly; outside the interval $[-16, 16]$ its values are already small and can be taken as 0. Pitch estimation therefore only needs to detect the peak of the cepstrum parameters $c(n)$ in $[16, 160]$ to obtain the pitch period estimate:
$$\hat{N}_p = \arg\max_{16 \le n \le 160} c(n) \qquad (5)$$
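The peak search over the range [16, 160] can be sketched as follows. The source mentions smoothing filtering after peak detection without naming the filter, so the 5-frame median filter here is an assumption, as are the function names.

```python
import numpy as np

FS = 8000              # sampling frequency in Hz
N_MIN, N_MAX = 16, 160  # 2 ms and 20 ms expressed in samples at 8000 Hz


def pitch_period_samples(cepstrum):
    """Index of the largest cepstral value in [N_MIN, N_MAX] per frame."""
    c = np.asarray(cepstrum, dtype=float)
    hi = min(N_MAX, c.shape[-1] - 1)   # guard against short cepstra
    return N_MIN + np.argmax(c[..., N_MIN:hi + 1], axis=-1)


def median_smooth(track, width=5):
    """ASSUMED smoother: running median over `width` frames."""
    track = np.asarray(track)
    padded = np.pad(track, width // 2, mode="edge")
    return np.array([np.median(padded[i:i + width])
                     for i in range(len(track))])
```

A detected period of `n` samples corresponds to `n / FS` seconds and a fundamental frequency of `FS / n` Hz; for instance, a peak at index 80 means a 10 ms pitch period, i.e. 100 Hz. The median filter removes isolated halving or doubling errors in the frame-by-frame track.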
Claims (4)
1. A cepstrum domain pitch period estimation method based on subband signal-to-noise ratio weighting is characterized in that: calculating a sub-band weighting coefficient by using the Mel spectrum of the noisy speech, carrying out sub-band weighting on the characteristic parameters of the noisy speech in a logarithmic spectrum domain, carrying out peak value detection in a cepstrum domain, and estimating the pitch period of the noisy speech signal;
the method specifically comprises the following steps:
(1) Interpolating or decimating the input digital speech to fix its sampling frequency at 8000 Hz;
(2) Low-pass filtering the interpolated or decimated digital speech to retain only frequency components below 1000 Hz, then windowing and framing to obtain frame signals;
(3) Performing fast Fourier transform on each frame of voice signal to obtain an amplitude spectrum of each frame of signal;
(4) Mel filtering is carried out on the amplitude spectrum of each frame of signal, logarithm is taken, and a sub-band weighting coefficient is calculated according to the signal-to-noise ratio of each Mel sub-band;
(5) Taking logarithm of the magnitude spectrum of each frame signal to obtain a logarithmic spectrum, and carrying out sub-band weighting to reduce the influence of additive noise on pitch period estimation;
(6) Carrying out discrete cosine transform on the logarithmic spectrum after the subband weighting to obtain cepstrum parameters of the voice signal;
(7) And carrying out peak value detection and smooth filtering on the cepstrum parameters of the voice signals to obtain a pitch period estimated value of the input voice.
2. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 1, characterized in that: in each frame signal, the weighting coefficient α(i) of each Mel sub-band is calculated by the following formula:
[The formula for α(i) was an image in the source and is not reproduced in this text.]
where SNR(i) is the signal-to-noise ratio of the i-th Mel sub-band; SNR_max and SNR_min are the maximum and minimum sub-band signal-to-noise ratios of the current speech frame; α(i) is the weighting coefficient of the i-th Mel sub-band; and the sub-band signal-to-noise ratio SNR(i) is estimated from the energy of the noisy speech in the Mel sub-band and the noise energy, the latter being estimated during silence periods.
3. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 1, characterized in that: the sampling frequency of the input digital speech is fixed to 8000 Hz; if the original sampling frequency of the input digital speech is higher than 8000 Hz, the input digital speech is decimated to 8000 Hz; if the original sampling frequency is lower than 8000 Hz, the input speech is interpolated to 8000 Hz; the interpolated or decimated digital speech is low-pass filtered to retain only frequency components below 1000 Hz; the filtered digital speech is then windowed and framed to obtain frame signals; the window length is 256 and the frame shift is 128.
4. The cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting according to claim 1, characterized in that: first, the logarithm of the magnitude spectrum of each frame signal is taken to obtain the log spectrum; then every noisy-speech log-spectrum value x(k) in the i-th Mel sub-band is weighted with the estimated weighting coefficient α(i):
[The weighting formula was an image in the source and is not reproduced in this text.]
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811035434.XA CN109346106B (en) | 2018-09-06 | 2018-09-06 | Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811035434.XA CN109346106B (en) | 2018-09-06 | 2018-09-06 | Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109346106A (en) | 2019-02-15 |
CN109346106B (en) | 2022-12-06 |
Family
ID=65292452
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811035434.XA | Active | CN109346106B (en) | 2018-09-06 | 2018-09-06 | Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109346106B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1397929A (en) * | 2002-07-12 | 2003-02-19 | Tsinghua University | Speech enhancement, feature weighting and logarithmic spectrum addition method for anti-noise speech recognition |
CN1431650A (en) * | 2003-02-21 | 2003-07-23 | Tsinghua University | Anti-noise speech recognition method based on weighted local energy |
CN1702736A (en) * | 2001-08-31 | 2005-11-30 | Kenwood Corporation | Apparatus and method for generating pitch waveform signal and apparatus and method for compressing/decompressing and synthesizing speech signal using the same |
CN103021405A (en) * | 2012-12-05 | 2013-04-03 | Bohai University | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter |
CN105448297A (en) * | 2014-08-28 | 2016-03-30 | China Mobile Communications Group Co., Ltd. | Method and device for acquiring pitch period |
CN106373559A (en) * | 2016-09-08 | 2017-02-01 | Hohai University | Robust feature extraction method based on logarithmic spectrum signal-to-noise ratio weighting |
Non-Patent Citations (2)
Title |
---|
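Performance comparison of several pitch period algorithms; Li Juan; Journal of Yuncheng University; 2010-04-30; Vol. 28 (No. 2); p. 37 * |
Feature extraction for speaker recognition systems in noisy environments; Wang Lei; Computer Knowledge and Technology; 2008-08-05 (No. 22); pp. 784, 785, 824 * |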
Also Published As
Publication number | Publication date |
---|---|
CN109346106A (en) | 2019-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103854662B (en) | Adaptive voice detection method based on multiple domain Combined estimator | |
JP4440937B2 (en) | Method and apparatus for improving speech in the presence of background noise | |
US8073689B2 (en) | Repetitive transient noise removal | |
US6122610A (en) | Noise suppression for low bitrate speech coder | |
CN109410977B (en) | Voice segment detection method based on MFCC similarity of EMD-Wavelet | |
EP2031583B1 (en) | Fast estimation of spectral noise power density for speech signal enhancement | |
KR101266894B1 (en) | Apparatus and method for processing an audio signal for speech emhancement using a feature extraxtion | |
US6453289B1 (en) | Method of noise reduction for speech codecs | |
KR101737824B1 (en) | Method and Apparatus for removing a noise signal from input signal in a noisy environment | |
CN103544961B (en) | Audio signal processing method and device | |
EP1386313B1 (en) | Speech enhancement device | |
CN111292758B (en) | Voice activity detection method and device and readable storage medium | |
Morales-Cordovilla et al. | Feature extraction based on pitch-synchronous averaging for robust speech recognition | |
Tan et al. | Noise-robust F0 estimation using SNR-weighted summary correlograms from multi-band comb filters | |
Amehraye et al. | Perceptual improvement of Wiener filtering | |
Nelke | Wind noise reduction: signal processing concepts | |
CN112233657B (en) | Speech enhancement method based on low-frequency syllable recognition | |
CN109346106B (en) | Cepstrum domain pitch period estimation method based on sub-band signal-to-noise ratio weighting | |
Elshamy et al. | Two-stage speech enhancement with manipulation of the cepstral excitation | |
Bai et al. | Two-pass quantile based noise spectrum estimation | |
Sunnydayal et al. | Speech enhancement using sub-band wiener filter with pitch synchronous analysis | |
Farahani et al. | Robust feature extraction of speech via noise reduction in autocorrelation domain | |
CN114822577B (en) | Method and device for estimating fundamental frequency of voice signal | |
Han et al. | Noise reduction for VoIP speech codecs using modified Wiener Filter | |
Kim et al. | Speech enhancement of noisy speech using log-spectral amplitude estimator and harmonic tunneling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||