CN104183233A - Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds - Google Patents
- Publication number
- CN104183233A (application number CN201410457379.9A)
- Authority
- CN
- China
- Prior art keywords
- frame
- amplitude
- frequency
- voice
- harmonic wave
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrophonic Musical Instruments (AREA)
Abstract
When a sinusoidal model is used to process the periodic component of speech at consonant-vowel junctions, a gradient descent algorithm refines the sinusoidal model parameters obtained by the short-time Fourier transform or similar methods. The sinusoidal model then fits the speech at consonant-vowel junctions more accurately, the periodic component is extracted with higher quality, and the naturalness of consonant-vowel junctions in synthesized speech is improved.
Description
Technical field
The invention belongs to the field of speech signal processing, in particular to speech synthesis based on the sinusoidal model.
Background technology
One common way to model speech today is the sinusoidal model. Sinusoidal model theory holds that any waveform can be represented as a superposition of several sine waves, so all waveforms can be written in one unified functional form. In practice cosine functions are normally used:
$$s_i(n) = \sum_{k=1}^{N} A_i^k(n)\,\cos\theta_i^k(n)$$
Where:
$A_i^k(n)$ is the amplitude of the $k$-th cosine wave in frame $i$ at time $n$;
$\theta_i^k(n)$ is the instantaneous phase in frame $i$ at time $n$, which can be expressed as $\theta_i^k(n) = 2\pi f_i^k n / f_s + \varphi_i^k$, where $f_i^k$ is the instantaneous frequency, $f_s$ is the sampling frequency, and $\varphi_i^k$ is the initial phase;
$N$ is the number of sine waves.
Speech synthesis usually adopts the harmonic sinusoidal model, in which every frequency is an integer multiple of the fundamental frequency $f_{0,i}$ of the frame: $f_i^k = k\,f_{0,i}$.
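The harmonic sinusoidal model described above can be sketched in a few lines of Python. This is only an illustration: the function name `harmonic_frame`, the use of NumPy, and the choice of constant amplitudes within one frame are assumptions, not part of the patent text.

```python
import numpy as np

def harmonic_frame(f0, amps, phases, fs, num_samples):
    """Sum of cosines at integer multiples of f0 (harmonic sinusoidal model).

    Assumption: amplitudes and phases are constant within the frame.
    """
    n = np.arange(num_samples)
    frame = np.zeros(num_samples)
    for k, (a, phi) in enumerate(zip(amps, phases), start=1):
        # k-th harmonic: amplitude a, frequency k * f0, initial phase phi
        frame += a * np.cos(2.0 * np.pi * k * f0 * n / fs + phi)
    return frame

# Three harmonics of a 200 Hz fundamental sampled at 16 kHz.
frame = harmonic_frame(200.0, [1.0, 0.5, 0.25], [0.0, 0.3, -0.7], 16000, 512)
```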
When speech is modeled with a sinusoidal model, the more accurate the model parameters, the better the model fits the original speech. The most common way to obtain sinusoidal model parameters is the short-time Fourier transform (STFT). The STFT has an inherent limitation, however: when the speech signal is windowed for analysis, a shorter window gives a less accurate frequency analysis and a longer window gives a more accurate one. Time precision and frequency precision therefore cannot both be achieved. In speech processing the STFT suits signals whose frequency and amplitude change slowly; for signals whose frequency or amplitude changes sharply it often fails to reach high precision.
At a consonant-vowel junction the amplitude of the periodic component of speech changes sharply, so an STFT-based analysis often cannot fit the periodic component accurately.
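The time/frequency trade-off of the STFT can be made concrete with a small calculation. The sketch below assumes a 16 kHz sampling rate and three example window lengths; the function name `stft_resolution` is illustrative only.

```python
def stft_resolution(fs, window_len):
    """Return (window duration in ms, frequency bin spacing in Hz)."""
    duration_ms = 1000.0 * window_len / fs
    bin_spacing_hz = fs / window_len
    return duration_ms, bin_spacing_hz

# Longer windows give finer frequency bins but blur fast amplitude changes.
for window_len in (256, 1024, 4096):
    duration_ms, bin_hz = stft_resolution(16000, window_len)
    print(f"N={window_len}: {duration_ms:.1f} ms window, {bin_hz:.3f} Hz per bin")
```

A 1024-sample window at 16 kHz spans 64 ms, which is long compared with the rapid amplitude change at a consonant-vowel junction.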
Summary of the invention
Object of the invention: to extract accurately the periodic component of speech at consonant-vowel junctions. The invention uses a gradient descent algorithm to correct iteratively the imprecise speech sinusoidal model parameters obtained by the short-time Fourier transform or other methods, yielding higher-precision sinusoidal model parameters so that the model fits the original speech better.
The steps adopted to achieve this object are:
Step 1: input the speech signal and analyze the voicing onset and the fundamental frequency curve;
Step 2: perform harmonic-plus-noise analysis on the input speech signal: apply the short-time Fourier transform (STFT) to the whole signal, perform peak detection on the short-time amplitude spectrum with reference to the fundamental frequency to obtain a preliminary estimate of the frequency and amplitude of each harmonic in each analysis frame, and obtain the phase of each harmonic from the phase spectrum;
Step 3: from the voicing onset back toward the beginning of the speech signal, analyze frame by frame in reverse, using the gradient descent algorithm to compute accurately the harmonic amplitudes and phases in each analysis frame;
Step 4: adjust the frequency of each harmonic according to the amplitudes and phases computed accurately in Step 3.
Use: in the harmonic-plus-noise model, when the noise (voiceless consonants and breath sounds) is obtained by removing the periodic component from the speech, the method markedly reduces the vowel residue left in the voiceless consonants and breath sounds. It thereby overcomes two problems: the auditory character of voiceless consonants and breath sounds is poorly restored, and the harmonic analysis error near the voicing onset is large.
Description of the drawings
Fig. 1 is the block diagram of the algorithm of the invention for accurately extracting the periodic component at consonant-vowel junctions based on the sinusoidal model.
Fig. 2 is the block diagram of the algorithm that uses gradient descent to optimize the harmonic amplitudes and phases of a given analysis frame i.
Fig. 3 is an example of the input speech time-domain waveform of one analysis frame.
Fig. 4 is the initial state of the gradient descent fit to the speech waveform. The red line is the target waveform and the green line is the periodic component waveform being optimized.
Fig. 5 to Fig. 7 compare the periodic component waveform (green) generated by gradient descent after 15, 50 and 100 iterations, respectively, with the target waveform (red).
Embodiment
The overall flow of the algorithm is shown in Fig. 1.
One. The input speech signal $x(n)$ is divided into frames with a rectangular window, producing speech segments $x_i(n)$ such as the one shown in Fig. 3. The window length $N$ is an integer power of 2, and the window hop $H$ (the interval between successive windows) is an integer power of 2 smaller than $N$. The best results are obtained when $N$ satisfies the following condition:
where fs is the sampling rate in hertz.
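The framing step can be sketched as follows. The function name `split_frames` and the example values (window length 1024, hop 256) are assumptions chosen only to satisfy the power-of-2 constraints stated above; the patent's preferred relation between the window length and fs is not reproduced here.

```python
import numpy as np

def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames with a rectangular window.

    Assumes len(signal) >= frame_len; trailing samples that do not fill
    a whole frame are dropped.
    """
    signal = np.asarray(signal)
    num_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(num_frames)])

# Example: 2048 samples, frame length 1024, hop 256 -> 5 frames.
frames = split_frames(np.arange(2048), frame_len=1024, hop=256)
```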
Two. Use the YIN algorithm to analyze the fundamental frequency of each frame, as follows:
(1) For frame $i$, compute the similarity (difference) function over the lag $n$:
$$d_i(n) = \sum_{j} \bigl(x_i(j) - x_i(j+n)\bigr)^2$$
where $x_i$ is the speech segment of frame $i$ and the sum runs over the integration window.
(2) Compute the cumulative-mean-normalized similarity function:
$$d'_i(n) = \begin{cases} 1, & n = 0 \\ d_i(n) \Big/ \Bigl[\tfrac{1}{n}\sum_{j=1}^{n} d_i(j)\Bigr], & n > 0 \end{cases}$$
(3) Find the lag $n$ such that: 1. $d'_i(n)$ is a local minimum; 2. $d'_i(n) < T$; 3. $n$ is the smallest value meeting these conditions. The threshold $T$ works best between 0.1 and 0.3. If no $n$ meets these conditions, take the $n$ at which $d'_i(n)$ is the global minimum;
(4) From $n$ and the sampling rate $f_s$, compute the short-time fundamental frequency:
$$f_{0,i} = f_s / n$$
The short-time fundamental frequency of each frame obtained by the YIN algorithm is denoted $f_{0,i}$ below. For unvoiced frames, which have no periodic component, YIN produces an erroneous fundamental frequency estimate, but this does not affect the execution of the method of the invention.
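The four YIN steps can be sketched as below. This follows the commonly published form of the algorithm with integer lags only; the function name `yin_f0`, the default `max_lag` of half the frame, and the absence of parabolic interpolation are simplifying assumptions, not details taken from the patent.

```python
import numpy as np

def yin_f0(frame, fs, threshold=0.2, max_lag=None):
    """Estimate the fundamental frequency of one frame (integer-lag YIN sketch)."""
    frame = np.asarray(frame, dtype=float)
    if max_lag is None:
        max_lag = len(frame) // 2          # assumed search range
    w = len(frame) - max_lag               # samples compared at every lag
    # (1) difference function d(lag)
    d = np.array([np.sum((frame[:w] - frame[lag:lag + w]) ** 2)
                  for lag in range(max_lag + 1)])
    # (2) cumulative-mean-normalized difference, with d_norm[0] = 1
    d_norm = np.ones_like(d)
    lags = np.arange(1, max_lag + 1)
    d_norm[1:] = d[1:] * lags / np.maximum(np.cumsum(d[1:]), 1e-12)
    # (3) smallest lag that is a local minimum below the threshold
    for lag in range(2, max_lag):
        if (d_norm[lag] < threshold
                and d_norm[lag] <= d_norm[lag - 1]
                and d_norm[lag] <= d_norm[lag + 1]):
            return fs / lag                # (4) lag -> frequency
    # Fallback: global minimum of the normalized function
    lag = 1 + int(np.argmin(d_norm[1:]))
    return fs / lag

# A 200 Hz sine at 16 kHz has a period of exactly 80 samples.
test_frame = np.sin(2.0 * np.pi * 200.0 * np.arange(1024) / 16000)
f0 = yin_f0(test_frame, 16000)
```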
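Three. From the short-time fundamental frequency of each frame, determine the approximate voicing onset frame $i_s$. It is the smallest frame index for which the threshold condition on the fundamental frequency holds for j = 1, 2, 3 and 4. The threshold ranges from 10 to 40 Hz.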
Four. Apply a Hanning (Hann) window to each speech segment $x_i(n)$ and carry out the short-time Fourier transform (STFT) to obtain the amplitude spectrum and the phase spectrum of each frame.
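A per-frame version of this step might look like the sketch below. The function name `frame_spectrum` is hypothetical, and using NumPy's real FFT on each windowed frame is one reasonable way to realize the STFT, not necessarily the one the inventors used.

```python
import numpy as np

def frame_spectrum(frame, fs):
    """Hann-window one frame and return (bin frequencies, amplitude, phase)."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return freqs, np.abs(spectrum), np.angle(spectrum)

# A 1000 Hz cosine at 16 kHz falls exactly on bin 64 of a 1024-point frame.
n = np.arange(1024)
freqs, amplitude, phase = frame_spectrum(np.cos(2.0 * np.pi * 1000.0 * n / 16000), 16000)
peak_bin = int(np.argmax(amplitude))
```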
Five. Starting from the voicing onset frame and incrementing the frame index $i$, carry out the forward sinusoidal-model analysis of the periodic component, finding the harmonics with the following algorithm:
(1) Set the harmonic count $h = 1$.
(2) In the amplitude spectrum of frame $i$, within the frequency range assigned to harmonic $h$, find the maximum amplitude and the frequency and phase corresponding to that maximum. Assign the frequency to $f_i^h$, the normalized amplitude to $A_i^h$, and the phase to $\varphi_i^h$.
(3) Add 1 to $h$ and repeat steps (2) and (3) until the harmonic frequency exceeds the upper frequency limit, which is generally greater than 5000 Hz.
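The harmonic search can be sketched as follows. Two details are assumptions, because the corresponding formulas are not legible in the source text: the search band for harmonic h is taken as half a fundamental on either side of h times f0, and the amplitude is normalized by 2 divided by the sum of the window so that a unit-amplitude cosine yields roughly 1. The function name `pick_harmonics` is also hypothetical.

```python
import numpy as np

def pick_harmonics(frame, fs, f0, f_max=5000.0):
    """Return a list of (frequency, amplitude, phase), one entry per harmonic."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    mag = np.abs(spectrum) * 2.0 / np.sum(window)   # assumed normalization
    phase = np.angle(spectrum)
    harmonics = []
    h = 1
    while h * f0 <= f_max:
        # Assumed search band: [(h - 0.5) * f0, (h + 0.5) * f0)
        band = np.flatnonzero((freqs >= (h - 0.5) * f0) & (freqs < (h + 0.5) * f0))
        if band.size == 0:
            break
        k = band[np.argmax(mag[band])]              # strongest bin in the band
        harmonics.append((freqs[k], mag[k], phase[k]))
        h += 1
    return harmonics

# Two harmonics of 250 Hz, both on exact FFT bins for N = 1024, fs = 16 kHz.
n = np.arange(1024)
demo = np.cos(2 * np.pi * 250 * n / 16000) + 0.5 * np.cos(2 * np.pi * 500 * n / 16000)
found = pick_harmonics(demo, 16000, f0=250.0, f_max=600.0)
```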
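Six. Starting from the voicing onset frame and decrementing the frame index $i$, carry out the reverse sinusoidal-model analysis of the periodic component:
(1) Set the harmonic count $h = 1$.
(2) As in step (2) of Step Five, find $f_i^h$, $A_i^h$ and $\varphi_i^h$.
(3) If the frequency of harmonic $h$ is unstable between adjacent frames, that is, if either of the two threshold conditions is met, reset the parameters of that harmonic, setting its amplitude $A_i^h$ to 0. The threshold in this step works best between 30 and 50 Hz.
(4) Add 1 to $h$ and repeat steps (2) and (3) until the harmonic frequency exceeds the upper frequency limit, which is generally greater than 5000 Hz.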
Seven. As shown in the flow of Fig. 2, use the gradient descent algorithm to optimize the harmonic amplitudes $A_i^h$ and phases $\varphi_i^h$ of each frame before the voicing onset frame, iterating the following steps:
(1) Reconstruct the periodic component $y_i(n)$ from the current harmonic parameters and compute the squared difference $E_i$ between it and the speech segment $x_i(n)$, where $K$ is the number of harmonics.
(2) For each harmonic $h = 1, \dots, K$, compute the partial derivative of $E_i$ with respect to the amplitude $A_i^h$.
(3) For each harmonic $h = 1, \dots, K$, compute the partial derivative of $E_i$ with respect to the phase $\varphi_i^h$.
(4) Using the partial derivatives obtained in steps (2) and (3), update $A_i^h$ and $\varphi_i^h$ with step size $\mu$.
With $\mu$ between 0.2 and 0.5 and 100 iterations, the gradient descent fit gives good results. The initial state and the states after the 15th, 50th and 100th iterations are shown in Fig. 4, 5, 6 and 7, respectively. The red line is the waveform of $x_i(n)$; the green line is the waveform of $y_i(n)$.
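The iteration can be sketched as a plain least-squares gradient descent. The exact error and update formulas of the patent are not legible in the source text, so the sketch assumes a mean squared error (the squared difference divided by the frame length) and fixed harmonic frequencies; the function name `refine_harmonics` is hypothetical.

```python
import numpy as np

def refine_harmonics(target, freqs, amps, phases, fs, step=0.3, iterations=100):
    """Gradient descent on harmonic amplitudes and phases for one frame."""
    target = np.asarray(target, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    amps = np.array(amps, dtype=float)       # copies, updated in place below
    phases = np.array(phases, dtype=float)
    n = np.arange(len(target))
    base = 2.0 * np.pi * np.outer(freqs, n) / fs      # shape (K, N)
    for _ in range(iterations):
        arg = base + phases[:, None]
        model = amps @ np.cos(arg)                    # reconstructed periodic part
        residual = target - model
        # Assumed error: E = mean(residual ** 2)
        grad_amp = -2.0 * np.mean(residual * np.cos(arg), axis=1)
        grad_phase = 2.0 * amps * np.mean(residual * np.sin(arg), axis=1)
        amps -= step * grad_amp
        phases -= step * grad_phase
    return amps, phases

# Recover known parameters from a rough starting point.
fs = 16000
n = np.arange(1024)
target = (0.8 * np.cos(2 * np.pi * 200 * n / fs + 0.5)
          + 0.4 * np.cos(2 * np.pi * 400 * n / fs - 0.3))
amps, phases = refine_harmonics(target, [200.0, 400.0], [0.5, 0.5], [0.0, 0.0], fs)
```

With this normalization the amplitude update contracts by roughly a factor of (1 - step) per iteration, which is consistent with step sizes of a few tenths and on the order of 100 iterations; weak harmonics converge more slowly in phase.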
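Eight. Using the optimized phases $\varphi_i^h$, correct the harmonic frequencies $f_i^h$, and output $f_i^h$, $A_i^h$ and $\varphi_i^h$:
(1) Compute the window shift time (the time between successive analysis windows).
(2) Compute the phase change.
(3) Estimate the number of cycles $n$.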
(4) recalculate
,
(5) Correct the harmonic frequency.
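A standard way to refine a frequency from the phases of two successive frames is sketched below. It mirrors the five-step outline above but is not taken from the patent's formulas, which are not legible in the source text; the function name `refine_frequency` and the rounding rule for the cycle count are assumptions.

```python
import numpy as np

def refine_frequency(f_est, phase_prev, phase_next, hop, fs):
    """Correct a rough frequency estimate using the phase advance over one hop."""
    dt = hop / fs                                          # window shift time
    dphi = phase_next - phase_prev                         # measured phase change
    cycles = np.round(f_est * dt - dphi / (2.0 * np.pi))   # whole cycles in dt
    total = dphi + 2.0 * np.pi * cycles                    # unwrapped phase change
    return total / (2.0 * np.pi * dt)                      # corrected frequency

# True frequency 203 Hz, rough estimate 200 Hz, hop 256 samples at 16 kHz.
true_f, hop, fs = 203.0, 256, 16000
phase_a = 0.4
phase_b = float(np.angle(np.exp(1j * (phase_a + 2.0 * np.pi * true_f * hop / fs))))
corrected = refine_frequency(200.0, phase_a, phase_b, hop, fs)
```

The correction is unambiguous only while the rough estimate is within half a cycle per hop of the true value, about 31 Hz for the example values.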
Claims (4)
1. A method, based on a sinusoidal model, for improving the extraction quality of the periodic component at consonant-vowel junctions of speech, characterized by comprising the following steps:
Step 1: input the speech signal and analyze the voicing onset and the fundamental frequency curve;
Step 2: perform harmonic-plus-noise analysis on the input speech signal: apply the short-time Fourier transform (STFT) to the whole signal, perform peak detection on the short-time amplitude spectrum with reference to the fundamental frequency to obtain a preliminary estimate of the frequency and amplitude of each harmonic in each analysis frame, and obtain the phase of each harmonic from the phase spectrum;
Step 3: from the voicing onset back toward the beginning of the speech signal, analyze frame by frame in reverse, using the gradient descent algorithm to compute accurately the harmonic amplitudes and phases in each analysis frame;
Step 4: adjust the frequency of each harmonic according to the amplitudes and phases computed accurately in Step 3.
2. The method for improving the extraction quality of the periodic component at consonant-vowel junctions of speech according to claim 1, wherein Step 2 is characterized in that: when harmonic frequencies and amplitudes are obtained from the spectrum in reverse, starting from the voicing onset, the amplitude of any harmonic whose frequency is unstable between two adjacent frames is set to 0. If either of two threshold conditions is met, the parameters of the harmonic are set accordingly; the threshold ranges from 30 to 50 Hz. The quantities involved are the fundamental frequency of frame i, the frequency, amplitude and phase of the h-th harmonic of frame i, and the phase spectrum of frame i.
3. The method for improving the extraction quality of the periodic component at consonant-vowel junctions of speech according to claim 1, wherein Step 3 is characterized in that: when the gradient descent algorithm is used to compute accurately the harmonic amplitudes and phases of each frame, the amplitude and phase of each harmonic are updated by a formula whose quantities are the number of harmonics, the analysis window length used in Step 2, the sampling frequency, the speech signal segment of frame i before windowing, and the reconstructed speech signal segment. The step size of the update ranges from 0.2 to 0.5.
4. The method for improving the extraction quality of the periodic component at consonant-vowel junctions of speech according to claim 1 and, further, claim 3, wherein Step 3 is further characterized in that the harmonic frequencies are corrected according to the phases optimized in Step 3, with the following formulas and order of computation:
(1)
(2)
(3)
(4)
(5)
Here the window shift is expressed as the number of samples between successive analysis windows in Step 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410457379.9A CN104183233A (en) | 2014-09-10 | 2014-09-10 | Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104183233A true CN104183233A (en) | 2014-12-03 |
Family
ID=51964224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410457379.9A Pending CN104183233A (en) | 2014-09-10 | 2014-09-10 | Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104183233A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105679331A (en) * | 2015-12-30 | 2016-06-15 | 广东工业大学 | Sound-breath signal separating and synthesizing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20141203 |