CN104183233A - Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds - Google Patents

Info

Publication number
CN104183233A
Authority
CN
China
Prior art keywords
frame
amplitude
frequency
voice
harmonic wave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410457379.9A
Other languages
Chinese (zh)
Inventor
华侃如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201410457379.9A
Publication of CN104183233A
Legal status: Pending

Landscapes

  • Electrophonic Musical Instruments (AREA)

Abstract

When a sinusoidal model is used to process the periodic components of speech at consonant-vowel transitions, a gradient descent algorithm is used to optimize the sinusoidal model parameters obtained by the short-time Fourier transform or similar methods. The sinusoidal model then fits the speech at consonant-vowel transitions more accurately, the periodic components are extracted with higher quality, and the naturalness of consonant-vowel transitions in synthesized speech is effectively improved.

Description

Method for improving the extraction quality of periodic components at consonant-vowel transitions of speech
Technical field
The invention belongs to the field of speech signal processing, and in particular to speech synthesis based on the sinusoidal model.
Background art
One of the most common ways to model speech today is the sinusoidal model. Sinusoidal-model theory holds that any waveform can be represented as a superposition of several sine waves, so that all waveforms share one unified functional form. In practice the cosine function is normally used:
$$s_i(n) = \sum_{k=1}^{N} A_{i,k}(n)\,\cos\theta_{i,k}(n)$$
where:
$A_{i,k}(n)$ is the amplitude of the k-th cosine wave in frame i at time n;
$\theta_{i,k}(n)$ is the instantaneous phase in frame i at time n, which can be written as $\theta_{i,k}(n) = 2\pi f_{i,k}\,n / f_s + \varphi_{i,k}$, where $f_{i,k}$ is the instantaneous frequency, $f_s$ is the sampling frequency, and $\varphi_{i,k}$ is the initial phase;
N is the number of sinusoids.
Speech synthesis usually adopts the harmonic sinusoidal model, in which every sinusoid frequency is an integer multiple of the fundamental frequency of the frame.
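As a rough illustration of the harmonic sinusoidal model described above, the following Python sketch sums cosines at integer multiples of a fundamental frequency. The function name `synth_harmonic_frame`, its parameter names, and the choice of amplitudes that stay constant within a frame are assumptions made for this example, not details taken from the patent.

```python
import math

def synth_harmonic_frame(f0, amps, phases, fs, length):
    """Synthesize one frame as a sum of cosines at h * f0, h = 1, 2, ...

    amps[h-1] and phases[h-1] are the amplitude and initial phase of
    harmonic h (assumed constant over the frame for this sketch).
    """
    frame = []
    for n in range(length):
        sample = 0.0
        for h, (a, phi) in enumerate(zip(amps, phases), start=1):
            sample += a * math.cos(2.0 * math.pi * h * f0 * n / fs + phi)
        frame.append(sample)
    return frame

# Two harmonics of a 200 Hz fundamental at an 8 kHz sampling rate.
frame = synth_harmonic_frame(200.0, [1.0, 0.5], [0.0, 0.0], 8000.0, 41)
```

With these values one period of the fundamental spans 40 samples, so sample 40 repeats sample 0.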
When a sinusoidal model is used to model speech, the more accurate the model parameters, the better the model fits the original speech. The most common way to obtain sinusoidal model parameters is the short-time Fourier transform (STFT). The STFT has an inherent limitation, however: when the speech signal is windowed and analyzed, a shorter window gives a less precise frequency analysis and a longer window gives a more precise one, so high time resolution and high frequency resolution cannot be obtained together. In speech processing the STFT therefore suits signals whose frequency and amplitude change slowly, and it often fails to reach high precision on signals whose frequency or amplitude changes rapidly.
The amplitude of the periodic component changes rapidly at consonant-vowel transitions, so an STFT-based analysis often cannot fit the periodic component accurately there.
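The trade-off described above can be made concrete with a little arithmetic: the spacing between STFT bins is fs / N, while the window covers N / fs seconds. The sketch below only evaluates these two standard relations; the helper name `stft_resolution` and the 44.1 kHz sampling rate are illustrative assumptions.

```python
def stft_resolution(win_len, fs):
    """Return (bin spacing in Hz, window duration in seconds) for one frame."""
    return fs / win_len, win_len / fs

# Longer windows narrow the bin spacing but smear events over more time.
for win_len in (256, 1024, 4096):
    bin_hz, duration = stft_resolution(win_len, 44100.0)
    print(f"N={win_len:5d}  bin spacing={bin_hz:8.2f} Hz  window={duration * 1000:6.2f} ms")
```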
Summary of the invention
The object of the invention is to extract the periodic component of speech accurately at consonant-vowel transitions. The invention uses a gradient descent algorithm to iteratively correct speech sinusoidal model parameters that were obtained by the short-time Fourier transform or another method and are not yet accurate enough, yielding higher-precision parameters so that the model fits the original speech better.
To achieve this object, the invention adopts the following steps:
Step 1: input the speech signal and analyze the voicing onset and the fundamental-frequency curve;
Step 2: perform harmonic-plus-noise analysis on the input speech signal: apply the short-time Fourier transform (STFT) to the whole signal and, guided by the fundamental frequency, detect peaks in the short-time amplitude spectrum to obtain a preliminary estimate of the frequency and amplitude of each harmonic in the analysis frame, and read the phase of each harmonic from the phase spectrum;
Step 3: from the voicing onset back to the beginning of the speech signal, analyze frame by frame in reverse, using the gradient descent algorithm to compute the harmonic amplitudes and phases of each analysis frame accurately;
Step 4: adjust the frequency of each harmonic according to the amplitudes and phases computed in Step 3.

Purpose: in the harmonic-plus-noise model, the noise part (unvoiced consonants and breath sounds) is obtained by removing the periodic component from the speech. The method markedly reduces the vowel residue left in unvoiced consonants and breath sounds, and so overcomes two problems: the auditory character of unvoiced consonants and breath sounds is poorly reproduced, and the frequency-analysis error near the voicing onset is large.
Description of the drawings
Fig. 1 is a block diagram of the algorithm by which the invention accurately extracts the periodic component at consonant-vowel transitions based on the sinusoidal model.
Fig. 2 is a block diagram of the algorithm by which the invention uses gradient descent to optimize the harmonic amplitudes and phases of a given analysis frame i.
Fig. 3 is an example of the input speech time-domain waveform of an analysis frame.
Fig. 4 is the initial state of the gradient descent fit to the speech waveform. The red line is the target waveform and the green line is the periodic-component waveform to be optimized.
Fig. 5 to Fig. 7 compare the periodic-component waveform (green) generated by the gradient descent algorithm after 15, 50 and 100 iterations, respectively, with the target waveform (red).
Embodiment
The overall flow of the algorithm is shown in Fig. 1.
One. Input the speech signal, divide it into frames, and apply a rectangular window to produce speech segments such as the one shown in Fig. 3. The window length N is an integer power of 2, and the window hop is an integer power of 2 smaller than N. The best results are obtained when N satisfies the following condition:
[condition missing]
where fs is the sampling rate in hertz.
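The framing in this step can be sketched as follows. The helper name `split_frames` is hypothetical, and because the condition on N is not available in this text, the sketch only enforces the two constraints that are stated: both lengths are powers of 2 and the hop is smaller than the window.

```python
def split_frames(signal, win_len, hop):
    """Cut a signal into overlapping frames (rectangular window = plain slicing)."""
    def is_pow2(value):
        return value > 0 and value & (value - 1) == 0

    if not (is_pow2(win_len) and is_pow2(hop) and hop < win_len):
        raise ValueError("win_len and hop must be powers of 2 with hop < win_len")
    # Only full frames are kept; a trailing partial frame is dropped.
    return [signal[start:start + win_len]
            for start in range(0, len(signal) - win_len + 1, hop)]

frames = split_frames(list(range(10)), win_len=4, hop=2)
```

Dropping the trailing partial frame is a simplification; zero padding would be an equally reasonable choice.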
Two. Use the YIN algorithm to analyze the fundamental frequency of each frame, as follows:
(1) For frame i, compute the difference function (rendered as "similarity function" in the translation):
$$d_i(n) = \sum_{j} \bigl(x_i(j) - x_i(j+n)\bigr)^2$$
where $x_i(j)$ is sample j of frame i.
(2) Compute the cumulative mean normalized difference function:
$$d'_i(n) = \begin{cases} 1, & n = 0 \\ d_i(n) \Big/ \Bigl[\tfrac{1}{n}\sum_{j=1}^{n} d_i(j)\Bigr], & n > 0 \end{cases}$$
(3) Find the lag n that satisfies: 1. $d'_i(n)$ is a local minimum; 2. $d'_i(n) < T$; 3. n is the smallest value meeting the two conditions above. The threshold T works best between 0.1 and 0.3. If no n meets these conditions, take the n at which $d'_i(n)$ reaches its global minimum.
(4) Compute the short-time fundamental frequency from n and the sampling rate: $f_0(i) = f_s / n$.
The short-time fundamental frequency of frame i obtained by the YIN algorithm is denoted $f_0(i)$ below. For unvoiced frames, which have no periodic component, the YIN algorithm produces an incorrect fundamental-frequency estimate, but this does not affect the execution of the method of the invention.
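The four sub-steps above can be sketched in plain Python as shown below. This is a simplified reading of YIN under stated assumptions: the function name `yin_f0` is hypothetical, the difference function is summed over the first half of the frame, lags are searched up to half the frame length, and no parabolic interpolation of the lag is performed.

```python
import math

def yin_f0(frame, fs, threshold=0.2):
    """Estimate the fundamental frequency of one frame with a basic YIN search."""
    half = len(frame) // 2
    # (1) Difference function for lags 1 .. half-1.
    diff = [0.0] * half
    for lag in range(1, half):
        diff[lag] = sum((frame[j] - frame[j + lag]) ** 2 for j in range(half))
    # (2) Cumulative mean normalized difference function.
    cmnd = [1.0] * half
    running = 0.0
    for lag in range(1, half):
        running += diff[lag]
        cmnd[lag] = diff[lag] * lag / running if running > 0.0 else 1.0
    # (3) Smallest lag that is a local minimum below the threshold.
    for lag in range(2, half - 1):
        if (cmnd[lag] < threshold and cmnd[lag] <= cmnd[lag - 1]
                and cmnd[lag] <= cmnd[lag + 1]):
            return fs / lag  # (4) lag in samples -> frequency in Hz
    # Fallback: global minimum of the normalized function.
    best = min(range(1, half), key=lambda lag: cmnd[lag])
    return fs / best

# A 200 Hz cosine sampled at 8 kHz has a period of exactly 40 samples.
test_frame = [math.cos(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(256)]
estimate = yin_f0(test_frame, 8000.0)
```

Because the lag is an integer number of samples, the estimate is quantized to fs / n; this matches sub-step (4) as written.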
Three. From the short-time fundamental frequencies, determine the approximate voicing onset frame: it is the smallest frame index for which the following condition holds for j = 1, 2, 3, 4:
[condition missing]
The threshold in this condition ranges from 10 to 40 hertz.
Four. Apply a Hann (Hanning) window to each frame and perform the short-time Fourier transform (STFT) to obtain the amplitude spectrum and the phase spectrum.
Five. Starting from the voicing onset frame and increasing i frame by frame, perform the forward sinusoidal-model analysis of the periodic component, finding the harmonics as follows:
(1) Set the harmonic counter h = 1.
(2) Within the following frequency range of the amplitude spectrum, find the maximum amplitude and the frequency and phase at that maximum:
[frequency range missing]
Assign the frequency to the frequency of harmonic h of frame i, the normalized amplitude to its amplitude, and the phase to its phase.
(3) Increase h by 1 and repeat from sub-step (2) until the harmonic frequency exceeds the upper frequency limit, which is generally set above 5000 hertz.
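The harmonic search in this step might look like the sketch below. Several details are assumptions, because the frequency range is not available in this text: the search band for harmonic h is taken as (h ± 0.5) times the fundamental, the amplitude is normalized by 4/N to undo the Hann window gain on a one-sided spectrum, and a naive DFT stands in for an FFT so the example needs only the standard library. The names `hann_spectrum` and `pick_harmonics` are hypothetical.

```python
import cmath
import math

def hann_spectrum(frame):
    """One-sided DFT of a Hann-windowed frame (naive O(N^2) DFT, demo only)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * k / n))
                for k, x in enumerate(frame)]
    return [sum(w * cmath.exp(-2j * math.pi * b * k / n)
                for k, w in enumerate(windowed))
            for b in range(n // 2 + 1)]

def pick_harmonics(spectrum, f0, fs, n_fft, f_max=5000.0):
    """Return (frequency, amplitude, phase) of the peak found near each h * f0."""
    bin_hz = fs / n_fft
    harmonics = []
    h = 1
    while h * f0 <= f_max:
        # Assumed search band: half a fundamental on either side of h * f0.
        lo = max(1, math.ceil((h - 0.5) * f0 / bin_hz))
        hi = min(len(spectrum) - 1, math.floor((h + 0.5) * f0 / bin_hz))
        if lo > hi:
            break
        peak = max(range(lo, hi + 1), key=lambda b: abs(spectrum[b]))
        amplitude = 4.0 * abs(spectrum[peak]) / n_fft  # assumed normalization
        harmonics.append((peak * bin_hz, amplitude, cmath.phase(spectrum[peak])))
        h += 1
    return harmonics

# 250 Hz and 500 Hz fall exactly on bins 8 and 16 for N = 256, fs = 8000.
demo = [math.cos(2.0 * math.pi * 250.0 * n / 8000.0)
        + 0.5 * math.cos(2.0 * math.pi * 500.0 * n / 8000.0) for n in range(256)]
peaks = pick_harmonics(hann_spectrum(demo), 250.0, 8000.0, 256, f_max=600.0)
```

The peak frequency is limited to the bin grid here, which is one reason the later steps refine amplitude, phase and frequency.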
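Six. Starting from the voicing onset frame and decreasing i frame by frame, perform the reverse sinusoidal-model analysis of the periodic component:
(1) Set the harmonic counter h = 1.
(2) As in sub-step (2) of step Five, find the frequency, amplitude and phase of harmonic h of frame i.
(3) If either of two frequency-stability conditions is met [conditions missing], the harmonic is treated as unstable between adjacent frames: its amplitude is set to 0, and its frequency and phase are reassigned [assignments missing]. The threshold in this step works best between 30 and 50 hertz.
(4) Increase h by 1 and repeat from sub-step (2) until the harmonic frequency exceeds the upper frequency limit, which is generally set above 5000 hertz.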
Seven. As shown in the flow of Fig. 2, use the gradient descent algorithm to optimize the harmonic amplitudes and phases of every frame before the voicing onset frame, iterating the following steps:
(1) Reconstruct the periodic component from the current harmonic parameters and compute its squared difference from the original frame:
[formula missing]
where H denotes the number of harmonics.
(2) For h = 1, ..., H, compute the partial derivative of the squared difference with respect to each harmonic amplitude:
[formula missing]
(3) For h = 1, ..., H, compute the partial derivative of the squared difference with respect to each harmonic phase:
[formula missing]
(4) Update the amplitudes and phases from the partial derivatives obtained in sub-steps (2) and (3):
[formula missing]
With a step size between 0.2 and 0.5 and 100 iterations, the gradient descent fit gives good results. The initial state and the states after 15, 50 and 100 iterations are shown in Figs. 4, 5, 6 and 7 respectively. The red line is the waveform of the original frame and the green line is the waveform of the reconstructed periodic component.
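The iteration above can be sketched as follows. Since the formulas are not available in this text, the sketch assumes the error is the mean squared difference E = (1/N) Σ (x(n) − x̂(n))², with x̂(n) = Σ A_h cos(2π f_h n / fs + φ_h), which gives ∂E/∂A_h = −(2/N) Σ e(n) cos θ_h(n) and ∂E/∂φ_h = (2/N) Σ e(n) A_h sin θ_h(n). The 1/N scaling, the function name `refine_amp_phase`, and the fixed harmonic frequencies are assumptions of this example.

```python
import math

def refine_amp_phase(target, freqs, amps, phases, fs, step=0.3, iterations=100):
    """Refine harmonic amplitudes and phases by plain gradient descent."""
    n_samples = len(target)
    amps, phases = list(amps), list(phases)
    for _ in range(iterations):
        # (1) Residual between the target frame and the current reconstruction.
        residual = [
            target[n] - sum(a * math.cos(2.0 * math.pi * f * n / fs + p)
                            for f, a, p in zip(freqs, amps, phases))
            for n in range(n_samples)
        ]
        # (2), (3) Partial derivatives of the mean squared error.
        grad_a, grad_p = [], []
        for f, a, p in zip(freqs, amps, phases):
            ga = gp = 0.0
            for n, e in enumerate(residual):
                theta = 2.0 * math.pi * f * n / fs + p
                ga += -2.0 * e * math.cos(theta)
                gp += 2.0 * e * a * math.sin(theta)
            grad_a.append(ga / n_samples)
            grad_p.append(gp / n_samples)
        # (4) Simultaneous update of all amplitudes and phases.
        amps = [a - step * g for a, g in zip(amps, grad_a)]
        phases = [p - step * g for p, g in zip(phases, grad_p)]
    return amps, phases

# Recover amplitude 0.8 and phase 0.5 of a 250 Hz cosine from a rough start.
goal = [0.8 * math.cos(2.0 * math.pi * 250.0 * n / 8000.0 + 0.5) for n in range(128)]
fit_amps, fit_phases = refine_amp_phase(goal, [250.0], [0.5], [0.2], 8000.0)
```

With the mean (rather than the sum) of squared errors, a step size in the stated 0.2 to 0.5 range stays stable for amplitudes near or below 1, which is why that scaling was chosen for the sketch.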
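Eight. Using the optimized phases, correct the harmonic frequencies produced by the preceding steps, in the following order:
(1) Compute the time interval of one window hop: [formula missing]
(2) Compute the phase change: [formula missing]
(3) Estimate the number of cycles n: [formula missing]
(4) Recompute: [formula missing]
(5) Apply the correction: [formula missing]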
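The formulas for this step are not available in this text, so the following sketch substitutes the standard phase-difference frequency estimate used in phase vocoders, arranged to follow the same order of sub-steps. It may differ from the patent's own formulas. The function name `refine_frequency` is hypothetical, and both phases are assumed to be measured relative to the start of their own frame.

```python
import math

def refine_frequency(f_est, phase_prev, phase_cur, hop, fs):
    """Refine a harmonic frequency from the phase change across one hop."""
    dt = hop / fs                         # (1) hop duration in seconds
    dphi = phase_cur - phase_prev         # (2) observed phase change
    # (3) Whole cycles elapsed during the hop, guided by the rough estimate.
    cycles = round(f_est * dt - dphi / (2.0 * math.pi))
    # (4), (5) Total phase advance in cycles, divided by the hop duration.
    return (dphi / (2.0 * math.pi) + cycles) / dt

# A 252 Hz component coarsely estimated as 250 Hz, hop of 64 samples at 8 kHz.
phase_a = 0.3
phase_b = (phase_a + 2.0 * math.pi * 252.0 * 64 / 8000.0) % (2.0 * math.pi)
refined = refine_frequency(250.0, phase_a, phase_b, 64, 8000.0)
```

The rough estimate only has to be within half a cycle per hop of the true value for the cycle count to come out right.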

Claims (4)

1. A method, based on a sinusoidal model, for improving the extraction quality of periodic components at consonant-vowel transitions of speech, characterized by comprising the following steps:
Step 1: input the speech signal and analyze the voicing onset and the fundamental-frequency curve;
Step 2: perform harmonic-plus-noise analysis on the input speech signal: apply the short-time Fourier transform (STFT) to the whole signal and, guided by the fundamental frequency, detect peaks in the short-time amplitude spectrum to obtain a preliminary estimate of the frequency and amplitude of each harmonic in the analysis frame, and read the phase of each harmonic from the phase spectrum;
Step 3: from the voicing onset back to the beginning of the speech signal, analyze frame by frame in reverse, using the gradient descent algorithm to compute the harmonic amplitudes and phases of each analysis frame accurately;
Step 4: adjust the frequency of each harmonic according to the amplitudes and phases computed in Step 3.
2. The method for improving the extraction quality of periodic components at consonant-vowel transitions of speech according to claim 1, wherein Step 2 is characterized in that, when harmonic frequencies and amplitudes are obtained from the spectrum in reverse from the voicing onset, the amplitude of any harmonic whose frequency is unstable between two adjacent frames is set to 0:
if either of two conditions is met [conditions missing], the harmonic parameters are set as follows [assignments missing];
the threshold ranges from 30 to 50 hertz;
where the symbols denote, in order, the fundamental frequency of frame i, the frequency of harmonic h of frame i, the amplitude of harmonic h of frame i, the phase of harmonic h of frame i, and the phase spectrum of frame i.
3. The method for improving the extraction quality of periodic components at consonant-vowel transitions of speech according to claim 1, wherein Step 3 is characterized in that, when the gradient descent algorithm is used to compute the harmonic amplitudes and phases of each frame accurately, the amplitude and phase of each harmonic are updated with the following formulas:
[formulas missing]
where the symbols denote, in order, the number of harmonics, the analysis window length of Step 2, the sampling frequency, the speech segment of frame i before windowing, and the regenerated speech segment;
the step size ranges from 0.2 to 0.5.
4. The method for improving the extraction quality of periodic components at consonant-vowel transitions of speech according to claim 1 and based on claim 3, wherein Step 3 is further characterized in that the harmonic frequencies are corrected according to the phases optimized in Step 3, using the following formulas in the order given:
(1) [formula missing]
(2) [formula missing]
(3) [formula missing]
(4) [formula missing]
(5) [formula missing]
where the remaining symbol denotes the number of samples per window hop in Step 2.
CN201410457379.9A 2014-09-10 2014-09-10 Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds Pending CN104183233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410457379.9A CN104183233A (en) 2014-09-10 2014-09-10 Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410457379.9A CN104183233A (en) 2014-09-10 2014-09-10 Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds

Publications (1)

Publication Number Publication Date
CN104183233A true CN104183233A (en) 2014-12-03

Family

ID=51964224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410457379.9A Pending CN104183233A (en) 2014-09-10 2014-09-10 Method for improving periodic component extraction quality of joint parts of consonants and vowels of speech sounds

Country Status (1)

Country Link
CN (1) CN104183233A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679331A (en) * 2015-12-30 2016-06-15 广东工业大学 Sound-breath signal separating and synthesizing method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141203