CN101000766B

CN101000766B - Chinese intonation base frequency contour generating method based on intonation model

Info

Publication number: CN101000766B
Application number: CN2007100716149A
Authority: CN
Inventors: 张鹏; 王丽红
Original assignee: Heilongjiang University
Current assignee: Heilongjiang University
Priority date: 2007-01-09
Filing date: 2007-01-09
Publication date: 2011-02-02
Anticipated expiration: 2027-01-09
Also published as: CN101000766A

Abstract

A method for generating a Chinese intonation fundamental frequency (F0) contour based on an intonation model. A phrase control mechanism generates and outputs the phrase-unit F0 contour of an input annotated pinyin code sequence; a syllable control mechanism generates and outputs the syllable-unit F0 contour of the same sequence; the phrase-unit contour, the syllable-unit contour and the minimum fundamental frequency F_min are then superposed in the logarithmic domain to generate and output the intonation F0 contour.

Description

Chinese intonation pitch contour generation method based on the intonation model

(1) Technical field

The present invention relates to the field of speech processing, and in particular to a method, based on an intonation model, for generating the Chinese intonation pitch contour in speech synthesis.

(2) Background art

At present, Chinese speech synthesis usually relies on time-domain waveform concatenation over a large corpus. In this approach the speech units of a synthesized sentence are selected from a huge pre-recorded corpus of natural speech: the system screens synthesis units or fragments directly from the corpus according to rules, a cost function or a statistical method, and splices them together. As long as the corpus is large enough, any sentence can in theory be spliced. Because every unit comes from an original natural recording, whether it is a syllable or a fragment of arbitrary length such as a multi-character word or a prosodic phrase, the clarity and naturalness of the synthesized speech are in principle very high. The approach avoids prosodic adjustment of the speech units and needs essentially no time-domain or frequency-domain transformation of the signal. However, Chinese prosody is complex and variable and its intonation is diverse, so speech synthesized in this way cannot satisfy users. Compared with natural speech, the sentences and passages produced by these systems have relatively low naturalness and intelligibility, sound strongly "machine-like", and are not comfortable to listen to. The reason is that prosody control for speech synthesis has not yet produced satisfactory results, which has kept the technology from entering the market on a large scale. The major problems include that the F0 curve of the intonation cannot be adjusted and that the intonation model cannot reflect the intonation rules of Chinese.

(3) Summary of the invention

The object of the present invention is to provide a Chinese intonation pitch contour generation method based on an intonation model that starts from the phonetic features of Chinese, the tones of Chinese and their characteristics, and the intonation of Chinese and its patterns, and that further improves the naturalness of synthesized Chinese speech.

This object is achieved as follows. The method comprises the following computer-implementable steps:

Input an annotated pinyin code sequence.

Phrase-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is tested for whether it is phrase information. If it is, it is sent to the phrase control mechanism as a phrase command; at the same time the corresponding phrase-unit prosody template is looked up in the prosody template library according to this phrase prosodic information, and the phrase-unit pitch contour is generated, output and kept in a buffer. Otherwise the search for phrase information continues. This is repeated until the whole annotated pinyin code sequence has been searched, and the phrase-unit pitch contour is then output.

Syllable-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is tested for whether it is syllable information. If it is, it is sent to the syllable control mechanism as a syllable command; at the same time the corresponding syllable-unit prosody template is looked up in the prosody template library according to this syllable prosodic information, and the syllable-unit pitch contour is generated, output and kept in a buffer. Otherwise the search for syllable information continues. This is repeated until the whole annotated pinyin code sequence has been searched, and the syllable-unit pitch contour is then output.

Intonation pitch contour superposition step: the phrase-unit pitch contour delivered by the phrase control mechanism, the syllable-unit pitch contour delivered by the syllable control mechanism and the minimum fundamental frequency F_min are superposed in the logarithmic domain according to the Chinese intonation model, by order, amplitude and duration. If the annotated pinyin code sequence has not been fully processed, the method returns and continues; otherwise the intonation pitch contour is generated and output to the subsequent signal processing step.
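The three generation steps above can be sketched in Python. The text does not specify the format of the annotated pinyin code sequence or of the prosody template library, so the `Token` record, its `kind` field and the two template dictionaries below are hypothetical stand-ins, and the templates are reduced to short lists of log-domain values.

```python
import math
from dataclasses import dataclass

@dataclass
class Token:
    kind: str    # "phrase" or "syllable" (hypothetical labels)
    label: str   # key into the prosody template library
    start: int   # index of the first output frame

# Hypothetical prosody template library: label -> log-domain contour samples.
PHRASE_TEMPLATES = {"declarative": [0.30, 0.25, 0.20, 0.15, 0.10, 0.05]}
SYLLABLE_TEMPLATES = {"tone1": [0.10, 0.10], "tone4": [0.20, 0.05]}

def unit_contour(tokens, kind, templates, n_frames):
    """Scan the sequence from start to end and buffer the contour of one unit type."""
    buffer = [0.0] * n_frames
    for tok in tokens:
        if tok.kind != kind:
            continue  # not this unit type: keep searching
        for k, value in enumerate(templates[tok.label]):
            if tok.start + k < n_frames:
                buffer[tok.start + k] += value
    return buffer

def intonation_contour(tokens, f_min, n_frames):
    """Superpose ln(F_min), the phrase contour and the syllable contour; return Hz."""
    phrase = unit_contour(tokens, "phrase", PHRASE_TEMPLATES, n_frames)
    syllable = unit_contour(tokens, "syllable", SYLLABLE_TEMPLATES, n_frames)
    return [f_min * math.exp(p + s) for p, s in zip(phrase, syllable)]

tokens = [Token("phrase", "declarative", 0),
          Token("syllable", "tone1", 0),
          Token("syllable", "tone4", 2)]
contour = intonation_contour(tokens, f_min=75.0, n_frames=6)
```

Adding the two buffers before exponentiating is what makes the superposition logarithmic: each unit scales F_min multiplicatively rather than adding hertz.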

The present invention also has the following technical features:

1. The pitch contour has the following mathematical expression:

$$
\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right]
$$

$$
G_{pi}(t) = \begin{cases} R_i^2\, t \exp(-R_i t), & t \ge 0 \\ 0, & t < 0 \end{cases}
$$

or

$$
G_{pi}^{(m)}(t) = G_{pi}^{(1)}(t),\ G_{pi}^{(2)}(t),\ \cdots,\ G_{pi}^{(M)}(t), \qquad m = 1, 2, \cdots, M
$$

$$
G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + B_j t) \exp(-B_j t),\ \theta_j \right], & t \ge 0 \\ 0, & t < 0 \end{cases}
$$

or

$$
G_{aj}^{(n)}(t) = G_{aj}^{(1)}(t),\ G_{aj}^{(2)}(t),\ \cdots,\ G_{aj}^{(N)}(t), \qquad n = 1, 2, \cdots, N
$$

where:

F_min: the minimum fundamental frequency of the sentence;

I: the number of phrases; R_i: the attenuation coefficient of the i-th phrase, with an empirical value of 3/s; T_0i: the time at which the i-th phrase control command occurs; A_pi: the amplitude of the i-th phrase control command; G_pi^(m): the different phrase tone-pattern types;

J: the number of syllables or prosodic words; A_aj: the amplitude of the j-th syllable control command; T_1j: the time at which the j-th syllable control command begins; T_2j: the time at which the j-th syllable control command ends; B_j: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, with an empirical value of 20/s; θ_j: the ceiling of the syllable component for the j-th syllable control command, with an empirical value of 0.9; G_aj^(n): the different syllable tone-pattern types;

2. The model parameters are generated automatically by a computer program. The first step of the algorithm is to determine the phrase command parameters and the minimum fundamental frequency F_min; F_min and the phrase parameters are then used to model the F0 curve accurately. Once the phrase-unit parameters have been optimized, the syllable-unit parameters are calculated. Individual prosodic words are processed from left to right, and a local F0 curve fit is performed for each prosodic word.

The beneficial effects of the present invention are:

(1) The prosodic information of phrases and syllables is obtained from the annotated pinyin code sequence of the input text, so the generated intonation pitch contour meets the prosodic structure requirements of natural speech;

(2) Processing the phrase-unit and syllable-unit pitch contours separately allows the temporal order of phrase units and syllable units to be determined accurately;

(3) Using phrase-unit and syllable-unit prosody templates simplifies the generation of the phrase-unit and syllable-unit pitch contours and, at the same time, better reflects complex and variable prosodic changes;

(4) Treating the phrase control mechanism and the syllable control mechanism as damped second-order oscillatory systems matches the physiological characteristics of the human vocal organs.

Chinese differs from Western languages in many respects, including syntactic structure, grammar rules, acoustic characteristics and prosodic structure. First, each Chinese character corresponds to one syllable. Second, Chinese is a tonal language: tone distinguishes meaning, and every character has a fixed tone (F0 shape). In connected speech the tones of neighbouring characters influence one another and may change or even lose their original pattern, that is, coarticulation (tone sandhi) occurs. There are also short pauses within continuous utterances. Every speaker has a basic frequency, called the fundamental frequency, which reflects the pitch of the speaker's voice; speakers also differ in loudness, and so on. In a Chinese text-to-speech (TTS) system, the prediction, analysis and control of prosodic information such as pitch, duration and amplitude is called prosody control.

In view of this situation, the inventors started from the phonetic features of Chinese, the tones of Chinese and their characteristics, and the intonation of Chinese and its patterns, and constructed a complete Chinese intonation pitch contour generation method based on an intonation model, which improves the naturalness of synthesized speech. Every step, module and submodule of the invention can be implemented by a computer program, so the method is practical, portable and widely applicable.

(4) Brief description of the drawings

Fig. 1 is a block diagram of the Chinese intonation pitch contour generation model;

Fig. 2 is a block diagram of Chinese intonation pitch contour generation;

Fig. 3 is a flowchart of Chinese intonation pitch contour generation;

Fig. 4 shows the phrase F0 curve with its decay characteristic;

Fig. 5 shows the syllable F0 curve with its rising characteristic;

Fig. 6 is a flowchart of prosodic feature control;

Fig. 7 is a block diagram of the computer hardware system of an embodiment of the invention.

(5) Detailed description of the embodiments

The invention is described in further detail below with reference to the drawings and specific embodiments.

With reference to Fig. 2, the invention comprises the following computer-implementable steps:

Input an annotated pinyin code sequence;

Each step is implemented by a computer program.

Embodiment:

1. Construction of the prosody template library

The prosody template library is built by conventional means, in the same way as an ordinary database, and is not described in detail here. After weighing various factors, the invention selects the smallest perceptually distinguishable unit of Chinese, the syllable, as the unit of speech synthesis. The speech library stores several samples of each syllable, and the samples differ from one another in stress (light or heavy), tone and F0 curve.

2. The intonation model of Chinese

A natural-sounding, complete sentence is characterized mainly by three aspects. The first is the tone pattern of the sentence, which is reflected mainly in its fundamental frequency, that is, the pitch curve of the sentence. The second is the position of prosodic phrases and prosodic words in the sentence, because they reflect how the prosodic features change over the whole sentence. The third is the position of sentence stress and pauses: stress highlights the semantic focus of the sentence, and pauses reflect its rhythm. Of these three, the F0 curve of the sentence is the most important, since it reflects the most salient prosodic changes of the sentence and the overall trend of its F0 contour.

The F0 pitch contour of a sentence can be regarded as the superposition of the phrase-unit pitch contour, the syllable-unit pitch contour and the minimum fundamental frequency F_min, with the contour expressed on a logarithmic scale. The phrase-unit contour reflects the global F0 variation of the sentence, the syllable-unit contour reflects the local F0 variation of a syllable or prosodic word, and F_min represents the lowest frequency at which the human vocal folds vibrate to produce sound. Phrase units and syllable units belong to the phrase control mechanism and the syllable control mechanism respectively, and both mechanisms are approximated as damped second-order oscillatory systems. The input of the phrase control mechanism is a phrase command and its output is the phrase-unit pitch contour; the input of the syllable control mechanism is a syllable command and its output is the syllable-unit pitch contour. A phrase command can be described by an impulse function and a syllable command by a step function. These functions are specified by two different sets of control commands and parameters:

(1) the timing and amplitude of the phrase command and the damping coefficient of the phrase control mechanism;

(2) the start and end times and the amplitude of the syllable command and the damping coefficient of the syllable control mechanism.

These parameters must remain constant within a given time span: the phrase-unit parameters are constant within a prosodic phrase, the syllable-unit parameters are constant within a syllable or prosodic word, and F_min is constant over the whole sentence. The block diagram of the Chinese intonation pitch contour generation model is shown in Fig. 1.
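One way to read "damped second-order system" is as a critically damped second-order linear system. This is an assumption rather than something the text states, but it reproduces the response functions of formulas (2) and (3) in this description: the impulse response gives the phrase component and the step response gives the syllable component before the ceiling θ is applied.

$$
H_p(s) = \frac{R^2}{(s + R)^2}
\quad\Longrightarrow\quad
\mathcal{L}^{-1}\{H_p(s)\} = R^2\, t\, e^{-R t}, \qquad t \ge 0
$$

$$
H_a(s) = \frac{B^2}{(s + B)^2}
\quad\Longrightarrow\quad
\mathcal{L}^{-1}\left\{\frac{H_a(s)}{s}\right\}
= \mathcal{L}^{-1}\left\{\frac{1}{s} - \frac{1}{s + B} - \frac{B}{(s + B)^2}\right\}
= 1 - (1 + B t)\, e^{-B t}, \qquad t \ge 0
$$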

Based on the above Chinese intonation pitch contour generation model, the two kinds of command, phrase commands and syllable commands, serve as the input of the sentence intonation model, and the output of the model is the pitch contour of the sentence. Its mathematical expression is:

$$
\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right] \tag{1}
$$

$$
G_{pi}(t) = \begin{cases} R_i^2\, t \exp(-R_i t), & t \ge 0 \\ 0, & t < 0 \end{cases} \tag{2}
$$

or

$$
G_{pi}^{(m)}(t) = G_{pi}^{(1)}(t),\ G_{pi}^{(2)}(t),\ \cdots,\ G_{pi}^{(M)}(t), \qquad m = 1, 2, \cdots, M
$$

(phrase-unit tone-shape function)

$$
G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + B_j t) \exp(-B_j t),\ \theta_j \right], & t \ge 0 \\ 0, & t < 0 \end{cases} \tag{3}
$$

or

$$
G_{aj}^{(n)}(t) = G_{aj}^{(1)}(t),\ G_{aj}^{(2)}(t),\ \cdots,\ G_{aj}^{(N)}(t), \qquad n = 1, 2, \cdots, N
$$

(syllable-unit tone-shape function)

where:

F_min: the minimum fundamental frequency of the sentence;

I: the number of phrases; R_i: the attenuation coefficient of the i-th phrase, with an empirical value of 3/s; T_0i: the time at which the i-th phrase control command occurs; A_pi: the amplitude of the i-th phrase control command; G_pi^(m): the different phrase tone-pattern types.

J: the number of syllables or prosodic words; A_aj: the amplitude of the j-th syllable control command; T_1j: the time at which the j-th syllable control command begins; T_2j: the time at which the j-th syllable control command ends; B_j: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, with an empirical value of 20/s; θ_j: the ceiling of the syllable component for the j-th syllable control command, with an empirical value of 0.9; G_aj^(n): the different syllable tone-pattern types.
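Formulas (1) to (3) translate almost line for line into code. The following is a minimal sketch that uses the empirical values R = 3/s, B = 20/s and θ = 0.9 as defaults; the command amplitudes and times in the usage lines are invented purely for illustration.

```python
import math

def phrase_response(t, r=3.0):
    """G_p(t) of formula (2): response of the phrase control mechanism."""
    return r * r * t * math.exp(-r * t) if t >= 0 else 0.0

def syllable_response(t, b=20.0, theta=0.9):
    """G_a(t) of formula (3): response of the syllable mechanism, capped at theta."""
    return min(1.0 - (1.0 + b * t) * math.exp(-b * t), theta) if t >= 0 else 0.0

def f0(t, f_min, phrase_cmds, syllable_cmds):
    """Formula (1). phrase_cmds: [(A_p, T_0)]; syllable_cmds: [(A_a, T_1, T_2)]."""
    log_f0 = math.log(f_min)
    for a_p, t0 in phrase_cmds:
        log_f0 += a_p * phrase_response(t - t0)
    for a_a, t1, t2 in syllable_cmds:
        log_f0 += a_a * (syllable_response(t - t1) - syllable_response(t - t2))
    return math.exp(log_f0)

# Illustrative commands only; these amplitudes and times are made up.
phrase_cmds = [(0.5, -0.3)]
syllable_cmds = [(0.4, 0.10, 0.30), (0.3, 0.45, 0.70)]
contour = [f0(k * 0.01, 75.0, phrase_cmds, syllable_cmds) for k in range(150)]
```

With non-negative amplitudes every term added to ln F_min is non-negative, so the contour never drops below F_min, which matches the role of F_min as the floor of the sentence.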

The first term of formula (1) can be regarded as the minimum fundamental frequency that keeps the vocal folds vibrating; the second term represents the phrase-unit pitch contour; the third term represents the syllable-unit pitch contour; the three are superposed in the logarithmic domain. Here F_min is determined by the voice and by the tone pattern of the sentence and runs through the whole sentence. The F0 variation of the phrases is superposed on it first, which gives the basic trend of the central F0 trajectory of the sentence; the F0 variation of the syllables or prosodic words is then superposed on this trajectory in order. The result of superposing the three parts is the F0 curve of the complete sentence.

For the gradual declination of a phrase, the decay characteristic of G_pi(t) can be changed by adjusting R_i, which adjusts the F0 trend of the phrase. The larger R_i is, the stronger the decay and the steeper the downward tilt of the phrase F0 curve; the value of R_i also indirectly reflects the length of the intonation phrase. Likewise, the gradual rise of a syllable can be controlled by adjusting B_j: the larger B_j is, the more pronounced the rise of the syllable F0 curve. Fig. 4 and Fig. 5 show the phrase F0 curve with its decay characteristic and the syllable F0 curve with its rising characteristic, respectively.
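The effect of the two coefficients can be checked numerically. In the sketch below, `decline_ratio` and `time_to_ceiling` are helper names introduced here for illustration: the first measures how much of the phrase component present at 0.5 s is still left at 1.0 s, and the second finds by bisection when the uncapped syllable response first reaches the ceiling θ.

```python
import math

def phrase_response(t, r):
    """G_p(t) = R^2 t exp(-R t) for t >= 0, as in formula (2)."""
    return r * r * t * math.exp(-r * t) if t >= 0 else 0.0

def decline_ratio(r, t_early=0.5, t_late=1.0):
    """Share of the phrase component at t_early that is still present at t_late."""
    return phrase_response(t_late, r) / phrase_response(t_early, r)

def time_to_ceiling(b, theta=0.9, hi=10.0):
    """Bisection for the first t at which 1 - (1 + B t) exp(-B t) reaches theta."""
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - (1.0 + b * mid) * math.exp(-b * mid) < theta:
            lo = mid
        else:
            hi = mid
    return hi

for r in (2.0, 3.0, 4.0):
    print(f"R = {r}/s: {decline_ratio(r):.2f} of the phrase component remains")
for b in (10.0, 20.0, 40.0):
    print(f"B = {b}/s: ceiling reached after {time_to_ceiling(b) * 1000:.0f} ms")
```

A larger R leaves a smaller share, that is a steeper declination, and a larger B reaches the ceiling sooner, that is a sharper rise.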

For the specific tone shape of each phrase, the phrase tone shape is determined from the phrase information in the annotated pinyin code sequence, and the phrase tone-shape function can use a "phrase-unit prosody template" to generate the phrase pitch contour directly.

For the specific tone shape of each syllable, the syllable tone shape is determined from the syllable information in the annotated pinyin code sequence. The syllable tone-shape function can use a "syllable-unit prosody template" to generate the syllable F0 curve directly, or the curve equation

$$
G_{aj}^{(n)}(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4
$$

can be used to generate a syllable tone-shape curve with a good fit. The tone length is determined by the start and end times of the corresponding voiced segment; the tone range is controlled by a stepwise tone-range amplitude corresponding to the tone length.
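The quartic tone-shape curve is cheap to evaluate once its coefficients are known. The sketch below evaluates it with Horner's scheme and samples it over the tone length; the coefficient values are hypothetical and only chosen to give a falling shape, since the text does not list any.

```python
def tone_shape(t, coeffs):
    """Evaluate a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 with Horner's scheme."""
    value = 0.0
    for a in reversed(coeffs):
        value = value * t + a
    return value

def sample_tone(coeffs, length, n=11):
    """Sample the shape at n evenly spaced points over the tone length (seconds)."""
    return [tone_shape(length * k / (n - 1), coeffs) for k in range(n)]

# Hypothetical coefficients [a_0, a_1, a_2, a_3, a_4] for a falling shape.
falling = [0.40, -1.0, -2.0, 0.0, 0.0]
samples = sample_tone(falling, length=0.25)
```

In practice the coefficients would come from fitting measured syllable contours; the fitting procedure itself is not described in the text and is left out here.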

3. Setting the model parameters

The model parameters are generated automatically by a computer program. Based on the superposition principle, the first step of the algorithm is to determine the phrase command parameters and F_min; this step can be carried out separately from the determination of the syllable command parameters. F_min and the phrase parameters are then used to model the F0 curve accurately. Once the phrase-model parameters have been optimized, the syllable-unit parameters are calculated.

The F0 curve generated by a syllable command models one syllable or prosodic word. Individual prosodic words are processed from left to right; the syllable units are not optimized globally, and a local F0 curve fit is performed for each prosodic word instead. This left-to-right processing of the F0 curve is subject to two constraints: one is to prevent a later syllable command from affecting the part of the curve that has already been optimized; the other is to ensure that a syllable or prosodic word can still be estimated when the preceding command parameters are insufficient.
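The two-stage order described above can be illustrated with a deliberately simplified sketch. The text does not name an optimization method, so everything specific here is an assumption: there is a single phrase command, all command times are known, R, B and θ are fixed at their empirical values, and only the amplitudes are estimated, each by a one-parameter least-squares fit.

```python
import math

R, B, THETA = 3.0, 20.0, 0.9   # empirical values quoted in the text
SETTLE = 0.2                   # assumed: G_a reaches THETA within 0.2 s when B = 20

def g_p(t):
    return R * R * t * math.exp(-R * t) if t >= 0 else 0.0

def g_a(t):
    return min(1.0 - (1.0 + B * t) * math.exp(-B * t), THETA) if t >= 0 else 0.0

def fit_amplitude(residual, basis):
    """Least-squares scale a that minimises sum((residual - a * basis) ** 2)."""
    denom = sum(g * g for g in basis)
    return sum(r * g for r, g in zip(residual, basis)) / denom if denom else 0.0

def estimate(times, f0, t0, windows):
    """Return (F_min, A_p, [A_a, ...]) for one phrase command and known timings."""
    # Step 1: F_min and the phrase amplitude, using only frames free of syllable commands.
    f_min = min(f0)
    log_ratio = [math.log(f / f_min) for f in f0]
    phrase = [g_p(t - t0) for t in times]
    free = [all(t < t1 or t > t2 + SETTLE for t1, t2 in windows) for t in times]
    a_p = fit_amplitude([r for r, ok in zip(log_ratio, free) if ok],
                        [g for g, ok in zip(phrase, free) if ok])
    # Step 2: subtract the phrase contour, then fit syllable commands left to right.
    residual = [r - a_p * g for r, g in zip(log_ratio, phrase)]
    amplitudes = []
    for t1, t2 in sorted(windows):
        basis = [g_a(t - t1) - g_a(t - t2) for t in times]
        a_a = fit_amplitude(residual, basis)
        amplitudes.append(a_a)
        # Remove the fitted command so that later fits do not disturb earlier ones.
        residual = [r - a_a * g for r, g in zip(residual, basis)]
    return f_min, a_p, amplitudes
```

Because each fitted syllable command is subtracted before the next one is processed, a later command cannot change an amplitude that has already been fixed, which is one way to honour the first constraint mentioned above.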

4. F0 synthesis based on the intonation model

(1) Damping coefficient

The damping of the phrase unit and of the syllable unit is treated as a time-invariant constant. For the phrase unit the standard damping coefficient is 3.1 Hz. The mean damping coefficient over all speakers and all syllables or prosodic words is 16 Hz.

(2) Minimum fundamental frequency F_min

The values of F_min show little dispersion; typical values lie in the range 70-80 Hz for male speakers and 120-140 Hz for female speakers.

(3) Amplitude and timing of phrase commands

The phrase unit represents the global declination and slow variation of the F0 curve within an utterance and is the basis of the intonation F0 curve. In terms of F0 amplitude, the phrase command amplitude is the multiplicative coefficient that determines the offset of the phrase F0 curve in the frequency domain; it is a direct measure of the F0 declination in the sentence and depends strongly on the speaker. The sentence type is expressed globally by the phrase-unit F0 curve: for example, the F0 curve of a declarative sentence declines from beginning to end, whereas the F0 curve of a yes-no question or an alternative question first declines and then rises towards the end. In terms of timing, the phrase F0 curve reaches its maximum relatively early and then declines over the main part of the sentence. The peak of the phrase F0 curve coincides with the beginning of the sentence or prosodic phrase, so the timing of the phrase command follows directly from the damping coefficient (3.1 Hz). The first phrase command is placed 323 ms before the start of the sentence, which is also consistent with research on F0 generation and control that revealed activity of the laryngeal muscles before phonation.
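The 323 ms lead can be related to the damping coefficient through the phrase response of formula (2). As a worked check, and assuming that the 3.1 Hz value is used as the attenuation coefficient R = 3.1/s, the response reaches its maximum at t = 1/R:

$$
\frac{d}{dt}\left[ R^2\, t\, e^{-R t} \right] = R^2 e^{-R t} (1 - R t) = 0
\quad\Longrightarrow\quad
t_{\text{peak}} = \frac{1}{R} = \frac{1}{3.1\ \mathrm{s}^{-1}} \approx 0.323\ \mathrm{s}
$$

A phrase command issued 323 ms before the sentence onset therefore places the peak of the phrase component at the onset itself.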

(4) Syllable command amplitude

The syllable command amplitude is the multiplicative coefficient that determines the offset of the syllable curve in the frequency domain and the height of the syllable peak; it depends strongly on the position of the syllable. The amplitude of a sentence-final syllable command differs from that at other positions in the sentence, the amplitude of nouns is higher than that of other parts of speech, and the syllable command amplitude before a phrase boundary is about 10-20% higher than at other positions.

(5) Syllable command duration

The duration of a syllable command can be predicted from the duration of the prosodic word that contains the syllable. The correlation between the two is r = 0.84, that is, about 70% or more of the variation in syllable command duration can be explained by the duration of the prosodic word.
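The two figures in this paragraph are linked by the coefficient of determination: for a simple linear prediction, the share of variance explained equals the square of the correlation coefficient.

$$
r^2 = 0.84^2 = 0.7056 \approx 70.6\%
$$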

(6) Syllable command position

At non-sentence-final positions, the interval between the start of the command and the start of the syllable (or prosodic word) is about 10% of the duration of the syllable (or prosodic word); that is, there is a silent interval between the start of the command and the start of pronunciation. This interval tends to zero in the sentence-final prosodic word.

According to the above analysis, a set of rules can therefore be established to control the adjustment of the parameters, covering for example sentence type, sentence stress, phrase boundaries and word stress, so as to generate an artificial intonation curve for a given sentence. The required input information comprises the positions of the syllables, the durations of the prosodic words and their parts of speech.

The rules proposed here are based on statistical results and the parameters given are mean values, so the resulting curve does not represent any single real speaker. Seen from another angle, however, if the characteristics of a speaker can be captured accurately, the intonation pattern produced by the above rules and model will be very close to the intonation pattern produced by the speaker being modelled. The intonation model parameters are listed in Table 1.
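A rule set of this kind might look like the following sketch, which turns a list of prosodic words into syllable commands (A_a, T_1, T_2). The `ProsodicWord` record, the base amplitude of 0.4 and the 15% boost (a value inside the quoted 10-20% range) are hypothetical, and the command is assumed to last as long as the prosodic word.

```python
from dataclasses import dataclass

@dataclass
class ProsodicWord:
    onset: float            # start of pronunciation, seconds
    duration: float         # seconds
    sentence_final: bool
    before_boundary: bool   # directly precedes a phrase boundary

def syllable_command(word, base_amplitude=0.4, boundary_boost=0.15, lead_ratio=0.10):
    """Return (A_a, T_1, T_2) for one prosodic word under the averaged rules."""
    # The command starts about 10 % of the duration before the onset,
    # except in the sentence-final prosodic word, where the lead is zero.
    lead = 0.0 if word.sentence_final else lead_ratio * word.duration
    t1 = word.onset - lead
    # Assumption: the command lasts as long as the prosodic word itself.
    t2 = t1 + word.duration
    if word.before_boundary:
        amplitude = base_amplitude * (1.0 + boundary_boost)
    else:
        amplitude = base_amplitude
    return amplitude, t1, t2

words = [ProsodicWord(0.00, 0.40, False, False),
         ProsodicWord(0.45, 0.50, False, True),
         ProsodicWord(1.10, 0.60, True, False)]
commands = [syllable_command(w) for w in words]
```

The part-of-speech rule (higher amplitude for nouns) is omitted because the text gives no figure for it.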

Table 1. Intonation model parameters

Parameter	Description
Damping coefficient	Phrase unit: 3.1 Hz; syllable unit: 16 Hz.
Minimum fundamental frequency F_min	Typical range: 70-80 Hz for male speakers, 120-140 Hz for female speakers.
Phrase command amplitude	The multiplicative coefficient that determines the offset of the phrase F0 curve in the frequency domain. It is a direct measure of the F0 declination in the sentence and depends strongly on the speaker.
Syllable command amplitude	The multiplicative coefficient that determines the offset of the syllable curve in the frequency domain and the height of the syllable peak; it depends strongly on the position of the syllable. The amplitude before a phrase boundary is about 10-20% higher than at other positions.
Phrase command timing	The first phrase command is placed 323 ms before the start of the sentence.
Syllable command timing	At non-sentence-final positions, the interval between the start of the syllable (or prosodic word) command and the start of pronunciation is about 10% of the duration of the syllable (or prosodic word), so there is a silent interval between the two; this interval tends to zero in the sentence-final prosodic word.
Phrase command duration	Can be obtained from the duration of the prosodic phrase.
Syllable command duration	About 70% or more of the variation in syllable command duration can be explained by the duration of the prosodic word.
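For use in a program, the numeric entries of Table 1 can be gathered into one immutable configuration object. The class and field names below are illustrative choices, not part of the described method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntonationDefaults:
    phrase_damping_hz: float = 3.1
    syllable_damping_hz: float = 16.0
    f_min_male_hz: tuple = (70.0, 80.0)
    f_min_female_hz: tuple = (120.0, 140.0)
    first_phrase_lead_s: float = 0.323     # first phrase command before sentence start
    syllable_lead_ratio: float = 0.10      # command lead as a share of the duration
    boundary_boost: tuple = (0.10, 0.20)   # amplitude increase before a phrase boundary

def f_min_in_range(f_min, speaker, defaults=IntonationDefaults()):
    """Check a candidate F_min against the typical range for 'male' or 'female'."""
    lo, hi = defaults.f_min_male_hz if speaker == "male" else defaults.f_min_female_hz
    return lo <= f_min <= hi
```

Since the values are averages over speakers, a real system would normally override them per voice rather than treat them as fixed.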

Fig. 6 is a flowchart of prosodic feature control. With reference to Fig. 6, the computer analyses the typed text and converts it into a pinyin code sequence containing prosodic structure information. According to this information it annotates the position and number of phrases in the sentence, the strength of each phrase, the number of syllables, and the tone shape, tone length and tone-range amplitude of each syllable, together with the key pitch value of the whole sentence. Manual adjustment and a parameter optimization algorithm are used to tune the parameters and supply the relevant control parameters; the components are then combined layer by layer according to the model to compute the pitch contour data of the complete sentence. Finally, according to the output F0 values and the corresponding duration parameters, the PSOLA method is used to adjust the prosodic parameters of each syllable waveform in the speech library, and the waveforms are spliced into continuous synthesized speech.

5. System environment

Fig. 7 shows a computing system environment suitable for implementing the invention. This environment is only one example of an environment in which the invention can be implemented and is not intended to limit the scope of use or the functionality of the invention. Nor should the computing environment be interpreted as having any dependence on, or requirement for, any single component or combination of components shown in the example operating environment.

The invention can be used with numerous general-purpose or special-purpose computing system environments or configurations, such as personal computers, minicomputers, medium-sized computers, mainframe computers, network computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, and distributed computing environments that include any of the above systems or devices.

The invention can be described in the general context of computer-executable instructions, such as program modules. Program modules include routines, subroutines, objects, controls, components, data structures and the like, which perform particular tasks or implement particular abstract data types. The invention can also be applied in distributed computing environments, in which tasks are performed by remote processing devices linked through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including memory storage devices.

The computer shown in Fig. 7 comprises one or more central processing units (CPUs), internal memory, external memory, an input device interface, an output device interface and a system bus that connects these units. The system bus can be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Examples of such bus architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus (also known as the Mezzanine bus).

The user can enter commands and information into the computer through input devices. These can be a keyboard, a microphone and a pointing device such as a mouse, trackball or touch pad, as well as other input devices (not shown in the figure) such as a joystick, game pad, satellite dish or scanner. These input devices are usually connected to the processing unit through a user input interface coupled to the system bus, but can also be connected through other interfaces and bus structures, such as a parallel port, game port or universal serial bus (USB). A monitor or other type of display device is connected to the system bus through an interface such as a video interface. In addition to the monitor, the computer can include other peripheral output devices, such as loudspeakers and a printer, which are connected through an output peripheral interface.

The computer can operate in a networked environment using logical connections to one or more remote computers.

Claims

1. A Chinese intonation pitch contour generation method based on an intonation model, characterized in that it comprises the following computer-implementable steps:

an annotated pinyin code sequence input step;

an intonation pitch contour superposition step, in which the phrase-unit pitch contour delivered by the phrase control mechanism, the syllable-unit pitch contour delivered by the syllable control mechanism and the minimum fundamental frequency F_min are superposed in the logarithmic domain according to the Chinese intonation model, by order, amplitude and duration; if the annotated pinyin code sequence has not been fully processed, execution continues; otherwise the intonation pitch contour of the input annotated pinyin code sequence is generated and output to the subsequent signal processing step.

2. The Chinese intonation pitch contour generation method based on an intonation model according to claim 1, characterized in that the intonation pitch contour has the following mathematical expression:

where:

F_min: the minimum fundamental frequency of the sentence;

I: the number of phrases; R_i: the attenuation coefficient of the i-th phrase, with an empirical value of 3/s; T_0i: the time at which the i-th phrase control command occurs; A_pi: the amplitude of the i-th phrase control command; G_pi(t): the different phrase tone-pattern types;

J: the number of syllables or prosodic words; A_aj: the amplitude of the j-th syllable control command; T_1j: the time at which the j-th syllable control command begins; T_2j: the time at which the j-th syllable control command ends; B_j: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, with an empirical value of 20/s; θ_j: the ceiling of the syllable component for the j-th syllable control command, with an empirical value of 0.9; G_aj(t): the different syllable tone-pattern types;

for the specific tone pattern G_pi(t) of each phrase, the phrase tone pattern can also be determined from the phrase information in the annotated pinyin code sequence, and the phrase tone-pattern function can use a "phrase-unit prosody template" to generate the phrase pitch contour directly;

for the specific tone pattern G_aj(t) of each syllable, the syllable tone pattern can also be determined from the syllable information in the annotated pinyin code sequence, and the syllable tone-pattern function can use a "syllable-unit prosody template" to generate the syllable F0 curve directly; alternatively, the curve equation G_aj(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 can be used to generate a syllable-unit pitch contour with a good fit.

3. The Chinese intonation pitch contour generation method based on an intonation model according to claim 1, characterized in that the intonation model parameters are generated automatically by a computer program; the first step of the algorithm is to determine the phrase command parameters and the minimum fundamental frequency F_min; F_min and the phrase command parameters are then used to model the F0 curve accurately; once the phrase command parameters have been optimized, the syllable command parameters are calculated; individual syllables are processed from left to right, and a local F0 curve fit is performed for each syllable.