CN101000766B - Chinese intonation base frequency contour generating method based on intonation model - Google Patents

Publication number
CN101000766B
Authority
CN
China
Prior art keywords
syllable
phrase
intonation
pitch contour
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2007100716149A
Other languages
Chinese (zh)
Other versions
CN101000766A (en
Inventor
张鹏
王丽红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang University
Original Assignee
Heilongjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heilongjiang University filed Critical Heilongjiang University
Priority to CN2007100716149A priority Critical patent/CN101000766B/en
Publication of CN101000766A publication Critical patent/CN101000766A/en
Application granted granted Critical
Publication of CN101000766B publication Critical patent/CN101000766B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electrophonic Musical Instruments (AREA)

Abstract

A method for generating a Chinese intonation fundamental frequency contour based on an intonation model. For an input annotated pinyin code sequence, a phrase control mechanism generates and outputs the fundamental frequency contour of the phrase units, and a syllable control mechanism generates and outputs the fundamental frequency contour of the syllable units. The two contours are then superposed logarithmically with the fundamental frequency minimum F_min to generate and output the intonation fundamental frequency contour.

Description

Chinese intonation pitch contour generation method based on an intonation model
(1) Technical field
The present invention relates to the field of speech processing, and specifically to a Chinese intonation pitch contour generation method, based on an intonation model, for use in speech synthesis.
(2) Background
At present, Chinese speech synthesis usually adopts time-domain waveform concatenation based on a large corpus. In this approach, the speech units of a synthesized sentence are selected from a large, pre-recorded corpus of natural speech: the system screens synthesis units or segments directly from the corpus according to rules, a cost function, or statistical methods, and concatenates them. In theory, provided the corpus is large enough, any sentence can be spliced together. Because every synthesis unit comes from an original natural recording, whether a single syllable or a segment of arbitrary length such as a multi-character word or a prosodic phrase, the clarity and naturalness of the synthesized speech are both high. This approach avoids prosodic adjustment of the speech units and requires essentially no time-domain or frequency-domain transformation of the signal. However, Chinese prosody is complex and variable and its intonation is diverse, so speech synthesized in this way cannot meet users' requirements. Compared with natural speech, the sentences and passages synthesized by these systems have relatively low naturalness and intelligibility, sound distinctly "mechanical", and are not comfortable to listen to. The reason is that no satisfactory prosody control method for speech synthesis has yet been achieved, which has kept the technology from entering the market on a large scale. A major problem is that the fundamental frequency curve of the intonation cannot be adjusted, or that the intonation model fails to reflect the intonation rules of Chinese.
(3) Summary of the invention
The object of the present invention is to provide a Chinese intonation pitch contour generation method based on an intonation model that starts from the phonetic features of Chinese, the tones of Chinese and their characteristics, and the intonation of Chinese and its patterns, and that further improves the naturalness of synthesized Chinese speech.
This object is achieved as follows. The method comprises the following computer-implementable steps:
Inputting an annotated pinyin code sequence;
A phrase-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is phrase information. If so, it is sent to the phrase control mechanism as a phrase command; at the same time, the corresponding phrase-unit prosody template is looked up in the prosody template library according to this phrase prosodic information, and the pitch contour curve of the phrase unit is generated, output, and kept in a buffer. Otherwise the search for phrase information continues. This is repeated until the whole annotated pinyin code sequence has been searched, and the pitch contour curve of the phrase units is output;
A syllable-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is syllable information. If so, it is sent to the syllable control mechanism as a syllable command; at the same time, the corresponding syllable-unit prosody template is looked up in the prosody template library according to this syllable prosodic information, and the pitch contour curve of the syllable unit is generated, output, and kept in a buffer. Otherwise the search for syllable information continues. This is repeated until the whole annotated pinyin code sequence has been searched, and the pitch contour curve of the syllable units is output;
An intonation pitch contour superposition step: the pitch contour of the phrase units sent by the phrase control mechanism, the pitch contour of the syllable units sent by the syllable control mechanism, and the fundamental frequency minimum F_min are superposed logarithmically, according to the Chinese intonation model, by temporal order, amplitude, and duration. If the annotated pinyin code sequence has not been fully processed, processing returns and continues; otherwise the intonation pitch contour curve is generated and output to the subsequent signal processing step.
The present invention also has the following technical features:
1. The pitch contour curve has the following mathematical expression:
$$\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right]$$
$$G_{pi}(t) = \begin{cases} R_i^2\, t \exp(-R_i t), & t \ge 0 \\ 0, & t < 0 \end{cases}$$
or $G_{pi}^{(m)}(t) = G_{pi}^{(1)}(t),\ G_{pi}^{(2)}(t),\ \ldots,\ G_{pi}^{(M)}(t), \quad m = 1, 2, \ldots, M$
$$G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + B_j t) \exp(-B_j t),\ \theta_j \right], & t \ge 0 \\ 0, & t < 0 \end{cases}$$
or $G_{aj}^{(n)}(t) = G_{aj}^{(1)}(t),\ G_{aj}^{(2)}(t),\ \ldots,\ G_{aj}^{(N)}(t), \quad n = 1, 2, \ldots, N$
where:
F_min: the fundamental frequency minimum of the sentence;
I: the number of phrases; R_i: the decay coefficient of the i-th phrase, with an empirical value of 3/s; T_0i: the time at which the i-th phrase control command occurs; A_pi: the amplitude of the i-th phrase control command; G_pi^(m): the different phrase tone shapes;
J: the number of syllables or prosodic words; A_aj: the amplitude of the j-th syllable control command; T_1j: the time at which the j-th syllable control command begins; T_2j: the time at which the j-th syllable control command ends; B_j: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, with an empirical value of 20/s; θ_j: the maximum permitted value of the syllable component of the j-th syllable control command, with an empirical value of 0.9; G_aj^(n): the different syllable tone shapes;
2. The model parameters are generated automatically by a computer program. The first step of the algorithm is to determine the phrase command parameters and the fundamental frequency minimum F_min; an accurate F0 curve is then simulated from F_min and the phrase parameters. Once the phrase-unit parameters have been optimized, the syllable-unit parameters are calculated. Individual prosodic words are processed from left to right, and a local fundamental frequency curve simulation is performed for each prosodic word.
The advantages of the present invention are:
(1) The prosodic information of phrases and syllables is obtained from the annotated pinyin code sequence of the input text, so the generated intonation pitch contour meets the prosodic structure requirements of natural speech;
(2) Processing the phrase-unit and syllable-unit pitch contours separately allows the temporal order of phrase units and syllable units to be determined accurately;
(3) Using phrase-unit and syllable-unit prosody templates simplifies generation of the phrase-unit and syllable-unit pitch contours and, at the same time, better reflects the complex and variable requirements of prosodic change;
(4) Treating the phrase control mechanism and the syllable control mechanism as damped second-order oscillatory systems is consistent with the physiological characteristics of the human vocal organs.
Chinese differs from Western languages in many respects, including syntactic structure, grammar rules, acoustic characteristics, and prosodic structure. First, in Chinese each character corresponds to one syllable, that is, characters are monosyllabic. Second, Chinese is a tonal language: tone distinguishes meaning, and each character has a fixed tone (fundamental frequency shape). In connected speech, the tones of adjacent characters influence one another and may change, sometimes losing their original tone shape altogether; this is coarticulation (tone sandhi). There are also brief pauses within continuous utterances. Every speaker has a base frequency, called the fundamental frequency, which reflects the pitch of the speaker's voice; speech also varies in loudness, among other properties. In a Chinese text-to-speech (TTS) system, the prediction, analysis, and control of prosodic information such as fundamental frequency, duration, and amplitude is called prosody control.
In view of this situation, the inventors, starting from the phonetic features of Chinese, the tones of Chinese and their characteristics, and the intonation of Chinese and its patterns, have constructed a complete Chinese intonation pitch contour generation method based on an intonation model, which improves the naturalness of synthesized speech. Every step, module, and submodule of the invention can be implemented by a computer program; the method is easy to operate, portable, and widely applicable.
(4) Description of the drawings
Fig. 1 is a block diagram of the Chinese intonation pitch contour generation model;
Fig. 2 is a block diagram of Chinese intonation pitch contour generation;
Fig. 3 is a flow chart of Chinese intonation pitch contour generation;
Fig. 4 shows the phrase fundamental frequency curve with its decaying characteristic;
Fig. 5 shows the syllable fundamental frequency curve with its rising characteristic;
Fig. 6 is a flow chart of prosodic feature control;
Fig. 7 is a block diagram of the computer hardware system of an embodiment of the invention.
(5) Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 2, the invention comprises the following computer-implementable steps:
Inputting an annotated pinyin code sequence;
A phrase-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is phrase information. If so, it is sent to the phrase control mechanism as a phrase command; at the same time, the corresponding phrase-unit prosody template is looked up in the prosody template library according to this phrase prosodic information, and the pitch contour curve of the phrase unit is generated, output, and kept in a buffer. Otherwise the search for phrase information continues. This is repeated until the whole annotated pinyin code sequence has been searched, and the pitch contour curve of the phrase units is output;
A syllable-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is syllable information. If so, it is sent to the syllable control mechanism as a syllable command; at the same time, the corresponding syllable-unit prosody template is looked up in the prosody template library according to this syllable prosodic information, and the pitch contour curve of the syllable unit is generated, output, and kept in a buffer. Otherwise the search for syllable information continues. This is repeated until the whole annotated pinyin code sequence has been searched, and the pitch contour curve of the syllable units is output;
An intonation pitch contour superposition step: the pitch contour of the phrase units sent by the phrase control mechanism, the pitch contour of the syllable units sent by the syllable control mechanism, and the fundamental frequency minimum F_min are superposed logarithmically, according to the Chinese intonation model, by temporal order, amplitude, and duration. If the annotated pinyin code sequence has not been fully processed, processing returns and continues; otherwise the intonation pitch contour curve is generated and output to the subsequent signal processing step.
Each of these steps is implemented by a computer program.
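The scan-and-dispatch logic of the two generation steps can be sketched as follows. This is only an illustrative sketch: the patent does not specify the format of the annotated pinyin code sequence or of the prosody template library, so the `Token` fields, the tag names, the template labels, and the amplitude values below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str     # "phrase" or "syllable" (hypothetical tags)
    label: str    # key into the template library (hypothetical)
    start: float  # start time in seconds
    end: float    # end time in seconds

# Hypothetical prosody template library: label -> command amplitude.
TEMPLATES = {
    "phrase": {"declarative": 0.5, "question": 0.6},
    "syllable": {"tone1": 0.3, "tone4": 0.4},
}

def collect_commands(tokens, kind):
    """Scan the sequence from beginning to end and buffer one command
    (start, end, amplitude) for every token of the requested kind."""
    buffer = []
    for tok in tokens:
        if tok.kind != kind:
            continue  # not the information we are looking for; keep searching
        amplitude = TEMPLATES[kind][tok.label]  # template lookup
        buffer.append((tok.start, tok.end, amplitude))
    return buffer

tokens = [
    Token("phrase", "declarative", 0.0, 1.2),
    Token("syllable", "tone1", 0.0, 0.3),
    Token("syllable", "tone4", 0.3, 0.6),
]
phrase_cmds = collect_commands(tokens, "phrase")
syllable_cmds = collect_commands(tokens, "syllable")
```

The two buffered command lists would then feed the superposition step, which combines them with F_min in the log domain.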
Embodiment:
1. Construction of the prosody template library
The prosody template library is built by conventional means, in the same way a database is ordinarily set up, so the details are not given here. After weighing various factors, the invention selects the smallest perceptually distinguishable unit in Chinese, the syllable, as the speech synthesis unit. Each syllable is stored in the speech library as multiple samples, and the samples differ from one another in stress (light or heavy), tone, and fundamental frequency curve.
2. The intonation model of Chinese
A natural-sounding, complete utterance is characterized mainly by three aspects. The first is the intonation type of the sentence, which is reflected mainly in its fundamental frequency, that is, in the pitch curve of the sentence. The second is the position of prosodic phrases and prosodic words within the sentence, because they reflect how the prosodic features vary over the whole sentence. The third is the position of sentence stress and pauses: stress highlights and emphasizes the semantic focus of the sentence, and pauses reflect its rhythm. Of these three, the fundamental frequency curve is the most important, since it reflects the most salient prosodic variation of the sentence and the overall trend of its fundamental frequency contour.
The F0 pitch contour of a sentence can be regarded as the superposition of the pitch contour of the phrase units, the pitch contour of the syllable units, and the fundamental frequency minimum F_min, with the contour expressed on a logarithmic scale. The phrase-unit contour reflects the global variation of the sentence's pitch contour, the syllable-unit contour reflects the local variation over a syllable or prosodic word, and F_min represents the lowest frequency at which the human vocal folds vibrate to produce voice. Phrase units and syllable units belong to the phrase control mechanism and the syllable control mechanism respectively, and both mechanisms are approximated as damped second-order oscillatory systems. The input of the phrase control mechanism is a phrase command and its output is the phrase-unit pitch contour; the input of the syllable control mechanism is a syllable command and its output is the syllable-unit pitch contour. A phrase command can be described by an impulse function and a syllable command by a step function. These functions are defined by two different sets of control commands and parameters:
(1) the timing and amplitude of the phrase command, and the damping coefficient of the phrase control mechanism;
(2) the onset and offset times and the amplitude of the syllable command, and the damping coefficient of the syllable control mechanism.
These parameters must remain constant over a defined period: the phrase-unit parameters are constant within a prosodic phrase, the syllable-unit parameters are constant within a syllable or prosodic word, and F_min is constant over the whole sentence. The block diagram of the Chinese intonation pitch contour generation model is shown in Fig. 1.
Based on this generation model, the two kinds of commands, phrase commands and syllable commands, are the inputs of the sentence intonation model, and the model output is the pitch contour curve of the sentence. Its mathematical expression is as follows:
$$\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right] \tag{1}$$
$$G_{pi}(t) = \begin{cases} R_i^2\, t \exp(-R_i t), & t \ge 0 \\ 0, & t < 0 \end{cases} \tag{2}$$
or $G_{pi}^{(m)}(t) = G_{pi}^{(1)}(t),\ G_{pi}^{(2)}(t),\ \ldots,\ G_{pi}^{(M)}(t), \quad m = 1, 2, \ldots, M$ (phrase-unit tone shape function)
$$G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + B_j t) \exp(-B_j t),\ \theta_j \right], & t \ge 0 \\ 0, & t < 0 \end{cases} \tag{3}$$
or $G_{aj}^{(n)}(t) = G_{aj}^{(1)}(t),\ G_{aj}^{(2)}(t),\ \ldots,\ G_{aj}^{(N)}(t), \quad n = 1, 2, \ldots, N$ (syllable-unit tone shape function)
where:
F_min: the fundamental frequency minimum of the sentence;
I: the number of phrases; R_i: the decay coefficient of the i-th phrase, with an empirical value of 3/s; T_0i: the time at which the i-th phrase control command occurs; A_pi: the amplitude of the i-th phrase control command; G_pi^(m): the different phrase tone shapes.
J: the number of syllables or prosodic words; A_aj: the amplitude of the j-th syllable control command; T_1j: the time at which the j-th syllable control command begins; T_2j: the time at which the j-th syllable control command ends; B_j: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, with an empirical value of 20/s; θ_j: the maximum permitted value of the syllable component of the j-th syllable control command, with an empirical value of 0.9; G_aj^(n): the different syllable tone shapes.
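Formulas (1) to (3) translate almost directly into code. The following is a minimal sketch that uses the empirical values quoted above (R = 3/s, B = 20/s, θ = 0.9) as defaults; the command amplitudes, the command times, and the 75 Hz value of F_min in the example are illustrative assumptions, not values prescribed by the patent.

```python
import math

def g_phrase(t, r=3.0):
    """Phrase response, formula (2): R^2 * t * exp(-R*t) for t >= 0, else 0."""
    return r * r * t * math.exp(-r * t) if t >= 0 else 0.0

def g_syllable(t, b=20.0, theta=0.9):
    """Syllable response, formula (3): rising curve clipped at theta."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + b * t) * math.exp(-b * t), theta)

def f0(t, f_min, phrase_cmds, syllable_cmds):
    """Formula (1): superpose all components in the log domain.

    phrase_cmds:   list of (A_p, T_0) pairs
    syllable_cmds: list of (A_a, T_1, T_2) triples
    """
    log_f0 = math.log(f_min)
    for a_p, t0 in phrase_cmds:
        log_f0 += a_p * g_phrase(t - t0)
    for a_a, t1, t2 in syllable_cmds:
        log_f0 += a_a * (g_syllable(t - t1) - g_syllable(t - t2))
    return math.exp(log_f0)

# Illustrative commands: one phrase command and one syllable command.
phrases = [(0.5, -0.3)]
syllables = [(0.4, 0.2, 0.5)]
contour = [f0(0.01 * k, 75.0, phrases, syllables) for k in range(150)]
```

Because both responses are non-negative for positive amplitudes, the contour in this example never falls below F_min, which matches the description of F_min as the floor of the curve.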
The first term of formula (1) can be regarded as the fundamental frequency minimum that keeps the vocal folds vibrating; the second term represents the pitch contour of the phrase units; the third term represents the pitch contour of the syllable units. The three are combined by logarithmic superposition. Here F_min is determined by the voice and the intonation type of the sentence and runs through the whole utterance. The fundamental frequency curve of the phrases is superposed on it first, giving the basic trend of the sentence's central fundamental frequency trajectory. On top of this trajectory, the fundamental frequency curves of the syllables or prosodic words are then superposed in temporal order. The result of superposing these three parts is the fundamental frequency curve of the complete sentence.
The gradual declination of a phrase can be shaped by adjusting R_i, which changes the decay characteristic of G_pi(t) and thereby the trend of the phrase fundamental frequency. The larger R_i is, the stronger the decay and the steeper the downward slope of the phrase curve; the size of R_i also indirectly reflects the length of the intonation phrase. Likewise, the gradual rise of a syllable can be shaped by adjusting B_j: the larger B_j is, the more pronounced the rise of the syllable's fundamental frequency curve. Fig. 4 and Fig. 5 show the phrase curve with its decaying characteristic and the syllable curve with its rising characteristic, respectively.
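The effect of R_i and B_j can be made concrete with a short calculation on formulas (2) and (3). This is a worked derivation added for illustration; it is not part of the patent text, and the symbols t* and t_θ are introduced here only for convenience.

$$\frac{dG_{pi}}{dt} = R_i^2 \exp(-R_i t)\,(1 - R_i t) = 0 \;\Rightarrow\; t^{*} = \frac{1}{R_i}, \qquad G_{pi}(t^{*}) = \frac{R_i}{e}$$

$$\frac{d}{dt}\left[ 1 - (1 + B_j t)\exp(-B_j t) \right] = B_j^2\, t \exp(-B_j t) \ge 0, \qquad (1 + B_j t_\theta)\exp(-B_j t_\theta) = 1 - \theta_j$$

The phrase response therefore peaks 1/R_i after the command and decays afterwards, so a larger R_i gives an earlier peak and a faster fall. With R_i = 3/s the peak lies at about 0.33 s with height about 1.10; with the 3.1 Hz value quoted later in the description, the peak lies at about 0.323 s, which appears to be where the 323 ms lead time of the first phrase command comes from. The syllable response rises monotonically until it is clipped at θ_j; for θ_j = 0.9 the clipping condition gives B_j t_θ ≈ 3.89, that is, t_θ ≈ 0.19 s for B_j = 20/s, and a larger B_j reaches the ceiling sooner.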
For the specific tone shape of each phrase, the phrase tone shape is determined from the phrase information in the annotated pinyin code sequence, and the phrase tone shape function can use the "phrase-unit prosody template" to generate the phrase pitch contour curve directly.
For the specific tone shape of each syllable, the syllable tone shape is determined from the syllable information in the annotated pinyin code sequence. The syllable tone shape function can use the "syllable-unit prosody template" to generate the syllable fundamental frequency curve directly, or the curve equation $G_{aj}^{(n)}(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$ can be used to generate a syllable tone shape curve with a good fit. The tone length is determined by the start and end times of the corresponding voiced segment; the tone range is controlled by a stepwise tone-range amplitude corresponding to the tone length.
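A quartic tone shape of this kind can be evaluated as in the sketch below. The patent gives no coefficient values and does not say how time is scaled, so the coefficients and the normalisation of time to the interval [0, 1] over the voiced segment are assumptions made for illustration.

```python
def tone_shape(coeffs, t):
    """Evaluate a0 + a1*t + a2*t^2 + a3*t^3 + a4*t^4 by Horner's rule.

    coeffs is [a0, a1, a2, a3, a4].
    """
    value = 0.0
    for a in reversed(coeffs):
        value = value * t + a
    return value

def syllable_tone_curve(coeffs, t_start, t_end, n=11):
    """Sample the tone shape over the voiced segment [t_start, t_end].

    Assumption: time is normalised to [0, 1] across the segment, so the
    tone length only stretches the curve and does not change its shape.
    Returns a list of (time in seconds, shape value) pairs.
    """
    step = (t_end - t_start) / (n - 1)
    return [(t_start + k * step, tone_shape(coeffs, k / (n - 1)))
            for k in range(n)]

# Hypothetical coefficients for a falling tone shape.
falling = [1.0, -0.2, -0.8, 0.0, 0.0]
curve = syllable_tone_curve(falling, 0.20, 0.45)
```

In practice the coefficients would be fitted to measured syllable contours; the stepwise tone-range amplitude mentioned above would then scale the sampled values.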
3. Setting the model parameters
The model parameters are generated automatically by a computer program. Based on the superposition principle, the first step of the algorithm is to determine the phrase command parameters and the fundamental frequency minimum F_min; this step can be carried out separately from determining the syllable command parameters. An accurate F0 curve is then simulated from F_min and the phrase parameters. Once the phrase model parameters have been optimized, the syllable-unit parameters are calculated.
Each syllable command generates a fundamental frequency curve that models one syllable or prosodic word. Individual prosodic words are processed from left to right; the syllable units are not optimized globally, and a local fundamental frequency curve simulation is performed for each prosodic word instead. This left-to-right processing of the F0 curve is subject to two constraints: one is to prevent later syllable commands from affecting curves that have already been optimized; the other is to ensure that a syllable or prosodic word can still be estimated when the parameters of the preceding commands are insufficient.
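One possible reading of this two-stage procedure is sketched below. It is not the patent's optimization algorithm, which is not disclosed in detail. The sketch assumes that the command times (T_0, T_1, T_2) are already known from the annotation, that there is a single phrase command, and that only F_min and the amplitudes are estimated, by ordinary least squares in the log domain. The function names and the synthetic data are hypothetical.

```python
import math

def g_phrase(t, r=3.0):
    return r * r * t * math.exp(-r * t) if t >= 0 else 0.0

def g_syllable(t, b=20.0, theta=0.9):
    return min(1.0 - (1.0 + b * t) * math.exp(-b * t), theta) if t >= 0 else 0.0

def syllable_basis(t, t1, t2):
    """Unit-amplitude syllable component of formula (1)."""
    return g_syllable(t - t1) - g_syllable(t - t2)

def fit_phrase(times, log_f0, t0, words):
    """Stage 1: fit ln F_min and A_p using only frames that no syllable
    command touches, so this step is independent of the syllable parameters."""
    pts = [(g_phrase(t - t0), y) for t, y in zip(times, log_f0)
           if all(abs(syllable_basis(t, t1, t2)) < 1e-12 for t1, t2 in words)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    a_p = (sum((x - mx) * (y - my) for x, y in pts)
           / sum((x - mx) ** 2 for x, _ in pts))
    return math.exp(my - a_p * mx), a_p

def fit_syllables(times, log_f0, f_min, a_p, t0, words):
    """Stage 2: fit one amplitude per prosodic word, left to right,
    on what remains after removing F_min and the phrase component."""
    residual = [y - math.log(f_min) - a_p * g_phrase(t - t0)
                for t, y in zip(times, log_f0)]
    amplitudes = []
    for t1, t2 in sorted(words):
        basis = [syllable_basis(t, t1, t2) for t in times]
        a_a = (sum(g * r for g, r in zip(basis, residual))
               / sum(g * g for g in basis))
        amplitudes.append(a_a)
        # Remove this word's contribution before moving right, so a later
        # fit cannot disturb a curve that has already been optimized.
        residual = [r - a_a * g for r, g in zip(residual, basis)]
    return amplitudes

# Synthetic contour with known parameters, used to check the recovery.
times = [0.01 * k for k in range(200)]
words = [(0.1, 0.3), (0.7, 0.9)]
true_amps = [0.4, 0.25]
log_f0 = [math.log(75.0) + 0.5 * g_phrase(t + 0.3)
          + sum(a * syllable_basis(t, t1, t2)
                for a, (t1, t2) in zip(true_amps, words))
          for t in times]
f_min, a_p = fit_phrase(times, log_f0, -0.3, words)
amps = fit_syllables(times, log_f0, f_min, a_p, -0.3, words)
```

On this noise-free synthetic contour the sketch recovers the generating values because the two prosodic words do not overlap in time. Real contours would also require estimating the command times and handling overlapping words, which this sketch does not attempt.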
4. F0 synthesis based on the intonation model
(1) Damping coefficient
The damping of the phrase units and of the syllable units is treated as constant over time. For the phrase unit, the standard value of the damping coefficient is 3.1 Hz. The damping coefficient averaged over all speakers and all syllables or prosodic words is 16 Hz.
(2) Fundamental frequency minimum F_min
The distribution of F_min has low dispersion; typical values range from 70 to 80 Hz for male speakers and from 120 to 140 Hz for female speakers.
(3) Amplitude and timing of the phrase command
The phrase unit represents the global declination and slow variation of the F0 curve over the utterance and forms the basis of the intonation curve. In terms of amplitude, the phrase command amplitude is the multiplicative coefficient that determines the offset of the phrase curve along the frequency axis; it is a direct measure of F0 declination within the sentence and depends to a large extent on the speaker. Sentence type shows up globally in the phrase-unit curve: for example, the curve of a declarative sentence declines from beginning to end, whereas the curve of a yes-no question or an alternative question declines first and then rises toward the end. In terms of timing, the phrase curve reaches its maximum relatively early and then falls over the main part of the sentence. The peak of the phrase curve coincides with the start of the sentence or prosodic phrase, so the timing of the phrase command follows directly from the damping coefficient (3.1 Hz). The first phrase command is placed 323 ms before the start of the sentence; this is also consistent with research on F0 production and control, which has revealed laryngeal muscle activity before phonation begins.
(4) Syllable command amplitude
The syllable amplitude is the multiplicative coefficient that determines the offset of the syllable curve along the frequency axis and the height of the syllable peak; it depends to a large extent on the position of the syllable. The amplitude of the sentence-final syllable command differs from that at other positions in the sentence, the amplitude of nouns is higher than that of other parts of speech, and the syllable command amplitude before a phrase boundary is about 10 to 20% higher than at other positions.
(5) Syllable command duration
The duration of a syllable command can be predicted from the duration of the prosodic word that contains the syllable. The two are correlated (r = 0.84), so slightly more than 70% of the variation in syllable command duration (r² ≈ 0.71) can be explained by the duration of the prosodic word.
(6) Syllable command position
In non-sentence-final positions, the interval between the start of the syllable (or prosodic word) command and the start of the syllable (or prosodic word) itself is about 10% of the duration of the syllable (or prosodic word); that is, there is a silent interval between the start of the command and the start of pronunciation. This interval tends to zero in sentence-final prosodic words.
From the above analysis, a set of rules can therefore be established to control parameter adjustment, covering for example sentence type, sentence stress, phrase boundaries, and word stress, so that an artificial intonation curve can be generated for a given sentence. The required input information comprises the positions of the syllables, the durations of the prosodic words, and their parts of speech.
The rules proposed here are based on statistical results, and the parameters given are mean values, so the resulting curve does not represent any one real speaker. Seen from another angle, however, if a speaker's characteristics can be captured accurately, the intonation patterns produced by these rules and this model will be very close to those produced by the speaker being modeled. The intonation model parameters are listed in Table 1.
Table 1. Intonation model parameters
Parameter | Description
Damping coefficient | Phrase unit 3.1 Hz; syllable unit 16 Hz.
Fundamental frequency minimum F_min | Typical values are 70 to 80 Hz for male speakers and 120 to 140 Hz for female speakers.
Phrase command amplitude | The multiplicative coefficient that determines the offset of the phrase curve along the frequency axis. It is a direct measure of F0 declination within the sentence and depends to a large extent on the speaker.
Syllable command amplitude | The multiplicative coefficient that determines the offset of the syllable curve along the frequency axis and the height of the syllable peak; it depends to a large extent on the position of the syllable. The amplitude before a phrase boundary is about 10 to 20% higher than at other positions.
Phrase command timing | The first phrase command is placed 323 ms before the start of the sentence.
Syllable command timing | In non-sentence-final positions, the interval between the start of the syllable (or prosodic word) command and the start of pronunciation is about 10% of the duration of the syllable (or prosodic word), so there is a silent interval between the two. This interval tends to zero in sentence-final prosodic words.
Phrase command duration | Can be obtained from the duration of the prosodic phrase.
Syllable command duration | Slightly more than 70% of its variation can be explained by the duration of the prosodic word.
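The timing and amplitude rules of Table 1 can be turned into a small rule function, sketched below. The 323 ms lead, the 10% onset lead, and the 10 to 20% boundary boost come from the table; the choice of 15% as the midpoint of that range, the assumption that a command lasts as long as its prosodic word, the `Word` structure, and the base amplitude in the example are all hypothetical.

```python
from dataclasses import dataclass

PHRASE_LEAD = 0.323      # s: first phrase command precedes the sentence start (Table 1)
ONSET_LEAD_RATIO = 0.10  # command leads the word onset by about 10% of its duration (Table 1)
BOUNDARY_BOOST = 1.15    # assumed midpoint of the 10 to 20% range in Table 1

@dataclass
class Word:
    start: float                  # onset of the prosodic word, in seconds
    duration: float               # duration of the prosodic word, in seconds
    before_boundary: bool = False # word immediately precedes a phrase boundary
    sentence_final: bool = False  # word is the last one in the sentence

def first_phrase_command_time(sentence_start):
    """Time T_0 of the first phrase command."""
    return sentence_start - PHRASE_LEAD

def syllable_command(word, base_amplitude):
    """Return (A_a, T_1, T_2) for one prosodic word."""
    # The lead shrinks to zero for sentence-final words.
    lead = 0.0 if word.sentence_final else ONSET_LEAD_RATIO * word.duration
    t1 = word.start - lead
    # Assumption: the command lasts as long as the prosodic word itself.
    t2 = t1 + word.duration
    boost = BOUNDARY_BOOST if word.before_boundary else 1.0
    return base_amplitude * boost, t1, t2

amp, t1, t2 = syllable_command(Word(1.0, 0.4, before_boundary=True), 0.3)
final_amp, final_t1, final_t2 = syllable_command(
    Word(2.0, 0.5, sentence_final=True), 0.3)
```

The resulting triples have the same form as the syllable commands of formula (1), so they can be passed straight to an implementation of that formula.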
Fig. 6 is the flow chart of prosodic feature control. Referring to Fig. 6, the computer analyzes the typed text and converts it into a pinyin code sequence containing prosodic structure information. According to this information, it annotates the position and number of phrases in the sentence, the strength of each phrase, the number of syllables, and each syllable's tone shape, tone length, and tone-range amplitude, together with the base pitch value of the whole sentence. The relevant control parameters are adjusted and supplemented manually and by a parameter optimization algorithm, the components are combined layer by layer according to the model, and the pitch contour data of the complete sentence are calculated. Finally, according to the fundamental frequency output values and the corresponding duration parameters, the PSOLA method is used to adjust the prosodic parameters of each syllable waveform in the speech library, and the waveforms are concatenated to synthesize continuous speech.
5. System environment
Fig. 7 shows a computing system environment suitable for implementing the invention. This environment is only one example of a computing system environment in which the invention can be implemented and is not intended to limit the scope of use or functionality of the invention. Nor should the computing environment be interpreted as having any dependency on or requirement for any one component, or combination of components, shown in the example operating environment.
The invention can be used with numerous general-purpose or special-purpose computing system environments or configurations, such as personal computers, minicomputers, midrange computers, mainframe computers, network computers, server computers, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, and distributed computing environments that include any of the above systems or devices.
The invention can be described in the general context of computer-executable instructions, such as program modules. Program modules include routines, subroutines, objects, controls, components, data structures, and so on, which perform particular tasks or implement particular abstract data types. The invention can also be applied in distributed computing environments, where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules can reside in both local and remote computer storage media, including memory storage devices.
The computer shown in Fig. 7 comprises one or more central processing units (CPUs), internal memory, external memory, an input device interface, an output device interface, and a system bus connecting these units. The system bus can be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Examples of such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus (also known as the Mezzanine bus).
The user can enter commands and information into the computer through input devices. These can be a keyboard, a microphone, and a pointing device such as a mouse, trackball, or touch pad, as well as other input devices (not shown) such as a joystick, game pad, satellite dish, or scanner. Such input devices are usually connected to the processing unit through a user input interface coupled to the system bus, but they can also be connected through other interfaces and bus structures, such as a parallel port, game port, or universal serial bus (USB). A monitor or other type of display device is connected to the system bus through an interface such as a video interface. In addition to the monitor, the computer can include other peripheral output devices, such as speakers and a printer, which are connected through an output peripheral interface.
The computer can operate in a networked environment using logical connections to one or more remote computers.

Claims (3)

1. A Chinese intonation pitch contour generation method based on an intonation model, characterized in that it comprises the following computer-implementable steps:
A step of inputting a labelled pinyin code sequence;
A phrase-unit pitch contour generation step: prosodic information is extracted from the input labelled pinyin code sequence in order from beginning to end, and it is judged whether the information is phrase information; if so, it is sent to the phrase control mechanism as a phrase command, and at the same time the corresponding phrase-unit prosody template is retrieved from the prosody template library according to the phrase prosodic information, and the pitch contour curve of the phrase unit is generated, output, and retained in a buffer; otherwise the search for phrase information continues; and so on, until the whole labelled pinyin code sequence has been searched and the pitch contour curve of the phrase units is output;
A syllable-unit pitch contour generation step: prosodic information is extracted from the input labelled pinyin code sequence in order from beginning to end, and it is judged whether the information is syllable information; if so, it is sent to the syllable control mechanism as a syllable command, and at the same time the corresponding syllable-unit prosody template is retrieved from the prosody template library according to the syllable prosodic information, and the pitch contour curve of the syllable unit is generated, output, and retained in a buffer; otherwise the search for syllable information continues; and so on, until the whole labelled pinyin code sequence has been searched and the pitch contour curve of the syllable units is output;
An intonation pitch contour superposition step: the pitch contour of the phrase units sent by the phrase control mechanism, the pitch contour of the syllable units sent by the syllable control mechanism, and the minimum fundamental frequency F_min are superposed logarithmically according to the Chinese intonation model, by order, amplitude, and duration; if the labelled pinyin code sequence is judged not to have been fully processed, execution continues; otherwise the intonation pitch contour curve of the input labelled pinyin code sequence is generated and output to the subsequent signal processing step.
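The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the tuple layout of the labelled sequence, the names `collect_units`, `superpose`, `TEMPLATES`, and `SEQUENCE`, and the toy templates are all hypothetical, and each template is assumed to return a log-domain offset as a function of time since the unit's onset.

```python
import math


def collect_units(sequence, kind, template_library):
    """Scan the labelled sequence front to back and buffer every unit of one kind.

    Each item is a hypothetical (kind, template_key, onset_seconds) tuple.
    Returns a list of (onset_seconds, template) pairs in sequence order.
    """
    buffer = []
    for item_kind, template_key, onset in sequence:
        if item_kind != kind:
            continue  # not the information we want: keep searching
        buffer.append((onset, template_library[template_key]))
    return buffer


def superpose(times, f_min, phrase_units, syllable_units):
    """Add all unit contours to ln(F_min) and return F0 in Hz for each time."""
    contour = []
    for t in times:
        log_f0 = math.log(f_min)
        for onset, template in phrase_units + syllable_units:
            if t >= onset:
                log_f0 += template(t - onset)
        contour.append(math.exp(log_f0))
    return contour


# Toy template library (assumed): local time in seconds -> log-domain offset.
TEMPLATES = {
    "P1": lambda t: 0.1,                      # flat phrase component
    "T1": lambda t: 0.2 if t < 0.2 else 0.0,  # short syllable bump
}
SEQUENCE = [("phrase", "P1", 0.0), ("syllable", "T1", 0.1), ("syllable", "T1", 0.5)]

phrases = collect_units(SEQUENCE, "phrase", TEMPLATES)
syllables = collect_units(SEQUENCE, "syllable", TEMPLATES)
f0 = superpose([0.0, 0.1, 0.4], 100.0, phrases, syllables)
```

Working in the log domain means that each phrase or syllable unit scales F0 by a factor rather than adding a fixed number of hertz, which is the usual reason for choosing logarithmic superposition.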
2. The Chinese intonation pitch contour generation method based on an intonation model according to claim 1, characterized in that the mathematical expression of said intonation pitch contour curve is as follows:
Figure FSB00000329590200011
Figure FSB00000329590200012
Figure FSB00000329590200013
Wherein:
F_min: the minimum fundamental frequency of the sentence;
I: the number of phrases; R_i: the attenuation coefficient of the i-th phrase, with an empirical value of 3/s; T_0i: the time at which the i-th phrase control command occurs; A_pi: the amplitude of the i-th phrase control command; G_pi(t): the function representing the different phrase tone patterns;
J: the number of syllables or prosodic words; A_aj: the amplitude of the j-th syllable control command; T_1j: the time at which the j-th syllable control command begins; T_2j: the time at which the j-th syllable control command ends; B_j: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, with an empirical value of 20/s; θ_j: the maximum permitted value of the syllable component of the j-th syllable control command, with an empirical value of 0.9; G_aj(t): the function representing the different syllable tone patterns;
For the specific tone pattern G_pi(t) of each phrase, the phrase tone pattern can also be determined from the phrase information in the labelled pinyin code sequence, and the phrase tone pattern function can use the "phrase-unit prosody template" to generate the pitch contour curve of the phrase directly;
For the specific tone pattern G_aj(t) of each syllable, the syllable tone pattern can also be determined from the syllable information in the labelled pinyin code sequence, and the syllable tone pattern function can use the "syllable-unit prosody template" to generate the prosodic fundamental frequency curve of the syllable directly; alternatively, the curve equation G_aj(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 can be used to generate a syllable-unit pitch contour curve with a better fitting result.
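The equation images of claim 2 are not reproduced in this text, so the following is only a sketch under a stated assumption: that the expressions follow the command-response (Fujisaki) form cited in the non-patent literature, with R_i as the phrase rate constant, B_j as the syllable rate constant, and θ_j as the ceiling. The function names and the `(amplitude, time)` tuple layouts are hypothetical; the quartic is the one polynomial given explicitly in the claim.

```python
import math


def phrase_response(t, rate=3.0):
    """Assumed phrase impulse response: rate^2 * t * exp(-rate * t) for t >= 0."""
    if t < 0:
        return 0.0
    return rate * rate * t * math.exp(-rate * t)


def syllable_response(t, rate=20.0, ceiling=0.9):
    """Assumed syllable step response, clipped at `ceiling` (theta_j in the claim)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + rate * t) * math.exp(-rate * t), ceiling)


def polynomial_tone_shape(t, coeffs):
    """Quartic from the claim: a0 + a1*t + a2*t^2 + a3*t^3 + a4*t^4 (Horner form)."""
    value = 0.0
    for a in reversed(coeffs):
        value = value * t + a
    return value


def f0_at(t, f_min, phrase_cmds, syllable_cmds):
    """Return F0 in Hz at time t.

    phrase_cmds: list of (A_p, T_0); syllable_cmds: list of (A_a, T_1, T_2).
    """
    log_f0 = math.log(f_min)
    for amp, onset in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset)
    for amp, start, end in syllable_cmds:
        log_f0 += amp * (syllable_response(t - start) - syllable_response(t - end))
    return math.exp(log_f0)
```

With the empirical values quoted in the claim (3/s, 20/s, 0.9), the syllable component under this assumed form returns to exactly zero roughly 0.2 s after a command ends, because both step responses have then reached the ceiling.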
3. The Chinese intonation pitch contour generation method based on an intonation model according to claim 1, characterized in that the intonation model parameters are generated automatically by a computer program: the first step of the algorithm is to determine the phrase command parameters and the minimum fundamental frequency F_min; then an accurate F0 curve is simulated from F_min and the phrase command parameters; after the phrase command parameters have been optimized, the syllable command parameters are calculated; individual syllables are processed from left to right, and a local fundamental frequency curve fit is performed for each syllable.
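The staged order of claim 3 (phrase parameters and F_min first, syllables afterwards, left to right) can be sketched as two least-squares passes. This is an illustration, not the patent's optimization algorithm, which the text does not specify. It assumes command-response shapes for the phrase and syllable components, known command timings, a single phrase command, and a hypothetical `tail` of 0.2 s after which a syllable's contribution is treated as finished; `fit_staged` and its arguments are invented names.

```python
import math


def phrase_response(t, rate=3.0):
    """Assumed phrase impulse response (zero before the command)."""
    return rate * rate * t * math.exp(-rate * t) if t >= 0 else 0.0


def syllable_response(t, rate=20.0, ceiling=0.9):
    """Assumed syllable step response, clipped at `ceiling`."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + rate * t) * math.exp(-rate * t), ceiling)


def fit_staged(times, log_f0, phrase_onset, syllables, tail=0.2):
    """Return (ln F_min, A_p, [A_a per syllable]) from a two-stage fit.

    syllables: list of (T_1, T_2) command intervals, sorted left to right.
    tail: assumed time after T_2 by which the syllable component has died out.
    """
    # Stage 1: fit ln F_min and A_p on samples outside every syllable window.
    quiet = [(phrase_response(t - phrase_onset), y)
             for t, y in zip(times, log_f0)
             if all(t < t1 or t > t2 + tail for t1, t2 in syllables)]
    n = len(quiet)
    mean_x = sum(x for x, _ in quiet) / n
    mean_y = sum(y for _, y in quiet) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in quiet)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in quiet)
    amp_p = sxy / sxx
    ln_f_min = mean_y - amp_p * mean_x

    # Stage 2: subtract the phrase fit, then fit each syllable locally.
    amps = []
    for t1, t2 in syllables:
        num = den = 0.0
        for t, y in zip(times, log_f0):
            if t1 <= t <= t2 + tail:
                residual = y - ln_f_min - amp_p * phrase_response(t - phrase_onset)
                shape = syllable_response(t - t1) - syllable_response(t - t2)
                num += residual * shape
                den += shape * shape
        amps.append(num / den)
    return ln_f_min, amp_p, amps
```

Fixing the slowly varying phrase component before touching the syllables keeps each syllable fit a one-parameter local problem. The sketch only recovers amplitudes cleanly when syllable windows do not overlap, and a real system would also have to estimate the command timings.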
CN2007100716149A 2007-01-09 2007-01-09 Chinese intonation base frequency contour generating method based on intonation model Expired - Fee Related CN101000766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007100716149A CN101000766B (en) 2007-01-09 2007-01-09 Chinese intonation base frequency contour generating method based on intonation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007100716149A CN101000766B (en) 2007-01-09 2007-01-09 Chinese intonation base frequency contour generating method based on intonation model

Publications (2)

Publication Number Publication Date
CN101000766A CN101000766A (en) 2007-07-18
CN101000766B true CN101000766B (en) 2011-02-02

Family

ID=38692705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007100716149A Expired - Fee Related CN101000766B (en) 2007-01-09 2007-01-09 Chinese intonation base frequency contour generating method based on intonation model

Country Status (1)

Country Link
CN (1) CN101000766B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103035252B (en) * 2011-09-30 2015-04-29 西门子公司 Chinese speech signal processing method, Chinese speech signal processing device and hearing aid device
CN104347065A (en) * 2013-07-26 2015-02-11 英业达科技有限公司 Device generating appropriate voice signal according to user voice and method thereof
CN104217722B (en) * 2014-08-22 2017-07-11 哈尔滨工程大学 A kind of dolphin whistle signal time-frequency spectrum contour extraction method
CN110930975B (en) * 2018-08-31 2023-08-04 百度在线网络技术(北京)有限公司 Method and device for outputting information
CN109599090B (en) * 2018-10-29 2020-10-30 创新先进技术有限公司 Method, device and equipment for voice synthesis
CN112767923B (en) * 2021-01-05 2022-12-23 上海微盟企业发展有限公司 Voice recognition method and device
CN113421543A (en) * 2021-06-30 2021-09-21 深圳追一科技有限公司 Data labeling method, device and equipment and readable storage medium
CN113851114B (en) * 2021-11-26 2022-02-15 深圳市倍轻松科技股份有限公司 Method and device for determining fundamental frequency of voice signal

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1229194A (en) * 1997-11-28 1999-09-22 松下电器产业株式会社 Fundamental frequency pattern generating method, fundamental frequency pattern generator, and program recording medium
CN1787072A (en) * 2004-12-07 2006-06-14 北京捷通华声语音技术有限公司 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice
CN1731509A (en) * 2005-09-02 2006-02-08 清华大学 Mobile speech synthesis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hiroya Fujisaki, et al. Analysis and synthesis of fundamental frequency contours of Standard Chinese using the command-response model. Speech Communication, Vol. 47 (2005), 59-70.
Hiroya Fujisaki, et al. Analysis and synthesis of fundamental frequency contours of Standard Chinese using the command-response model. Speech Communication, Vol. 47 (2005), 59-70. *
张鹏 (Zhang Peng). Research on prosody control methods and their implementation for Chinese speech synthesis (汉语语音合成韵律控制方法与实现的研究). Master's thesis, Harbin Engineering University, 2006, 53-59. *

Also Published As

Publication number Publication date
CN101000766A (en) 2007-07-18

Similar Documents

Publication Publication Date Title
CN101000765B (en) Speech synthetic method based on rhythm character
CN101000766B (en) Chinese intonation base frequency contour generating method based on intonation model
Chen et al. Production of weak elements in speech–evidence from f₀ patterns of neutral tone in Standard Chinese
Story Phrase-level speech simulation with an airway modulation model of speech production
Lindblom The status of phonetic gestures
CN106128450A (en) The bilingual method across language voice conversion and system thereof hidden in a kind of Chinese
CN102426834B (en) Method for testing rhythm level of spoken English
CN103165126A (en) Method for voice playing of mobile phone text short messages
Bellegarda et al. Statistical prosodic modeling: from corpus design to parameter estimation
Li et al. Analysis and modeling of F0 contours for Cantonese text-to-speech
Sanchez et al. Hierarchical modeling of F0 contours for voice conversion
CN101887719A (en) Speech synthesis method, system and mobile terminal equipment with speech synthesis function
Přibil et al. GMM-based speaker gender and age classification after voice conversion
Chomphan et al. Tone correctness improvement in speaker dependent HMM-based Thai speech synthesis
Kröger et al. Articulatory synthesis of speech and singing: State of the art and suggestions for future research
CN104376850A (en) Estimation method for fundamental frequency of Chinese whispered speech
Anumanchipalli et al. A statistical phrase/accent model for intonation modeling
TWI402824B (en) A pronunciation variation generation method for spontaneous speech synthesis
Jacewicz et al. Variability in within-category implementation of stop consonant voicing in American English-speaking children
Gutierrez-Arriola et al. New rule-based and data-driven strategy to incorporate Fujisaki's F0 model to a text-to-speech system in Castillian Spanish
Li et al. A lyrics to singing voice synthesis system with variable timbre
Mertens et al. Comparing approaches to pitch contour stylization for speech synthesis
Sulír et al. Development of the Slovak HMM-based tts system and evaluation of voices in respect to the used vocoding techniques
Dogil et al. Towards a model of target oriented production of prosody.
Zhang et al. Emotional speech synthesis based on DNN and PAD emotional state model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110202

Termination date: 20140109