CN101000766B - Chinese intonation base frequency contour generating method based on intonation model - Google Patents
Chinese intonation base frequency contour generating method based on intonation model
- Publication number
- CN101000766B (application number CN200710071614A)
- Authority
- CN
- China
- Prior art keywords
- syllable
- phrase
- intonation
- pitch contour
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Electrophonic Musical Instruments (AREA)
Abstract
A method for generating a Chinese intonation pitch contour (fundamental frequency contour) based on an intonation model. A phrase control mechanism generates and outputs the phrase-unit pitch contour curve of an input annotated pinyin code sequence; a syllable control mechanism generates and outputs the syllable-unit pitch contour curve of the same sequence; the phrase-unit pitch contour, the syllable-unit pitch contour, and the fundamental frequency minimum value $F_{\min}$ are then superposed logarithmically to generate and output the intonation pitch contour curve.
Description
(1) Technical field
The present invention relates to the field of speech processing technology, and specifically to a method, based on an intonation model, for generating Chinese intonation pitch contours in speech synthesis.
(2) Background art
At present, Chinese speech synthesis usually uses time-domain waveform concatenation based on a large corpus. In this approach, the speech units of a synthesized sentence are selected from a large, pre-recorded corpus of natural speech: according to certain rules, a cost function, or statistical methods, the system picks synthesis units or segments directly from the corpus and concatenates them. In theory, as long as the corpus is large enough, any sentence can be assembled. Because every unit comes from a natural recording, whether it is a single syllable or a segment of arbitrary length such as a multi-character word or a prosodic phrase, the clarity and naturalness of the synthesized speech are very high. The approach avoids prosodic adjustment of the speech units and needs essentially no time-domain or frequency-domain transformation of the signal. However, Chinese prosody is complex and variable and its intonation is diverse, so speech synthesized in this way cannot satisfy listeners. Compared with natural speech, the naturalness and intelligibility of the sentences and passages these systems synthesize are relatively low, the "machine flavor" is strong, and the result is not comfortable to listen to. The reason is that prosody control methods for speech synthesis have not yet achieved satisfactory results, which has held back large-scale commercial use of the technology. The main problems include that the fundamental frequency curve of the intonation cannot be adjusted and that the intonation model cannot reflect the intonation patterns of Chinese.
(3) Summary of the invention
The object of the present invention is to provide a Chinese intonation pitch contour generation method based on an intonation model that starts from the phonetic features of Chinese, the tones of Chinese and their characteristics, and the intonation of Chinese and its patterns, and that further improves the naturalness of synthesized Chinese speech.
This object is achieved by a method comprising the following computer-implementable steps:
Inputting an annotated pinyin code sequence;
A phrase-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is phrase information. If it is, it is sent to the phrase control mechanism as a phrase command; at the same time, the corresponding phrase-unit prosody template is retrieved from the prosody template library according to the phrase prosodic information, and the phrase-unit pitch contour curve is generated, output, and kept in a buffer. Otherwise the search for phrase information continues. This repeats until the whole annotated pinyin code sequence has been searched and the phrase-unit pitch contour curve has been output;
A syllable-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is syllable information. If it is, it is sent to the syllable control mechanism as a syllable command; at the same time, the corresponding syllable-unit prosody template is retrieved from the prosody template library according to the syllable prosodic information, and the syllable-unit pitch contour curve is generated, output, and kept in a buffer. Otherwise the search for syllable information continues. This repeats until the whole annotated pinyin code sequence has been searched and the syllable-unit pitch contour curve has been output;
An intonation pitch contour superposition step: the phrase-unit pitch contour from the phrase control mechanism, the syllable-unit pitch contour from the syllable control mechanism, and the fundamental frequency minimum value $F_{\min}$ are superposed logarithmically, according to the Chinese intonation model, by order, amplitude, and duration. If the annotated pinyin code sequence has not been fully processed, the method returns and continues; otherwise, the intonation pitch contour curve is generated and output to the subsequent signal processing step.
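The present invention also has the following technical features:
1. The mathematical expression of the pitch contour curve is as follows:
$$\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right]$$
$$G_{pi}(t) = \begin{cases} R_i^2\, t\, e^{-R_i t}, & t \ge 0 \\ 0, & t < 0 \end{cases}$$
or $G_{pi}(m)$
$$G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + B_j t)\, e^{-B_j t},\ \theta_j \right], & t \ge 0 \\ 0, & t < 0 \end{cases}$$
or $G_{aj}(n)$
where:
$F_{\min}$: the fundamental frequency minimum value of the sentence;
$I$: the number of phrases;
$R_i$: the decay coefficient of the $i$-th phrase, empirical value 3/s;
$T_{0i}$: the time at which the $i$-th phrase control command occurs;
$A_{pi}$: the amplitude of the $i$-th phrase control command;
$G_{pi}(m)$: represents the different phrase tone shape types;
$J$: the number of syllables or prosodic words;
$A_{aj}$: the amplitude of the $j$-th syllable control command;
$T_{1j}$: the time at which the $j$-th syllable control command begins;
$T_{2j}$: the time at which the $j$-th syllable control command ends;
$B_j$: the natural angular frequency of the $j$-th syllable control command under the syllable control mechanism, empirical value 20/s;
$\theta_j$: the maximum permitted value of the syllable component of the $j$-th syllable control command, empirical value 0.9;
$G_{aj}(n)$: represents the different syllable tone shape types;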
2. The model parameters are generated automatically by a computer program. The first step of the algorithm determines the phrase command parameters and the fundamental frequency minimum value $F_{\min}$; an accurate $F_0$ curve is then simulated from $F_{\min}$ and the phrase parameters. Once the phrase-unit parameters have been optimized, the syllable-unit parameters are calculated. Individual prosodic words are processed from left to right, and a local fundamental frequency curve simulation is performed for each prosodic word.
The advantages of the present invention are:
(1) The prosodic information of phrases and syllables is obtained from the annotated pinyin code sequence of the input text, so the generated intonation pitch contour meets the prosodic structure requirements of natural speech;
(2) The phrase-unit pitch contour and the syllable-unit pitch contour are processed separately, so the temporal order of phrase units and syllable units can be determined accurately;
(3) Using phrase-unit and syllable-unit prosody templates simplifies the generation of the phrase-unit and syllable-unit pitch contours and, at the same time, better reflects the complex and variable nature of prosodic change;
(4) The phrase control mechanism and the syllable control mechanism are each treated as a damped second-order oscillatory system, which matches the physiological characteristics of the human vocal organs.
Chinese differs from Western languages in many respects, including syntactic structure, grammatical rules, acoustic characteristics, and prosodic structure. First, in Chinese each character is one syllable. Second, Chinese is a tonal language: tone distinguishes meaning, and each character has a fixed tone (fundamental frequency shape). The tones of adjacent characters influence one another and can change, sometimes losing their original tone shape; this is the coarticulation (tone sandhi) phenomenon. There are also brief pauses within continuous speech. Every speaker has a basic frequency, called the fundamental frequency, which reflects the pitch of the speaker's voice; speakers also differ in loudness, and so on. In a Chinese text-to-speech (TTS) system, the prediction, analysis, and control of prosodic information such as pitch, duration, and amplitude is called prosody control.
In view of this situation, the inventors, starting from the phonetic features of Chinese, the tones of Chinese and their characteristics, and the intonation of Chinese and its patterns, have constructed a complete Chinese intonation pitch contour generation method based on an intonation model, which improves the naturalness of synthesized speech. Every step, module, and submodule of the invention can be implemented as a computer program, and the method is practical, portable, and widely applicable.
(4) Description of the drawings
Fig. 1 is a block diagram of the Chinese intonation pitch contour generation model;
Fig. 2 is a block diagram of Chinese intonation pitch contour generation;
Fig. 3 is a flow chart of Chinese intonation pitch contour generation;
Fig. 4 is a phrase fundamental frequency curve showing the decay characteristic;
Fig. 5 is a syllable fundamental frequency curve showing the rising characteristic;
Fig. 6 is the prosodic feature control flow chart;
Fig. 7 is a block diagram of the computer hardware system of an embodiment of the invention.
(5) Detailed description of embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments:
With reference to Fig. 2, the invention comprises the following computer-implementable steps:
Inputting an annotated pinyin code sequence;
A phrase-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is phrase information. If it is, it is sent to the phrase control mechanism as a phrase command; at the same time, the corresponding phrase-unit prosody template is retrieved from the prosody template library according to the phrase prosodic information, and the phrase-unit pitch contour curve is generated, output, and kept in a buffer. Otherwise the search for phrase information continues. This repeats until the whole annotated pinyin code sequence has been searched and the phrase-unit pitch contour curve has been output;
A syllable-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is syllable information. If it is, it is sent to the syllable control mechanism as a syllable command; at the same time, the corresponding syllable-unit prosody template is retrieved from the prosody template library according to the syllable prosodic information, and the syllable-unit pitch contour curve is generated, output, and kept in a buffer. Otherwise the search for syllable information continues. This repeats until the whole annotated pinyin code sequence has been searched and the syllable-unit pitch contour curve has been output;
An intonation pitch contour superposition step: the phrase-unit pitch contour from the phrase control mechanism, the syllable-unit pitch contour from the syllable control mechanism, and the fundamental frequency minimum value $F_{\min}$ are superposed logarithmically, according to the Chinese intonation model, by order, amplitude, and duration. If the annotated pinyin code sequence has not been fully processed, the method returns and continues; otherwise, the intonation pitch contour curve is generated and output to the subsequent signal processing step.
Each of these steps is implemented by a computer program.
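The two scanning steps above can be sketched in Python. This is only an illustration under stated assumptions: the patent does not specify the format of the annotated pinyin code sequence or of the prosody template library, so the `Item` fields, the `"phrase"`/`"syllable"` labels, and the dictionary-based library are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str         # "phrase" or "syllable" (hypothetical labels)
    template_id: str  # hypothetical key into the prosody template library
    time: float       # command time in seconds


def collect_commands(sequence, kind, template_library):
    """Scan the sequence from beginning to end and buffer the commands of one kind.

    Each matching item is paired with the template retrieved from the library;
    non-matching items are skipped and the search continues.
    """
    buffer = []
    for item in sequence:
        if item.kind != kind:
            continue  # not the information being searched for
        template = template_library[(kind, item.template_id)]
        buffer.append((item, template))
    return buffer
```

Calling `collect_commands(sequence, "phrase", library)` and then `collect_commands(sequence, "syllable", library)` mirrors the two separate passes, each of which fills its own buffer before the superposition step.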
Embodiment:
1. Construction of the prosody template library
The prosody template library is built by conventional methods, the same as for databases in general, so the details are not given here. After weighing various factors, the invention selects the smallest perceptually distinguishable unit in Chinese, the syllable, as the basic unit of speech synthesis. The speech database stores multiple samples of each syllable, and the samples differ from one another in stress (light or heavy), tone, and fundamental frequency curve.
2. The intonation model of Chinese
A complete, naturally pronounced sentence is characterized mainly by three aspects. The first is the tone shape of the sentence, which is reflected mainly in the fundamental frequency of the sentence, i.e., its pitch curve. The second is the positions of the prosodic phrases and prosodic words in the sentence, since they reflect how the prosodic features of the whole sentence change. The third is the positions of sentence stress and pauses: stress highlights and emphasizes the semantic focus of the sentence, and pauses reflect its rhythm. Of the three, the fundamental frequency curve is the most important: it reflects the most salient attributes of the sentence's prosodic variation and the overall trend of the sentence's fundamental frequency contour.
The $F_0$ pitch contour curve of a sentence can be regarded as the superposition of the phrase-unit pitch contour curve, the syllable-unit pitch contour curve, and the fundamental frequency minimum value $F_{\min}$, with the pitch contour expressed on a logarithmic scale. The phrase-unit pitch contour reflects the global pitch contour variation of the sentence, the syllable-unit pitch contour reflects the local pitch contour variation of a syllable or prosodic word, and $F_{\min}$ represents the lowest frequency at which the human vocal folds vibrate. Phrase units and syllable units belong to the phrase control mechanism and the syllable control mechanism respectively, and both mechanisms are approximated as damped second-order oscillatory systems. The input of the phrase control mechanism is a phrase command and its output is the phrase-unit pitch contour; the input of the syllable control mechanism is a syllable command and its output is the syllable-unit pitch contour. A phrase command can be described by an impulse function, and a syllable command by a step function. These functions are determined by two different sets of control commands and parameters:
(1) the timing and amplitude of the phrase command, and the damping coefficient of the phrase control mechanism;
(2) the start and end times and the amplitude of the syllable command, and the damping coefficient of the syllable control mechanism.
These parameters must remain constant within a given period: the phrase-unit parameters are constant within a prosodic phrase, the syllable-unit parameters are constant within a syllable or prosodic word, and $F_{\min}$ is constant over the whole sentence. Fig. 1 shows the block diagram of the Chinese intonation pitch contour generation model.
Based on the Chinese intonation pitch contour generation model above, the two kinds of commands, phrase commands and syllable commands, are the input of the sentence intonation model, and the output of the model is the pitch contour curve of the sentence. Its mathematical expression is as follows:
$$\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right] \qquad (1)$$
$$G_{pi}(t) = \begin{cases} R_i^2\, t\, e^{-R_i t}, & t \ge 0 \\ 0, & t < 0 \end{cases}$$
or $G_{pi}(m)$ (phrase-unit tone shape function)
$$G_{aj}(t) = \begin{cases} \min\left[ 1 - (1 + B_j t)\, e^{-B_j t},\ \theta_j \right], & t \ge 0 \\ 0, & t < 0 \end{cases}$$
or $G_{aj}(n)$ (syllable-unit tone shape function)
where:
$F_{\min}$: the fundamental frequency minimum value of the sentence;
$I$: the number of phrases;
$R_i$: the decay coefficient of the $i$-th phrase, empirical value 3/s;
$T_{0i}$: the time at which the $i$-th phrase control command occurs;
$A_{pi}$: the amplitude of the $i$-th phrase control command;
$G_{pi}(m)$: represents the different phrase tone shape types.
$J$: the number of syllables or prosodic words;
$A_{aj}$: the amplitude of the $j$-th syllable control command;
$T_{1j}$: the time at which the $j$-th syllable control command begins;
$T_{2j}$: the time at which the $j$-th syllable control command ends;
$B_j$: the natural angular frequency of the $j$-th syllable control command under the syllable control mechanism, empirical value 20/s;
$\theta_j$: the maximum permitted value of the syllable component of the $j$-th syllable control command, empirical value 0.9;
$G_{aj}(n)$: represents the different syllable tone shape types.
The first term of formula (1) can be regarded as the fundamental frequency minimum value that keeps the vocal folds vibrating; the second term represents the phrase-unit pitch contour; the third term represents the syllable-unit pitch contour; the three are superposed logarithmically. Here $F_{\min}$ is determined by the voice and the tone shape of the sentence, and it runs through the whole sentence. The fundamental frequency variation curves of the phrases are superposed on it first, giving the basic trend of the sentence's central fundamental frequency trajectory. The fundamental frequency variation curves of the syllables or prosodic words are then superposed on this trajectory in order. The result of superposing the three parts is the fundamental frequency variation curve of a complete sentence.
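The three-part logarithmic superposition can be sketched in Python. The sketch assumes the standard command-response forms that the symbol definitions suggest: an impulse response $R^2 t e^{-Rt}$ for the phrase control mechanism and a step response $1 - (1 + Bt)e^{-Bt}$ limited by the ceiling $\theta$ for the syllable control mechanism. The function names and the tuple layout of the commands are illustrative, not taken from the patent.

```python
import math


def phrase_response(t, r=3.0):
    """Impulse response of the phrase control mechanism (assumed form)."""
    if t < 0:
        return 0.0
    return r * r * t * math.exp(-r * t)


def syllable_response(t, b=20.0, theta=0.9):
    """Step response of the syllable control mechanism, capped at theta (assumed form)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + b * t) * math.exp(-b * t), theta)


def f0_contour(times, f_min, phrase_cmds, syllable_cmds, r=3.0, b=20.0, theta=0.9):
    """Superpose ln(F_min), phrase components and syllable components.

    phrase_cmds:   list of (T0, Ap) tuples
    syllable_cmds: list of (T1, T2, Aa) tuples
    Returns F0 in Hz for each time in `times`.
    """
    contour = []
    for t in times:
        log_f0 = math.log(f_min)  # term 1: baseline
        for t0, a_p in phrase_cmds:  # term 2: phrase units
            log_f0 += a_p * phrase_response(t - t0, r)
        for t1, t2, a_a in syllable_cmds:  # term 3: syllable units
            log_f0 += a_a * (syllable_response(t - t1, b, theta)
                             - syllable_response(t - t2, b, theta))
        contour.append(math.exp(log_f0))
    return contour
```

Before any command has occurred, the contour equals $F_{\min}$; each command then raises $\ln F_0$ additively, which corresponds to a multiplicative change in $F_0$ itself.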
For the gradual declination of a phrase, the decay characteristic of $G_{pi}(t)$ can be changed by adjusting $R_i$, which in turn adjusts the fundamental frequency trend of the phrase. The larger $R_i$ is, the stronger the decay and the steeper the downward slope of the phrase fundamental frequency curve; the size of $R_i$ also indirectly reflects the length of the intonation phrase. Likewise, the gradual rise of a syllable can be adjusted through $B_j$: the larger $B_j$ is, the more pronounced the rise of the syllable's fundamental frequency curve. Fig. 4 and Fig. 5 show the phrase fundamental frequency curve with its decay characteristic and the syllable fundamental frequency curve with its rising characteristic, respectively.
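The effect of the two rate parameters can be checked numerically. The sketch below again assumes the response forms $R^2 t e^{-Rt}$ (phrase) and $1 - (1 + Bt)e^{-Bt}$ capped at $\theta$ (syllable); the helper names are hypothetical. Setting the derivative of the phrase response to zero gives a peak at $t = 1/R$ with height $R/e$, and the time at which the syllable response reaches its ceiling is found by bisection.

```python
import math


def phrase_peak(r):
    """Peak time and peak height of r^2 * t * exp(-r t); the derivative vanishes at t = 1/r."""
    return 1.0 / r, r / math.e


def syllable_rise_time(b, theta=0.9, tol=1e-9):
    """Time at which 1 - (1 + b t) exp(-b t) first reaches theta (requires b > 0, 0 < theta < 1)."""
    def response(t):
        return 1.0 - (1.0 + b * t) * math.exp(-b * t)

    lo, hi = 0.0, 1.0
    while response(hi) < theta:  # the response rises monotonically toward 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if response(mid) < theta:
            lo = mid
        else:
            hi = mid
    return hi
```

Under these assumptions, doubling $R$ halves the time to the phrase peak, and doubling $B$ halves the time the syllable component needs to reach its ceiling, which matches the qualitative description above.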
For the specific tone shape of each phrase, the phrase tone shape is determined from the phrase information in the annotated pinyin code sequence, and the phrase tone shape function can use the "phrase-unit prosody template" to generate the phrase pitch contour curve directly.
For the specific tone shape of each syllable, the syllable tone shape is determined from the syllable information in the annotated pinyin code sequence. The syllable tone shape function can use the "syllable-unit prosody template" to generate the syllable fundamental frequency curve directly, or a curve equation can be used
to generate a syllable tone shape curve with a better fit. The tone length is determined by the timed start and end points of the corresponding voiced segment; the tone range is controlled by a stepwise tone-range amplitude corresponding to the tone length.
3. Setting of the model parameters
The model parameters are generated automatically by a computer program. Based on the superposition principle, the first step of the algorithm determines the phrase command parameters and the fundamental frequency minimum value $F_{\min}$; this step can be carried out separately from the determination of the syllable command parameters. An accurate $F_0$ curve is then simulated from $F_{\min}$ and the phrase parameters. Once the phrase model parameters have been optimized, the syllable-unit parameters are calculated.
The fundamental frequency curve generated by a syllable command simulates one syllable or prosodic word. Individual prosodic words are processed from left to right; the syllable units are not optimized globally, and instead a local fundamental frequency curve simulation is performed for each prosodic word. This left-to-right processing of the $F_0$ curve is subject to two constraints: first, later syllable commands must be prevented from affecting the part of the curve that has already been optimized; second, it must remain possible to estimate a syllable or prosodic word even when the preceding command parameters are insufficient.
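The two-stage, left-to-right procedure can be illustrated with a deliberately simplified Python sketch. It is not the patent's optimization algorithm, which is not specified in detail. The assumptions are: $F_{\min}$, the command timings, and the rate parameters are already known; only amplitudes are estimated, each by a closed-form least-squares scale factor; and the response functions have the assumed forms $R^2 t e^{-Rt}$ and ceiling-limited $1 - (1 + Bt)e^{-Bt}$. All names are hypothetical.

```python
import math


def phrase_response(t, r=3.0):
    return r * r * t * math.exp(-r * t) if t >= 0 else 0.0


def syllable_response(t, b=20.0, theta=0.9):
    return min(1.0 - (1.0 + b * t) * math.exp(-b * t), theta) if t >= 0 else 0.0


def fit_amplitude(targets, basis):
    """Scale a that minimizes sum((y - a * g)^2); returns 0 if the basis is all zero."""
    den = sum(g * g for g in basis)
    return sum(y * g for y, g in zip(targets, basis)) / den if den > 0 else 0.0


def fit_two_stage(times, f0, f_min, t0, spans, r=3.0, b=20.0, theta=0.9):
    """Stage 1: phrase amplitude. Stage 2: one local fit per prosodic word, left to right."""
    y = [math.log(f) - math.log(f_min) for f in f0]
    gp = [phrase_response(t - t0, r) for t in times]
    a_p = fit_amplitude(y, gp)
    residual = [yi - a_p * g for yi, g in zip(y, gp)]

    amplitudes = []
    for t1, t2 in sorted(spans):
        ga = [syllable_response(t - t1, b, theta) - syllable_response(t - t2, b, theta)
              for t in times]
        a_a = fit_amplitude(residual, ga)
        amplitudes.append(a_a)
        # Remove the fitted component so later fits work on what is left
        # and earlier estimates are never revisited.
        residual = [ri - a_a * g for ri, g in zip(residual, ga)]
    return a_p, amplitudes
```

Because stage 1 fits the phrase component to the whole contour, syllable peaks bias the phrase amplitude upward in this simplified version; a practical implementation would need a more careful first stage, for example fitting only to regions between syllable commands.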
4. $F_0$ synthesis based on the intonation model
(1) Damping coefficient
The damping coefficients of the phrase unit and the syllable unit are treated as constants. For the phrase unit, the standard value of the damping coefficient is 3.1 Hz. For the syllable unit, the mean damping coefficient over all speakers and all syllables or prosodic words is 16 Hz.
(2) Fundamental frequency minimum value $F_{\min}$
The distribution of $F_{\min}$ shows little dispersion; typical values are 70-80 Hz for male speakers and 120-140 Hz for female speakers.
(3) Amplitude and timing of phrase commands
The phrase unit represents the global declination and slow variation of the $F_0$ curve in a sentence and is the basis of the intonation fundamental frequency curve. In terms of fundamental frequency amplitude, the phrase command amplitude is the multiplicative coefficient that determines the offset of the phrase fundamental frequency curve in the frequency domain; it is a direct measure of the $F_0$ declination in the sentence and depends largely on the speaker. Sentence type shows up globally in the phrase-unit fundamental frequency curve: for example, the curve of a declarative sentence declines from beginning to end, whereas the curves of general questions and disjunctive questions first decline and then rise again toward the end. In terms of time, the phrase fundamental frequency curve reaches its maximum relatively early and then falls over the main part of the sentence. The peak of the phrase fundamental frequency curve coincides with the beginning of the sentence or prosodic phrase, so the timing of the phrase command follows directly from the damping coefficient (3.1 Hz). The first phrase command is placed 323 ms before the start of the sentence; this also agrees with research on $F_0$ generation and control, which has revealed activity of the laryngeal muscles before phonation.
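The 323 ms figure is consistent with the damping coefficient if the phrase response has the assumed form $R^2 t e^{-Rt}$, whose peak lies $1/R$ after the command. A minimal Python check, with a hypothetical function name:

```python
def phrase_command_lead_ms(damping_hz=3.1):
    """Lead time in milliseconds that puts the peak of the phrase response
    (assumed to occur 1/R after the command) at the start of the phrase."""
    return 1000.0 / damping_hz
```

With the standard value of 3.1 Hz this gives about 322.6 ms, which rounds to the 323 ms stated above.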
(4) Syllable command amplitude
The syllable command amplitude is the multiplicative coefficient that determines the offset of the syllable curve in the frequency domain and the height of the syllable peak; it depends largely on the position of the syllable. The amplitude of a sentence-final syllable command differs from the amplitude at other positions in the sentence, the amplitude for nouns is higher than for other parts of speech, and the syllable command amplitude before a phrase boundary is about 10-20% higher than at other positions.
(5) Syllable command duration
The duration of a syllable command can be predicted from the duration of the prosodic word that contains the syllable. The correlation between the two is r = 0.84, i.e., more than 70% of the variation in syllable command duration can be explained by the duration of the prosodic word.
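The step from the correlation coefficient to the explained share of variation is the usual squaring of $r$ (the coefficient of determination of a simple linear prediction):

```latex
r^2 = 0.84^2 = 0.7056 \approx 70.6\%
```

This is why a correlation of 0.84 corresponds to slightly more than 70% of the variation.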
(6) Syllable command position
In non-sentence-final positions, the time interval between the beginning of the syllable (or prosodic word) and the beginning of the command is about 10% of the syllable (or prosodic word) duration; that is, there is a silent interval between the beginning of the syllable (or prosodic word) command and the beginning of the pronunciation of the syllable (or prosodic word). This interval tends to zero in the sentence-final prosodic word.
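The position and amplitude rules can be combined into a small Python helper. This is a sketch under several assumptions that the text does not settle: the command is taken to start before the pronunciation, the command duration is simply set equal to the unit duration, and the boundary boost defaults to 15%, the midpoint of the stated 10-20% range. The function and parameter names are hypothetical.

```python
def syllable_command(onset, duration, base_amplitude,
                     sentence_final=False, before_phrase_boundary=False,
                     lead_fraction=0.10, boundary_boost=0.15):
    """Return (T1, T2, Aa) for one syllable or prosodic word.

    onset, duration: start time and length of the unit in seconds.
    The command leads the onset by lead_fraction * duration, except in
    sentence-final position, where the lead is zero.
    """
    lead = 0.0 if sentence_final else lead_fraction * duration
    amplitude = base_amplitude
    if before_phrase_boundary:
        amplitude *= 1.0 + boundary_boost  # about 10-20% higher before a boundary
    t1 = onset - lead
    t2 = t1 + duration  # assumption: command duration equals unit duration
    return t1, t2, amplitude
```

For a 200 ms unit starting at 1.0 s, the command would begin 20 ms earlier in non-final position and exactly at 1.0 s in sentence-final position.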
Therefore, based on the analysis above, a set of rules can be established to control the adjustment of parameters, such as sentence type, sentence stress, phrase boundaries, or word stress, so as to generate an artificial intonation curve for a given sentence. The required input information includes the positions of the syllables, the durations of the prosodic words, and their parts of speech.
The rules proposed here are based on statistical results, and the parameters given are mean values, so the generated curve does not represent any single real speaker. Seen from the other side, however, if a speaker's characteristics can be captured accurately, the intonation pattern produced by these rules and the model will be very close to the intonation pattern produced by the speaker being modeled. The intonation model parameters are listed in Table 1.
Table 1. Intonation model parameters
Intonation model parameter | Description |
Damping coefficient | Phrase unit: 3.1 Hz; syllable unit: 16 Hz. |
Fundamental frequency minimum value $F_{\min}$ | Typical values are 70-80 Hz for male speakers and 120-140 Hz for female speakers. |
Phrase command amplitude | The multiplicative coefficient that determines the offset of the phrase fundamental frequency curve in the frequency domain. It is a direct measure of the $F_0$ declination in the sentence and depends largely on the speaker. |
Syllable command amplitude | The multiplicative coefficient that determines the offset of the syllable curve in the frequency domain and the height of the syllable peak; it depends largely on the position of the syllable. The amplitude before a phrase boundary is about 10-20% higher than at other positions. |
Phrase command timing | The first phrase command is placed 323 ms before the start of the sentence. |
Syllable command timing | In non-sentence-final positions, the interval between the beginning of the syllable (or prosodic word) command and the beginning of the pronunciation is about 10% of the syllable (or prosodic word) duration, i.e., there is a silent interval between the two. This interval tends to zero in the sentence-final prosodic word. |
Phrase command duration | Can be obtained from the duration of the prosodic phrase. |
Syllable command duration | More than 70% of its variation can be obtained from the duration of the prosodic word. |
Fig. 6 is the prosodic feature control flow chart. With reference to Fig. 6, the computer analyzes the typed text and converts it into a pinyin code sequence containing prosodic structure information. According to this information, it annotates parameters such as the positions and number of phrases in the sentence, the strength of each phrase, the number of syllables, the syllable tone shapes, the syllable tone lengths, the syllable tone-range amplitudes, and the base pitch value of the whole sentence. The relevant control parameters are adjusted and supplied manually and by the parameter optimization algorithm, the layers are combined according to the model, and the pitch contour data of the complete sentence are calculated. Finally, according to the output fundamental frequency values and the corresponding duration parameters, the PSOLA method is used to adjust the prosodic parameters of each syllable waveform in the speech database, and the waveforms are concatenated into continuous speech.
5. System environment
Fig. 7 shows a computing system environment suitable for implementing the invention. This environment is only one example of a computing system environment in which the invention can be implemented and is not intended to limit the scope of use or the functionality of the invention. Nor should the computing environment be interpreted as having any dependency on, or requirement for, any one component or combination of components shown in the example operating environment.
The invention can be used with numerous general-purpose or special-purpose computing system environments or configurations, such as personal computers, minicomputers, midrange computers, mainframe computers, network computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, distributed computing environments that include any of the above systems or devices, and so on.
The invention can be described in the general context of computer-executable instructions, such as program modules. Program modules include routines, subroutines, objects, controls, components, data structures, and the like, which perform particular tasks or implement particular abstract data types. The invention can also be applied in distributed computing environments, where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules can be stored in both local and remote computer storage media, including memory storage devices.
The computer device shown in Fig. 7 comprises one or more central processing units (CPUs), internal memory, external memory, an input device interface, an output device interface, and a system bus connecting these units or components. The system bus can be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. Examples of such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus (also known as the Mezzanine bus).
The user can enter commands and information into the computer through input devices. These can be a keyboard, a microphone, and a pointing device such as a mouse, trackball, or touch pad, or other input devices (not shown in the figure), such as a joystick, game pad, satellite dish, or scanner. Such input devices are usually connected to the processing unit through a user input interface coupled to the system bus, but they can also be connected through other interfaces and bus structures, such as a parallel port, game port, or universal serial bus (USB). A monitor or other type of display device is connected to the system bus through an interface such as a video interface. In addition to the monitor, the computer can include other peripheral output devices, such as speakers and a printer, which are connected through an output peripheral interface.
The computer can operate in a networked environment by means of logical connections to one or more remote computers.
Claims (3)
1. A method for generating a Chinese intonation pitch contour based on an intonation model, characterized in that it comprises the following computer-implementable steps:
A step of inputting an annotated pinyin code sequence;
A phrase-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is phrase information; if so, it is sent to the phrase control mechanism as a phrase command, and at the same time the corresponding phrase-unit prosody template is looked up in the prosody template library according to the prosodic information of the phrase, and the pitch contour curve of the phrase unit is generated, output, and retained in a buffer; otherwise the search for phrase information continues; and so on, until the entire annotated pinyin code sequence has been searched and the pitch contour curve of the phrase units has been output;
A syllable-unit pitch contour generation step: prosodic information is extracted from the input annotated pinyin code sequence in order from beginning to end, and each item is checked to determine whether it is syllable information; if so, it is sent to the syllable control mechanism as a syllable command, and at the same time the corresponding syllable-unit prosody template is looked up in the prosody template library according to the prosodic information of the syllable, and the pitch contour curve of the syllable unit is generated, output, and retained in a buffer; otherwise the search for syllable information continues; and so on, until the entire annotated pinyin code sequence has been searched and the pitch contour curve of the syllable units has been output;
An intonation pitch contour superposition step: the phrase-unit pitch contour sent by the phrase control mechanism, the syllable-unit pitch contour sent by the syllable control mechanism, and the fundamental frequency minimum $F_{\min}$ are superposed logarithmically, according to the Chinese intonation model, by position in the sequence, amplitude, and duration; if the annotated pinyin code sequence has not yet been fully processed, execution continues; otherwise the intonation pitch contour curve of the input annotated pinyin code sequence is generated and output to the subsequent signal processing step.
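The three steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tuple layout of the annotated sequence, the `TEMPLATES` dictionary standing in for the prosody template library, the template shapes, and all function names are hypothetical. Each unit contour is kept as a log-domain offset so that the final superposition is ln F0 = ln F_min + phrase + syllable.

```python
import math

# Hypothetical prosody template library: label -> normalized shape on [0, 1].
TEMPLATES = {
    "phrase:declarative": [0.0, 1.0, 0.6, 0.3, 0.1],
    "syllable:tone1": [1.0, 1.0, 1.0, 1.0, 1.0],
    "syllable:tone4": [1.0, 0.8, 0.5, 0.2, 0.0],
}

def sample_template(shape, x):
    """Linearly interpolate a template shape at normalized position x in [0, 1]."""
    pos = x * (len(shape) - 1)
    k = min(int(pos), len(shape) - 2)
    frac = pos - k
    return shape[k] * (1 - frac) + shape[k + 1] * frac

def unit_contour(sequence, kind, times):
    """Scan the sequence from beginning to end, keep only items of `kind`,
    and accumulate their template contours in a buffer (log-domain offsets)."""
    buffer = [0.0] * len(times)
    for item_kind, label, onset, duration, amplitude in sequence:
        if item_kind != kind:
            continue  # not the information being searched for; keep scanning
        shape = TEMPLATES[f"{kind}:{label}"]
        for n, t in enumerate(times):
            if onset <= t <= onset + duration:
                x = (t - onset) / duration
                buffer[n] += amplitude * sample_template(shape, x)
    return buffer

def intonation_contour(sequence, f_min, times):
    """Logarithmic superposition: ln F0 = ln F_min + phrase + syllable."""
    phrase = unit_contour(sequence, "phrase", times)
    syllable = unit_contour(sequence, "syllable", times)
    return [f_min * math.exp(p + s) for p, s in zip(phrase, syllable)]

# Hypothetical annotated sequence: (kind, label, onset_s, duration_s, amplitude)
sequence = [
    ("phrase", "declarative", 0.0, 1.0, 0.5),
    ("syllable", "tone1", 0.2, 0.2, 0.3),
    ("syllable", "tone4", 0.5, 0.2, 0.3),
]
times = [n * 0.05 for n in range(21)]
f0 = intonation_contour(sequence, f_min=100.0, times=times)
```

Working in the log domain keeps the superposition additive, so phrase and syllable contributions can be buffered independently and combined only at the end.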
2. The method for generating a Chinese intonation pitch contour based on an intonation model according to claim 1, characterized in that the mathematical expression of said intonation pitch contour curve is as follows:
Wherein:
$F_{\min}$: the fundamental frequency minimum of the sentence;
$I$: the number of phrases;
$R_i$: the attenuation coefficient of the i-th phrase, empirical value 3/s;
$T_{0i}$: the time at which the i-th phrase control command occurs;
$A_{pi}$: the amplitude of the i-th phrase control command;
$G_{pi}(t)$: represents the different phrase tone patterns;
$J$: the number of syllables or prosodic words;
$A_{aj}$: the amplitude of the j-th syllable control command;
$T_{1j}$: the time at which the j-th syllable control command begins;
$T_{2j}$: the time at which the j-th syllable control command ends;
$B_j$: the natural angular frequency of the syllable control mechanism for the j-th syllable control command, empirical value 20/s;
$\theta_j$: the upper limit of the syllable component of the j-th syllable control command, empirical value 0.9;
$G_{aj}(t)$: represents the different syllable tone patterns;
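The symbols defined above match those of the command-response (Fujisaki) model listed under Non-Patent Citations. Assuming the claimed expression follows that model's standard superposition form, it could be written as in the sketch below. The arrangement of terms and the two default response functions are assumptions taken from the standard model, not text recovered from this claim; where a prosody template or a polynomial is used for $G_{pi}(t)$ or $G_{aj}(t)$, it would replace the corresponding default.

```latex
\ln F_0(t) = \ln F_{\min}
  + \sum_{i=1}^{I} A_{pi}\, G_{pi}(t - T_{0i})
  + \sum_{j=1}^{J} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right]

% Default response functions of the standard model (both are zero for t < 0):
G_{pi}(t) = R_i^{2}\, t\, e^{-R_i t}, \qquad
G_{aj}(t) = \min\!\left[\, 1 - (1 + B_j t)\, e^{-B_j t},\; \theta_j \right], \qquad t \ge 0
```

With the empirical values $R_i = 3/\mathrm{s}$, $B_j = 20/\mathrm{s}$, and $\theta_j = 0.9$, the phrase response decays slowly over the phrase while the syllable response rises quickly and is capped at 0.9.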
For the specific tone pattern $G_{pi}(t)$ of each phrase, the phrase tone pattern can also be determined from the phrase information in the annotated pinyin code sequence; its phrase tone pattern function can use the "phrase-unit prosody template" to generate the pitch contour curve of the phrase directly;
For the specific tone pattern $G_{aj}(t)$ of each syllable, the syllable tone pattern can also be determined from the syllable information in the annotated pinyin code sequence; its syllable tone pattern function can use the "syllable-unit prosody template" to generate the prosodic fundamental frequency curve of the syllable directly; alternatively, the curve equation $G_{aj}(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$ can be used to generate a syllable-unit pitch contour curve with a better fitting effect.
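The fourth-order curve equation for $G_{aj}(t)$ can be illustrated with a short Python sketch. The coefficient values, the normalization of time to [0, 1] within the syllable, and the function names are hypothetical choices made for the example; the claim itself only gives the polynomial form.

```python
import math

def tone_polynomial(coeffs, t):
    """Evaluate G(t) = a0 + a1*t + a2*t**2 + a3*t**3 + a4*t**4 by Horner's rule.
    `coeffs` is [a0, a1, a2, a3, a4]."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result

def syllable_f0(f_min, amplitude, coeffs, t1, t2, times):
    """F0 over one syllable: ln F0 = ln F_min + A * G(x), where x is time
    normalized to [0, 1] between t1 and t2 (an assumption of this sketch).
    Outside [t1, t2] the syllable contributes nothing, so F0 equals f_min."""
    out = []
    for t in times:
        if t1 <= t <= t2:
            x = (t - t1) / (t2 - t1)
            out.append(f_min * math.exp(amplitude * tone_polynomial(coeffs, x)))
        else:
            out.append(f_min)
    return out

# Hypothetical coefficients for a falling shape: G(x) = 1 - x**2
FALLING = [1.0, 0.0, -1.0, 0.0, 0.0]
contour = syllable_f0(f_min=100.0, amplitude=0.4, coeffs=FALLING,
                      t1=0.10, t2=0.30, times=[0.05, 0.10, 0.20, 0.30])
```

Five coefficients per syllable are enough to describe a contour with up to three turning points, which is why a polynomial can fit a measured syllable curve more closely than a fixed template.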
3. The method for generating a Chinese intonation pitch contour based on an intonation model according to claim 1, characterized in that the intonation model parameters are generated automatically by a computer program: the first step of the algorithm determines the phrase command parameters and the fundamental frequency minimum $F_{\min}$; an accurate F0 curve is then simulated from the fundamental frequency minimum $F_{\min}$ and the phrase command parameters; once the phrase command parameters have been optimized, the syllable command parameters are calculated; the individual syllables are processed from left to right, and a local fundamental frequency curve simulation is performed for each syllable.
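The two-stage order of claim 3 (phrase parameters and F_min first, then syllables from left to right) can be sketched as below. Several simplifications are assumptions of this sketch rather than details from the claim: F_min is taken as the lowest observed F0 value, the phrase response uses the standard form R²·t·e^(−Rt), the phrase amplitude is fitted only on frames outside all syllable spans, and each syllable is modeled as a constant log-domain offset over its span. All function names are hypothetical.

```python
import math

R = 3.0  # phrase attenuation coefficient in 1/s (empirical value from claim 2)

def phrase_response(t):
    """Assumed phrase response shape: R^2 * t * exp(-R*t) for t >= 0, else 0."""
    return R * R * t * math.exp(-R * t) if t >= 0.0 else 0.0

def fit_amplitude(basis, target):
    """Closed-form least squares: the A that minimizes sum((target - A*basis)^2)."""
    denom = sum(b * b for b in basis)
    return sum(b * y for b, y in zip(basis, target)) / denom if denom else 0.0

def estimate_parameters(times, f0, phrase_onset, syllable_spans):
    # Stage 1: F_min and the phrase command amplitude.
    f_min = min(f0)
    log_f0 = [math.log(f / f_min) for f in f0]
    basis = [phrase_response(t - phrase_onset) for t in times]
    outside = [n for n, t in enumerate(times)
               if not any(t1 <= t <= t2 for t1, t2 in syllable_spans)]
    a_p = fit_amplitude([basis[n] for n in outside],
                        [log_f0[n] for n in outside])

    # Stage 2: remove the phrase component, then fit syllables left to right.
    residual = [y - a_p * b for y, b in zip(log_f0, basis)]
    amplitudes = []
    for t1, t2 in sorted(syllable_spans):
        inside = [n for n, t in enumerate(times) if t1 <= t <= t2]
        a_a = sum(residual[n] for n in inside) / len(inside) if inside else 0.0
        amplitudes.append(a_a)
        for n in inside:
            residual[n] -= a_a  # later syllables see the updated residual
    return f_min, a_p, amplitudes
```

Fitting the phrase command before the syllables means each local syllable fit only has to explain what the slowly varying phrase component leaves over.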
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2007100716149A CN101000766B (en) | 2007-01-09 | 2007-01-09 | Chinese intonation base frequency contour generating method based on intonation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101000766A CN101000766A (en) | 2007-07-18 |
CN101000766B true CN101000766B (en) | 2011-02-02 |
Family
ID=38692705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2007100716149A Expired - Fee Related CN101000766B (en) | 2007-01-09 | 2007-01-09 | Chinese intonation base frequency contour generating method based on intonation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101000766B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103035252B (en) * | 2011-09-30 | 2015-04-29 | 西门子公司 | Chinese speech signal processing method, Chinese speech signal processing device and hearing aid device |
CN104347065A (en) * | 2013-07-26 | 2015-02-11 | 英业达科技有限公司 | Device generating appropriate voice signal according to user voice and method thereof |
CN104217722B (en) * | 2014-08-22 | 2017-07-11 | 哈尔滨工程大学 | A kind of dolphin whistle signal time-frequency spectrum contour extraction method |
CN110930975B (en) * | 2018-08-31 | 2023-08-04 | 百度在线网络技术(北京)有限公司 | Method and device for outputting information |
CN109599090B (en) * | 2018-10-29 | 2020-10-30 | 创新先进技术有限公司 | Method, device and equipment for voice synthesis |
CN112767923B (en) * | 2021-01-05 | 2022-12-23 | 上海微盟企业发展有限公司 | Voice recognition method and device |
CN113421543A (en) * | 2021-06-30 | 2021-09-21 | 深圳追一科技有限公司 | Data labeling method, device and equipment and readable storage medium |
CN113851114B (en) * | 2021-11-26 | 2022-02-15 | 深圳市倍轻松科技股份有限公司 | Method and device for determining fundamental frequency of voice signal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1229194A (en) * | 1997-11-28 | 1999-09-22 | 松下电器产业株式会社 | Fundamental frequency pattern generating method, fundamental frequency pattern generator, and program recording medium |
CN1731509A (en) * | 2005-09-02 | 2006-02-08 | 清华大学 | Mobile speech synthesis method |
CN1787072A (en) * | 2004-12-07 | 2006-06-14 | 北京捷通华声语音技术有限公司 | Method for synthesizing pronunciation based on rhythm model and parameter selecting voice |
Non-Patent Citations (3)
Title |
---|
Hiroya Fujisaki, et al. Analysis and synthesis of fundamental frequency contours of Standard Chinese using the command-response model. Speech Communication, Vol. 47 (2005), 59-70. * |
Zhang Peng. Research on prosody control methods and their implementation for Chinese speech synthesis. Master's thesis, Harbin Engineering University. 2006, 53-59. * |
Also Published As
Publication number | Publication date |
---|---|
CN101000766A (en) | 2007-07-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | C14 | Grant of patent or utility model | |
 | GR01 | Patent grant | |
 | C17 | Cessation of patent right | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20110202; Termination date: 20140109 |