CN1787072B - Method for synthesizing pronunciation based on rhythm model and parameter selecting voice - Google Patents


Info

Publication number
CN1787072B
CN1787072B CN2004100969685A CN200410096968A
Authority
CN
China
Prior art keywords
syllable
cost
parameters
acoustic
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2004100969685A
Other languages
Chinese (zh)
Other versions
CN1787072A (en)
Inventor
陈明
吕士楠
张连毅
武卫东
肖娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing InfoQuick SinoVoice Speech Technology Corp.
Original Assignee
JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd
Priority to CN2004100969685A
Publication of CN1787072A
Application granted
Publication of CN1787072B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention provides a speech synthesis method based on a prosody model and parameter-based unit selection. Acoustic-prosodic parameter planning is performed to obtain the target values of the acoustic parameters expected for each syllable. Maximum matching is then performed, and the samples with the smallest deviation from the targets are selected as the samples actually used. After maximum matching, single-syllable matching is applied to the unmatched sections: for every path running through the candidate samples of all syllables, a synthesis cost is computed, determined jointly by the deviation between the acoustic parameters of each candidate sample and their planned values and by the deviation between the candidate samples of adjacent syllables in the path, and the path with the lowest synthesis cost is found by a dynamic programming algorithm. Once the samples of all syllables have been selected, the corresponding data are retrieved from the speech corpus and concatenated at the waveform level to obtain the final synthesis result.

Description

Speech synthesis method based on a prosody model and parameter-based unit selection
Technical field
The present invention relates to the field of speech synthesis technology, and in particular to a speech synthesis method.
Background technology
At present, the development direction of Chinese speech synthesis is waveform concatenation based on a large-scale corpus of natural recorded speech. Such a corpus contains a large number of natural utterances whose coverage essentially spans the pronunciations occurring in most context environments; for each context, the system selects the best-matching original speech segments and concatenates them. Because the corpus is very large, near-optimal natural speech units can be found in almost all cases without any additional adjustment, which guarantees the consistency of the final synthesized speech with the original recordings. Moreover, the selected segments can go beyond the syllable level: they can be multi-syllable words or even phrase segments, which further ensures the naturalness of the synthesized speech.
The current shortcoming of this method is that, during concatenation, suitable syllables are generally selected by matching at the prosodic level, that is, according to the position of each syllable to be synthesized within the whole sentence, within the prosodic phrase, and within the word, the samples in the corpus that best match these positions are selected for concatenation. Although the actual acoustic parameters of a syllable (pitch, duration, intensity) do depend to some extent on its position in the sentence (for example, pitch is generally higher at the beginning of a prosodic phrase and lower at its end, the syllable at the end of a prosodic phrase has a longer duration, and among the three syllables of a three-syllable word the middle one is the shortest), this relationship is not absolute; more importantly, there is no guarantee that the many naturally recorded sentences in the corpus have consistent pitch or duration. In such cases, discontinuities arise at the concatenation points. For example, if two consecutive syllables of a sentence are selected from two different recorded utterances, then even though the selection was made according to position, the two syllables may not follow the variation patterns of real speech because their actual acoustic parameters were not considered. This causes audible pitch jumps or duration mismatches and reduces the naturalness of the speech.
The objective of the invention is to address the defects and deficiencies of existing waveform concatenation speech synthesis methods based on large-scale recorded speech corpora by adopting a dynamic Chinese speech synthesis method that selects units based on a prosody model and acoustic parameters, so that the concatenated syllable samples satisfy a certain prosody model in their actual acoustic parameters and the variation of the acoustic parameters is controllable, thereby eliminating the loss of naturalness caused by mismatched syllable selection during concatenation.
Summary of the invention
In view of this, the present invention performs acoustic parameter planning based on a prosody model to obtain the target values of the desired acoustic parameters of each syllable. Maximum matching is then performed, and the samples with the smallest deviation are selected as the samples actually used. After maximum matching, single-syllable matching is applied to the unmatched sections. For every path running through the candidate samples of all syllables, an overall cost is computed, determined jointly by the deviation between the acoustic parameters of each candidate sample and their planned values and by the deviation between the acoustic parameters of the candidate samples of adjacent syllables in the path. The path with the minimum overall cost is obtained by a dynamic programming algorithm. Once the samples of all syllables have been selected, the corresponding data are retrieved from the speech corpus and concatenated at the waveform level to obtain the final synthesis result.
The speech synthesis method based on a prosody model and parameter-based unit selection provided by the invention comprises the following steps:
(a) establishing a prosody model library, a large-scale recorded speech corpus, and an index database;
(b) preprocessing the text whose speech is to be synthesized, the preprocessing comprising sentence segmentation, text normalization, word segmentation, part-of-speech tagging, syntactic analysis, prosodic hierarchy analysis, and pinyin conversion;
(c) according to the attributes of each syllable (its position in the word, in the prosodic phrase, and in the sentence, together with its phone-context and tone-context attributes), looking up in the prosody model library the acoustic parameter values that the syllable should have, thereby completing the planning of the acoustic parameters of each syllable, wherein the acoustic parameters comprise pitch, duration, and intensity;
(d) for each syllable, obtaining from the index database all candidate samples of that syllable present in the large-scale recorded speech corpus;
(e) computing, for each matching string, the cost C_j between its acoustic and position parameters and the planned acoustic and position parameters, and finding the matching string whose cost is the minimum C_min among the C_j and is below the threshold C_th, thereby obtaining the maximum matching length over all candidate samples of the current syllable;
(f) performing single-syllable matching on the sections of the text that remain unmatched:
computing, for every candidate sample of every syllable, the node cost between its acoustic and position parameters and the planned acoustic and position parameters;
computing the connection cost between the candidate samples of every pair of adjacent syllables;
using a dynamic programming algorithm to find, among all paths, the path with the minimum overall cost, the overall cost being the sum of all node costs on the path and all connection costs between adjacent nodes on the path;
setting the chosen sample of each syllable to the candidate node that the optimal path passes through;
(g) obtaining waveform data from the large-scale recorded speech corpus according to the chosen samples, and concatenating the waveforms.
The method provided by the invention solves the concatenation discontinuity problem of existing waveform concatenation speech synthesis methods based on large-scale recorded speech corpora, and improves the naturalness of the synthesized speech.
Description of drawings
Fig. 1 is the flow of the speech synthesis;
Fig. 2 is the flow of the maximum matching step;
Fig. 3 is an example of the single-syllable selection step.
Embodiment
Before the actual speech synthesis, the following resource bases are established:
Large-scale recorded speech corpus: the speech waveform data, the start position of each syllable in the speech waveform, and its acoustic parameter data (pitch, duration, intensity).
Index database: for every syllable, the sequence numbers of all of its samples in the large-scale recorded speech corpus are recorded; looking up the corpus by these sequence numbers quickly yields the data related to that syllable.
Prosody model library: a prosody model obtained by statistical training, that is, a model of what the pitch, duration, and intensity of each syllable in a sentence should be. The values of these acoustic parameters are closely related to factors such as the sentence pattern, the part-of-speech sequence, and the lengths of the sentence and of the prosodic phrases.
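As a rough illustration of how these resource bases might be organized, the sketch below shows one possible in-memory layout in Python; the field and variable names (SyllableSample, index_db, candidates, and so on) are illustrative assumptions, not structures prescribed by the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SyllableSample:
    """One recorded sample of a syllable (illustrative fields)."""
    seq_no: int        # sequence number in the recorded speech corpus
    wave_start: int    # start position of its waveform data in the corpus audio
    wave_length: int   # number of audio samples, derived from the duration
    pitch_high: float  # H: pitch high point
    pitch_low: float   # L: pitch low point
    duration: float    # T: duration
    intensity: float   # A: intensity
    pos_sentence: int  # S: position in the sentence
    pos_phrase: int    # P: position in the prosodic phrase
    pos_word: int      # W: position in the word

# Index database: pinyin syllable -> sequence numbers of all of its samples.
index_db: Dict[str, List[int]] = {}

# Sample metadata keyed by sequence number; the waveform audio is stored separately.
samples_by_seq: Dict[int, SyllableSample] = {}

def candidates(pinyin: str) -> List[SyllableSample]:
    """Fetch all candidate samples of a syllable via the index database."""
    return [samples_by_seq[seq] for seq in index_db.get(pinyin, [])]
```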
The overall flow of the speech synthesis is shown in Fig. 1.
The details are as follows:
1. Preprocessing
The text whose speech is to be synthesized first goes through a text preprocessing step. This step includes sentence segmentation, text normalization, word segmentation, part-of-speech tagging, syntactic analysis, prosodic hierarchy analysis, pinyin conversion, and so on. It finally yields the following results:
the pinyin of each syllable in the sentence;
the position of each syllable in the word, in the prosodic phrase, and in the sentence;
the part of speech of each word (noun, verb, adjective, etc.) and its syntactic constituent (subject, predicate, object, etc.).
2. Parameter planning
Using a number of attributes, the acoustic parameters that each syllable should have, that is, what its pitch, duration, and intensity should be, are looked up in the prosody model library, completing the planning of the acoustic parameters of each syllable. These attributes include: whether the syllable is word-initial, word-medial, word-final, or a monosyllabic word; whether the word containing the syllable is sentence-initial, sentence-medial, or sentence-final; the tones before and after the syllable, i.e., the tone-context attribute; the final preceding the syllable and the initial following it, i.e., the phone-context attribute; the syllable's pre-attachment and post-attachment attributes; the position of the prosodic phrase containing the syllable and the intonation pattern of the sentence containing it; and the part of speech and the syntactic constituent of the word containing the syllable.
Suppose a sentence contains K syllables in total (numbered 1 to K). The planned acoustic parameters of each syllable are X_k = {H_k, L_k, T_k, A_k} (k = 1, …, K), namely the planned pitch high point, pitch low point, duration, and intensity of the k-th syllable. Its position parameters are Y_k = {S_k, P_k, W_k}, namely the syllable's position in the sentence, in the prosodic phrase, and in the word, where a sentence-initial, prosodic-phrase-initial, or word-initial position is coded 0, a sentence-medial, prosodic-phrase-medial, or word-medial position is coded 1, and a sentence-final, prosodic-phrase-final, or word-final position is coded 2.
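The planned parameters X_k and Y_k can be pictured as two small tuples per syllable. A minimal sketch (the type names are illustrative):

```python
from typing import NamedTuple

class Acoustic(NamedTuple):
    """X_k = {H_k, L_k, T_k, A_k}: planned acoustic parameters of one syllable."""
    H: float  # pitch high point
    L: float  # pitch low point
    T: float  # duration
    A: float  # intensity

class Position(NamedTuple):
    """Y_k = {S_k, P_k, W_k}: 0 = initial, 1 = medial, 2 = final."""
    S: int  # position in the sentence
    P: int  # position in the prosodic phrase
    W: int  # position in the word
```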
3. Obtaining all candidate samples
For each syllable, all samples of that syllable present in the large-scale recorded speech corpus are obtained from the index database; these are called candidate samples.
The index database lists all samples of all syllables, arranged in syllable order. For each syllable it records the total number of samples and then, one by one, the sequence number of each sample in the large-scale recorded speech corpus. A sample is identified by its sequence number in the corpus. Therefore, given a syllable, all of its samples in the corpus can be obtained quickly.
4. Maximum matching
As shown in Fig. 2, processing starts from the first syllable: set n = 1; (S4.1)
For every candidate sample of the current syllable (the n-th syllable), check whether the syllables that follow the candidate sample in its original sentence match the syllables that follow the current syllable in the sentence to be synthesized, and record the matching length. If no following syllable can be matched, the matching length is 1 (only the syllable itself is matched); (S4.2)
Find the maximum matching length over all candidate samples of the current syllable, and let L be this maximum matching length; (S4.3)
If the matching length L is 1, there is no multi-syllable match; go to S4.10; (S4.4)
For the current syllable, take every candidate sample whose matching length is at least L, together with the string formed by it and its following L-1 syllables, as a matching string. One or more matching strings may be found. Suppose J matching strings are found, and that the acoustic and position parameters of the samples in the j-th string, taken from its original sentence, are X'_{j,k} = {H'_{j,k}, L'_{j,k}, T'_{j,k}, A'_{j,k}} and Y'_{j,k} = {S'_{j,k}, P'_{j,k}, W'_{j,k}} (j = 1, …, J; k = 0, …, L-1); (S4.5)
Compute the cost C_j between the acoustic and position parameters of each matching string and the planned acoustic and position parameters:
C_j = (1/L) * Σ_{k=0}^{L-1} f(X_{n+k}, X'_{j,k}, Y_{n+k}, Y'_{j,k})   (S4.6)
where:
f(X_i, X'_j, Y_i, Y'_j) = g(X_i, X'_j) + h(Y_i, Y'_j)
g(X_i, X'_j) = ω_H (H_i - H'_j)^2 + ω_L (L_i - L'_j)^2 + ω_T (T_i - T'_j)^2 + ω_A (A_i - A'_j)^2
h(Y_i, Y'_j) = ω_S |S_i - S'_j| + ω_P |P_i - P'_j| + ω_W |W_i - W'_j|
where each ω is the weight of the corresponding parameter.
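A direct transcription of g, h, and f into Python, reusing the Acoustic and Position tuples sketched earlier; the concrete weight values are placeholders, since the patent only states that each parameter carries its own weight ω:

```python
# Illustrative weights; in a real system these would be tuned.
W_H, W_L, W_T, W_A = 1.0, 1.0, 1.0, 1.0
W_S, W_P, W_W = 1.0, 1.0, 1.0

def g(x: Acoustic, x2: Acoustic) -> float:
    """Acoustic-parameter cost: weighted squared differences of H, L, T, A."""
    return (W_H * (x.H - x2.H) ** 2 + W_L * (x.L - x2.L) ** 2
            + W_T * (x.T - x2.T) ** 2 + W_A * (x.A - x2.A) ** 2)

def h(y: Position, y2: Position) -> float:
    """Position-parameter cost: weighted absolute differences of S, P, W."""
    return W_S * abs(y.S - y2.S) + W_P * abs(y.P - y2.P) + W_W * abs(y.W - y2.W)

def f(x: Acoustic, x2: Acoustic, y: Position, y2: Position) -> float:
    """Combined cost f = g + h, as defined above."""
    return g(x, x2) + h(y, y2)
```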
Find the matching string with the minimum cost, and let its cost be C_min:
C_min = min_j C_j  (j = 1, …, J)   (S4.7)
If the minimum cost C_min is greater than the threshold C_th, the acoustic parameters of this matching string differ too much from the planned acoustic parameters, and no matching string of this length can yield a result consistent with the ideal values (S4.8a); shorten the matching length by setting L = L - 1 and go to S4.4; (S4.8b)
Mark the chosen samples of the syllables to be synthesized as the samples represented by the matching string, marking L consecutive syllables in total; (S4.9)
Set n = n + L. Coming from step S4.4, no multi-syllable match was made and L = 1 at this point, so this simply advances to the next syllable; coming from step S4.9, this skips the L syllables covered by the maximum match; (S4.10)
Check whether the last syllable has been reached; if not, jump to S4.2 and continue processing; otherwise exit the maximum matching process. (S4.11)
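The control flow of S4.1 through S4.11 can be sketched as follows. This is only an outline under stated assumptions: it reuses the candidates lookup from the earlier sketch, and match_length, matching_strings, and string_cost are assumed helpers standing in for steps S4.2, S4.5, and S4.6; indices are 0-based here.

```python
def maximum_match(syllables, planned, c_th):
    """Outline of steps S4.1-S4.11: longest-match selection of sample strings.
    syllables: pinyin syllables to synthesize; planned: their planned (X, Y) pairs;
    c_th: the cost threshold C_th. Returns chosen samples keyed by syllable index."""
    chosen = {}
    n = 0                                                          # S4.1
    while n < len(syllables):
        lengths = [match_length(c, syllables, n)                   # S4.2
                   for c in candidates(syllables[n])]
        L = max(lengths, default=1)                                # S4.3
        while L > 1:                                               # S4.4
            strings = matching_strings(syllables[n], L)            # S4.5
            costs = [string_cost(s, planned, n) for s in strings]  # S4.6
            c_min = min(costs)                                     # S4.7
            if c_min <= c_th:                                      # S4.8
                best = strings[costs.index(c_min)]
                for k in range(L):                                 # S4.9
                    chosen[n + k] = best[k]
                break
            L -= 1                                                 # S4.8b: shorten and retry
        n += L                                                     # S4.10: advance past the match (or by 1)
    return chosen   # syllables not in `chosen` go to single-syllable selection
```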
5. Single-syllable selection
After the maximum matching step, some syllables in the sentence have been assigned chosen samples, while the other syllables have not. For example, in the sentence "Jietong Huasheng Speech Technology Co., Ltd. has newly released a speech synthesis product", suppose maximum matching has assigned chosen samples to "Technology Co., Ltd.", "speech", and "product"; the remaining syllables then form three parts, "Jietong Huasheng speech", "has newly released", and "synthesis", whose syllables have not yet been assigned chosen samples. Single-syllable selection selects samples for the syllables in these parts, which are called "processing regions".
The following operations are performed on each processing region.
Suppose the processing region consists of N syllables, numbered from C to C+N-1. Each syllable has several candidate samples; suppose the n-th syllable has M_n candidate samples (n = C, …, C+N-1). Denote each candidate sample by W_{i,j} (i = C, …, C+N-1; j = 1, …, M_i). This forms a lattice as shown in Fig. 3, in which each candidate sample is a node, and any path running through the lattice is a possible unit-selection result.
For every candidate sample of every syllable, compute the cost between its acoustic and position parameters and the planned acoustic and position parameters; this is called the node cost. Suppose the acoustic and position parameters of the j-th candidate node of the n-th syllable are X'_{n,j} = {H'_{n,j}, L'_{n,j}, T'_{n,j}, A'_{n,j}} and Y'_{n,j} = {S'_{n,j}, P'_{n,j}, W'_{n,j}}. Its node cost is then D_{n,j} = f(X_n, X'_{n,j}, Y_n, Y'_{n,j}), with f defined as above.
Compute the connection cost between the candidate samples of every pair of adjacent syllables. For example, the connection cost between the j-th candidate of the n-th syllable and the k-th candidate of the (n+1)-th syllable is E_{n,j,k} = g(X'_{n,j}, X'_{n+1,k}), with g defined as above.
The overall cost of a path is defined as the sum of all node costs on the path and all connection costs between adjacent nodes on the path.
For the node lattice of this processing region, consider any path from a node of the first syllable to a node of the last syllable, and let p(n) denote the index of the node that the path passes through for the n-th syllable. The overall cost of this path is:
C_path = Σ_{n=C}^{C+N-1} D_{n,p(n)} + Σ_{n=C}^{C+N-2} E_{n,p(n),p(n+1)}
A dynamic programming algorithm is used to compute the optimal path among all possible paths, that is, the path with the minimum overall cost; for example, the path drawn with the bold line in Fig. 3. The concrete steps of the dynamic programming are as follows:
First compute the locally optimal paths from the 1st syllable to the 2nd syllable (the syllable with sequence number C+1): for each node W_{i,j} (i = C+1, j = 1, …, M_{C+1}) of the 2nd syllable, compute the cost from every node of the previous syllable to this node, where this cost consists of the node cost of the node of the previous syllable plus the connection cost from that node to the node of the 2nd syllable. As shown in Fig. 3, for the 2nd node of the 2nd syllable, the cost from each node of the 1st syllable to this node is computed as follows:
Cost(W_{C,1}, W_{C+1,2}) = 21 + 6 = 27
Cost(W_{C,2}, W_{C+1,2}) = 32 + 10 = 42
Cost(W_{C,3}, W_{C+1,2}) = 24 + 12 = 36
Cost(W_{C,4}, W_{C+1,2}) = 18 + 8 = 26
The path with the minimum cost is the locally optimal path, namely the path from W_{C,4} to W_{C+1,2}, whose locally optimal path cost is 26. Likewise, every other node of the 2nd syllable has a locally optimal path from some node of the first syllable to it; suppose the locally optimal path costs to the 1st and 3rd nodes of the 2nd syllable are 16 and 20 respectively, as shown in Fig. 3.
Next compute the locally optimal paths of the 3rd syllable (the syllable with sequence number C+2). For each node W_{i,j} (i = C+2, j = 1, …, M_{C+2}) of this syllable, compute the best local path from the nodes of the first syllable to this node. Since the best local path from the first syllable to each node of the 2nd syllable has already been computed, it suffices to add the local path cost from the 2nd syllable to the 3rd syllable to the cost of that best local path. For example, for the 2nd node of the 3rd syllable the costs are:
Cost(W_{C+1,1}, W_{C+2,2}) = 16 + 18 + 27 = 61
Cost(W_{C+1,2}, W_{C+2,2}) = 26 + 22 + 10 = 58
Cost(W_{C+1,3}, W_{C+2,2}) = 20 + 34 + 11 = 65
Therefore the locally optimal path from the 2nd syllable to the 2nd node of the 3rd syllable is the path from W_{C+1,2} to W_{C+2,2}; tracing the locally optimal path back from W_{C+1,2}, the optimal path from the first syllable to the 2nd node of the 3rd syllable is the path from W_{C,4} through W_{C+1,2} to W_{C+2,2}.
Continue in this way until the optimal path to every node of the last syllable has been computed; then compare the Cost values of the locally optimal paths of all these nodes, take the node with the minimum Cost as the final node of the overall optimal path, and trace back through the locally optimal paths to obtain the overall optimal path.
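The lattice search described above is a standard dynamic programming (Viterbi-style) search. A minimal sketch, assuming the node costs and connection costs for one processing region have already been computed into nested lists D and E:

```python
def best_path(D, E):
    """Return the candidate index p(n) for each syllable that minimizes
    sum(D[n][p(n)]) + sum(E[n][p(n)][p(n+1)]), i.e. the overall path cost C_path.
    D[n][j]: node cost of candidate j of syllable n in the region;
    E[n][j][k]: connection cost from candidate j of syllable n to candidate k of syllable n+1."""
    N = len(D)
    best = [D[0][:]]             # best[n][k]: cheapest partial path ending at candidate k of syllable n
    back = [[None] * len(D[0])]  # back-pointers for tracing the path back
    for n in range(1, N):
        row_cost, row_back = [], []
        for k in range(len(D[n])):
            # pick the predecessor j with the smallest accumulated cost
            j = min(range(len(D[n - 1])),
                    key=lambda j: best[n - 1][j] + E[n - 1][j][k])
            row_cost.append(best[n - 1][j] + E[n - 1][j][k] + D[n][k])
            row_back.append(j)
        best.append(row_cost)
        back.append(row_back)
    # cheapest final node, then trace the back-pointers to recover the whole path
    k = min(range(len(D[-1])), key=lambda k: best[-1][k])
    path = [k]
    for n in range(N - 1, 0, -1):
        k = back[n][k]
        path.append(k)
    return list(reversed(path))  # 0-based candidate indices, one per syllable
```

Because a processing region is at most a sentence long, this exact search over the lattice is inexpensive in practice.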
Set the chosen sample of each syllable in this processing region to the candidate node that the optimal path passes through.
Process the next processing region, until no processing regions remain.
6. Waveform concatenation
Through the steps above, samples have been selected for all syllables. Once all samples are selected, their sequence numbers in the large-scale recorded speech corpus are known; looking these up in the corpus yields the start position of the speech waveform data corresponding to each sample and the length derived from the duration among its acoustic parameters. With these values, the corresponding waveform data can be read out of the corpus. Joining together the waveform data of all chosen samples completes the waveform concatenation and yields the final speech synthesis result.
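A sketch of this final step, assuming each chosen sample carries the illustrative wave_start offset and wave_length (derived from its duration) used in the earlier corpus sketch:

```python
import numpy as np

def concatenate(chosen_samples, corpus_audio: np.ndarray) -> np.ndarray:
    """Read each chosen sample's waveform slice out of the corpus audio and join them."""
    pieces = [corpus_audio[s.wave_start:s.wave_start + s.wave_length]
              for s in chosen_samples]
    return np.concatenate(pieces)
```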

Claims (1)

1. A speech synthesis method based on a prosody model and parameter-based unit selection, comprising the steps of:
(a) establishing a prosody model library, a large-scale recorded speech corpus, and an index database;
(b) preprocessing the text whose speech is to be synthesized, the preprocessing comprising sentence segmentation, text normalization, word segmentation, part-of-speech tagging, syntactic analysis, prosodic hierarchy analysis, and pinyin conversion;
(c) according to the attributes of each syllable (its position in the word, in the prosodic phrase, and in the sentence, together with its phone-context and tone-context attributes), looking up in the prosody model library the acoustic parameter values that the syllable should have, thereby completing the planning of the acoustic parameters of each syllable, wherein the acoustic parameters comprise pitch, duration, and intensity;
(d) for each syllable, obtaining from the index database all candidate samples of that syllable present in the large-scale recorded speech corpus;
(e) computing, for each matching string that may be formed by the candidate samples of adjacent syllables, the cost C_j between its acoustic and position parameters and the planned acoustic and position parameters, finding the matching string whose cost is the minimum C_min among the C_j and is below the threshold C_th, and setting the chosen samples of these adjacent syllables to the candidate samples corresponding to this matching string;
(f) performing single-syllable matching on the sections of the text that remain unmatched:
computing, for every candidate sample of every syllable, the node cost between its acoustic and position parameters and the planned acoustic and position parameters;
computing the connection cost between the candidate samples of every pair of adjacent syllables;
using a dynamic programming algorithm to find, among all paths, the path with the minimum overall cost, the overall cost being the sum of all node costs on the path and all connection costs between adjacent nodes on the path;
setting the chosen sample of each syllable to the candidate node that the optimal path passes through;
(g) obtaining waveform data from the large-scale recorded speech corpus according to the chosen samples, and concatenating the waveforms.
CN2004100969685A 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice Active CN1787072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2004100969685A CN1787072B (en) 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

Publications (2)

Publication Number Publication Date
CN1787072A CN1787072A (en) 2006-06-14
CN1787072B true CN1787072B (en) 2010-06-16

Family

ID=36784491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2004100969685A Active CN1787072B (en) 2004-12-07 2004-12-07 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

Country Status (1)

Country Link
CN (1) CN1787072B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1945692B (en) * 2006-10-16 2010-05-12 安徽中科大讯飞信息科技有限公司 Intelligent method for improving prompting voice matching effect in voice synthetic system
CN101000766B (en) * 2007-01-09 2011-02-02 黑龙江大学 Chinese intonation base frequency contour generating method based on intonation model
JP6234134B2 (en) * 2013-09-25 2017-11-22 三菱電機株式会社 Speech synthesizer
CN104575487A (en) * 2014-12-11 2015-04-29 百度在线网络技术(北京)有限公司 Voice signal processing method and device
CN104916284B (en) * 2015-06-10 2017-02-22 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN105489216B (en) * 2016-01-19 2020-03-03 百度在线网络技术(北京)有限公司 Method and device for optimizing speech synthesis system
CN106356052B (en) * 2016-10-17 2019-03-15 腾讯科技(深圳)有限公司 Phoneme synthesizing method and device
WO2018167522A1 (en) * 2017-03-14 2018-09-20 Google Llc Speech synthesis unit selection
CN109599090B (en) * 2018-10-29 2020-10-30 创新先进技术有限公司 Method, device and equipment for voice synthesis
CN110047462B (en) * 2019-01-31 2021-08-13 北京捷通华声科技股份有限公司 Voice synthesis method and device and electronic equipment
CN110797006B (en) * 2020-01-06 2020-05-19 北京海天瑞声科技股份有限公司 End-to-end speech synthesis method, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5220629A (en) * 1989-11-06 1993-06-15 Canon Kabushiki Kaisha Speech synthesis apparatus and method
CN1175052A (en) * 1996-07-25 1998-03-04 松下电器产业株式会社 Phoneme synthesizing method and equipment
US20030061051A1 (en) * 2001-09-27 2003-03-27 Nec Corporation Voice synthesizing system, segment generation apparatus for generating segments for voice synthesis, voice synthesizing method and storage medium storing program therefor
US20030195743A1 (en) * 2002-04-10 2003-10-16 Industrial Technology Research Institute Method of speech segment selection for concatenative synthesis based on prosody-aligned distance measure
CN1471027A (en) * 2002-07-25 2004-01-28 摩托罗拉公司 Method and apparatus for compressing voice library

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhu Donglai, Wang Renhua, Ling Zhenhua, Li Wei. A fundamental frequency model of Chinese prosodic words based on hidden Markov models. Acta Acustica (Chinese edition), 2002, 27(6), 523-528.
Zhu Donglai, Wang Renhua, Ling Zhenhua, Li Wei. A fundamental frequency model of Chinese prosodic words based on hidden Markov models. Acta Acustica (Chinese edition), 2002, 27(6), 523-528. *

Also Published As

Publication number Publication date
CN1787072A (en) 2006-06-14

Similar Documents

Publication Publication Date Title
JP4080989B2 (en) Speech synthesis method, speech synthesizer, and speech synthesis program
JP4328698B2 (en) Fragment set creation method and apparatus
US6684187B1 (en) Method and system for preselection of suitable units for concatenative speech
US7219060B2 (en) Speech synthesis using concatenation of speech waveforms
JP5665780B2 (en) Speech synthesis apparatus, method and program
JP4130190B2 (en) Speech synthesis system
US7454343B2 (en) Speech synthesizer, speech synthesizing method, and program
JP4551803B2 (en) Speech synthesizer and program thereof
JP2007249212A (en) Method, computer program and processor for text speech synthesis
US8626510B2 (en) Speech synthesizing device, computer program product, and method
JP4406440B2 (en) Speech synthesis apparatus, speech synthesis method and program
CN104835493A (en) Speech synthesis dictionary generation apparatus and speech synthesis dictionary generation method
CN1787072B (en) Method for synthesizing pronunciation based on rhythm model and parameter selecting voice
JP4639932B2 (en) Speech synthesizer
Lee et al. A text-to-speech platform for variable length optimal unit searching using perception based cost functions
JP2009133890A (en) Voice synthesizing device and method
JP3281281B2 (en) Speech synthesis method and apparatus
JP5387410B2 (en) Speech synthesis apparatus, speech synthesis method, and speech synthesis program
EP1589524B1 (en) Method and device for speech synthesis
JP4034751B2 (en) Speech synthesis apparatus, speech synthesis method, and speech synthesis program
Clark et al. Joint prosodic and segmental unit selection speech synthesis
JP3571925B2 (en) Voice information processing device
JP2006084854A (en) Device, method, and program for speech synthesis
KR20100072962A (en) Apparatus and method for speech synthesis using a plurality of break index
JP3378448B2 (en) Speech unit selection method, speech synthesis device, and instruction storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: Room 206-1, Building 10, Area 2, Zhongguancun Software Park, No. 8 Dongbeiwang West Road, Haidian District, Beijing 100193

Patentee after: Beijing InfoQuick SinoVoice Speech Technology Corp.

Address before: Room E101, Development Building, No. 12 Information Road, Zhongguancun, Haidian District, Beijing

Patentee before: Jietong Huasheng Speech Technology Co., Ltd.