CN100517463C

CN100517463C - Speech synthesis system and method

Info

Publication number: CN100517463C
Application number: CNB2004100871367A
Authority: CN
Inventors: 邱全成; 马飞
Original assignee: Inventec Corp
Current assignee: Inventec Corp
Priority date: 2004-11-01
Filing date: 2004-11-01
Publication date: 2009-07-22
Anticipated expiration: 2024-11-01
Also published as: CN1770261A

Abstract

Disclosed are a speech synthesis system and method that pre-analyze words. The system comprises a database, a parsing module, a query module, a sound-cutting module and a synthesis module. The invention decomposes a word into its roots and affixes and retrieves the best speech waveform data for each root and affix, from which it synthesizes voice data for the word with better pronunciation quality.

Description

Speech synthesis system and method

Technical field

The invention relates to a speech synthesis system and method, and in particular to a system and method that can automatically synthesize pronunciation data for words.

Background technology

Electronic dictionaries are compact, offer large storage capacity, and provide features such as recorded human pronunciation and expandable resources. They have therefore become an indispensable tool for many people learning foreign languages.

Most current electronic dictionaries implement pronunciation in one of two ways. In the first, the pronunciation of every word in the dictionary is recorded in advance, stored in the dictionary, and linked to the corresponding word data, so that the correct pronunciation is played when the user clicks a word. With this approach, however, the voice data for words added in later expansions often cannot be updated in step, so no pronunciation can be provided for these newly added words. In the second, a TTS (Text-To-Speech) engine synthesizes the speech automatically, but speech synthesized this way is rather stiff and does not give the user satisfactory pronunciation.

Therefore, providing a technique that can synthesize the voice data of a word automatically while also delivering good voice quality is a technical problem in urgent need of a solution.

Summary of the invention

To overcome the above shortcomings of the prior art, the main object of the present invention is to provide a speech synthesis system and method that can automatically parse the word-formation pattern of a word and thereby synthesize the voice data corresponding to that word.

To achieve the above object, the present invention provides a speech synthesis system and method. The speech synthesis system comprises at least: a database for storing a plurality of word data entries and their corresponding speech waveform data; a parsing module for analyzing the word-formation pattern of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes; a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, thereby obtaining the corresponding speech waveform data; a sound-cutting module for cutting, in units of syllables and with reference to the roots and affixes decomposed by the parsing module, the speech of the word data found by the query module, thereby obtaining the voice data corresponding to each root and affix; and a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the voice data of the word.

The system is applicable to an electronic dictionary. The parsing module decomposes a word into a combination of roots and affixes according to word-formation rules. The query module uses each root or affix to retrieve from the database all word data containing that root or affix. The query module further comprises a screening unit which, according to the roots and affixes decomposed by the parsing module, compares all the word data that the query module found for a given root or affix and selects the best word data for subsequent processing by the sound-cutting module.

The speech synthesis method of the present invention comprises: (1) providing a parsing module that analyzes the word-formation pattern of a word and, according to the analysis result, decomposes the word into a combination of roots and affixes; (2) providing a query module that, according to the decomposition result of the parsing module, queries the database for word data related to each root and affix and thereby obtains the corresponding speech waveform data; (3) providing a sound-cutting module that, with reference to the roots and affixes decomposed by the parsing module, cuts the speech of the word data found by the query module in units of syllables to obtain the voice data corresponding to each root and affix; and (4) providing a synthesis module that arranges and combines the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the voice data of the word.

The data generated by each step of the method is stored in the database, which also stores a plurality of word data entries and their corresponding speech waveform data. In step (1), the parsing module decomposes a word into a combination of roots and affixes according to word-formation rules. In step (2), the query module uses each root or affix to retrieve from the database all word data containing that root or affix. Step (2) further comprises having a screening unit compare, according to the roots and affixes decomposed by the parsing module, all the word data that the query module found for a given root or affix, and select the best word data for subsequent processing by the sound-cutting module. When the comparison finds word data in the database that exactly matches the root or affix, that word data is the best word data. When there are multiple candidate word data entries, the screening unit selects as best the entry whose word-formation type matches and which differs least from the root or affix.

The speech synthesis system and method of the present invention can therefore decompose a word into a combination of roots and affixes and retrieve the best speech waveform data for each root and affix, so that the voice data of the word is synthesized automatically with good voice quality.

Description of drawings

Fig. 1 is a block diagram of the basic structure of the speech synthesis system of the present invention; and

Fig. 2 is a flowchart of the operation of the speech synthesis method of the present invention.

Embodiment

Embodiments of the present invention are described below by way of specific examples.

Fig. 1 is a schematic diagram of the speech synthesis system of the present invention applied in an electronic dictionary. As shown, the speech synthesis system 100 is used in an electronic dictionary 1 to synthesize the speech of words automatically. The speech synthesis system 100 comprises a database 110, a parsing module 120, a query module 130, a sound-cutting module 140 and a synthesis module 150. The query module 130 further comprises a screening unit 131.

The database 110 stores a plurality of word data entries and their corresponding speech waveform data. In this embodiment, the database 110 is divided into a word library and a sound bank (not shown). The word library stores the data related to every word in the electronic dictionary 1, such as phonetic symbols, part of speech, and textual and graphical definitions, and can be expanded and updated by the user. The sound bank stores the speech waveform data of the words; it is linked to the word library, and its entries correspond to the words in the word library.
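The two-part database described above can be sketched as a pair of linked mappings. This is only an illustrative sketch: the class names `WordEntry` and `Database`, the field names, and the use of a plain list of samples as the waveform are assumptions, not details given in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One word-library record (hypothetical field names)."""
    spelling: str
    phonetic: str = ""
    part_of_speech: str = ""
    definition: str = ""


@dataclass
class Database:
    """Word library and sound bank, linked by the word's spelling."""
    word_library: dict = field(default_factory=dict)  # spelling -> WordEntry
    sound_bank: dict = field(default_factory=dict)    # spelling -> list of samples

    def add(self, entry, waveform):
        # Store the word data and its waveform under the same key so
        # that the two libraries stay in correspondence.
        self.word_library[entry.spelling] = entry
        self.sound_bank[entry.spelling] = list(waveform)

    def waveform_of(self, spelling):
        return self.sound_bank[spelling]
```

Keying both mappings by spelling is one simple way to keep the sound bank in correspondence with the word library when the user adds new words.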

The parsing module 120 analyzes the word-formation pattern of a word and, according to the analysis result, decomposes the word into a combination of roots and affixes. Most English words are derivatives formed by combining roots and affixes (prefixes and suffixes). The main patterns are: "root + suffix", as in paint + -er forming painter; "prefix + root", as in inter- + vene forming intervene; "root + root", as in tele + scope forming telescope; and "prefix + root + suffix", as in in- + aud + -ible forming inaudible. In this embodiment, the parsing module 120 applies these word-formation rules to decompose a word into a combination of roots and affixes; for example, the word methodology can be decomposed into the root method and the suffix ology.
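A minimal sketch of such a rule-based decomposition follows. The patent does not say how the word-formation rules are stored or applied, so the small `PREFIXES`, `ROOTS` and `SUFFIXES` tables and the search order are assumptions chosen only to cover the examples in the text.

```python
# Hypothetical morpheme tables covering only the examples in the text.
PREFIXES = ("inter", "in")
ROOTS = ("method", "paint", "vene", "aud", "tele", "scope")
SUFFIXES = ("ology", "ible", "er")


def _split_roots(text):
    """Split text into a sequence of known roots, or return None."""
    if not text:
        return []
    for root in ROOTS:
        if text.startswith(root):
            tail = _split_roots(text[len(root):])
            if tail is not None:
                return [root] + tail
    return None


def decompose(word):
    """Return a list of (kind, text) pairs, or None if no rule applies."""
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_options = [""] + [s for s in SUFFIXES if word.endswith(s)]
    for prefix in prefix_options:
        for suffix in suffix_options:
            core = word[len(prefix):len(word) - len(suffix)]
            roots = _split_roots(core)
            if roots:  # at least one root is required
                parts = []
                if prefix:
                    parts.append(("prefix", prefix))
                parts.extend(("root", r) for r in roots)
                if suffix:
                    parts.append(("suffix", suffix))
                return parts
    return None
```

For instance, `decompose("methodology")` yields the root `method` followed by the suffix `ology`, and `decompose("inaudible")` yields prefix, root and suffix in order.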

The query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, thereby obtaining the corresponding speech waveform data. The query module 130 also includes a screening unit 131 which, according to the roots and affixes decomposed by the parsing module 120, compares all the word data that the query module 130 found for a given root or affix and selects the best word data for subsequent processing by the sound-cutting module 140 (described later). The selection principle is as follows: if the database 110 contains word data that exactly matches the root or affix (usually the case for roots), that word data is the best; if there are multiple candidate word data entries (usually the case for affixes), the best is the entry whose word-formation type matches and which differs least from the root or affix.

In this embodiment, the query module 130 first takes the root method decomposed by the parsing module 120 and retrieves from the database 110 all words containing this root, such as "method", "methodic", "methodist" and "unmethodical". The screening unit 131 then compares each retrieved word with the root "method". Because the database 110 contains the word "method", which matches the root exactly, the word "method" is taken as the best word data for this root.

Next, the query module 130 retrieves from the database 110 all word data containing the affix ology, such as "technology", "sociology" and "biology". The screening unit 131 then compares these words one by one. If no word exactly matches the affix "ology", it analyzes the word-formation position of ology: because the affix ology is a suffix in the word "methodology", the screening unit 131 keeps only the words in which ology is a suffix. Finally, it compares each remaining word with the affix ology for difference, for example by treating the word with the fewest letters left after removing ology as the best. This comparison finds that "biology" is closest to the affix "ology", so it is selected as the best word data.
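The query and screening steps just described can be sketched as follows. The function names `query` and `screen` and the representation of the database as a plain list of spellings are assumptions made for illustration; the selection rule (exact match first, otherwise same word-formation type with the fewest leftover letters) follows the text.

```python
def query(words, part):
    """Retrieve every word that contains the given root or affix."""
    return [w for w in words if part in w]


def screen(candidates, part, kind):
    """Pick the best candidate for a root or affix.

    kind is "prefix", "root" or "suffix" (the part's role in the
    word being synthesized).
    """
    # Rule 1: an exact match is always the best word data.
    if part in candidates:
        return part
    # Rule 2: keep only candidates with the same word-formation type.
    if kind == "prefix":
        same_type = [w for w in candidates if w.startswith(part)]
    elif kind == "suffix":
        same_type = [w for w in candidates if w.endswith(part)]
    else:
        same_type = list(candidates)
    if not same_type:
        return None
    # Rule 3: smallest difference = fewest letters left after
    # removing the root or affix.
    return min(same_type, key=lambda w: len(w) - len(part))
```

With the word list from the embodiment, `screen(query(words, "method"), "method", "root")` selects `method` by exact match, while the suffix `ology` falls through to the difference comparison and selects `biology`.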

The sound-cutting module 140 refers to the roots and affixes decomposed by the parsing module 120 and cuts, in units of syllables, the speech of the word data found by the query module 130, thereby obtaining the voice data corresponding to each root and affix. In this embodiment, the query results are the word "method" for the root "method" and the word "biology" for the affix "ology". Because the word method matches the root exactly, its "speech waveform 1" (not shown) data can be used directly. For the affix "ology", the sound-cutting module 140 cuts the speech waveform data of the word "biology" in units of syllables (vowel or word sound), making the cut at the vowel so as to extract the trailing segment of the word's waveform, that is, the "speech waveform 2" (not shown) data corresponding to "ology".
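The cutting step can be sketched as a slice of the stored waveform at a syllable boundary. The patent does not describe how syllable boundaries are located in the audio, so this sketch assumes a hypothetical annotation `boundaries` that maps a letter index in the word to the sample index where that syllable starts; the function name `cut_segment` is likewise an assumption.

```python
def cut_segment(word, part, kind, waveform, boundaries):
    """Extract the samples of `part` from the waveform of `word`.

    boundaries: hypothetical annotation mapping the letter index at
    which a syllable starts to the sample index where it starts.
    """
    # An exact match needs no cutting: use the whole waveform.
    if word == part:
        return list(waveform)
    if kind == "suffix":
        # The suffix starts at this letter; cut from the matching
        # syllable boundary to the end of the waveform.
        start_letter = len(word) - len(part)
        return list(waveform[boundaries[start_letter]:])
    if kind == "prefix":
        # The prefix ends where the next syllable begins.
        end_letter = len(part)
        return list(waveform[:boundaries[end_letter]])
    raise ValueError("a non-matching root cannot be cut in this sketch")
```

For "biology" syllabified as bi-ol-o-gy, the suffix "ology" begins at letter index 2, so the cut is made at the sample index annotated for that boundary. If the affix does not begin on an annotated boundary, the lookup raises `KeyError`, which a fuller implementation would need to handle.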

The synthesis module 150 arranges and combines the speech waveform data produced by the sound-cutting module 140 so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the voice data of the word. In this embodiment, the synthesis module 150 arranges the "speech waveform 1" data and the "speech waveform 2" data obtained from the sound-cutting module 140 according to the positions of their corresponding root and affix, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the voice data of the word "methodology".
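In its simplest form, the arrangement described above is an ordered concatenation of the per-part waveforms. The sketch below assumes each waveform is a list of samples and that `segments` maps each root or affix to its cut waveform; both the function name and this representation are assumptions.

```python
def synthesize(parts, segments):
    """Concatenate waveform segments in the order of the word's parts.

    parts:    roots and affixes in the order they appear in the word
    segments: mapping from each root or affix to its cut waveform
    """
    voice = []
    for part in parts:
        # One segment per part keeps the one-to-one correspondence
        # between the decomposition and the synthesized audio.
        voice.extend(segments[part])
    return voice
```

Here `synthesize(["method", "ology"], segments)` places speech waveform 1 before speech waveform 2, regardless of the order in which the segments were retrieved.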

Fig. 2 is a flowchart of the operation of the speech synthesis method of the present invention, which is applicable to an electronic dictionary. As shown, step S210 is performed first: the database 110 is built in advance to store the definition data of all words in the electronic dictionary 1 and their corresponding speech waveform data. The flow then proceeds to step S220.

In step S220, the parsing module 120 analyzes the word-formation pattern of the word "methodology" and, according to the analysis result, decomposes it into the root method + the suffix ology. The flow then proceeds to step S230.

In step S230, the query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, thereby obtaining the corresponding speech waveform data. In this embodiment, for the root "method" the query module 130 retrieves word data such as "method", "methodic", "methodist" and "unmethodical" from the database 110; for the suffix "ology" it retrieves word data such as "technology", "sociology" and "biology". The screening unit 131 then compares these word data one by one and selects "method" as the best word data for the root "method" and "biology" as the best word data for the affix "ology". The flow then proceeds to step S240.

In step S240, the sound-cutting module 140 refers to the root and affix and cuts, in units of syllables, the speech of the best word data found by the query module 130, obtaining the "speech waveform 1" data corresponding to the root "method" and the "speech waveform 2" data corresponding to the affix "ology". The flow then proceeds to step S250.

In step S250, the synthesis module 150 arranges and combines the "speech waveform 1" data and "speech waveform 2" data obtained from the sound-cutting module 140 in the order of their corresponding root method and affix ology, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the voice data of the word.

In summary, the speech synthesis system and method of the present invention are applicable to an electronic dictionary. The method first pre-analyzes a word to identify the roots and affixes that form it, then retrieves the best speech parameters for each root and affix from the speech database of the electronic dictionary, and finally arranges and combines all the retrieved speech parameters using a smoothing algorithm to synthesize the voice data of the word.
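The text names a smoothing algorithm but does not specify it. As one possible illustration only, the sketch below joins two waveform segments with a linear crossfade over a short overlap; the choice of crossfade, the function name `crossfade_join` and the `overlap` parameter are all assumptions, not the patent's method.

```python
def crossfade_join(first, second, overlap):
    """Join two sample lists, blending `overlap` samples at the seam.

    Linear crossfade: the end of `first` fades out while the start
    of `second` fades in. This is an assumed smoothing technique.
    """
    overlap = min(overlap, len(first), len(second))
    if overlap == 0:
        # Nothing to blend: fall back to plain concatenation.
        return list(first) + list(second)
    head = list(first[:-overlap])
    tail = list(second[overlap:])
    blended = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)  # rises from near 0 to near 1
        a = first[len(first) - overlap + i]
        b = second[i]
        blended.append((1 - weight) * a + weight * b)
    return head + blended + tail
```

A crossfade shortens the result by `overlap` samples compared with plain concatenation, which is the usual trade-off for avoiding an audible discontinuity at the seam between a root and an affix.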

Claims

1. A speech synthesis system, characterized in that the system comprises at least:

a database for storing a plurality of word data entries and their corresponding speech waveform data;

a parsing module for analyzing the word-formation pattern of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes;

a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, thereby obtaining the corresponding speech waveform data;

a sound-cutting module for cutting, in units of syllables and with reference to the roots and affixes decomposed by the parsing module, the speech of the word data found by the query module, thereby obtaining the voice data corresponding to each root and affix; and

a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the voice data of the word.

2. The speech synthesis system as claimed in claim 1, characterized in that the system is applicable to an electronic dictionary.

3. The speech synthesis system as claimed in claim 1, characterized in that the parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes.

4. The speech synthesis system as claimed in claim 3, characterized in that the affixes comprise prefixes and suffixes.

5. The speech synthesis system as claimed in claim 1, characterized in that the query module retrieves from the database, according to the root or affix, all word data containing that root or affix.

6. The speech synthesis system as claimed in claim 1 or 5, characterized in that the query module further comprises a screening unit for selecting by comparison, according to the roots and affixes decomposed by the parsing module, the best word data from all the word data related to the root or affix found by the query module, for processing by the sound-cutting module.

7. The speech synthesis system as claimed in claim 6, characterized in that, if the database contains word data that exactly matches the root or affix, that word data is the best word data.

8. The speech synthesis system as claimed in claim 6, characterized in that, when there are multiple candidate word data entries, the best word data is the entry whose word-formation type matches and which differs least from the root or affix.

9. A speech synthesis method applicable to a speech synthesis system, the method comprising:

(1) providing a parsing module for analyzing the word-formation pattern of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes;

(2) providing a query module for querying a database, according to the decomposition result of the parsing module, for word data related to each root and affix, thereby obtaining the corresponding speech waveform data;

(3) providing a sound-cutting module for cutting, in units of syllables and with reference to the roots and affixes decomposed by the parsing module, the speech of the word data found by the query module, thereby obtaining the voice data corresponding to each root and affix; and

(4) providing a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the voice data of the word.

10. The speech synthesis method as claimed in claim 9, characterized in that the speech synthesis system is applicable to an electronic dictionary.

11. The speech synthesis method as claimed in claim 9, characterized in that the data generated by each step of the method is stored in the database.

12. The speech synthesis method as claimed in claim 11, characterized in that the database is also used to store a plurality of word data entries and their corresponding speech waveform data.

13. The speech synthesis method as claimed in claim 9, characterized in that, in step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of a root and at least one affix.

14. The speech synthesis method as claimed in claim 9, characterized in that, in step (2), the query module retrieves from the database, according to the root or affix, all word data containing at least one of the root and the affix.

15. The speech synthesis method as claimed in claim 14, characterized in that step (2) further comprises providing a screening unit for selecting by comparison, according to the roots and affixes decomposed by the parsing module, the best word data from all the word data related to the root or affix found by the query module, for processing by the sound-cutting module.

16. The speech synthesis method as claimed in claim 15, characterized in that, in step (2), when the comparison by the screening unit finds word data in the database that exactly matches the root or affix, that word data is the best word data.

17. The speech synthesis method as claimed in claim 15, characterized in that, in step (2), when there are multiple candidate word data entries, the screening unit selects as the best word data the entry whose word-formation type matches and which differs least from the root or affix.