CN100517463C - Speech synthesis system and method - Google Patents
Speech synthesis system and method
- Publication number
- CN100517463C
- Authority
- CN
- China
- Prior art keywords
- word
- data
- affix
- root
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed are a speech synthesis system and method that pre-analyze words. The speech synthesis system comprises a database, a parsing module, a query module, a sound-cutting module, and a synthesis module. The invention decomposes a word into roots and affixes and retrieves the best speech waveform data for each root and affix of the word, so that the speech data synthesized for the word has a better pronunciation effect.
Description
Technical field
The present invention relates to a speech synthesis system and method, and in particular to a system and method that can automatically synthesize pronunciation data for words.
Background technology
Electronic dictionaries are compact, have a large storage capacity, and offer functions such as real-voice pronunciation and expandable resources, so they have become an indispensable tool for many people learning foreign languages.
Most electronic dictionaries today implement word pronunciation in one of two ways. In the first, the pronunciation of every word in the dictionary is recorded in advance, stored in the dictionary, and linked to the corresponding word data, so that when the user clicks a word its pronunciation is played. However, the speech data for words added in a later expansion often cannot be updated at the same time, so no pronunciation can be provided for these new words. In the second, a TTS (Text-To-Speech) engine synthesizes the speech automatically; however, speech synthesized this way sounds stiff and does not give the user satisfactory pronunciation information.
Therefore, how to provide a technique that can synthesize the speech of a word automatically while still achieving a good voice effect is a technical problem that urgently needs to be solved.
Summary of the invention
To overcome the shortcomings of the prior art described above, the main purpose of the present invention is to provide a speech synthesis system and method that can automatically parse the word formation of a word and thereby synthesize the speech data corresponding to that word.
To achieve this purpose, the present invention provides a speech synthesis system and method. The speech synthesis system comprises at least: a database for storing a plurality of word data entries and their corresponding speech waveform data; a parsing module for analyzing the word formation of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes; a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, thereby obtaining the corresponding speech waveform data; a sound-cutting module for cutting, syllable by syllable and with reference to the roots and affixes produced by the parsing module, the word data found by the query module, thereby obtaining the speech data corresponding to each root and affix; and a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the speech data of the word.
The system is applicable to an electronic dictionary. The parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes. The query module then uses the content of each root or affix to retrieve from the database all word data containing that root or affix. The query module further comprises a screening unit which, according to the roots and affixes produced by the parsing module, compares all the word data that the query module found for a given root or affix and selects the best word data for subsequent processing by the sound-cutting module.
The speech synthesis method of the present invention comprises: (1) providing a parsing module that analyzes the word formation of a word and, according to the analysis result, decomposes the word into a combination of roots and affixes; (2) providing a query module that, according to the decomposition result of the parsing module, queries the database for word data related to each root and affix, thereby obtaining the corresponding speech waveform data; (3) providing a sound-cutting module that, with reference to the roots and affixes produced by the parsing module, cuts the word data found by the query module syllable by syllable, thereby obtaining the speech data corresponding to each root and affix; and (4) providing a synthesis module that arranges and combines the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the speech data of the word.
The data generated by each step of the method is stored in the database, which also stores a plurality of word data entries and their corresponding speech waveform data. In step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes. In step (2), the query module uses the content of each root or affix to retrieve from the database all word data containing that root or affix. Step (2) further comprises having a screening unit, according to the roots and affixes produced by the parsing module, compare all the word data that the query module found for a given root or affix and select the best word data for subsequent processing by the sound-cutting module. If the comparison finds word data in the database that is identical to the root or affix, that word data is the best word data; if there are several candidate word data entries, the screening unit takes as the best word data the candidate whose word-formation type is consistent and whose difference from the root or affix is smallest.
Therefore, the speech synthesis system and method of the present invention can decompose a word into a combination of roots and affixes and retrieve the best speech waveform data for each root and affix, so that the speech data of the word is synthesized automatically and with a better voice effect.
Description of drawings
Fig. 1 is a block diagram of the basic structure of the speech synthesis system of the present invention; and
Fig. 2 is a flowchart of the operation of the speech synthesis method of the present invention.
Embodiment
Embodiments of the present invention are described below by way of a specific example.
Fig. 1 is a schematic diagram of the speech synthesis system of the present invention applied in an electronic dictionary. As shown, the speech synthesis system 100 is used in an electronic dictionary 1 to synthesize the speech of words automatically. The speech synthesis system 100 comprises a database 110, a parsing module 120, a query module 130, a sound-cutting module 140, and a synthesis module 150. The query module 130 further comprises a screening unit 131.
The database 110 stores a plurality of word data entries and their corresponding speech waveform data. In this embodiment, the database 110 is divided into a word library and a speech library (not shown). The word library stores the data related to every word in the electronic dictionary 1, such as phonetic symbols, part of speech, and text and picture definitions, and can be expanded and updated by the user. The speech library stores the speech waveform data of the words; it is linked to the word library, and its entries correspond to the individual words in the word library.
The parsing module 120 analyzes the word formation of a word and, according to the analysis result, decomposes the word into a combination of roots and affixes. Most English words are derivatives formed by combining roots with affixes (prefixes and suffixes). The main forms are: "root + suffix", e.g. paint + -er forms painter; "prefix + root", e.g. inter- + vene forms intervene; "root + root", e.g. tele + scope forms telescope; and "prefix + root + suffix", e.g. in- + aud + -ible forms inaudible. In this embodiment, the parsing module 120 applies these word-formation rules to decompose a word into a combination of roots and affixes; for example, the word methodology can be decomposed into the root method and the suffix ology.
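The decomposition step can be illustrated with a minimal Python sketch. The patent does not specify how the word-formation rules are implemented, so the tiny affix tables, the longest-match stripping strategy, and the name `decompose` are all assumptions made only for illustration; the "root + root" form is not handled here.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny affix inventories (assumption, not from the patent).
PREFIXES = ("inter", "in", "un")
SUFFIXES = ("ology", "ible", "er")


def decompose(word: str) -> List[Tuple[str, str]]:
    """Split a word into (kind, text) parts by stripping at most one
    prefix and one suffix, trying longer affixes first."""
    rest = word.lower()
    front: List[Tuple[str, str]] = []
    back: List[Tuple[str, str]] = []

    for prefix in sorted(PREFIXES, key=len, reverse=True):
        # Require a few letters to remain so that a root is left over.
        if rest.startswith(prefix) and len(rest) > len(prefix) + 2:
            front.append(("prefix", prefix))
            rest = rest[len(prefix):]
            break

    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if rest.endswith(suffix) and len(rest) > len(suffix) + 2:
            back.append(("suffix", suffix))
            rest = rest[:-len(suffix)]
            break

    return front + [("root", rest)] + back


print(decompose("methodology"))
print(decompose("inaudible"))
```

A real parser would need a full morphological lexicon; simple string stripping like this will mis-split many words.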
The query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, thereby obtaining the corresponding speech waveform data. The query module 130 also includes a screening unit 131 which, according to the roots and affixes produced by the parsing module 120, compares all the word data that the query module 130 found for a given root or affix and selects the best word data for subsequent processing by the sound-cutting module 140 (described below). The selection principle is as follows: if the database 110 contains word data that is identical to the root or affix (usually the case for a root), that word data is the best; if there are several candidate word data entries (usually the case for an affix), the best word data is the candidate whose word-formation type is consistent and whose difference from the root or affix is smallest.
In this embodiment, the query module 130 first takes the root method produced by the parsing module 120 and retrieves from the database 110 all words containing this root, such as "method", "methodic", "methodist", and "unmethodical". The screening unit 131 then compares each retrieved word with the root "method". Because the word "method" in the database 110 matches the root exactly, the word "method" is regarded as the best word data for this root.
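The retrieval step for this example can be sketched as a substring search over the word library. The in-memory list `WORD_LIBRARY` and the function name `query_candidates` are hypothetical stand-ins for the database 110 and the query module 130; the patent does not describe the actual storage or lookup mechanism.

```python
from typing import List

# Hypothetical in-memory stand-in for the word library of database 110.
WORD_LIBRARY = [
    "method", "methodic", "methodist", "unmethodical",
    "technology", "sociology", "biology", "paint",
]


def query_candidates(fragment: str, word_library: List[str]) -> List[str]:
    """Return every stored word whose spelling contains the given root or affix."""
    return [word for word in word_library if fragment in word]


print(query_candidates("method", WORD_LIBRARY))
print(query_candidates("ology", WORD_LIBRARY))
```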
Next, the query module 130 retrieves from the database 110 all word data containing the affix ology, such as "technology", "sociology", and "biology". The screening unit 131 compares these words one by one and finds none that matches the affix "ology" exactly. It therefore analyzes the word-formation position of ology: because the affix ology is a suffix in the word "methodology", the screening unit 131 keeps only the words in which ology is a suffix. Finally, each remaining word is compared with the affix ology for difference, for example by treating the word with the fewest letters left after removing ology as the best. The comparison shows that the word "biology" is the most similar to the affix "ology", so it is selected as the best word data.
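The selection rule of the screening unit can be sketched as follows: prefer an exact match, otherwise keep only candidates in which the fragment occupies the same position (prefix or suffix), and among those choose the one with the fewest leftover letters. The function name `select_best` and the tie-breaking behavior (the first minimal candidate wins) are assumptions; the text does not say how ties are resolved.

```python
from typing import List, Optional


def select_best(fragment: str, kind: str, candidates: List[str]) -> Optional[str]:
    """Pick the best source word for a root or affix.

    kind is "root", "prefix", or "suffix" (labels assumed for this sketch).
    """
    # Rule 1: a word identical to the fragment is always the best choice.
    if fragment in candidates:
        return fragment

    # Rule 2: keep candidates whose word-formation position is consistent.
    if kind == "suffix":
        consistent = [w for w in candidates if w.endswith(fragment)]
    elif kind == "prefix":
        consistent = [w for w in candidates if w.startswith(fragment)]
    else:
        consistent = [w for w in candidates if fragment in w]
    if not consistent:
        return None

    # Rule 3: smallest difference, measured here as the number of letters
    # remaining once the fragment is removed.
    return min(consistent, key=lambda w: len(w) - len(fragment))


print(select_best("method", "root", ["method", "methodic", "methodist", "unmethodical"]))
print(select_best("ology", "suffix", ["technology", "sociology", "biology"]))
```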
The sound-cutting module 140 refers to the roots and affixes produced by the parsing module 120 and cuts the word data found by the query module 130 syllable by syllable, thereby obtaining the speech data corresponding to each root and affix. In this embodiment, the query results of the query module 130 are the word "method" for the root "method" and the word "biology" for the affix "ology". Because the word method is identical to the root, its "speech waveform 1" data (not shown) can be used directly. For the affix, the sound-cutting module 140 refers to the content of the affix "ology" and cuts the speech waveform data of the word "biology" in units of syllables (vowels or consonants). The cut is made at the vowel, and the trailing segment of the word's speech waveform data is extracted as the "speech waveform 2" data (not shown) corresponding to "ology".
The synthesis module 150 arranges and combines the speech waveform data produced by the sound-cutting module 140 so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the speech data of the word. In this embodiment, the synthesis module 150 takes the "speech waveform 1" data and the "speech waveform 2" data obtained from the sound-cutting module 140 and arranges them according to the positions of their corresponding root and affix, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the speech data of the word "methodology".
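The arrangement step amounts to concatenating the clips in the order of the decomposed parts, with exactly one clip per part. The sketch below assumes waveforms are plain lists of samples and that clips are keyed by the spelling of the root or affix; `synthesize` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def synthesize(parts: Sequence[Tuple[str, str]],
               clips: Dict[str, List[float]]) -> List[float]:
    """Concatenate one clip per (kind, text) part, in word order."""
    output: List[float] = []
    for _kind, text in parts:
        if text not in clips:
            # The one-to-one correspondence requires a clip for every part.
            raise KeyError(f"no waveform available for {text!r}")
        output.extend(clips[text])
    return output


parts = [("root", "method"), ("suffix", "ology")]
clips = {"method": [0.1, 0.2, 0.3], "ology": [0.4, 0.5]}  # toy sample values
print(synthesize(parts, clips))
```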
Fig. 2 is a flowchart showing the operating procedure of the speech synthesis method of the present invention, which is applicable to an electronic dictionary. As shown, step S210 is performed first: the database 110 is built in advance to store the definition data of all words in the electronic dictionary 1 and their corresponding speech waveform data. The procedure then proceeds to step S220.
In step S220, the parsing module 120 analyzes the word formation of the word "methodology" and, according to the analysis result, decomposes the word into the root method + the suffix ology. The procedure then proceeds to step S230.
In step S230, the query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, thereby obtaining the corresponding speech waveform data. In this embodiment, for the root "method" the query module 130 retrieves word data such as "method", "methodic", "methodist", and "unmethodical" from the database 110; for the suffix "ology" it retrieves word data such as "technology", "sociology", and "biology". The screening unit 131 then compares these word data entries one by one and selects the best word data "method" for the root "method" and the best word data "biology" for the affix "ology". The procedure then proceeds to step S240.
In step S240, the sound-cutting module 140 refers to the root and affix data and cuts, syllable by syllable, the best word data obtained by the query module 130, yielding the "speech waveform 1" data corresponding to the root "method" and the "speech waveform 2" data corresponding to the affix "ology". The procedure then proceeds to step S250.
In step S250, the synthesis module 150 takes the "speech waveform 1" data and the "speech waveform 2" data produced by the sound-cutting module 140 and arranges them in the order of the corresponding root method and affix ology, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the speech data of the word.
In summary, the speech synthesis system and method of the present invention are applicable to an electronic dictionary. The method first pre-analyzes a word to identify the roots and affixes that form it, then retrieves the best speech parameters for each root and affix from the speech database of the electronic dictionary, and finally arranges and combines all the retrieved speech parameters using a smoothing algorithm to synthesize the speech data of the word.
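The text does not name the smoothing algorithm. One common way to smooth a join between two clips is a short linear crossfade, sketched below purely as an assumption; the function name `crossfade_join` and the `overlap` parameter are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def crossfade_join(a: Sequence[float], b: Sequence[float], overlap: int) -> List[float]:
    """Join two clips, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using linear weights."""
    overlap = max(0, min(overlap, len(a), len(b)))
    keep = len(a) - overlap

    head = list(a[:keep])
    blended = []
    for i in range(overlap):
        weight_b = (i + 1) / (overlap + 1)  # rises toward 1 across the overlap
        blended.append((1.0 - weight_b) * a[keep + i] + weight_b * b[i])
    tail = list(b[overlap:])
    return head + blended + tail


joined = crossfade_join([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=2)
print(len(joined), joined)
```

The joined clip is shorter than plain concatenation by `overlap` samples, which trades a little duration for a less abrupt transition at the boundary between root and affix.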
Claims (17)
1. A speech synthesis system, characterized in that the system comprises at least:
a database for storing a plurality of word data entries and their corresponding speech waveform data;
a parsing module for analyzing the word formation of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes;
a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, thereby obtaining the corresponding speech waveform data;
a sound-cutting module for cutting, syllable by syllable and with reference to the root and affix data produced by the parsing module, the word data found by the query module, thereby obtaining the speech data corresponding to each root and affix; and
a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the speech data of the word.
2. The speech synthesis system as claimed in claim 1, characterized in that the system is applicable to an electronic dictionary.
3. The speech synthesis system as claimed in claim 1, characterized in that the parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes.
4. The speech synthesis system as claimed in claim 3, characterized in that the affixes comprise prefixes and suffixes.
5. The speech synthesis system as claimed in claim 1, characterized in that the query module retrieves from the database, according to the content of the root or affix, all word data containing the content of that root or affix.
6. The speech synthesis system as claimed in claim 1 or 5, characterized in that the query module further comprises a screening unit which, according to the root and affix data produced by the parsing module, selects by comparison the best word data from all the word data found by the query module for the content of the root or affix, for processing by the sound-cutting module.
7. The speech synthesis system as claimed in claim 6, characterized in that, if the database contains word data identical to the root or affix data, that word data is the best word data.
8. The speech synthesis system as claimed in claim 6, characterized in that, when there are a plurality of candidate word data entries, the word data whose word-formation type is consistent and whose difference from the root or affix is smallest is the best word data.
9. A speech synthesis method, applicable to a speech synthesis system, the method comprising:
(1) providing a parsing module for analyzing the word formation of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes;
(2) providing a query module for querying a database, according to the decomposition result of the parsing module, for word data related to each root and affix, thereby obtaining the corresponding speech waveform data;
(3) providing a sound-cutting module for cutting, syllable by syllable and with reference to the root and affix data produced by the parsing module, the word data found by the query module, thereby obtaining the speech data corresponding to each root and affix; and
(4) providing a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module so that it corresponds one-to-one with the combination of roots and affixes, thereby synthesizing the speech data of the word.
10. The speech synthesis method as claimed in claim 9, characterized in that the speech synthesis system is applicable to an electronic dictionary.
11. The speech synthesis method as claimed in claim 9, characterized in that the data generated by each step of the method is stored in a database.
12. The speech synthesis method as claimed in claim 11, characterized in that the database is also used to store a plurality of word data entries and their corresponding speech waveform data.
13. The speech synthesis method as claimed in claim 9, characterized in that, in step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of a root and at least one affix.
14. The speech synthesis method as claimed in claim 9, characterized in that, in step (2), the query module retrieves from the database, according to the content of the root or affix, all word data containing at least one of the root and the affix.
15. The speech synthesis method as claimed in claim 14, characterized in that step (2) further comprises providing a screening unit which, according to the root and affix data produced by the parsing module, selects by comparison the best word data from all the word data found by the query module for the content of the root or affix, for processing by the sound-cutting module.
16. The speech synthesis method as claimed in claim 15, characterized in that, in step (2), when the comparison by the screening unit finds word data in the database that is identical to the root or affix data, that word data is the best word data.
17. The speech synthesis method as claimed in claim 15, characterized in that, in step (2), when there are a plurality of candidate word data entries, the screening unit takes as the best word data the word data whose word-formation type is consistent and whose difference from the root or affix is smallest.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100871367A CN100517463C (en) | 2004-11-01 | 2004-11-01 | Speech synthesis system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100871367A CN100517463C (en) | 2004-11-01 | 2004-11-01 | Speech synthesis system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1770261A CN1770261A (en) | 2006-05-10 |
CN100517463C true CN100517463C (en) | 2009-07-22 |
Family
ID=36751506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100871367A Expired - Fee Related CN100517463C (en) | 2004-11-01 | 2004-11-01 | Speech synthesis system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100517463C (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645190B (en) * | 2009-07-22 | 2011-03-30 | 合肥讯飞数码科技有限公司 | Word inquiring system and inquiring method thereof |
CN103680261B (en) * | 2012-08-31 | 2017-03-08 | 英业达科技有限公司 | Lexical learning system and its method |
CN105531757B (en) * | 2013-09-20 | 2019-08-06 | 株式会社东芝 | Voice selecting auxiliary device and voice selecting method |
CN108962218A (en) * | 2017-05-27 | 2018-12-07 | 北京搜狗科技发展有限公司 | A kind of word pronunciation method and apparatus |
CN109271037B (en) * | 2017-07-13 | 2022-09-09 | 北京搜狗科技发展有限公司 | Method and device for establishing error correction word bank |
CN109545014A (en) * | 2018-12-28 | 2019-03-29 | 杭州晶智能科技有限公司 | A kind of foreign language word exercising method based on interactive voice |
CN110444190A (en) * | 2019-08-13 | 2019-11-12 | 广州国音智能科技有限公司 | Method of speech processing, device, terminal device and storage medium |
CN111681467B (en) * | 2020-06-01 | 2022-09-23 | 广东小天才科技有限公司 | Vocabulary learning method, electronic equipment and storage medium |
CN112434521A (en) * | 2020-11-13 | 2021-03-02 | 北京搜狗科技发展有限公司 | Vocabulary processing method and device |
-
2004
- 2004-11-01 CN CNB2004100871367A patent/CN100517463C/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN1770261A (en) | 2006-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20090722; Termination date: 20101101 |