CN100517463C - Speech synthesis system and method - Google Patents

Speech synthesis system and method

Info

Publication number
CN100517463C
Authority
CN
China
Prior art keywords
word
data
affix
root
module
Legal status
Expired - Fee Related
Application number
CNB2004100871367A
Other languages
Chinese (zh)
Other versions
CN1770261A (en)
Inventor
邱全成 (Qiu Quancheng)
马飞 (Ma Fei)
Current Assignee
Inventec Corp
Original Assignee
Inventec Corp
Priority date
2004-11-01
Filing date
2004-11-01
Publication date
2009-07-22
Events
  • 2004-11-01: Application CNB2004100871367A filed by Inventec Corp (priority date)
  • 2006-05-10: Publication of CN1770261A
  • 2009-07-22: Application granted; publication of CN100517463C
  • 2010-11-01: Patent right terminated; current legal status Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a speech synthesis system and method that pre-analyze words before synthesis. The speech synthesis system comprises a database, a parsing module, a query module, a sound-cutting module and a synthesis module. The invention is characterized in that it decomposes a word into roots and affixes, retrieves the best speech waveform data for each root and affix of the word, and combines them to synthesize the word's speech data with better pronunciation quality.

Description

Speech synthesis system and method
Technical field
The invention relates to a speech synthesis system and method, and in particular to a system and method that can automatically synthesize the pronunciation data of words.
Background art
Electronic dictionaries are compact, have a large storage capacity, and offer functions such as real human pronunciation and unlimited resource expansion, so they have become an indispensable tool for many people learning foreign languages.
Most current electronic dictionaries provide pronunciation in one of two ways. The first is to record the pronunciation of every word in the dictionary in advance, store the recordings in the dictionary, and link each recording to the corresponding word entry, so that when the user clicks a word, its correct pronunciation is played. However, this approach often cannot add matching audio when new words are added to the dictionary later, so it cannot provide pronunciation for these newly added words. The second method synthesizes speech automatically with a TTS (Text-To-Speech) engine; however, speech synthesized this way sounds stiff and does not give the user satisfactory pronunciation.
Therefore, providing a technique that automatically synthesizes the pronunciation of words while also achieving good speech quality is a technical problem that urgently needs to be solved.
Summary of the invention
To overcome the above shortcomings of the prior art, the main objective of the present invention is to provide a speech synthesis system and method that automatically parse the word-formation structure of a word in order to synthesize the speech data corresponding to that word.
To achieve the above objective, the present invention provides a speech synthesis system and method. The speech synthesis system of the present invention comprises at least: a database for storing a plurality of word data and their corresponding speech waveform data; a parsing module for analyzing the word-formation structure of a word and decomposing the word, according to the analysis result, into a combination of roots and affixes; a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, so as to obtain the corresponding speech waveform data; a sound-cutting module for referring to the roots and affixes produced by the parsing module and cutting the word data found by the query module syllable by syllable, so as to obtain the speech data corresponding to each root and affix; and a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module in one-to-one correspondence with the roots and affixes that make up the word, so as to synthesize the speech data of the word.
The system is applicable to an electronic dictionary. The parsing module decomposes a word, according to word-formation rules, into a combination of one or more roots and affixes, and the query module retrieves from the database all word data containing a given root or affix. The query module further comprises a filtering unit which, based on the roots and affixes produced by the parsing module, compares all the word data related to that root or affix found by the query module and selects the best word data for subsequent processing by the sound-cutting module.
The speech synthesis method of the present invention comprises: (1) providing a parsing module that analyzes the word-formation structure of a word and decomposes the word, according to the analysis result, into a combination of roots and affixes; (2) providing a query module that, according to the decomposition result of the parsing module, queries the database for word data related to each root and affix, thereby obtaining the corresponding speech waveform data; (3) providing a sound-cutting module that refers to the roots and affixes produced by the parsing module and cuts the word data found by the query module syllable by syllable, thereby obtaining the speech data corresponding to each root and affix; and (4) providing a synthesis module that arranges and combines the speech waveform data obtained from the sound-cutting module in one-to-one correspondence with the roots and affixes that make up the word, thereby synthesizing the speech data of the word.
In the method, the data generated by each step is stored in the database, which also stores a plurality of word data and their corresponding speech waveform data. In step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of one or more roots and affixes. In step (2), the query module retrieves from the database all word data containing the given root or affix. Step (2) further includes having a filtering unit, based on the roots and affixes produced by the parsing module, compare all the word data related to that root or affix retrieved by the query module and select the best word data for subsequent processing by the sound-cutting module. If the filtering unit's comparison finds word data in the database that exactly matches the root or affix, that word data is the best word data; when there are several candidate word data, the filtering unit selects as best the candidate that has the same word-formation type and the smallest difference from the root or affix.
Therefore, the speech synthesis system and method of the present invention can decompose a word into a combination of roots and affixes and retrieve the best speech recording for each root and affix, so as to synthesize the speech data of the word automatically while achieving good pronunciation quality.
Description of the drawings
Fig. 1 is a block diagram of the basic structure of the speech synthesis system of the present invention; and
Fig. 2 is a flowchart of the operation of the speech synthesis method of the present invention.
Embodiment
The embodiments of the present invention are described below by way of specific examples.
Fig. 1 is a schematic diagram of the speech synthesis system of the present invention applied in an electronic dictionary. As shown, the speech synthesis system 100 is applicable to an electronic dictionary 1 and automatically synthesizes the pronunciation of words. The speech synthesis system 100 comprises a database 110, a parsing module 120, a query module 130, a sound-cutting module 140 and a synthesis module 150; the query module 130 further includes a filtering unit 131.
Database 110 stores a plurality of word data and their corresponding speech waveform data. In this embodiment, the database 110 is divided into a word library and a sound bank (not labeled in the figure). The word library stores the data related to all the words in the electronic dictionary 1, such as phonetic symbols, part of speech, and text and graphic definitions, and can be expanded and updated by the user. The sound bank stores the speech waveform data of the words; it is linked to the word library, and each recording corresponds to a word in the word library.
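As a rough illustration of the linked word library and sound bank, the sketch below keys both stores by the word itself, so each entry stays attached to its recording. The class and method names (`WordEntry`, `DictionaryDatabase`, `words_containing`) are hypothetical, and waveforms are modeled as plain lists of samples; the patent does not describe a storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """A word-library record: the word plus dictionary metadata."""
    word: str
    phonetic: str = ""
    part_of_speech: str = ""
    definition: str = ""


@dataclass
class DictionaryDatabase:
    """Hypothetical model of database 110: a word library and a sound bank,
    both keyed by the word so every entry stays linked to its recording."""
    word_library: dict[str, WordEntry] = field(default_factory=dict)
    sound_bank: dict[str, list[float]] = field(default_factory=dict)

    def add(self, entry: WordEntry, waveform: list[float]) -> None:
        # Store entry and waveform under the same key to keep them linked.
        self.word_library[entry.word] = entry
        self.sound_bank[entry.word] = waveform

    def waveform(self, word: str) -> list[float] | None:
        # Recording linked to a word, or None if the sound bank has none.
        return self.sound_bank.get(word)

    def words_containing(self, fragment: str) -> list[str]:
        # Retrieval used by the query module: all words containing the fragment.
        return sorted(w for w in self.word_library if fragment in w)
```

Keying both dictionaries by the same string is the simplest way to express "interlinked"; a real dictionary would more likely use a shared numeric ID.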
Parsing module 120 analyzes the word-formation structure of a word and, according to the analysis result, decomposes the word into a combination of roots and affixes. Most English words are derivatives formed by combining roots and affixes (prefixes/suffixes). The main patterns are: root + suffix, e.g. paint + -er forms painter; prefix + root, e.g. inter- + vene forms intervene; root + root, e.g. tele + scope forms telescope; and prefix + root + suffix, e.g. in- + aud + -ible forms inaudible. In this embodiment, the parsing module 120 applies these word-formation rules to decompose a word into a combination of roots and affixes; for example, the word methodology can be decomposed into the root method and the suffix ology.
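The decomposition rules above can be sketched as a search over an optional prefix, one or two roots, and an optional suffix, checked against a small lexicon. The affix and root lists below are a hypothetical toy lexicon covering only the examples in this paragraph; the patent does not say how its word-formation rules are encoded, and the tie-breaking policy (fewest morphemes wins) is an assumption.

```python
from __future__ import annotations

# Hypothetical toy lexicon covering only the examples in the text.
PREFIXES = ("in", "inter", "un")
SUFFIXES = ("er", "ible", "ology")
ROOTS = {"paint", "vene", "tele", "scope", "aud", "method"}


def split_roots(middle: str) -> list[str] | None:
    """Match the middle part as one known root or as two adjacent known roots."""
    if middle in ROOTS:
        return [middle]
    for i in range(1, len(middle)):
        if middle[:i] in ROOTS and middle[i:] in ROOTS:
            return [middle[:i], middle[i:]]
    return None


def decompose(word: str) -> list[tuple[str, str]] | None:
    """Split a word into (morpheme, kind) pairs using prefix/root/suffix rules.

    Every combination of optional prefix and optional suffix is tried; the
    analysis with the fewest morphemes is returned, or None if none fits."""
    candidates = []
    for prefix in ("",) + PREFIXES:
        for suffix in ("",) + SUFFIXES:
            if len(prefix) + len(suffix) >= len(word):
                continue  # affixes would leave no room for a root
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            roots = split_roots(word[len(prefix):len(word) - len(suffix)])
            if roots is None:
                continue
            parts = [(prefix, "prefix")] if prefix else []
            parts += [(r, "root") for r in roots]
            parts += [(suffix, "suffix")] if suffix else []
            candidates.append(parts)
    return min(candidates, key=len) if candidates else None
```

For example, `decompose("methodology")` yields `[("method", "root"), ("ology", "suffix")]`, and `decompose("inaudible")` yields `[("in", "prefix"), ("aud", "root"), ("ible", "suffix")]`.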
The query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, thereby obtaining the corresponding speech waveform data. The query module 130 also includes a filtering unit 131, which, based on the roots and affixes produced by the parsing module 120, compares all the word data related to a root or affix that the query module 130 retrieves and selects the best word data for subsequent processing by the sound-cutting module 140 (described in detail below). The selection rule is: if the database 110 contains word data that exactly matches the root or affix (usually the case for roots), that word data is the best; when there are several candidate word data (usually the case for affixes), the best is the candidate that has the same word-formation type and the smallest difference from the root or affix.
In this embodiment, the query module 130 first takes the root method produced by the parsing module 120 and retrieves from the database 110 all words containing this root, such as "method", "methodic", "methodist" and "unmethodical". The filtering unit 131 then compares each retrieved word with the root "method"; because the database 110 contains the word "method", which matches the root exactly, "method" is taken as the best word data for this root.
Next, the query module 130 retrieves from the database 110 all word data containing the affix ology, such as "technology", "sociology" and "biology". The filtering unit 131 compares these entries one by one and finds no word that matches the affix "ology" exactly. It then analyzes the word-formation position of ology: because ology is a suffix in the word "methodology", the filtering unit 131 keeps only the words in which ology is also a suffix. Finally, it measures how much each remaining word differs from the affix ology, for example by the number of letters left after ology is removed, fewer being better. The comparison shows that "biology" is closest to "ology", so it is selected as the best word data.
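The selection rule of the filtering unit 131 can be read as a two-stage check. The sketch below takes "same word-formation type" to mean that the morpheme sits in the same position (a prefix at the start of the word, a suffix at the end) and measures "difference" as the number of letters left after removing the morpheme, following the "biology" example above; both readings and the function names are assumptions, not the patent's specification.

```python
from __future__ import annotations


def same_position(word: str, morpheme: str, kind: str) -> bool:
    """Word-formation type check: a suffix must end the word and a prefix
    must start it; a root may appear anywhere."""
    if kind == "suffix":
        return word.endswith(morpheme)
    if kind == "prefix":
        return word.startswith(morpheme)
    return morpheme in word


def select_best(morpheme: str, kind: str, candidates: list[str]) -> str | None:
    """Hypothetical filtering unit: exact match first, else same position
    with the fewest leftover letters (ties go to the earlier candidate)."""
    if morpheme in candidates:
        return morpheme
    same_type = [w for w in candidates if same_position(w, morpheme, kind)]
    if not same_type:
        return None
    # Leftover letters after removing the morpheme, e.g. "biology" -> "bi" (2).
    return min(same_type, key=lambda w: len(w) - len(morpheme))
```

With the candidates from this embodiment, the root lookup returns "method" through the exact-match rule, and the suffix lookup returns "biology" (2 leftover letters) over "sociology" (4) and "technology" (5).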
The sound-cutting module 140 refers to the roots and affixes produced by the parsing module 120 and cuts the word data retrieved by the query module 130 syllable by syllable, thereby obtaining the speech data corresponding to each root and affix. In this embodiment, the query module 130 returned the word "method" for the root "method" and the word "biology" for the affix "ology". Because "method" matches the root exactly, its "speech waveform 1" data (not labeled in the figure) can be used directly. For the affix, the sound-cutting module 140 refers to the content of "ology" and cuts the speech waveform data of "biology" by syllable (a vowel or syllable sound), making the cut at the vowel so as to keep the rear segment of the word, i.e. the "speech waveform 2" data (not labeled in the figure) corresponding to "ology".
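Cutting a recording by syllable presupposes that the syllable boundaries in each stored waveform are known. The sketch below assumes a hypothetical per-word alignment list of (spelling, start sample, end sample) tuples, which the patent does not describe, and returns the samples spanned by the consecutive syllables whose spelling equals the morpheme; a suffix span must end the word and a prefix span must start it. The spelling-based split of "biology" as bi/ol/o/gy is likewise an assumption, chosen so that the last three syllables spell "ology".

```python
from __future__ import annotations

# Hypothetical alignment record: (spelling, start_sample, end_sample).
Syllable = tuple[str, int, int]


def cut_segment(waveform: list[float], syllables: list[Syllable],
                morpheme: str, kind: str) -> list[float] | None:
    """Return the samples covered by the syllables that spell `morpheme`.

    kind == "suffix": only spans ending at the last syllable are tried.
    kind == "prefix": only spans starting at the first syllable are tried.
    Otherwise (root): any contiguous span is allowed.
    Returns None when no syllable span spells the morpheme exactly."""
    n = len(syllables)
    if kind == "suffix":
        spans = [(i, n) for i in range(n)]
    elif kind == "prefix":
        spans = [(0, j) for j in range(1, n + 1)]
    else:
        spans = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
    for i, j in spans:
        if "".join(s[0] for s in syllables[i:j]) == morpheme:
            start, end = syllables[i][1], syllables[j - 1][2]
            return waveform[start:end]
    return None
```

With the alignment `[("bi", 0, 100), ("ol", 100, 220), ("o", 220, 300), ("gy", 300, 400)]`, cutting the suffix "ology" keeps samples 100 to 400, the rear segment of "biology".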
The synthesis module 150 arranges and combines the speech waveform data produced by the sound-cutting module 140 in one-to-one correspondence with the roots and affixes that make up the word, thereby synthesizing the speech data of the word. In this embodiment, the synthesis module 150 orders the "speech waveform 1" and "speech waveform 2" data obtained from the sound-cutting module 140 according to the positions of their corresponding root and affix, i.e. method (speech waveform 1) + ology (speech waveform 2), to synthesize the speech data of the word "methodology".
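Placing the segments in root/affix order is essentially a concatenation. Hard joins between clips cut from different recordings can produce audible clicks, so the sketch adds an optional linear crossfade over `fade` samples at each join. The crossfade and the `synthesize` name are assumptions for illustration; the patent does not specify how segment boundaries are smoothed.

```python
from __future__ import annotations


def synthesize(segments: list[list[float]], fade: int = 0) -> list[float]:
    """Concatenate waveform segments in morpheme order.

    With fade > 0, the last `fade` samples already emitted are blended with
    the first `fade` samples of the next segment (linear crossfade), so each
    join shortens the output by the overlap length."""
    out: list[float] = []
    for seg in segments:
        overlap = min(fade, len(out), len(seg))
        for k in range(overlap):
            w = (k + 1) / (overlap + 1)  # weight of the incoming segment
            idx = len(out) - overlap + k
            out[idx] = out[idx] * (1 - w) + seg[k] * w
        out.extend(seg[overlap:])
    return out
```

With `fade=0` the result is a plain concatenation, e.g. `synthesize([wave1, wave2])` for method + ology; a small positive `fade` trades a few samples of length for a smoother join.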
Fig. 2 is a flowchart of the speech synthesis method of the present invention, which is applicable to an electronic dictionary. As shown, the method first executes step S210, building the database 110 in advance to store the definitions of all the words in the electronic dictionary 1 and their corresponding speech waveform data, and then proceeds to step S220.
In step S220, the parsing module 120 analyzes the word-formation structure of the word "methodology" and, according to the analysis result, decomposes it into the root method + the suffix ology; the process then proceeds to step S230.
In step S230, the query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, thereby obtaining the corresponding speech waveform data. In this embodiment, for the root "method" the query module 130 retrieves "method", "methodic", "methodist", "unmethodical" and similar words from the database 110; for the suffix "ology" it retrieves "technology", "sociology", "biology" and similar words. The filtering unit 131 then compares these entries one by one and selects "method" as the best word data for the root "method" and "biology" as the best word data for the affix "ology"; the process then proceeds to step S240.
In step S240, the sound-cutting module 140 refers to the root and affix data and cuts the best word data returned by the query module 130 syllable by syllable, obtaining the "speech waveform 1" data for the root "method" and the "speech waveform 2" data for the affix "ology"; the process then proceeds to step S250.
In step S250, the synthesis module 150 takes the "speech waveform 1" and "speech waveform 2" data produced by the sound-cutting module 140 and arranges them in the order of the root method and the affix ology, i.e. method (speech waveform 1) + ology (speech waveform 2), to synthesize the speech data of the word.
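Steps S220 to S250 form a simple pipeline. The sketch below wires the stages together with each module passed in as a function, so the data flow is visible independently of how any one module is implemented. The parameter names and the choice to return None when a morpheme has no usable audio are assumptions, not part of the patent; step S210 (building the database) is assumed to have happened beforehand.

```python
from __future__ import annotations

from typing import Callable, Optional

Morpheme = tuple[str, str]  # (text, kind), kind in {"prefix", "root", "suffix"}
Wave = list[float]


def synthesize_word(
    word: str,
    decompose: Callable[[str], Optional[list[Morpheme]]],   # S220: parsing module 120
    query: Callable[[str], list[str]],                       # S230: query module 130
    select: Callable[[str, str, list[str]], Optional[str]],  # S230: filtering unit 131
    cut: Callable[[str, str, str], Optional[Wave]],          # S240: sound-cutting module 140
    concat: Callable[[list[Wave]], Wave],                    # S250: synthesis module 150
) -> Optional[Wave]:
    """Run S220-S250 for one word and return its synthesized waveform."""
    morphemes = decompose(word)
    if not morphemes:
        return None  # no word-formation rule applied
    segments: list[Wave] = []
    for text, kind in morphemes:
        best = select(text, kind, query(text))
        if best is None:
            return None  # no usable recording for this morpheme
        segment = cut(best, text, kind)  # cut the best word's audio down to the morpheme
        if segment is None:
            return None
        segments.append(segment)
    return concat(segments)
```

Keeping the stages as injected functions mirrors the module boundaries in Fig. 1 and lets each module be replaced or tested separately.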
In summary, the speech synthesis system and method of the present invention are applicable to an electronic dictionary. The method first pre-analyzes a word to identify the roots and affixes that compose it, then retrieves the best speech parameters for each root and affix from the speech database of the electronic dictionary, and finally arranges and combines all the retrieved speech parameters using a smoothing algorithm to synthesize the speech data of the word.

Claims (17)

1. A speech synthesis system, characterized in that the system comprises at least:
A database for storing a plurality of word data and their corresponding speech waveform data;
A parsing module for analyzing the word-formation structure of a word and decomposing the word, according to the analysis result, into a combination of roots and affixes;
A query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, so as to obtain the corresponding speech waveform data;
A sound-cutting module for referring to the roots and affixes produced by the parsing module and cutting the word data retrieved by the query module syllable by syllable, so as to obtain the speech data corresponding to the roots and affixes; and
A synthesis module for arranging and combining the speech waveform data obtained by the sound-cutting module in one-to-one correspondence with the roots and affixes that make up the word, so as to synthesize the speech data of the word.
2. The speech synthesis system of claim 1, characterized in that the system is applicable to an electronic dictionary.
3. The speech synthesis system of claim 1, characterized in that the parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes.
4. The speech synthesis system of claim 3, characterized in that the affixes comprise prefixes and suffixes.
5. The speech synthesis system of claim 1, characterized in that the query module retrieves from the database all word data containing the root or affix.
6. The speech synthesis system of claim 1 or 5, characterized in that the query module further comprises a filtering unit which, based on the roots and affixes produced by the parsing module, compares all the word data related to the root or affix retrieved by the query module and selects the best word data for processing by the sound-cutting module.
7. The speech synthesis system of claim 6, characterized in that word data in the database that exactly matches the root or affix is the best word data.
8. The speech synthesis system of claim 6, characterized in that, when there are a plurality of candidate word data, the candidate having the same word-formation type and the smallest difference from the root or affix is the best word data.
9. A speech synthesis method applicable to a speech synthesis system, the method comprising:
(1) providing a parsing module for analyzing the word-formation structure of a word and decomposing the word, according to the analysis result, into a combination of roots and affixes;
(2) providing a query module for querying a database, according to the decomposition result of the parsing module, for word data related to each root and affix, so as to obtain the corresponding speech waveform data;
(3) providing a sound-cutting module for referring to the roots and affixes produced by the parsing module and cutting the word data retrieved by the query module syllable by syllable, so as to obtain the speech data corresponding to the roots and affixes; and
(4) providing a synthesis module for arranging and combining the speech waveform data obtained by the sound-cutting module in one-to-one correspondence with the roots and affixes that make up the word, so as to synthesize the speech data of the word.
10. The speech synthesis method of claim 9, characterized in that the speech synthesis system is applicable to an electronic dictionary.
11. The speech synthesis method of claim 9, characterized in that the data generated by each step of the method is stored in a database.
12. The speech synthesis method of claim 11, characterized in that the database further stores a plurality of word data and their corresponding speech waveform data.
13. The speech synthesis method of claim 9, characterized in that, in step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of a root and at least one affix.
14. The speech synthesis method of claim 9, characterized in that, in step (2), the query module retrieves from the database all word data containing at least one of the root and the affix.
15. The speech synthesis method of claim 14, characterized in that step (2) further comprises providing a filtering unit which, based on the roots and affixes produced by the parsing module, compares all the word data related to the root or affix retrieved by the query module and selects the best word data for processing by the sound-cutting module.
16. The speech synthesis method of claim 15, characterized in that, in step (2), when the filtering unit's comparison finds word data in the database that exactly matches the root or affix, that word data is the best word data.
17. The speech synthesis method of claim 15, characterized in that, in step (2), when there are a plurality of candidate word data, the filtering unit's comparison selects as the best word data the candidate having the same word-formation type and the smallest difference from the root or affix.
CNB2004100871367A 2004-11-01 2004-11-01 Speech synthesis system and method Expired - Fee Related CN100517463C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100871367A CN100517463C (en) 2004-11-01 2004-11-01 Speech synthesis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100871367A CN100517463C (en) 2004-11-01 2004-11-01 Speech synthesis system and method

Publications (2)

Publication Number Publication Date
CN1770261A (en) 2006-05-10
CN100517463C (en) 2009-07-22

Family

ID=36751506

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100871367A Expired - Fee Related CN100517463C (en) 2004-11-01 2004-11-01 Speech synthesis system and method

Country Status (1)

Country Link
CN (1) CN100517463C (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645190B (en) * 2009-07-22 2011-03-30 合肥讯飞数码科技有限公司 (Hefei iFlytek Digital Technology Co., Ltd.) Word query system and query method thereof
CN103680261B (en) * 2012-08-31 2017-03-08 英业达科技有限公司 (Inventec Technology Co., Ltd.) Lexical learning system and method thereof
CN105531757B (en) * 2013-09-20 2019-08-06 株式会社东芝 (Toshiba Corporation) Voice selection assistance device and voice selection method
CN108962218A (en) * 2017-05-27 2018-12-07 北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.) Word pronunciation method and apparatus
CN109271037B (en) * 2017-07-13 2022-09-09 北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.) Method and device for establishing an error-correction word bank
CN109545014A (en) * 2018-12-28 2019-03-29 杭州晶智能科技有限公司 (Hangzhou Jing Intelligent Technology Co., Ltd.) Foreign-language word practice method based on voice interaction
CN110444190A (en) * 2019-08-13 2019-11-12 广州国音智能科技有限公司 (Guangzhou Guoyin Intelligent Technology Co., Ltd.) Speech processing method, device, terminal device and storage medium
CN111681467B (en) * 2020-06-01 2022-09-23 广东小天才科技有限公司 (Guangdong Genius Technology Co., Ltd.) Vocabulary learning method, electronic device and storage medium
CN112434521A (en) * 2020-11-13 2021-03-02 北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.) Vocabulary processing method and device

Also Published As

Publication number Publication date
CN1770261A (en) 2006-05-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090722

Termination date: 20101101