Embodiment
To illustrate the principles of the present invention, the embodiment of Fig. 1 shows a spelled letter-to-pronunciation generator. As will be more fully explained below, the mixed decision trees of the invention can be applied in a variety of different applications in addition to the pronunciation generator described here. The pronunciation generator has been selected for illustration because it highlights many of the aspects and benefits of the mixed decision tree structure.
The pronunciation generator employs two stages: the first stage uses a set of letter-only decision trees 10, and the second stage uses a set of mixed decision trees 12. An input sequence 14, such as the letter sequence B-I-B-L-E, is fed to a dynamic-programming phoneme-sequence generator 16. The sequence generator uses the letter-only trees 10 to generate a list of pronunciations 18 representing possible pronunciation candidates of the spelled input word.
The sequence generator examines each letter in the sequence in turn, using the decision tree associated with that letter to select a phoneme pronunciation based on the probability data contained in the letter-only tree.
Preferably the set of letter-only decision trees includes a decision tree for each letter of the alphabet. Fig. 2 shows an example of a letter-only decision tree for the letter E. The decision tree comprises a plurality of internal nodes (shown as ovals in the figure) and a plurality of leaf nodes (shown as rectangles). Each internal node is populated with a yes-no question, that is, a question that can be answered either "yes" or "no". In the letter-only tree these questions are directed to the given letter (in this case the letter E) and its neighboring letters in the input sequence. Note in Fig. 2 that each internal node branches left or right depending on whether the answer to the associated question is "yes" or "no".
The abbreviations used in Fig. 2 are as follows: numbers in the questions, such as "+1" or "-1", refer to positions in the spelling relative to the current letter. For example, "+1L='R'?" means "Is the letter after the current letter (in this case the letter E) an R?". The abbreviations CONS and VOW represent classes of letters, namely consonants and vowels. The absence of a neighboring letter, i.e. the null letter, is represented by the symbol "-"; it serves as a filler or placeholder when aligning certain letters with their corresponding phoneme pronunciations. The symbol "#" represents a word boundary.
The leaf nodes are populated with probability data that associate possible phoneme pronunciations with numeric values representing the probability that the particular phoneme is the correct pronunciation of the given letter. For example, the notation "iy=>0.51" means "the probability of the phoneme 'iy' in this leaf is 0.51". The null phoneme, i.e. silence, is represented by the symbol "-".
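The leaf-node lookup just described can be sketched as a short routine. The tree layout, the question, and the leaf probabilities below are invented for illustration only; they are not the actual tree of Fig. 2.

```python
# Hypothetical letter-only tree for 'E': internal nodes hold a relative-
# position letter question, leaves hold phoneme-probability tables.
def ask(context, pos, letter):
    """Answer a yes-no question about the letter at relative offset `pos`
    ('-' stands in for positions past the word boundary)."""
    i = context["index"] + pos
    word = context["word"]
    actual = word[i] if 0 <= i < len(word) else "-"
    return actual == letter

def walk(node, context):
    """Descend a letter-only decision tree and return leaf probabilities."""
    while "question" in node:                  # internal node
        pos, letter = node["question"]
        node = node["yes"] if ask(context, pos, letter) else node["no"]
    return node["probs"]                       # leaf: phoneme -> probability

# Toy tree for the letter 'E': "+1L='R'?"
tree_E = {
    "question": (+1, "R"),
    "yes": {"probs": {"er": 0.9, "eh": 0.1}},
    "no":  {"probs": {"iy": 0.51, "-": 0.3, "eh": 0.19}},
}

# Final 'E' of B-I-B-L-E: no letter follows, so the 'no' branch is taken.
probs = walk(tree_E, {"word": "BIBLE", "index": 4})
best = max(probs, key=probs.get)
```

The same walk is repeated with the appropriate tree for each letter of the input sequence.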
The sequence generator 16 (Fig. 1) thus uses the letter-only decision trees 10 to construct one or more pronunciation hypotheses, which are stored in list 18. Preferably each pronunciation is associated with a numerical score arrived at by combining the probability scores of the individual phonemes selected using the letter-only trees 10. Word pronunciations may then be scored by constructing a matrix of possible combinations and using dynamic programming to select the n best candidates. Alternatively, the n best candidates may be selected by a substitution technique: the most probable word candidate is first identified, and additional candidates are then generated through iterative substitution, as follows.
The pronunciation with the highest probability score is selected first, by multiplying the respective scores of the highest-scoring phonemes (identified by examining the leaf nodes); this selection is then used as the most probable, or first-best, word candidate. Additional (n-best) candidates are selected by re-examining the phoneme data in the leaf nodes to identify the phoneme, not originally selected, that differs least from the originally selected phoneme. This minimally-different phoneme is then substituted for the originally selected one, generating the second-best word candidate. The process may be repeated iteratively until the desired number of n-best candidates has been selected. List 18 can be sorted in descending score order, so that the pronunciation judged best by the letter-only analysis appears first in the list.
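The substitution-based n-best selection described above can be sketched as follows. The per-letter candidate phonemes and probabilities are invented; in the system they would come from the leaf nodes reached for each letter.

```python
import math

# candidates[i] lists (phoneme, probability) pairs for letter i, as would
# be read from the leaf node for that letter (values here are invented).
candidates = [
    [("b", 0.98), ("p", 0.02)],
    [("ih", 0.70), ("iy", 0.25), ("ay", 0.05)],
    [("b", 0.95), ("-", 0.05)],
]

def best_pronunciation(cands):
    """First-best: take the top phoneme per letter; score is the product."""
    phones = [max(c, key=lambda pp: pp[1]) for c in cands]
    score = math.prod(p for _, p in phones)
    return [ph for ph, _ in phones], score

def second_best(cands):
    """Second-best: at one position, swap in the runner-up phoneme whose
    probability differs least from the originally selected phoneme."""
    base, base_score = best_pronunciation(cands)
    best_swap, best_ratio = None, 0.0
    for i, c in enumerate(cands):
        top_p = max(p for _, p in c)
        for ph, p in c:
            if ph != base[i] and p / top_p > best_ratio:
                best_ratio, best_swap = p / top_p, (i, ph)
    i, ph = best_swap
    phones = list(base)
    phones[i] = ph
    return phones, base_score * best_ratio
```

Here the first-best candidate is b-ih-b; the least-costly substitution is "iy" for "ih", giving b-iy-b as the second-best candidate.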
As noted above, a letter-only analysis will often produce poor results. This is because the letter-only analysis has no way of determining, at each letter, what phonemes will be generated by subsequent letters. Thus a letter-only analysis can generate high-scoring pronunciations that would not occur in natural speech. For example, the proper noun "Achilles" would likely result in a pronunciation that pronounces both l's: ah-k-ih-l-l-iy-z. In natural speech, the second "l" is actually silent: ah-k-ih-l-iy-z. The sequence generator using letter-only trees has no mechanism for screening out word pronunciations that would never occur in natural speech.
The second stage of the pronunciation system addresses this problem. A mixed-tree score estimator 20 uses the set of mixed decision trees 12 to assess the viability of each pronunciation in list 18. The score estimator works by examining each letter in the input sequence in turn, along with the phoneme assigned to that letter by the sequence generator 16.
Like the set of letter-only trees, the set of mixed trees has a mixed tree for each letter of the alphabet. An example of a mixed tree is shown in Fig. 3. Like the letter-only tree, the mixed tree has internal nodes and leaf nodes; in Fig. 3 the internal nodes are shown as ovals and the leaf nodes as rectangles. The internal nodes are populated with yes-no questions and the leaf nodes with probability data. Although the mixed tree resembles the letter-only tree in structure, there is one important difference: the internal nodes of the mixed tree can contain two different classes of questions. An internal node can contain a question about the given letter in the sequence and its neighboring letters, or it can contain a question about the phoneme associated with that letter and the neighboring phonemes in the sequence. The decision tree is thus "mixed" in that it contains mixed classes of questions.
The abbreviations used in Fig. 3 are similar to those used in Fig. 2, with some additions. The symbol L denotes a question about a letter and its neighboring letters; the symbol P denotes a question about a phoneme and its neighboring phonemes. For example, the question "+1L='D'?" means "Is the letter in the +1 position a D?". The abbreviations CONS and SYL are phoneme classes, namely consonant and syllabic. For example, the question "+1P=CONS?" means "Is the phoneme in the +1 position a consonant?". The numbers in the leaf nodes give phoneme probabilities, as in the letter-only trees.
The mixed-tree score estimator rescores each pronunciation in list 18 according to the mixed-tree questions, using the probabilities in the leaf nodes of the mixed trees. If desired, the pronunciations may be stored with their respective scores in list 22, and list 22 may be sorted in descending order so that the first-listed pronunciation has the highest score.
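The second-stage rescoring can be sketched as follows. The node layout, the consonant class, the single toy tree (reused for every letter), and all probabilities are invented for illustration; a real system would use a separately grown mixed tree per letter.

```python
# Phonemes treated as consonants for the toy "CONS" class question.
CONSONANT_PHONES = {"b", "d", "g", "k", "l", "m", "n", "p", "r", "s", "t", "z"}

def answer(question, letters, phones, i):
    """Answer one mixed-tree question about position i: kind 'L' queries
    the letter string, kind 'P' queries the first-stage phoneme string."""
    kind, offset, value = question        # e.g. ("L", +1, "D") or ("P", +1, "CONS")
    j = i + offset
    seq = letters if kind == "L" else phones
    item = seq[j] if 0 <= j < len(seq) else "-"
    if value == "CONS":
        return item in CONSONANT_PHONES
    return item == value

def leaf_probs(tree, letters, phones, i):
    node = tree
    while "question" in node:
        node = node["yes"] if answer(node["question"], letters, phones, i) else node["no"]
    return node["probs"]

def rescore(trees, letters, phones):
    """Score = product over letters of P(assigned phoneme | mixed-tree leaf)."""
    score = 1.0
    for i, (letter, phone) in enumerate(zip(letters, phones)):
        probs = leaf_probs(trees[letter], letters, phones, i)
        score *= probs.get(phone, 0.0)    # a phoneme absent from the leaf scores zero
    return score

# Toy mixed tree: "+1P=CONS?" at the root.
toy = {"question": ("P", +1, "CONS"),
       "yes": {"probs": {"b": 0.4, "ih": 0.3, "l": 0.2, "-": 0.1}},
       "no":  {"probs": {"b": 0.3, "iy": 0.3, "l": 0.2, "-": 0.2}}}
trees = {letter: toy for letter in "BIBLE"}
s = rescore(trees, "BIBLE", ["b", "ih", "b", "l", "-"])
```

Because the questions see the whole proposed phoneme string, a pronunciation whose phonemes are mutually inconsistent collects low leaf probabilities and ends up with a low product score.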
In many cases the pronunciation occupying the highest-score position in list 22 will differ from the pronunciation occupying the highest-score position in list 18. This occurs because the mixed-tree score estimator, using the mixed trees 12, screens out pronunciations that do not contain self-consistent phoneme sequences or that would not occur in natural speech.
If desired, a selector module 24 can access list 22 to retrieve one or more of the pronunciations in the list. Typically the selector 24 retrieves the pronunciation with the highest score and supplies it as the output pronunciation 26.
As noted above, the pronunciation generator depicted in Fig. 1 represents only one possible embodiment employing the mixed trees of the invention. In an alternative embodiment, the dynamic-programming phoneme-sequence generator 16 and its associated letter-only decision trees 10 may be dispensed with in applications where one or more pronunciations of the given word sequence already exist. This situation may arise, for example, where a previously developed pronunciation dictionary is available. In that case, the mixed-tree score estimator 20, with its associated mixed decision trees 12, can be used to score the entries of the pronunciation dictionary, flagging entries with low scores and thereby identifying suspect entries in the constructed dictionary. Such a system may, for example, be incorporated into a lexicographer's production tools.
The output pronunciation or pronunciations selected from list 22 can be used to form pronunciation dictionaries for both speech recognition and speech synthesis. In the speech recognition context, the pronunciation dictionary may be used during the training phase to supply pronunciations for words not already found in the recognizer lexicon. In the synthesis context, the pronunciation dictionary may be used to generate phoneme sounds for concatenated playback. The system may be used, for example, to augment the features of an e-mail reader or other text-to-speech application.
The mixed-tree scoring system of the invention can be used in a variety of applications in which a single pronunciation, or a set of possible pronunciations, is desired. For example, in a dynamic online dictionary, the user types a word and the system provides a list of possible pronunciations ranked by probability. The scoring system can also serve as a feedback tool in a language learning system: a language learning system with speech synthesis capability can be used to display a spelled word and analyze a speaker's attempts to pronounce that word in the new language, and the system can indicate to the user which pronunciations of the word are more, and less, probable.
Generating the Decision Trees
Fig. 4 shows a system for generating both the letter-only trees and the mixed trees. At the heart of the decision-tree generation system is tree generator 40. The tree generator employs a tree-growing algorithm that operates on a body of training data 42 supplied in advance by the system developer. Typically the training data comprise aligned letter-phoneme pairs corresponding to known correct pronunciations of words. The training data may be produced through the alignment process shown in Fig. 5, which illustrates the alignment process for the exemplary word BIBLE. The spelled word 44 and its pronunciation 46 are fed to a dynamic-programming alignment module 48, which aligns the letters of the spelled word with the phonemes of the corresponding pronunciation. Note that the final E in the illustrated example is silent. The letter-phoneme pairs are then stored as data 42.
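The alignment performed by module 48 can be sketched as a small dynamic-programming routine in which each letter is aligned either to one phoneme or to the null phoneme "-". The affinity scores below are invented for illustration; a real aligner would estimate letter-phoneme affinities from data.

```python
def align(letters, phones, match_score):
    """Align each letter to one phoneme or to '-' (silent), maximizing the
    total match score; phonemes must all be consumed, in order."""
    n, m = len(letters), len(phones)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if j < m:                          # letter i consumes phoneme j
                s = best[i][j] + match_score(letters[i], phones[j])
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1] = s
                    back[i + 1][j + 1] = (j, phones[j])
            s = best[i][j] - 1.0               # letter i is silent (small penalty)
            if s > best[i + 1][j]:
                best[i + 1][j] = s
                back[i + 1][j] = (j, "-")
    assert best[n][m] > NEG, "no complete alignment"
    pairs, j = [], m
    for i in range(n, 0, -1):                  # backtrack
        pj, ph = back[i][j]
        pairs.append((letters[i - 1], ph))
        j = pj
    return pairs[::-1]

# Invented affinities: known good letter-phoneme pairs score high,
# everything else scores low, so the silent option wins for the final E.
affinity = {("B", "b"): 2, ("I", "ay"): 2, ("L", "l"): 2}
score = lambda L, P: affinity.get((L, P), -2)
pairs = align("BIBLE", ["b", "ay", "b", "l"], score)
```

For B-I-B-L-E against b-ay-b-l, the maximizing path pairs each letter with its natural phoneme and leaves the final E silent, matching the example in the text.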
Returning to Fig. 4, the tree generator works in conjunction with three additional components: a set of possible yes-no questions 50; a set of rules 52 for selecting the best question for each node, or for deciding whether the node should be a leaf node; and a pruning method 53 for preventing overtraining.
The set of possible yes-no questions may include letter questions 54 and phoneme questions 56, depending on whether a letter-only tree or a mixed tree is being grown. When growing a letter-only tree, only the letter questions 54 are used; when growing a mixed tree, both the letter questions 54 and the phoneme questions 56 may be used.
In the presently preferred embodiment, the rule used for selecting the best question to populate each node follows the Gini criterion, although other splitting criteria may be used instead. For more information on splitting criteria, see Breiman, Friedman et al., "Classification and Regression Trees". Essentially, the Gini criterion is used to select a question from the set of possible yes-no questions 50, together with a stopping rule that declares a node a leaf node. The Gini criterion employs a concept called "impurity". Impurity is always a nonnegative number. It is applied to a node such that a node containing equal proportions of all possible categories has maximum impurity, while a node containing only one of the possible categories has zero impurity (the minimum possible value). There are several functions that satisfy these conditions; they depend on the counts of each category within the node. The Gini impurity may be defined as follows. Let C be the set of classes to which a data item can belong, and let T be the current tree node. Let f(1|T) be the proportion of training data items at node T belonging to class 1, f(2|T) the proportion belonging to class 2, and so on. Then:

i(T) = 1 - sum over c in C of f(c|T)^2
To illustrate by example, assume the system is growing a tree for the letter "E". At a given node T of that tree, the system may have, say, ten examples of how "E" is pronounced in words. In five of these examples "E" is pronounced "iy" (the "ee" sound in "cheese"); in three of the examples "E" is pronounced "eh" (the "e" sound in "bed"); and in the remaining two examples "E" is "-" (i.e. silent, as in "maple").
Assume the system is considering two possible yes-no questions, Q1 and Q2, that can be applied to the ten examples. The items answering "yes" to Q1 comprise four of the "iy" examples and one of the "-" examples (the remaining five items answer "no" to Q1). The items answering "yes" to Q2 comprise three of the "iy" examples and three of the "eh" examples (the remaining four items answer "no" to Q2). Fig. 6 diagrammatically compares these two cases.
The Gini criterion answers which question, Q1 or Q2, the system should select for this node. The Gini criterion for selecting the correct question is: choose the question that yields the greatest reduction in impurity in moving from the parent node to its child nodes. This reduction in impurity is defined as ΔI = i(T) - p_yes * i(yes) - p_no * i(no), where p_yes is the proportion of items going to the "yes" child node and p_no is the proportion of items going to the "no" child node.
Applying the Gini criterion to the above example, the impurity of the parent node is i(T) = 1 - 0.5^2 - 0.3^2 - 0.2^2 = 0.62. For Q1:

i(yes, Q1) = 1 - 0.8^2 - 0.2^2 = 0.32
i(no, Q1) = 1 - 0.2^2 - 0.6^2 - 0.2^2 = 0.56

so ΔI(Q1) = 0.62 - 0.5*0.32 - 0.5*0.56 = 0.18. For Q2:

i(yes, Q2) = 1 - 0.5^2 - 0.5^2 = 0.5
i(no, Q2) = 1 - 0.5^2 - 0.5^2 = 0.5

so ΔI(Q2) = 0.62 - (0.6)*(0.5) - (0.4)*(0.5) = 0.12. In this case, Q1 yields the greater reduction in impurity and is therefore selected in preference to Q2. The rule set 52 thus specifies that the best question for a node is the question producing the greatest reduction in impurity in moving from the parent node to its child nodes.
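The worked Gini example in the text (five "iy", three "eh", and two silent examples, split by Q1 and Q2 as described) can be checked numerically with a few lines of Python:

```python
def gini(counts):
    """Gini impurity i(T) = 1 - sum_c f(c|T)^2 from per-class counts."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity_drop(parent, yes, no):
    """Delta-I = i(T) - p_yes * i(yes) - p_no * i(no)."""
    n = sum(parent.values())
    p_yes = sum(yes.values()) / n
    p_no = sum(no.values()) / n
    return gini(parent) - p_yes * gini(yes) - p_no * gini(no)

parent = {"iy": 5, "eh": 3, "-": 2}
q1_yes, q1_no = {"iy": 4, "-": 1}, {"iy": 1, "eh": 3, "-": 1}
q2_yes, q2_no = {"iy": 3, "eh": 3}, {"iy": 2, "-": 2}

d1 = impurity_drop(parent, q1_yes, q1_no)   # 0.18: Q1 is preferred
d2 = impurity_drop(parent, q2_yes, q2_no)   # 0.12
```

The computation reproduces ΔI(Q1) = 0.18 and ΔI(Q2) = 0.12, so Q1 is the question selected for the node.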
The tree generator applies the rules 52 to grow a decision tree of yes-no questions selected from set 50, continuing until a tree of optimal size has been grown. The rules 52 include a set of stopping rules that terminate tree growth when the tree reaches a predetermined size. In the preferred embodiment the tree is grown to a size larger than ultimately desired, and the pruning method 53 is then used to cut the tree back to the desired size. The pruning method may implement the Breiman technique described in the reference cited above.
The tree generator thus produces letter-only trees, shown generally at 60, or mixed trees, shown generally at 70, depending on whether the set of possible yes-no questions 50 includes letter questions only or is mixed with phoneme questions. The corpus of training data 42 comprises letter-phoneme pairs, as discussed above. When growing letter-only trees, only the letter portions of these pairs are used to populate the internal nodes; when growing mixed trees, both the letter and phoneme portions of the training data may be used to populate the internal nodes. In both cases, the phoneme portions of the pairs are used to populate the leaf nodes. The probability data associated with the phoneme data in the leaf nodes are generated by counting the number of times a given phoneme is aligned with a given letter across the entire training data corpus.
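The counting step that populates the leaf probabilities can be sketched as follows; the toy corpus of aligned letter-phoneme pairs below is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus of aligned (letter, phoneme) pairs, as produced by the
# alignment module; '-' marks a silent letter.
corpus = [("B", "b"), ("I", "ay"), ("B", "b"), ("L", "l"), ("E", "-"),
          ("B", "b"), ("E", "eh"), ("D", "d"), ("E", "-")]

# Count, per letter, how often each phoneme is aligned with it...
counts = defaultdict(Counter)
for letter, phoneme in corpus:
    counts[letter][phoneme] += 1

# ...and normalize the counts into leaf probabilities.
probs = {letter: {ph: n / sum(c.values()) for ph, n in c.items()}
         for letter, c in counts.items()}
# e.g. probs["E"] -> {"-": 2/3, "eh": 1/3}
```

In the full system these counts are taken over the training items that reach each individual leaf, not over the whole corpus at once, but the normalization is the same.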
The letter-to-pronunciation decision trees generated by the foregoing method can be stored in memory for use in a variety of different speech-processing applications. Although such applications are numerous, a few examples are given below to highlight some of the capabilities and advantages of these trees.
Fig. 6 illustrates the use of both the letter-only trees and the mixed trees to generate pronunciations from spelled-word letter sequences. Although the illustrated embodiment uses the letter-only trees and the mixed trees together, other applications may use one without the other. In the illustrated embodiment, the set of letter-only trees is stored in memory 80 and the set of mixed trees in memory 82. In many applications there will be one tree for each letter of the alphabet. A dynamic-programming sequence generator 84 operates on an input sequence 86 to generate a pronunciation 88 based on the letter-only trees 80. Essentially, each letter in the input sequence is considered individually, and the applicable letter-only tree is used to select the most probable pronunciation for that letter. As explained above, the letter-only tree asks a series of yes-no questions about the given letter and its neighboring letters. After all letters in the sequence have been considered, the resulting pronunciation is generated by concatenating the phonemes selected by the sequence generator.
To improve upon the pronunciation, the mixed-tree set 82 is used. Whereas the letter-only trees ask questions only about letters, the mixed trees can ask questions about letters and also about phonemes. The scorer 90 can receive phoneme information from the sequence generator 84. In this regard, the sequence generator 84 can use the letter-only trees 80 to generate a plurality of different pronunciations, ranking them according to their respective probability scores. This ranked list of pronunciations may be stored at 92 for access by the scorer 90.
The scorer 90 receives as input the same input sequence 86 supplied to the sequence generator 84. The scorer applies the questions of mixed trees 82 to the letter sequence, drawing on the data stored at 92 when a phoneme question must be answered. The output at 94 is typically a better pronunciation than the one provided at 88. The reason is that the mixed trees tend to filter out pronunciations that would not occur in natural speech. For example, the proper noun Achilles would likely result in a pronunciation that pronounces both l's: ah-k-ih-l-l-iy-z, whereas in natural speech the second "l" is actually silent: ah-k-ih-l-iy-z.
If desired, the scorer 90 can also produce at 96 a ranked list of the n possible pronunciations. The score associated with each pronunciation represents the composite of the individual probabilities assigned to each phoneme in the pronunciation. These scores can themselves be used in applications to identify unreliable pronunciations. For example, phonetic transcriptions supplied by a team of linguists can be checked rapidly using the mixed trees to identify any questionable pronunciations.
Letter-to-Sound Pronunciation Generator
To illustrate the principles of the invention, Fig. 8 shows a two-stage spelled letter-to-pronunciation generator. As will be more fully explained below, the mixed decision-tree approach of the invention can be applied in a variety of different applications in addition to the pronunciation generator described here. The two-stage pronunciation generator has been selected for illustration because it highlights many of the aspects and strengths of the mixed decision-tree structure.
The two-stage pronunciation generator preferably comprises a first stage 116 that employs a set of letter-syntax-context-dialect decision trees 110 and a second stage 120 that employs a set of phoneme-mixed decision trees 112 for examining the input sequence 114 at the phoneme level. The letter-syntax-context-dialect decision trees examine the letters in the spelled-word sequence and their neighboring letters (i.e. letter-related questions); other questions examine what words precede or follow the particular word (i.e. context-related questions); still other questions examine what part of speech the word has within the sentence and the syntactic relationships among the words of the sentence (i.e. syntax-related questions); and further questions examine which dialect the pronunciation is to be rendered in. Preferably, the dialect to be spoken is selected by the user via a dialect selector 150.
Another embodiment of the invention uses the letter-related questions together with at least one word-level feature (i.e. a syntax-related or context-related question). For example, one embodiment uses a set of letter-syntax decision trees in the first stage; another embodiment uses a set of letter-context-dialect decision trees that do not examine the syntax of the input sequence.
It should be understood that the invention is not limited to words occurring in sentences, but also encompasses other linguistic constructs that exhibit syntax, such as clipped sentences or phrases.
An input sequence 114, such as the letter sequence of a sentence, is fed to the text-based pronunciation generator 116. For example, the input sequence 114 might be the sentence: "Did you know who read the autobiography?"
Syntax data 115 are another input to the text-based pronunciation generator 116. This input supplies the information needed by the letter-syntax-context-dialect decision trees 110. The purpose of the syntax data 115 is to indicate what part of speech each word in the input sequence 114 has. For example, the word "read" in the input sequence above would be designated a verb (as opposed to a noun or an adjective) by syntax-tagging software module 129. Syntax-tagging software technology is available, for example, from the "XTag" research project at the University of Pennsylvania. Syntax-tagging software technology is also discussed in the following reference: George Foster, "Statistical Lexical Disambiguation", Master of Computer Science thesis, McGill University, Montreal, Canada (November 11, 1991).
The text-based pronunciation generator 116 uses decision trees 110 to generate a list of pronunciations 118 representing possible pronunciation candidates of the spelled input sequence. Each pronunciation in list 118 (for example, pronunciation A) preferably represents a pronunciation of the input sequence 114 that includes the stress of each word. In addition, the preferred embodiment can determine the rate at which each word is to be spoken.
A sentence-rate calculator software module 152 is used by the text-based pronunciation generator 116 to determine the rate at which each word should be spoken. For example, the sentence-rate calculator 152 examines the context of the sentence to determine whether certain words in the sentence should be spoken faster or slower than normal. A sentence ending with an exclamation mark, for instance, yields rate data specifying that a predetermined number of words before the end of the sentence be given shorter-than-normal durations, to better convey the forcefulness of the exclamatory sentence.
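The exclamation-mark rule just described can be sketched as a tiny routine; the window size and duration multipliers below are invented, since the text specifies only "a predetermined number" of shortened words.

```python
def word_rates(words, n_short=2, short=0.8, normal=1.0):
    """Relative duration multiplier per word: words just before an
    exclamation mark are shortened to convey emphasis."""
    rates = [normal] * len(words)
    if words and words[-1].rstrip().endswith("!"):
        for i in range(max(0, len(words) - n_short), len(words)):
            rates[i] = short
    return rates

rates = word_rates(["What", "a", "great", "book!"])
```

A fuller calculator would apply analogous context rules for questions, commas, and other punctuation.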
The text-based pronunciation generator 116 examines each letter and word in the sequence in turn, using the decision tree associated with that letter or with the word's syntax (or word context), and selects a pronunciation for the letter based on the probability data contained in the decision tree.
Preferably the set of letter-syntax-context-dialect decision trees 110 includes a decision tree for each letter of the alphabet and for the associated syntax of the language.
Fig. 9 shows an example of a letter-syntax-context-dialect decision tree 140 usable for the letter "E" in the word "READ". The decision tree comprises internal nodes (shown as ovals in the figure) and a plurality of leaf nodes (shown as rectangles). Each internal node is populated with a yes-no question, that is, a question that can be answered either "yes" or "no". In the letter-syntax-context-dialect decision tree 140 these questions are directed to: the given letter (in this case the letter "E") and its neighboring letters in the input sequence; or the syntax of the word within the sentence (i.e. noun, verb, etc.); or the context and dialect of the sentence. Note in Fig. 9 that each internal node branches left or right depending on whether the answer to the associated question is "yes" or "no".
Preferably the question at the first internal node concerns the dialect in which the text is to be spoken; internal node 138 represents such a question. If a southern dialect is to be used, the southern-dialect tree 139 is followed, ultimately producing phoneme values at its leaf nodes that better represent a southern pronunciation.
The abbreviations used in Fig. 9 are as follows: numbers in the questions, such as "+1" or "-1", refer to positions in the spelling relative to the current letter. The symbol L denotes a question about a letter and its neighboring letters. For example, "-1L='R' or 'L'?" means "Is the letter before the current letter (in this case the letter E) an L or an R?". The abbreviations CONS and VOW represent classes of letters, namely consonants and vowels. The symbol "#" represents a word boundary. The term "tag(i)" denotes the syntax tag of the i-th word, where i=0 denotes the current word, i=-1 denotes the preceding word, i=+1 denotes the following word, and so on. Thus, "tag(0)=PRES?" means "Is the current word a present-tense verb?"
The leaf nodes are populated with probability data that associate possible phoneme pronunciations with numeric values representing the probability that the particular phoneme is the correct pronunciation of the given letter. The null phoneme, i.e. silence, is represented by the symbol "-".
For example, the "E" in the present-tense verbs "READ" and "LEAD" is assigned its correct pronunciation, "iy", by decision tree 140 at leaf node 142 with probability 1.0. The "E" in the past tense of "READ" (as in "Who read a book") is assigned the pronunciation "eh" at leaf node 144 with probability 0.9.
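The syntax question of Fig. 9 can be sketched as follows. The tag inventory and the two leaf distributions are invented for illustration; in the system this logic is one branch of a grown tree, not a hand-written rule.

```python
def tag(tags, i, offset):
    """Syntax tag of the word at position i+offset; '#' past the
    sentence boundary (mirrors the tag(i) notation in the figure)."""
    j = i + offset
    return tags[j] if 0 <= j < len(tags) else "#"

def pronounce_E_in_read(tags, i):
    """Phoneme probabilities for the 'E' of 'READ' at word index i."""
    if tag(tags, i, 0) == "PRES":        # tag(0)=PRES?  -> 'reed'
        return {"iy": 1.0}
    return {"eh": 0.9, "iy": 0.1}        # past tense -> 'red'

past = pronounce_E_in_read(["WH", "PAST", "DET", "NOUN"], 1)   # "Who read a book"
present = pronounce_E_in_read(["PRON", "PRES", "NOUN"], 1)     # "I read books"
```

The same word spelling thus receives different phonemes depending solely on the syntax tag supplied at 115, which is exactly what letter-only trees cannot do.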
The decision trees 110 (of Fig. 8) preferably include context-related questions. For example, a context-related question at an internal node may examine whether the word "you" is preceded by the word "did". In such a context, the "y" in "you" is typically pronounced in colloquial speech with a "ja" sound.
The invention also generates prosody-indicating data so as to convey stress, pitch, and pauses when a sentence is spoken. The syntax-related questions help determine the stress, pitch, and pauses of the phonemes. For example, internal node 141 (of Fig. 9) asks whether the first word in the sentence is an interrogative word, such as "who" in the question "Who read a book?". Because the first word in this example sentence is an interrogative word, the leaf node 144 having a stressed phoneme is selected; leaf node 146 represents the alternative, unstressed-phoneme selection.
As another example, in a question the phoneme of the last syllable of the last word in the sentence often carries a rising-pitch indicator, so as to express the interrogative sense of the sentence more naturally. A further example concerns the natural pauses to be included when a sentence is spoken: the invention can incorporate such pauses by asking questions about punctuation, such as commas and periods.
The text-based pronunciation generator 116 (Fig. 8) thus uses decision trees 110 to construct one or more pronunciation hypotheses, which are stored in list 118. Preferably each pronunciation is associated with a numerical score arrived at by combining the probability scores of the individual phonemes selected using decision trees 110. Word pronunciations may then be scored by constructing a matrix of possible combinations and using dynamic programming to select the n best candidates.
Alternatively, the n best candidates may be selected by a substitution technique: the most probable word candidate is first identified, and additional candidates are then generated through iterative substitution. The pronunciation with the highest probability score is selected first, by multiplying the respective scores of the highest-scoring phonemes (identified by examining the leaf nodes); this selection is then used as the most probable, or first-best, word candidate. Additional (n-best) candidates are selected by re-examining the phoneme data in the leaf nodes to identify the phoneme, not originally selected, that differs least from the originally selected phoneme. This minimally-different phoneme is then substituted for the originally selected one, generating the second-best word candidate. The process may be repeated iteratively until the desired number of n-best candidates has been selected. List 118 can be sorted in descending score order, so that the pronunciation judged best by the first-stage analysis appears first in the list.
Decision trees 110 often produce only moderately good results. This is because these decision trees cannot determine, at each letter, what phonemes will be generated by subsequent letters. Thus decision trees 110 can produce high-scoring pronunciations that would never actually occur in natural speech. For example, the proper name "Achilles" is likely to yield a pronunciation in which both "l"s are pronounced: ah-k-ih-l-l-iy-z. In natural speech the second "l" is actually silent: ah-k-ih-l-iy-z. A pronunciation generator that uses only decision trees 110 has no mechanism for screening out word pronunciations that never occur in natural speech.
The second stage of pronunciation system 108 addresses this problem. A phoneme mixed-tree score estimator 120 uses the set of phoneme mixed decision trees 112 to assess the viability of each pronunciation in list 118. The score estimator 120 works by examining each letter of input sequence 114 in turn, together with the phoneme assigned to that letter by the text-based pronunciation generator 116.
The phoneme mixed-tree score estimator 120 rescores each pronunciation in list 118 according to the questions in the phoneme mixed trees 112 and the probabilities in the leaf nodes of those mixed trees. If desired, the resulting list of pronunciations 122 can be stored in descending order, so that the first-listed pronunciation has the highest score.
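The rescoring step can be sketched as follows, assuming a callable that returns the mixed-tree leaf probability for the phoneme at a given position, given the whole candidate sequence as context; `rescore` and `mixed_tree_prob` are illustrative names, not from the patent:

```python
from math import prod

def rescore(pronunciations, mixed_tree_prob):
    """Rescore each candidate pronunciation using the phoneme mixed
    trees and sort in descending order, so the best-scoring
    pronunciation appears first (as in list 122).

    mixed_tree_prob(phonemes, i) -> probability of phonemes[i] given
    its context; unlike the letter-only trees, it may inspect
    neighboring phonemes as well as neighboring letters."""
    rescored = []
    for phonemes in pronunciations:
        probs = [mixed_tree_prob(phonemes, i) for i in range(len(phonemes))]
        rescored.append((prod(probs), phonemes))
    rescored.sort(key=lambda t: t[0], reverse=True)
    return rescored
```

Because the mixed trees see the surrounding phonemes, a candidate such as the double-"l" Achilles pronunciation receives a low leaf probability at the redundant phoneme and falls down the list.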
In many cases the pronunciation occupying the highest-scoring position in list 122 differs from the pronunciation occupying the highest-scoring position in list 118. This occurs because the mixed-tree score estimator 120, using the phoneme mixed trees 112, screens out pronunciations that do not contain self-consistent phoneme sequences or that would not occur in natural speech.
In the presently preferred embodiment, the phoneme mixed-tree score estimator 120 uses a sentence-rate calculator 152 to determine rate data for the pronunciations in list 122. In addition, estimator 120 determines stress and other prosodic features in a manner similar to that described above, using mixed trees whose leaf nodes permit questions about the feature under examination as well as questions about dialect.
If desired, a selector module 124 can access list 122 to retrieve one or more of the pronunciations in the list. Typically selector 124 retrieves the pronunciation with the highest score and provides it as the output pronunciation 126.
As noted above, the pronunciation generator shown in Fig. 8 represents only one possible embodiment employing the mixed trees of the invention. In another embodiment, one or more output pronunciations selected from list 122 can be used to form pronunciation dictionaries for speech recognition and speech synthesis. In the speech-recognition context, the pronunciation dictionary supplies pronunciations, during the training stage, for words not already found in the recognizer's vocabulary. In the speech-synthesis context, the pronunciation dictionary can be used to generate phoneme sounds for concatenated speech. Such a system can be used, for example, to enhance the features of an e-mail reader or other text-to-speech applications.
The mixed-tree scoring system of the invention (i.e., letter, syntax, context, and phoneme) can be used in a variety of applications that require one pronunciation or a set of possible pronunciations. For example, in dynamic language learning, a user types in a sentence and the system provides a list of possible pronunciations for that sentence, ordered by probability. The scoring system can also serve as a feedback tool in a user language-learning system. A language-learning system with speech synthesis can display a spelled sentence and analyze a speaker's attempt to pronounce that sentence in the new language, and the system can indicate to the user the most likely and least likely pronunciations of the sentence as he or she pronounces it.
While the invention has been described in its presently preferred form, it should be understood that the mixed-tree pronunciation system is capable of a variety of applications. Accordingly, modifications and changes can be made to the invention without departing from the spirit of the appended claims.