US6208968B1 - Computer method and apparatus for text-to-speech synthesizer dictionary reduction - Google Patents

Computer method and apparatus for text-to-speech synthesizer dictionary reduction

Publication number
US6208968B1
Authority
US
United States
Prior art keywords
entry
dictionary
phoneme
string
grapheme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/212,874
Inventor
Anthony J. Vitale
Ginger Chun-Che Lin
Thomas Kopec
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Compaq Computer Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Compaq Computer Corp
Priority to US09/212,874 (patent US6208968B1)
Assigned to DIGITAL EQUIPMENT CORPORATION. Assignment of assignors interest (see document for details). Assignors: KOPEC, THOMAS; LIN, GINGER CHUN-CHE; VITALE, ANTHONY J.
Priority to US09/795,070 (patent US6347298B2)
Application granted
Publication of US6208968B1
Assigned to COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P. Assignment of assignors interest (see document for details). Assignors: COMPAQ COMPUTER CORPORATION; DIGITAL EQUIPMENT CORPORATION
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Change of name (see document for details). Assignors: COMPAQ INFORMATION TECHNOLOGIES GROUP LP
Anticipated expiration
Legal status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • a “speech synthesizer” is a computer device or system for generating audible speech from written text. That is, a written form of a string or sequence of characters (e.g., a sentence) is provided as input, and the speech synthesizer generates the spoken equivalent or audible characterization of the input.
  • the generated speech output is not merely a literal reading of each input character, but a language dependent, in-context verbalization of the input. If the input was the phone number (508) 691-1234 given in response to a prior question of “What is your phone number?”, the speech synthesizer does not produce the reading “parenthesis, five hundred eight, close parenthesis, six hundred ninety-one . . . ” Instead, the speech synthesizer recognizes the context and supporting punctuation and produces the spoken equivalent “five (pause) zero (pause) eight (pause) six . . . ” just as an English-speaking person normally pronounces a phone number.
  • the first speech synthesizers were formed of a dictionary, engine and digital vocalizer.
  • the dictionary served as a look-up table. That is, the dictionary cross referenced the text or visual form of a character string (e.g., word or other unit) and the phonetic pronunciation of the character string/word.
  • the visual form of a character string unit (e.g., a word) is called a grapheme.
  • the phonetic pronunciation or phoneme of character string units is indicated by symbols from a predetermined set of phonetic symbols.
  • the engine is the working or processing member that searches the dictionary for a character string unit (or combination thereof) matching the input text.
  • the engine performs pattern matching between the sequence of characters in the input text and the sequence of characters in “words” (character string units) listed in the dictionary.
  • the engine obtains from the dictionary entry (or combination of entries) of the matching word (or combination of words), the corresponding phoneme or combination of phonemes.
  • the purpose of the engine is thought of as translating a grapheme (input text) to a corresponding phoneme (the corresponding symbols indicating pronunciation of the input text).
  • the engine employs a binary search through the dictionary for the input text.
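As a rough illustration of the binary-search look-up described above, the following Python sketch keeps the grapheme strings in sorted order and searches them with the standard `bisect` module. The names `ENTRIES`, `KEYS`, and `lookup` are assumptions made for illustration, and the phoneme strings are placeholders rather than real synthesizer data.

```python
from bisect import bisect_left

# Hypothetical toy dictionary: (grapheme string, phoneme string) pairs kept
# sorted by grapheme so that a binary search applies. The phoneme strings
# are placeholders, not real synthesizer output.
ENTRIES = sorted([
    ("long", "PH_LONG"),
    ("longing", "PH_LONGING"),
    ("aardvark", "PH_AARDVARK"),
])
KEYS = [grapheme for grapheme, _ in ENTRIES]


def lookup(grapheme):
    """Return the phoneme string for grapheme, or None if it is absent."""
    i = bisect_left(KEYS, grapheme)
    if i < len(KEYS) and KEYS[i] == grapheme:
        return ENTRIES[i][1]
    return None
```

Because the whole sorted table has to be resident for this kind of search, every entry removed from the dictionary shrinks both the memory footprint and the search space.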
  • the dictionary is loaded into the computer processor physical memory space (RAM) along with the speech synthesizer program.
  • the memory footprint (i.e., the physical memory space in RAM needed while running the speech synthesizer program) thus must be large enough to hold the dictionary.
  • As the dictionary portion of today's speech synthesizers continues to grow in size, the memory footprint becomes problematic because of the limited memory (RAM and ROM) available in some or most applications.
  • the digital vocalizer receives the phoneme data generated by the engine. Based on the phoneme data together with timing and stress data, the digital vocalizer generates sound signals for “reading” or “speaking” the input text. Typically, the digital vocalizer employs a sound and speaker system for producing the audible characterization of the input text.
  • In other speech synthesizers, the dictionary is replaced by a rule set.
  • Alternatively, the rule set is used in combination with the dictionary rather than replacing it entirely.
  • the rule set is a group of statements in the form
  • Each such statement determines the phoneme for a grapheme that matches the IF condition.
  • Examples of rule-based speech synthesizers are DECtalk by Digital Equipment Corporation of Maynard, Mass. and TrueVoice by Centigram Communications of San Jose, Calif.
  • Although the use of rule sets reduces the number of entries required in a dictionary for a speech synthesizer system, the dictionaries remain relatively large in size (i.e., number of entries) compared to other parts of the system requiring memory. This is problematic because dictionaries must be completely stored in memory during the speech synthesis process to ensure fast and efficient look-up of entries if needed.
  • Dictionaries used by text-to-speech synthesis systems may grow to become quite large. Dictionary size depends on how many words or word portions in a particular language are determined to be too complex, too difficult or too time consuming to translate into phonemes by rule set processing alone. Such words or word portions are candidates to be included as entries in the dictionary. However, certain problems are encountered when large dictionaries are used in text-to-speech synthesis systems as mentioned above.
  • the invention recognizes the problems with prior art text-to-speech synthesis systems that use dictionaries and provides a method and apparatus to reduce the overall size of the dictionaries used in such systems.
  • the invention uses a two phase dictionary reduction process to eliminate entries that are not required in the dictionary.
  • In phase one, any entries in the dictionary with respective phonemes that can be fully generated by rules in a rule set are marked or indicated to be deleted from the dictionary.
  • In phase two, any entries in the dictionary, called root word entries, that can provide phonemes for the text-to-speech translation process of larger (longer) entries are marked or indicated to be saved in the dictionary, and the entries of longer character strings that can be translated using the shorter root word entries in conjunction with rules are indicated to be deleted from the dictionary.
  • the invention aggregates the entries marked to be saved or removes the entries marked to be deleted and the resulting set of entries is stored as the reduced dictionary.
  • Phase one or phase two of the invention each may be performed independently, followed by the aggregation step.
  • phase one may be followed by phase two and then by the aggregation process.
  • In order for embodiments of phase one to determine if the phoneme string of an entry in the dictionary can be fully generated (and hence the dictionary entry can be fully matched) by using the rule set, the invention method and apparatus generate a rule-based phoneme string for the grapheme string of the subject entry and then determine if the rule-based phoneme string matches the corresponding phoneme string of the entry. If there is a match, the subject entry is indicated to be deleted from the dictionary, thus reducing overall dictionary size. Since rules alone can produce the required phoneme string for the subject entry, the invention recognizes that there is no need for the entry to remain in the dictionary.
  • Embodiments of phase one may also check if the grapheme string of a dictionary entry is a homograph. If so, the preferred embodiment skips to the next entry in the dictionary for processing.
  • a homograph is a word that can be pronounced two different ways but which has one spelling, such as “abstract”, “wind”, and “record”. Due to multiple pronunciations, homograph dictionary entries are skipped since they may have more than one associated phoneme string.
  • the correct phoneme string is selected from a homograph dictionary entry based on the context of surrounding language in the text being translated.
  • Embodiments of phase two determine if dictionary entries, referred to as root word entries, are required in the dictionary. This is accomplished by combining the grapheme and phoneme strings of the root word entry from the dictionary with the respective grapheme and phoneme portions of an affix rule of an affix rule set of the speech synthesis system. This step of combining forms a grapheme combination and phoneme combination pair. Phase two then determines if the grapheme combination and phoneme combination pair exists as another matching entry in the dictionary, and if so, indicates the root word entry to be saved in the dictionary. The matching entry is thus marked for removal/deletion.
  • phase two saves root words in the dictionary that can be used to assist in the translation of another longer word (the matching entry) in conjunction with rule-based processing, and removes the matching entries from the dictionary which can be correctly translated with a combination of rule processing and root word phonemes.
  • Embodiments of phase two select and process each root word entry in the dictionary. Specifically for each root word entry, the invention combines the grapheme string of the root word entry with the grapheme portion of the affix rule to form a grapheme combination, and combines the phoneme string of the root word entry with the phoneme portion of the affix rule to form a phoneme combination. Then phase two determines if the grapheme combination exists as a matching grapheme string in an entry in the dictionary. If so, the invention obtains the corresponding phoneme string as a matching phoneme string for the matching entry.
  • phase two determines if the phoneme combination matches the matching phoneme string, and if so, indicates the root word entry to be saved in the dictionary.
  • the root words that are saved in the dictionary are root words that can be used in the translation of the other matching entries.
  • Phase two also determines if the matching entry has been indicated to be saved in the dictionary. If not, the invention indicates the matching entry to be deleted from the dictionary. As such, phase two reduces the dictionary size by determining which entries rely on phonemes of root words, and saves the root words and deletes entries that can be matched by the root words and rule processing.
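The phase two logic summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `phase_two`, the dict-based entry layout, and the flag names are hypothetical, only suffix-style affixes are handled (a prefix rule would prepend rather than append), and the root phoneme string "hElp" is a placeholder. The "ful" to "fL" pairing is the one given for Rule 9 of the suffix rule set.

```python
def phase_two(entries, affix_rules):
    """Mark root words to save and derivable longer entries to delete.

    entries: dict mapping grapheme string -> {'phoneme', 'save', 'delete'}
             (hypothetical layout chosen for this sketch).
    affix_rules: list of (grapheme portion, phoneme portion) pairs,
                 treated here as suffixes only.
    """
    for root, rec in entries.items():
        for g_affix, p_affix in affix_rules:
            g_combo = root + g_affix            # grapheme combination
            p_combo = rec["phoneme"] + p_affix  # phoneme combination
            match = entries.get(g_combo)
            if match is None or match["phoneme"] != p_combo:
                continue
            rec["save"] = True                  # root word is needed
            if not match["save"]:
                match["delete"] = True          # longer entry is derivable
    return entries
```

For example, with entries for "help" ("hElp") and "helpful" ("hElpfL") and the single rule ("ful", "fL"), the sketch marks "help" to be saved and "helpful" to be deleted.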
  • By using either phase one or phase two alone, or phase one followed by phase two, the invention reduces the number of entries in a dictionary.
  • the invention computer method and apparatus forms a reduced (i.e., smaller in size) dictionary.
  • the reduced dictionary is adaptable to text-to-speech synthesis applications requiring minimal storage space, entry search time, and dictionary load time.
  • FIG. 1 schematically illustrates the operation of a text-to-speech synthesis system using rule sets and a dictionary to translate words in text to electronically generated speech.
  • FIG. 2 is a flow diagram illustrating the two phases of the dictionary reduction process of the invention.
  • FIG. 3 is a flow chart illustrating the steps involved in phase one of the dictionary reduction process of FIG. 2 according to the invention.
  • FIG. 4 is a flow chart illustrating the steps involved in phase two of the dictionary reduction process of FIG. 2 according to the invention.
  • FIG. 1 illustrates the general operation of a typical computerized text-to-speech synthesis system 100 that uses a dictionary 104 that can be reduced in size by this invention.
  • the text-to-speech synthesizer 101 accepts written text 102 containing words, phrases, names, symbols and so forth as input. Speech synthesizer 101 then employs rule sets 103 a through 103 c in conjunction with dictionary 104 to translate the input text 102 into audible electronically generated speech 107 .
  • the generated speech is output through a speaker device 106 for example.
  • the present invention is a method and apparatus for eliminating unnecessary entries in dictionary 104 to reduce its overall size.
  • a dictionary reduced in size by this invention requires less storage space on disk and in memory when used during the text-to-speech translation process performed by text-to-speech synthesizer 101. Also, since there are fewer entries in dictionary 104 after the reduction process of the present invention, the processing time required to load and to search the dictionary 104 may be reduced as well.
  • Table 1 illustrates a small example of entries from a dictionary, such as those that might be found in dictionary 104 .
  • the entries in Table 1 are examples and are not limitations on the present invention or speech synthesis system 100 .
  • each dictionary entry 1 through 10 contains ( i ) a grapheme (i.e., character) string portion (Column 1 ) comprising one or more graphemes, and ( ii ) a phoneme string portion (Column 2 ) comprising one or more phonemes.
  • a grapheme string corresponds to a word in the dictionary, but the term “word” as used herein does not necessarily mean the formal linguistic unit in the language of the dictionary. Rather, some words in the dictionary can be portions or segments of longer, more formally or commonly known words.
  • a single grapheme is any character or symbol in the entire alphabet of the language of the dictionary, such as English.
  • a grapheme may be a letter “A” through “Z” or “a” through “z”, numbers such as ‘0’ through ‘9’, or another character or symbol such as “?”, “!”, “@”, and so forth.
  • a grapheme string is one or more graphemes appended together.
  • a phoneme is one or more character symbols used to represent a single phonetic utterance or sound that may be made when speaking the language of the dictionary.
  • the entire set of phonemes for a language represents all possible utterances that may be combined to pronounce words in that language.
  • a phoneme string is a series of phonemes appended together which represent the phonetic pronunciation of one or more corresponding graphemes (i.e., a grapheme string). As such, a correctly assembled phoneme string represents the phonetic pronunciation for the corresponding grapheme string in a given dictionary entry.
  • dictionary entry number nine has as a subject grapheme string, the word “longing”, and indicates a corresponding phoneme string of “l'cG G”. There are sub-strings (i.e., respective graphemes) in the word “longing” that correspond to each phoneme in this phoneme string.
  • example dictionary entries 1 through 10 resemble dictionary entries of words such as those found in a normal English dictionary.
  • a dictionary that can be reduced by this invention may however contain other information as well, such as word definitions, but this invention is not concerned with this other information.
  • Dictionaries that can be reduced in size by the invention can be created specifically for text-to-speech synthesis systems, or alternatively, the invention may reduce off-the-shelf commercially available dictionaries, such as those supplied on CD-ROMs for other types of application programs besides speech synthesis.
  • the dictionary to be reduced can be for any language, so long as each entry contains a grapheme string and a corresponding phoneme string.
  • a dictionary not specifically designed for use by a text-to-speech synthesis system is usually very large in size, and contains entries for most words in a language. Dictionaries in the prior art that are designed specifically for text-to-speech synthesis systems are usually larger in size than what is actually needed to perform the text-to-speech synthesis process. The invention is advantageous since it reduces both these and various other types of dictionaries.
  • rule sets (such as 103 a, b , and c in FIG. 1) are frequently used in text-to-speech synthesis systems 100 to quickly translate graphemes of words into phonemes which may then be converted to audible sounds 107 .
  • Grapheme-to-phoneme rules contained in rule sets 103 provide a concise way to analyze a character string in the language and produce the required phonemic data for sound synthesis.
  • rules in a rule set 103 may be generic in that they may convert character strings that are generally not considered to be words worthy of existing in the dictionary 104 .
  • Each rule set 103 a through 103 c contains a number of rules in the form:
  • Each rule determines the proper corresponding phoneme(s) for a grapheme string that matches the IF condition.
  • the previously noted rule-based text-to-speech synthesizer called DECtalk from Digital Equipment Corporation of Maynard, Mass. uses rule sets 103 in combination with a dictionary 104 to translate text to speech.
  • each rule of the rule set 103 is considered with respect to the input text 102 .
  • Rule-based processing typically proceeds one word or unit of text at a time from the beginning to the end of the input text.
  • Each word or input text unit is then processed by selecting a number of graphemes (i.e. characters) from either the beginning, middle, or end of the input text 102 .
  • the graphemes selected depend upon the rule set being used. If a rule condition (the “IF-Condition” part of the rule) matches any portion of the input text 102, then the text-to-speech synthesizer 101 determines that the rule applies. The synthesizer then stores the corresponding phoneme data (i.e., the phonemic result) from the rule in a working buffer.
  • the synthesizer 101 similarly processes each succeeding rule in the rule set 103 against the remaining input text 102 (i.e., remainder parts thereof) for which phoneme data is needed. After processing all of the text 102 via rules in the rule sets 103 , the working buffer holds the phoneme data corresponding to the text which may then be converted to audible speech.
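To make the working-buffer idea concrete, here is a deliberately simplified Python sketch. It assumes that each rule is a plain mapping from a grapheme substring to a phoneme string and that text is consumed left to right with the longest match first. That matching strategy, the `RULES` table, and the function name `apply_rules` are all assumptions for illustration; the actual rule format and matching order of the synthesizer are not specified here.

```python
# Hypothetical grapheme-to-phoneme rules: IF the text at the current
# position starts with the key, THEN emit the value. These pairs are
# illustrative only and are not taken from any real rule set.
RULES = {"ch": "C", "a": "@", "t": "t", "c": "k"}


def apply_rules(text, rules=RULES):
    """Translate text left to right, trying the longest rule first."""
    buffer = []  # working buffer of phoneme data
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                buffer.append(rules[chunk])
                i += size
                break
        else:
            return None  # no rule covers the text at this position
    return "".join(buffer)
```

Returning `None` when no rule applies mirrors the case in which rules alone cannot translate a string, which is when a dictionary entry would be needed.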
  • Table 2 illustrates ten example rules from a specific type of rule set, called a suffix rule set (e.g. 103 c in FIG. 1) used for English text strings.
  • Text-to-speech synthesis systems 100 may use multiple rule sets to obtain phonemic data (i.e., phonemes) for different parts of a given input text/character string 102 (e.g. individual words).
  • There may be rule sets for matching (i) suffixes, which are one or more graphemes obtained from the end of a character string, (ii) prefixes, which are one or more graphemes selected from the beginning of a character string, and (iii) infixes, which are one or more graphemes selected from somewhere in the middle of the subject text string, between the beginning and the end.
  • Suffix and prefix rule sets are called “Affix” rule sets, since they match grapheme portions (i.e., strings) obtained starting from either the beginning or end of a word.
  • For example, rule set 103-a corresponds to a prefix rule set, rule set 103-b corresponds to an infix rule set, and rule set 103-c corresponds to a suffix rule set.
  • the example suffix rules in Table 2 map a respective suffix-like (ending) grapheme portion to corresponding phonemic data or phoneme portion (i.e., one or more phonemes). For example, Rule 9 is used to convert an ending text string (i.e., the suffix grapheme string) “ful” to the phoneme string “fL”.
  • the suffix rules shown in Table 2 are given for example only.
  • a full suffix rule set may contain many more entries than those shown in Table 2. While not illustrated in a table, rules in a prefix rule set are similar in nature to the rules in the suffix rule set above, but match prefix grapheme portions of character strings to prefix phonemic data. Likewise, an infix rule set contains rules for matching infix grapheme portions, obtained from the middle of text strings, to phonemic data as well.
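A suffix rule of this kind can be pictured as a pair of an ending grapheme string and its phoneme string. The Python sketch below is an assumption-laden illustration: the table `SUFFIX_RULES` and the function `match_suffix` are hypothetical names, and only the "ful" to "fL" pair comes from the text (Rule 9).

```python
# Hypothetical suffix rule table: ending grapheme string -> phoneme string.
# Only the "ful" -> "fL" pair comes from the text (Rule 9); a real rule set
# would hold many more rules.
SUFFIX_RULES = {"ful": "fL"}


def match_suffix(word, rules=SUFFIX_RULES):
    """Return (stem, suffix phonemes) for the longest matching suffix.

    Returns None when no suffix rule matches or the word is only a suffix.
    """
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], rules[suffix]
    return None
```

A prefix rule table could be handled the same way with `str.startswith` and by slicing from the front of the word.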
  • rule sets themselves may be generated by an analysis of dictionary entries containing a grapheme string and corresponding phoneme strings.
  • a rule set generation process is described as a separate invention in co-pending U.S. Pat. application Ser. No. (Unknown) filed Oct. 26, 1998, entitled “Automatic Grapheme-to-Phoneme Rule-Set Generation”, which is assigned to the assignee of this invention and is hereby incorporated by reference in its entirety.
  • a dictionary having many entries which has not yet been reduced by the teachings of this invention, is used for rule set generation in the referenced application. After the rule sets have been generated from an analysis of the dictionary, the dictionary may then be reduced by phase one and/or phase two of the present invention.
  • FIG. 2 illustrates the two phases used in the present invention to reduce the size of a dictionary 104 in a text-to-speech synthesis system 100 .
  • Phase one includes step 150 of the reduction process shown in FIG. 2, and may be performed independently of phase two which is represented by step 151 . Accordingly, the reduction process of the invention may begin at either of the “Begin Reduction” indicators 154 or 155 in FIG. 2 .
  • Phase one (Step 150 ) of the invention is based on the observation that an unreduced dictionary 104 may be reduced in size by eliminating (i.e., deleting or removing) any entries in the dictionary 104 that can be fully matched by the rules in rule sets 103 a-c in conjunction with rule set processing.
  • entries in the dictionary 104 that occur in input text 102 and that may be matched entirely by rules, need not remain in the dictionary 104 .
  • phase one (Step 150 ) determines for each entry in the dictionary 104 , if the entry can be fully matched (i.e.
  • phase one of the dictionary reduction process marks for elimination any entries in the dictionary 104 that can be properly matched or translated to phonemes by the rule set 103 .
  • phase two (Step 151 ) is typically performed next. However, processing may alternatively bypass phase two (Step 151 ) by following optional processing path 153 to step 152 , where the reduced dictionary 104 - a is created.
  • Phase two is based on the observation that some entries in the dictionary 104 , called root word entries, may provide phonemic data for the text-to-speech translation process of longer words/text strings. As such, these root word entries should not be removed from the dictionary 104 to reduce its size, since the synthesis of longer words in text 102 that contain the root words (i.e., are dependent on these root word entries) can be performed using the root word entries. Furthermore, if longer word entries in dictionary 104 may be translated to phonemes using root word entries in conjunction with rule processing, then the longer word entries can be removed from the dictionary 104 to even further reduce its size.
  • Step 151 thus determines if a root word entry in the dictionary 104 can be used to support the text-to-speech synthesis of other dictionary entries. If so, then that root word entry is indicated or marked to be saved in dictionary 104 . Step 151 also determines, based on that root word entry, if longer word entries (i) have not been previously indicated to be saved in the dictionary 104 , and (ii) can be translated via phonemes provided by one or more root word entries and rule processing (i.e., the longer word entries contain the root word and some other characters). If these two conditions are met, then the longer word entry is indicated to be deleted from the dictionary 104 .
  • phase one may be followed by phase two (Step 151 ).
  • phase two can indicate a word to be saved that was previously indicated to be deleted during phase one processing. That is, if phase one determines a word (i.e., subject character string) can be matched by rules alone and thus indicates the corresponding dictionary entry is not needed and should be deleted, phase two may subsequently reverse this decision and indicate that the dictionary entry containing the subject word/character string, which is determined to be a root word of other longer words, should be saved.
  • step 152 is performed.
  • Step 152 creates a reduced dictionary 104 - a based on the entries in dictionary 104 that have been indicated to be saved and/or deleted by phase one and/or phase two processing.
  • Step 152 may be performed in a variety of ways, with the objective of creating reduced dictionary 104 - a which is smaller in size (i.e., memory and storage requirements) than initial dictionary 104 .
  • Step 152 aggregates entries from the original unreduced dictionary 104 that have been indicated to be saved, and eliminates entries indicated to be deleted.
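One way to picture the aggregation of step 152 is as a filter over flagged entries. In this Python sketch the entry layout and the name `build_reduced` are hypothetical. It also assumes that a save flag overrides a delete flag, which follows from the statement that phase two may reverse a deletion decided in phase one.

```python
def build_reduced(entries):
    """Return the entries that belong in the reduced dictionary.

    entries: list of dicts with 'grapheme', 'phoneme', 'save', 'delete'
             keys (hypothetical layout). An entry is kept if it is marked
             to be saved, or if it is not marked to be deleted.
    """
    return [e for e in entries if e["save"] or not e["delete"]]
```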
  • FIG. 3 illustrates the processing steps for a preferred embodiment of phase one (Step 150 in FIG. 2 ).
  • the processing of FIG. 3 reduces the number of entries in dictionary 104 to produce reduced dictionary 104 - a .
  • word linked list 207 is created by step 200 .
  • the word linked list 207 is a series of data structures 208 that each contain a single entry from dictionary file 104 .
  • Each data structure 208 includes ( a ) an indication of the respective entry grapheme string 208 - a , ( b ) an indication of the corresponding phoneme string 208 - b , ( c ) a delete flag 208 - c that may be set or un-set as needed, and ( d ) a save flag 208 - d that indicates root words that must be saved.
  • the delete flag 208 - c and the save flag 208 - d for each data structure 208 are initially set to false for each word entry.
  • the first data structure 208 in word linked list 207 corresponds to the first entry from dictionary 104
  • the second data structure 208 in word linked list 207 corresponds to the second dictionary entry, and so forth.
  • each entry in dictionary file 104 is read into memory and stored in the word linked list 207 as a separate data structure 208 .
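The data structure 208 and the list built in step 200 could be modeled as below. This is a sketch, not the patent's implementation: it uses a Python list in place of a linked list, and the names `WordEntry` and `load_entries` are hypothetical. The sample entry is dictionary entry nine from the text.

```python
from dataclasses import dataclass


@dataclass
class WordEntry:
    """One data structure 208: a dictionary entry plus two flags."""
    grapheme: str         # 208-a: grapheme string of the entry
    phoneme: str          # 208-b: corresponding phoneme string
    delete: bool = False  # 208-c: set when rules alone reproduce the phonemes
    save: bool = False    # 208-d: set for root words that must be kept


def load_entries(pairs):
    """Build the in-memory word list (step 200) in dictionary order."""
    return [WordEntry(grapheme, phoneme) for grapheme, phoneme in pairs]


word_list = load_entries([("longing", "l'cG G")])
```

Both flags default to false, matching the initial state described for each word entry.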
  • Steps 201 through 206 are then performed for each data structure 208 in the word linked list 207 .
  • step 201 attempts to match any one of the affix rules from affix rule sets 103 - a and 103 - c to the grapheme string 208 - a of the subject data structure 208 .
  • step 201 attempts to match suffix rules to the end, and prefix rules to the beginning of grapheme string 208 - a . If any affix rule matches, processing skips to the next word linked list data structure 208 to obtain the next grapheme string 208 - a .
  • any dictionary 104 entry words i.e.
  • step 202 then uses rules in infix rule file 103 - b to generate phonemes based on an analysis of the subject grapheme string 208 - a . That is, step 202 takes the grapheme string (i.e. the dictionary entry character string/word) for the subject data structure 208 currently being processed by steps 201 through 206 and attempts to parse the grapheme string 208 - a using only grapheme-to-phoneme rules from infix rule set 103 - b .
  • This parsing process (Step 202) ultimately creates a rule-based phoneme string, just as if the grapheme string were input text being translated for text-to-speech synthesis using infix rules.
  • rule processing is described in detail in the co-pending U.S. Patent application, entitled “Computer Method and Apparatus for Translating Text to Sound.”
  • As an example of step 202, take the first entry in the dictionary example from Table 1. Assume that no affix rules matched the “aardvark” grapheme string in step 201.
  • In step 202, “aardvark” would be parsed by infix rules in the infix rule set 103-b to produce an infix rule-based phoneme string for this word, such as “ardvark”.
  • This resultant rule-based phoneme string may or may not be equivalent to the corresponding phoneme string 208 - b in the current data structure 208 /dictionary 104 entry.
  • step 203 normalizes the stress notation marks in the generated rule-based phoneme string.
  • the exact normalization mechanism depends on the characteristics and structure of the rule sets and the dictionary. In the preferred embodiment, the stress mark for a syllable always precedes a vowel phoneme in the syllable, whereas the rules may place the stress marks further to the left. The preferred embodiment therefore normalizes stress marks by shifting them to the right until they reach a vowel phoneme. For example, if the rule-based phoneme string for “abase” were “x'bes”, the “'” stress mark would be shifted to the right by one character, resulting in the phoneme string “xb'es”. Stress normalization corrects for different but equivalent placements of the stress mark relative to the syllable boundaries of the word, which can occur due to different dialects of a language.
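The right-shift normalization can be sketched in Python as follows. The vowel phoneme set `VOWELS` is an assumption, since the phoneme alphabet is not listed here, and the function name `normalize_stress` is hypothetical. The sketch treats each phoneme as a single character, which may not hold for every phoneme alphabet.

```python
# Assumed vowel phoneme symbols; the real set depends on the phoneme
# alphabet of the synthesizer and is not given in the text.
VOWELS = set("aeiouxc")
STRESS = "'"


def normalize_stress(phonemes, vowels=VOWELS):
    """Shift each stress mark right until it directly precedes a vowel."""
    out = []
    pending = False  # a stress mark is waiting for the next vowel
    for ch in phonemes:
        if ch == STRESS:
            pending = True
        elif pending and ch in vowels:
            out.append(STRESS)
            out.append(ch)
            pending = False
        else:
            out.append(ch)
    if pending:  # no vowel followed; keep the mark at the end
        out.append(STRESS)
    return "".join(out)
```

Applied to the example from the text, `normalize_stress("x'bes")` gives "xb'es", and a string whose stress mark already precedes a vowel is returned unchanged.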
  • step 204 compares the normalized rule-based phoneme string (originally generated in step 202) with the phoneme string portion 208-b in the subject data structure 208 for the current dictionary entry in the word linked list 207.
  • the comparison is performed to determine if the rule-based phoneme string produced from the rule processing of step 202 matches the phoneme string portion 208 - b of the current data structure 208 dictionary entry.
  • a “match” or “no match” decision is made in step 205. If the two phoneme strings do not match, then the rule-based phoneme string from step 202 is different from the actual phoneme string 208-b for the subject data structure 208 entry obtained from the dictionary 104.
  • steps 204 and 205 determine if the rule generated phoneme string for the subject data structure 208 and its corresponding phoneme string 208 - b from the corresponding dictionary entry are the same or not. If they are not the same, then infix rules alone cannot be used to generate a correct phoneme string for this dictionary entry, and the entry should remain in the dictionary.
  • If step 205 determines that the rule-based phoneme string and the actual phoneme string 208-b for the current data structure 208 (i.e., the phoneme string obtained from the corresponding dictionary entry) are the same (i.e., they match each other), then step 206 sets the delete flag 208-c in the data structure 208.
  • the entry need not remain in the dictionary 104 , since rule-based processing alone can generate rule-based phonemic data identical or equivalent to that found in the phoneme string portion of the entry in dictionary 104 . That is, since the subject grapheme can be correctly converted to phonemes by infix rules, there is no need to maintain the respective entry in the dictionary 104 .
  • After step 205 and/or 206 are complete, processing returns to obtain the next data structure 208 for the next entry in word linked list 207.
  • phase one is complete.
  • Certain data structures 208 in word linked list 207 will have their delete flags 208 - c set, indicating that the corresponding entries are to be deleted from dictionary 104 .
  • processing proceeds to step 152 in FIG. 2 via path 153 , in order to process the word linked list 207 into a reduced dictionary 104 - a .
  • Step 152 selects those entries not indicated to be deleted for storage in reduced dictionary 104 - a.
  • If phase two is performed after phase one, then after all entries have been processed by steps 201 through 206 in FIG. 3, processing proceeds to step 300 in FIG. 4 to begin the steps of phase two. If phase two is performed without first performing phase one ( 150 of FIG. 3) of the dictionary reduction process, processing begins in phase two by creating the same word linked list 207 containing the same entries in data structures 208 as described above with respect to step 200 of phase one.
  • phase two consists of two nested loops of processing, which are illustrated by the dotted lines labeled 151 and 305 and titled “for each word” and “for each affix”, respectively in FIG. 4 .
  • the outer loop 151 of phase two processing begins by selecting the first data structure 208 from word linked list 207 , and proceeds to step 300 .
  • Each data structure 208 in word linked list 207 is processed by steps 300 through 303 , and the data structure 208 that is currently being processed is called the root word entry. After each root word entry is selected, steps 300 through 303 are then performed for this root word entry for every affix in affix table 304 .
  • Affix table 304 is a data structure, such as a table or linked list, which has entries that each hold a single grapheme string portion and phonemic data portion for a single respective affix rule from the affix (i.e., prefix and suffix) rule sets 103 - a and 103 - c .
  • the word linked list 207 has dictionary entry data structures 208 each containing a grapheme 208 - a and phoneme 208 - b pair
  • each affix table entry corresponds to an affix rule and holds the rule's grapheme string and phoneme string portions.
  • the affix table 304 may be created before phase two processing has started, or step 300 may create the affix table 304 before processing any data structures 208 from word linked list 207 .
  • affix table 304 may appear just as the rule set in Table 2, except that the affix table 304 contains an affix entry for all rules in both the suffix and prefix rule sets 103 - a and 103 - c (i.e., the affix rule sets).
  • the affix table 304 is created to provide access to affix rule information in computer memory in order to increase the speed of phase two processing.
  • step 300 may directly access affix rule sets 103 - a and 103 - c instead of the affix table 304 , with the same objective.
  • the affix entry that is processed at any point in time is referred to herein as the current affix entry.
  • step 300 in FIG. 4 creates combinations of the grapheme and phoneme strings of a root word entry data structure 208 from the word linked list 207 (i.e., the dictionary) with respective grapheme and phoneme portions of affix entries (i.e., rules) from the affix table 304 (i.e., the affix rule sets).
  • step 300 appends the grapheme portion from the current affix entry to the respective end or beginning of the grapheme string 208 - a of the current root word entry being processed to create a grapheme combination. If the current affix entry is a prefix rule, the grapheme portion for this prefix rule is appended to the beginning of the root word entry's grapheme string 208 - a . If the current affix entry is a suffix rule, the grapheme portion for this suffix rule is appended to the end of the root word entry's grapheme string 208 - a .
  • Step 300 also appends the phoneme portion from the current affix entry to the end or beginning of the phoneme string portion 208 - b of the current root word entry data structure 208 being processed, to create a phoneme combination. If the current affix entry is a prefix rule, the phoneme portion for this prefix rule is appended to the beginning of the root word entry's phoneme string 208 - b . If the current affix entry is a suffix rule, the phoneme portion for this suffix rule is appended to the end of the root word entry's phoneme string 208 - b . In this manner, step 300 creates a grapheme combination and phoneme combination pair.
  • step 300 suppose the current root word entry data structure 208 corresponds to Dictionary Entry 8 from Table 1, which has “long” as its grapheme string 208 - a and “l'cG” as its phoneme string 208 - b . Also suppose the current affix entry from affix table 304 corresponds to Suffix Rule 2 in Table 2, which has “-ing” as a grapheme portion and “x
  • Step 300 also combines the dictionary entry's phoneme string 208 - b “l'cG” with the phoneme portion of the affix entry (i.e. the suffix rule 2 ) “x
  • the grapheme combination and phoneme combination pair thus appears as “longing l'cG
  • Step 301 then compares this grapheme combination and phoneme combination pair with the grapheme string 208 - a and phoneme string 208 - b pair in every other dictionary entry stored in each data structure 208 in word linked list 207 .
  • Step 302 determines if any of the comparisons match each other. If steps 301 and 302 determine that another dictionary entry exists in word linked list 207 that has the same grapheme string 208 - a and phoneme string 208 - b , this other dictionary entry's data structure 208 is called a matching word entry. That is, steps 301 and 302 determine if the grapheme combination and phoneme combination pair created in step 300 exists as a dictionary entry elsewhere in the dictionary 104 .
  • If a match occurs in step 302, it has been determined that the combination of graphemes and phonemes from a root word along with graphemes and phonemes from an affix rule can produce the same grapheme and phoneme combination as another matching entry in the dictionary 104. Accordingly, step 303 indicates the current data structure 208 for that root word entry to be saved by setting the save flag 208 - d to true. Step 303 then sets the delete flag 208 - c in the matching word entry to true. That is, phase two can determine that a root word entry previously indicated to be deleted by the delete flag 208 - c should actually be saved in the dictionary 104 by marking the save flag 208 - d to true.
  • step 303 sets the delete flag 208 - c to true for the matching word entry (i.e., data structure 208 that matched the grapheme combination and phoneme combination pair) to indicate that the matching word entry is to be deleted.
  • processing returns to step 300 where the next entry in affix table 304 is applied to the current root word data structure 208 via steps 300 through 303 .
  • the next data structure 208 from word linked list 207 is selected as the current root word entry.
  • the processing of phase two is complete. Processing then proceeds to step 152 in FIG. 2 (described above) in order to create the reduced dictionary 104 - a from the word linked list 207 .
  • Any data structure 208 in word linked list 207 that is indicated as having a save flag 208 - d marked true, or a delete flag 208 - c marked false is saved in the reduced dictionary 104 - a .
  • a save flag 208 - d marked true overrides a delete flag 208 - c marked true. Therefore, any word entry data structures 208 having a save flag 208 - d equal to true will be saved in the reduced dictionary, regardless of what delete flag 208 - c indicates. In this manner, phase two considerably reduces the size of dictionary 104 .
  • step 202 in phase one only uses infix rule set 103 - b to generate the rule-based phoneme string from the grapheme string 208 - a of the current data structure 208 /dictionary entry.
  • infix rule set 103 - b contains a set of rules that match individual graphemes (i.e. letters) to individual phonemes, for the entire alphabet of the language. That is, in infix rule set 103 - b , there are separate rules for “a”, “b”, “c”, and so forth, which match each of these letters to a corresponding phoneme.
  • step 202 is certain to always be able to produce at least one complete rule-based phoneme string from the subject data structure grapheme string 208 - a , even if step 202 must match graphemes to phonemes letter by letter.
  • step 202 can use prefix, infix, and suffix rule sets for rule processing to generate a rule-based phoneme string.
  • phase one and/or phase two may be accomplished while still obtaining the same beneficial result of the invention.
  • phase two (FIG. 4 )
  • the two processing loops could be reversed.
  • processing could be performed for each affix rule, and then for each word entry with that affix rule.
  • the next affix rule would be selected and processing would repeat beginning again with the first word entry.
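
The stress-mark normalization described earlier in this list (shifting a stress mark to the right until it directly precedes a vowel phoneme) might be sketched as follows. This is only an illustration under stated assumptions: the function name, the one-character-per-phoneme simplification, and the vowel set are hypothetical, since the text does not enumerate the phoneme alphabet.

```python
STRESS = "'"

# Assumed vowel-phoneme symbols, chosen only to cover the Table 1 examples.
# The source does not list which symbols are vowels.
ASSUMED_VOWELS = frozenset("aeiou@xc|")


def normalize_stress(phonemes: str) -> str:
    """Move each stress mark right so that it directly precedes a vowel."""
    out = []
    pending = False  # True while a stress mark waits for the next vowel
    for symbol in phonemes:
        if symbol == STRESS:
            pending = True
        elif pending and symbol in ASSUMED_VOWELS:
            out.append(STRESS)
            out.append(symbol)
            pending = False
        else:
            out.append(symbol)
    if pending:
        out.append(STRESS)  # no vowel followed; keep the mark at the end
    return "".join(out)


NORMALIZED = normalize_stress("x'bes")
```

With the example from the text, the rule-based string “x'bes” becomes “xb'es”, while a string whose stress mark already precedes a vowel is left unchanged, so both forms compare as equal in step 204.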

Abstract

A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary. If the other entries containing the root word entry can have correct phonemic data generated from a combination of the root word entry's phonemic data and phonemes generated from rule set processing, then the other entries are indicated to be deleted from the dictionary. After all words have been processed by phase one and/or phase two, the entries indicated to be saved are aggregated to form a reduced dictionary.

Description

BACKGROUND OF THE INVENTION
Generally speaking, a “speech synthesizer” is a computer device or system for generating audible speech from written text. That is, a written form of a string or sequence of characters (e.g., a sentence) is provided as input, and the speech synthesizer generates the spoken equivalent or audible characterization of the input. The generated speech output is not merely a literal reading of each input character, but a language dependent, in-context verbalization of the input. If the input was the phone number (508) 691-1234 given in response to a prior question of “What is your phone number?”, the speech synthesizer does not produce the reading “parenthesis, five hundred eight, close parenthesis, six hundred ninety-one . . . ” Instead, the speech synthesizer recognizes the context and supporting punctuation and produces the spoken equivalent “five (pause) zero (pause) eight (pause) six . . . ” just as an English-speaking person normally pronounces a phone number.
Historically the first speech synthesizers were formed of a dictionary, engine and digital vocalizer. The dictionary served as a look-up table. That is, the dictionary cross referenced the text or visual form of a character string (e.g., word or other unit) and the phonetic pronunciation of the character string/word. In linguistic terms the visual form of a character string unit (e.g., word) is called a “grapheme” and the corresponding phonetic pronunciation is termed a “phoneme”. The phonetic pronunciation or phoneme of character string units is indicated by symbols from a predetermined set of phonetic symbols.
The engine is the working or processing member that searches the dictionary for a character string unit (or combination thereof) matching the input text. In basic terms, the engine performs pattern matching between the sequence of characters in the input text and the sequence of characters in “words” (character string units) listed in the dictionary. Upon finding a match, the engine obtains from the dictionary entry (or combination of entries) of the matching word (or combination of words), the corresponding phoneme or combination of phonemes. To that end, the purpose of the engine is thought of as translating a grapheme (input text) to a corresponding phoneme (the corresponding symbols indicating pronunciation of the input text).
Typically the engine employs a binary search through the dictionary for the input text. The dictionary is loaded into the computer processor physical memory space (RAM) along with the speech synthesizer program. The memory footprint, i.e., the physical memory space in RAM needed while running the speech synthesizer program, thus must be large enough to hold the dictionary. As the dictionary portion of today's speech synthesizers continues to grow in size, the memory footprint becomes problematic due to the limited available memory (RAM and ROM) in some or most applications.
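
As a rough illustration of the binary-search look-up mentioned above, the following sketch keeps the grapheme strings in a sorted in-memory list and searches it with Python's standard bisect module. The function names are hypothetical, and the sample entries are borrowed from Table 1 of the detailed description.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def build_index(entries: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Sort entries by grapheme string and split them into parallel lists."""
    ordered = sorted(entries)
    keys = [graphemes for graphemes, _ in ordered]
    values = [phonemes for _, phonemes in ordered]
    return keys, values


def lookup(word: str, keys: List[str], values: List[str]) -> Optional[str]:
    """Binary-search the sorted grapheme list; return the phoneme string or None."""
    i = bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return values[i]
    return None


KEYS, VALUES = build_index(
    [("long", "l'cG"), ("abase", "xb'es"), ("aback", "xb'@k")]
)
```

Both lists must stay resident in memory for the search to work, which is why the size of the dictionary translates directly into memory footprint.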
The digital vocalizer receives the phoneme data generated by the engine. Based on the phoneme data together with timing and stress data, the digital vocalizer generates sound signals for “reading” or “speaking” the input text. Typically, the digital vocalizer employs a sound and speaker system for producing the audible characterization of the input text.
To improve on memory requirements of speech synthesizers, another design was developed. In that design, the dictionary is replaced by a rule set. Alternatively, the rule set is used in combination with the dictionary instead of completely substituting therefor. At any rate, the rule set is a group of statements in the form
IF (condition)-then-(phonemic result)
Each such statement determines the phoneme for a grapheme that matches the IF condition. Examples of rule-based speech synthesizers are DECtalk by Digital Equipment Corporation of Maynard, Mass. and TrueVoice by Centigram Communications of San Jose, Calif. Though the use of rule sets reduces the number of entries required in a dictionary for a speech synthesizer system, the dictionaries remain relatively large in size (i.e., number of entries) compared to other parts of the system requiring memory. This is problematic because dictionaries must be completely stored in memory during the speech synthesis process to ensure fast and efficient look-up of entries if needed.
These and other problems exist in speech synthesizer technology. New solutions have been attempted but with little success. As a result, highly accurate and/or memory space efficient speech synthesizers are yet to come.
SUMMARY OF THE INVENTION
Dictionaries used by text-to-speech synthesis systems may grow to become quite large. Dictionary size depends on how many words or word portions in a particular language are determined to be too complex, too difficult or too time consuming to translate into phonemes by rule set processing alone. Such words or word portions are candidates to be included as entries in the dictionary. However, certain problems are encountered when large dictionaries are used in text-to-speech synthesis systems as mentioned above.
The invention recognizes the problems with prior art text-to-speech synthesis systems that use dictionaries and provides a method and apparatus to reduce the overall size of the dictionaries used in such systems. Specifically, the invention uses a two phase dictionary reduction process to eliminate entries that are not required in the dictionary. In phase one, any entries in the dictionary with respective phonemes that can be fully generated by rules in a rule set are marked or indicated to be deleted from the dictionary. In phase two, any entries in the dictionary, called root word entries, that can provide phonemes for the text-to-speech translation process of larger (longer) entries are marked or indicated to be saved in the dictionary, and the entries of longer character strings that can be translated using the shorter root word entries in conjunction with rules are indicated to be deleted from the dictionary. After phase one and/or phase two are complete, the invention aggregates the entries marked to be saved or removes the entries marked to be deleted and the resulting set of entries is stored as the reduced dictionary.
Phase one or phase two of the invention each may be performed independently, followed by the aggregation step. Alternatively, phase one may be followed by phase two and then by the aggregation process.
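
The aggregation step might be sketched as below. The function and parameter names are hypothetical; the marks are modeled as plain sets of grapheme strings, and a save mark is taken to override a delete mark, as the detailed description states for the save and delete flags.

```python
from typing import Dict, Set


def aggregate(
    dictionary: Dict[str, str],
    marked_delete: Set[str],
    marked_save: Set[str],
) -> Dict[str, str]:
    """Keep every entry that is marked to be saved or is not marked for deletion."""
    return {
        graphemes: phonemes
        for graphemes, phonemes in dictionary.items()
        if graphemes in marked_save or graphemes not in marked_delete
    }


# Illustrative marks only; they are not results reported in the source.
FULL = {
    "long": "l'cG",
    "longing": "l'cG|G",
    "abase": "xb'es",
    "abalone": "'@bxl'oni",
}
REDUCED = aggregate(
    FULL,
    marked_delete={"long", "longing", "abase"},
    marked_save={"long"},
)
```

Here “long” survives despite its delete mark because it is also marked to be saved, and “abalone” survives because it was never marked for deletion.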
In order for embodiments of phase one to determine if the phoneme of an entry in the dictionary can be fully generated (and hence the dictionary entry can be fully matched) by using the rule set, the invention method and apparatus generate a rule-based phoneme string for the grapheme string of the subject entry and then determine if the rule-based phoneme string matches the corresponding phoneme string of the entry. If there is a match, the subject entry is indicated to be deleted from the dictionary, thus reducing overall dictionary size. Since rules alone can produce the required phoneme string for the subject entry, the invention recognizes that there is no need for the entry to remain in the dictionary.
Embodiments of phase one may also check if the grapheme string of a dictionary entry is a homograph. If so, the preferred embodiment skips to the next entry in the dictionary for processing. A homograph is a word that can be pronounced two different ways but which has one spelling, such as “abstract”, “wind”, and “record”. Due to multiple pronunciations, homograph dictionary entries are skipped since they may have more than one associated phoneme string. During text-to-speech processing, the correct phoneme string is selected from a homograph dictionary entry based on the context of surrounding language in the text being translated.
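
A minimal sketch of phase one, including the homograph check, could look like the following. The rule engine is passed in as a function because the source does not specify its interface; `toy_rules` is a hypothetical stand-in with canned outputs, and the transcription used for “wind” is a placeholder that does not come from the source.

```python
from typing import Callable, Dict, FrozenSet, Set


def phase_one(
    dictionary: Dict[str, str],
    rules_to_phonemes: Callable[[str], str],
    homographs: FrozenSet[str] = frozenset(),
) -> Set[str]:
    """Return the grapheme strings whose entries rules alone can reproduce."""
    marked_delete = set()
    for graphemes, phonemes in dictionary.items():
        if graphemes in homographs:
            continue  # homograph entries are skipped, never marked
        if rules_to_phonemes(graphemes) == phonemes:
            marked_delete.add(graphemes)
    return marked_delete


def toy_rules(graphemes: str) -> str:
    """Hypothetical stand-in for rule processing, using canned results."""
    canned = {"abase": "xb'es", "aback": "xb'ek", "wind": "w'|nd"}
    return canned.get(graphemes, "")


SAMPLE = {"abase": "xb'es", "aback": "xb'@k", "wind": "w'|nd"}
MARKED = phase_one(SAMPLE, toy_rules, homographs=frozenset({"wind"}))
```

In this toy run only “abase” is marked: the stand-in rules mispronounce “aback”, and “wind” is skipped as a homograph even though its rule output happens to match.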
Embodiments of phase two determine if dictionary entries, referred to as root word entries, are required in the dictionary. This is accomplished by the invention combining grapheme and phoneme strings of the root word entry from the dictionary with respective grapheme and phoneme portions of an affix rule of an affix rule set of the speech synthesis system. This step of combining forms a grapheme combination and phoneme combination pair. Phase two then determines if the grapheme combination and phoneme combination pair exists as another matching entry in the dictionary, and if so, indicates the root word entry to be saved in the dictionary. The matching entry is thus marked for removal/deletion. Thus, phase two saves root words in the dictionary that can be used to assist in the translation of another longer word (the matching entry) in conjunction with rule-based processing, and removes the matching entries from the dictionary which can be correctly translated with a combination of rule processing and root word phonemes.
To create the grapheme combination and phoneme combination pair, embodiments of phase two select and process each root word entry in the dictionary. Specifically for each root word entry, the invention combines the grapheme string of the root word entry with the grapheme portion of the affix rule to form a grapheme combination, and combines the phoneme string of the root word entry with the phoneme portion of the affix rule to form a phoneme combination. Then phase two determines if the grapheme combination exists as a matching grapheme string in an entry in the dictionary. If so, the invention obtains the corresponding phoneme string as a matching phoneme string for the matching entry. Then, phase two determines if the phoneme combination matches the matching phoneme string, and if so, indicates the root word entry to be saved in the dictionary. Thus, the root words that are saved in the dictionary are root words that can be used in the translation of the other matching entries. Phase two also determines if the matching entry has been indicated to be saved in the dictionary. If not, the invention indicates the matching entry to be deleted from the dictionary. As such, phase two reduces the dictionary size by determining which entries rely on phonemes of root words, and saves the root words and deletes entries that can be matched by the root words and rule processing.
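
The phase two procedure just described can be sketched as two nested loops over root word entries and affix rules. The `Entry` class, its field names, and the tuple layout of an affix rule are assumptions made for illustration, and the “-ly” suffix rule used in the example is hypothetical (it is not one of the rules listed in the source); the dictionary entries are taken from Table 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    graphemes: str
    phonemes: str
    delete: bool = False
    save: bool = False


# An affix rule is modeled as (kind, grapheme portion, phoneme portion).
AffixRule = Tuple[str, str, str]


def phase_two(entries: List[Entry], affixes: List[AffixRule]) -> None:
    """Mark root words to be saved and entries derivable from them to be deleted."""
    by_graphemes = {entry.graphemes: entry for entry in entries}
    for root in entries:
        for kind, affix_g, affix_p in affixes:
            if kind == "prefix":
                combo_g = affix_g + root.graphemes
                combo_p = affix_p + root.phonemes
            else:  # suffix
                combo_g = root.graphemes + affix_g
                combo_p = root.phonemes + affix_p
            match = by_graphemes.get(combo_g)
            if match is None or match is root or match.phonemes != combo_p:
                continue
            root.save = True
            if not match.save:  # an entry already marked saved is not deleted
                match.delete = True


ENTRIES = [
    Entry("long", "l'cG"),
    Entry("longing", "l'cG|G"),
    Entry("longingly", "l'cG|Gli"),
]
phase_two(ENTRIES, [("suffix", "ly", "li"), ("suffix", "ness", "n|s")])
```

After this run, “longing” is marked to be saved because “longing” plus the hypothetical “-ly” rule reproduces the “longingly” entry exactly, and “longingly” is marked for deletion. The dictionary keyed by grapheme string replaces a linear scan of the word list; that is a design choice of this sketch, not something the source prescribes.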
By using either phase one or phase two alone, or phase one followed by phase two, the invention reduces the number of entries in a dictionary. To that end, the invention computer method and apparatus form a reduced (i.e., smaller in size) dictionary. The reduced dictionary is adaptable to text-to-speech synthesis applications requiring minimal storage space, entry search time, and dictionary load time.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout different views. The drawings are not meant to limit the invention to particular mechanisms for carrying out the invention in practice, but rather, are illustrative of certain ways of performing the invention. Others will be readily apparent to those skilled in the art.
FIG. 1 schematically illustrates the operation of a text-to-speech synthesis system using rule sets and a dictionary to translate words in text to electronically generated speech.
FIG. 2 is a flow diagram illustrating the two phases of the dictionary reduction process of the invention.
FIG. 3 is a flow chart illustrating the steps involved in phase one of the dictionary reduction process of FIG. 2 according to the invention.
FIG. 4 is a flow chart illustrating the steps involved in phase two of the dictionary reduction process of FIG. 2 according to the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Generally, the present invention provides a method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system. FIG. 1 illustrates the general operation of a typical computerized text-to-speech synthesis system 100 that uses a dictionary 104 that can be reduced in size by this invention. In operation, the text-to-speech synthesizer 101 accepts written text 102 containing words, phrases, names, symbols and so forth as input. Speech synthesizer 101 then employs rule sets 103 a through 103 c in conjunction with dictionary 104 to translate the input text 102 into audible electronically generated speech 107. The generated speech is output through a speaker device 106 for example. The present invention is a method and apparatus for eliminating unnecessary entries in dictionary 104 to reduce its overall size. A dictionary reduced in size by this invention requires less storage space on disk and in memory when used during the text-to-speech translation process performed by text-to-speech synthesizer 101. Also, since there are fewer entries in dictionary 104 after the reduction process of the present invention, the processing time required to load and to search the dictionary 104 may be reduced as well.
In order to better understand the details of the dictionary reduction processes performed by this invention, a brief explanation of dictionary entries and rule set structure and processing will be presented next. Table 1 below illustrates a small example of entries from a dictionary, such as those that might be found in dictionary 104. The entries in Table 1 are examples and are not limitations on the present invention or speech synthesis system 100.
TABLE 1
EXAMPLE PORTION OF DICTIONARY
                       Grapheme String   Phoneme String
Dictionary Entry 1     aardvark          'ardvark
Dictionary Entry 2     aaron             '@r|n
Dictionary Entry 3     aback             xb'@k
Dictionary Entry 4     abacus            '@bxkxs
Dictionary Entry 5     abalone           '@bxl'oni
Dictionary Entry 6     abandon           xb'@ndxn
Dictionary Entry 7     abase             xb'es
Dictionary Entry 8     long              l'cG
Dictionary Entry 9     longing           l'cG|G
Dictionary Entry 10    longingly         l'cG|Gli
In Table 1, each dictionary entry 1 through 10 contains (i) a grapheme (i.e., character) string portion (Column 1) comprising one or more graphemes, and (ii) a phoneme string portion (Column 2) comprising one or more phonemes. Generally, a grapheme string corresponds to a word in the dictionary, but the term “word” as used herein does not necessarily mean the formal linguistic unit in the language of the dictionary. Rather, some words in the dictionary can be portions or segments of longer more formally, commonly known words. A single grapheme is any character or symbol in the entire alphabet of the language of the dictionary, such as English. A grapheme may be a letter “A” through “Z” or “a” through “z”, numbers such as ‘0’ through ‘9’, or another character or symbol such as “?”, “!”, “@”, and so forth. A grapheme string is one or more graphemes appended together.
A phoneme is one or more character symbols used to represent a single phonetic utterance or sound that may be made when speaking the language of the dictionary. The entire set of phonemes for a language represents all possible utterances that may be combined to pronounce words in that language. A phoneme string is a series of phonemes appended together which represent the phonetic pronunciation of one or more corresponding graphemes (i.e., a grapheme string). As such, a correctly assembled phoneme string represents the phonetic pronunciation for the corresponding grapheme string in a given dictionary entry.
For example, in Table 1, dictionary entry number nine has as a subject grapheme string, the word “longing”, and indicates a corresponding phoneme string of “l'cG|G”. There are sub-strings (i.e., respective graphemes) in the word “longing” that correspond to each phoneme in this phoneme string.
In Table 1, example dictionary entries 1 through 10 resemble dictionary entries of words such as those found in a normal English dictionary. A dictionary that can be reduced by this invention may however contain other information as well, such as word definitions, but this invention is not concerned with this other information. Dictionaries that can be reduced in size by the invention can be created specifically for text-to-speech synthesis systems, or alternatively, the invention may reduce off-the-shelf commercially available dictionaries, such as those supplied on CD-ROM's for other types of application programs besides speech synthesis. The dictionary to be reduced can be for any language, so long as each entry contains a grapheme string and a corresponding phoneme string.
A dictionary not specifically designed for use by a text-to-speech synthesis system is usually very large in size, and contains entries for most words in a language. Dictionaries in the prior art that are designed specifically for text-to-speech synthesis systems are usually larger in size than what is actually needed to perform the text-to-speech synthesis process. The invention is advantageous since it reduces both these and various other types of dictionaries.
As noted previously, large dictionaries are difficult to store entirely in memory during text-to-speech processing, since they can be many megabytes in size. Also, performing text-to-speech translations by looking up each word in a large dictionary is slow compared to rule-based translation processing, which will be discussed shortly. Since text-to-speech synthesis is typically a real-time process, extremely fast processors and large amounts of memory would be needed to perform translations using a dictionary alone.
Accordingly, rule sets (such as 103 a, b, and c in FIG. 1) are frequently used in text-to-speech synthesis systems 100 to quickly translate graphemes of words into phonemes which may then be converted to audible sounds 107. Grapheme-to-phoneme rules, contained in rule sets 103, provide a concise way to analyze a character string in the language and produce the required phonemic data for sound synthesis. Furthermore, rules in a rule set 103 may be generic in that they may convert character strings that are generally not considered to be words worthy of existing in the dictionary 104.
Each rule set 103 a through 103 c contains a number of rules in the form:
IF (condition) -then- (phonemic result).
Each rule determines the proper corresponding phoneme(s) for a grapheme string that matches the IF condition. The previously noted rule-based text-to-speech synthesizer called DECtalk from Digital Equipment Corporation of Maynard, Mass. uses rule sets 103 in combination with a dictionary 104 to translate text to speech.
During rule processing, each rule of the rule set 103 is considered with respect to the input text 102. Rule-based processing typically proceeds one word or unit of text at a time from the beginning to the end of the input text. Each word or input text unit is then processed by selecting a number of graphemes (i.e. characters) from either the beginning, middle, or end of the input text 102. The graphemes selected depend upon the rule set being used. If a rule condition (“IF-Condition,” part of the rule) matches any portion of the input text 102, then the text-to-speech synthesizer 101 determines that the rule applies. As such, the synthesizer stores the corresponding phoneme data (i.e., the phonemic result) from the rule in a working buffer. The synthesizer 101 similarly processes each succeeding rule in the rule set 103 against the remaining input text 102 (i.e., remainder parts thereof) for which phoneme data is needed. After processing all of the text 102 via rules in the rule sets 103, the working buffer holds the phoneme data corresponding to the text which may then be converted to audible speech. For more complete details on the translation of text to sound via rule processing, see co-pending Pat. application Ser. No. 09/071,441, filed May 1, 1998, entitled “Computer Method and Apparatus for Translating Text To Sound”, which is assigned to the assignee of the present application and which is incorporated herein by reference in its entirety.
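
A much simplified sketch of this kind of rule processing is shown below. It scans a single word from left to right, tries the rules in order, and appends each matching rule's phonemic result to a working buffer. The left-to-right scan, the function name, and the toy rules are all assumptions made for illustration; a real rule set would be far larger and, as described above, may select graphemes from the beginning, middle, or end of the text.

```python
from typing import List, Tuple

# Each rule pairs an IF condition (a grapheme string) with its phonemic result.
Rule = Tuple[str, str]


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Translate text to phonemes by repeatedly applying the first matching rule."""
    working_buffer = []
    pos = 0
    while pos < len(text):
        for condition, phonemic_result in rules:
            if text.startswith(condition, pos):
                working_buffer.append(phonemic_result)
                pos += len(condition)  # continue with the remaining text
                break
        else:
            raise ValueError(f"no rule matches text at position {pos}")
    return "".join(working_buffer)


# Hypothetical toy rules: multi-letter conditions first, single letters as fallback.
TOY_RULES = [("sh", "S"), ("ip", "|p"), ("s", "s"), ("h", "h"), ("i", "|"), ("p", "p")]
SHIP = apply_rules("ship", TOY_RULES)
```

Listing the single-letter rules last gives the scan a fallback for any text made of covered letters, while longer conditions take precedence when they apply.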
Table 2 below illustrates ten example rules from a specific type of rule set, called a suffix rule set (e.g. 103 c in FIG. 1) used for English text strings.
TABLE 2
EXAMPLE PORTION OF A SUFFIX RULE SET
           Grapheme Portion   Phonemic Data (Phoneme Portion)
Rule 1     -able              xbl
Rule 2     -ing               x|G
Rule 3     -less              l|s
Rule 4     -ment              mxnt
Rule 5     -ness              n|s
Rule 6     -ship              S|p
Rule 7     -dom               dxm
Rule 8     -ers               Rz
Rule 9     -ful               fL
Rule 10    -ify               |fA
Text-to-speech synthesis systems 100, such as that shown in FIG. 1, may use multiple rule sets to obtain phonemic data (i.e., phonemes) for different parts of a given input text/character string 102 (e.g. individual words). There may be rule sets for matching (i) suffixes, which are one or more graphemes obtained from the end of a character string, (ii) prefixes, which are one or more graphemes selected from the beginning of a character string, and (iii) infixes, which are one or more graphemes selected from somewhere in the middle of the subject text string, between the beginning and the end. Suffix and prefix rule sets are called “Affix” rule sets, since they match grapheme portions (i.e., strings) obtained starting from either the beginning or end of a word. In FIG. 1, rule set 103-a corresponds to a prefix rule set, rule set 103-b corresponds to an infix rule set, and rule set 103-c corresponds to a suffix rule set, for example.
The example suffix rules in Table 2 map a respective suffix-like (ending) grapheme portion to corresponding phonemic data or phoneme portion (i.e., one or more phonemes). For example, Rule 9 is used to convert an ending text string (i.e., the suffix grapheme string) “ful” to the phoneme string “fL”. The suffix rules shown in Table 2 are given for example only. A full suffix rule set may contain many more entries than those shown in Table 2. While not illustrated in a table, rules in a prefix rule set are similar in nature to the rules in the suffix rule set above, but match prefix grapheme portions of character strings to prefix phonemic data. Likewise, an infix rule set contains rules for matching infix grapheme portions, obtained from the middle of text strings, to phonemic data as well.
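The suffix mapping described above can be sketched in a few lines. This is only an illustration: the names `SUFFIX_RULES` and `match_suffix` are hypothetical, the rule subset is taken from Table 2, and preferring the longest matching suffix is an assumption rather than something the specification states.

```python
# Hypothetical subset of Table 2: suffix grapheme portion -> phoneme portion.
SUFFIX_RULES = {
    "able": "xbl",
    "ment": "mxnt",
    "dom": "dxm",
    "ers": "Rz",
    "ful": "fL",
}


def match_suffix(word, rules=SUFFIX_RULES):
    """Return (stem, suffix, phonemes) for the longest matching suffix, or None."""
    # Trying longer suffixes first is an assumption, not taken from the text.
    for suffix in sorted(rules, key=len, reverse=True):
        # Require a non-empty stem so a word that *is* a suffix does not match.
        if len(word) > len(suffix) and word.endswith(suffix):
            return word[: -len(suffix)], suffix, rules[suffix]
    return None


print(match_suffix("kingdom"))  # ('king', 'dom', 'dxm')
print(match_suffix("cat"))      # None
```

A prefix rule set could be handled the same way with `str.startswith` and the stem taken from the other end.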
Interestingly, rule sets themselves may be generated by an analysis of dictionary entries containing a grapheme string and corresponding phoneme strings. A rule set generation process is described as a separate invention in co-pending U.S. Pat. application Ser. No. (Unknown) filed Oct. 26, 1998, entitled “Automatic Grapheme-to-Phoneme Rule-Set Generation”, which is assigned to the assignee of this invention and is hereby incorporated by reference in its entirety. Typically, a dictionary having many entries, which has not yet been reduced by the teachings of this invention, is used for rule set generation in the referenced application. After the rule sets have been generated from an analysis of the dictionary, the dictionary may then be reduced by phase one and/or phase two of the present invention.
FIG. 2 illustrates the two phases used in the present invention to reduce the size of a dictionary 104 in a text-to-speech synthesis system 100. Phase one includes step 150 of the reduction process shown in FIG. 2, and may be performed independently of phase two which is represented by step 151. Accordingly, the reduction process of the invention may begin at either of the “Begin Reduction” indicators 154 or 155 in FIG. 2.
Phase one (Step 150) of the invention is based on the observation that an unreduced dictionary 104 may be reduced in size by eliminating (i.e., deleting or removing) any entries in the dictionary 104 that can be fully matched by the rules in rule sets 103 a-c in conjunction with rule set processing. During text-to-speech synthesis processing, entries in the dictionary 104 that occur in input text 102, and that may be matched entirely by rules, need not remain in the dictionary 104. As such, phase one (Step 150) determines for each entry in the dictionary 104, if the entry can be fully matched (i.e. can have corresponding phonemes generated) by using the rules of the rule sets 103 a-c, and if so, marks or indicates those entries to be deleted from the dictionary 104. That is, phase one of the dictionary reduction process marks for elimination any entries in the dictionary 104 that can be properly matched or translated to phonemes by the rule set 103.
After phase one is complete, phase two (Step 151) is typically performed next. However, processing may alternatively bypass phase two (Step 151) by following optional processing path 153 to step 152, where the reduced dictionary 104-a is created.
Phase two (step 151) is based on the observation that some entries in the dictionary 104, called root word entries, may provide phonemic data for the text-to-speech translation process of longer words/text strings. As such, these root word entries should not be removed from the dictionary 104 to reduce its size, since the synthesis of longer words in text 102 that contain the root words (i.e., are dependent on these root word entries) can be performed using the root word entries. Furthermore, if longer word entries in dictionary 104 may be translated to phonemes using root word entries in conjunction with rule processing, then the longer word entries can be removed from the dictionary 104 to even further reduce its size. Step 151 thus determines if a root word entry in the dictionary 104 can be used to support the text-to-speech synthesis of other dictionary entries. If so, then that root word entry is indicated or marked to be saved in dictionary 104. Step 151 also determines, based on that root word entry, if longer word entries (i) have not been previously indicated to be saved in the dictionary 104, and (ii) can be translated via phonemes provided by one or more root word entries and rule processing (i.e., the longer word entries contain the root word and some other characters). If these two conditions are met, then the longer word entry is indicated to be deleted from the dictionary 104.
As noted previously, phase one (Step 150) may be followed by phase two (Step 151). In such cases, phase two can indicate a word to be saved that was previously indicated to be deleted during phase one processing. That is, if phase one determines a word (i.e., subject character string) can be matched by rules alone and thus indicates the corresponding dictionary entry is not needed and should be deleted, phase two may subsequently reverse this decision and indicate that the dictionary entry containing the subject word/character string, which is determined to be a root word of other longer words, should be saved.
After either phase one or phase two or both phase one and two have been completed, step 152 is performed. Step 152 creates a reduced dictionary 104-a based on the entries in dictionary 104 that have been indicated to be saved and/or deleted by phase one and/or phase two processing. Step 152 may be performed in a variety of ways, with the objective of creating reduced dictionary 104-a which is smaller in size (i.e., memory and storage requirements) than initial dictionary 104. Step 152 aggregates entries from the original unreduced dictionary 104 that have been indicated to be saved, and eliminates entries indicated to be deleted.
FIG. 3 illustrates the processing steps for a preferred embodiment of phase one (Step 150 in FIG. 2). The processing of FIG. 3 reduces the number of entries in dictionary 104 to produce reduced dictionary 104-a. To accomplish this, first, word linked list 207 is created by step 200. The word linked list 207 is a series of data structures 208 that each contain a single entry from dictionary file 104. Each data structure 208 includes (a) an indication of the respective entry grapheme string 208-a, (b) an indication of the corresponding phoneme string 208-b, (c) a delete flag 208-c that may be set or un-set as needed, and (d) a save flag 208-d that indicates root words that must be saved. The delete flag 208-c and the save flag 208-d for each data structure 208 are initially set to false for each word entry. The first data structure 208 in word linked list 207 corresponds to the first entry from dictionary 104, the second data structure 208 in word linked list 207 corresponds to the second dictionary entry, and so forth. From the example dictionary entries in Table 1 above, the entry for “aardvark” and its phonemic data “ardvark” is stored as the first data structure 208 in the word linked list 207, followed by another data structure 208 for the dictionary entry for “aaron”, and so forth. In a preferred embodiment, each entry in dictionary file 104 is read into memory and stored in the word linked list 207 as a separate data structure 208.
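As a rough sketch of the data structure 208 and of step 200, the following uses a Python list in place of a linked list. The class name `WordEntry`, the function `build_word_list`, and the one-entry-per-line "grapheme phoneme" input format are assumptions made for illustration; the specification does not define a file format here.

```python
from dataclasses import dataclass


@dataclass
class WordEntry:
    """One data structure 208: a dictionary entry plus its two flags."""
    grapheme: str          # 208-a
    phoneme: str           # 208-b
    delete: bool = False   # 208-c, initially false
    save: bool = False     # 208-d, initially false


def build_word_list(dictionary_lines):
    """Step 200 sketch: one WordEntry per 'grapheme phoneme' line, in file order."""
    entries = []
    for line in dictionary_lines:
        # Assumed line format: grapheme string, whitespace, phoneme string.
        grapheme, phoneme = line.split(maxsplit=1)
        entries.append(WordEntry(grapheme, phoneme.strip()))
    return entries


words = build_word_list(["aardvark ardvark", "long l'cG"])
print(words[0].grapheme, words[0].phoneme, words[0].delete, words[0].save)
```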
Steps 201 through 206 are then performed for each data structure 208 in the word linked list 207. Beginning with the first word linked list data structure 208, step 201 attempts to match any one of the affix rules from affix rule sets 103-a and 103-c to the grapheme string 208-a of the subject data structure 208. For instance, step 201 attempts to match suffix rules to the end, and prefix rules to the beginning of grapheme string 208-a. If any affix rule matches, processing skips to the next word linked list data structure 208 to obtain the next grapheme string 208-a. Thus, any dictionary 104 entry words (i.e. grapheme string 208-a of each data structure 208) that can initially be matched to an affix (prefix or suffix) rule are skipped by phase one. The reason for this is that words having a prefix and/or a suffix are typically complex words which include one or more root words. Words containing root words and a prefix and/or suffix that can be matched with affix rules are dealt with during phase two processing.
If no affix rules match any beginning or ending graphemes in grapheme string 208-a, step 202 then uses rules in infix rule file 103-b to generate phonemes based on an analysis of the subject grapheme string 208-a. That is, step 202 takes the grapheme string (i.e. the dictionary entry character string/word) for the subject data structure 208 currently being processed by steps 201 through 206 and attempts to parse the grapheme string 208-a using only grapheme-to-phoneme rules from infix rule set 103-b. This parsing process (Step 202) ultimately creates a rule-based phoneme string, just as if the grapheme string were input text being translated for text-to-speech synthesis using infix rules. As noted previously, rule processing is described in detail in the co-pending U.S. Patent application, entitled “Computer Method and Apparatus for Translating Text to Sound.” As an example of step 202, take the first entry in the dictionary example from Table 1. Assume that no affix rules matched the “aardvark” grapheme string in step 201. In step 202, “aardvark” would be parsed by infix rules in the infix rule set 103-b to produce an infix rule-based phoneme string for this word, such as “ardvark”. This resultant rule-based phoneme string may or may not be equivalent to the corresponding phoneme string 208-b in the current data structure 208/dictionary 104 entry.
After a rule-based phoneme string is generated via infix rule processing in step 202, step 203 normalizes the stress notation marks in the generated rule-based phoneme string. The exact normalization mechanism depends on the characteristics and structure of the rule sets and the dictionary; in the preferred embodiment, the stress mark for a syllable always precedes a vowel phoneme in the syllable, and the rules may place the stress marks further to the left; thus, the preferred embodiment normalizes stress marks by shifting them to the right until they reach a vowel phoneme. For example, if the rule-based phoneme string for “abase” were “x'bes”, the “'” stress mark would be shifted to the right by one character resulting in the phoneme string “xb'es”. Stress normalization corrects for different, but equivalent placement of the stress mark relative to the syllable boundaries of the word which can occur due to different dialects of a language.
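The right-shift normalization of step 203 might look like the following minimal sketch. The function name `normalize_stress` and the `VOWELS` set are assumptions: the specification does not list the vowel phoneme symbols, so a small illustrative set is used that is sufficient for the “abase” example.

```python
STRESS_MARK = "'"
# Assumed vowel phoneme symbols; the real inventory is not given in the text.
VOWELS = set("aeioux")


def normalize_stress(phonemes, vowels=VOWELS):
    """Shift each stress mark right until it immediately precedes a vowel phoneme."""
    out = []
    pending = False  # True while a stress mark is waiting for the next vowel
    for symbol in phonemes:
        if symbol == STRESS_MARK:
            pending = True
            continue
        if pending and symbol in vowels:
            out.append(STRESS_MARK)
            pending = False
        out.append(symbol)
    if pending:
        # No vowel followed the mark; keep it at the end rather than drop it.
        out.append(STRESS_MARK)
    return "".join(out)


print(normalize_stress("x'bes"))  # xb'es
```

A string whose stress mark already precedes a vowel passes through unchanged, so the function can be applied to both sides of a comparison without harm.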
Next, step 204 compares the normalized rule-based phoneme string (originally generated in step 202) with the phoneme string portion 208-b in the subject data structure 208 for the current dictionary entry in the word linked list 207. The comparison is performed to determine if the rule-based phoneme string produced from the rule processing of step 202 matches the phoneme string portion 208-b of the current data structure 208 dictionary entry. A “match” or “no match” decision is made in step 205. If the two phoneme strings do not match, then the rule-based phoneme string from step 202 is different from the actual phoneme string 208-b for the subject data structure 208 entry obtained from the dictionary 104. Accordingly, the entry remains in the dictionary 104 and processing proceeds to step 201 to process the next dictionary entry data structure 208. That is, steps 204 and 205 determine if the rule generated phoneme string for the subject data structure 208 and its corresponding phoneme string 208-b from the corresponding dictionary entry are the same or not. If they are not the same, then infix rules alone cannot be used to generate a correct phoneme string for this dictionary entry, and the entry should remain in the dictionary.
However, if step 205 determines that the rule-based phoneme string and the actual phoneme string 208-b for the current data structure 208 (i.e., the phoneme string obtained from the corresponding dictionary entry) are the same (i.e., they match each other), then step 206 sets the delete flag 208-c in the data structure 208. This indicates that the corresponding dictionary entry is to be deleted. In this instance, the entry need not remain in the dictionary 104, since rule-based processing alone can generate rule-based phonemic data identical or equivalent to that found in the phoneme string portion of the entry in dictionary 104. That is, since the subject grapheme can be correctly converted to phonemes by infix rules, there is no need to maintain the respective entry in the dictionary 104.
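Steps 201 through 206 can be summarized as a single loop. The sketch below is illustrative only: `phase_one` and its callback parameters are hypothetical names, entries are plain dicts rather than linked-list nodes, and the toy letter-to-phoneme table and single "-ing" affix test stand in for the real rule sets, which are described in the co-pending application.

```python
def phase_one(entries, matches_affix, rules_to_phonemes, normalize=lambda p: p):
    """Set entry['delete'] when infix rules alone reproduce the stored phonemes."""
    for entry in entries:
        if matches_affix(entry["grapheme"]):               # step 201: leave for phase two
            continue
        generated = rules_to_phonemes(entry["grapheme"])   # step 202
        generated = normalize(generated)                   # step 203
        if generated == entry["phoneme"]:                  # steps 204-205
            entry["delete"] = True                         # step 206
    return entries


# Toy stand-ins (hypothetical): one phoneme per letter, and a single suffix test.
LETTER_TO_PHONEME = {"c": "k", "a": "@", "t": "t", "o": "o", "n": "n", "e": "e"}


def toy_rules(word):
    return "".join(LETTER_TO_PHONEME[ch] for ch in word)


def toy_affix(word):
    return len(word) > 3 and word.endswith("ing")


entries = [
    {"grapheme": "cat", "phoneme": "k@t", "delete": False, "save": False},
    {"grapheme": "one", "phoneme": "wxn", "delete": False, "save": False},
    {"grapheme": "toning", "phoneme": "tonIG", "delete": False, "save": False},
]
phase_one(entries, toy_affix, toy_rules)
print([(e["grapheme"], e["delete"]) for e in entries])
# [('cat', True), ('one', False), ('toning', False)]
```

Here "cat" is flagged because the toy rules reproduce its stored phonemes, "one" stays because they do not, and "toning" is skipped at step 201 without being parsed at all.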
After step 205 and/or 206 are complete, processing returns to obtain the next data structure 208 for the next entry in word linked list 207. After all entries have been processed by steps 201 through 206, phase one is complete. Certain data structures 208 in word linked list 207 will have their delete flags 208-c set, indicating that the corresponding entries are to be deleted from dictionary 104. At this point, if only phase one of dictionary reduction is to be performed, processing proceeds to step 152 in FIG. 2 via path 153, in order to process the word linked list 207 into a reduced dictionary 104-a. Step 152, as noted above, selects those entries not indicated to be deleted for storage in reduced dictionary 104-a.
If phase two is performed after phase one, after all entries have been processed by steps 201 through 206 in FIG. 3, processing proceeds to step 300 in FIG. 4 to begin the steps of phase two. If phase two is performed without first performing phase one (Step 150 of FIG. 2) of the dictionary reduction process, processing begins in phase two by creating the same word linked list 207 containing the same entries in data structures 208 as described above with respect to step 200 of phase one.
In either case, phase two consists of two nested loops of processing, which are illustrated by the dotted lines labeled 151 and 305 and titled “for each word” and “for each affix”, respectively in FIG. 4. The outer loop 151 of phase two processing begins by selecting the first data structure 208 from word linked list 207, and proceeds to step 300. Each data structure 208 in word linked list 207 is processed by steps 300 through 303, and the data structure 208 that is currently being processed is called the root word entry. After each root word entry is selected, steps 300 through 303 are then performed for this root word entry for every affix in affix table 304.
Affix table 304 is a data structure, such as a table or linked list, which has entries that each hold a single grapheme string portion and phonemic data portion for a single respective affix rule from the affix (i.e., prefix and suffix) rule sets 103-a and 103-c. Just as the word linked list 207 has dictionary entry data structures 208 each containing a grapheme 208-a and phoneme 208-b pair, each affix table entry corresponds to an affix rule and holds the rule's grapheme string and phoneme string portions. The affix table 304 may be created before phase two processing has started, or step 300 may create the affix table 304 before processing any data structures 208 from word linked list 207. As an example, affix table 304 may appear just as the rule set in Table 2, except that the affix table 304 contains an affix entry for all rules in both the suffix and prefix rule sets 103-a and 103-c (i.e., the affix rule sets). The affix table 304 is created to provide access to affix rule information in computer memory in order to increase the speed of phase two processing. In an alternative embodiment, step 300 may directly access affix rule sets 103-a and 103-c instead of the affix table 304, with the same objective. The affix entry that is processed at any point in time is referred to herein as the current affix entry.
The objective of phase two is to determine if the current root word entry 208 can provide proper phonemes 208-b for text-to-speech synthesis of a longer dictionary entry that contains the root word entry's grapheme string 208-a as part of its grapheme string. To perform this processing, step 300 in FIG. 4 creates combinations of the grapheme and phoneme strings of a root word entry data structure 208 from the word linked list 207 (i.e., the dictionary) with respective grapheme and phoneme portions of affix entries (i.e., rules) from the affix table 304 (i.e., the affix rule sets). More specifically, step 300 appends the grapheme portion from the current affix entry to the respective end or beginning of the grapheme string 208-a of the current root word entry being processed to create a grapheme combination. If the current affix entry is a prefix rule, the grapheme portion for this prefix rule is appended to the beginning of the root word entry's grapheme string 208-a. If the current affix entry is a suffix rule, the grapheme portion for this suffix rule is appended to the end of the root word entry's grapheme string 208-a. Step 300 also appends the phoneme portion from the current affix entry to the end or beginning of the phoneme string portion 208-b of the current root word entry data structure 208 being processed, to create a phoneme combination. If the current affix entry is a prefix rule, the phoneme portion for this prefix rule is appended to the beginning of the root word entry's phoneme string 208-b. If the current affix entry is a suffix rule, the phoneme portion for this suffix rule is appended to the end of the root word entry's phoneme string 208-b. In this manner, step 300 creates a grapheme combination and phoneme combination pair.
As an example of step 300, suppose the current root word entry data structure 208 corresponds to Dictionary Entry 8 from Table 1, which has “long” as its grapheme string 208-a and “l'cG” as its phoneme string 208-b. Also suppose the current affix entry from affix table 304 corresponds to Suffix Rule 2 in Table 2, which has “-ing” as a grapheme portion and “x|G” as a phoneme portion. Since the affix entry is for a suffix rule, step 300 combines the dictionary entry grapheme string “long” (i.e., the root word) and the rule grapheme portion “ing” to create the grapheme combination “longing”. Step 300 also combines the dictionary entry's phoneme string 208-b “l'cG” with the phoneme portion of the affix entry (i.e. the suffix rule 2) “x|G”, to create the phoneme combination “l'cG|G”. The grapheme combination and phoneme combination pair thus appears as “longing l'cG|G.”
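The combining operation of step 300 amounts to two parallel concatenations. In the sketch below, `combine` is a hypothetical helper name, the suffix is Rule 7 of Table 2, and the root and prefix phoneme strings ("fri", "xn") are made up purely for illustration.

```python
def combine(root_grapheme, root_phoneme, affix_grapheme, affix_phoneme, kind):
    """Step 300 sketch: attach an affix on both the spelling and the phoneme side."""
    if kind == "prefix":
        return affix_grapheme + root_grapheme, affix_phoneme + root_phoneme
    if kind == "suffix":
        return root_grapheme + affix_grapheme, root_phoneme + affix_phoneme
    raise ValueError(f"unknown affix kind: {kind!r}")


# Root phonemes "fri" and the prefix pair ("un", "xn") are illustrative assumptions.
print(combine("free", "fri", "dom", "dxm", "suffix"))  # ('freedom', 'fridxm')
print(combine("free", "fri", "un", "xn", "prefix"))    # ('unfree', 'xnfri')
```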
Step 301 then compares this grapheme combination and phoneme combination pair with the grapheme string 208-a and phoneme string 208-b pair in every other dictionary entry stored in each data structure 208 in word linked list 207. Step 302 then determines if any of these comparisons yields a match. If steps 301 and 302 determine that another dictionary entry exists in word linked list 207 that has the same grapheme string 208-a and phoneme string 208-b, this other dictionary entry's data structure 208 is called a matching word entry. That is, steps 301 and 302 determine if the grapheme combination and phoneme combination pair created in step 300 exists as a dictionary entry elsewhere in the dictionary 104.
If a match occurs in step 302, it has been determined that the combination of graphemes and phonemes from a root word along with graphemes and phonemes from an affix rule can produce the same grapheme and phoneme combination as another matching entry in the dictionary 104. Accordingly, step 303 indicates the current data structure 208 for that root word entry to be saved by setting the save flag 208-d to true. Step 303 then sets the delete flag 208-c in the matching word entry to true. That is, phase two can determine that a root word entry previously indicated to be deleted by the delete flag 208-c should actually be saved in the dictionary 104 by marking the save flag 208-d to true. If saved, phase two has, at this point, also determined that the root word entry in the dictionary 104 can be used along with the rules to translate the matching word entry, and thus the matching word entry is not needed in the dictionary 104. Accordingly, step 303 sets the delete flag 208-c to true for the matching word entry (i.e., data structure 208 that matched the grapheme combination and phoneme combination pair) to indicate that the matching word entry is to be deleted.
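A compact sketch of steps 300 through 303 follows. It is not the patented implementation: `phase_two` is a hypothetical name, entries are plain dicts, and the entry-by-entry comparison of step 301 is replaced here by a lookup in a dict keyed on the (grapheme, phoneme) pair, which finds the same matches. The example phoneme strings other than "dxm" are invented.

```python
def phase_two(entries, affixes):
    """entries: dicts with grapheme/phoneme/delete/save; affixes: (kind, grapheme, phoneme)."""
    # Index by (grapheme, phoneme) pair so steps 301-302 become a single lookup.
    index = {(e["grapheme"], e["phoneme"]): e for e in entries}
    for root in entries:                          # outer loop: each word
        for kind, a_graph, a_phon in affixes:     # inner loop: each affix
            if kind == "prefix":
                pair = (a_graph + root["grapheme"], a_phon + root["phoneme"])
            else:
                pair = (root["grapheme"] + a_graph, root["phoneme"] + a_phon)
            match = index.get(pair)               # steps 301-302
            if match is not None and match is not root:
                root["save"] = True               # step 303: keep the root word
                match["delete"] = True            # step 303: longer word is redundant
    return entries


entries = [
    # "free" arrives already flagged for deletion, as if by phase one.
    {"grapheme": "free", "phoneme": "fri", "delete": True, "save": False},
    {"grapheme": "freedom", "phoneme": "fridxm", "delete": False, "save": False},
    {"grapheme": "wisdom", "phoneme": "wIzdxm", "delete": False, "save": False},
]
phase_two(entries, [("suffix", "dom", "dxm")])
print([(e["grapheme"], e["delete"], e["save"]) for e in entries])
# [('free', True, True), ('freedom', True, False), ('wisdom', False, False)]
```

In this run "free" is marked to be saved despite its earlier delete flag, "freedom" is marked for deletion, and "wisdom" is untouched because no root entry "wis" exists.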
After steps 302 and/or 303, processing returns to step 300 where the next entry in affix table 304 is applied to the current root word data structure 208 via steps 300 through 303. When no more affix table entries are available, the next data structure 208 from word linked list 207 is selected as the current root word entry. When all dictionary entries stored in word linked list 207 have been processed with all of the affix rules in affix table 304, the processing of phase two is complete. Processing then proceeds to step 152 in FIG. 2 (described above) in order to create the reduced dictionary 104-a from the word linked list 207. Any data structure 208 in word linked list 207 that is indicated as having a save flag 208-d marked true, or a delete flag 208-c marked false is saved in the reduced dictionary 104-a. Thus, a save flag 208-d marked true overrides a delete flag 208-c marked true. Therefore, any word entry data structures 208 having a save flag 208-d equal to true will be saved in the reduced dictionary, regardless of what delete flag 208-c indicates. In this manner, phase two considerably reduces the size of dictionary 104.
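The selection rule applied when the reduced dictionary is created, in which a true save flag overrides a true delete flag, can be expressed as one filter. The function name `build_reduced_dictionary` and the dict-based entries are assumptions for illustration.

```python
def build_reduced_dictionary(entries):
    """Keep an entry if its save flag is true or its delete flag is false."""
    return [e for e in entries if e["save"] or not e["delete"]]


flag_cases = [
    {"grapheme": "a", "delete": False, "save": False},  # untouched: kept
    {"grapheme": "b", "delete": True, "save": False},   # deletable: dropped
    {"grapheme": "c", "delete": True, "save": True},    # save overrides delete: kept
    {"grapheme": "d", "delete": False, "save": True},   # root word: kept
]
print([e["grapheme"] for e in build_reduced_dictionary(flag_cases)])  # ['a', 'c', 'd']
```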
In a preferred embodiment of the invention, step 202 in phase one (FIG. 3) only uses infix rule set 103-b to generate the rule-based phoneme string from the grapheme string 208-a of the current data structure 208/dictionary entry. This is because infix rule set 103-b contains a set of rules that match individual graphemes (i.e. letters) to individual phonemes, for the entire alphabet of the language. That is, in infix rule set 103-b, there are separate rules for “a”, “b”, “c”, and so forth, which match each of these letters to a corresponding phoneme. By using infix rule set 103-b, step 202 is certain to always be able to produce at least one complete rule-based phoneme string from the subject data structure grapheme string 208-a, even if step 202 must match graphemes to phonemes letter by letter. In an alternative embodiment, step 202 can use prefix, infix, and suffix rule sets for rule processing to generate a rule-based phoneme string.
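The guarantee that infix processing always yields a complete phoneme string can be illustrated with a greedy matcher that falls back to single-letter rules. This is a sketch under assumptions: the actual rule-matching procedure is defined in the co-pending application, the longest-match-first strategy is a substitute chosen for simplicity, and the rule table and phoneme symbols below are hypothetical.

```python
def infix_to_phonemes(word, rules):
    """Greedy longest-match parse; single-letter rules act as the fallback."""
    out = []
    i = 0
    max_len = max(len(g) for g in rules)
    while i < len(word):
        # Try the longest possible chunk first, shrinking down to one letter.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # Only reachable if the rule set lacks a single-letter rule.
            raise KeyError(f"no rule covers {word[i]!r}")
    return "".join(out)


# Hypothetical rules: one multi-letter rule plus per-letter fallbacks.
TOY_INFIX_RULES = {"sh": "S", "s": "s", "h": "h", "i": "|", "p": "p"}
print(infix_to_phonemes("ship", TOY_INFIX_RULES))  # S|p
print(infix_to_phonemes("his", TOY_INFIX_RULES))   # h|s
```

As long as every letter of the alphabet has its own rule, the `else` branch is never taken and a phoneme string is produced for any input word.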
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. For example, rearrangement of certain processing steps in phase one and/or phase two may be accomplished while still obtaining the same beneficial result of the invention. As an example of such a rearrangement, in phase two (FIG. 4), the two processing loops could be reversed. Thus, instead of performing the processing for each word entry, and then for each affix rule on that word entry, processing could be performed for each affix rule, and then for each word entry with that affix rule. When all word entries were processed for a rule, the next affix rule would be selected and processing would repeat beginning again with the first word entry.
Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many other equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the claims.

Claims (29)

What is claimed is:
1. A method for reducing the size of a dictionary used in a speech synthesis system having a set of rules for determining phonemes from graphemes, the dictionary containing a plurality of entries, each entry comprising a grapheme string and a corresponding phoneme string, the method comprising the steps of:
determining if a given entry in the dictionary can be fully matched by using rules of the rule set, and if so, indicating the entry to be deleted from the dictionary;
determining if the given entry is required in the dictionary in order to support other entries, and if so, indicating the given entry to be saved; and
aggregating the entries indicated as to be saved, to form a reduced dictionary therefrom;
wherein the given entry comprises a grapheme string and a corresponding phoneme string.
2. A method for reducing the size of a dictionary used in a speech synthesis system having a set of rules for determining phonemes from graphemes, the dictionary containing a plurality of entries, each entry comprising a grapheme string and a corresponding phoneme string, the method comprising the steps of:
for each entry in the dictionary, determining if the entry in the dictionary can be fully matched by using rules of the rule set, and if so, indicating the entry to be deleted from the dictionary; and
creating a reduced dictionary from the entries remaining after omitting any entries indicated as to be deleted;
wherein each entry comprises a grapheme string and a corresponding phoneme string.
3. The method of claim 2, wherein the step of determining if an entry in the dictionary can be fully matched by using rules of the rule set includes the steps of:
generating a rule-based phoneme string for the grapheme string of the entry using rules in the rule set; and
determining if the rule-based phoneme string matches the corresponding phoneme string of the entry, and if so, indicating the entry to be deleted from the dictionary.
4. The method of claim 3 wherein the step of determining if an entry in the dictionary can be fully matched, is performed for each entry in the dictionary starting with a first entry.
5. A method for reducing the size of a dictionary used in a speech synthesis system having a set of rules for determining phonemes from graphemes, the dictionary containing a plurality of entries, each entry comprising a grapheme string and a corresponding phoneme string, the method comprising the steps of:
for each entry in the dictionary, determining if the entry in the dictionary can be fully matched by using rules of the rule set, and if so, indicating the entry to be deleted from the dictionary including the steps of:
generating a rule-based phoneme string for the grapheme string of the entry using rules in the rule set; and
determining if the rule-based phoneme string matches the corresponding phoneme string of the entry, and if so, indicating the entry to be deleted from the dictionary;
creating a reduced dictionary from the entries remaining after omitting any entries indicated as to be deleted;
providing an affix rule set for the speech synthesis system, the affix rule set for determining phonemes from beginning and ending graphemes of character strings; and
before generating a rule based phoneme string, checking if any affix rule from the affix rule set matches a portion of the grapheme string of the entry, and if so, skipping to a next entry in the dictionary for processing;
wherein the step of determining if an entry in the dictionary can be fully matched, is performed for each entry in the dictionary starting with a first entry.
6. The method of claim 5 further including the step of checking if the grapheme string of the entry is a homograph, and if so, skipping to a next entry in the dictionary for processing.
7. The method of claim 5, further including the step of deleting entries that have been marked as to be deleted from the dictionary.
8. The method of claim 5, wherein the step of creating a reduced dictionary includes the step of saving in a reduced dictionary, entries that have not been indicated to be deleted.
9. A method for reducing the size of a dictionary used in a speech synthesis system, the dictionary containing a plurality of entries, each entry comprising a grapheme string and a corresponding phoneme string, the method comprising the steps of:
determining if a given entry is required in the dictionary in order to produce the phoneme string of another entry, and if so, indicating the given entry to be saved; and
creating a dictionary containing entries indicated to be saved; and
wherein the speech synthesis system includes an affix rule set containing affix rules for determining phonemes from beginning and ending graphemes of character strings, each affix rule having a grapheme portion and a corresponding phoneme portion; and
the step of determining if the given entry is required in the dictionary includes the steps of:
combining grapheme and phoneme strings of a root word entry in the dictionary with respective grapheme and phoneme portions of an affix rule of the affix rule set to form a grapheme combination and phoneme combination pair; and
determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary, and if so, indicating the root word entry to be saved in the dictionary and indicating the matching entry to be deleted.
10. The method of claim 9, wherein the steps of combining and determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary are performed respectively for the root word entry with each affix rule in the affix rule set.
11. The method of claim 10 wherein the step of determining if an entry is required, is performed for each root entry in the dictionary starting with a first root word entry.
12. The method of claim 11, wherein the step of combining includes the steps of:
combining the grapheme string of the root word entry with the grapheme portion of the affix rule to form the grapheme combination; and
combining the phoneme string of the root word entry with the phoneme portion of the affix rule to form the phoneme combination.
13. The method of claim 12, wherein the step of determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary, includes the steps of:
determining if the grapheme combination exists as a matching grapheme string in an entry in the dictionary, and if so, obtaining the corresponding phoneme string as a matching phoneme string for the entry;
determining if the phoneme combination matches the matching phoneme string, and if so, indicating the root word entry to be saved in the dictionary and indicating the matching entry to be deleted in the dictionary.
14. The method of claim 13, wherein, before the step of determining if the phoneme combination matches the matching phoneme string, normalizing any lexical stress in the phoneme combination and the matching phoneme string.
15. The method of claim 11, further comprising the step of saving in a reduced dictionary the entries that have been indicated to be saved.
16. The method of claim 11, further comprising the step of deleting entries that have been indicated to be deleted from the dictionary.
17. The method of claim 11, wherein the entries in the dictionary are arranged according to length of grapheme string with the shortest grapheme string first.
18. The method of claim 11, wherein the steps of combining and determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary, are performed first with rules from the affix rule set for determining phonemes from beginning graphemes.
19. The method of claim 11, wherein the step of determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary includes the steps of:
determining if the grapheme combination exists as a matching grapheme string in an entry in the dictionary, and if so, obtaining the corresponding phoneme string as a matching phoneme string for the entry;
determining if the phoneme combination matches the matching phoneme string, and if so, indicating the root word entry to be saved in the dictionary and indicating the matching entry to be deleted in the dictionary.
20. The method of claim 19, wherein, before the step of determining if the phoneme combination matches the matching phoneme string is performed, normalizing any lexical stress in the phoneme combination and the matching phoneme string.
21. The method of claim 19, further comprising the step of saving in a reduced dictionary, entries that have been indicated to be saved.
22. A method for reducing the size of a dictionary used in a speech synthesis system having a set of rules for determining phonemes from graphemes, the dictionary containing a plurality of entries, each entry comprising a grapheme string and a corresponding phoneme string, the method comprising the steps of:
determining if a given entry in the dictionary can be fully matched by using rules of the rule set, and if so, indicating the entry to be deleted from the dictionary including the steps of:
generating a rule-based phoneme string for the grapheme string of the entry using rules in the rule set; and
determining if the rule-based phoneme string matches the corresponding phoneme string of the entry, and if so, indicating the entry to be deleted from the dictionary;
determining if the given entry is required in the dictionary in order to support other entries, and if so, indicating the given entry to be saved; and
aggregating the entries indicated as to be saved, to form a reduced dictionary therefrom.
23. The method of claim 22 wherein the step of determining if the given entry is required in the dictionary includes the steps of:
providing in the speech synthesis system an affix rule set containing affix rules for determining phonemes from beginning and ending graphemes of character strings, each affix rule having a grapheme portion and a corresponding phoneme portion;
combining grapheme and phoneme strings of a root word entry from the dictionary with respective grapheme and phoneme portions of an affix rule of the affix rule set to form a grapheme combination and phoneme combination pair; and
determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary, and if so, indicating the root word entry to be saved in the dictionary, and, indicating the matching entry to be deleted from the dictionary.
24. The method of claim 23, wherein the steps of combining and determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary are performed respectively for the root entry with each affix rule in the affix rule set; and
the step of determining if an entry is required, is performed for each root word entry in the dictionary starting with a first root word entry.
25. The method of claim 23 further including the step of:
before generating a rule based phoneme string, determining if any affix rule from the affix rule set matches a portion of the grapheme string of the entry, and if so, skipping to a next entry in the dictionary for processing.
26. The method of claim 23 further including the step of checking if the grapheme string of the entry is a homograph, and if so, skipping to a next entry in the dictionary for processing.
27. The method of claim 23, wherein the step of combining includes the steps of:
combining the grapheme string of the root word entry with the grapheme portion of the affix rule to form the grapheme combination; and
combining the phoneme string of the root word entry with the phoneme portion of the affix rule to form the phoneme combination.
28. The method of claim 27, wherein the step of determining if the grapheme combination and phoneme combination pair exists as a matching entry in the dictionary includes the steps of:
determining if the grapheme combination exists as a matching grapheme string in an entry in the dictionary, and if so, obtaining the corresponding phoneme string as a matching phoneme string for the entry;
determining if the phoneme combination matches the matching phoneme string, and if so, indicating the root word entry to be saved in the dictionary and indicating the matching entry to be deleted from the dictionary.
29. The method of claim 28, wherein, before the step of determining if the phoneme combination matches the matching phoneme string is performed, normalizing any lexical stress in the phoneme combination and the matching phoneme string.
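The reduction recited in claims 22 to 29 can be illustrated with a minimal Python sketch. It is not the patented implementation. The dictionary contents, the single affix rule, the toy letter-to-sound function, the ARPAbet-style phoneme symbols with stress digits, and every function name below are assumptions made for illustration only. The sketch also assumes that an entry indicated to be saved takes precedence over an indication to delete it, and it omits the homograph and affix-match skips of claims 25 and 26.

```python
import re

# Hypothetical dictionary: grapheme string -> phoneme string (stress shown as digits).
ENTRIES = {
    "cat": "K AE1 T",
    "cats": "K AE1 T S",
    "dog": "D AO1 G",
    "yacht": "Y AA1 T",
}

# Hypothetical affix rules: (kind, grapheme portion, phoneme portion).
AFFIX_RULES = [("suffix", "s", "S")]

# Toy stand-in for the synthesizer's letter-to-sound rule set (an assumption).
TOY_LETTER_MAP = {"c": "K", "a": "AE", "t": "T", "d": "D", "o": "AO",
                  "g": "G", "s": "S", "y": "Y", "h": "HH"}


def toy_letter_to_sound(word):
    """Generate a rule-based phoneme string, one phoneme per letter."""
    return " ".join(TOY_LETTER_MAP.get(ch, "?") for ch in word)


def strip_stress(phonemes):
    """Normalize lexical stress by removing the stress digits."""
    return re.sub(r"\d", "", phonemes)


def reduce_dictionary(entries, affix_rules, letter_to_sound):
    """Return a reduced copy of entries, leaving the input unchanged."""
    keep, delete = set(), set()

    # Process root candidates shortest grapheme string first.
    for root in sorted(entries, key=len):
        for kind, g_part, p_part in affix_rules:
            if kind == "prefix":
                g_combo = g_part + root
                p_combo = p_part + " " + entries[root]
            else:
                g_combo = root + g_part
                p_combo = entries[root] + " " + p_part
            match = entries.get(g_combo)
            # Compare phoneme strings only after stress normalization.
            if match is not None and strip_stress(match) == strip_stress(p_combo):
                keep.add(root)       # the root supports another entry
                delete.add(g_combo)  # the derived entry is redundant

    # Entries the general rules reproduce exactly are also redundant,
    # unless they were already marked as a supporting root or as derived.
    for word, phonemes in entries.items():
        if word in keep or word in delete:
            continue
        if strip_stress(letter_to_sound(word)) == strip_stress(phonemes):
            delete.add(word)

    # Aggregate the surviving entries; "keep" wins over "delete".
    return {w: p for w, p in entries.items() if w in keep or w not in delete}


print(reduce_dictionary(ENTRIES, AFFIX_RULES, toy_letter_to_sound))
# prints {'cat': 'K AE1 T', 'yacht': 'Y AA1 T'}
```

In this toy run, "cats" is dropped because "cat" plus the suffix rule reproduces it, "dog" is dropped because the toy rules alone reproduce its phoneme string, and "yacht" is retained because neither path reproduces it.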
US09/212,874 1998-12-16 1998-12-16 Computer method and apparatus for text-to-speech synthesizer dictionary reduction Expired - Fee Related US6208968B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US09/212,874 US6208968B1 (en) 1998-12-16 1998-12-16 Computer method and apparatus for text-to-speech synthesizer dictionary reduction
US09/795,070 US6347298B2 (en) 1998-12-16 2001-02-26 Computer apparatus for text-to-speech synthesizer dictionary reduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/212,874 US6208968B1 (en) 1998-12-16 1998-12-16 Computer method and apparatus for text-to-speech synthesizer dictionary reduction

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US09/795,070 Continuation US6347298B2 (en) 1998-12-16 2001-02-26 Computer apparatus for text-to-speech synthesizer dictionary reduction

Publications (1)

Publication Number Publication Date
US6208968B1 (en) 2001-03-27

Family

ID=22792741

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/212,874 Expired - Fee Related US6208968B1 (en) 1998-12-16 1998-12-16 Computer method and apparatus for text-to-speech synthesizer dictionary reduction
US09/795,070 Expired - Fee Related US6347298B2 (en) 1998-12-16 2001-02-26 Computer apparatus for text-to-speech synthesizer dictionary reduction

Family Applications After (1)

Application Number Title Priority Date Filing Date
US09/795,070 Expired - Fee Related US6347298B2 (en) 1998-12-16 2001-02-26 Computer apparatus for text-to-speech synthesizer dictionary reduction

Country Status (1)

Country Link
US (2) US6208968B1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW388826B (en) * 1998-12-21 2000-05-01 Inventec Corp Quickly word-identifying method
US7107215B2 (en) * 2001-04-16 2006-09-12 Sakhr Software Company Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study
JP2003036088A (en) * 2001-07-23 2003-02-07 Canon Inc Dictionary managing apparatus for voice conversion
FI114051B (en) * 2001-11-12 2004-07-30 Nokia Corp Procedure for compressing dictionary data
GB2393369A (en) * 2002-09-20 2004-03-24 Seiko Epson Corp A method of implementing a text to speech (TTS) system and a mobile telephone incorporating such a TTS system
US7146319B2 (en) * 2003-03-31 2006-12-05 Novauris Technologies Ltd. Phonetically based speech recognition system and method
US20050055197A1 (en) * 2003-08-14 2005-03-10 Sviatoslav Karavansky Linguographic method of compiling word dictionaries and lexicons for the memories of electronic speech-recognition devices
TWI250509B (en) * 2004-10-05 2006-03-01 Inventec Corp Speech-synthesizing system and method thereof
GB2428853A (en) * 2005-07-22 2007-02-07 Novauris Technologies Ltd Speech recognition application specific dictionary
US8099281B2 (en) * 2005-06-06 2012-01-17 Nuance Communications, Inc. System and method for word-sense disambiguation by recursive partitioning
US20070283334A1 (en) * 2006-06-02 2007-12-06 International Business Machines Corporation Problem detection facility using symmetrical trace data
US20080282153A1 (en) * 2007-05-09 2008-11-13 Sony Ericsson Mobile Communications Ab Text-content features
KR101300839B1 (en) * 2007-12-18 2013-09-10 삼성전자주식회사 Voice query extension method and system
US8983841B2 (en) * 2008-07-15 2015-03-17 At&T Intellectual Property, I, L.P. Method for enhancing the playback of information in interactive voice response systems
JPWO2010018796A1 (en) * 2008-08-11 2012-01-26 旭化成株式会社 Exception word dictionary creation device, exception word dictionary creation method and program, and speech recognition device and speech recognition method
JP5765874B2 (en) * 2008-10-09 2015-08-19 アルパイン株式会社 Speech synthesis apparatus, speech synthesis method, and speech synthesis program
CN102117614B (en) * 2010-01-05 2013-01-02 索尼爱立信移动通讯有限公司 Personalized text-to-speech synthesis and personalized speech feature extraction
US9646601B1 (en) * 2013-07-26 2017-05-09 Amazon Technologies, Inc. Reduced latency text-to-speech system
US10134388B1 (en) * 2015-12-23 2018-11-20 Amazon Technologies, Inc. Word generation for speech recognition
US10373610B2 (en) * 2017-02-24 2019-08-06 Baidu Usa Llc Systems and methods for automatic unit selection and target decomposition for sequence labelling
US10872598B2 (en) * 2017-02-24 2020-12-22 Baidu Usa Llc Systems and methods for real-time neural text-to-speech
US10896669B2 (en) 2017-05-19 2021-01-19 Baidu Usa Llc Systems and methods for multi-speaker neural text-to-speech
US11017761B2 (en) 2017-10-19 2021-05-25 Baidu Usa Llc Parallel neural text-to-speech
US10872596B2 (en) 2017-10-19 2020-12-22 Baidu Usa Llc Systems and methods for parallel wave generation in end-to-end text-to-speech
US10796686B2 (en) 2017-10-19 2020-10-06 Baidu Usa Llc Systems and methods for neural text-to-speech using convolutional sequence learning

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4775956A (en) * 1984-01-30 1988-10-04 Hitachi, Ltd. Method and system for information storing and retrieval using word stems and derivative pattern codes representing familes of affixes
US4979216A (en) 1989-02-17 1990-12-18 Malsheen Bathsheba J Text to speech synthesis system and method using context dependent vowel allophones
US5157759A (en) 1990-06-28 1992-10-20 At&T Bell Laboratories Written language parser system
US5323316A (en) * 1991-02-01 1994-06-21 Wang Laboratories, Inc. Morphological analyzer
US5384893A (en) 1992-09-23 1995-01-24 Emerson & Stern Associates, Inc. Method and apparatus for speech synthesis based on prosodic analysis
US5490061A (en) * 1987-02-05 1996-02-06 Toltran, Ltd. Improved translation system utilizing a morphological stripping process to reduce words to their root configuration to produce reduction of database size
US5651095A (en) * 1993-10-04 1997-07-22 British Telecommunications Public Limited Company Speech synthesis using word parser with knowledge base having dictionary of morphemes with binding properties and combining rules to identify input word class
US5668928A (en) * 1995-01-31 1997-09-16 Kor Team International, Inc. Speech recognition system and method with automatic syntax generation
US5671426A (en) * 1993-06-22 1997-09-23 Kurzweil Applied Intelligence, Inc. Method for organizing incremental search dictionary
US5751906A (en) * 1993-03-19 1998-05-12 Nynex Science & Technology Method for synthesizing speech from text and for spelling all or portions of the text by analogy
US5754977A (en) * 1996-03-06 1998-05-19 Intervoice Limited Partnership System and method for preventing enrollment of confusable patterns in a reference database
EP0848372A2 (en) * 1996-12-10 1998-06-17 Matsushita Electric Industrial Co., Ltd. Speech synthesizing system and redundancy-reduced waveform database therefor
US5845246A (en) * 1995-02-28 1998-12-01 Voice Control Systems, Inc. Method for reducing database requirements for speech recognition systems
US5913194A (en) * 1997-07-14 1999-06-15 Motorola, Inc. Method, device and system for using statistical information to reduce computation and memory requirements of a neural network based speech synthesis system
US5930756A (en) * 1997-06-23 1999-07-27 Motorola, Inc. Method, device and system for a memory-efficient random-access pronunciation lexicon for text-to-speech synthesis
EP0952531A1 (en) * 1998-04-24 1999-10-27 BRITISH TELECOMMUNICATIONS public limited company Linguistic converter

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6208968B1 (en) * 1998-12-16 2001-03-27 Compaq Computer Corporation Computer method and apparatus for text-to-speech synthesizer dictionary reduction

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
Bachenko, J., et al., "A Parser for Real-Time Speech Synthesis of Conversational Texts," Third Conference on Applied Natural Language Processing, Proceedings of the Conference, pp. 25-32 (1992).
Bachenko, J., et al., "Prosodic Phrasing for Speech Synthesis of Written Telecommunications by the Deaf," IEEE Global Telecommunications Conference; GLOBECOM '91, 2:1391-5 (1991).
Carlson, R., et al., "Predicting Name Pronunciation for a Reverse Directory Service," Eurospeech 89. European Conference on Speech Communications and Technology, pp. 113-115 (1989).
Fitzpatrick, E., et al., "Parsing for Prosody: What a Text-to-Speech System Needs from Syntax," Proceedings of the Annual AI Systems in Government Conference, pp. 188-194 (1989).
Lazzaro, J.J., "Even as We Speak," Byte, p. 165 (Apr. 1992).
McGlashan, S., et al., "Dialogue Management for Telephone Information Systems," Third Conference on Applied Natural Language Processing, Proceedings of the Conference, pp. 245-246 (1992).
Medina, D., "Humanizing Synthetic Speech," Information Week, p. 46 (Mar. 18, 1991).
Takahashi, J., et al., "Interactive Voice Technology Development for Telecommunications Applications," Speech Communication, 17:287-301.
Wolf, H.E., et al., "Text-Sprache-Umsetzung für Anwendungen bei automatischen Informations- und Transaktions-systemen (Text-to-Speech Conversion for Automatic Information Services and Order Systems)," Informationstechnik it, vol. 31, No. 5, pp. 334-341 (1989).
Yiourgalis, N., et al., "Text to Speech System for Greek," 1991 Conference on Acoustics, Speech and Signal Processing, 1:525-8 (1991).
Zimmerman, J., "Giving Feeling to Speech," Byte, 17(4):168 (1992).

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6456209B1 (en) * 1998-12-01 2002-09-24 Lucent Technologies Inc. Method and apparatus for deriving a plurally parsable data compression dictionary
US6347298B2 (en) * 1998-12-16 2002-02-12 Compaq Computer Corporation Computer apparatus for text-to-speech synthesizer dictionary reduction
US20020026313A1 (en) * 2000-08-31 2002-02-28 Siemens Aktiengesellschaft Method for speech synthesis
US20020046025A1 (en) * 2000-08-31 2002-04-18 Horst-Udo Hain Grapheme-phoneme conversion
US7107216B2 (en) * 2000-08-31 2006-09-12 Siemens Aktiengesellschaft Grapheme-phoneme conversion of a word which is not contained as a whole in a pronunciation lexicon
US7333932B2 (en) * 2000-08-31 2008-02-19 Siemens Aktiengesellschaft Method for speech synthesis
US20030074196A1 (en) * 2001-01-25 2003-04-17 Hiroki Kamanaka Text-to-speech conversion system
US7260533B2 (en) * 2001-01-25 2007-08-21 Oki Electric Industry Co., Ltd. Text-to-speech conversion system
US20120089400A1 (en) * 2010-10-06 2012-04-12 Caroline Gilles Henton Systems and methods for using homophone lexicons in english text-to-speech
WO2012172596A1 (en) * 2011-06-14 2012-12-20 三菱電機株式会社 Pronunciation information generating device, in-vehicle information device, and database generating method
DE102011118059A1 (en) * 2011-11-09 2013-05-16 Elektrobit Automotive Gmbh Technique for outputting an acoustic signal by means of a navigation system
US20150012261A1 (en) * 2012-02-16 2015-01-08 Continental Automotive Gmbh Method for phonetizing a data list and voice-controlled user interface
US9405742B2 (en) * 2012-02-16 2016-08-02 Continental Automotive Gmbh Method for phonetizing a data list and voice-controlled user interface
US20140222415A1 (en) * 2013-02-05 2014-08-07 Milan Legat Accuracy of text-to-speech synthesis
US9311913B2 (en) * 2013-02-05 2016-04-12 Nuance Communications, Inc. Accuracy of text-to-speech synthesis
US20150213330A1 (en) * 2014-01-30 2015-07-30 Abbyy Development Llc Methods and systems for efficient automated symbol recognition
US9892114B2 (en) * 2014-01-30 2018-02-13 Abbyy Development Llc Methods and systems for efficient automated symbol recognition
US20160246799A1 (en) * 2015-02-20 2016-08-25 International Business Machines Corporation Policy-based, multi-scheme data reduction for computer memory
US10089319B2 (en) * 2015-02-20 2018-10-02 International Business Machines Corporation Policy-based, multi-scheme data reduction for computer memory
US11113245B2 (en) 2015-02-20 2021-09-07 International Business Machines Corporation Policy-based, multi-scheme data reduction for computer memory
US9852728B2 (en) * 2015-06-08 2017-12-26 Nuance Communications, Inc. Process for improving pronunciation of proper nouns foreign to a target language text-to-speech system
CN112487797A (en) * 2020-11-26 2021-03-12 北京有竹居网络技术有限公司 Data generation method and device, readable medium and electronic equipment
CN112487797B (en) * 2020-11-26 2024-04-05 北京有竹居网络技术有限公司 Data generation method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
US20010012999A1 (en) 2001-08-09
US6347298B2 (en) 2002-02-12

Legal Events

Date Code Title Description
AS Assignment

Owner name: DIGITAL EQUIPMENT CORPORATION, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VITALE, ANTHONY J.;LIN, GINGER CHUN-CHE;KOPEC, THOMAS;REEL/FRAME:009684/0050

Effective date: 19981208

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

CC Certificate of correction
AS Assignment

Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DIGITAL EQUIPMENT CORPORATION;COMPAQ COMPUTER CORPORATION;REEL/FRAME:012447/0903;SIGNING DATES FROM 19991209 TO 20010620

AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP LP;REEL/FRAME:014102/0224

Effective date: 20021001

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20090327