WO1996012271A1 - Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program - Google Patents

Publication number
WO1996012271A1
WO1996012271A1 PCT/US1995/013134 US9513134W WO9612271A1 WO 1996012271 A1 WO1996012271 A1 WO 1996012271A1 US 9513134 W US9513134 W US 9513134W WO 9612271 A1 WO9612271 A1 WO 9612271A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
word
data
synthesized
database
Prior art date
Application number
PCT/US1995/013134
Other languages
French (fr)
Other versions
WO1996012271A9 (en
Inventor
Olivier Gautherot
Tsakhi Segal
Avraham Barel
Uri Weiner
Original Assignee
National Semiconductor Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Semiconductor Corporation
Priority to EP95937434A priority Critical patent/EP0734568A1/en
Priority to KR1019960703143A priority patent/KR960706671A/en
Publication of WO1996012271A1 publication Critical patent/WO1996012271A1/en
Publication of WO1996012271A9 publication Critical patent/WO1996012271A9/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts

Definitions

  • the present invention relates to techniques for synthesizing speech for use in data processing systems, telephone answering machines, and other devices, and more specifically, to an apparatus and method capable of synthesizing speech in multiple languages using a single application program.
  • Synthesized speech is used in many electronic devices as part of the user interface to enable a user to interact with or obtain information from the device.
  • Such devices typically contain a speech synthesizer chip which consists of a processor having speech synthesis capability.
  • the synthesized speech may be output through any one of several mediums, e.g., audio voice synthesis, Morse code, message display, etc.
  • the speech synthesizer chip may be separate from the other functional units of the device, or it may be incorporated with additional functions such as memory, digital signal processing, timers, etc.
  • As shown in Fig. 1, a typical speech synthesis chip 1 contains a system controller 10 which is linked to a word synthesizer 12 by means of a communication link 14
  • Word synthesizer 12 accesses vocabulary database 16 in order to retrieve word data needed to construct sentences in response to instructions issued by controller 10
  • Vocabulary database 16 stores the words or groups of words used to synthesize the sentences requested by controller 10 in a non-volatile memory
  • Controller 10 typically contains an application program stored in a read-only memory (ROM), with the program being designed for the specific application for which the synthesized words are required
  • ROM read-only memory
  • the application program includes routines written for each sentence which the speech synthesis chip 1 is expected to produce for the desired application.
  • Each routine generates a desired sentence by causing controller 10 to issue a set of commands to word synthesizer 12 where each command causes a word or group of words in that sentence to be synthesized.
  • the grammar rules, word order structure, and rules for constructing numbers (among other characteristics) specific to a particular language are embedded in the application program and are reflected in the order and types of commands which the program causes controller 10 to issue
  • the present invention is directed to an apparatus and method for synthesizing a finite set of sentences and numbers in one of several languages using an application program which is independent of the language being synthesized
  • the invention includes a system controller which communicates with a sentence and word synthesizer by means of a communication link
  • the sentence and word synthesizer responds to instructions from the controller by accessing a vocabulary and sentence database which contains all of the language specific information usually found in a controller resident application program in standard implementations of speech synthesizers
  • the language specific information is encoded in a language independent format in the database. Therefore, the application program can be written in a form which is independent of the language to be synthesized.
  • the database contains all of the language specific information, and its contents are retrieved by an indexing system which assigns an index number to each sentence
  • the application program causes the controller to issue a command to retrieve a desired sentence by using its index number, where the command includes information regarding the specific data needed
  • variables are typically numbers
  • the control terms act to control the operation of the sentence synthesizer and determine the structure of the sentence being synthesized
  • they may determine whether the singular or plural form of a word is appropriate, or act to produce the proper pronunciation of a number depending upon its context
  • the controller issues a command instructing the sentence synthesizer to produce a sentence having a prescribed index number
  • the command includes the values of any variables needed to complete the sentence
  • the sentence synthesizer retrieves the sentence content from the database and then implements the sentence according to the words, control terms, and variables contained in it
  • Each data word in the sentence is read by a word decoder which determines if the data word is a word, variable, or control term.
  • for each word to be synthesized, the sentence synthesizer instructs a word synthesizer to retrieve that word from the database and produce it in spoken form
  • the sentence synthesizer points to a data table which contains the spoken word equivalents of the number or numbers to be produced by the speech synthesizer. The data table points to the entries in the word database corresponding to the words needed to produce the spoken number.
  • words are then retrieved and produced as speech by the action of the word synthesizer. The control terms are interpreted by the sentence synthesizer as commands to carry out operations which implement the grammar rules, contextual checking, etc. of the language and thereby determine the final sentence structure.
  • Fig. 1 is a block diagram of a typical speech synthesis chip
  • Fig. 2 is a block diagram of a speech synthesis chip constructed according to the present invention
  • Fig. 3 is a flowchart showing the operation of the sentence synthesizer module of the present invention
  • Fig. 4 shows how a simple sentence is constructed and synthesized by the speech synthesis chip of the present invention
  • Fig. 5 is a block diagram of a telephone answering machine which incorporates the speech synthesizer chip of the present invention
  • Fig. 2 is a block diagram of a speech synthesis chip 100 constructed according to the present invention
  • Speech chip 100 includes a system controller 102 which communicates with a sentence and word synthesizer 104 via a communication link 103
  • System controller 102 can take the form of a separate processor which interacts with synthesizer 104 via communication link 103 in a master/slave type of architecture, or controller 102 can be a separate software module running on the same processor as synthesizer 104. In the latter situation, communication between controller 102 and synthesizer 104 occurs via the internal registers of the processor or by means of a variable in memory.
  • Synthesizer 104 accesses vocabulary and sentence database 108 in order to construct synthesized speech sentences in response to commands issued by controller 102
  • Database 108 is typically separated into two sections, a vocabulary or word section 109 and a sentence section 110
  • Database 108 contains the words, grammar rules, numbers, and contextual information needed for synthesizer 104 to synthesize sentences in response to commands from controller 102
  • Synthesizer 104 typically contains two modules, a sentence synthesizer 105 and a word synthesizer 106
  • Sentence synthesizer 105 acts to control the production of a desired sentence by interpreting the data retrieved from database 108 in response to a command from controller 102 to synthesize a particular sentence. Word synthesizer 106 acts to synthesize specific words in response to commands from sentence synthesizer 105.
  • Database 108 contains all of the language specific information needed to synthesize any of the set of sentences which system 100 is capable of synthesizing. This is accomplished by use of a data structure which includes the language specific information in the definition of the sentence. Thus, as will be described in greater detail later, when controller 102 issues a command to synthesize a particular sentence by providing its index, sentence synthesizer 105 retrieves that sentence structure from sentence section 110 of database 108, where the sentence structure contains all of the grammar and contextual rules of the language being synthesized. This significantly reduces the complexity of the application program which is resident in controller 102, and makes the speech synthesis system more flexible and capable of being used to synthesize multiple languages.
  • Fig. 3 is a flowchart showing the operation of the sentence synthesizer 105 module of the present invention
  • Sentence synthesizer 105 receives an instruction from controller 102 to synthesize sentence (n).
  • n represents the index of the sentence to be produced (box 200)
  • a pointer is set to the sentence with index (n) (box 210) in the sentence section 110 of database 108
  • the sentence content is retrieved from database 108 and the data is read one data word at a time by a word decoder contained in sentence synthesizer 105 (box 220)
  • a test is then performed to determine if the data word which has been read by the decoder is an end marker, signifying the end of the sentence data (box 230). If the data word is an end marker, the program ends (box 250). If the data word is not an end marker, the character of the data word determines whether a number is
  • Controller 102 issues a command to synthesizer 104 via communication link 103
  • the command is of the form "synthesize sentence (n, x1, x2, x3, ...)", where n is a number corresponding to the sentence index, and x1, x2, x3, etc. represent values of the arguments or variables to be inserted into the sentence structure
  • synthesizer 104 accesses database 108, using a pointer to retrieve the sentence corresponding to index (n) from the sentence database portion 110 of database 108
  • a sentence contained in database 108 is composed of data words representing three types of objects, words, variables, and control terms
  • the words are fixed entries ("You have", etc. in the example sentence) for the invariant parts of the sentence
  • the sentence structure in database 108 contains pointers for the words to be synthesized which direct the word synthesizer portion 106 of synthesizer 104 to retrieve those word(s) from the word section 109 of database 108 and then synthesize them
  • the variables or arguments correspond to portions of the sentence which change with the situation in which the sentence is being synthesized. They are usually numerals, and the sentence structure contains a pointer to a numeral decoder or table 300 which translates the number (in this case 21) to its corresponding words ("twenty-one").
  • the control terms are instructions which cause the synthesizer to check for a particular condition, such as the existence of a plural argument. If the condition is satisfied the index to the next word to be synthesized
  • After retrieval of the appropriate sentence, a word decoder reads each data word from the sentence, where the data words correspond to the words, variables, and control terms previously described. If the data word corresponds to a word or word group, that word or word group is retrieved from the vocabulary or word section 109 of database 108 and then is synthesized by word synthesizer 106. Sentence synthesizer 105 then reads the next data word, which in the case of the example of Fig. 4 is an instruction to go to table 1 to retrieve a number. The instruction to go to table 1 can, if necessary, be followed by a logic step which determines the context in which the number is being used in the sentence so that the appropriate spoken form of the number will be synthesized. This logic step is important in languages such as German in which the form of a number (the actual words used to express that number) depends
  • this context determining logic is represented by a context selector (box 310). The word following the instruction to go to table 1 is read next and provides the argument for the variable in the sentence, in this case the number of messages. Based on this argument and the results of the context selector logic, the appropriate entry in table 1 or another data table is located. A pointer or pointers from that entry indicate the words in word section 109 of database 108 which correspond to the argument needed for the sentence. This is followed by an instruction to word synthesizer 106 to synthesize those words.
  • Sentence synthesizer 105 then reads the next data word, which in this case is a control term or instruction to check if the argument is singular or plural. If the argument is singular, the word "message" is retrieved from the word section 109 of database 108 and is then spoken by word synthesizer 106. If the argument is plural, then sentence synthesizer 105 increments the word index by one, thereby causing the word "messages" to be retrieved and synthesized.
  • sentence synthesizer 105 of the present invention performs the processing steps necessary to retrieve the sentence to be synthesized, parse through the data words which comprise the content of that sentence, and control the synthesizing of each of the words or variables in that sentence. In this way the complete sentence is synthesized by a sequence of logic steps and instructions to retrieve words from the word section 109 of database 108 and then synthesize those words.
  • the present invention solves the problems inherent in the prior art approach of synthesizing a sentence word by word under the control of a controller by incorporating the language specific information in the same database as the word information.
  • the application program being run by the master controller need only issue a command to synthesize a desired sentence which is identified by its index number.
  • Control is then passed to the sentence synthesizer which retrieves the sentence content data and parses through it to carry out the process of synthesizing the sentence
  • because the language specific information is expressed in terms of the sentence content, i.e., as data, it can be stored in a standard memory device instead of being expressed as code which runs on the controller. This reduces the complexity of the application program while also reducing system costs and increasing the flexibility of the system.
  • because the controller need issue only a single command in order to synthesize an entire sentence (as opposed to currently available systems in which the controller must issue a command for each word to be synthesized), the controller can be used to perform or monitor other system functions during the synthesis process.
  • controller 102 can issue a command to sentence synthesizer 105 to select a different portion of database 108 to use when retrieving sentence and word data, or database 108 may be replaced by a different memory device which contains the sentence content and words needed for the new language. Because the sentence content for the new language contains all of the language specific information required when synthesizing a sentence in that language, the application program being executed by controller 102 does not have to be changed.
  • Fig. 5 is a block diagram of a telephone answering machine which incorporates the speech synthesizer chip of the present invention
  • Fig. 5 represents a typical application of the present invention, and is only one example of many environments in which the present invention may be utilized to provide efficient multi-language speech synthesis capabilities
  • In order to retrieve messages from a telephone answering machine, a user depresses a key on keypad 401. System controller 102 decodes which key has been depressed and translates the keystroke into an action to be implemented by the speech synthesizer. If, for example, the action is to announce the current time, system controller 102 will issue a command to module interface 402 to synthesize a particular sentence from sentence database 110, which is part of language database 108. The sentence to be synthesized is identified by its index number, n. Module interface 402 sends the sentence index to sentence and word synthesizer 104.
  • As previously described with reference to Figs. 3-4, sentence synthesizer module 105 of synthesizer 104 will retrieve the sentence definition corresponding to the sentence with index n from sentence section 110 of database 108, decode its content, and convert it to a series of words to be synthesized or control terms, where the words may include the steps of converting numbers into the equivalent words
  • the decoded words which are to be spoken are passed to the word synthesizer 106 module of synthesizer 104, along with instructions to synthesize those words. Word synthesizer 106 retrieves the desired words from the word section 109 of database 108.
  • Codec module 403 is under the control of module interface 402 and is responsible for performing the digital-to-analog and analog-to-digital conversion functions required by the system. Codec module 403 converts the decompressed digital samples to analog signals which are then produced as audible speech by means of a loudspeaker 408. If desired, the sentence can also be displayed visually by means of a display 409.
  • If a request to the answering machine is provided by means of a signal transmitted over a telephone line instead of keypad 401, that request enters the system via telephone line interface 404
  • the incoming signal is passed by interface 404 to analog multiplexer 405 which controls the input, output, and processing of analog signals
  • Analog multiplexer 405 sends the signal to codec module 403 which reads the signal and converts it to digital form
  • a digital signal processing (DSP) and systems function module 406 decodes the signal read by codec module 403 and determines if the decoded signal corresponds to an instruction the system is designed to recognize. If so, module interface 402 informs system controller 102 what the instruction or digit represented by the incoming signal is, and controller 102 then implements that instruction as previously described.
  • DSP and systems functions module 406 can also perform other functions such as voice compression and decompression, tone generation and detection, real-time clock generation, memory management, etc.
  • most telephone answering systems include a microphone 407 for recording messages, and a loudspeaker 408 for playing back messages and the synthesized speech
  • analog multiplexer 405 controls the input/output functions which cause analog signals to enter the other system modules or cause analog signals to be produced by the system
  • a display 409 may also be included to visually display system information or messages to the user
  • the sentence synthesizer module 105 can be modified depending upon the application by altering the set of variables which are recognized automatically and the set of grammar rules
  • the set of variables may be expanded to account for a larger set of numbers, while the grammar rules, which are usually encoded as control terms in a sentence, can be changed to permit the synthesis of other languages or of additional aspects of the same language
  • Such modifications allow the speech synthesizer system of the present invention to more efficiently adapt to new uses or markets in which a product incorporating the system will be sold
  • the structure of the sentence database data table can be expressed as two columns: a first column containing an index number for the line of sentence data words contained in the second column, and a corresponding line of data words which define the contents of the sentence
  • the line of data words can be expressed as a sequence of numbers in a fixed-size binary representation. Each number corresponds to an index for an entry representing the sampled spoken form of a particular word stored in word section 109 of database 108.
  • as sentence synthesizer 105 reads each data word in the line of data words retrieved from sentence section 110, it either instructs word synthesizer 106 to retrieve a particular word from word section 109 of database 108 and produce that word as spoken speech, implements a conditional test or other instruction defined by a control word, or points to a designated number table containing the word indices for the spoken word equivalents of a variable in the sentence. If a number is spoken differently depending upon the context (as in the number one being spoken as "one" or "first"), a different number table should be constructed for each context.
  • the entries in a number table represent the index numbers for the words contained in word section 109 which correspond to the spoken words for the number to be synthesized
  • Control words can be used to determine whether the word "AM” or "PM” should be used in a time announcement, whether a singular or plural term should be used (and point to the appropriate word in the word section of the database), select the proper day of the week to announce, etc.
  • the option codes can be used to select the appropriate number table, determine whether the time is announced in 12 or 24 hour format, or perform other functions which involve synthesizing words for numbers.
  • the data representing the various indices and the digital data representing the spoken words is burned into a ROM
  • a memory device can store both the data representing the word samples for each word to be spoken, and the various links which allow sentence synthesizer 105 to control how a sentence is produced. If it is desired to have the synthesizer be able to produce speech in more than one language, a different ROM or section of an existing ROM should be used to store those words.
  • the database structure and design of the sentence synthesizer of the present invention permit multiple languages to be produced by a controller running a single application program
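
The per-context number tables mentioned in the list above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `WORD_SECTION`, `NUMBER_TABLES` and `words_for_number` are hypothetical, the table contents are invented for the example, and text strings stand in for the sampled speech that word section 109 would actually hold.

```python
# Hypothetical stand-in for word section 109: position in the list is the word index.
# A real device would store sampled speech here, not text.
WORD_SECTION = ["one", "two", "twenty", "first", "second"]

# One number table per context (assumed contexts: cardinal and ordinal).
# Each entry maps a number to the word-section indices needed to speak it.
NUMBER_TABLES = {
    "cardinal": {1: [0], 2: [1], 21: [2, 0]},
    "ordinal": {1: [3], 2: [4]},
}


def words_for_number(value, context="cardinal"):
    """Look up `value` in the table for `context` and return the words to speak."""
    table = NUMBER_TABLES[context]
    return [WORD_SECTION[index] for index in table[value]]
```

With these tables, `words_for_number(21)` yields the two words for "twenty-one", while `words_for_number(1, "ordinal")` selects "first" instead of "one". Supporting another context means adding a table, not changing the lookup code.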

Abstract

An apparatus and method for synthesizing a finite set of sentences and numbers in one of several languages using an application program which is independent of the language being synthesized. The invention includes a system controller which communicates with a sentence and word synthesizer by means of a communication link. The sentence and word synthesizer responds to instructions from the controller by accessing a vocabulary and sentence database which contains all of the language dependent information found in an application program contained in the controller in standard implementations of speech synthesizers. The language dependent information such as grammar rules, etc., is encoded in a language independent format in the database. Therefore, the application program can be written in a form which is independent of the specific language to be synthesized. If it is desired to synthesize a different language, only that portion of the database containing the language specific grammar rules, sentence structure, etc. need be replaced instead of adding new code to the application program. Thus, by instructing the speech synthesizer to access a different database or by changing the memory containing the database a new language can be synthesized. This simplifies the process of synthesizing speech in multiple languages and reduces the development cost for such a device and the complexity of the controller.

Description

SPEECH SYNTHESIS APPARATUS AND METHOD FOR SYNTHESIZING A FINITE SET OF SENTENCES AND NUMBERS USING ONE PROGRAM
TECHNICAL FIELD
The present invention relates to techniques for synthesizing speech for use in data processing systems, telephone answering machines, and other devices, and more specifically, to an apparatus and method capable of synthesizing speech in multiple languages using a single application program.
BACKGROUND OF THE INVENTION
Synthesized speech is used in many electronic devices as part of the user interface to enable a user to interact with or obtain information from the device. Such devices typically contain a speech synthesizer chip which consists of a processor having speech synthesis capability. The synthesized speech may be output through any one of several mediums, e.g., audio voice synthesis, Morse code, message display, etc. The speech synthesizer chip may be separate from the other functional units of the device, or it may be incorporated with additional functions such as memory, digital signal processing, timers, etc. As shown in Fig. 1, a typical speech synthesis chip 1 contains a system controller 10 which is linked to a word synthesizer 12 by means of a communication link 14. Word synthesizer 12 accesses vocabulary database 16 in order to retrieve word data needed to construct sentences in response to instructions issued by controller 10. Vocabulary database 16 stores the words or groups of words used to synthesize the sentences requested by controller 10 in a non-volatile memory.
Controller 10 typically contains an application program stored in a read-only memory (ROM), with the program being designed for the specific application for which the synthesized words are required.
The application program includes routines written for each sentence which the speech synthesis chip 1 is expected to produce for the desired application. Each routine generates a desired sentence by causing controller 10 to issue a set of commands to word synthesizer 12, where each command causes a word or group of words in that sentence to be synthesized. The grammar rules, word order structure, and rules for constructing numbers (among other characteristics) specific to a particular language are embedded in the application program and are reflected in the order and types of commands which the program causes controller 10 to issue.
Because different languages have different structural characteristics (grammar rules, etc.), it is very difficult to design a speech synthesis device which is capable of synthesizing speech in multiple languages, or which is capable of synthesizing speech in one of a selected set of languages depending upon the need. Present systems use a different application program for each intended language, so that the languages whose speech is to be synthesized and the sentences which will be produced must be identified prior to developing the system. In addition, the application programs for each language must be designed and tested prior to production of the speech synthesis chip, leading to a lengthy development cycle. This is a result of the need to include the application programs for all languages which are expected to be synthesized in the controller at the time a speech synthesis chip is being designed. Note that the language specific routines can also be produced using a context free grammar tool, thereby reducing the amount of code and/or memory required.
Even if a speech synthesis chip is designed with a multi-language capability, if synthesis of a new language is required, or if new sentences need to be synthesized in an existing language, new application program code is necessary. Replacement of the controller ROM and alterations to the vocabulary database are also necessary in this situation. This reduces the flexibility of the speech synthesis chip and makes the use of controllers containing masked ROM impractical. Another problem encountered with currently available speech synthesis chips is that in some languages, the manner in which numbers and certain words are pronounced depends upon the context in which they are used. For instance, a given number may be pronounced in different ways or represented by different words in the same sentence. In order to account for this situation, an application program needs to be able to recognize different contexts and determine the appropriate word or pronunciation to be used, a capability lacking in most word-by-word speech synthesizers. Even if available, this capability further increases the complexity of the program and the load on the communication link connecting the controller and word synthesizer.
What is desired is a method of synthesizing a finite set of sentences and numbers in an arbitrary language in a manner which does not require that a new application program be written for each language. It is also desired to have a speech synthesis chip which implements the above method.
SUMMARY OF THE INVENTION
The present invention is directed to an apparatus and method for synthesizing a finite set of sentences and numbers in one of several languages using an application program which is independent of the language being synthesized. The invention includes a system controller which communicates with a sentence and word synthesizer by means of a communication link. The sentence and word synthesizer responds to instructions from the controller by accessing a vocabulary and sentence database which contains all of the language specific information usually found in a controller resident application program in standard implementations of speech synthesizers. The language specific information is encoded in a language independent format in the database. Therefore, the application program can be written in a form which is independent of the language to be synthesized. The database contains all of the language specific information, and its contents are retrieved by an indexing system which assigns an index number to each sentence. The application program causes the controller to issue a command to retrieve a desired sentence by using its index number, where the command includes information regarding the specific data needed to produce the desired sentence. Each sentence is constructed as a set of words, variables, and control terms. The words are fixed entries and the variables are typically numbers. The control terms act to control the operation of the sentence synthesizer and determine the structure of the sentence being synthesized. For example, they may determine whether the singular or plural form of a word is appropriate, or act to produce the proper pronunciation of a number depending upon its context.
In operation, the controller issues a command instructing the sentence synthesizer to produce a sentence having a prescribed index number. The command includes the values of any variables needed to complete the sentence. The sentence synthesizer retrieves the sentence content from the database and then implements the sentence according to the words, control terms, and variables contained in it. Each data word in the sentence is read by a word decoder which determines if the data word is a word, variable, or control term. For each word to be synthesized, the sentence synthesizer instructs a word synthesizer to retrieve that word from the database and produce it in spoken form. For each variable, the sentence synthesizer points to a data table which contains the spoken word equivalents of the number or numbers to be produced by the speech synthesizer. The data table points to the entries in the word database corresponding to the words needed to produce the spoken number. These words are then retrieved and produced as speech by the action of the word synthesizer. The control terms are interpreted by the sentence synthesizer as commands to carry out operations which implement the grammar rules, contextual checking, etc. of the language and thereby determine the final sentence structure.
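
The decoding loop summarized above can be sketched in Python as follows. This is a minimal illustration, not the encoding used by the invention: the `Kind` tags, the tuple layout of each data word, and the table contents are all assumptions made for the example. Text strings stand in for sampled speech, and the number table maps directly to words rather than to word indices to keep the sketch short.

```python
from enum import Enum


class Kind(Enum):
    """Assumed tags for the three kinds of data word, plus an end marker."""
    WORD = 1      # operand: index of a fixed entry in the word section
    VARIABLE = 2  # operand: position of the command argument to speak as a number
    PLURAL = 3    # control term; operand: position of the argument to test
    END = 4       # end of the sentence data


# Hypothetical word section; "messages" sits one index after "message".
WORDS = ["you have", "message", "messages"]

# Hypothetical number table (simplified to hold words instead of word indices).
NUMBER_WORDS = {1: ["one"], 2: ["two"], 21: ["twenty", "one"]}

# Hypothetical sentence section: sentence index -> line of data words.
SENTENCES = {
    0: [(Kind.WORD, 0), (Kind.VARIABLE, 0), (Kind.PLURAL, 0),
        (Kind.WORD, 1), (Kind.END, 0)],
}


def synthesize_sentence(n, *args):
    """Decode sentence n one data word at a time and return the spoken text."""
    spoken = []
    offset = 0  # set by the plural control term, consumed by the next WORD
    for kind, operand in SENTENCES[n]:
        if kind is Kind.END:
            break
        if kind is Kind.WORD:
            spoken.append(WORDS[operand + offset])
            offset = 0
        elif kind is Kind.VARIABLE:
            spoken.extend(NUMBER_WORDS[args[operand]])
        elif kind is Kind.PLURAL:
            # Plural argument: advance the next word index by one.
            offset = 1 if args[operand] != 1 else 0
    return " ".join(spoken)
```

Calling `synthesize_sentence(0, 21)` walks the data words and produces "you have twenty one messages", while `synthesize_sentence(0, 1)` produces "you have one message". The caller supplies only the sentence index and the argument; the singular/plural decision lives entirely in the sentence data.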
If it is desired to synthesize a different language or to alter the sentences which can be synthesized, only that portion of the database containing the language specific grammar rules, sentence structure, etc. needs to be replaced. This is a more efficient means of expanding the capabilities of the speech synthesis chip than adding new code to the application program. By instructing the speech synthesizer to access a different database, or by changing the memory containing the database, a new language can be synthesized. This simplifies the process of synthesizing speech in multiple languages and reduces the development costs for such a device and the complexity of the controller.
Further objects and advantages of the present invention will become apparent from the following detailed description and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of a typical speech synthesis chip.
Fig. 2 is a block diagram of a speech synthesis chip constructed according to the present invention.
Fig. 3 is a flowchart showing the operation of the sentence synthesizer module of the present invention.
Fig. 4 shows how a simple sentence is constructed and synthesized by the speech synthesis chip of the present invention.
Fig. 5 is a block diagram of a telephone answering machine which incorporates the speech synthesizer chip of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Fig. 2 is a block diagram of a speech synthesis chip 100 constructed according to the present invention. Speech chip 100 includes a system controller 102 which communicates with a sentence and word synthesizer 104 via a communication link 103. System controller 102 can take the form of a separate processor which interacts with synthesizer 104 via communication link 103 in a master/slave type of architecture, or controller 102 can be a separate software module running on the same processor as synthesizer 104. In the latter situation, communication between controller 102 and synthesizer 104 occurs via the internal registers of the processor or by means of a variable in memory.
Synthesizer 104 accesses vocabulary and sentence database 108 in order to construct synthesized speech sentences in response to commands issued by controller 102. Database 108 is typically separated into two sections, a vocabulary or word section 109 and a sentence section 110. Database 108 contains the words, grammar rules, numbers, and contextual information needed for synthesizer 104 to synthesize sentences in response to commands from controller 102. Synthesizer 104 typically contains two modules: a sentence synthesizer 105 and a word synthesizer 106. Sentence synthesizer 105 acts to control the production of a desired sentence by interpreting the data retrieved from database 108 in response to a command from controller 102 to synthesize a particular sentence. Word synthesizer 106 acts to synthesize specific words in response to commands from sentence synthesizer 105.
Database 108 contains all of the language specific information needed to synthesize any of the set of sentences which system 100 is capable of synthesizing. This is accomplished by use of a data structure which includes the language specific information in the definition of the sentence. Thus, as will be described in greater detail later, when controller 102 issues a command to synthesize a particular sentence by providing its index, sentence synthesizer 105 retrieves that sentence structure from sentence section 110 of database 108, where the sentence structure contains all of the grammar and contextual rules of the language being synthesized. This significantly reduces the complexity of the application program which is resident in controller 102, and makes the speech synthesis system more flexible and capable of being used to synthesize multiple languages.
Fig. 3 is a flowchart showing the operation of the sentence synthesizer 105 module of the present invention. Sentence synthesizer 105 receives an instruction from controller 102 to synthesize sentence (n), where n represents the index of the sentence to be produced (box 200). In response, a pointer is set to the sentence with index (n) (box 210) in the sentence section 110 of database 108. The sentence content is retrieved from database 108 and the data is read one data word at a time by a word decoder contained in sentence synthesizer 105 (box 220). A test is then performed to determine if the data word which has been read by the decoder is an end marker, signifying the end of the sentence data (box 230). If the data word is an end marker, the program ends (box 250). If the data word is not an end marker, the character of the data word determines whether a number is constructed by means of a data table, a word is synthesized by word synthesizer 106, or a control term which modifies the final sentence structure is implemented (box 240).
As an example of the operation of speech synthesis chip 100 of the present invention, the process of synthesizing a sentence produced by a telephone answering machine will be described. The example sentence is "You have 21 messages". Fig. 4 shows how this simple sentence is constructed and synthesized by the speech synthesis chip of the present invention.
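By way of illustration only, the flow of Fig. 3 may be sketched in Python as follows. The tag names (WORD, NUMBER, CONTROL, END), the tuple layout of a data word, and the handler callbacks are assumptions made for this sketch; the description does not specify how data words are encoded or how the decoder dispatches them.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tags for the kinds of data word (not specified in the description).
WORD, NUMBER, CONTROL, END = "word", "number", "control", "end"

DataWord = Tuple[str, int]


def run_sentence(
    sentence_section: Dict[int, List[DataWord]],
    n: int,
    handlers: Dict[str, Callable[[int], None]],
) -> None:
    """Process sentence n one data word at a time until the end marker."""
    data_words = sentence_section[n]      # box 210: point at sentence n
    for kind, value in data_words:        # box 220: read the next data word
        if kind == END:                   # box 230: end-marker test
            return                        # box 250: end
        handlers[kind](value)             # box 240: word, number, or control term
    raise ValueError("sentence data has no end marker")


# Record what the decoder would ask the other modules to do.
trace: List[str] = []
handlers = {
    WORD: lambda v: trace.append(f"word:{v}"),
    NUMBER: lambda v: trace.append(f"number-table:{v}"),
    CONTROL: lambda v: trace.append(f"control:{v}"),
}
sentence_section = {7: [(WORD, 0), (NUMBER, 1), (CONTROL, 2), (END, 0)]}
run_sentence(sentence_section, 7, handlers)
print(trace)
```

In this sketch the handlers only record the requested operations; in the system described, they would correspond to instructions to word synthesizer 106, a number table lookup, and the implementation of a control term.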
Controller 102 issues a command to synthesizer 104 via communication link 103. The command is of the form "synthesize sentence (n, x1, x2, x3, ...)", where n is a number corresponding to the sentence index, and x1, x2, x3, etc. represent values of the arguments or variables to be inserted into the sentence structure. In response to this command, synthesizer 104 accesses database 108, using a pointer to retrieve the sentence corresponding to index (n) from the sentence database portion 110 of database 108.
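As a minimal sketch of how such a command might be carried over communication link 103, the Python fragment below packs the sentence index and its arguments into a short byte frame. The frame layout (index, argument count, then one byte per argument) and the function names are hypothetical; the description states only the logical form of the command, not its encoding on the link.

```python
from typing import Tuple


def encode_command(sentence_index: int, *arguments: int) -> bytes:
    """Pack "synthesize sentence (n, x1, x2, ...)" as: n, argument count, x1, x2, ..."""
    frame = [sentence_index, len(arguments), *arguments]
    if any(not 0 <= field <= 255 for field in frame):
        raise ValueError("this sketch only carries one-byte fields")
    return bytes(frame)


def decode_command(frame: bytes) -> Tuple[int, Tuple[int, ...]]:
    """Recover (n, (x1, x2, ...)) from a frame built by encode_command."""
    if len(frame) < 2 or len(frame) != 2 + frame[1]:
        raise ValueError("malformed command frame")
    return frame[0], tuple(frame[2:])


frame = encode_command(7, 21)   # hypothetical index 7, one argument: 21 messages
print(decode_command(frame))
```

The point of the sketch is that the whole request fits in a single command, so the controller does not need to issue anything further while the sentence is being produced.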
As shown in Fig. 4, a sentence contained in database 108 is composed of data words representing three types of objects: words, variables, and control terms. The words are fixed entries ("You have", etc. in the example sentence) for the invariant parts of the sentence. The sentence structure in database 108 contains pointers for the words to be synthesized which direct the word synthesizer portion 106 of synthesizer 104 to retrieve those word(s) from the word section 109 of database 108 and then synthesize them. The variables or arguments correspond to portions of the sentence which change with the situation in which the sentence is being synthesized. They are usually numerals, and the sentence structure contains a pointer to a numeral decoder or table 300 which translates the number (in this case 21) to its corresponding words ("twenty-one"). The control terms are instructions which cause the synthesizer to check for a particular condition, such as the existence of a plural argument. If the condition is satisfied, the index to the next word to be synthesized by the word synthesizer is automatically incremented, resulting in the production of the next word in word section 109 of database 108. A more detailed description of this process is given below.
After retrieval of the appropriate sentence, a word decoder reads each data word from the sentence, where the data words correspond to the words, variables, and control terms previously described. If the data word corresponds to a word or word group, that word or word group is retrieved from the vocabulary or word section 109 of database 108 and then is synthesized by word synthesizer 106. Sentence synthesizer 105 then reads the next data word, which in the case of the example of Fig. 4 is an instruction to go to table 1 to retrieve a number. The instruction to go to table 1 can, if necessary, be followed by a logic step which determines the context in which the number is being used in the sentence so that the appropriate spoken form of the number will be synthesized. This logic step is important in languages such as German, in which the form of a number (the actual words used to express that number) depends upon the context in which the number is being used. In Fig. 4, this context determining logic is represented by a context selector (box 310). The word following the instruction to go to table 1 is read next and provides the argument for the variable in the sentence, in this case the number of messages. Based on this argument and the results of the context selector logic, the appropriate entry in table 1 or another data table is located. A pointer or pointers from that entry indicates the words in word section 109 of database 108 which correspond to the argument needed for the sentence. This is followed by an instruction to word synthesizer 106 to synthesize those words.
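The number lookup just described may be pictured with the following Python sketch. The table contents, the context names "cardinal" and "ordinal", and the function name are assumptions chosen for illustration; only the principle (a context selects a table, and a table entry holds indices into the word section) is taken from the description.

```python
from typing import Dict, List

# A tiny stand-in for word section 109; each index identifies one stored word.
WORD_SECTION: List[str] = ["one", "twenty", "first"]

# One number table per context. Each entry lists word-section indices,
# so both tables reuse the same stored word "twenty".
NUMBER_TABLES: Dict[str, Dict[int, List[int]]] = {
    "cardinal": {1: [0], 21: [1, 0]},
    "ordinal": {1: [2], 21: [1, 2]},
}


def number_to_words(argument: int, context: str = "cardinal") -> List[str]:
    """Select the table for the context, then follow its pointers into the word section."""
    table = NUMBER_TABLES[context]          # role of the context selector
    return [WORD_SECTION[i] for i in table[argument]]


print(number_to_words(21))             # cardinal context
print(number_to_words(21, "ordinal"))  # ordinal context
```

Because the tables hold only indices, supporting a further context in this sketch means adding a table, not adding code.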
Sentence synthesizer 105 then reads the next data word, which in this case is a control term or instruction to check if the argument is singular or plural. If the argument is singular, the word "message" is retrieved from the word section 109 of database 108 and is then spoken by word synthesizer 106. If the argument is plural, then sentence synthesizer 105 increments the word index by one, thereby causing the word "messages" to be retrieved and synthesized.
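The singular/plural control term can be sketched in a few lines of Python. The word list, the index values, and the rule that only an argument of 1 counts as singular are assumptions for this example; the description specifies only that the plural form is stored at the next word index.

```python
from typing import List

# Stand-in for word section 109: the plural form directly follows the singular.
WORD_SECTION: List[str] = ["you have", "message", "messages"]
MESSAGE_INDEX = 1  # hypothetical index of the singular form


def noun_index(base_index: int, argument: int) -> int:
    """Return the singular index, or the next index when the argument is plural."""
    return base_index if argument == 1 else base_index + 1


print(WORD_SECTION[noun_index(MESSAGE_INDEX, 1)])
print(WORD_SECTION[noun_index(MESSAGE_INDEX, 21)])
```

Storing the two forms at adjacent indices is what lets the control term be implemented as a simple increment.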
As can be understood from the previous example, sentence synthesizer 105 of the present invention performs the processing steps necessary to retrieve the sentence to be synthesized, parse through the data words which comprise the content of that sentence, and control the synthesizing of each of the words or variables in that sentence. In this way, the complete sentence is synthesized by a sequence of logic steps and instructions to retrieve words from the word section 109 of database 108 and then synthesize those words.
The present invention solves the problems inherent in the prior art approach of synthesizing a sentence word by word under the control of a controller by incorporating the language specific information in the same database as the word information. The application program being run by the master controller need only issue a command to synthesize a desired sentence which is identified by its index number. Control is then passed to the sentence synthesizer, which retrieves the sentence content data and parses through it to carry out the process of synthesizing the sentence. Because the language specific information is expressed in terms of the sentence content, i.e., as data, it can be stored in a standard memory device instead of being expressed as code which runs on the controller. This reduces the complexity of the application program while also reducing system costs and increasing the flexibility of the system. In addition, because the controller need issue only a single command in order to synthesize an entire sentence (as opposed to currently available systems in which the controller must issue a command for each word to be synthesized), the controller can be used to perform or monitor other system functions during the synthesis process.
In order to synthesize a sentence in a different language, controller 102 can issue a command to sentence synthesizer 105 to select a different portion of database 108 to use when retrieving sentence and word data, or database 108 may be replaced by a different memory device which contains the sentence content and words needed for the new language. Because the sentence content for the new language contains all of the language specific information required when synthesizing a sentence in that language, the application program being executed by controller 102 does not have to be changed.
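The following Python sketch illustrates this point under stated assumptions: two small databases share the same sentence index, and the application function is identical for both. The dictionary layout, the control-term names, and the plural rules (English pluralizes every count except one; the French data here pluralizes only from two upward) are illustrative choices for the example, not details given in the description.

```python
from typing import List

ENGLISH = {
    "words": ["you have", "message", "messages", "zero", "one", "two"],
    "numbers": {0: [3], 1: [4], 2: [5]},
    # Control term: plural for every count except 1.
    "sentences": {0: [("word", 0), ("number", 0), ("plural_unless_one", 1)]},
}
FRENCH = {
    "words": ["vous avez", "message", "messages", "zéro", "un", "deux"],
    "numbers": {0: [3], 1: [4], 2: [5]},
    # Control term: plural only from 2 upward.
    "sentences": {0: [("word", 0), ("number", 0), ("plural_from_two", 1)]},
}


def synthesize(database: dict, n: int, *args: int) -> str:
    """Interpret sentence n of the selected database; text stands in for speech."""
    words = database["words"]
    spoken: List[str] = []
    for kind, value in database["sentences"][n]:
        if kind == "word":
            spoken.append(words[value])
        elif kind == "number":          # value = position of the argument to speak
            spoken.extend(words[i] for i in database["numbers"][args[value]])
        elif kind == "plural_unless_one":
            spoken.append(words[value + (0 if args[0] == 1 else 1)])
        elif kind == "plural_from_two":
            spoken.append(words[value + (1 if args[0] >= 2 else 0)])
        else:
            raise ValueError(f"unknown data word kind: {kind}")
    return " ".join(spoken)


def announce_message_count(database: dict, count: int) -> str:
    """Application code: the same sentence index is used for every language."""
    return synthesize(database, 0, count)


print(announce_message_count(ENGLISH, 0))
print(announce_message_count(FRENCH, 0))
```

Only the database passed in differs between the two calls; the grammar difference is carried entirely by the control term stored in the sentence data.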
As noted, support for synthesizing numbers is provided through the use of data tables which contain the sequence of words to produce in order to synthesize a given number, when that number is spoken in a particular context. Different contexts are supported by using different tables, with pointers from context selector 310 indicating which table to use. Because the selection of the appropriate context is determined by the control terms within the sentence data, different contexts may be used within the same sentence.
A typical speech synthesis system constructed according to the present invention is shown in Fig. 5, which is a block diagram of a telephone answering machine which incorporates the speech synthesizer chip of the present invention. Although Fig. 5 represents a typical application of the present invention, it is only one example of many environments in which the present invention may be utilized to provide efficient multi-language speech synthesis capabilities.
In order to retrieve messages from a telephone answering machine, a user depresses a key on keypad 401. System controller 102 decodes which key has been depressed and translates the keystroke into an action to be implemented by the speech synthesizer. If, for example, the action is to announce the current time, system controller 102 will issue a command to module interface 402 to synthesize a particular sentence from sentence database 110, which is part of language database 108. The sentence to be synthesized is identified by its index number, n. Module interface 402 sends the sentence index to sentence and word synthesizer 104. As previously described with reference to Figs. 3-4, sentence synthesizer module 105 of synthesizer 104 will retrieve the sentence definition corresponding to the sentence with index n from sentence section 110 of database 108, decode its content, and convert it to a series of words to be synthesized or control terms, where this conversion may include the step of converting numbers into the equivalent words. The decoded words which are to be spoken are passed to the word synthesizer 106 module of synthesizer 104, along with instructions to synthesize those words. Word synthesizer 106 retrieves the desired words from the word section 109 of database 108, where they may be stored as compressed digitized samples of previously recorded speech. Word synthesizer 106 then acts to decompress the data into a series of samples, and sends the decompressed samples to a codec module 403. Codec module 403 is under the control of module interface 402 and is responsible for performing the digital-to-analog and analog-to-digital conversion functions required by the system. Codec module 403 converts the decompressed digital samples to analog signals which are then produced as audible speech by means of a loudspeaker 408. If desired, the sentence can also be displayed visually by means of a display 409.
If a request to the answering machine is provided by means of a signal transmitted over a telephone line instead of keypad 401, that request enters the system via telephone line interface 404. The incoming signal is passed by interface 404 to analog multiplexer 405, which controls the input, output, and processing of analog signals. Analog multiplexer 405 sends the signal to codec module 403, which reads the signal and converts it to digital form. A digital signal processing (DSP) and systems function module 406 decodes the signal read by codec module 403 and determines if the decoded signal corresponds to an instruction the system is designed to recognize. If so, module interface 402 informs system controller 102 what the instruction or digit represented by the incoming signal is, and controller 102 then implements that instruction as previously described.
In more complicated systems, DSP and systems functions module 406 can also perform other functions such as voice compression and decompression, tone generation and detection, real-time clock generation, memory management, etc. In addition, most telephone answering systems include a microphone 407 for recording messages, and a loudspeaker 408 for playing back messages and the synthesized speech. As mentioned previously, analog multiplexer 405 controls the input/output functions which cause analog signals to enter the other system modules or cause analog signals to be produced by the system. A display 409 may also be included to visually display system information or messages to the user.
The sentence synthesizer module 105 can be modified depending upon the application by altering the set of variables which are recognized automatically and the set of grammar rules. The set of variables may be expanded to account for a larger set of numbers, while the grammar rules, which are usually encoded as control terms in a sentence, can be changed to permit the synthesis of other languages or of additional aspects of the same language. Such modifications allow the speech synthesizer system of the present invention to more efficiently adapt to new uses or markets in which a product incorporating the system will be sold.
As is evident from the foregoing description of the present invention, creation of database 108 is an important part of the invention, specifically the structure and contents of sentence section 110 of database 108. As previously discussed, the process of synthesizing a sentence is begun by controller 102 issuing a command to sentence synthesizer 105 to "synthesize sentence (n)", where (n) is the index of the desired sentence. Sentence synthesizer 105 responds to this command by accessing sentence section 110 of database 108 and retrieving the sentence data corresponding to the index (n). Sentence section 110 may be organized as a data table, with each entry being identified by its respective index.
The structure of the sentence database data table can be expressed as two columns: a first column containing an index number for the line of sentence data words contained in the second column, and a second column containing the corresponding line of data words which define the contents of the sentence. The line of data words can be expressed as a sequence of numbers in a fixed-size binary representation. Each number corresponds to an index for an entry representing the sampled spoken form of a particular word stored in word section 109 of database 108, or to a control word which affects the structure of the sentence, or to an option code which selects the number table to use when generating the spoken word equivalents of a given number. Note that if certain words are pronounced differently depending upon the context, the different versions of the word should be stored as different entries in word section 109.
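One possible fixed-size binary representation is sketched below in Python. The 16-bit width, the use of the top two bits as a kind tag, and the numeric tag values are assumptions made purely for illustration; the description says only that each data word is a number of fixed size.

```python
import struct
from typing import List, Tuple

# Hypothetical 2-bit kind tags stored in the top bits of each 16-bit data word.
KIND_WORD, KIND_OPTION, KIND_CONTROL, KIND_END = 0, 1, 2, 3
VALUE_BITS = 14
VALUE_MASK = (1 << VALUE_BITS) - 1


def encode_line(items: List[Tuple[int, int]]) -> bytes:
    """Pack (kind, value) pairs into consecutive little-endian 16-bit data words."""
    blob = bytearray()
    for kind, value in items:
        if not 0 <= kind <= 3 or not 0 <= value <= VALUE_MASK:
            raise ValueError("kind or value out of range for a 16-bit data word")
        blob += struct.pack("<H", (kind << VALUE_BITS) | value)
    return bytes(blob)


def decode_line(blob: bytes) -> List[Tuple[int, int]]:
    """Unpack data words up to and including the end marker."""
    items = []
    for (raw,) in struct.iter_unpack("<H", blob):
        kind, value = raw >> VALUE_BITS, raw & VALUE_MASK
        items.append((kind, value))
        if kind == KIND_END:
            break
    return items


# One row of the sentence table: index 7 -> its line of data words.
line = [(KIND_WORD, 0), (KIND_OPTION, 1), (KIND_CONTROL, 2), (KIND_END, 0)]
sentence_table = {7: encode_line(line)}
print(len(sentence_table[7]), decode_line(sentence_table[7]) == line)
```

A fixed width keeps the decoder trivial, since the next data word always starts a known number of bytes after the current one.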
As sentence synthesizer 105 reads each data word in the line of data words retrieved from sentence section 110, it either instructs word synthesizer 106 to retrieve a particular word from word section 109 of database 108 and produce that word as spoken speech, implements a conditional test or other instruction defined by a control word, or points to a designated number table containing the word indices for the spoken word equivalents of a variable in the sentence. If a number is spoken differently depending upon the context (as in the number one being spoken as "one" or "first"), a different number table should be constructed for each context. The entries in a number table represent the index numbers for the words contained in word section 109 which correspond to the spoken words for the number to be synthesized. Control words can be used to determine whether the word "AM" or "PM" should be used in a time announcement, whether a singular or plural term should be used (and point to the appropriate word in the word section of the database), select the proper day of the week to announce, etc. The option codes can be used to select the appropriate number table, determine whether the time is announced in 12 or 24 hour format, or perform other functions which involve synthesizing words for numbers.
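A time announcement combining a control word (AM or PM) with an option code (12 or 24 hour format) might look like the Python sketch below. The word list, the function names, and the use of digit strings in place of number-table lookups are simplifying assumptions for this example.

```python
from typing import List

# Stand-in for word section 109: "PM" is stored directly after "AM".
WORD_SECTION: List[str] = ["the time is", "AM", "PM"]
AM_INDEX = 1  # hypothetical index of "AM"


def spoken_hour(hour: int, twelve_hour: bool) -> int:
    """Option code: convert a 0-23 hour to 12-hour form when requested."""
    if not twelve_hour:
        return hour
    return hour % 12 or 12


def am_pm_index(hour: int) -> int:
    """Control word: select "AM" for hours 0-11, otherwise the next entry, "PM"."""
    return AM_INDEX + (1 if hour >= 12 else 0)


def announce_time(hour: int, minute: int, twelve_hour: bool = True) -> str:
    # Digits stand in for the words a number table would supply.
    parts = [WORD_SECTION[0], str(spoken_hour(hour, twelve_hour)), f"{minute:02d}"]
    if twelve_hour:
        parts.append(WORD_SECTION[am_pm_index(hour)])
    return " ".join(parts)


print(announce_time(15, 5))
print(announce_time(15, 5, twelve_hour=False))
```

As with the singular/plural case, the AM/PM choice reduces to selecting one of two adjacent word indices.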
After the structure of the database has been defined, the data representing the various indices and the digital data representing the spoken words is burned into a ROM. In this way, a memory device can store both the data representing the word samples for each word to be spoken, and the various links which allow sentence synthesizer 105 to control how a sentence is produced. If it is desired to have the synthesizer be able to produce speech in more than one language, a different ROM or section of an existing ROM should be used to store those words. Thus, the database structure and design of the sentence synthesizer of the present invention permit multiple languages to be produced by a controller running a single application program.
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding equivalents of the features shown and described, or portions thereof, it being recognized that various modifications are possible within the scope of the invention claimed.

Claims

We claim:
1. A method of synthesizing a sentence in a desired language using a speech synthesis system, comprising:
issuing a command to synthesize the sentence, wherein the command includes a sentence index which identifies the sentence;
retrieving a set of data corresponding to the identified sentence from a database by using the sentence index, wherein the set of data includes a string of data words corresponding to words and numerical variables contained in the sentence, and further wherein the data words can include control terms which incorporate grammar rules of the language being synthesized into the data words and determine the structure of the sentence being synthesized;
reading each data word in the string of data words;
producing synthesized speech corresponding to a word or numeral represented by the data word if the data word corresponds to a word or numeral in the sentence to be synthesized; and
implementing an action which affects the structure of the sentence if the data word is a control term, whereby a sentence may be efficiently synthesized in one of a given set of languages by retrieving the identified sentence from a part of the database containing data words for sentences in that language.
2. The speech synthesis method of claim 1, wherein the command to synthesize the sentence is issued by a controller which communicates with the sentence synthesizer by means of a communication link.
3. The speech synthesis method of claim 1, wherein the synthesized speech corresponding to a word is produced by issuing an instruction from the sentence synthesizer to a word synthesizer.
4. The speech synthesis method of claim 1, wherein the step of synthesizing a numeral further comprises:
retrieving a word or words which represent a number or numbers to be synthesized; and
issuing an instruction from the sentence synthesizer to a word synthesizer to synthesize that word or words.
5. The speech synthesis method of claim 1, wherein the control terms include instructions which determine the appropriate form for a word depending upon its context.
6. The speech synthesis method of claim 1, wherein the control terms include instructions which determine whether a word to be synthesized is singular or plural.
7. The speech synthesis method of claim 1, wherein the database contains sentences corresponding to a plurality of languages.
8. A speech synthesis system capable of efficiently synthesizing sentences in multiple languages, comprising:
a system controller which issues a single command to synthesize a particular sentence;
a speech synthesizer which receives the command issued by the controller, wherein the speech synthesizer includes a sentence synthesizer module which controls the production of the synthesized sentence and a word synthesizer module which produces synthesized speech corresponding to a desired word or group of words; and
a database which includes a sentence database and a word database and from which the sentence or word synthesizer retrieves a set of data corresponding to the sentence to be synthesized, wherein the set of data includes a string of data words corresponding to the words and numerical variables contained in the sentence, and further wherein the data words can include control terms which incorporate grammar rules of the language being synthesized into the data words and determine the structure of the sentence being synthesized.
9. The speech synthesis system of claim 8, wherein the database further comprises:
a data table containing a word or words representing a number or numbers to be synthesized, wherein an entry in the data table is retrieved by the sentence synthesizer when a data word is a numerical variable.
10. The speech synthesis system of claim 8, wherein the sentence synthesizer further comprises a word decoder which reads each data word in the string of data words and determines if that data word is a word, a numerical variable, or a control term.
11. The speech synthesis system of claim 8, wherein the database includes data words for more than one language.
12. The speech synthesis system of claim 11, wherein the data words for different languages are stored in different parts of the database.
13. The speech synthesis system of claim 12, further comprising:
control means for selecting the part of the database from which to retrieve the set of data corresponding to the sentence to be synthesized based on the language in which the sentence is to be synthesized.
14. The speech synthesis system of claim 8, further comprising:
a plurality of databases, wherein each database contains data words for a different language, and further wherein the sentence and word synthesizer selects from which database to retrieve the set of data corresponding to the sentence to be synthesized based on the language in which the sentence is to be synthesized.
PCT/US1995/013134 1994-10-14 1995-10-16 Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program WO1996012271A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP95937434A EP0734568A1 (en) 1994-10-14 1995-10-16 Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program
KR1019960703143A KR960706671A (en) 1994-10-14 1995-10-16 SPEECH SYNTHESIS APPARATUS AND METHOD FOR SYNTHESIZING A FINITE SET OF SENTENCES AND NUMBERS USING ONE PROGRAM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US32313694A 1994-10-14 1994-10-14
US08/323,136 1994-10-14

Publications (2)

Publication Number Publication Date
WO1996012271A1 (en) 1996-04-25
WO1996012271A9 (en) 1996-07-04

Family

ID=23257867

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1995/013134 WO1996012271A1 (en) 1994-10-14 1995-10-16 Speech synthesis apparatus and method for synthesizing a finite set of sentences and numbers using one program

Country Status (3)

Country Link
EP (1) EP0734568A1 (en)
KR (1) KR960706671A (en)
WO (1) WO1996012271A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997007499A2 (en) * 1995-08-14 1997-02-27 Philips Electronics N.V. A method and device for preparing and using diphones for multilingual text-to-speech generating
WO2001006489A1 (en) * 1999-07-21 2001-01-25 Lucent Technologies Inc. Improved text to speech conversion

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0251296A2 (en) * 1986-06-30 1988-01-07 Wang Laboratories Inc. Portable communication terminal for remote data query
EP0450533A2 (en) * 1990-03-31 1991-10-09 Gold Star Co. Ltd Speech synthesis by segmentation on linear formant transition region
EP0606520A2 (en) * 1993-01-15 1994-07-20 ALCATEL ITALIA S.p.A. Method of implementing intonation curves for vocal messages, and speech synthesis method and system using the same

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997007499A2 (en) * 1995-08-14 1997-02-27 Philips Electronics N.V. A method and device for preparing and using diphones for multilingual text-to-speech generating
WO1997007499A3 (en) * 1995-08-14 1997-04-03 Philips Electronics Nv A method and device for preparing and using diphones for multilingual text-to-speech generating
WO2001006489A1 (en) * 1999-07-21 2001-01-25 Lucent Technologies Inc. Improved text to speech conversion

Also Published As

Publication number Publication date
KR960706671A (en) 1996-12-09
EP0734568A1 (en) 1996-10-02

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): DE KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 1995937434

Country of ref document: EP

COP Corrected version of pamphlet

Free format text: PAGES 1-7,DESCRIPTION,REPLACED BY NEW PAGES 1-8;PAGES 8 AND 9,CLAIMS,REPLACED BY NEW PAGES 9 AND 10;PAGES 1/5-5/5,DRAWINGS,REPLACED BY NEW PAGES 1/4-4/4;DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1995937434

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 1995937434

Country of ref document: EP