EP0059880A2 - System for synthesizing speech from text
System for synthesizing speech from text
- Publication number
- EP0059880A2 (Application EP82101379A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- allophone
- code
- digital signals
- allophonic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Definitions
- This invention pertains to an electronic text-to-speech synthesizing system and to an electronic speech producing system which may be included as a component thereof. More particularly, this invention concerns a text-to-speech synthesizing system which receives digital code, such as ASCII, representative of characters, determines an allophonic code for each incoming character set, and sends that allophonic code to the speech producing system, which decodes the allophonic code and assigns pitch for synthesizing speech-like sound, with unlimited vocabulary, in a linear predictive coding speech synthesizer.
- Prior art techniques fall generally into two categories: waveform encoding and parameter encoding.
- Waveform encoding includes uncompressed digital data (pulse code modulation, PCM), delta modulation (DM), continuously variable slope delta modulation (CVSD) and a technique developed by Mozer (see U.S. Patent No. 4,214,125).
- Parameter encoding includes the channel vocoder, formant synthesis, and linear predictive coding (LPC).
- PCM involves converting a speech signal into digital information using an A/D converter.
- The digital information is stored in memory and played back through a D/A converter, followed by a low-pass filter, amplifier and speaker.
- The advantage of this approach is its simplicity.
- Both A/D converters and D/A converters are available and relatively inexpensive.
- The problem involved is the amount of data storage required. Assuming a maximum speech frequency of 4 kHz (and hence a sampling rate of 8 kHz), with each speech sample represented by 8 to 12 bits, one second of speech requires 64K to 96K bits of memory.
- DM is a technique for compressing the speech data by assuming that the analog speech signal is either increasing or decreasing in amplitude.
- The speech signal is sampled at a rate of approximately 64,000 times per second. Each sample is then compared to the estimated value of the previous sample. If the new sample is greater than the estimate, the slope of the signal generated by the model is positive. If not, the slope is negative.
- The magnitude of the slope is chosen to be at least as large as the maximum expected slope of the signal.
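- As a rough illustration of the loop just described, a minimal delta-modulation encoder and decoder might look like the following C sketch; the fixed step size and the function names are illustrative, not taken from the patent. At roughly 64,000 samples per second, this one-bit-per-sample scheme gives the 64K bits-per-second rate mentioned below, and CVSD differs only in adapting the step size.

```c
/* Minimal delta-modulation sketch: the encoder emits one bit per sample,
 * and both sides move a running estimate up or down by a fixed step whose
 * magnitude is at least as large as the steepest expected signal slope. */
#define DM_STEP 0.05   /* fixed slope magnitude per sample (illustrative) */

void dm_encode(const double *x, int n, unsigned char *bits)
{
    double estimate = 0.0;
    for (int i = 0; i < n; i++) {
        bits[i] = (x[i] > estimate);              /* 1 = positive slope */
        estimate += bits[i] ? DM_STEP : -DM_STEP; /* decoder tracks this */
    }
}

void dm_decode(const unsigned char *bits, int n, double *y)
{
    double estimate = 0.0;
    for (int i = 0; i < n; i++) {
        estimate += bits[i] ? DM_STEP : -DM_STEP;
        y[i] = estimate;                          /* staircase approximation */
    }
}
```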
- CVSD is an extension of DM in which the slope of the generated signal is allowed to vary.
- The data rate in DM is typically on the order of 64K bits per second; in CVSD it is approximately 16K-32K bits per second.
- The Mozer technique takes advantage of the periodicity of the voiced speech waveform and the perceptual insensitivity to the phase information of the speech signal. Compressing the information in the speech waveform requires phase-angle adjustment to obtain a time-symmetrical pitch waveform, which makes one half of the waveform redundant; half-period zeroing to eliminate relatively low-power segments of the waveform; digital compression using DM; and repetition of pitch periods to eliminate redundant (or similar) speech segments.
- The data rate of this technique is approximately 2.4K bits per second.
- In parameter encoding, speech characteristics other than the original speech waveform are used in the analysis and synthesis. These characteristics control the synthesis model to create an output speech signal similar to the original.
- The commonly used techniques attempt to describe the spectral response, the spectral peaks or the vocal tract.
- The channel vocoder has a bank of band-pass filters designed so that the frequency range of the speech signal can be divided into relatively narrow frequency ranges. After the signal has been divided into the narrow bands, the energy in each band is detected and stored.
- The speech signal is reproduced by a bank of narrow-band frequency generators, corresponding to the frequencies of the band-pass filters, controlled by pitch information extracted from the original speech signal.
- The signal amplitude of each of the frequency generators is determined by the energy of the original speech signal detected during the analysis.
- The data rate of the channel vocoder is typically on the order of 2.4K bits per second.
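- The analysis half of such a vocoder can be sketched as follows in C; the band layout, pole radius and sixteen-band count are assumptions for illustration, not values from the patent:

```c
#include <math.h>

#define NBANDS 16
#define FS     8000.0                      /* sample rate (illustrative) */

/* Split the signal with a bank of narrow band-pass filters (simple
 * two-pole resonators here) and detect the mean energy in each band. */
void band_energies(const double *x, int n, double energy[NBANDS])
{
    const double PI = 3.141592653589793;
    for (int b = 0; b < NBANDS; b++) {
        double fc = 200.0 + 220.0 * b;     /* band centre frequency, Hz */
        double r  = 0.98;                  /* pole radius: narrow band  */
        double c  = 2.0 * r * cos(2.0 * PI * fc / FS);
        double y1 = 0.0, y2 = 0.0, e = 0.0;
        for (int i = 0; i < n; i++) {
            double y = x[i] + c * y1 - r * r * y2;   /* resonator */
            y2 = y1; y1 = y;
            e += y * y;                    /* energy detection */
        }
        energy[b] = e / n;
    }
}
```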
- In formant synthesis, the short-time frequency spectrum is analyzed to the extent that the spectral shape can be recreated using the formant center frequencies, their bandwidths and the pitch period as inputs.
- Formants are the peaks in the envelope of the frequency spectrum.
- The data rate for formant synthesis is typically 500 bits per second.
- Linear predictive coding can best be described as a mathematical model of the human vocal tract.
- The parameters used to control the model represent the amount of energy delivered by the lungs (amplitude), the vibration of the vocal cords (pitch period and the voiced/unvoiced decision), and the shape of the vocal tract (reflection coefficients).
- LPC synthesis has been accomplished through computer simulation techniques. More recently, LPC synthesizers have been fabricated on semiconductor integrated circuit chips such as that described and claimed in United States Patent No. 4,209,836, entitled "Speech Synthesis Integrated Circuit Device" and assigned to the assignee of this invention.
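- The synthesis half of LPC can be sketched in a few lines of C. This is a generic tenth-order lattice driven by a pulse/noise excitation, shown only to make the roles of the parameters concrete; it is not the circuit of the '836 patent:

```c
#include <stdlib.h>

#define ORDER 10                 /* ten reflection coefficients */

typedef struct {
    double k[ORDER];             /* vocal-tract shape (reflection coeffs) */
    double b[ORDER + 1];         /* lattice delay line                    */
    double energy;               /* lung energy (amplitude)               */
    int    pitch;                /* pitch period in samples; 0 = unvoiced */
    int    phase;                /* samples since the last glottal pulse  */
} LpcSynth;

/* One output sample: a periodic pulse (voiced) or noise (unvoiced)
 * excitation is driven through the all-pole lattice filter. */
double lpc_sample(LpcSynth *s)
{
    double f;
    if (s->pitch > 0) {          /* voiced: pulse train at pitch period */
        f = (s->phase == 0) ? s->energy : 0.0;
        s->phase = (s->phase + 1) % s->pitch;
    } else {                     /* unvoiced: random noise */
        f = s->energy * ((double)rand() / RAND_MAX - 0.5);
    }
    for (int i = ORDER - 1; i >= 0; i--) {
        f -= s->k[i] * s->b[i];              /* forward path  */
        s->b[i + 1] = s->b[i] + s->k[i] * f; /* backward path */
    }
    s->b[0] = f;
    return f;                    /* speech sample out */
}
```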
- This invention is a combination of a speech construction technique and a speech synthesis technique.
- The prior art set out above involves synthesis techniques.
- The library of available component sounds includes phonemes, allophones, diphones, demisyllables, morphs and combinations of these sounds.
- Speech construction techniques involving phonemes are flexible techniques in the prior art. In English, there are 16 vowel phonemes and 24 consonant phonemes, making a total of 40. Theoretically, any word or phrase desired should be capable of being constructed from these phonemes. However, when each phoneme is actually pronounced, there are many minor variations that may occur between sounds, which may in turn modify the pronunciation of the phoneme. This inaccuracy in representing sounds causes difficulty in understanding the resulting speech produced by the synthesis device.
- A diphone is defined as the sound that extends from the middle of one phoneme to the middle of the next phoneme. It is chosen as a component sound to reduce the smoothing requirements between adjacent diphones.
- A large inventory of diphones is usually required. The storage requirement is on the order of 250K bytes, with a computer required to handle the construction program.
- Demisyllables have been used in the prior art as component sounds for speech construction.
- A syllable in any language may be divided into an initial demisyllable, a final demisyllable and possible phonetic affixes.
- The initial demisyllable consists of any initial consonants and the transition into the vowel.
- The final demisyllable consists of the vowel and any co-final consonants.
- The phonetic affixes consist of all syllable-final non-core consonants.
- The prior art system requires a library of 841 initial and final demisyllables and 5 phonetic affixes.
- The memory requirement is on the order of 50K bytes.
- A morph is the smallest unit of sound that has a meaning.
- In one prior art system, a dictionary of 12,000 morphs was used, which required approximately 600K bytes of memory. The speech generated is intelligible and quite natural, but the memory requirement is prohibitive.
- An allophone is a subset of a phoneme, modified by the environment in which it occurs. For example, the aspirated /p/ in "push" and the unaspirated /p/ in "Spain" are different allophones of the phoneme /p/. Thus, allophones are more accurate in representing sounds than phonemes. According to the present invention, 127 allophones are stored in 3,000 bytes of memory. The storage requirement is much less than in the aforementioned systems using diphones, demisyllables and morphs.
- Text-to-speech synthesizer systems have been fabricated using phonemes and formant synthesis. This invention utilizes the flexibility of allophones coupled with LPC synthesis.
- Digital information in the form of ASCII code is serially entered into the system.
- The ASCII code may be entered from a local or remote terminal, a keyboard, a computer, etc.
- The character code is received by a microcontroller which interrogates a set of rules located in a read-only memory (ROM) to find a match for a particular character set.
- The rules are made up of characters which depend upon neighboring characters for the selection of allophonic codes. Each character set is compared with its appropriate rule character sets until a match is found.
- The information is set in the ROM in the form of ASCII code so that a direct comparison of ASCII code is made.
- The allophonic code corresponding to the matched allophone is then retrieved.
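- The lookup can be pictured with the following C sketch. The tiny rule table is illustrative (only /AW3/ as the "a" in "saw" comes from this description; the real rule set resides in the rules ROM), but it shows how a neighboring character conditions the match and how the first matching rule supplies the allophonic code:

```c
#include <string.h>

/* Each rule pairs an ASCII character fragment, optionally conditioned
 * on the character to its left, with an allophonic code. Rules are
 * tried in order and the first match wins. */
typedef struct {
    char        left;      /* required left neighbour, 0 = don't care */
    const char *fragment;  /* ASCII characters to match at the cursor */
    const char *code;      /* allophonic code to emit                 */
} Rule;

static const Rule a_rules[] = {          /* illustrative "A" rules */
    { 0,   "AW", "AW3" },                /* "aw" as in "saw"        */
    { ' ', "A ", "AX"  },                /* the word "a" by itself  */
    { 0,   "A",  "AE1" },                /* default short 'a'       */
};

const Rule *match_rule(char left, const char *text,
                       const Rule *rules, int nrules)
{
    for (int i = 0; i < nrules; i++) {
        const Rule *r = &rules[i];
        if ((r->left == 0 || r->left == left) &&
            strncmp(text, r->fragment, strlen(r->fragment)) == 0)
            return r;                    /* first match wins */
    }
    return 0;                            /* no rule matched */
}
```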
- The allophonic code is presented to a speech producing system which synthesizes sound through the use of a digital semiconductor LPC synthesizer. It is to be understood, however, that other sound components, such as the aforementioned phonemes, diphones, demisyllables and morphs in coded form, are also contemplated for use with this LPC synthesizer.
- The allophonic code is likewise contemplated for use in digital synthesizers other than the LPC synthesizer of this preferred embodiment.
- An allophone library is stored in a ROM.
- A microprocessor receives the allophonic code and addresses the ROM at the address corresponding to the particular allophonic code entered.
- An allophone, represented by its speech parameters, is retrieved from the ROM, followed by other allophones forming the words and phrases.
- A dedicated microcontroller is used for concatenating (stringing) the allophones to form the words and phrases.
- When stringing allophones, an interpolation frame of 25 ms is created between allophones to smooth out transitions in the LPC parameters. However, no interpolation is performed when a voicing transition occurs. Energy is another parameter that must be smoothed.
- Interpolation frames are usually created at both ends of the string, with energy tapered toward zero.
- The smoothing technique described subsequently herein reduces the abrupt changes in sound which are usually perceived as pops, squeaks, squeals, etc.
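- A sketch of such an interpolation frame in C, assuming a simple midpoint between the adjoining frames; the field names and the averaging rule are illustrative, and the actual smoothing is performed by the synthesizer circuitry described later:

```c
#define NK 10                       /* reflection coefficients per frame */

typedef struct {
    int    voiced;                  /* voiced/unvoiced decision */
    double energy;
    double k[NK];
} Frame;

/* Build the 25 ms interpolation frame between the last frame of one
 * allophone (a) and the first frame of the next (b). Returns 0, and
 * inserts nothing, across a voicing transition, where averaging the
 * parameters would be meaningless. */
int make_interp_frame(const Frame *a, const Frame *b, Frame *out)
{
    if (a->voiced != b->voiced)
        return 0;                   /* voicing transition: no interpolation */
    out->voiced = a->voiced;
    out->energy = 0.5 * (a->energy + b->energy);
    for (int i = 0; i < NK; i++)
        out->k[i] = 0.5 * (a->k[i] + b->k[i]);
    return 1;
}
```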
- Stress and intonation greatly contribute to the perceptual naturalness and contextual meaning of constructed speech. Stress means the emphasis of a certain syllable within a word, whereas intonation refers to the overall up-and-down pattern of pitch within a multi-syllable word, phrase or sentence.
- The contextual meaning of a sentence may be changed completely by assigning stress and intonation differently. English therefore does not sound natural if it is randomly intoned.
- The stress and intonation patterns which are a part of the speech construction technique herein contribute to the understandability and naturalness of the resulting speech. Stress and intonation are based on gradient pitch control of the stressed syllables preceding the primary stress of the phrase.
- All the secondary stress syllables of the sentence are thought of as lying along a line of pitch values tangent to the line of the pitch values of the unstressed syllables.
- The unstressed syllables lie on a mid-level line of pitch, with the stressed syllables lying on a downward-slanted tangent, producing an overall down-drift sensation.
- The user is required to mark stressed syllables in the allophonic code.
- The stressed syllables then become the anchor points of the pitch patterns.
- A microprocessor automatically assigns the appropriate pitch values to the allophones which have been strung.
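- A minimal sketch of this assignment in C; the pitch values and the slope of the tangent are invented for illustration, since the text fixes only the shape of the pattern (a flat mid-level line for unstressed syllables, a descending tangent for the stressed ones):

```c
#define MID_PITCH   100.0   /* unstressed mid-level line (arbitrary units) */
#define START_PITCH 130.0   /* first stressed syllable of the phrase       */
#define DRIFT_STEP    6.0   /* drop per successive stressed syllable       */

/* Pitch for one syllable: unstressed syllables sit on the mid-level
 * line; the i-th stressed syllable lies on the downward tangent, which
 * is not allowed to fall below the mid-level line. */
double assign_pitch(int stressed, int stress_index)
{
    if (!stressed)
        return MID_PITCH;
    double p = START_PITCH - DRIFT_STEP * stress_index;
    return (p > MID_PITCH) ? p : MID_PITCH;
}
```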
- The result is a string of LPC parameters which have been strung together and assigned pitch as set out above.
- The LPC parameters are then sent to the speech synthesis device, which in this preferred embodiment is the device described in U.S. Patent No. 4,209,836 mentioned earlier and incorporated herein by reference.
- The smoothing mentioned above is accomplished by circuitry on the synthesizer chip. The smoothing could also be accomplished through the microprocessor.
- The principal object of this invention is to provide a text-to-speech system, with a speech producing system as a component thereof, that has unlimited vocabulary in any language.
- Another object of this invention is to provide a text-to-speech system which is low cost in terms of storage and yet provides understandable synthetic speech.
- A further object of this invention is to provide a stress and intonation pattern for the input textual material so that the pitch is adjusted automatically according to a natural-sounding intonation pattern at the output.
- An all encompassing object of this invention is to provide a highly flexible, low cost text-to-speech system with the advantages of unlimited vocabulary and good speech quality.
- Figure 1 illustrates the text-to-speech system 10 having a 420 rules processor 17 with a digital character input (ASCII) for comparison to the rules 16, which are stored in a ROM.
- The 420 rules processor 17 is a Texas Instruments Incorporated Type TMC0420 microcomputer described in detail in Appendix A, which includes 26 sheets of specification and 9 sheets of drawings.
- The rules ROM 16 is a Texas Instruments Type TMS6100 (TMC350) voice synthesis memory, a ROM internally organized as 16K x 8 bits.
- The allophonic code retrieved from rules ROM 16 is entered in the 420 system microprocessor 11, which is connected to control the stringer controller 13 and synthesizer 14. Allophone library 12 is accessed through the stringer controller 13. The output of synthesizer 14 is through speaker 15, which produces speech-like sounds in response to the input allophonic code.
- The 356 stringer controller 13 is a Texas Instruments TMC0356, which is described in detail in Appendix B, comprising 21 specification sheets and 11 sheets of drawings.
- Allophone library 12 is also a Texas Instruments Type TMS6100. It may or may not be included, because the 356 stringer controller 13 has an internal ROM which may be used to contain the library.
- The 420 system microprocessor 11 is also a Type TMC0420 microcomputer. Appendices A and B are enclosed herewith and incorporated by reference.
- Synthesizer 14 is fully described in previously mentioned United States Patent No. 4,209,836. In addition, however, synthesizer 14 has the facility for selectively smoothing between allophones and has circuitry for providing a selection of speech rates, which is not part of this invention.
- Figures 2a-2p set out the allophone rules.
- A rule results, for example, in the allophonic code /AW3/, which is pronounced as the "a" in "saw".
- The "A" sounds are categorized in one group, followed by the "B" sounds, etc. These are listed as "A" rules, "B" rules, "C" rules and so on.
- Figures 3a-3f form the flowchart detailing the operation of the 420 rules processor 17 in searching the rules ROM 16 for each of the incoming digital characters. The appropriate allophonic code is retrieved and stress is assigned.
- In Fig. 3a the system is initialized and the rule file is opened.
- The 420 rules processor 17 is thereby instructed to read information from the rules ROM 16 and to do the matching.
- The character input (ASCII code in this preferred embodiment) is shifted to the right, and the first character is then skipped.
- The first character is a space because of the shift to the right and is skipped so that, when a comparison is made, it is noted that the neighboring character to the left of the next character is a space, and the proper allophonic code can be assigned.
- The next character is read and the question "end of text?" is asked. If the answer is yes, the routine goes to "STRESS" on Figure 3b.
- Each rule contains the ASCII character set defining an allophone and the corresponding allophonic code.
- The allophonic code is read out to "STRESS" of Figure 3b, and the next character is obtained.
- In Figure 3b, a pointer receives the beginning of the display buffer. The pointer also gets the beginning of the allophone buffer. Then the question "?" is asked. If the answer is true, the "?" is deleted from queue 1 and it is determined whether the allophone starts with "wh". If the answer is true, a question bit flag is set. If the answer is no, the question bit flag is cleared. Then the word/phrase bit is reset and the allophone/allophone bit flag is reset, followed by the beginning of the allophone buffer being sent to the pointer. Figure 3b, it is seen, is dedicated to setting the flags.
- The pointer receives the beginning of the allophone buffer and, in an initialization process, the primary receives a 1 and the vowel receives a 0.
- The primary is increased by the sum of primary+vowel to determine which vowel gets the primary stress. If it is determined that there is no primary-stress marker, then no primary stress is indicated and it is determined whether the allophone is the end of frame. If the answer is false, the pointer is incremented and the routine is repeated. If it is an end of frame, the primary is reset to 0 and it is determined whether the last vowel receives the primary stress. If the answer is yes, a vowel bit flag is set. If the answer is no, the vowel bit flag is not set. In either event, the information thus derived (overhead) is sent to queue 2, which is the speaking queue. Next, the pointer is set to the beginning of the allophone buffer.
- The secondary bit flag is initialized and then, in Figure 3f, it is determined whether the allophone is a "-", indicating a secondary stress. If the answer is true, the "-" must be removed from queue 1 and the pointer is indexed. Next it is determined whether the following allophone is a ">", indicating that the next vowel is to receive the secondary stress. If the answer is true, the code for ">" must be deleted from queue 1, the secondary flag is incremented by 1, and the question whether a skip is to be performed is again asked. If there is no skip, it is determined whether the allophone is a vowel. If the answer is false, the pointer is incremented by 1 until a vowel is reached.
- The secondary stress flag is then decremented by 1 and the question is asked whether the secondary is now equal to 0. If the answer is true, a secondary stress flag is set as indicated in Figure 3e. If the answer is false, the pointer is incremented.
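- The net effect of this scan (apart from the queue bookkeeping of the flowchart) can be summarized in C as follows; the marker strings and structure fields are illustrative:

```c
#include <string.h>

typedef struct {
    char code[8];       /* allophone mnemonic, e.g. "AW3", or a marker */
    int  is_vowel;
    int  secondary;     /* stress flag filled in by the scan */
} Allo;

/* A "-" arms a secondary stress for the next vowel; each ">" after it
 * pushes the stress one vowel further on, mirroring the increment/
 * decrement of the secondary flag in Figures 3e-3f. In the real
 * routine the markers are also deleted from queue 1. */
void mark_secondary(Allo *buf, int n)
{
    int armed = 0, skip = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(buf[i].code, "-") == 0) { armed = 1; skip = 0; continue; }
        if (armed && strcmp(buf[i].code, ">") == 0) { skip++; continue; }
        if (armed && buf[i].is_vowel) {
            if (skip == 0) { buf[i].secondary = 1; armed = 0; }
            else skip--;    /* this vowel is skipped over */
        }
    }
}
```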
- Finally, the allophone buffer is downloaded to queue 2, the speaking queue.
- Figure 4 is a block diagram of the speech producing system which has been described in association with Figure 1.
- Figures 5a through 5c illustrate the allophones within the allophone library 12.
- For example, allophone 18 is coded within ROM 12 as "AW3", which is pronounced as the "a" in the word "saw".
- Allophone 80 is set in the ROM 12 as code corresponding to allophone "GG", which is pronounced as the "g" in the word "bag". Pronunciation is given for all of the allophones stored in the allophone library 12.
- Each allophone is made up of as many as 10 frames, the frames varying from four bits for a zero-energy frame, to ten bits for a "repeat frame", to 28 bits for an "unvoiced frame", to 49 bits for a "voiced frame".
- Fig. 3 illustrates this frame structure. A detailed description is given in previously mentioned United States Patent No. 4,209,836.
- The number of frames in a given allophone is determined by a well-known LPC analysis of a speaker's voice. That is, the analysis provides the breakdown of the frames required, the energy for each frame, and the reflection coefficients for each frame. This information is then stored to represent the allophone sounds set out in Figs. 5a-5c.
- SLOW D is present only when the last frame in an allophone is indicated by a single bit in the frame.
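- The variable-length decode implied by these sizes can be sketched as below. The selector fields (a 4-bit energy, a repeat bit, a pitch field with pitch 0 meaning unvoiced) are assumptions modelled on the '836-style synthesizer coding, not stated in this text; only the four frame sizes are:

```c
typedef enum {
    FRAME_ZERO,       /*  4 bits: energy field only, energy = 0  */
    FRAME_REPEAT,     /* 10 bits: reuse the previous frame's K's */
    FRAME_UNVOICED,   /* 28 bits: a short set of K's follows     */
    FRAME_VOICED      /* 49 bits: the full set of K's follows    */
} FrameKind;

/* Classify a frame from its leading fields; how many more bits must be
 * read from the allophone ROM depends on the result. */
FrameKind classify_frame(unsigned energy4, unsigned repeat1, unsigned pitch)
{
    if (energy4 == 0) return FRAME_ZERO;
    if (repeat1)      return FRAME_REPEAT;
    if (pitch == 0)   return FRAME_UNVOICED;
    return FRAME_VOICED;
}
```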
- The actual interpolation (smoothing) circuitry and its operation are described in detail in U.S. Patent No. 4,209,836.
- The primary stress to be given is sent, followed by the information as to which vowel is the last one in the word. Finally, a "send 2" routine is called to send the entire 8 bits (7 bits of allophone, 1 bit of stress flag). It should be noted that the previous send routine involved sending only 4 bits.
- A send 2 flag is set and a status command is sent to the 356 stringer 13. Then, if the 356 FIFO is ready to receive information, the FIFO is loaded.
- An execute command is sent to the 356 stringer 13, after which a status command is sent. If the 356 stringer 13 is ready, a speak command is given. If it is not ready, the status command is sent again until the stringer 13 is ready. Then the allophone is sent and the countdown register containing the number of allophones is decremented. If the countdown equals zero, the routine is restarted at word/phrase. If the countdown is not equal to zero, the send 2 routine is called again and the next allophone is fetched, the procedure being repeated until the entire word has been completed.
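- The command traffic of this loop might be summarized as in the C sketch below; the helper functions and command codes are hypothetical stand-ins for the 356's actual command/status interface:

```c
enum { CMD_STATUS = 1, CMD_EXECUTE, CMD_SPEAK };

extern void send_command(int cmd);        /* hypothetical bus helpers */
extern int  stringer_ready(void);
extern void load_fifo(unsigned char allo);

/* Feed one word to the 356 stringer: poll status until it is ready,
 * load an allophone (7-bit code + 1-bit stress flag) into the FIFO,
 * execute and speak, decrementing the allophone countdown to zero. */
void speak_word(const unsigned char *allos, int countdown)
{
    while (countdown > 0) {
        do {
            send_command(CMD_STATUS);     /* poll until the 356 is ready */
        } while (!stringer_ready());
        load_fifo(*allos++);
        send_command(CMD_EXECUTE);
        send_command(CMD_SPEAK);
        countdown--;                      /* countdown register */
    }
}
```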
- Figures 9a-9i form a flowchart of the details of the control of the action of the 356 stringer 13 on the allophones. Beginning in Figure 9a, the starting point is to "read an allophone address" and then to "read a frame of allophone speech data". On path 31 to Figure 9b, a decision block inquiring "first frame of the allophone?" is reached. If the answer is "yes", it is necessary to decode the flags F1-F5. If the answer is "no", it is necessary to decode only flags F3, F4 and F5. As indicated above, flags F1 and F2 determine the nature of the allophone and need not be further decoded.
- The text-to-speech system accepts ASCII code, looks up the appropriate allophonic code in the allophone rules, and assigns stress and pitch.
- The allophonic code is then received by the 420 microprocessor 11 shown in Figure 1.
- The code received is related to an address in the allophone library 12.
- The code is sent by the 420 microprocessor 11 to the 356 stringer 13, where the address is read and the allophone is brought out and handled as indicated in Figures 9a-9i.
- The basic control by the 420 microprocessor 11 in causing the action of the 356 stringer 13 is shown in Figures 8a-8b.
- The synthesizer 14 receives the allophone parameters from the 356 stringer 13 and delivers an analog signal representative of the allophone to the speaker 15, which then provides speech-like sound.
- The inventive speech producing system, in its preferred embodiment, comprises an LPC synthesizer on an integrated circuit chip with LPC parameter inputs provided through allophones read from the allophone library. It is of course contemplated that other waveform-encoding types of code inputs may be used as inputs to a speech synthesizer. Also, the specific implementation shown herein is not to be considered limiting. For example, a single computer could be used for the functions of the microcomputer, the allophone library, and the stringer of this invention without departing from its scope. The breadth and scope of this invention is limited only by the appended claims.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US240693 | 1981-03-05 | ||
US06/240,694 US4685135A (en) | 1981-03-05 | 1981-03-05 | Text-to-speech synthesis system |
US240694 | 1981-03-05 | ||
US06/240,693 US4398059A (en) | 1981-03-05 | 1981-03-05 | Speech producing system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0059880A2 true EP0059880A2 (de) | 1982-09-15 |
EP0059880A3 EP0059880A3 (de) | 1984-09-19 |
Family
ID=26933628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP82101379A Withdrawn EP0059880A3 (de) | 1982-02-24 | System for synthesizing speech from text
Country Status (1)
Country | Link |
---|---|
EP (1) | EP0059880A3 (de) |
-
1982
- 1982-02-24 EP EP82101379A patent/EP0059880A3/de not_active Withdrawn
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0059318A1 (de) * | 1981-03-03 | 1982-09-08 | Texas Instruments Incorporated | Streifenkodeleser für Sprachsynthesesystem |
Non-Patent Citations (7)
Title |
---|
COMPCON 81, 22nd COMPUTER SOCIETY INTERNATIONAL CONFERENCE, 23rd-26th February 1981, San Francisco, pages 141-144, IEEE, New York, US R.H. WIGGINS: "Application of LSI to speech synthesis" * |
EDN ELECTRICAL DESIGN NEWS, vol. 25, no. 14, August 1980, pages 99-103, Denver, US E. TEJA: "Versatile voice output demands sophisticated software" * |
ELECTRONIC DESIGN, vol. 29, no. 13, June 1981, pages 121-127, Waseca, US W. SMITH et al.: "Phonemes, allophones, and LPC team to synthesize speech" * |
ELECTRONICS, vol. 53, no. 3, 10th February 1981, pages 122-125, New York, US KUN-SHAN LIN et al.: "Software rules give personal computer real word power" * |
ICASSP79, 1979 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2nd-4th April 1979, Washington, pages 880-883, IEEE, New York, US E. VIVALDA et al.: "Real time text processing for Italian speech synthesis" * |
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, vol. CE-27, no. 2, May 1981, 10th ANNUAL IEEE CHICAGO FALL CONFERENCE ON CONSUMER ELECTRONICS, 10th-11th November 1980, Des Plaines, Illinois, pages 144-152, IEEE, New York, US *
THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, vol. 54, no. 1, July 1973, page 339, abstract NN6, New York, US G.M. KUHN: "Two-pass procedure for synthesis by rule" * |
Cited By (81)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2553555A1 (fr) * | 1983-10-14 | 1985-04-19 | Texas Instruments France | Procede de codage de la parole et dispositif pour sa mise en oeuvre |
EP0140777A1 (de) * | 1983-10-14 | 1985-05-08 | TEXAS INSTRUMENTS FRANCE Société dite: | Verfahren zur Codierung von Sprache und Einrichtung zur Durchführung des Verfahrens |
US4912768A (en) * | 1983-10-14 | 1990-03-27 | Texas Instruments Incorporated | Speech encoding process combining written and spoken message codes |
US4659877A (en) * | 1983-11-16 | 1987-04-21 | Speech Plus, Inc. | Verbal computer terminal system |
US4716583A (en) * | 1983-11-16 | 1987-12-29 | Speech Plus, Inc. | Verbal computer terminal system |
EP0606520A2 (de) * | 1993-01-15 | 1994-07-20 | ALCATEL ITALIA S.p.A. | Verfahren zur Realisierung von Tonkurven für Sprachnachrichten und Verfahren zur Sprachsynthese und Einrichtung zu seiner Anwendung |
EP0606520A3 (de) * | 1993-01-15 | 1994-12-28 | Alcatel Italia | Verfahren zur Realisierung von Tonkurven für Sprachnachrichten und Verfahren zur Sprachsynthese und Einrichtung zu seiner Anwendung. |
WO1994017516A1 (en) * | 1993-01-21 | 1994-08-04 | Apple Computer, Inc. | Intonation adjustment in text-to-speech systems |
GB2296846A (en) * | 1995-01-07 | 1996-07-10 | Ibm | Synthesising speech from text |
US5970456A (en) * | 1995-04-20 | 1999-10-19 | Mannesman Vdo Ag | Traffic information apparatus comprising a message memory and a speech synthesizer |
US5806035A (en) * | 1995-05-17 | 1998-09-08 | U.S. Philips Corporation | Traffic information apparatus synthesizing voice messages by interpreting spoken element code type identifiers and codes in message representation |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9412392B2 (en) | 2008-10-02 | 2016-08-09 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
RU2460154C1 (ru) * | 2011-06-15 | 2012-08-27 | Александр Юрьевич Бредихин | Способ автоматизированной обработки текста и компьютерное устройство для реализации этого способа |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
Also Published As
Publication number | Publication date |
---|---|
EP0059880A3 (de) | 1984-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4685135A (en) | Text-to-speech synthesis system | |
EP0059880A2 (de) | System for synthesizing speech from text | |
US4398059A (en) | Speech producing system | |
US5524172A (en) | Processing device for speech synthesis by addition of overlapping wave forms | |
US7010488B2 (en) | System and method for compressing concatenative acoustic inventories for speech synthesis | |
Syrdal et al. | Applied speech technology | |
WO1990009657A1 (en) | Text to speech synthesis system and method using context dependent vowell allophones | |
JPH031200A (ja) | Rule-based speech synthesis device |
Lerner | Computers: Products that talk: Speech-synthesis devices are being incorporated into dozens of products as difficult technical problems are solved | |
Venkatagiri et al. | Digital speech synthesis: Tutorial | |
O'Shaughnessy | Design of a real-time French text-to-speech system | |
d’Alessandro et al. | The speech conductor: gestural control of speech synthesis | |
JPS5972494A (ja) | Rule-based synthesis system |
Lukaszewicz et al. | Microphonemic method of speech synthesis | |
Datta et al. | Epoch Synchronous Overlap Add (ESOLA) | |
Eady et al. | Pitch assignment rules for speech synthesis by word concatenation | |
JPS5880699A (ja) | Speech synthesis system |
JPS6187199A (ja) | Speech analysis and synthesis device |
EP0681729B1 (de) | System for speech synthesis and speech recognition |
Chowdhury | Concatenative Text-to-speech synthesis: A study on standard colloquial bengali | |
JPH01321496A (ja) | Speech synthesis device |
JPH0990987A (ja) | Speech synthesis method and apparatus |
KR920003934B1 (ko) | Composite coding method for a speech synthesizer |
Yazu et al. | The speech synthesis system for an unlimited Japanese vocabulary | |
JPH0572599B2 (de) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Designated state(s): DE FR GB IT NL |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Designated state(s): DE FR GB IT NL |
|
17P | Request for examination filed |
Effective date: 19850314 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 19860523 |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: FRANTZ, GENE A. Inventor name: GOUDIE, KATHLEEN M. Inventor name: LIN, KUN-SHAN |