EP0886853A1 - Auf mikrosegmenten basierendes sprachsyntheseverfahren - Google Patents
Microsegment-based speech synthesis method (Auf Mikrosegmenten basierendes Sprachsyntheseverfahren)
- Publication number
- EP0886853A1 (application EP97917259A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- vowel
- speech
- segments
- microsegments
- synthesis method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000003786 synthesis reaction Methods 0.000 title abstract description 33
- 238000000034 method Methods 0.000 title abstract description 26
- 230000008569 process Effects 0.000 title abstract description 15
- 230000007704 transition Effects 0.000 claims abstract description 24
- 238000001308 synthesis method Methods 0.000 claims description 19
- 230000009467 reduction Effects 0.000 claims description 11
- 230000008859 change Effects 0.000 claims description 8
- 238000004904 shortening Methods 0.000 claims description 6
- 210000001260 vocal cord Anatomy 0.000 claims description 4
- 238000012986 modification Methods 0.000 claims description 3
- 230000004048 modification Effects 0.000 claims description 3
- 230000010355 oscillation Effects 0.000 claims description 2
- 230000015572 biosynthetic process Effects 0.000 abstract description 33
- 238000003860 storage Methods 0.000 description 16
- 230000015654 memory Effects 0.000 description 12
- 230000003595 spectral effect Effects 0.000 description 5
- 238000005520 cutting process Methods 0.000 description 4
- 230000005284 excitation Effects 0.000 description 4
- 210000000056 organ Anatomy 0.000 description 4
- 238000009826 distribution Methods 0.000 description 3
- 238000005070 sampling Methods 0.000 description 3
- 230000001944 accentuation Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 210000004704 glottis Anatomy 0.000 description 2
- 210000004283 incisor Anatomy 0.000 description 2
- 230000000877 morphologic effect Effects 0.000 description 2
- 230000000630 rising effect Effects 0.000 description 2
- 210000005182 tip of the tongue Anatomy 0.000 description 2
- 230000001755 vocal effect Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 210000001983 hard palate Anatomy 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000002156 mixing Methods 0.000 description 1
- 230000000414 obstructive effect Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 210000001584 soft palate Anatomy 0.000 description 1
- 238000010189 synthetic method Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000013518 transcription Methods 0.000 description 1
- 230000035897 transcription Effects 0.000 description 1
- 230000003936 working memory Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/07—Concatenation rules
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
Definitions
- the invention relates to a digital speech synthesis method according to the preamble of claim 1.
- the acoustic conditions in the vocal tract are modeled so that the articulatory positions and movements during speaking are simulated mathematically.
- an acoustic model of the vocal tract is therefore calculated, which entails considerable computing effort and requires a large computing capacity. Nevertheless, the automatically generated speech sounds unnatural and technical.
- concatenation synthesis is also known, in which parts of spoken utterances are chained together in such a way that new utterances arise.
- the individual speech parts thus form the building blocks for generating speech.
- the size of the parts can range from words and phrases down to sections of sounds. For the artificial generation of speech with unlimited vocabulary, half-syllables or smaller excerpts come into consideration as units; larger units only make sense if a limited vocabulary is to be synthesized.
- diphone synthesis, a form of concatenation synthesis, uses signal modules that range from the middle of one acoustically defined speech sound to the middle of the next. This takes into account the perceptually important transitions from one sound to another, which occur in the speech signal as an acoustic consequence of the movements of the speech organs.
- the signal modules are joined together at spectrally relatively constant locations.
- Triphone and half-syllable synthesis are based on a principle similar to that of diphone synthesis.
- the cut point lies in the middle of the sound.
- larger units are covered, which means that larger phonetic contexts can be taken into account.
- the number of combinations increases proportionally.
- a cutting point for the units used lies in the middle of the vowel of a syllable.
- the other cutting point is at the beginning or end of a syllable, which means that, depending on the structure of the syllable, sequences of several consonants are also recorded in one language element.
- a speech synthesis system is also known in which parts of diphones are used for several sounds. There, a speech synthesizer is described which stores standardized unit speech waveforms, generated by dividing a diphone, and assigns them to certain expression symbols. A synthesizer reads the unit speech waveforms from the memory according to the output symbols of the converted sequence of expression symbols.
- the unit speech waveforms are either connected directly, if the speech portion of the input characters is unvoiced, or a predetermined interpolation method is used if it is voiced, the same unit waveform being used both for a voiced /g, d, b/ sound and for its corresponding unvoiced /k, t, p/ counterpart.
- unit speech waveforms are also stored in the memory which represent the vowel part following a consonant or the vowel part preceding a consonant.
- transition areas from a consonant to a vowel or from a vowel to a consonant can be set equal for the consonants k and g, t and d as well as p and b.
- the storage space requirement is thus reduced, but the specified interpolation process requires a not inconsiderable computing effort.
- each phoneme is formed by phoneme elements stored in a memory, periods of sound vibrations being obtained from natural speech or being artificially synthesized.
- the text to be synthesized is analyzed sentence by sentence grammatically and phonetically according to the rules of language.
- each phoneme is contrasted with certain types and numbers of time segments of noise phonemes with the corresponding duration, amplitudes and spectral distribution.
- the periods of the sound vibrations and the elements of the noise phonemes are stored in digital form as a sequence of amplitude values of the corresponding vibration and are changed during readout in accordance with the frequency characteristics, in order to achieve natural-sounding speech.
- speech segments that represent phonemes or transitions are generated from synthetic waveforms that are reproduced several times in a predetermined manner, possibly shortened in length and/or reproduced voiced.
- use is made of an inverted reproduction of certain time series. It is disadvantageous here that, although the storage space requirement is considerably reduced, the extensive analysis and synthesis processes demand considerable computing capacity.
- speech reproduction lacks the natural variance.
- segments for quasi-stationary vowel parts: these segments are cut from the middle of long vowel realizations, which are perceived as relatively constant in sound. They are used in different text positions and contexts, for example at the beginning of a word, after the semivowel segments that follow certain consonants or consonant sequences (in German, for example, after /h/, /j/ and /?/), for the final lengthening, between non-diphthongal vowel-vowel sequences, and in diphthongs as start and end positions.
- consonant segments are formed in such a way that, regardless of the type of neighboring sounds, they can be used for several occurrences of the sound either generally or, as with plosives, in the context of certain sound groups.
- the microsegments, broken down into three categories, can be used several times in different phonetic contexts. For sound transitions this means that the perceptually important transitions from one sound to the other are taken into account without separate acoustic segments being needed for each possible connection between two speech sounds.
- the division according to the invention into microsegments that split a sound transition enables identical segments to be used for different sound transitions of a whole group of consonants. With this principle of generalization in the use of speech signal modules, the memory required for storing them is reduced. Nevertheless, the quality of the synthetically output speech is very good, because the perceptually important sound transitions are taken into account.
- the speech segments for vowels allow multiple use of the microsegments in different phonetic contexts and thus achieve a significant reduction in storage space.
- because segments for quasi-stationary vowel parts are provided for vowels at the beginning of words and in vowel-vowel sequences, a significant improvement in the sound of the synthetic speech for word beginnings, diphthongs and vowel-vowel sequences is achieved with a small number of additional microsegments.
- because the consonant segments for plosives are divided into two microsegments, a first segment comprising the closure phase and a second comprising the release phase, a further generalization of the speech segments is achieved.
- the closure phase for all plosives can be represented by a time series of zeros. No storage space is therefore required for this part of the sound reproduction.
- the release phase of the plosive is differentiated according to the sound that follows in the context.
- a further generalization can be achieved in that, for releases into vowels, a distinction is made only between four vowel groups (front unrounded vowels; front rounded vowels; low or centralized vowels; back rounded vowels), and for releases into consonants only between three articulation places (labial, alveolar or velar). For German, for example, only 42 microsegments then have to be stored for the six plosives /p, t, k, b, d, g/: each plosive combined with the three consonant groups by articulation place and the four vowel groups. This further reduces the storage space requirement through multiple use of the microsegments in different phonetic contexts.
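To make the count of 42 concrete, the following sketch (not from the patent; the group labels are illustrative placeholders) simply enumerates one release-phase microsegment per plosive and following-context group:

```python
# Illustrative enumeration of the 42 release-phase microsegments cited
# for German: 6 plosives x (4 vowel groups + 3 articulation places).
PLOSIVES = ["p", "t", "k", "b", "d", "g"]
VOWEL_GROUPS = ["front-unrounded", "front-rounded", "low-or-central", "back-rounded"]
CONSONANT_PLACES = ["labial", "alveolar", "velar"]

release_segments = [(plosive, context)
                    for plosive in PLOSIVES
                    for context in VOWEL_GROUPS + CONSONANT_PLACES]
assert len(release_segments) == 42
```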
- advantageously, for a vowel segment that runs from an articulation place to the middle of the vowel, the start position is always reached, and for a vowel segment that runs from the middle of the vowel to the following articulation place, the target position is always reached, while the movement to or from the vowel center is shortened.
- such a shortening of the microsegment reproduces, for example, unstressed syllables; the deviations from the spectral target quality of the respective vowel found in natural, flowing speech are thereby reproduced, which increases the naturalness of the synthesis. It is also advantageous that such modifications of segments already stored require no additional memory.
- speech pauses can be recognized through the analysis of the text to be output as speech.
- the phoneme chain is supplemented with pause symbols to form a symbol chain, digital zeros being inserted into the time series signal at the pause symbols when the microsegments are strung together.
- the additional information about a pause position and its duration is determined on the basis of the sentence structure and predefined rules.
- the pause duration is realized by the number of digital zeros to be inserted depending on the sampling rate.
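As a minimal sketch (not from the patent text) of this zero-insertion, assuming the time series is held as a NumPy sample array at the 22 kHz rate cited further below:

```python
import numpy as np

SAMPLE_RATE = 22_050  # Hz; the description cites a 22 kHz sampling rate

def insert_pause(signal: np.ndarray, pause_ms: float) -> np.ndarray:
    """Append a speech pause as a run of digital zeros; the pause
    duration is realized purely by the number of zero samples."""
    n_zeros = int(SAMPLE_RATE * pause_ms / 1000)
    return np.concatenate([signal, np.zeros(n_zeros, dtype=signal.dtype)])
```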
- the phoneme chain is further supplemented with lengthening symbols to form a symbol chain; when the microsegments are strung together, the microsegments corresponding to these symbols are given an extended playing time in the time domain, so that a phrase-final lengthening can be simulated in the synthetic speech output. This manipulation in the time domain is carried out on the microsegments already assigned. No additional speech modules are therefore needed to realize final lengthenings, which keeps the storage requirement low.
- both the extended playing time for phrase-final syllables and the different reduction levels for stress can preferably be realized with the same reduction levels of the microsegments.
- the final syllables of phrases, namely of linguistic units which are marked in written language by punctuation marks such as comma, semicolon, period and colon, are given a progressively extended playing time. This is achieved by increasing the playing time of the microsegments in the phrase-final syllables by one level each from the second microsegment onward.
- the range of values for the lengthening levels goes from 1 to 6, larger numbers corresponding to a longer duration.
- the % symbol leaves the duration unchanged.
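A sketch of the progressive lengthening rule just described (the base level and the capping at level 6 are assumptions not spelled out in the text):

```python
def phrase_final_levels(n_segments: int, base: int = 1, max_level: int = 6) -> list[int]:
    """Playing-time levels for the microsegments of a phrase-final
    syllable: the first keeps the base level, each further microsegment
    is one level longer, capped at the maximum of the 1-6 range."""
    return [base] + [min(base + i, max_level) for i in range(1, n_segments)]

# e.g. phrase_final_levels(5) -> [1, 2, 3, 4, 5]
```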
- the symbol chain is further supplemented with intonation symbols; when the microsegments are strung together, a change in the fundamental frequency of certain parts of the periods of the microsegments is carried out in the time domain at the intonation symbols, so that the melody of linguistic utterances is simulated.
- the fundamental frequency change is preferably carried out by skipping and adding certain samples. For this, the voiced microsegments, i.e. vowels and sonorants, are marked. Each period is automatically treated separately, with a spectrally important first part, in which the vocal folds are closed, and a less important second part, in which the vocal folds are open.
- the markings are set in such a way that only the spectrally non-critical second parts of each period are shortened or lengthened to change the fundamental frequency when the signal is output. This does not significantly increase the storage space required to simulate intonations during speech output and the computing effort due to the manipulation in the time domain is kept low.
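A hypothetical sketch of this per-period manipulation; the patent only states that samples in the open phase are skipped or added, so the index-based resampling used here is one possible realization:

```python
import numpy as np

def change_period_f0(period: np.ndarray, closed_len: int, factor: float) -> np.ndarray:
    """Raise (factor > 1) or lower (factor < 1) the fundamental
    frequency of one voice period by resizing only the spectrally
    non-critical open phase; the closed phase is kept unchanged."""
    closed, open_phase = period[:closed_len], period[closed_len:]
    if len(open_phase) == 0:
        return period
    new_len = max(1, round(len(open_phase) / factor))
    # Skipping (new_len smaller) or repeating (new_len larger) samples:
    idx = np.linspace(0, len(open_phase) - 1, new_len).round().astype(int)
    return np.concatenate([closed, open_phase[idx]])
```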
- when chaining different microsegments together for speech synthesis, a largely interference-free acoustic transition between successive microsegments is achieved in that each microsegment begins with the first sample value after the first positive zero crossing, i.e. a zero crossing with a positive signal slope, and ends with the last sample value before the last positive zero crossing.
- the digitally stored time series of the microsegments are thus strung together almost continuously. This prevents clicking noises caused by digital discontinuities.
- closure phases of plosives or word breaks and general speech pauses represented by digital zeros can be inserted essentially continuously at any time.
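A minimal sketch of this segmentation convention, assuming the microsegments are signed sample arrays:

```python
import numpy as np

def trim_at_positive_zero_crossings(seg: np.ndarray) -> np.ndarray:
    """Cut a microsegment so it begins with the first sample after the
    first positive zero crossing and ends with the last sample before
    the last one, so chained segments meet near zero amplitude."""
    crossings = np.flatnonzero((seg[:-1] <= 0) & (seg[1:] > 0))
    if len(crossings) < 2:
        return seg  # too short to trim
    return seg[crossings[0] + 1 : crossings[-1] + 1]
```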
- Fig. 2 is a spectrogram and time signal of the word "phonetics".
- the input for the speech synthesis system is a text, for example a text file.
- each word of the text is assigned, by means of a lexicon stored in the computer, a phoneme chain which represents its pronunciation.
- new words are often formed by combining words and parts of words, for example with prefixes and suffixes.
- the pronunciation of words such as "house building", "development" or "buildable" can be derived from a stem, here "build", and combined with the pronunciation of the prefixes and suffixes.
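A hypothetical sketch of such a lexicon lookup with affix decomposition; every entry and transcription below is an invented placeholder, not the patent's lexicon:

```python
LEXICON = {"bau": "baU"}                 # stem pronunciations (invented)
PREFIXES = {"be": "b@"}                  # prefix pronunciations (invented)
SUFFIXES = {"ung": "UN", "bar": "ba:6"}  # suffix pronunciations (invented)

def pronounce(word: str) -> str | None:
    """Look the word up directly; otherwise try to compose its
    pronunciation from prefix + stem + suffix pronunciations."""
    if word in LEXICON:
        return LEXICON[word]
    for prefix, prefix_pron in PREFIXES.items():
        for suffix, suffix_pron in SUFFIXES.items():
            stem = word[len(prefix):len(word) - len(suffix)]
            if word.startswith(prefix) and word.endswith(suffix) and stem in LEXICON:
                return prefix_pron + LEXICON[stem] + suffix_pron
    return None

# pronounce("bebauung") -> "b@baUUN" (illustrative only)
```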
- in FIG. 1, the syntactic-semantic analysis is shown below the phoneme chain generated as described above.
- the phoneme chain, which comes from the pronunciation information of the lexicon, is modified, and additional information about pause durations and pitch values of the microsegments is inserted.
- this produces a phoneme-based, prosodically differentiated symbol chain that provides the input for the actual speech output.
- the syntactic-semantic analysis takes into account word accents, phrase boundaries and intonation.
- the gradations of the emphasis of syllables within a word are marked in the lexicon entries.
- the emphasis levels are thus specified for the reproduction of the microsegments forming this word.
- the stress level of the microsegment of a syllable results from:
- the phonological length of a sound, which is designated for each phoneme, for example /e:/ for the long "e" in /fo'ne:tIk/.
- the phrase boundaries, at which the phrase-final lengthening takes place in addition to certain intonation contours, are determined by linguistic analysis.
- the phrase boundaries are determined from the word sequence using predefined rules.
- the realization of the intonation is based on an intonation and pause description system which distinguishes between intonation contours that occur at phrase boundaries (rising, falling, constant, falling-rising) and those that are localized by accents (low, high, rising, falling).
- the assignment of the intonation contours is based on the syntactic and morphological analysis, taking into account certain key words and characters in the text.
- yes/no questions (recognizable by the question mark at the end and by the fact that the first word of the sentence is a finite verb) receive a low accent tone and a high phrase-final boundary tone.
- normal statements receive a high accent tone and a falling phrase-final boundary tone.
- the course of the intonation is generated according to predefined rules.
- the phoneme-based symbol chain is converted into a micro-segment sequence for the actual speech output.
- the conversion of a sequence of two phonemes into microsegment sequences takes place via a rule set in which a sequence of microsegments is assigned to each phoneme sequence.
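A hypothetical excerpt of such a rule set; the segment names loosely follow the bracket notation explained for Fig. 2 below (context outside the brackets, sounding segment inside), but the concrete entries are invented for illustration:

```python
MICROSEGMENT_RULES: dict[tuple[str, str], list[str]] = {
    ("#", "f"): ["...(f)"],        # /f/ after silence / word onset (invented)
    ("t", "I"): ["t(t)", "(t)I"],  # plosive: closure phase, then release
    ("I", "k"): ["(I)k"],          # second vowel half toward /k/ (invented)
}

def to_microsegments(phonemes: list[str]) -> list[str]:
    """Map each adjacent phoneme pair of the symbol chain to its
    microsegment sequence, left to right."""
    segments: list[str] = []
    for pair in zip(phonemes, phonemes[1:]):
        segments.extend(MICROSEGMENT_RULES.get(pair, []))
    return segments
```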
- when the successive microsegments specified by the microsegment chain are strung together, the additional information about stress, pause duration, final lengthening and intonation is taken into account.
- the microsegment sequence is only modified in the time domain.
- a speech pause is implemented, for example, by inserting digital zeros at the point marked by a corresponding pause symbol.
- the speech output then takes place by digital/analog conversion of the manipulated time series signal, for example via a "Soundblaster" card installed in the computer.
- Fig. 2 shows a spectrogram in the upper part and the associated time signal for the word example "phonetics" in the lower part.
- the word "phonetics” is represented in symbols as a phoneme sequence between slashes as follows / fone: tIk /.
- This phoneme sequence is plotted on the abscissa representing the time axis in the upper part of FIG. 2.
- the ordinate of the spectrogram in FIG. 2 denotes the frequency content of the speech signal, the degree of blackening being proportional to the amplitude of the corresponding frequency.
- in the time signal, the ordinate corresponds to the instantaneous amplitude of the signal.
- the micro-segment boundaries are shown in the middle field with vertical lines.
- the letter abbreviations given therein indicate the designation or symbolization of the respective microsegment.
- the example word "phonetics" thus consists of twelve microsegments.
- the names of the microsegments are chosen so that the sounds outside the brackets indicate the context, the sounding sound being given in the brackets.
- the context-dependent transitions of the speech sounds are thus taken into account.
- the consonant segments ...(f) and (n)e are segmented at the respective sound boundary.
- the plosives /t/ and /k/ are each divided into a closure phase (t(t) and k(k)), which is digitally simulated by zero-valued samples and is used for all plosives, and a short, context-sensitive release phase (here: (t)I and (k)7).
- the vowels are each divided into vowel halves, the cut points lying at the beginning and in the middle of the vowel.
- FIG. 3 shows another example word, "Frauenheld" ("womanizer"), in the time domain.
- the phoneme sequence is specified with / fraU @ nhElt /.
- the word shown in FIG. 3 comprises 15 microsegments, with quasi-stationary microsegments also occurring here.
- the first two microsegments, ...(f) and (r)a, are consonant segments whose context is specified on one side only. After the semivowel r(a), the segment for aU contains the perceptually important transition between the start position and the target position u(U).
- U contains the transition from /U/ to /@/, which would normally be followed by @(@). This would make the /@/ last too long, so that for duration reasons this segment is omitted for /@/ and /6/ and only the second vowel half (@)n is played.
- h represents a consonant segment; the transition from consonants to /h/ is, unlike for vowels, not specified.
- E contains the breathy portion of the vowel /E/, followed by the quasi-stationary E(E).
- (E)l contains the second vowel half of /E/ with the transition to the dental articulation place.
- E(l) is a consonant microsegment in which only the precontext is specified.
- the /t/ is divided into a closure phase t(t) and a release phase (t)..., which fades into silence.
- the large number of possible articulation places is limited to three essential areas.
- the grouping is based on the similar movements carried out by the articulators to form the sounds. Because of the comparable articulator movements, the spectral transitions between the sounds are similar within the three groups listed in Table 1.
- a further generalization is achieved by grouping the postalveolar consonants /S/ (as in "Masche") and /Z/ (as in "Gage") with the alveolars and the labiodental consonants /f/ and /v/ with the labials, so that, as given above, /fa(tS)/, /va(tS)/, /fa(dZ)/ and /va(dZ)/ can also contain the same vowel segments.
- segments for quasi-stationary vowel parts are required to simulate the middle of a long vowel realization.
- with the generalization according to the invention embodied in the speech modules, it is theoretically possible to get by with 266 microsegments for the German language, namely: 16 vowels for 3 articulation places, stationary, and word-final; 6 plosives for 3 consonant groups by articulation place and 4 vowel groups; and /h/, /j/ and /?/ for more finely differentiated vowel groups.
- the number of microsegments required for the German language should be between 320 and 350, depending on how finely the sounds are differentiated. Owing to the relatively short duration of the microsegments, this corresponds to a storage requirement of approx. 700 kB at 8-bit resolution and a 22 kHz sampling rate. Compared to known diphone synthesis, this is a reduction by a factor of 12 to 32.
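A back-of-the-envelope check of these figures (the average segment duration is inferred here, not stated in the patent):

```python
n_segments = 350
total_bytes = 700 * 1024                         # ~700 kB at 8 bit/sample
samples_per_segment = total_bytes / n_segments         # = 2048 samples each
avg_duration_ms = samples_per_segment / 22_050 * 1000  # ~93 ms per segment
```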
- markings are made in the individual microsegments so that, together with the unabridged rendering, each microsegment has six different levels of playing time.
- This method enables a further generalized use of the microsegments.
- the same signal modules provide the basic elements for long and short sounds in both stressed and unstressed syllables.
- the reductions in words that are unstressed in the sentence are likewise derived from the same microsegments, which were recorded in sentence-stressed position.
- the intonation of linguistic utterances can be generated by changing the fundamental frequency of the periodic parts of vowels and sonorants. This is carried out by fundamental frequency manipulation in the time domain on the microsegment, with hardly any loss of sound.
- the first voice period and the "closed phase" (first part of the period) that is to be kept constant are marked. Owing to the monotone speaking style of the recordings, all other periods in the microsegment can be found automatically, thus defining the closed phases.
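A minimal sketch of how the remaining periods could be located from the single marked one, exploiting the monotone (constant-pitch) recordings; function and parameter names are assumptions:

```python
def period_starts(n_samples: int, first_start: int, period_len: int) -> list[int]:
    """With a monotone recording the voice periods have a constant
    length, so stepping from the first marked period start locates
    all further periods (and thus their closed phases) automatically."""
    return list(range(first_start, n_samples, period_len))
```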
- the fundamental frequency change is carried out uniformly over the entire microsegment.
- the resulting intonation is largely smoothed out by the natural "auditory integration" of the listener.
- the digital signal has, for example, a bandwidth of 8 bits and a sampling rate of 22 kHz.
- the microsegments thus cut out are addressed according to the sound and its context and stored in a memory.
- a text to be output as speech is fed into the system with the corresponding sequence of addresses.
- the order of sounds determines the choice of addresses.
- the microsegments are read from the memory and strung together in accordance with this address sequence.
- this digital time series is converted in a digital/analog converter, for example a so-called Soundblaster card, into an analog signal which can be output via audio output devices, for example a loudspeaker or headphones.
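Putting the output chain together, a minimal sketch (names are assumptions; a WAV file written with the Python standard library stands in for the D/A hardware):

```python
import wave
import numpy as np

def synthesize(addresses: list[str], store: dict[str, np.ndarray]) -> np.ndarray:
    """String the stored microsegments together in address order."""
    return np.concatenate([store[a] for a in addresses])

def write_wav(path: str, signal: np.ndarray, rate: int = 22_050) -> None:
    """Write the time series as a mono 8-bit WAV file; assumes the
    samples are already stored unsigned (0-255), as 8-bit WAV requires."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(1)  # 8-bit resolution, as cited above
        f.setframerate(rate)
        f.writeframes(signal.astype(np.uint8).tobytes())
```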
- the speech synthesis system according to the invention can be implemented on an ordinary PC, a working memory of approximately 4 MB being sufficient.
- the vocabulary that can be realized with the system is practically unlimited.
- the speech is easy to understand, and the computational effort for modifications of the microsegments, for example reductions or changes in the fundamental frequency, is low, since the speech signal is processed in the time domain.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrically Operated Instructional Devices (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19610019 | 1996-03-14 | ||
DE19610019A DE19610019C2 (de) | 1996-03-14 | 1996-03-14 | Digitales Sprachsyntheseverfahren |
PCT/DE1997/000454 WO1997034291A1 (de) | 1996-03-14 | 1997-03-08 | Auf mikrosegmenten basierendes sprachsyntheseverfahren |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0886853A1 (de) | 1998-12-30 |
EP0886853B1 (de) | 1999-08-04 |
Family
ID=7788258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97917259A Expired - Lifetime EP0886853B1 (de) | 1996-03-14 | 1997-03-08 | Auf mikrosegmenten basierendes sprachsyntheseverfahren |
Country Status (5)
Country | Link |
---|---|
US (1) | US6308156B1 (de) |
EP (1) | EP0886853B1 (de) |
AT (1) | ATE183010T1 (de) |
DE (2) | DE19610019C2 (de) |
WO (1) | WO1997034291A1 (de) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19841683A1 (de) * | 1998-09-11 | 2000-05-11 | Hans Kull | Vorrichtung und Verfahren zur digitalen Sprachbearbeitung |
US6928404B1 (en) * | 1999-03-17 | 2005-08-09 | International Business Machines Corporation | System and methods for acoustic and language modeling for automatic speech recognition with large vocabularies |
US7369994B1 (en) * | 1999-04-30 | 2008-05-06 | At&T Corp. | Methods and apparatus for rapid acoustic unit selection from a large speech corpus |
DE19939947C2 (de) * | 1999-08-23 | 2002-01-24 | Data Software Ag G | Digitales Sprachsyntheseverfahren mit Intonationsnachbildung |
US20030191625A1 (en) * | 1999-11-05 | 2003-10-09 | Gorin Allen Louis | Method and system for creating a named entity language model |
US8392188B1 (en) | 1999-11-05 | 2013-03-05 | At&T Intellectual Property Ii, L.P. | Method and system for building a phonotactic model for domain independent speech recognition |
US7085720B1 (en) * | 1999-11-05 | 2006-08-01 | At & T Corp. | Method for task classification using morphemes |
US7286984B1 (en) | 1999-11-05 | 2007-10-23 | At&T Corp. | Method and system for automatically detecting morphemes in a task classification system using lattices |
US7213027B1 (en) * | 2000-03-21 | 2007-05-01 | Aol Llc | System and method for the transformation and canonicalization of semantically structured data |
JP2002221980A (ja) * | 2001-01-25 | 2002-08-09 | Oki Electric Ind Co Ltd | テキスト音声変換装置 |
US20040030555A1 (en) * | 2002-08-12 | 2004-02-12 | Oregon Health & Science University | System and method for concatenating acoustic contours for speech synthesis |
US8768701B2 (en) * | 2003-01-24 | 2014-07-01 | Nuance Communications, Inc. | Prosodic mimic method and apparatus |
US7308407B2 (en) * | 2003-03-03 | 2007-12-11 | International Business Machines Corporation | Method and system for generating natural sounding concatenative synthetic speech |
JP2005031259A (ja) * | 2003-07-09 | 2005-02-03 | Canon Inc | 自然言語処理方法 |
US20050125236A1 (en) * | 2003-12-08 | 2005-06-09 | International Business Machines Corporation | Automatic capture of intonation cues in audio segments for speech applications |
JP4265501B2 (ja) * | 2004-07-15 | 2009-05-20 | ヤマハ株式会社 | 音声合成装置およびプログラム |
DE102005002474A1 (de) | 2005-01-19 | 2006-07-27 | Obstfelder, Sigrid | Handy und Verfahren zur Spracheingabe in ein solches sowie Spracheingabebaustein und Verfahren zur Spracheingabe in einen solchen |
US8924212B1 (en) | 2005-08-26 | 2014-12-30 | At&T Intellectual Property Ii, L.P. | System and method for robust access and entry to large structured data using voice form-filling |
JP2008225254A (ja) * | 2007-03-14 | 2008-09-25 | Canon Inc | 音声合成装置及び方法並びにプログラム |
JP5119700B2 (ja) * | 2007-03-20 | 2013-01-16 | 富士通株式会社 | 韻律修正装置、韻律修正方法、および、韻律修正プログラム |
US7953600B2 (en) * | 2007-04-24 | 2011-05-31 | Novaspeech Llc | System and method for hybrid speech synthesis |
WO2008142836A1 (ja) * | 2007-05-14 | 2008-11-27 | Panasonic Corporation | 声質変換装置および声質変換方法 |
CN101312038B (zh) * | 2007-05-25 | 2012-01-04 | 纽昂斯通讯公司 | 用于合成语音的方法 |
US8321222B2 (en) * | 2007-08-14 | 2012-11-27 | Nuance Communications, Inc. | Synthesis by generation and concatenation of multi-form segments |
JP6047922B2 (ja) * | 2011-06-01 | 2016-12-21 | ヤマハ株式会社 | 音声合成装置および音声合成方法 |
JP5914996B2 (ja) * | 2011-06-07 | 2016-05-11 | ヤマハ株式会社 | 音声合成装置およびプログラム |
US9368104B2 (en) | 2012-04-30 | 2016-06-14 | Src, Inc. | System and method for synthesizing human speech using multiple speakers and context |
PL401372A1 (pl) * | 2012-10-26 | 2014-04-28 | Ivona Software Spółka Z Ograniczoną Odpowiedzialnością | Hybrydowa kompresja danych głosowych w systemach zamiany tekstu na mowę |
PL401371A1 (pl) * | 2012-10-26 | 2014-04-28 | Ivona Software Spółka Z Ograniczoną Odpowiedzialnością | Opracowanie głosu dla zautomatyzowanej zamiany tekstu na mowę |
JP2015014665A (ja) * | 2013-07-04 | 2015-01-22 | セイコーエプソン株式会社 | 音声認識装置及び方法、並びに、半導体集積回路装置 |
DE102013219828B4 (de) * | 2013-09-30 | 2019-05-02 | Continental Automotive Gmbh | Verfahren zum Phonetisieren von textenthaltenden Datensätzen mit mehreren Datensatzteilen und sprachgesteuerte Benutzerschnittstelle |
RU2692051C1 (ru) | 2017-12-29 | 2019-06-19 | Общество С Ограниченной Ответственностью "Яндекс" | Способ и система для синтеза речи из текста |
FR3087566B1 (fr) * | 2018-10-18 | 2021-07-30 | A I O | Dispositif de suivi des mouvements et/ou des efforts d’une personne, methode d’apprentissage dudit dispositif et procede d’analyse des mouvements et/ou des efforts d’une personne |
US11302300B2 (en) * | 2019-11-19 | 2022-04-12 | Applications Technology (Apptek), Llc | Method and apparatus for forced duration in neural speech synthesis |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BG24190A1 (en) * | 1976-09-08 | 1978-01-10 | Antonov | Method of synthesis of speech and device for effecting same |
JPS5919358B2 (ja) * | 1978-12-11 | 1984-05-04 | 株式会社日立製作所 | 音声内容伝送方式 |
JPH0642158B2 (ja) * | 1983-11-01 | 1994-06-01 | 日本電気株式会社 | 音声合成装置 |
US4692941A (en) * | 1984-04-10 | 1987-09-08 | First Byte | Real-time text-to-speech conversion system |
DE69028072T2 (de) * | 1989-11-06 | 1997-01-09 | Canon Kk | Verfahren und Einrichtung zur Sprachsynthese |
KR940002854B1 (ko) * | 1991-11-06 | 1994-04-04 | 한국전기통신공사 | 음성 합성시스팀의 음성단편 코딩 및 그의 피치조절 방법과 그의 유성음 합성장치 |
JP3083640B2 (ja) * | 1992-05-28 | 2000-09-04 | 株式会社東芝 | 音声合成方法および装置 |
US5878396A (en) * | 1993-01-21 | 1999-03-02 | Apple Computer, Inc. | Method and apparatus for synthetic speech in facial animation |
JPH08502603A (ja) | 1993-01-30 | 1996-03-19 | コリア テレコミュニケーション オーソリティー | 音声合成及び認識システム |
JP3085631B2 (ja) * | 1994-10-19 | 2000-09-11 | 日本アイ・ビー・エム株式会社 | 音声合成方法及びシステム |
US5864812A (en) * | 1994-12-06 | 1999-01-26 | Matsushita Electric Industrial Co., Ltd. | Speech synthesizing method and apparatus for combining natural speech segments and synthesized speech segments |
-
1996
- 1996-03-14 DE DE19610019A patent/DE19610019C2/de not_active Expired - Fee Related
-
1997
- 1997-03-08 EP EP97917259A patent/EP0886853B1/de not_active Expired - Lifetime
- 1997-03-08 DE DE59700315T patent/DE59700315D1/de not_active Expired - Fee Related
- 1997-03-08 AT AT97917259T patent/ATE183010T1/de not_active IP Right Cessation
- 1997-03-08 US US09/142,728 patent/US6308156B1/en not_active Expired - Fee Related
- 1997-03-08 WO PCT/DE1997/000454 patent/WO1997034291A1/de active IP Right Grant
Non-Patent Citations (1)
Title |
---|
See references of WO9734291A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO1997034291A1 (de) | 1997-09-18 |
ATE183010T1 (de) | 1999-08-15 |
DE19610019A1 (de) | 1997-09-18 |
EP0886853B1 (de) | 1999-08-04 |
DE59700315D1 (de) | 1999-09-09 |
DE19610019C2 (de) | 1999-10-28 |
US6308156B1 (en) | 2001-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0886853B1 (de) | Auf mikrosegmenten basierendes sprachsyntheseverfahren | |
DE69028072T2 (de) | Verfahren und Einrichtung zur Sprachsynthese | |
Flanagan et al. | Synthetic voices for computers | |
DE69506037T2 (de) | Audioausgabeeinheit und Methode | |
DE69909716T2 (de) | Formant Sprachsynthetisierer unter Verwendung von Verkettung von Halbsilben mit unabhängiger Überblendung im Filterkoeffizienten- und Quellenbereich | |
DE60112512T2 (de) | Kodierung von Ausdruck in Sprachsynthese | |
WO2000011647A1 (de) | Verfahren und vorrichtungen zur koartikulationsgerechten konkatenation von audiosegmenten | |
Deterding | Phonetics and phonology | |
EP0058130B1 (de) | Verfahren zur Synthese von Sprache mit unbegrenztem Wortschatz und Schaltungsanordnung zur Durchführung des Verfahrens | |
EP1110203B1 (de) | Vorrichtung und verfahren zur digitalen sprachbearbeitung | |
Ramasubramanian et al. | Synthesis by rule of some retroflex speech sounds | |
KR101029493B1 (ko) | 음성 신호 합성 방법, 컴퓨터 판독가능 저장 매체 및 컴퓨터 시스템 | |
US6829577B1 (en) | Generating non-stationary additive noise for addition to synthesized speech | |
Anyanwu | Fundamentals of phonetics, phonology and tonology: with specific African sound patterns | |
Furtado et al. | Synthesis of unlimited speech in Indian languages using formant-based rules | |
JPH0580791A (ja) | 音声規則合成装置および方法 | |
DE19939947C2 (de) | Digitales Sprachsyntheseverfahren mit Intonationsnachbildung | |
JP3267659B2 (ja) | 日本語音声合成方法 | |
Nooteboom et al. | Speech synthesis by rule; Why, what and how? | |
Evgrafova | The Quality Evaluation of Allophone Database for English Concatenative Speech Synthesis | |
JPS63174100A (ja) | 音声規則合成方式 | |
Zhu et al. | A New Chinese Speech Synthesis Method Apply in Chinese Poetry Learning | |
Fant | Speech analysis and features | |
JPH0519780A (ja) | 音声規則合成装置および方法 | |
JPH0439698A (ja) | 音声合成装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 19980912 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT CH DE FR GB LI |
|
GRAG | Despatch of communication of intention to grant |
Free format text: ORIGINAL CODE: EPIDOS AGRA |
|
GRAG | Despatch of communication of intention to grant |
Free format text: ORIGINAL CODE: EPIDOS AGRA |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
17Q | First examination report despatched |
Effective date: 19990414 |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT CH DE FR GB LI |
|
REF | Corresponds to: |
Ref document number: 183010 Country of ref document: AT Date of ref document: 19990815 Kind code of ref document: T |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REF | Corresponds to: |
Ref document number: 59700315 Country of ref document: DE Date of ref document: 19990909 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: NV Representative's name: PATENTANWAELTE SCHAAD, BALASS, MENZL & PARTNER AG |
|
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) |
Effective date: 19990831 |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed | ||
REG | Reference to a national code |
Ref country code: GB Ref legal event code: IF02 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20030305 Year of fee payment: 7 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040308 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: AT Payment date: 20040308 Year of fee payment: 8 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: CH Payment date: 20040323 Year of fee payment: 8 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20040330 Year of fee payment: 8 |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee | ||
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050308 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050331 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050331 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20051130 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20051130 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20070321 Year of fee payment: 11 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20081001 |