EP0107945B1 - Speech synthesizing apparatus - Google Patents
Speech synthesizing apparatus
- Publication number
- EP0107945B1 (application number EP83306228A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- vowel
- parameter data
- consonant
- speech
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Definitions
- The speech characteristic parameter data from the consonant segment file 10 are supplied to a first input port of an interpolation circuit 18, while the speech characteristic parameter data from the vowel segment file 12 are supplied to a second input port of the interpolation circuit 18 and to a repetition circuit 20.
- The interpolation circuit 18 calculates a predetermined number of speech characteristic parameter data from the consonant segment data, consisting of the power spectra of three frames from the consonant segment file 10, and the vowel segment data, consisting of the power spectrum of one frame from the vowel segment file 12.
- The calculated parameter data represent a corresponding number of vowel segments, each having a one-frame spectrum, interpolated between the input consonant and vowel segments.
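The patent does not state how the interpolation circuit 18 computes the intermediate frames. A minimal sketch, assuming plain linear interpolation between the last consonant frame and the one-frame vowel segment, with each frame treated as a list of numbers (the function name is hypothetical):

```python
def interpolate_frames(start, end, count):
    """Return `count` frames linearly interpolated strictly between two frames.

    `start` is the last consonant frame and `end` is the one-frame vowel segment.
    Linear interpolation is an assumption; the patent does not give the rule.
    """
    return [
        [s + (e - s) * k / (count + 1) for s, e in zip(start, end)]
        for k in range(1, count + 1)
    ]

# Last consonant frame -> vowel frame, with four interpolated frames in between.
print(interpolate_frames([0.0, 0.0], [5.0, 10.0], 4))
# [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
```

Because the consonant segment already contains the transient toward the vowel, a simple rule like this only has to bridge a short, smooth gap.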
- The repetition circuit 20 repeatedly fetches the speech characteristic parameter data from the vowel segment file 12, a number of times equal to the number of frames corresponding to the vowel duration data stored in the RAM 16A.
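The repetition step can be sketched as follows, assuming the duration data are expressed in milliseconds and a fixed frame period converts them into a frame count. Both the unit and the 10 ms default are hypothetical; the patent only says that the count corresponds to the duration data held in the RAM 16A.

```python
def repeat_vowel(vowel_frame, duration_ms, frame_ms=10.0):
    """Emit the one-frame vowel segment once per frame of the vowel duration.

    `frame_ms` (the frame period) is a hypothetical value chosen for illustration.
    """
    count = max(0, round(duration_ms / frame_ms))
    # Copy each frame so later processing cannot alias the stored segment.
    return [list(vowel_frame) for _ in range(count)]

frames = repeat_vowel([0.2, 1.0, 0.8], duration_ms=50)
print(len(frames))  # 5
```

Storing a single frame per vowel and repeating it is what lets one vowel entry serve any speaking rate.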
- The speech characteristic parameter data from the interpolation circuit 18 and from the repetition circuit 20 are supplied, in that order, through a switch 24 to a buffer register 22.
- The speech characteristic parameter data from the buffer register 22 are supplied to an interpolation circuit 26.
- On the basis of the speech characteristic parameter data of two successive frames from the buffer register 22, the interpolation circuit 26 interpolates a predetermined number of speech characteristic parameter data between these two frames.
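The second interpolation stage raises the frame rate of the buffered sequence. A minimal sketch, assuming linear interpolation and a fixed number of inserted frames per gap (both assumptions; the function name is hypothetical):

```python
def upsample(frames, per_gap):
    """Insert `per_gap` linearly interpolated frames between each successive pair.

    Mirrors interpolation circuit 26 taking two successive frames at a time
    from the buffer register.
    """
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(list(a))
        for k in range(1, per_gap + 1):
            out.append([x + (y - x) * k / (per_gap + 1) for x, y in zip(a, b)])
    out.append(list(frames[-1]))
    return out

print(upsample([[0.0], [3.0]], 2))  # [[0.0], [1.0], [2.0], [3.0]]
```

For `n` input frames the output has `n + (n - 1) * per_gap` frames.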
- The speech characteristic parameter data from the interpolation circuit 26 are sequentially supplied to a speech synthesizer 28.
- The speech synthesizer 28 sequentially filter-processes the speech characteristic parameter data from the interpolation circuit 26 according to pitch period data generated by a pitch generation circuit 30 in accordance with the accent data in the RAM 16A, and thereby generates a speech signal.
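For a linear prediction synthesizer, one of the two formats mentioned in the description, the filter processing can be sketched as an impulse train at the pitch period driving an all-pole filter. This is an illustrative, voiced-only sketch: noise excitation for unvoiced frames, per-frame gain and the exact coefficient sign convention of the apparatus are assumptions or omissions.

```python
def synthesize_lpc(frames, pitch_periods, samples_per_frame, gain=1.0):
    """Voiced-only all-pole synthesis: impulse train through 1 / (1 - sum a_k z^-k).

    frames[i] holds LPC coefficients a_1..a_p for frame i, and pitch_periods[i]
    is that frame's pitch period in samples (must be positive).
    """
    out = []
    phase = 0  # samples since the last glottal pulse, carried across frames
    for coeffs, period in zip(frames, pitch_periods):
        for _ in range(samples_per_frame):
            excitation = gain if phase == 0 else 0.0
            phase = (phase + 1) % period
            # y[n] = e[n] + sum_k a_k * y[n - k], using past outputs as filter state
            feedback = sum(a * out[-k] for k, a in enumerate(coeffs, start=1) if k <= len(out))
            out.append(excitation + feedback)
    return out

print(synthesize_lpc([[0.5]], [4], 4))  # [1.0, 0.5, 0.25, 0.125]
```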
- The phoneme converting circuit 14 supplies the phoneme string data and accent data to the control circuit 16 in accordance with the input character code series.
- On the basis of the phoneme data and accent data from the phoneme converting circuit 14, the control circuit 16 writes into the RAM 16A time length data representing the duration of the vowel to be generated and pitch data relating to the speech pitch, respectively.
- The control circuit 16 supplies the consonant segment address data and vowel segment address data corresponding to the phoneme string data from the phoneme converting circuit 14 to the consonant segment file 10 and the vowel segment file 12, respectively.
- At the same time, the control circuit 16 generates a switch control signal that sets the switch 24 to the first switching position.
- On the basis of the phoneme data for the two successive monosyllables of [goma] generated by the phoneme converting circuit 14, the control circuit 16 supplies the consonant and vowel segment address data corresponding to consonant segment [g] and vowel segment [o] to the consonant and vowel segment files 10 and 12, respectively. As a result, the first to third speech characteristic parameter data, corresponding to the power spectra of the three frames of consonant segment [g] in Fig. 9, are read out from the consonant segment file 10.
- The fourth speech characteristic parameter data, corresponding to the one-frame power spectrum of vowel [o], are read out from the vowel segment file 12.
- On the basis of the third speech characteristic parameter data read out from the consonant segment file 10 and the fourth speech characteristic parameter data read out from the vowel segment file 12, the interpolation circuit 18 calculates the fifth to eighth speech characteristic parameter data, representing the power spectra of a predetermined number of frames (four frames in this example) between consonant segment [g] and vowel segment [o] shown in Fig. 9.
- In response to the interpolation control signal from the control circuit 16, the interpolation circuit 18 supplies the 1st to 3rd speech characteristic parameter data from the consonant segment file 10, the 5th to 8th speech characteristic parameter data thus calculated, and the 4th speech characteristic parameter data from the vowel segment file 12, in that order, to the buffer register 22 through the switch 24.
- The switch 24 is set to the second switching position by the switch control signal from the control circuit 16.
- The control circuit 16 then supplies a number of control pulses corresponding to the vowel duration data stored in the RAM 16A to the repetition circuit 20 and, through an OR gate 32, to the buffer register 22.
- In response to each control pulse from the control circuit 16, the repetition circuit 20 fetches the speech characteristic parameter data from the vowel segment file 12 and sequentially supplies them to the buffer register 22, so that the data are fetched the corresponding number of times.
- As shown in Fig. 9, the buffer register 22 thus stores speech characteristic parameter data representing power spectra similar to those shown in Fig. 7B.
- In Fig. 9, the power spectra drawn with solid lines correspond to the speech characteristic parameter data read out from the consonant and vowel segment files 10 and 12, and those drawn with broken lines represent the power spectra calculated by the interpolation circuit 18 and generated by the repetition circuit 20.
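The [go] walk-through fixes the order in which frames reach the buffer register 22: consonant frames 1 to 3, interpolated frames 5 to 8, vowel frame 4, then the repeated vowel frames. The sketch below reproduces only that ordering; the labels, the use of linear interpolation and the function name are assumptions for illustration.

```python
def build_buffer(consonant_frames, vowel_frame, n_interp, n_repeat):
    """Assemble (label, frame) pairs in buffer-register order for one C-V syllable.

    Frame numbers follow the text: consonant frames get 1..n_c, the vowel frame
    gets n_c + 1, and interpolated frames are numbered after it; "R" marks repeats.
    """
    last = consonant_frames[-1]
    interpolated = [
        [s + (e - s) * k / (n_interp + 1) for s, e in zip(last, vowel_frame)]
        for k in range(1, n_interp + 1)
    ]
    n_c = len(consonant_frames)
    labels = [f"C{i}" for i in range(1, n_c + 1)]
    labels += [f"I{i}" for i in range(n_c + 2, n_c + 2 + n_interp)]
    labels += [f"V{n_c + 1}"] + ["R"] * n_repeat
    frames = (
        [list(f) for f in consonant_frames] + interpolated
        + [list(vowel_frame)] + [list(vowel_frame) for _ in range(n_repeat)]
    )
    return list(zip(labels, frames))

buffer = build_buffer([[1.0], [0.8], [0.6]], [0.1], n_interp=4, n_repeat=2)
print([label for label, _ in buffer])
# ['C1', 'C2', 'C3', 'I5', 'I6', 'I7', 'I8', 'V4', 'R', 'R']
```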
- The control circuit 16 supplies the interpolation control signal through the OR gate 32 to the buffer register 22 and also to the interpolation circuit 26, so that the speech characteristic parameter data in the buffer register 22 are sent sequentially to the interpolation circuit 26.
- The interpolation circuit 26 then creates a predetermined number of interpolated speech characteristic parameter data on the basis of the speech characteristic parameter data of two successive frames sent from the buffer register 22, and sequentially supplies them to the speech synthesizer 28.
- At the same time, the control circuit 16 reads out the accent data stored in the RAM 16A and supplies them to the pitch generation circuit 30, which then generates the pitch period data.
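The mapping from accent data to pitch period data is not specified. As a purely hypothetical illustration, a two-level (high/low) accent can be mapped to two fundamental frequencies and converted to pitch periods in samples; the sample rate and both frequencies below are invented values.

```python
def pitch_periods_from_accent(accents, sample_rate=8000, low_hz=100.0, high_hz=140.0):
    """Map per-frame high/low accent flags to pitch periods in samples.

    The two-level accent model and all numeric defaults are hypothetical.
    """
    return [round(sample_rate / (high_hz if high else low_hz)) for high in accents]

print(pitch_periods_from_accent([False, True]))  # [80, 57]
```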
- The speech synthesizer 28 synthesizes a speech signal including pitch information in accordance with the speech characteristic parameter data from the interpolation circuit 26 and the pitch period data from the pitch generation circuit 30, and outputs the synthesized speech signal.
- The repetition circuit 20 is constituted so that it fetches the vowel characteristic parameter data from the vowel segment file 12 in response to the control pulses from the control circuit 16.
- Alternatively, the repetition circuit 20 may be modified so that the control circuit 16 generates a high-level signal for a period corresponding to the time length data, and the repetition circuit 20 fetches the vowel characteristic parameter data from the vowel segment file 12 at a fixed interval while this high-level signal is present.
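The gated variant can be modelled by sampling the high-level signal once per clock tick and fetching a vowel frame every `interval` ticks while it stays high. The tick model, the interval and the function name are assumptions made for this sketch.

```python
def fetch_while_high(gate, interval):
    """Count vowel-frame fetches made at a fixed interval while a gate is high.

    `gate` is a per-tick list of booleans standing in for the high-level signal
    from control circuit 16. A fetch occurs on the first high tick and then
    every `interval` ticks until the signal goes low.
    """
    fetches = 0
    ticks_high = 0
    for level in gate:
        if level:
            if ticks_high % interval == 0:
                fetches += 1
            ticks_high += 1
        else:
            ticks_high = 0
    return fetches

print(fetch_while_high([True] * 10 + [False] * 3, interval=2))  # 5
```

Compared with the pulse-counting version, the control circuit here only has to hold one signal for the vowel duration instead of emitting one pulse per frame.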
- In this embodiment, each vowel characteristic parameter data stored in the vowel segment file 12 represents a one-frame power spectrum.
- However, vowel characteristic parameter data each representing a plurality of power spectra can also be stored in this vowel segment file.
Description
- This invention relates to a speech synthesizing apparatus for synthesizing speech in accordance with input character strings.
- Recently, various speech synthesizing apparatuses that synthesize speech from sentence data supplied as character strings have become known. For example, in an apparatus for synthesizing speech by rule, speech segments of predetermined units are registered in advance, in the form of acoustic parameters, in a speech segment file, and the corresponding acoustic parameter data are selectively read out from this file in accordance with the input phoneme data string. Speech is then synthesized from the acoustic parameter data thus read out, in accordance with a predetermined synthesizing rule. Because synthesis follows a predetermined rule, such an apparatus can generate any desired sentence at any desired speaking speed.
- Depending on the format of the speech segments registered in the speech segment file, apparatuses for synthesizing speech by rule fall mainly into two types: V-C-V synthesizing apparatuses, which use a vowel-consonant-vowel chain as the unit speech segment, and C-V synthesizing apparatuses, which use a consonant-vowel monosyllable as the unit speech segment. Here, V and C denote a vowel segment and a consonant segment, respectively.
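The difference between the two unit types can be made concrete by cutting the same phoneme string both ways. This sketch assumes a romanized phoneme list and a hypothetical five-vowel inventory; word-boundary units of a real V-C-V system are omitted.

```python
# Hypothetical vowel inventory; the patent does not define a phoneme set.
VOWELS = {"a", "i", "u", "e", "o"}

def cv_units(phonemes):
    """Group a phoneme list into units that each end in a vowel (C-V units)."""
    units, pending = [], []
    for p in phonemes:
        pending.append(p)
        if p in VOWELS:
            units.append(tuple(pending))
            pending = []
    if pending:  # trailing consonants with no following vowel
        units.append(tuple(pending))
    return units

def vcv_units(phonemes):
    """List vowel-consonant-vowel chains (V-C-V units) centred on each consonant."""
    return [
        (phonemes[i - 1], phonemes[i], phonemes[i + 1])
        for i in range(1, len(phonemes) - 1)
        if phonemes[i] not in VOWELS
        and phonemes[i - 1] in VOWELS
        and phonemes[i + 1] in VOWELS
    ]

word = ["g", "o", "m", "a"]
print(cv_units(word))   # [('g', 'o'), ('m', 'a')]
print(vcv_units(word))  # [('o', 'm', 'a')]
```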
- Fig. 1 is a schematic block diagram of a conventional speech synthesizing apparatus. This apparatus includes a phoneme converting circuit 2 for converting an input character code string into a phoneme data string, including accent information, in accordance with predetermined phoneme conversion and accent rules; a speech segment file 4 in which a plurality of speech segments in the form of monosyllables are stored; an interpolating circuit 6 which sequentially reads out the speech characteristic parameter data of the corresponding speech segments from the speech segment file 4 in accordance with the phoneme data string from the phoneme converting circuit 2 and then interpolates these parameter data; and a speech synthesizer circuit 8 for generating speech data by filter-processing the parameter data from the interpolating circuit 6.
- In an apparatus of this kind for synthesizing speech by rule, phonemes must of course be converted with high accuracy to obtain natural, high-quality speech, but it is also necessary to obtain speech characteristic parameters that faithfully represent the characteristics of human speech. For example, in continuous speech a given monosyllable may be coarticulated with the monosyllables before and after it. When a consonant-vowel (C1-V1) monosyllable is uttered in isolation, the acoustic energy pattern (speech characteristic parameter) of its speech segment faithfully exhibits the inherent characteristics of the consonant C1 and the vowel V1, as schematically shown in Fig. 2. However, when this monosyllable is uttered in succession with other monosyllables, the acoustic energy pattern of the C1-V1 speech segment changes as shown in Figs. 3A and 3B, depending, for example, on whether the subsequent monosyllable is a C2-V2 syllable or a C3-V3 syllable.
In other words, this monosyllable is coarticulated with the subsequent C2-V2 monosyllable and changes to a C11-V11 monosyllable, or it is coarticulated with the subsequent C3-V3 monosyllable and changes to a C12-V12 monosyllable. Therefore, to generate natural, high-quality speech that is as close as possible to actual human speech, the coarticulation between successive speech segments must be taken into account. A conventional speech synthesizing apparatus, however, simply concatenates phonemes without regard to the influence of coarticulation, and therefore produces only unnatural speech.
- EP-A-58130 discloses the use of discrete sound elements corresponding to consonant portions, steady-state vowel portions and transition elements. In this prior art, however, each transition element combines a consonant portion with a coarticulated vowel, so a large number of transition elements must be prepared to synthesize natural speech.
- It is an object of the present invention to provide a speech synthesizing apparatus for synthesizing clear and natural speech.
- According to the invention, there is provided a speech synthesizing apparatus comprising a data generation circuit for generating phoneme string data; memory means in which consonant and vowel characteristic parameter data representative of consonant and vowel segments are stored and which has a consonant segment file in which a plurality of consonant characteristic parameter data representative of a plurality of consonant segments, each of which has a consonant portion and a transient segment to a vowel segment, are stored, and a vowel segment file in which a plurality of vowel characteristic parameter data representative of a plurality of steady-state vowel segments are stored; control means for allowing the corresponding consonant and vowel characteristic parameter data to be generated from said memory means in accordance with said phoneme string data; and synthesizing means for synthesizing a speech signal on the basis of said consonant and vowel characteristic parameter data from said memory means; and including a parameter data series generation circuit for generating a series of consonant and vowel characteristic parameter data on the basis of the consonant and vowel characteristic parameter data from said consonant and vowel segment files, and a synthesis circuit for synthesizing the speech signal on the basis of the parameter data series from said parameter data series generation circuit, characterized in that said vowel segment file further stores a plurality of vowel characteristic parameter data representative of a plurality of coarticulated vowel segments, each of said steady-state and coarticulated vowel segments being formed of one frame of parameter data; said control means generates time length data indicative of a vowel duration in accordance with the phoneme string data from said data generation circuit; and said parameter data series generation circuit includes a repetition circuit which derives the vowel characteristic parameter data from said vowel segment file a number of times corresponding to said time length data.
- In the described embodiment, each consonant characteristic parameter data stored in the consonant segment file represents a consonant segment including a consonant portion and a transient segment to the vowel segment. It is therefore easy to obtain interpolated characteristic parameter data between this consonant characteristic parameter data and the succeeding vowel characteristic parameter data read out from the vowel segment file, which makes it possible to synthesize speech clearly and naturally even for a coarticulated monosyllable.
- An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
- Fig. 1 is a schematic block diagram of a conventional speech synthesizing apparatus;
- Fig. 2 shows the schematic acoustic energy pattern of a monosyllable independently generated;
- Figs. 3A and 3B show the schematic acoustic energy pattern of coarticulated monosyllables;
- Fig. 4 shows the schematic acoustic energy pattern of consonant and vowel segments registered in consonant and vowel segment files used in this invention;
- Figs. 5A and 5B show waveforms of the [a]-sound in two different utterances;
- Figs. 6A and 6B show power spectra of selected frames in the [a]-sounds shown in Figs. 5A and 5B;
- Figs. 7A to 7C show a speech signal, power spectra and power sequence of a monosyllable "go";
- Fig. 7D shows similarity between the power spectrum having the maximum power in the power sequence of Fig. 7C and other power spectra;
- Fig. 8 is a block diagram of a speech synthesizing apparatus according to one embodiment of this invention;
- Fig. 9 shows power spectra obtained in the speech synthesizing apparatus of Fig. 8; and
- Fig. 10 is a flowchart illustrating the operation of the speech synthesizing apparatus shown in Fig. 8.
- As shown in Fig. 4, consonant segments each including a consonant portion and a transient segment which changes from this consonant portion to a vowel segment are registered as a consonant segment C in the consonant segment file, and vowel segments including steady-state and coarticulated vowel segments are registered as a vowel segment V in the vowel segment file.
- Figs. 5A and 5B show waveforms of the second [a]-sound of the utterance [hakata] and of the [a]-sound of the utterance [kiai], respectively. Fig. 6A shows the power spectrum in frame A of the [a]-sound shown in Fig. 5A, and Fig. 6B shows the power spectrum in frame B of the [a]-sound shown in Fig. 5B. As is evident from Figs. 5A, 5B, 6A and 6B, the power spectrum of the [a]-sound of [kiai], which is strongly affected by coarticulation, differs from that of the second [a]-sound of [hakata], which is only weakly affected. For this reason, speech characteristic parameters representing the power spectra of different kinds of [a]-sounds are registered in the vowel segment file according to the degree to which coarticulation influences them.
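One way to picture a vowel segment file holding several [a] variants is a table keyed by vowel and variant, plus a rule that chooses between them from the phoneme context. Everything here is hypothetical: the keys, the dummy spectral values (placeholders, not measured data) and the selection rule, which simply treats a vowel directly adjacent to another vowel, as in [kiai], as strongly coarticulated.

```python
# Hypothetical vowel segment file: (vowel, variant) -> one-frame spectral vector.
# The numbers are dummy placeholders, not measured data.
VOWEL_FILE = {
    ("a", "steady"): [0.2, 1.0, 0.8, 0.3],
    ("a", "coarticulated"): [0.4, 0.9, 0.5, 0.6],
}

VOWELS = {"a", "i", "u", "e", "o"}

def select_vowel_variant(phonemes, index):
    """Pick the steady-state or coarticulated entry for the vowel at `index`.

    Hypothetical rule: a vowel directly next to another vowel is treated as
    strongly coarticulated; otherwise the steady-state entry is used.
    """
    vowel = phonemes[index]
    neighbours = phonemes[max(index - 1, 0):index] + phonemes[index + 1:index + 2]
    variant = "coarticulated" if any(p in VOWELS for p in neighbours) else "steady"
    return (vowel, variant), VOWEL_FILE.get((vowel, variant))

print(select_vowel_variant(["k", "i", "a", "i"], 2)[0])            # ('a', 'coarticulated')
print(select_vowel_variant(["h", "a", "k", "a", "t", "a"], 3)[0])  # ('a', 'steady')
```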
- Figs. 7A to 7C show the speech signal, power spectra and power sequence of the monosyllable "go" as uttered. Fig. 7D shows the similarity between the power spectrum having the maximum power in the power sequence of Fig. 7C and the other power spectra. In Fig. 7D, time point t1 is taken as the boundary between consonant and vowel. In this example, the similarity between the maximum-power spectrum and each preceding spectrum is calculated in turn, moving back toward the point where the consonant was produced, and t1 is the time point at which this similarity first falls below a predetermined value. The speech characteristic parameter data representing the power spectra generated from the onset of the consonant up to time point t1 (in this example, the power spectra of three frames) are registered as consonant segment data in the consonant segment file. In addition, the speech characteristic parameter data representing the power spectrum of one frame occurring a predetermined number of frames after time point t1, preferably the power spectrum having the maximum power, are registered as vowel segment data in the vowel segment file.
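The boundary search can be sketched as a backward scan from the maximum-power frame. The patent does not name the similarity measure or the power computation, so this sketch assumes cosine similarity between spectra and takes frame power as the sum of spectral bins; it also uses the preferred maximum-power frame as the vowel segment.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two power spectra (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def split_consonant_vowel(frames, threshold):
    """Split per-frame power spectra of a C-V syllable into segments.

    Returns (boundary_index, consonant_frames, vowel_frame). The boundary (t1)
    is the first frame, scanning backward from the maximum-power frame, whose
    similarity to that frame falls below `threshold`; -1 if there is none.
    """
    powers = [sum(f) for f in frames]  # frame power as the sum of spectral bins
    peak = max(range(len(frames)), key=powers.__getitem__)
    boundary = -1
    for i in range(peak - 1, -1, -1):
        if cosine_similarity(frames[peak], frames[i]) < threshold:
            boundary = i
            break
    consonant_frames = frames[:boundary + 1]  # consonant onset up to t1
    vowel_frame = frames[peak]                # preferred maximum-power frame
    return boundary, consonant_frames, vowel_frame

frames = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0], [0.8, 0.5, 0.0],
          [0.2, 1.0, 0.5], [0.2, 2.0, 1.0], [0.1, 1.5, 0.8]]
t1, consonant, vowel = split_consonant_vowel(frames, threshold=0.9)
print(t1, len(consonant), vowel)  # 2 3 [0.2, 2.0, 1.0]
```

With these made-up spectra the scan stops at the third frame, giving a three-frame consonant segment as in the "go" example.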
- The format of the speech characteristic parameters registered in the consonant and vowel segment files depends on the type of speech synthesizer used. For example, in a formant synthesizer the speech characteristic parameters consist of the formant frequencies, their bandwidths and voiced/unvoiced information, whereas in a linear prediction synthesizer they consist of the linear prediction coefficients and voiced/unvoiced information.
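The two parameter formats can be written down as simple per-frame records. The field names below are hypothetical; the `to_vector` helpers flatten the numeric fields so that frame-to-frame interpolation can operate element-wise regardless of format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FormantFrame:
    """One frame for a formant synthesizer (hypothetical field names)."""
    frequencies_hz: List[float]  # formant frequencies F1, F2, ...
    bandwidths_hz: List[float]   # one bandwidth per formant
    voiced: bool                 # voiced/unvoiced information

    def to_vector(self) -> List[float]:
        # Flatten numeric parameters for element-wise interpolation.
        return list(self.frequencies_hz) + list(self.bandwidths_hz)

@dataclass
class LPCFrame:
    """One frame for a linear prediction synthesizer (hypothetical field names)."""
    coefficients: List[float]    # linear prediction coefficients a1..ap
    voiced: bool                 # voiced/unvoiced information

    def to_vector(self) -> List[float]:
        return list(self.coefficients)

print(FormantFrame([700.0, 1200.0], [80.0, 90.0], True).to_vector())
# [700.0, 1200.0, 80.0, 90.0]
```

Note that interpolating raw linear prediction coefficients does not guarantee a stable synthesis filter; practical systems often interpolate an equivalent representation such as reflection coefficients instead.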
- Fig. 8 is a block diagram of a speech synthesizing apparatus that synthesizes speech by rule, as one embodiment of the present invention. This speech synthesizing apparatus includes a consonant segment file 10, a vowel segment file 12, a phoneme converting circuit 14, and a control circuit 16 that generates output data such as consonant segment address data, vowel segment address data and pitch data in response to the output data of the phoneme converting circuit 14. As already described with reference to Fig. 4, the consonant segment file 10 stores a plurality of speech characteristic parameter data, each representing a consonant segment consisting of a consonant portion and a transient segment. The vowel segment file 12 stores a plurality of speech characteristic parameter data representing steady-state vowels and coarticulated vowels. On the basis of a character code string corresponding to a word, clause or sentence, the phoneme converting circuit 14 reads out the corresponding phoneme string data and accent data from a phoneme dictionary and an accent dictionary (not shown) and supplies them to the control circuit 16. A phoneme converting circuit of this kind is described, for example, in "Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics" by Honey S. Elovitz et al. of the Naval Research Laboratory (IEEE Trans. ASSP-24, No. 6, Dec. 1976, p. 446).
- The control circuit 16 supplies the consonant segment address data and the vowel segment address data to the consonant segment file 10 and the vowel segment file 12, respectively, in accordance with the phoneme string data from the phoneme converting circuit 14. At the same time, the control circuit 16 writes time data corresponding to the duration of the vowel to be generated, together with the accent data from the phoneme converting circuit 14, into a random access memory (RAM) 16A. When the control circuit 16 generates the consonant and vowel segment address data for the consonant and vowel contained in a monosyllable supplied from the phoneme converting circuit 14, the segment address data are determined not only from the phoneme data of that monosyllable but also, for example, from the phoneme data of the succeeding monosyllable.
- The speech characteristic parameter data from the consonant segment file 10 are supplied to a first input port of an interpolation circuit 18, while the speech characteristic parameter data from the vowel segment file 12 are supplied to a second input port of the interpolation circuit 18 and to a repetition circuit 20. The interpolation circuit 18 calculates a predetermined number of speech characteristic parameter data from the consonant segment data supplied by the consonant segment file 10, which represent the power spectra of three frames, and the vowel segment data supplied by the vowel segment file 12, which represent the power spectrum of one frame. Each calculated parameter data represents a one-frame spectrum interpolated between the input consonant and vowel segments. The repetition circuit 20 repeatedly fetches the speech characteristic parameter data from the vowel segment file 12, as many times as the number of frames corresponding to the vowel duration data stored in the RAM 16A.
- The speech characteristic parameter data from the interpolation circuit 18 and from the repetition circuit 20 are supplied, in that order, through a switch 24 to a buffer register 22. The speech characteristic parameter data from the buffer register 22 are supplied to an interpolation circuit 26, which, for each pair of successive frames read from the buffer register 22, interpolates a predetermined number of speech characteristic parameter data between the two. The speech characteristic parameter data from the interpolation circuit 26 are supplied sequentially to a speech synthesizer 28. The speech synthesizer 28 filters these parameter data in sequence, using pitch period data generated by a pitch generation circuit 30 in accordance with the accent data in the RAM 16A, and thereby generates a speech signal.
- The operation of the speech synthesizing apparatus shown in Fig. 8 will now be described with reference to the power spectra shown in Fig. 9 and the flowchart shown in Fig. 10.
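The frame sequence that the interpolation circuit 18 and the repetition circuit 20 deliver to the buffer register 22 can be sketched as below. This is a minimal sketch under assumptions: each frame is a plain vector of parameters, the interpolation is assumed to be linear (the patent only says "interpolated"), the 10 ms frame length is hypothetical, and `assemble_syllable`, `transition_frames` and `frames_for_duration` are illustrative names. The output order (consonant frames, transition frames, vowel frame, repeated vowel frames) follows the [goma] walk-through in the operation description.

```python
from typing import List, Sequence

Frame = List[float]


def lerp(a: Sequence[float], b: Sequence[float], w: float) -> Frame:
    """Linear blend of two parameter vectors: w=0 gives a, w=1 gives b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def transition_frames(start: Sequence[float], end: Sequence[float], count: int) -> List[Frame]:
    """count evenly spaced frames strictly between start and end (role of circuit 18)."""
    return [lerp(start, end, k / (count + 1)) for k in range(1, count + 1)]


def frames_for_duration(duration_ms: float, frame_ms: float = 10.0) -> int:
    """Hypothetical mapping from a vowel duration to a number of repeated frames."""
    return max(0, round(duration_ms / frame_ms))


def assemble_syllable(consonant_frames: Sequence[Sequence[float]],
                      vowel_frame: Sequence[float],
                      n_transition: int = 4,
                      vowel_repeats: int = 0) -> List[Frame]:
    """Frame sequence written to the buffer register for one consonant-vowel syllable."""
    frames = [list(f) for f in consonant_frames]             # read from consonant file 10
    frames += transition_frames(consonant_frames[-1], vowel_frame, n_transition)
    frames.append(list(vowel_frame))                         # read from vowel file 12
    frames += [list(vowel_frame) for _ in range(vowel_repeats)]  # repetition circuit 20
    return frames
```

With three consonant frames, four transition frames and n repeated vowel frames, the buffer holds 3 + 4 + 1 + n frames for the syllable, which matches the frame counts given for [g] and [o] in the example of Fig. 9.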
- The phoneme converting circuit 14 supplies the phoneme string data and accent data corresponding to the input character code series to the control circuit 16. On the basis of the phoneme data and accent data from the phoneme converting circuit 14, the control circuit 16 writes into the RAM 16A the time length data representing the duration of the vowel to be generated and the pitch data relating to the speaking pitch, respectively. The control circuit 16 also supplies the consonant segment address data and vowel segment address data corresponding to the phoneme string data to the consonant segment file 10 and the vowel segment file 12, respectively. At the same time, the control circuit 16 generates a switch control signal that sets the switch 24 to its first switching position.
- Assume, for example, that an input character code series containing the character codes for the two successive monosyllables [goma] is supplied to the phoneme converting circuit 14. On the basis of the phoneme data for [goma] generated by the phoneme converting circuit 14, the control circuit 16 supplies the consonant and vowel segment address data corresponding to consonant segment [g] and vowel segment [o] to the consonant and vowel segment files 10 and 12, respectively. As a result, the first to third speech characteristic parameter data, corresponding to the three-frame power spectra of consonant segment [g] in Fig. 9, are read out from the consonant segment file 10, and the fourth speech characteristic parameter data, corresponding to the one-frame power spectrum of vowel [o], is read out from the vowel segment file 12. From the third parameter data read out of the consonant segment file 10 and the fourth parameter data read out of the vowel segment file 12, the interpolation circuit 18 calculates the fifth to eighth speech characteristic parameter data, which represent the power spectra of a predetermined number of frames (four in this example) between consonant segment [g] and vowel segment [o] shown in Fig. 9. In response to the interpolation control signal from the control circuit 16, the interpolation circuit 18 then supplies the first to third parameter data from the consonant segment file 10, the calculated fifth to eighth parameter data, and the fourth parameter data from the vowel segment file 12, in that order, through the switch 24 to the buffer register 22.
- The switch 24 is then set to its second switching position by the switch control signal from the control circuit 16. The control circuit 16 supplies a number of control pulses corresponding to the vowel duration data stored in the RAM 16A to the repetition circuit 20 and also, through an OR gate 32, to the buffer register 22. In response to each control pulse, the repetition circuit 20 fetches the speech characteristic parameter data from the vowel segment file 12 and supplies it to the buffer register 22, so that the vowel data are fetched the corresponding number of times. In this way, as shown in Fig. 9, speech characteristic parameter data representing power spectra similar to those shown in Fig. 7B are stored in the buffer register 22. In Fig. 9, the power spectra drawn with solid lines correspond to the parameter data read out from the consonant and vowel segment files 10 and 12, while those drawn with broken lines represent the power spectra calculated by the interpolation circuit 18 and those generated by the repetition circuit 20.
- Next, the control circuit 16 supplies the interpolation control signal through the OR gate 32 to the buffer register 22 and also to the interpolation circuit 26, so that the speech characteristic parameter data in the buffer register 22 are sent sequentially to the interpolation circuit 26. The interpolation circuit 26 generates a predetermined number of interpolated speech characteristic parameter data from each pair of successive frames sent from the buffer register 22 and supplies them sequentially to the speech synthesizer 28. At the same time, the control circuit 16 reads out the accent data stored in the RAM 16A and supplies it to the pitch generation circuit 30, which then generates the pitch period data. The speech synthesizer 28 synthesizes a speech signal including the pitch information from the parameter data supplied by the interpolation circuit 26 and the pitch period data supplied by the pitch generation circuit 30, and outputs the synthesized speech signal.
- Although the present invention has been described above with respect to one embodiment, it is not limited to this embodiment. For example, in the embodiment the repetition circuit 20 fetches the vowel characteristic parameter data from the vowel segment file 12 in response to the control pulses from the control circuit 16. Alternatively, the control circuit 16 may generate a high-level signal for a period corresponding to the time length data, and the repetition circuit 20 may fetch the vowel characteristic parameter data from the vowel segment file 12 at fixed intervals in response to this high-level signal. Furthermore, although each vowel characteristic parameter data stored in the vowel segment file 12 represents a one-frame power spectrum in this embodiment, the vowel segment file may instead store vowel characteristic parameter data each representing a plurality of power spectra.
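The second interpolation stage (interpolation circuit 26) and the pitch-driven excitation used by the speech synthesizer 28 can be sketched as below. This is an illustrative sketch under assumptions: interpolation between successive buffer frames is assumed to be linear, the voiced excitation is assumed to be a simple unit-pulse train whose spacing is the pitch period in samples, and `upsample_frames` and `impulse_train` are hypothetical names; the patent does not specify either circuit at this level of detail.

```python
from typing import List, Sequence

Frame = List[float]


def upsample_frames(frames: Sequence[Sequence[float]], m: int) -> List[Frame]:
    """Insert m linearly interpolated frames between every pair of successive frames."""
    out: List[Frame] = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(list(prev))
        for k in range(1, m + 1):
            w = k / (m + 1)
            out.append([(1.0 - w) * a + w * b for a, b in zip(prev, nxt)])
    if frames:
        out.append(list(frames[-1]))  # the last frame has no successor
    return out


def impulse_train(n_samples: int, pitch_period: int) -> List[float]:
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
```

For a buffer of N frames, `upsample_frames` yields (N - 1)(m + 1) + 1 frames, so the parameter update rate seen by the synthesizer rises by a factor of about m + 1 without storing any extra segments.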
Claims (6)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP183410/82 | 1982-10-19 | | |
JP57183410A (JPS5972494A) | 1982-10-19 | 1982-10-19 | Rule synthesization system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0107945A1 | 1984-05-09 |
EP0107945B1 | 1987-03-18 |
Family ID: 16135290
Family Applications (1)
Application Number | Publication | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
EP83306228A | EP0107945B1 | Expired | 1982-10-19 | 1983-10-14 | Speech synthesizing apparatus |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0107945B1 |
JP (1) | JPS5972494A |
DE (1) | DE3370390D1 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0642158B2 * | 1983-11-01 | 1994-06-01 | NEC Corporation (日本電気株式会社) | Speech synthesizer |
JPH0756598B2 * | 1984-07-25 | 1995-06-14 | Hitachi, Ltd. (株式会社日立製作所) | Speech synthesis method of speech synthesizer |
JPH0833744B2 * | 1986-01-09 | 1996-03-29 | Toshiba Corporation (株式会社東芝) | Speech synthesizer |
JP2577372B2 * | 1987-02-24 | 1997-01-29 | Toshiba Corporation (株式会社東芝) | Speech synthesis apparatus and method |
DK46493D0 * | 1993-04-22 | 1993-04-22 | Frank Uldall Leonhard | METHOD OF SIGNAL TREATMENT FOR DETERMINING TRANSIT CONDITIONS IN AUDITIVE SIGNALS |
AU699837B2 * | 1995-03-07 | 1998-12-17 | British Telecommunications Public Limited Company | Speech synthesis |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3975587A (en) * | 1974-09-13 | 1976-08-17 | International Telephone And Telegraph Corporation | Digital vocoder |
DE2531006A1 (en) * | 1975-07-11 | 1977-01-27 | Deutsche Bundespost | Speech synthesis system from diphthongs and phonemes - uses time limit for stored diphthongs and their double application |
DE3105518A1 (en) * | 1981-02-11 | 1982-08-19 | Heinrich-Hertz-Institut für Nachrichtentechnik Berlin GmbH, 1000 Berlin | METHOD FOR SYNTHESIS OF LANGUAGE WITH UNLIMITED VOCUS, AND CIRCUIT ARRANGEMENT FOR IMPLEMENTING THE METHOD |
- 1982
  - 1982-10-19: JP application JP57183410A (published as JPS5972494A), active, pending
- 1983
  - 1983-10-14: EP application EP83306228A (granted as EP0107945B1), not active, expired
  - 1983-10-14: DE application DE8383306228T (published as DE3370390D1), not active, expired
Non-Patent Citations (1)
Title |
---|
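Elovitz et al., "Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics", IEEE Trans. ASSP-24, No. 6, Dec. 1976, p. 446 *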
Also Published As
Publication number | Publication date |
---|---|
JPS5972494A | 1984-04-24 |
EP0107945A1 | 1984-05-09 |
DE3370390D1 | 1987-04-23 |
Similar Documents
Publication | Title |
---|---|
US4862504A | Speech synthesis system of rule-synthesis type |
US4692941A | Real-time text-to-speech conversion system |
EP0886853B1 | Microsegment-based speech-synthesis process |
US4685135A | Text-to-speech synthesis system |
US4398059A | Speech producing system |
EP0059880A2 | Text-to-speech synthesis system |
US5633984A | Method and apparatus for speech processing |
EP0239394B1 | Speech synthesis system |
US5463715A | Method and apparatus for speech generation from phonetic codes |
EP0107945B1 | Speech synthesizing apparatus |
US6970819B1 | Speech synthesis device |
EP0144731B1 | Speech synthesizer |
US6829577B1 | Generating non-stationary additive noise for addition to synthesized speech |
van Rijnsoever | A multilingual text-to-speech system |
JP3771565B2 | Fundamental frequency pattern generation device, fundamental frequency pattern generation method, and program recording medium |
JP2703253B2 | Speech synthesizer |
KR100202539B1 | Voice synthetic method |
JPH0594199A | Residual driving type speech synthesizing device |
JPS62284398A | Sentence-voice conversion system |
JP2573586B2 | Rule-based speech synthesizer |
JP2573585B2 | Speech spectrum pattern generator |
JP2573587B2 | Pitch pattern generator |
JPS58168096A | Multi-language voice synthesizer |
JPS63174100A | Voice rule synthesization system |
JPH055116B2 | |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 19831024 |
AK | Designated contracting states | Designated state(s): DE FR GB NL |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: KABUSHIKI KAISHA TOSHIBA |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB NL |
REF | Corresponds to: | Ref document number: 3370390; Country of ref document: DE; Date of ref document: 19870423 |
ET | Fr: translation filed | |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: 746; Effective date: 19980909 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: FR; Payment date: 19981009; Year of fee payment: 16 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: GB; Payment date: 19981016; Year of fee payment: 16 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: DE; Payment date: 19981023; Year of fee payment: 16 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: NL; Payment date: 19981028; Year of fee payment: 16 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: D6 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 19991014 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: NL; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20000501 |
GBPC | GB: European patent ceased through non-payment of renewal fee | Effective date: 19991014 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20000630 |
NLV4 | NL: lapsed or annulled due to non-payment of the annual fee | Effective date: 20000501 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20000801 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST |