EP0144731B1 - Speech synthesizer - Google Patents
- Publication number
- EP0144731B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- waveform
- unit
- articulation
- memory
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrophonic Musical Instruments (AREA)
Description
- This invention relates to a speech synthesizer.
- In conventional speech synthesis, suitable syllable waveforms represented by vowel-consonant-vowel (VCV) combinations are prepared in advance and connected together. However, since the number of VCV combinations is very large, an enormous memory capacity is required to store them. On the other hand, a method has been proposed in which waveforms corresponding to consonant-vowel (CV) or vowel-consonant (VC) combinations, namely demisyllables or diphones, which are about half as long as a single syllable, are prepared in advance; the CV and VC waveforms required for the speech to be synthesized are then selected and connected together (compiled and synthesized). This method requires less memory than preparing VCV units, but a relatively large memory capacity is still needed because of the large quantity of speech waveform information for the CV and VC units.
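The memory argument above is essentially a counting argument: VCV units grow with the square of the vowel inventory, while CV and VC units grow only linearly. The short sketch below makes that concrete. The inventory sizes (5 vowels, 15 consonants) are hypothetical values chosen for illustration, not figures from the patent, and the count ignores silences, vowel-to-vowel transitions and other unit types.

```python
# Hypothetical phoneme inventory sizes, chosen only for illustration;
# the patent does not give these numbers.
NUM_VOWELS = 5
NUM_CONSONANTS = 15


def vcv_units(v: int, c: int) -> int:
    """Number of vowel-consonant-vowel units: every V, C, V combination."""
    return v * c * v


def cv_vc_units(v: int, c: int) -> int:
    """Number of demisyllable/diphone units: all CV pairs plus all VC pairs."""
    return c * v + v * c


if __name__ == "__main__":
    print("VCV units:  ", vcv_units(NUM_VOWELS, NUM_CONSONANTS))    # 375
    print("CV+VC units:", cv_vc_units(NUM_VOWELS, NUM_CONSONANTS))  # 150
```

Even under these toy assumptions the CV/VC inventory is well under half the VCV inventory, which is the reduction the text refers to; the invention aims to shrink it further.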
- The document "Proceedings of the Seminar on Pattern Recognition, Vol. 1, Nov. 1977, 4.4.1 to 4.4.6" and EP-A-0 058 130 disclose transcribing alphanumeric characters in order to synthesize speech from elements stored in a memory. The first-mentioned document also describes semi-diphones. However, neither document specifies a unit speech waveform obtained by dividing a diphone.
- Accordingly, it is an object of the invention to provide a speech synthesizer which requires comparatively little memory for the speech data, such as speech waveforms, that must be prepared in advance.
- It is another object of the invention to provide a speech synthesizer which has the above advantage and by which synthesized speech of high quality can be obtained.
- According to the present invention, there is provided a speech synthesizer comprising converting means for converting an input sequence of characters into a sequence of articulation symbols, each corresponding to a unit speech waveform obtained by dividing a diphone; a memory for storing the unit speech waveforms corresponding to predetermined articulation symbols; and synthesizing means for reading from the memory the unit speech waveforms corresponding to the articulation symbols of the converted sequence and synthesizing them.
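The three claimed means form a simple pipeline: text to articulation symbols, symbols to stored unit waveforms, and waveforms to output. The sketch below shows only that data flow, assuming waveforms are plain lists of samples and that the converting means is supplied as a function; the names `synthesize`, `to_symbols` and `unit_memory` are hypothetical, and the join here is plain concatenation (the interpolation that characterizes the invention is a separate step).

```python
from typing import Callable, Dict, List, Sequence

Waveform = List[float]


def synthesize(
    text: str,
    to_symbols: Callable[[str], Sequence[str]],
    unit_memory: Dict[str, Waveform],
) -> Waveform:
    """Converting means -> memory lookup -> synthesizing means (plain concatenation)."""
    output: Waveform = []
    for symbol in to_symbols(text):
        # Each articulation symbol addresses one stored unit speech waveform;
        # a missing unit raises KeyError.
        output.extend(unit_memory[symbol])
    return output


# Toy usage with made-up symbols and two-sample "waveforms".
toy_memory = {"*": [0.0, 0.0], "k": [0.5, -0.5]}
print(synthesize("k", lambda text: ["*", "k"], toy_memory))
```

Passing the converter in as a callable keeps the sketch neutral about how characters are mapped to symbols; in the embodiment that mapping is a table lookup.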
- This speech synthesizer is characterized by interpolation method determining means for determining an interpolation method on the basis of the speech part of the input characters corresponding to the output of said converting means, and interpolating means for interpolating the unit speech waveforms read from said memory on the basis of the determined interpolation method. Furthermore, said interpolation method determining means directly connects the two read unit speech waveforms when said speech part of the input characters is unvoiced (including silence), and determines a predetermined first interpolation method when said speech part is voiced. The same unit waveform is used both for a voiced phoneme and for its corresponding unvoiced phoneme.
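The characterizing rule is a two-way decision driven only by voicing. A minimal sketch, assuming the voicing decision is already available as a boolean and that the "first interpolation method" is passed in as a function (both hypothetical simplifications), could look like this:

```python
from enum import Enum
from typing import Callable, List

Waveform = List[float]


class JoinMethod(Enum):
    DIRECT = "direct"            # unvoiced (including silence): butt-join the units
    INTERPOLATE = "interpolate"  # voiced: the predetermined first interpolation method


def choose_join(voiced: bool) -> JoinMethod:
    """Interpolation method determining means: decide from voicing alone."""
    return JoinMethod.INTERPOLATE if voiced else JoinMethod.DIRECT


def join_units(
    left: Waveform,
    right: Waveform,
    voiced: bool,
    interpolate: Callable[[Waveform, Waveform], Waveform],
) -> Waveform:
    """Interpolating means: apply the chosen method to two consecutive units."""
    if choose_join(voiced) is JoinMethod.DIRECT:
        return left + right
    return interpolate(left, right)
```

For example, `join_units([1.0], [2.0], False, f)` simply returns `[1.0, 2.0]`, while a voiced boundary is handed to whatever interpolation function `f` implements.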
- These and other objects and features of the present invention will become clear by the following description of a preferred embodiment of the present invention with reference to the accompanying drawings.
- Fig. 1 is a block diagram showing the structure of an embodiment of a speech synthesizer according to the invention;
- Fig. 2 is a table of the information of the synthesizer shown in Fig. 1 which is stored in a memory 32 of a phoneme symbol/articulation symbol converting part 30;
- Fig. 3 illustrates the concept of the articulatory organs of the human body for explaining the principle of the invention;
- Figs. 4A and 4B show examples of articulatory segments for explaining the principle of the invention;
- Fig. 5 shows waveforms interpolated by the pitch-synchronous method used in the present invention; and
- Fig. 6 is a waveform of synthesized speech formed by compiling and synthesizing waveforms of articulation element pieces.
- Referring to Fig. 1, the speech to be synthesized is first specified with a keyboard 10. The keyboard 10 generates a character sequence signal, a stress strength signal (three-level in this embodiment) and a boundary signal between speech units. Hereinafter, the structure and operation of the speech synthesizer shown in Fig. 1 will be described on the assumption that the speech to be synthesized is "kite".
- Now, the character sequence signal for "kite" is generated by pressing the keys "K", "I", "T" and "E". The boundary signal B, which indicates boundaries such as the beginning, end and pauses of the word "kite", and the stress strength signal ST are supplied, together with the character sequence signal, to a converting circuit 20, which converts characters into phoneme symbols. The stress strength is determined from the pitch and strength of each syllable; for example, a high stress strength corresponds to a high pitch frequency. The converting circuit 20 has a processing part 21 and a memory 22. The memory 22 stores, in advance, the phoneme symbols corresponding to each speech item; for example, the phoneme symbol /kait/ is stored in correspondence with "kite". The processing part 21 supplies address information to the memory 22 in response to the input character sequence signal, and the phoneme symbol signal /kait/ is read from the memory 22 and supplied to a phoneme symbol/articulation symbol converting circuit 30. Like the converting circuit 20, the converting circuit 30 has a processing part 31 and a memory 32. The memory 32 stores articulation symbols (each determined by the phonemes located before and after it), which are determined in advance for each phoneme symbol by the method peculiar to the present invention described below.
- The articulatory organs of a human being include the vocal cords, tongue, lips, velum palatinum, etc., as shown in Fig. 3, and various speech sounds are generated by controlling these organs in accordance with nerve pulse signals. Therefore, if two articulations of the articulatory organs are similar, two similar speech waveforms are generated. Further, if the articulation parameter values representing the movements of these organs are close to each other, the generated speech waveforms are similar. As described above, the conventional CV, VC waveform connecting type synthesizing method prepares many speech waveforms corresponding to CV and VC units, but from the viewpoint of articulation parameter movement these waveforms are considerably redundant.
For example, in the CV, VC waveform connecting type method, the speech waveform corresponding to /ka/ and that corresponding to /ga/ are prepared separately. However, the movements of the articulatory organs for /ka/ and /ga/ are very similar: the relationship between the tongue, palate, etc. is almost the same, and the main difference is whether the vocal cords vibrate (voiced) or not (unvoiced) in the consonant part. Therefore, in the voiced section following the unvoiced section of the consonant /k/ in /ka/ (the section shifting to the steady part of the vowel /a/, C in Fig. 4A), the articulation parameters are almost the same as those of /ga/ (C in Fig. 4B), so the corresponding waveform of /ga/ can replace the partial waveform of /ka/ in that section with a fairly good approximation. Similarly, for the pairs /kV/-/gV/, /tV/-/dV/ and /pV/-/bV/ (V represents a vowel), the waveforms of the parts shifting to the vowels can be shared. In Figs. 4A and 4B, part A is the silent part at the beginning of /ka/ or /ga/ (represented as *), part B is the waveform of "k" in /ka/ or "g" in /ga/, B' is the waveform of the part of /ga/ affected by the phoneme following "g", and C and D are, as described above, the speech waveforms of the vowel "a" following the consonant in /ka/ and /ga/.
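The sharing described above amounts to mapping each unvoiced plosive onto its voiced partner before choosing the consonant-to-vowel transition unit. The sketch below assumes a hypothetical five-vowel inventory and uses the "(g)a" style of naming from Fig. 2; it counts how many transition units must be stored with and without sharing.

```python
# Unvoiced plosive -> voiced plosive with (nearly) the same articulation,
# following the /kV/-/gV/, /tV/-/dV/, /pV/-/bV/ pairs in the text.
VOICED_PARTNER = {"k": "g", "t": "d", "p": "b"}

PLOSIVES = "kgtdpb"
VOWELS = "aiueo"  # hypothetical five-vowel inventory


def transition_symbol(consonant: str, vowel: str) -> str:
    """Symbol for the consonant-to-vowel transient; /kV/ reuses the /gV/ unit."""
    base = VOICED_PARTNER.get(consonant, consonant)
    return f"({base}){vowel}"


separate = {f"({c}){v}" for c in PLOSIVES for v in VOWELS}
shared = {transition_symbol(c, v) for c in PLOSIVES for v in VOWELS}
print(transition_symbol("k", "a"))  # (g)a
print(len(separate), len(shared))   # 30 15
```

Under these assumptions, sharing halves the number of stored plosive-to-vowel transitions; the unvoiced bursts themselves ("p", "t", "k") are still stored as separate units.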
- Here, a time section that is determined with regard to the manner of articulation, that is shorter than a CV or VC waveform, and that can be replaced by a speech waveform taken from a different phoneme sequence, as shown in Figs. 4A and 4B, is called an articulation segment, and the speech waveform in an articulation segment is called an articulation element piece waveform. That is, the syllables /ka/ and /ga/ are divided into the time sections B and C so that the transient parts of these syllables can be reused in synthesizing other speech.
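Because articulation element piece waveforms are just time sections of recorded syllables, building the unit store can be sketched as slicing. In the sketch below the recordings, the sample boundaries of sections A, B and C, and the helper name `extract_units` are all hypothetical; it only shows that the burst of /ka/ and the transient of /ga/ are kept while the transient of /ka/ is not stored at all.

```python
from typing import Dict, List, Tuple

Waveform = List[float]
Sections = Dict[str, Tuple[int, int]]  # section label -> (start, end) sample indices


def extract_units(
    recordings: Dict[str, Waveform],
    sections: Dict[str, Sections],
    plan: List[Tuple[str, str, str]],
) -> Dict[str, Waveform]:
    """Cut labelled time sections out of recorded syllables.

    Each plan entry (recording, section, symbol) stores one section of one
    recording as the articulation element piece waveform for `symbol`.
    """
    memory: Dict[str, Waveform] = {}
    for recording, section, symbol in plan:
        start, end = sections[recording][section]
        memory[symbol] = recordings[recording][start:end]
    return memory


# Toy data: ten-sample "recordings" with made-up A/B/C boundaries.
recordings = {"ka": [float(x) for x in range(10)],
              "ga": [float(-x) for x in range(10)]}
sections = {"ka": {"A": (0, 2), "B": (2, 4), "C": (4, 8)},
            "ga": {"A": (0, 2), "B": (2, 4), "C": (4, 8)}}
# Keep the burst of /ka/ and the transient of /ga/; /ka/'s own C is not stored.
memory = extract_units(recordings, sections, [("ka", "B", "k"), ("ga", "C", "(g)a")])
print(memory)
```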
- As described above, articulation segments having the same manner of articulation are represented by the same articulation symbol, and the articulation element piece waveform corresponding to this articulation symbol is stored in the memory 32 in advance. In this way, the articulation symbols corresponding to sequences of phoneme symbols are stored in the memory 32 in advance. Fig. 2 shows the classified articulation symbols, in which * represents the silent part placed at the beginning of speech or immediately before an explosive; "p", "t" and "k" represent explosive parts; (b)a, (d)a and (g)a represent the transient parts of the vowel "a" following the consonants "b", "d" and "g"; i(b), i(d) and i(g) represent the transient parts of the vowel "i" preceding the consonants "b", "d" and "g"; and ai, au and ao represent the transient parts where the vowel "a" is followed by the vowels "i", "u" and "o".
- Now returning to Fig. 1, in response to an address corresponding to the phoneme signal /kait/ sent from the processing part 21, a sequence of articulation symbols is read from the memory 32 in the phoneme symbol/articulation symbol converting circuit 30. Here, * represents the silent part described above (#1 in Fig. 2); "k" and "t" the explosive parts of /k/ and /t/ (#2, #6 in Fig. 2); "(g)a" the transient part shifting from the consonant to the vowel of /ga/ (#3); "ai" the transient part of the vowel sequence /ai/ (#4); and "i(d)" the transient part shifting from the vowel to the consonant of /id/ (#5). In this example, /ka/ in the phoneme symbol /kait/ is replaced by the unvoiced explosive "k" followed by "(g)a", the consonant-to-vowel transient of /ga/, which resembles /ka/. Likewise, /it/ is replaced by "i(d)", the vowel-to-consonant transient of /id/, which resembles /it/, and a silent part * is placed immediately before the unvoiced explosive "t".
- As described above, synthesizing speech by using, in place of /ka/ and /it/, waveforms taken from /ga/ and /id/, whose phoneme sequences differ from those of /ka/ and /it/ but whose articulation is similar, makes it unnecessary to store the transient parts of /ka/ and /it/ in advance and so reduces the required memory capacity. These articulation element piece waveforms can easily be obtained, for example, from waveforms of uttered speech.
- The articulation signal thus obtained is supplied to a waveform address generation circuit 50. For each articulation symbol contained in the articulation signal, the waveform address generation circuit 50 reads the corresponding articulation element piece waveform from an articulation waveform memory that is selected by the stress signal ST from among memories 80a, 80b and 80c included in an articulation waveform memory 80. In other words, each articulation element piece waveform is read from the memory 80 on the basis of the address corresponding to its articulation symbol. The stress signal ST from the processing part 21 is detected in a stress strength detection circuit 40, and the articulation element piece waveform whose strength corresponds to the detected stress strength is read from the memory 80. The articulation waveform memory 80 stores the articulation element piece waveforms corresponding to the articulation symbols shown in Fig. 2.
- An interpolation method selection circuit 60 judges whether the articulation symbol (two consecutive waveforms) from the phoneme symbol/articulation symbol converting circuit 30 is voiced or unvoiced. An interpolation circuit 70 is controlled by this result as follows: when the articulation symbol is unvoiced (including silence), the two consecutive articulation element piece waveforms read from the memory 80 are connected directly; when it is voiced, these waveforms are interpolated, for example pitch-synchronously.
- Generally, directly connecting articulation waveforms produces unnatural synthesized speech because of discontinuous changes in pitch or spectrum. To eliminate this drawback, in this invention any spoken word is synthesized by connecting articulation waveforms with several pitch levels through a pitch-synchronous interpolation process between waveforms. For example, as shown in Fig. 5, let $f(n)$ be the one-pitch-period waveform (element piece waveform) at the connecting end of a temporally preceding unit speech waveform, with time length (pitch period) $N_f$; let $g(n)$ be the element piece waveform at the connecting beginning of the succeeding unit speech waveform, with time length (pitch period) $N_g$; and let $h_i(n)$ be the element piece waveform in the $i$-th section of an interpolation waveform spanning $k$ pitch sections. Then $h_i(n)$ is generated on the basis of the following formulae:
- The interpolated, synthesized articulation waveform is supplied to a converter 90, where it is converted to an analogue waveform and output as synthesized speech. The resulting synthesized speech waveform is shown in Fig. 6.
- As described above, this invention uses a unit of speech that is shorter in time than the CV or VC unit speech waveforms of the CV, VC waveform compiling type synthesizing method. It therefore not only requires a small waveform memory capacity but also faithfully reflects the articulation of the articulatory organs, so that high-quality synthesized speech is obtained.
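The pitch-synchronous interpolation between the end period f(n) of one unit (length N_f) and the start period g(n) of the next (length N_g), over k pitch sections h_1(n) ... h_k(n) as in Fig. 5, can be illustrated with a generic linear cross-fade. This is a sketch under stated assumptions, not necessarily the formulae of the specification: it assumes the i-th period length is interpolated linearly between N_f and N_g, that both periods are linearly resampled to that length, and that they are mixed with weight i/(k+1). All function names are hypothetical.

```python
from typing import List

Waveform = List[float]


def resample_period(x: Waveform, length: int) -> Waveform:
    """Linearly resample one pitch period to `length` samples (wrapping at the end)."""
    n = len(x)
    out: Waveform = []
    for j in range(length):
        pos = j * n / length           # position of output sample j on the input time axis
        i0 = int(pos)
        frac = pos - i0
        a, b = x[i0], x[(i0 + 1) % n]  # wrap: x is treated as one period
        out.append(a + frac * (b - a))
    return out


def pitch_sync_interpolate(f: Waveform, g: Waveform, k: int) -> List[Waveform]:
    """Return k interpolated periods h_1..h_k between f (length N_f) and g (length N_g)."""
    n_f, n_g = len(f), len(g)
    periods: List[Waveform] = []
    for i in range(1, k + 1):
        w = i / (k + 1)                               # assumed weight of g in period i
        n_i = max(1, round((1 - w) * n_f + w * n_g))  # assumed interpolated pitch period
        f_i, g_i = resample_period(f, n_i), resample_period(g, n_i)
        periods.append([(1 - w) * a + w * b for a, b in zip(f_i, g_i)])
    return periods


# Toy usage: constant "periods" make the cross-fade easy to see.
f = [1.0] * 8   # N_f = 8
g = [0.0] * 4   # N_g = 4
h = pitch_sync_interpolate(f, g, k=3)
print([len(p) for p in h])  # [7, 6, 5]
joined = f + [s for p in h for s in p] + g
```

Because every inserted section is exactly one pitch period long, both the period length and the waveform shape change gradually across the joint, which is what avoids the pitch and spectrum discontinuities of a direct connection.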
- In the embodiment described above, articulation element piece waveforms corresponding to articulation symbols are compiled and synthesized, but the memory capacity can clearly also be reduced when this invention is applied to a synthesizing method using so-called "characteristic parameters", such as formant parameters.
Claims (12)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP205227/83 | 1983-11-01 | | |
JP58205227A JPH0642158B2 (en) | 1983-11-01 | 1983-11-01 | Speech synthesizer |
Publications (3)
Publication Number | Publication Date |
---|---|
EP0144731A2 (en) | 1985-06-19 |
EP0144731A3 (en) | 1985-07-03 |
EP0144731B1 (en) | 1988-09-07 |
Family ID=16503507
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP84113186A (Expired, EP0144731B1) | 1983-11-01 | 1984-11-02 | Speech synthesizer |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0144731B1 (en) |
JP (1) | JPH0642158B2 (en) |
DE (1) | DE3473956D1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH02141054A (en) * | 1988-11-21 | 1990-05-30 | Nec Home Electron Ltd | Terminal equipment for personal computer communication |
JP3070127B2 (en) * | 1991-05-07 | 2000-07-24 | 株式会社明電舎 | Accent component control method of speech synthesizer |
DE19610019C2 (en) * | 1996-03-14 | 1999-10-28 | Data Software Gmbh G | Digital speech synthesis process |
JP4265501B2 (en) | 2004-07-15 | 2009-05-20 | ヤマハ株式会社 | Speech synthesis apparatus and program |
JP5782751B2 (en) * | 2011-03-07 | 2015-09-24 | ヤマハ株式会社 | Speech synthesizer |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE2531006A1 (en) * | 1975-07-11 | 1977-01-27 | Deutsche Bundespost | Speech synthesis system from diphthongs and phonemes - uses time limit for stored diphthongs and their double application |
JPS5331561A (en) * | 1976-09-04 | 1978-03-24 | Mitsukawa Shiyouichi | Method of manufacturing S-shaped springs |
DE3105518A1 (en) * | 1981-02-11 | 1982-08-19 | Heinrich-Hertz-Institut für Nachrichtentechnik Berlin GmbH, 1000 Berlin | METHOD FOR SYNTHESIS OF LANGUAGE WITH UNLIMITED VOCABULARY, AND CIRCUIT ARRANGEMENT FOR IMPLEMENTING THE METHOD |
JPS6017120B2 (en) * | 1981-05-29 | 1985-05-01 | 松下電器産業株式会社 | Phoneme piece-based speech synthesis method |
JPS5868099A (en) * | 1981-10-19 | 1983-04-22 | 富士通株式会社 | Voice synthesizer |
US4601052A (en) * | 1981-12-17 | 1986-07-15 | Matsushita Electric Industrial Co., Ltd. | Voice analysis composing method |
NL8200726A (en) * | 1982-02-24 | 1983-09-16 | Philips Nv | DEVICE FOR GENERATING THE AUDITIVE INFORMATION FROM A COLLECTION OF CHARACTERS. |
JPS5972494A (en) * | 1982-10-19 | 1984-04-24 | 株式会社東芝 | Rule snthesization system |
1983
- 1983-11-01 JP JP58205227A patent/JPH0642158B2/en not_active Expired - Lifetime
1984
- 1984-11-02 DE DE8484113186T patent/DE3473956D1/en not_active Expired
- 1984-11-02 EP EP84113186A patent/EP0144731B1/en not_active Expired
Also Published As
Publication number | Publication date |
---|---|
DE3473956D1 (en) | 1988-10-13 |
JPS6097396A (en) | 1985-05-31 |
JPH0642158B2 (en) | 1994-06-01 |
EP0144731A2 (en) | 1985-06-19 |
EP0144731A3 (en) | 1985-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4862504A (en) | Speech synthesis system of rule-synthesis type | |
US6778962B1 (en) | Speech synthesis with prosodic model data and accent type | |
US7240005B2 (en) | Method of controlling high-speed reading in a text-to-speech conversion system | |
JP3361066B2 (en) | Voice synthesis method and apparatus | |
US5463713A (en) | Synthesis of speech from text | |
US6035272A (en) | Method and apparatus for synthesizing speech | |
US6212501B1 (en) | Speech synthesis apparatus and method | |
EP0239394B1 (en) | Speech synthesis system | |
US20040054537A1 (en) | Text voice synthesis device and program recording medium | |
KR20000005183A (en) | Image synthesizing method and apparatus | |
US6970819B1 (en) | Speech synthesis device | |
EP0144731B1 (en) | Speech synthesizer | |
EP0107945B1 (en) | Speech synthesizing apparatus | |
JPS6050600A (en) | Rule synthesization system | |
JPH08335096A (en) | Text voice synthesizer | |
van Rijnsoever | A multilingual text-to-speech system | |
Furtado et al. | Synthesis of unlimited speech in Indian languages using formant-based rules | |
JP3060276B2 (en) | Speech synthesizer | |
JP3771565B2 (en) | Fundamental frequency pattern generation device, fundamental frequency pattern generation method, and program recording medium | |
JP3081300B2 (en) | Residual driven speech synthesizer | |
JPH0944191A (en) | Voice synthesizer | |
JP3318290B2 (en) | Voice synthesis method and apparatus | |
JPH11161297A (en) | Method and device for voice synthesizer | |
JP2573586B2 (en) | Rule-based speech synthesizer | |
JP2675883B2 (en) | Voice synthesis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
17P | Request for examination filed |
Effective date: 19841102 |
|
AK | Designated contracting states |
Designated state(s): DE FR GB |
|
AK | Designated contracting states |
Designated state(s): DE FR GB |
|
17Q | First examination report despatched |
Effective date: 19861003 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
|
REF | Corresponds to: |
Ref document number: 3473956 Country of ref document: DE Date of ref document: 19881013 |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed | ||
REG | Reference to a national code |
Ref country code: GB Ref legal event code: IF02 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20021030 Year of fee payment: 19 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20021107 Year of fee payment: 19 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20021108 Year of fee payment: 19 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20031102 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040602 |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20031102 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040730 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST |