EP0139419A1 - Sprachsyntheseeinrichtung (Speech synthesis apparatus) - Google Patents
- Publication number
- EP0139419A1 (application EP84305918A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- phoneme
- string
- variable
- influencing
- memory
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Definitions
- One of the known methods for transforming character strings into synthetic speech is "synthesis by rule."
- In synthesis by rule, a character string is first transformed into a sequence of phonemes.
- The prosodic parameters of each phoneme (its duration, pitch and power) are then determined.
- Speech segments having those parameters are selected from a library of the spectral envelopes of such speech segments.
- The phoneme sequence and the parameters are provided to a well-known speech synthesizer, which assembles the segments into synthetic speech, adjusting the parameters and connecting the segments into more-or-less natural speech.
- Although it is possible to synthesize speech using standard values of each parameter for each phoneme, it is also possible to vary the duration of those phonemes which consist of a consonant-vowel (CV) combination.
- CV: consonant-vowel.
- Such phonemes are referred to as variable phonemes. When one of these variable phonemes is encountered in a phoneme string, its duration may be modified (changed from the standard value) by considering the influence of the phoneme immediately before or after it in the phoneme string.
- Even when the duration of CV phonemes is modified in accordance with the influence of influencing phonemes immediately adjacent to the variable phoneme, the synthetic speech produced is rather unnatural and unclear.
- One object of this invention is to produce synthetic speech of such clarity and quality that it closely approaches natural speech.
- Another object of the invention is to produce synthetic speech in which the prosodic parameters of variable phonemes are modified as a function of the existence of influencing phonemes at locations in the phoneme string other than the two locations immediately adjacent to the variable phoneme (that is, as a function of an influencing phoneme in a non-adjacent mora).
- The invention is based on careful observation of natural speech by the present inventors. They found, for example, that the duration of a variable phoneme consisting of a double consonant is influenced more by the kind of phoneme located one phoneme removed from the double consonant in the phoneme string than by the kind of phoneme immediately adjacent to it. They observed that, in Japanese speech, a double consonant lasts several milliseconds longer when a "prolonged sound" or a syllabic nasal "N" is located two morae after the double consonant in the phoneme string.
- The invention comprises five main elements.
- A character string translator accepts encoded character strings from a device, such as a keyboard, capable of inputting character strings electronically. Using a word memory, the character string translator converts each word in the input character string into the corresponding phoneme string.
- A variable phoneme detector detects those phonemes in the string whose prosodic parameter values may be modified by the presence of an influencing phoneme at a location in the phoneme string at least one mora removed from the variable phoneme. If variable phonemes are detected, an influencing phoneme detector searches for influencing phonemes at the locations in the phoneme string indicated by the variable phoneme detector.
- The variable phoneme detector is associated with a variable phoneme memory which stores, along with each variable phoneme, the predetermined location at which an influencing phoneme will influence the value of each parameter of that variable phoneme. If an influencing phoneme is detected at the appropriate location relative to the variable phoneme, the influencing phoneme detector outputs data representing a modification of the value of a selected parameter (duration, pitch or power) of the variable phoneme.
- The phoneme string is then delivered to a parameter value determining unit, which stores standard values of the parameters for all phonemes. These standard values may be modified in response to the modification data supplied by the influencing phoneme detector.
- Finally, the phoneme string, parameter values and modification data are supplied to a well-known parametric synthesizer, which assembles them into synthetic speech.
- Since this invention is an electronic apparatus, it does not operate on "characters" or "phonemes" but on electrical codes representing characters and phonemes. This is tacitly understood throughout the specification.
- Fig. 1 is a block diagram of the improved speech synthesis apparatus.
- An encoded character string is input from a device (not shown) such as a keyboard having character keys, a memory device which stores character strings (such as that used in a word processor), or a communication device receiving character strings through communication lines.
- Character string translator 1 translates the character codes making up the character string into phoneme codes representing a phoneme string, using word memory 2.
- The phoneme string is supplied to variable phoneme detector 3, which detects variable phonemes.
- These variable phonemes include the double consonant (hereinafter indicated by "Q"), the syllabic nasal "N" and the prolonged sound (hereinafter indicated by "L").
- Variable phonemes are stored in variable phoneme memory 4, which detector 3 uses to detect them. As described later, if a variable phoneme is detected, it is associated with attribution data from which parameter value modifications can be determined.
- The output of detector 3 includes information on the predetermined location at which an influencing phoneme must be found in order to influence the value of a parameter (the duration, in the preferred embodiment) of the detected variable phoneme.
- Influencing phoneme detector 5 determines whether an influencing phoneme, such as a double consonant, syllabic nasal "N" or prolonged sound, exists at the location indicated by detector 3.
- Parameter value determining unit 7 stores the normal values of parameters such as the duration, pitch and power of each phoneme. If both a variable phoneme and a corresponding influencing phoneme are detected, unit 7 outputs a modified value of the given parameter to parametric synthesizer 8, which may be any well-known synthesizer, for example a formant-type, PARCOR-type or cepstrum-type synthesizer.
- Fig. 2 shows the character string translator 1 of Fig. 1 in more detail.
- A character string (which normally constitutes one word) enters input register 101 under control of input control circuit 102.
- Read control circuit 103 supplies an initial address to word memory 2, and a word is read out into register 104.
- Word memory 2 stores a plurality of words (in segment 104A) together with the corresponding phoneme strings into which the words are translated (in segment 104B).
- The word from segment 104A is supplied to comparator 105 and compared with the content of input register 101. If the input character string in input register 101 differs from the character string from segment 104A, comparator 105 produces a signal on line 106.
- In response to the signal on line 106, read control circuit 103 increments its internal counter (not shown) and provides the incremented address to word memory 2, so that the next word in memory 2 (together with the corresponding phoneme string) is read out into register 104. These operations are repeated until comparator 105 detects identity between the input character string and the character string retrieved from memory 2.
- Fig. 3 shows in more detail the variable phoneme detector 3 of Fig. 1.
- Input control circuit 301 accesses phoneme string memory 108 and successively transfers phonemes into buffer 302.
- Most Japanese speech consists of four morae, as follows (a "mora" is a basic unit of time in speech; it may contain one or more phonemes):
- Read control circuit 303 supplies an initial address to variable phoneme memory 4, which stores each variable phoneme together with the predetermined location in the phoneme string at which an influencing phoneme will influence the duration of that variable phoneme.
- Variable phoneme data from memory 4 is written into register 304, which consists of a phoneme segment 304A and a relative location segment 304B.
- The relative location data in segment 304B consists of at least one location in the phoneme string, relative to the variable phoneme, at which an influencing phoneme may be located to influence the duration of the variable phoneme.
- Comparator 305 compares the phoneme in phoneme buffer 302 with the phoneme segment 304A of register 304. If they are not identical, comparator 305 produces a signal on line 306. In response to the signal on line 306, read control circuit 303 increments its internal counter (not shown) and provides the incremented address to variable phoneme memory 4, so that the next variable phoneme in memory 4 is read out into register 304.
- These operations are repeated until comparator 305 detects identity between the phoneme in buffer 302 and the phoneme retrieved from memory 4 into segment 304A, or until all variable phonemes in memory 4 have been retrieved.
- When identity is detected, comparator 305 produces a signal on line 307, causing the phoneme from buffer 302 and the relative location data in segment 304B (passed through selector 308) to be written into memory 309.
- Selector 308 normally selects the relative location data from segment 304B. However, if comparator 305 never detects identity, read control circuit 303 outputs an END signal on line 310, causing selector 308 to select "0" as the relative location data, indicating that the phoneme is not a variable phoneme.
- Fig. 4 shows the influencing phoneme detector 5 of Fig. 1 in more detail.
- Read control circuit 501 controls counter 502, whose content indicates an address in memory 309.
- The present address in counter 502 is supplied to memory 309 through adder 503, which normally adds "0" to the present address.
- The data read out from memory 309 are provided to register 505 through selector 504, which normally selects register 505.
- The data consist of phonemes (in segment 505A) and relative locations (in segment 505B).
- The relative location information in segment 505B is supplied to a zero detector, which produces a signal on line 507 when the relative location information in segment 505B is "0", or on line 508 when it is not "0".
- The signal on line 508 is provided to read control circuit 501 to prevent counter 502 from incrementing, to selector 504 to select register 506, and to read control circuit 510 to supply an initial address to parameter value variation memory 6. That is, when the relative location information is not "0", adder 503 adds the relative location to the present address in counter 502, and the sum is supplied as an address to memory 309, so that the corresponding data are read from memory 309 and stored in register 506 via selector 504.
- Parameter value variation memory 6 is accessed by read control circuit 510 and supplies to register 512 data consisting of an influencing phoneme (in segment 512A) and attribution data (in segment 512B) from which the change in duration of the variable phoneme can be determined.
- Comparator 513 compares the phoneme in segment 512A with that in segment 506A, producing a signal on line 514 if it detects identity and on line 515 otherwise. In response to the signal on line 514, the phoneme in segment 505A and the attribution data in segment 512B are written into checked data memory 520. The signal on line 515, on the other hand, is provided to read control circuit 510, which increments an internal counter (not shown) and outputs the next address to memory 511.
- Comparator 513 thus compares the phoneme in register 506 with the phonemes successively retrieved into register 512. If none of the phonemes in memory 511 matches the phoneme in register 506, read control circuit 510 outputs an END signal to selector 516. Selector 516 usually selects the attribution data segment of register 512 as its input but selects "0" upon receipt of an END signal, so the phoneme in segment 505A is written into checked data memory 520 with attribution data equal to "0". When the zero detector detects "0", it outputs a signal on line 507; in this case the phoneme in segment 505A is not a variable phoneme, so it is likewise written into checked data memory 520 with attribution data equal to "0".
- Fig. 5 shows in more detail the parameter value determining unit 7 of Fig. 1.
- Data consisting of phonemes in combination with attribution data are successively read from checked data memory 520 into register 701.
- Parameter value memory 702 stores standard values of the parameters for every phoneme. (It is also possible, instead of storing all the values, to use a parameter calculator according to a phoneme code.)
- A phoneme in segment 701A of register 701 is supplied to parameter value memory 702 as an address; memory 702 then outputs the corresponding parameter value to adder 703.
- Modifying data memory 704 stores parameter value modifications and outputs them when addressed by attribution data from segment 701B.
- Adder 703 adds the standard parameter values from memory 702 to the modification data from memory 704, and the sums are supplied along with the phoneme string to parametric synthesizer 8 of Fig. 1 for assembly into synthetic speech.
- A pitch modifying circuit 710 and a power modifying circuit 720 may be provided to modify pitch or power data in a similar manner.
- Table 1 shows how the duration of a phoneme is modified.
- For example, character string translator 1 translates an input word into the phoneme string KE/Q/SE/N, where "/" indicates divisions between morae.
- Variable phoneme detector 3 detects the variable phoneme "Q" in the second location; consequently, influencing phoneme detector 5 searches for an influencing phoneme.
- The standard duration $t_m$ of a double consonant in Japanese is 170 ms, and the additional duration $t_a$ is, for example, 50 ms.
- Thus, clear and natural-sounding speech can be synthesized by considering the influence of a non-adjacent phoneme on the duration of a variable phoneme.
- Fig. 6 is a block diagram of another embodiment of this invention. Independent vowel detector 11, neighboring vowel detector 12 and prolonged sound transforming unit 13 are inserted between character string translator 1 and variable phoneme detector 3 of Fig. 1. The remaining elements of Fig. 6 are the same as in Fig. 1, so their descriptions are omitted here.
- Independent vowel detector 11, which includes a comparator (not shown), detects whether the phoneme string includes the independent vowels O, U or I. If it does, neighboring vowel detector 12, which also includes a comparator, identifies the phoneme immediately preceding the detected vowel. Prolonged sound transforming unit 13, which includes a code converter, then transforms the detected independent vowel into the prolonged sound L if and only if the combination of the detected independent vowel and the immediately preceding phoneme falls into one of the following categories:
- Table 2 shows examples of these cases.
- Table 3 shows examples of the determination of phoneme duration.
- The character string "dangan" (meaning "bullet")
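The word-memory search of Fig. 2 amounts to a sequential scan that compares the input string with each stored word until one matches. A minimal Python sketch, assuming word memory 2 can be modeled as a list of (character string, phoneme string) pairs; `WORD_MEMORY`, `translate` and the single "dangan" entry (split into morae by hand) are hypothetical illustrations, not contents of the patent:

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical contents of word memory 2: each entry pairs a character string
# (segment 104A) with its phoneme string (segment 104B), stored as morae.
# The single entry below is illustrative and is not taken from the text.
WORD_MEMORY: List[Tuple[str, List[str]]] = [
    ("dangan", ["DA", "N", "GA", "N"]),
]


def translate(word: str,
              memory: Sequence[Tuple[str, List[str]]] = WORD_MEMORY) -> Optional[List[str]]:
    """Return the phoneme string stored for `word`, scanning entries in order.

    Each loop iteration stands in for read control circuit 103 stepping to the
    next address after comparator 105 reports a mismatch on line 106.
    """
    for stored_word, phonemes in memory:
        if stored_word == word:  # comparator 105 detects identity
            return list(phonemes)
    # Assumption: the text does not say what happens when no entry matches.
    return None
```

`translate("dangan")` returns `["DA", "N", "GA", "N"]`. The linear scan mirrors the counter-driven hardware; a software implementation would more naturally use a dictionary keyed by the character string.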
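The lookup performed by variable phoneme detector 3 (Fig. 3) can be sketched as one table lookup per phoneme, writing "0" when the phoneme is not a variable phoneme. This sketch assumes, for simplicity, a single relative location per variable phoneme (the text allows at least one); `VARIABLE_PHONEME_MEMORY` and `detect_variable_phonemes` are hypothetical names, and the +2 offset for "Q" is inferred from the observation that a prolonged sound or "N" two morae after a double consonant lengthens it:

```python
from typing import Dict, List, Tuple

# Hypothetical variable phoneme memory 4: variable phoneme -> relative location
# (in morae) where an influencing phoneme affects its duration. Only the
# double consonant "Q" with offset +2 is suggested by the text; offsets for
# "N" and "L" are not given there, so they are left out.
VARIABLE_PHONEME_MEMORY: Dict[str, int] = {"Q": 2}


def detect_variable_phonemes(
        phonemes: List[str],
        memory: Dict[str, int] = VARIABLE_PHONEME_MEMORY) -> List[Tuple[str, int]]:
    """Build the contents of memory 309 as (phoneme, relative location) pairs.

    A relative location of 0 marks a phoneme that is not a variable phoneme,
    mirroring selector 308 choosing "0" after the END signal on line 310.
    """
    return [(p, memory.get(p, 0)) for p in phonemes]
```

For KE/Q/SE/N, `detect_variable_phonemes(["KE", "Q", "SE", "N"])` returns `[("KE", 0), ("Q", 2), ("SE", 0), ("N", 0)]`.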
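Influencing phoneme detector 5 (Fig. 4) follows each nonzero relative location to another entry of memory 309 and looks the phoneme found there up in parameter value variation memory 6. A sketch under stated assumptions: `VARIATION_MEMORY`, `detect_influencing` and the attribution code 1 are hypothetical, and the bounds check for offsets that run past the end of the string is an addition the text does not describe:

```python
from typing import Dict, List, Tuple

# Hypothetical parameter value variation memory 6: influencing phoneme ->
# attribution data. "N" and "L" are the influencing phonemes named in the
# observation about double consonants; the code value 1 is invented and only
# serves as a key into a later modification table.
VARIATION_MEMORY: Dict[str, int] = {"N": 1, "L": 1}


def detect_influencing(
        entries: List[Tuple[str, int]],
        memory: Dict[str, int] = VARIATION_MEMORY) -> List[Tuple[str, int]]:
    """Turn memory 309 entries into checked data memory 520 entries."""
    checked: List[Tuple[str, int]] = []
    for address, (phoneme, relative) in enumerate(entries):
        attribution = 0  # default: zero detector (line 507) or END from circuit 510
        if relative != 0:  # zero detector signals on line 508
            target = address + relative  # adder 503 offsets the address
            if 0 <= target < len(entries):  # bounds check is an added assumption
                influencing = entries[target][0]  # phoneme latched in register 506
                attribution = memory.get(influencing, 0)  # comparator 513 search
        checked.append((phoneme, attribution))
    return checked
```

For the entries `[("KE", 0), ("Q", 2), ("SE", 0), ("N", 0)]`, it returns `[("KE", 0), ("Q", 1), ("SE", 0), ("N", 0)]`, because the phoneme two morae after "Q" is "N".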
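The parameter value determining unit of Fig. 5 adds a modification to a stored standard value. Applied to the KE/Q/SE/N example, the duration of "Q" becomes $t_m + t_a = 170\ \text{ms} + 50\ \text{ms} = 220\ \text{ms}$. In the sketch below only those two figures come from the text; the 100 ms durations for the other morae, the attribution code 1 and the names `STANDARD_DURATION_MS`, `MODIFYING_DATA_MS` and `determine_durations` are placeholders:

```python
from typing import Dict, List, Tuple

# Standard durations in ms (parameter value memory 702). Only the 170 ms
# value for "Q" (t_m) comes from the text; the other values are placeholders.
STANDARD_DURATION_MS: Dict[str, int] = {"KE": 100, "Q": 170, "SE": 100, "N": 100}

# Modifying data memory 704: attribution data -> duration change in ms.
# The 50 ms entry is the example additional duration t_a; the code 1 is invented.
MODIFYING_DATA_MS: Dict[int, int] = {0: 0, 1: 50}


def determine_durations(
        checked: List[Tuple[str, int]],
        standard: Dict[str, int] = STANDARD_DURATION_MS,
        modifying: Dict[int, int] = MODIFYING_DATA_MS) -> List[Tuple[str, int]]:
    """Add each phoneme's standard duration to its modification (adder 703)."""
    return [(p, standard[p] + modifying[a]) for p, a in checked]


# KE/Q/SE/N with "Q" flagged by an influencing "N" two morae later.
durations = determine_durations([("KE", 0), ("Q", 1), ("SE", 0), ("N", 0)])
print(durations)  # [('KE', 100), ('Q', 220), ('SE', 100), ('N', 100)]
```

Keeping the modification as a separate table addressed by attribution data, rather than storing modified durations directly, matches the split between memories 702 and 704 and lets pitch or power modifications reuse the same pattern.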
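The Fig. 6 embodiment rewrites certain independent vowels as the prolonged sound "L" before variable phoneme detection. The list of qualifying categories did not survive in this text, so the rules in this sketch are purely an assumption, modeled on common Japanese long-vowel spellings (for example "ou" read as a long "o" and "ei" as a long "e"); `ASSUMED_CATEGORIES` and `to_prolonged` are hypothetical names, and treating the last letter of the preceding mora as its vowel is a simplification:

```python
from typing import List, Set, Tuple

# HYPOTHETICAL rules: the text's list of qualifying categories is missing.
# These (independent vowel, vowel ending the preceding mora) pairs are an
# assumption modeled on common Japanese long-vowel spellings such as "ou" and "ei".
ASSUMED_CATEGORIES: Set[Tuple[str, str]] = {
    ("O", "O"),
    ("U", "O"),
    ("U", "U"),
    ("I", "E"),
    ("I", "I"),
}
INDEPENDENT_VOWELS = {"O", "U", "I"}


def to_prolonged(phonemes: List[str],
                 categories: Set[Tuple[str, str]] = ASSUMED_CATEGORIES) -> List[str]:
    """Replace qualifying independent vowels with the prolonged sound "L"."""
    result: List[str] = []
    for i, phoneme in enumerate(phonemes):
        # Independent vowel detector 11: is this mora a bare O, U or I?
        if phoneme in INDEPENDENT_VOWELS and i > 0:
            # Neighboring vowel detector 12: inspect the preceding mora; using its
            # last letter as its vowel is a simplification.
            preceding_vowel = phonemes[i - 1][-1]
            if (phoneme, preceding_vowel) in categories:
                result.append("L")  # prolonged sound transforming unit 13
                continue
        result.append(phoneme)
    return result
```

With these assumed rules, `to_prolonged(["SE", "N", "SE", "I"])` returns `["SE", "N", "SE", "L"]`, while `["KA", "I"]` is left unchanged.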
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58157719A JPS6050600A (ja) | 1983-08-31 | 1983-08-31 | Synthesis-by-rule system |
JP157719/83 | 1983-08-31 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0139419A1 (de) | 1985-05-02 |
EP0139419B1 (de) | 1988-06-08 |
Family
ID=15655874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP84305918A (EP0139419B1, expired) | Sprachsyntheseeinrichtung (Speech synthesis apparatus) | 1983-08-31 | 1984-08-30 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0139419B1 (de) |
JP (1) | JPS6050600A (de) |
DE (1) | DE3472021D1 (de) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0631999B2 (ja) * | 1985-03-29 | 1994-04-27 | 株式会社東芝 | Speech synthesis device |
JP2749804B2 (ja) * | 1986-08-15 | 1998-05-13 | 株式会社リコー | Prosody control system |
JP2757867B2 (ja) * | 1987-12-02 | 1998-05-25 | テイボー株式会社 | Method of manufacturing an inorganic pen tip |
DE8914353U1 (de) * | 1989-12-06 | 1990-02-15 | Schwan-Stabilo Schwanhäußer GmbH & Co, 8500 Nürnberg | Device for applying flowable cosmetic ink, in particular mascara |
1983
- 1983-08-31 JP JP58157719A patent/JPS6050600A/ja active Pending
1984
- 1984-08-30 DE DE8484305918T patent/DE3472021D1/de not_active Expired
- 1984-08-30 EP EP84305918A patent/EP0139419B1/de not_active Expired
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB1592473A (en) * | 1976-09-08 | 1981-07-08 | Edinen Zentar Phys | Method and apparatus for synthesis of speech |
Non-Patent Citations (3)
Title |
---|
EDN - ELECTRICAL DESIGN NEWS, vol. 25, no. 14, August 1980, pages 99-103, Denver, US; E. TEJA: "Versatile voice output demands sophisticated software" * |
ICASSP 80 - PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 9th-11th April 1980, vol. 2, pages 576-579, IEEE, New York, US; J. BERNSTEIN et al.: "Unlimited text-to-speech system: description and evaluation of a microprocessor based device" * |
ICASSP 82 - PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 3rd-5th May 1982, Paris, FR, vol. 3, pages 1589-1592, IEEE, New York, US; D.H. KLATT: "The klattalk text-to-speech conversion system" * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2037623A2 (es) * | 1991-11-06 | 1993-06-16 | Korea Telecommunication | Method and device for speech synthesis |
AT400646B (de) * | 1991-11-06 | 1996-02-26 | Korea Telecommunication | Speech segment coding and pitch control method for speech synthesis systems, and synthesis device |
US5546500A (en) * | 1993-05-10 | 1996-08-13 | Telia Ab | Arrangement for increasing the comprehension of speech when translating speech from a first language to a second language |
US5745650A (en) * | 1994-05-30 | 1998-04-28 | Canon Kabushiki Kaisha | Speech synthesis apparatus and method for synthesizing speech from a character series comprising a text and pitch information |
JP2010091829A (ja) * | 2008-10-09 | 2010-04-22 | Alpine Electronics Inc | Speech synthesis device, speech synthesis method and speech synthesis program |
CN111177542A (zh) * | 2019-12-20 | 2020-05-19 | 贝壳技术有限公司 | Method and apparatus for generating introduction information, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
DE3472021D1 (en) | 1988-07-14 |
JPS6050600A (ja) | 1985-03-20 |
EP0139419B1 (de) | 1988-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR900009170B1 (ko) | | Synthesis-by-rule speech synthesis system |
EP1213705B1 (de) | | Method and arrangement for speech synthesis |
US5396577A (en) | | Speech synthesis apparatus for rapid speed reading |
US3704345A (en) | | Conversion of printed text into synthetic speech |
EP0723696B1 (de) | | Speech synthesis |
US5220629A (en) | | Speech synthesis apparatus and method |
US6778962B1 (en) | | Speech synthesis with prosodic model data and accent type |
US4393460A (en) | | Simultaneous electronic translation device |
US5463713A (en) | | Synthesis of speech from text |
US7558732B2 (en) | | Method and system for computer-aided speech synthesis |
EP0282272B1 (de) | | Speech recognition system |
US6035272A (en) | | Method and apparatus for synthesizing speech |
US5715368A (en) | | Speech synthesis system and method utilizing phenome information and rhythm imformation |
WO2004066271A1 (ja) | | Speech synthesis device, speech synthesis method and speech synthesis system |
US5950162A (en) | | Method, device and system for generating segment durations in a text-to-speech system |
EP0139419B1 (de) | | Speech synthesis apparatus |
US6847932B1 (en) | | Speech synthesis device handling phoneme units of extended CV |
EP0144731B1 (de) | | Speech synthesizer |
JPH1115497A (ja) | | Speech synthesis device for reading out personal names |
JP3201329B2 (ja) | | Speech synthesis device |
JPH06119144A (ja) | | Document read-aloud device |
EP1777697A2 (de) | | Method and apparatus for speech synthesis without changing the prosody |
JP2801622B2 (ja) | | Text-to-speech synthesis method |
JP2003005776A (ja) | | Speech synthesis device |
JP3414326B2 (ja) | | Dictionary registration device and method for speech synthesis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Designated state(s): DE FR GB |
| | 17P | Request for examination filed | Effective date: 19851018 |
| | 17Q | First examination report despatched | Effective date: 19861223 |
| | GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| | AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB |
| | REF | Corresponds to: | Ref document number: 3472021; Country of ref document: DE; Date of ref document: 19880714 |
| | ET | FR: translation filed | |
| | PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| | 26N | No opposition filed | |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: FR; Payment date: 19960809; Year of fee payment: 13 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: GB; Payment date: 19960821; Year of fee payment: 13 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: DE; Payment date: 19960906; Year of fee payment: 13 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 19970830 |
| | GBPC | GB: European patent ceased through non-payment of renewal fee | Effective date: 19970830 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 19980430 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 19980501 |
| | REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST |