EP0139419A1 - Sprachsyntheseeinrichtung - Google Patents

Speech synthesis apparatus (Sprachsyntheseeinrichtung)

Info

Publication number
EP0139419A1
EP0139419A1 EP84305918A EP84305918A EP0139419A1 EP 0139419 A1 EP0139419 A1 EP 0139419A1 EP 84305918 A EP84305918 A EP 84305918A EP 84305918 A EP84305918 A EP 84305918A EP 0139419 A1 EP0139419 A1 EP 0139419A1
Authority
EP
European Patent Office
Prior art keywords
phoneme
string
variable
influencing
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP84305918A
Other languages
English (en)
French (fr)
Other versions
EP0139419B1 (de)
Inventor
Sadakazu Watanabe
Norimasa Nomura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Publication of EP0139419A1
Application granted
Publication of EP0139419B1
Legal status: Expired

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • One of the known methods for transforming character strings into synthetic speech is "synthesis by rule."
  • a character string is first transformed into a sequence of phonemes.
  • the prosodic parameters: the duration, pitch and power
  • speech segments having those parameters are selected from a library of the spectral envelopes of such speech segments.
  • the phoneme sequence and the parameters are provided to a well-known speech synthesizer which assembles the segments into synthetic speech, adjusting the parameters and connecting them into more-or-less natural speech.
  • it is possible to synthesize speech using standard values of each parameter for each phoneme; it is also possible to vary the duration of those phonemes which consist of a consonant-vowel (CV) combination.
  • CV consonant-vowel
  • variable phonemes: when one of these variable phonemes is encountered in a phoneme string, its duration may be modified (changed from the standard value) by considering the influence of the phoneme immediately before or after the variable phoneme in the phoneme string.
  • when the duration of CV phonemes is modified in accordance with the influence of influencing phonemes immediately adjacent to the variable phoneme, the synthetic speech produced is rather unnatural and unclear.
  • One object of this invention is to produce synthetic speech of such clarity and high quality that it is very nearly natural speech.
  • Another object of the invention is to produce synthetic speech in which the prosodic parameters of variable phonemes are modified as a function of the existence of influencing phonemes at locations in the phoneme string other than the two locations immediately adjacent the variable phoneme (as a function of a non-influencing phoneme in a non-adjacent mora).
  • the invention is based on the careful observation of natural speech by the present inventors. They found, for example, that the duration of a variable phoneme consisting of a double consonant is influenced more by the kind of phoneme which exists in the phoneme string in a location one phoneme removed from the double consonant than by the kind of phoneme which exists immediately adjacent the double consonant in the phoneme string. They observed that, in Japanese speech, a double consonant lasts several milliseconds longer when a "prolonged sound" or a syllabic nasal "N" is at a location in the phoneme string two morae subsequent to the double consonant.
  • the invention comprises five main elements.
  • a character string translator accepts encoded character strings from a device, such as a keyboard, which is capable of inputting character strings electronically. Using a word memory, the character string translator converts each word in the input character string into a phoneme string corresponding to the characters in the word.
  • a variable phoneme detector detects those phonemes in the string the values of whose prosodic parameters may be modified due to the existence of an influencing phoneme at a location in the phoneme string which is at least one mora removed from the location of the variable phoneme. If variable phonemes are detected in the phoneme string, a search is made by an influencing phoneme detector for influencing phonemes at the location in the phoneme string indicated by the variable phoneme detector.
  • the variable phoneme detector is associated with a variable phoneme memory which stores, along with each variable phoneme, the predetermined location at which an influencing phoneme will influence the value of each parameter of the variable phoneme. If an influencing phoneme is detected at the appropriate location in relation to the variable phoneme the influencing phoneme detector will output data representative of a modification in the value of a selected parameter (duration, pitch or power) of the variable phoneme.
  • the phoneme string is then delivered to a parameter value determining unit, which stores standard values of the parameters for all phonemes. Standard values may be modified in response to the modification data supplied by the influencing phoneme detector.
  • the phoneme string, parameter values, and modification data are supplied to a well-known parametric synthesizer, which assembles them into synthetic speech.
  • since this invention is an electronic apparatus, it does not perform operations on "characters" or "phonemes" but rather on electrical codes representing characters and phonemes. This fact will be silently recognized throughout the specification.
  • Fig. 1 is a block diagram of the improved speech synthesis apparatus.
  • An encoded character string is input from a device (not shown) such as a keyboard having character keys, a memory device which stores character strings (such as used in a word processor), or a communication device receiving character strings through communication lines.
  • Character string translator 1 translates the character codes making up the character string into phoneme codes representing a phoneme string, using word memory 2.
  • the phoneme string is supplied to variable phoneme detector 3, which detects variable phonemes.
  • these variable phonemes include the double consonant (hereinafter indicated by "Q"), syllabic nasal "N", and prolonged sound (hereinafter indicated by "L").
  • variable phonemes are stored in variable phoneme memory 4 which is used by detector 3 in the detection of variable phonemes. As is described later, if a variable phoneme is detected, it is associated with attribution data from which parameter value modifications may be determined.
  • the output of detector 3 includes information regarding the predetermined location at which an influencing phoneme must be found in order to influence the value of a parameter (the duration, in the preferred embodiment) of the detected variable phoneme.
  • Influencing phoneme detector 5 determines whether there exists an influencing phoneme, such as a double consonant, syllabic nasal "N" or prolonged sound, at the location indicated by detector 3.
  • Parameter value determining unit 7 stores the normal values of parameters such as the duration of the phoneme, its pitch or its power. If both the variable phoneme and a corresponding influencing phoneme are detected, unit 7 outputs a modified value of the given parameter to parametric synthesizer 8, which may be any well-known synthesizer, for example a formant-type, PARCOR-type, or cepstrum-type synthesizer.
  • Fig. 2 shows in more detail the character string translator 1 of Fig. 1.
  • a character string (which normally constitutes one word) enters input register 101 under control of input control circuit 102.
  • Read control circuit 103 supplies an initial address to word memory 2, and a word is read out into register 104.
  • Word memory 2 stores a plurality of words (in segment 104A) together with the corresponding phoneme strings into which the words are translated (in segment 104B).
  • the word from segment 104A is supplied to comparator 105 and compared with the content of input register 101. If the input character string in input register 101 is not the same as the character string from segment 104A, comparator 105 produces a signal on line 106.
  • read control circuit 103 increments its internal counter (not shown) and provides the incremented address to word memory 2, so that the next word in memory 2 (together with the corresponding phoneme string) is read out into register 104. These operations are repeated until comparator 105 detects identity between the input character string and the character string retrieved from memory 2.
  • Fig. 3 shows in more detail the variable phoneme detector 3 of Fig. 1.
  • Input control circuit 301 accesses phoneme string memory 108 and successively transfers phonemes into buffer 302.
  • Most Japanese speech consists of four morae, as follows (a "mora" is a basic unit of time in speech; it may contain one or more phonemes):
  • Read control circuit 303 supplies an initial address to variable phoneme memory 4, which stores variable phonemes in association with the predetermined location in the phoneme string at which an influencing phoneme will influence the value of the duration of each variable phoneme.
  • Variable phoneme data from memory 4 is written into register 304, which consists of a phoneme segment 304A and a relative location segment 304B.
  • the relative location data in relative location segment 304B consists of at least one location in the phoneme string, relative to the variable phoneme, where an influencing phoneme may be located to influence the duration of the variable phoneme.
  • Comparator 305 compares the phonemes in phoneme buffer 302 and the phoneme segment 304A of register 304. If they are not identical, comparator 305 produces a signal on line 306. Responding to the signal on line 306, read control circuit 303 increments its internal counter (not shown) and provides the incremented address to variable phoneme memory 4, so that the next variable phoneme in memory 4 is read out into register 304.
  • These operations are repeated until comparator 305 detects identity between the phoneme in buffer 302 and the phoneme retrieved from memory 4 as stored in segment 304A, or until all variable phonemes in memory 4 have been retrieved.
  • comparator 305 produces a signal on line 307, causing the phoneme from buffer 302 and the relative location data in segment 304B (through selector 308) to be written into memory 309.
  • the selector 308 normally selects the relative location data from segment 304B; however, if identity is never detected by comparator 305, read control circuit 303 outputs an END signal on line 310, causing selector 308 to select "0" as the relative location data, indicating that the phonemes are not variable phonemes.
  • Fig. 4 shows in more detail the influencing phoneme detector 5 of Fig. 1.
  • Read control circuit 501 controls counter 502, whose content indicates an address in memory 309.
  • the present address in counter 502 is supplied to memory 309 through adder 503, which normally adds "0" to the present address.
  • the data read out from memory 309 are provided to register 505 through selector 504, which normally selects register 505.
  • the data consist of phonemes (in segment 505A) and relative locations (in segment 505B).
  • the relative location information in segment 505B is supplied to a zero detector, which produces a signal on line 507 when the relative location information in segment 505B is "0" or on line 508 when the relative location information is not "0".
  • the signal on the line 508 is provided to read control circuit 501 to prevent the incrementing of counter 502, to the selector 504 to select register 506, and to read control circuit 510 to supply an initial address to parameter value variation memory 6. That is, when the relative location information is not "0", the relative location is added to the present address in counter 502 by adder 503, and the sum is supplied as an address to memory 309 so that the corresponding data is read from memory 309 and stored in register 506 via selector 504.
  • Parameter value variation memory 6 is accessed by read control circuit 510 and supplies to register 512 data consisting of a phoneme, in segment 512A (an influencing phoneme) and, in segment 512B, attribution data from which the change in duration of the variable phoneme may be determined.
  • Comparator 513 compares the phoneme in segment 512A with that in segment 506A, producing a signal on line 514 if it detects identity and on line 515 otherwise. Responding to the signal on line 514, the phoneme in segment 505A and the attribution data in segment 512B are written into a checked data memory 520. On the other hand, the signal on line 515 is provided to read control 510 to increment an inner counter (not shown) and output the next address to memory 511.
  • the comparator compares the phoneme in register 506 with successively retrieved phonemes in register 512. If none of the phonemes in memory 511 is the same as the phoneme in register 506, read control circuit 510 outputs an END signal to selector 516. Selector 516 usually selects the attribution data segment of register 512 as input data but instead selects "0" upon receipt of an END signal. Therefore, the phoneme in segment 505A is written into checked data memory 520 with attribution data equal to "0". When zero detector 506 detects a "0", it outputs a signal on line 507. In this case, the phoneme in segment 505A is not a variable phoneme, so it is written into the checked data memory 520 with attribution data equal to "0".
  • Fig. 5 shows in more detail the parameter value determining unit 7 of Fig. 1.
  • Data consisting of phonemes in combination with attribution data are successively read from checked data memory 520 into register 701.
  • Parameter value memory 702 stores standard values of the parameters for every phoneme. (It is also possible, instead of storing all the values, to use a parameter calculator according to a phoneme code.)
  • a phoneme in segment 701A of register 701 is supplied to parameter value memory 702 as an address; memory 702 then outputs the corresponding parameter value to adder 703.
  • Modifying data memory 704 stores parameter value modifications and outputs them when addressed by attribution data from segment 701B.
  • the standard parameter values from memory 702 and the modification data from memory 704 are added by adder 703 and the sums supplied along with the phoneme string to parametric synthesizer 8 of Fig. 1 for assembly into synthetic speech.
  • a pitch modifying circuit 710 and a power modifying circuit 720 may be provided for modifying pitch or power data in a similar manner.
  • Table 1 shows how the duration of a phoneme is modified.
  • character string translator 1 translates it into the phoneme string KE/Q/SE/N, where "/" indicates divisions between morae.
  • Variable phoneme detector 3 detects a variable phoneme "Q" in the second location; consequently, influencing phoneme detector 5 should search for an influencing phoneme.
  • the standard duration t_m of a double consonant in Japanese is 170 ms, and additional duration t_a is, for example, 50 ms.
  • clear and natural-sounding speech can be synthesized by considering the influence of a non-adjacent phoneme on the duration of a variable phoneme.
  • Fig. 6 is a block diagram of another embodiment of this invention. Independent vowel detector 11, neighboring vowel detector 12, and prolonged sound transforming unit 13 are inserted between character string translator 1 and variable phoneme detector 3 of Fig. 1. The remaining elements of Fig. 6 are the same as in Fig. 1, so their descriptions are omitted here.
  • Independent vowel detector 11, including a comparator (not shown), detects whether the phoneme string includes independent vowels O, U, or I. If the phoneme string includes such vowels, neighboring vowel detector 12, also including a comparator, detects the identity of the phoneme immediately preceding the detected vowel. Then prolonged sound transforming unit 13, including a code converter, transforms the detected independent vowel into the prolonged sound L if and only if the combination of the detected independent vowel and the immediately preceding phoneme falls into one of the following categories:
  • Table 2 shows examples of these cases.
  • Table 3 shows examples of the determination of phoneme duration.
  • the character string "dangan" meaning "bullet"
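
The word-memory search described for character string translator 1 (Fig. 2) can be sketched in Python as a linear scan over stored entries. This is only an illustration under assumptions: the contents of `WORD_MEMORY`, the romanized spelling `kessen`, the function name `translate`, and the "not found" behavior are hypothetical; only the phoneme string KE/Q/SE/N is taken from the text.

```python
from typing import List, Optional, Tuple

# Hypothetical contents of word memory 2: (word, phoneme string) pairs.
# Only the phoneme string KE/Q/SE/N appears in the text; the spelling
# "kessen" is an assumption made for illustration.
WORD_MEMORY: List[Tuple[str, List[str]]] = [
    ("kessen", ["KE", "Q", "SE", "N"]),
]


def translate(word: str) -> Optional[List[str]]:
    """Scan the word memory in address order until a stored word matches."""
    address = 0  # initial address supplied by the read control circuit
    while address < len(WORD_MEMORY):
        stored_word, phonemes = WORD_MEMORY[address]  # entry read into the register
        if stored_word == word:  # comparator detects identity
            return list(phonemes)
        address += 1  # mismatch: increment the counter and read the next entry
    # The text does not say what happens when no word matches;
    # returning None is an assumption.
    return None


print(translate("kessen"))
```

A hardware comparator scanning addresses one by one corresponds to this loop; a software implementation would more likely use a dictionary lookup.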
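
The two detection stages (Figs. 3 and 4) can be sketched as follows: each phoneme is first tagged with a relative location (0 meaning "not a variable phoneme"), then the phoneme at that offset is checked against the influencing phonemes. This is a minimal sketch, not the patented circuit. The table contents are assumptions except for the rule stated in the text that a double consonant Q is influenced by N or L two morae later; the attribution code `1`, the function names, the one-entry-per-mora representation, and the handling of offsets that run past the end of the string are all hypothetical.

```python
from typing import Dict, List, Tuple

# Variable phoneme memory 4: variable phoneme -> relative location (in morae).
# Only the entry for Q (two morae later) is grounded in the text.
VARIABLE_PHONEME_MEMORY: Dict[str, int] = {"Q": 2}

# Parameter value variation memory 6: influencing phoneme -> attribution data.
# The attribution code 1 is a hypothetical placeholder.
VARIATION_MEMORY: Dict[str, int] = {"N": 1, "L": 1}


def detect_variable_phonemes(morae: List[str]) -> List[Tuple[str, int]]:
    """Tag each phoneme with its relative location, or 0 if it is not variable."""
    return [(m, VARIABLE_PHONEME_MEMORY.get(m, 0)) for m in morae]


def check_influence(tagged: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Pair each phoneme with attribution data (0 when nothing applies)."""
    checked: List[Tuple[str, int]] = []
    for address, (phoneme, relative_location) in enumerate(tagged):
        attribution = 0  # zero relative location: not a variable phoneme
        if relative_location != 0:
            target = address + relative_location  # adder: address + offset
            # Offsets outside the string are treated as "no influence"
            # (an assumption; the text does not cover this case).
            if 0 <= target < len(tagged):
                attribution = VARIATION_MEMORY.get(tagged[target][0], 0)
        checked.append((phoneme, attribution))
    return checked


checked = check_influence(detect_variable_phonemes(["KE", "Q", "SE", "N"]))
print(checked)  # [('KE', 0), ('Q', 1), ('SE', 0), ('N', 0)]
```

For KE/Q/SE/N, Q sits at index 1 and the phoneme two morae later is N, so Q receives a non-zero attribution code while every other phoneme receives 0.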
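
The duration calculation of parameter value determining unit 7 (Fig. 5) amounts to adding a modification value, selected by the attribution data, to a standard value. The sketch below uses the figures quoted in the text (170 ms standard duration for a double consonant, 50 ms additional duration as an example); the default duration for other phonemes, the attribution code `1`, and all names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Parameter value memory 702: standard durations in milliseconds.
# Only the 170 ms value for Q comes from the text.
STANDARD_DURATION_MS: Dict[str, int] = {"Q": 170}
ASSUMED_DEFAULT_MS = 100  # hypothetical duration for all other phonemes

# Modifying data memory 704: attribution data -> duration change in ms.
# Code 1 -> +50 ms follows the example value in the text; the code itself
# is a hypothetical placeholder.
MODIFICATION_MS: Dict[int, int] = {0: 0, 1: 50}


def determine_durations(checked: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Add the modification selected by the attribution data to the standard value."""
    result: List[Tuple[str, int]] = []
    for phoneme, attribution in checked:
        standard = STANDARD_DURATION_MS.get(phoneme, ASSUMED_DEFAULT_MS)
        result.append((phoneme, standard + MODIFICATION_MS[attribution]))  # adder 703
    return result


durations = determine_durations([("KE", 0), ("Q", 1), ("SE", 0), ("N", 0)])
print(durations[1])  # ('Q', 220)
```

With these example values the double consonant in KE/Q/SE/N is lengthened from 170 ms to 220 ms, while phonemes with attribution data 0 keep their standard durations.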

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
EP84305918A 1983-08-31 1984-08-30 Sprachsyntheseeinrichtung Expired EP0139419B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP58157719A JPS6050600A (ja) 1983-08-31 1983-08-31 規則合成方式
JP157719/83 1983-08-31

Publications (2)

Publication Number Publication Date
EP0139419A1 (de) 1985-05-02
EP0139419B1 (de) 1988-06-08

Family

ID=15655874

Family Applications (1)

Application Number Title Priority Date Filing Date
EP84305918A Expired EP0139419B1 (de) 1983-08-31 1984-08-30 Sprachsyntheseeinrichtung

Country Status (3)

Country Link
EP (1) EP0139419B1 (de)
JP (1) JPS6050600A (de)
DE (1) DE3472021D1 (de)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0631999B2 (ja) * 1985-03-29 1994-04-27 株式会社東芝 音声合成装置
JP2749804B2 (ja) * 1986-08-15 1998-05-13 株式会社リコー 韻律制御方式
JP2757867B2 (ja) * 1987-12-02 1998-05-25 テイボー株式会社 無機質製ペン先の製造方法
DE8914353U1 (de) * 1989-12-06 1990-02-15 Schwan-Stabilo Schwanhäußer GmbH & Co, 8500 Nürnberg Gerät zum Auftragen von fließfähiger Kosmetiktusche, insbesondere Wimperntusche

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1592473A (en) * 1976-09-08 1981-07-08 Edinen Zentar Phys Method and apparatus for synthesis of speech

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EDN - ELECTRICAL DESIGN NEWS, vol. 25, no. 14, August 1980, pages 99-103, Denver, US; E. TEJA: "Versatile voice output demands sophisticated software" *
ICASSP 80 - PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 9th-11th April 1980, vol. 2, pages 576-579, IEEE, New York, US; J. BERNSTEIN et al.: "Unlimited text-to-speech system: description and evaluation of a microprocessor based device" *
ICASSP 82 - PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 3rd-5th May 1982, Paris, FR, vol. 3, pages 1589-1592, IEEE, New York, US; D.H. KLATT: "The klattalk text-to-speech conversion system" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2037623A2 (es) * 1991-11-06 1993-06-16 Korea Telecommunication Metodo y dispositivo de sintesis del habla.
AT400646B (de) * 1991-11-06 1996-02-26 Korea Telecommunication Sprachsegmentkodierungs- und tonlagensteuerungsverfahren für sprachsynthesesysteme und synthesevorrichtung
US5546500A (en) * 1993-05-10 1996-08-13 Telia Ab Arrangement for increasing the comprehension of speech when translating speech from a first language to a second language
US5745650A (en) * 1994-05-30 1998-04-28 Canon Kabushiki Kaisha Speech synthesis apparatus and method for synthesizing speech from a character series comprising a text and pitch information
JP2010091829A (ja) * 2008-10-09 2010-04-22 Alpine Electronics Inc 音声合成装置、音声合成方法および音声合成プログラム
CN111177542A (zh) * 2019-12-20 2020-05-19 贝壳技术有限公司 介绍信息的生成方法和装置、电子设备和存储介质

Also Published As

Publication number Publication date
DE3472021D1 (en) 1988-07-14
JPS6050600A (ja) 1985-03-20
EP0139419B1 (de) 1988-06-08

Similar Documents

Publication Publication Date Title
KR900009170B1 (ko) 규칙합성형 음성합성시스템
EP1213705B1 (de) Verfahren und Anordnung zur Sprachsysnthese
US5396577A (en) Speech synthesis apparatus for rapid speed reading
US3704345A (en) Conversion of printed text into synthetic speech
EP0723696B1 (de) Sprachsynthese
US5220629A (en) Speech synthesis apparatus and method
US6778962B1 (en) Speech synthesis with prosodic model data and accent type
US4393460A (en) Simultaneous electronic translation device
US5463713A (en) Synthesis of speech from text
US7558732B2 (en) Method and system for computer-aided speech synthesis
EP0282272B1 (de) Spracherkennungssystem
US6035272A (en) Method and apparatus for synthesizing speech
US5715368A (en) Speech synthesis system and method utilizing phenome information and rhythm imformation
WO2004066271A1 (ja) 音声合成装置,音声合成方法および音声合成システム
US5950162A (en) Method, device and system for generating segment durations in a text-to-speech system
EP0139419B1 (de) Sprachsyntheseeinrichtung
US6847932B1 (en) Speech synthesis device handling phoneme units of extended CV
EP0144731B1 (de) Sprachsynthesizer
JPH1115497A (ja) 氏名読み音声合成装置
JP3201329B2 (ja) 音声合成装置
JPH06119144A (ja) 文書読み上げ装置
EP1777697A2 (de) Verfahren und Vorrichtung zur Sprachsynthese ohne Änderung der Prosodie
JP2801622B2 (ja) テキスト音声合成方法
JP2003005776A (ja) 音声合成装置
JP3414326B2 (ja) 音声合成用辞書登録装置及び方法

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): DE FR GB

17P Request for examination filed

Effective date: 19851018

17Q First examination report despatched

Effective date: 19861223

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 3472021

Country of ref document: DE

Date of ref document: 19880714

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 19960809

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 19960821

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 19960906

Year of fee payment: 13

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 19970830

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 19970830

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 19980430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 19980501

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST