EP0205298A1 - Speech synthesis device (Sprachsyntheseeinrichtung) - Google Patents

Speech synthesis device (Sprachsyntheseeinrichtung)

Info

Publication number
EP0205298A1
Authority
EP
European Patent Office
Prior art keywords
data
speech synthesis
frame
circuit
speech
Prior art date
1985-06-05
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP86304183A
Other languages
English (en)
French (fr)
Inventor
Kazuo Takamori (c/o Patent Division, Toshiba Corp.)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1985-06-05
Filing date
1986-06-02
Publication date
1986-12-17
Application filed by Toshiba Corp
Publication of EP0205298A1
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management

Definitions

  • This invention relates to a PARCOR-type speech synthesis device in which analysis data produced by PARCOR speech analysis is stored in a memory device and is later read out of the memory device for speech synthesis processing.
  • In such a device, the original speech waveform to be synthesized is divided into frames of 10 milliseconds or 20 milliseconds, and speech analysis is carried out for each frame. The resulting amplitude data, frequency data and K parameter data (the PARCOR coefficients) are stored in a memory device as frame data. For speech synthesis, these values are read out of the memory device, and synthesis processing is carried out with the same frame length as was used in the analysis stage.
  • A major problem with speech synthesis devices of all types, however, is how to reduce the data rate (bit rate) without degrading the quality of the synthesized speech.
  • Various approaches to this problem have also been tried in speech synthesis devices using the PARCOR method, and the one generally adopted is a 20 millisecond frame length, which halves the data quantity compared with a 10 millisecond frame length. With a 20 millisecond frame, however, consonants, plosives and similar sounds in the original speech cannot be captured in the analysis, so the synthesized speech has the defect that such sounds cannot be reproduced.
  • Consonants and plosives can be captured with a 10 millisecond frame length, but, as described above, the data volume then increases and the benefit of data compression is lost.
  • This invention was made in view of the above circumstances, and its object is to provide a speech synthesis device that can synthesize sounds such as consonants and plosives, which can only be reproduced with a short frame length, while still achieving substantial data compression.
  • In the invention, sounds such as consonants and plosives in the original speech, which can only be reproduced with a 10 millisecond frame length, are analyzed with a 10 millisecond frame, whereas normal sounds are analyzed with a 20 millisecond frame.
  • A variable frame bit indicating the frame length used in the analysis is appended to the frame data generated for each frame, and the result is stored in a memory device. The speech generation circuit then carries out speech synthesis using the frame length indicated by the variable frame bit.
  • As a result, sounds such as consonants and plosives, which cannot be synthesized with the conventional 20 millisecond frame length, can be synthesized with a 10 millisecond frame length. Moreover, consonants and plosives make up only a small proportion of the generated speech, and for the remaining sounds a 20 millisecond frame length generally gives the same synthesis quality, so substantial data compression is still achieved.
  • Fig. 1 is a block diagram showing the structure of a speech synthesis device of the present invention.
  • Reference numeral 10 denotes a data memory that stores, for each frame, the frame data (the analysis data generated by the PARCOR speech analysis method) together with the variable frame bit (VFB) corresponding to the frame length used in the analysis of that frame.
  • Data memory 10 is addressed by the output of an address counter 11, and the previously stored data, a plurality of bits, is read out in parallel from the area specified by the address counter. The data read out of data memory 10 is applied to a parallel to serial conversion circuit 12.
  • Parallel to serial conversion circuit 12 converts the parallel data read out of data memory 10 into serial data and outputs it; in response to a control signal A1 from a control circuit (described below), it outputs the next frame data after a set time interval.
  • This serial data is applied to a serial to parallel conversion circuit 13.
  • This serial to parallel conversion circuit 13 stores the serial data output by parallel to serial conversion circuit 12 and outputs the stored data in parallel at a fixed timing.
  • The parallel data output from serial to parallel conversion circuit 13 is applied to a control circuit 14 and a PARCOR speech synthesis circuit 15.
  • PARCOR speech synthesis circuit 15 has an input data temporary memory circuit 16 that stores the parallel data output from serial to parallel conversion circuit 13. Using the data stored in memory circuit 16, circuit 15 sequentially carries out PARCOR speech synthesis processing, selecting one of at least two different frame lengths.
  • Control circuit 14 comprises a timing generating circuit 14a and a discriminating circuit 14b, and has four functions: a data readout function, which outputs an increment signal to increment address counter 11; a variable frame bit (VFB) discrimination function, which determines the value of the VFB received through serial to parallel conversion circuit 13; an output control function, which outputs the control signal that sets the interval at which parallel to serial conversion circuit 12 outputs the next frame data; and a frame length selection control function, which controls the selection of the frame length used for synthesis in PARCOR speech synthesis circuit 15 according to the timing control of input data temporary memory circuit 16 within circuit 15 and the result of the VFB discrimination.
  • VFB: variable frame bit
  • Fig. 2 is a partial block diagram showing discriminating circuit 14b in detail.
  • Discriminating circuit 14b comprises a VFB discriminating circuit 20 and another discriminating circuit 22.
  • VFB discriminating circuit 20 consists of a latch circuit. After reset, the VFB data is input to terminal D and latched in the latch circuit of discriminating circuit 20, and the VFB data is output to timing generating circuit 14a in accordance with the latch clock. The VFB data held in VFB discriminating circuit 20 is retained until a control signal is output from timing generating circuit 14a. Discrimination between unvoiced sounds, voiced sounds and so on is performed by the other discriminating circuit 22.
  • Fig. 3 is a waveform diagram of the sound "pa", which includes a plosive.
  • Fig. 4 illustrates an example of how data is stored in data memory 10 when a sound with the waveform shown in Fig. 3 is analyzed by the PARCOR speech analysis method.
  • A frame length of 10 milliseconds or 20 milliseconds is used selectively.
  • VFB is the variable frame bit indicating whether the frame length used in the analysis is 10 milliseconds or 20 milliseconds. In the first frames, t1 and t2, where the 10 millisecond frame length is selected, its value is "1"; in frame t3 and thereafter, where the 20 millisecond frame length is selected, its value is "0".
  • The frame data comprises amplitude data (AMP data), frequency data (PITCH data) and a plurality of K parameter data values, which are the PARCOR coefficients.
  • AMP data: amplitude data
  • PITCH data: frequency data
  • The K parameter data values are each n bits, so the values K1 to Kj give a total K parameter data amount of n × j bits.
  • (1 + m + n × j is 48 bits.)
  • When the word length of data memory 10 is 8 bits (1 byte), the first byte holds the variable frame bit (VFB), the AMP data and part of the PITCH data, for a total of 8 bits; the second byte holds the remainder of the PITCH data and part of the K parameter data, for a total of 8 bits; and the third and subsequent bytes hold the remainder of the K parameter data.
  • Control circuit 14 applies an increment signal A0 to address counter 11. As a result, the address in data memory 10 corresponding to the sound to be produced is specified, and data is read out of the first 8-bit memory area and supplied to parallel to serial conversion circuit 12.
  • Timing generating circuit 14a within control circuit 14 supplies a control signal A1 to parallel to serial conversion circuit 12, which, at the timing given by control signal A1, outputs the 8 bits of data to serial to parallel conversion circuit 13.
  • Discriminating circuit 14b within control circuit 14 then determines the value of the VFB in the received data.
  • If the VFB is "1", control signal A1 is timed so that the next frame data is output from parallel to serial conversion circuit 12 10 milliseconds later; in addition, a signal A2 selecting 10 milliseconds as the frame length for synthesis in PARCOR speech synthesis circuit 15, and a signal A3 controlling the output timing of input data temporary memory circuit 16, are output.
  • If the VFB is "0", control circuit 14 controls the timing so that the next frame data is output from parallel to serial conversion circuit 12 20 milliseconds later, and a signal A2 selecting 20 milliseconds as the frame length for synthesis in PARCOR speech synthesis circuit 15 is output.
  • PARCOR speech synthesis circuit 15 then carries out speech synthesis for the next 10 millisecond or 20 millisecond interval, using the selected frame length.
  • Waveform a is the synthesized speech waveform obtained when the original speech shown in Fig. 3 is analyzed entirely with a 20 millisecond frame length and then synthesized; waveform b is obtained in the same way with a 10 millisecond frame length throughout; and waveform c is obtained when the original speech of Fig. 3 is analyzed with a variable frame length by the circuit of the above embodiment and then synthesized. Waveform c has characteristics that cannot be realized in waveform a, and it also satisfactorily renders the post-onset characteristics of the sound, where no difference is seen between a and b.
  • The invention thus provides a speech synthesis device that can synthesize sounds such as consonants and plosives, which can only be reproduced with a short frame length, while achieving a substantial data compression ratio.
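The description says each 10 ms or 20 ms frame is analyzed to produce K parameters (the PARCOR coefficients), but it does not describe the analysis itself. PARCOR coefficients are the reflection coefficients of a linear predictor, and the textbook way to obtain them is the Levinson-Durbin recursion on the frame's autocorrelation. The sketch below is that standard technique, not Toshiba's implementation; the 8 kHz sampling rate, the non-overlapping unwindowed frames and the sign convention are all assumptions.

```python
from typing import List, Sequence, Tuple


def split_frames(samples: Sequence[float], sample_rate_hz: int, frame_ms: int) -> List[List[float]]:
    """Cut a sampled waveform into consecutive, non-overlapping frames."""
    size = sample_rate_hz * frame_ms // 1000
    return [list(samples[i:i + size]) for i in range(0, len(samples) - size + 1, size)]


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def parcor_coefficients(frame: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Levinson-Durbin recursion.

    Returns the reflection (PARCOR) coefficients k1..k_order and the residual
    prediction-error energy. Assumed convention: x[t] is predicted as
    sum(a[j] * x[t - j]), and k_i is the last predictor coefficient at order i.
    """
    r = autocorrelation(frame, order)
    ks: List[float] = []
    err = r[0]
    a = [0.0] * (order + 1)  # a[1..i] = predictor coefficients at the current order
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: pad with zeros
            ks.extend([0.0] * (order - len(ks)))
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        ks.append(k)
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k  # error energy shrinks by (1 - k^2) at each order
    return ks, max(err, 0.0)
```

At the assumed 8 kHz rate a 20 ms frame is 160 samples and a 10 ms frame is 80, so `split_frames(samples, 8000, 10)` yields twice as many frames, each producing its own set of K parameters.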
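The data-rate argument in this section (20 ms frames halve the data relative to 10 ms, and mixing frame lengths keeps most of that saving) can be checked with simple arithmetic. This sketch assumes the 48-bit frame size from the storage example is the complete per-frame size including the VFB, and that the decision of which segments need short frames is already made; the segment durations in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple

FRAME_BITS = 48  # bits per stored frame, from the 48-bit example (assumed to include the VFB)
SHORT_MS, LONG_MS = 10, 20  # the two analysis frame lengths


def fixed_rate_bps(frame_ms: int, frame_bits: int = FRAME_BITS) -> float:
    """Data rate in bit/s when every frame has the same length."""
    return frame_bits * 1000 / frame_ms


def frames_for_segments(segments: Sequence[Tuple[int, bool]]) -> List[Tuple[int, int]]:
    """Turn (duration_ms, is_consonant_or_plosive) segments into (VFB, frame_ms) frames.

    VFB = 1 marks a 10 ms frame and VFB = 0 a 20 ms frame, as in Fig. 4.
    """
    frames: List[Tuple[int, int]] = []
    for duration_ms, short in segments:
        frame_ms = SHORT_MS if short else LONG_MS
        if duration_ms % frame_ms:
            raise ValueError(f"{duration_ms} ms is not a whole number of {frame_ms} ms frames")
        frames.extend([(1 if short else 0, frame_ms)] * (duration_ms // frame_ms))
    return frames


def total_bits(frames: Sequence[Tuple[int, int]], frame_bits: int = FRAME_BITS) -> int:
    """Storage needed for a frame sequence (every frame carries frame_bits)."""
    return len(frames) * frame_bits
```

With these assumptions, 10 ms frames cost 4800 bit/s and 20 ms frames 2400 bit/s. For a hypothetical 200 ms utterance whose first 20 ms is a plosive, `frames_for_segments([(20, True), (180, False)])` gives 11 frames (528 bits), against 20 frames (960 bits) for 10 ms throughout and 10 frames (480 bits) for 20 ms throughout.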
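The storage layout described for data memory 10 packs the VFB, AMP data, PITCH data and K1 to Kj into consecutive 8-bit words, which circuit 12 then shifts out serially and circuit 13 reassembles. The source gives only the 48-bit total, so the field widths below (AMP 5 bits, PITCH 6 bits, six 6-bit K values) are hypothetical, chosen so that the first byte holds the VFB, the AMP data and part of the PITCH data as described. MSB-first bit order is also an assumption.

```python
from typing import List, Sequence, Tuple

# Hypothetical field widths (the source does not give them) chosen so that
# VFB + AMP + PITCH + K1..Kj = 1 + 5 + 6 + 6 * 6 = 48 bits.
AMP_BITS, PITCH_BITS, K_BITS, K_COUNT = 5, 6, 6, 6


def pack_frame(vfb: int, amp: int, pitch: int, ks: Sequence[int]) -> bytes:
    """Concatenate the fields MSB-first and split the result into 8-bit words."""
    if len(ks) != K_COUNT:
        raise ValueError(f"expected {K_COUNT} K parameters")
    fields = [(vfb, 1), (amp, AMP_BITS), (pitch, PITCH_BITS)] + [(k, K_BITS) for k in ks]
    value = width = 0
    for field, bits in fields:
        if not 0 <= field < (1 << bits):
            raise ValueError(f"{field} does not fit in {bits} bits")
        value = (value << bits) | field
        width += bits
    if width % 8:
        raise ValueError("frame does not fill a whole number of bytes")
    return value.to_bytes(width // 8, "big")


def unpack_frame(data: bytes) -> Tuple[int, int, int, List[int]]:
    """Inverse of pack_frame: recover (vfb, amp, pitch, ks) from the stored words."""
    value, remaining = int.from_bytes(data, "big"), len(data) * 8

    def take(bits: int) -> int:
        nonlocal remaining
        remaining -= bits
        return (value >> remaining) & ((1 << bits) - 1)

    vfb, amp, pitch = take(1), take(AMP_BITS), take(PITCH_BITS)
    return vfb, amp, pitch, [take(K_BITS) for _ in range(K_COUNT)]


def to_serial(word: int) -> List[int]:
    """Parallel-to-serial: shift one 8-bit word out MSB first."""
    return [(word >> (7 - i)) & 1 for i in range(8)]


def from_serial(bits: Sequence[int]) -> int:
    """Serial-to-parallel: shift 8 received bits back into one word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word
```

Because the VFB sits in the first byte, the control logic can decide the frame length as soon as the first word of a frame arrives, before the K parameter bytes are read.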
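The readout sequence above (VFB latched by discriminating circuit 20, then A1 setting when the next frame leaves circuit 12 and A2 selecting the synthesis frame length) can be modeled behaviorally as a schedule. This is a software sketch, not the circuit: it assumes the VFB is the most significant bit of each frame's first byte, and it omits signal A3 and the unvoiced/voiced discrimination of circuit 22. The names `VfbLatch`, `FrameTiming` and `schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class VfbLatch:
    """Behavioral model of VFB discriminating circuit 20 (a D latch)."""
    q: Optional[int] = None

    def clock(self, d: int) -> None:
        self.q = d  # latch terminal D on the latch clock

    def clear(self) -> None:
        self.q = None  # released by the control signal from circuit 14a


@dataclass
class FrameTiming:
    start_ms: int  # when circuit 12 outputs this frame (timing set via A1)
    frame_ms: int  # frame length selected for synthesis circuit 15 (signal A2)


def schedule(first_bytes: Iterable[int]) -> List[FrameTiming]:
    """Derive output times and synthesis frame lengths from each frame's first byte."""
    latch, t, timings = VfbLatch(), 0, []
    for first_byte in first_bytes:
        latch.clock((first_byte >> 7) & 1)  # assumption: VFB is the MSB
        frame_ms = 10 if latch.q == 1 else 20
        timings.append(FrameTiming(t, frame_ms))
        t += frame_ms  # next frame is released 10 ms or 20 ms later
        latch.clear()
    return timings
```

For a Fig. 4-style sequence of two short frames followed by long ones, first bytes with the top bit set, set, clear, clear give start times of 0, 10, 20 and 40 ms with frame lengths 10, 10, 20 and 20 ms.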
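The source does not describe the inside of PARCOR speech synthesis circuit 15. The usual structure for synthesizing from PARCOR coefficients is an all-pole lattice filter driven by an excitation signal; the sketch below shows that standard technique, running each frame for 10 ms or 20 ms worth of samples while carrying filter state across frame boundaries. The 8 kHz rate, the pulse-train-only excitation (real vocoders typically switch to noise for unvoiced frames) and a constant filter order across frames are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def lattice_synthesize(
    ks: Sequence[float], excitation: Sequence[float], state: Optional[List[float]] = None
) -> Tuple[List[float], List[float]]:
    """All-pole PARCOR lattice filter (predictor convention x[t] ~ sum a[j] x[t-j]).

    state[i] holds the backward signal b_i(t-1) for i = 0..p-1 and is returned
    so that the next frame continues from where this one stopped.
    """
    p = len(ks)
    b = list(state) if state is not None else [0.0] * p
    out: List[float] = []
    for e in excitation:
        f = e  # forward signal f_p(t) is the excitation
        new_b = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f += ks[i - 1] * b[i - 1]  # f_{i-1}(t) = f_i(t) + k_i * b_{i-1}(t-1)
            new_b[i] = b[i - 1] - ks[i - 1] * f  # b_i(t) = b_{i-1}(t-1) - k_i * f_{i-1}(t)
        new_b[0] = f  # b_0(t) is the output sample x(t)
        b = new_b[:p]  # b_p(t) is not needed by the recursion
        out.append(f)
    return out, b


def synthesize_frames(
    frames: Sequence[Tuple[int, float, int, Sequence[float]]], sample_rate_hz: int = 8000
) -> List[float]:
    """Synthesize (frame_ms, amp, pitch_period_samples, ks) frames back to back.

    Each frame yields sample_rate_hz * frame_ms // 1000 samples; the lattice
    state and the pulse phase carry over between 10 ms and 20 ms frames.
    """
    out: List[float] = []
    state: Optional[List[float]] = None
    phase = 0
    for frame_ms, amp, period, ks in frames:
        if period < 1:
            raise ValueError("this sketch only models voiced (pulse-train) excitation")
        excitation = []
        for _ in range(sample_rate_hz * frame_ms // 1000):
            excitation.append(amp if phase == 0 else 0.0)
            phase = phase + 1 if phase + 1 < period else 0
        samples, state = lattice_synthesize(ks, excitation, state)
        out.extend(samples)
    return out
```

Switching between 10 ms and 20 ms frames then only changes how many samples are generated before the next set of K parameters is loaded, which mirrors the frame length selection made by signal A2.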

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Electrophonic Musical Instruments (AREA)
EP86304183A 1985-06-05 1986-06-02 Sprachsyntheseeinrichtung Withdrawn EP0205298A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP60121704A JPS61278900A (ja) 1985-06-05 1985-06-05 音声合成装置
JP121704/85 1985-06-05

Publications (1)

Publication Number Publication Date
EP0205298A1 (de) 1986-12-17

Family

ID=14817812

Family Applications (1)

Application Number Title Priority Date Filing Date
EP86304183A Withdrawn EP0205298A1 (de) 1985-06-05 1986-06-02 Sprachsyntheseeinrichtung

Country Status (3)

Country Link
EP (1) EP0205298A1 (de)
JP (1) JPS61278900A (de)
KR (1) KR870000673A (de)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0033510A2 (de) * 1980-02-04 1981-08-12 Texas Instruments Incorporated Vorrichtung zur Sprachsynthese und Verfahren zur Anregung des Filters dieser Vorrichtung
EP0045813A1 (de) * 1980-02-22 1982-02-17 Nippon Telegraph and Telephone Public Corporation Schrachsynthese-vorrichtung
EP0059832A2 (de) * 1981-03-05 1982-09-15 Texas Instruments Incorporated Integrierter Schaltkreis für die Sprachsynthese, der eine variable Rahmenlänge zulässt

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5872194A (ja) * 1981-10-23 1983-04-30 松下電器産業株式会社 音声分析符号化方式

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, Tulsa, Oklahoma, US, 10th-12th April 1978, pages 454-457, IEEE, New York, US; J.M. TURNER et al.: "A variable frame length linear predictive coder" *
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-25, no. 4, August 1977, pages 322-330, New York, US; S. CHANDRA et al.: "Linear prediction with a variable analysis frame size" *
PROCEEDINGS OF THE NATIONAL ELECTRONICS CONFERENCE, 27th-29th October 1980, pages 540-543, Chicago, US; A.S. YATAGAI: "Speech parameters: the driving force for synthesizer circuits" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5630010A (en) * 1992-04-20 1997-05-13 Mitsubishi Denki Kabushiki Kaisha Methods of efficiently recording an audio signal in semiconductor memory
US5752221A (en) * 1992-04-20 1998-05-12 Mitsubishi Denki Kabushiki Kaisha Method of efficiently recording an audio signal in semiconductor memory
US5774843A (en) * 1992-04-20 1998-06-30 Mitsubishi Denki Kabushiki Kaisha Methods of efficiently recording an audio signal in semiconductor memory
US5864801A (en) * 1992-04-20 1999-01-26 Mitsubishi Denki Kabushiki Kaisha Methods of efficiently recording and reproducing an audio signal in a memory using hierarchical encoding
DE4345252B4 (de) * 1992-04-20 2004-05-27 Mitsubishi Denki K.K. Verfahren zur Wiedergabe von digitalisierten Audiodaten aus einem Halbleiterspeicher
CN1127053C (zh) * 1995-09-30 2003-11-05 三星电子株式会社 用于鉴别话音信号的非话音和清音的方法和装置

Also Published As

Publication number Publication date
JPS61278900A (ja) 1986-12-09
KR870000673A (ko) 1987-02-19

Similar Documents

Publication Publication Date Title
US5524172A (en) Processing device for speech synthesis by addition of overlapping wave forms
US4278838A (en) Method of and device for synthesis of speech from printed text
US4709390A (en) Speech message code modifying arrangement
US6125346A (en) Speech synthesizing system and redundancy-reduced waveform database therefor
US4685135A (en) Text-to-speech synthesis system
US4398059A (en) Speech producing system
EP0059880A2 (de) System zur Synthese der Sprache aus einem Text
US5463715A (en) Method and apparatus for speech generation from phonetic codes
EP0047175A1 (de) Sprachsynthesizer
US20040054537A1 (en) Text voice synthesis device and program recording medium
EP0351848A2 (de) Einrichtung zur Sprachsynthese
EP0205298A1 (de) Sprachsyntheseeinrichtung
EP0042590A1 (de) Phonemen-Extraktor
EP0139419A1 (de) Sprachsyntheseeinrichtung
EP0107945B1 (de) Einrichtung zur Sprachsynthese
US5729657A (en) Time compression/expansion of phonemes based on the information carrying elements of the phonemes
EP0144731B1 (de) Sprachsynthesizer
JP3087761B2 (ja) 音声処理方法及び音声処理装置
JPS6053999A (ja) 音声合成器
US5171928A (en) Memory for electronic recording apparatus using standard melody note-length table
JPS5912188B2 (ja) 音声情報圧縮方法
JPS58113992A (ja) 音声信号圧縮方式
JPS6028697A (ja) 音声有音無音切換装置
JPS60113299A (ja) 音声合成装置
JPS59162597A (ja) 音声合成装置

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19860612

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

17Q First examination report despatched

Effective date: 19880905

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19890117

RIN1 Information on inventor provided before grant (corrected)

Inventor name: TAKAMORI, KAZUO