EP0205298A1 - Speech synthesis device - Google Patents
Speech synthesis device
- Publication number
- EP0205298A1 (application EP86304183A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- speech synthesis
- frame
- circuit
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
Definitions
- This invention relates to a PARCOR type speech synthesis device in which analysis data produced by speech analysis by the PARCOR method is stored in a memory device, and thereafter speech synthesis processing is carried out by reading out this analysis data from the memory device.
- The original speech waveform to be synthesized is divided into frames of 10 milliseconds or 20 milliseconds, and speech analysis is carried out for each frame: the amplitude data, frequency data and K parameter data which make up the PARCOR coefficients are generated and stored in a memory device as frame data. For speech synthesis, these data values are read out from the memory device, and speech synthesis processing is carried out with the same frame length as was used in the analysis stage.
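The source does not describe the internals of the PARCOR synthesis filter. As a minimal sketch only, assuming one common textbook form of the all-pole lattice filter driven by reflection (K) coefficients, the per-sample synthesis step might look like the following. The function name, the sign convention and the use of floating point are assumptions for illustration, not details taken from the patent.

```python
def lattice_synthesize(excitation, k):
    """All-pole lattice synthesis (one common sign convention, assumed here).

    excitation: iterable of input samples (e.g. pulse train or noise,
                already scaled by the frame's amplitude data).
    k:          list of reflection (PARCOR) coefficients k[0]..k[p-1].
    """
    p = len(k)
    # b[i] holds the backward signal of stage i from the previous sample.
    b = [0.0] * (p + 1)
    out = []
    for e in excitation:
        f = float(e)
        # Work from the highest stage down to stage 1.
        for i in range(p - 1, -1, -1):
            f = f - k[i] * b[i]          # forward signal of the lower stage
            b[i + 1] = b[i] + k[i] * f   # backward signal of the upper stage
        b[0] = f                         # stage 0: backward equals output
        out.append(f)
    return out
```

With a single coefficient `k = [0.5]` this reduces to `y[n] = x[n] - 0.5 * y[n-1]`, so a unit impulse produces `1, -0.5, 0.25, ...`.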
- A major problem, however, with speech synthesis devices using various methods is that of reducing the data rate (bit rate) without losing the quality of the synthesized speech.
- For speech synthesis devices using the PARCOR method, too, various approaches to this problem have been tried, and of these the one generally adopted is the use of a 20 millisecond frame length. If the frame length is set to 20 milliseconds, the data quantity is halved compared with the case where the frame length is 10 milliseconds. When the frame length is set to 20 milliseconds, however, consonant and plosive sounds and the like in the original speech cannot be extracted in the analysis, and the synthesized speech therefore has the defect that sounds such as consonants and plosives cannot be realized.
- Sounds such as consonants and plosives can be extracted with a 10 millisecond frame length, but in this case, as described above, the data volume is increased, and the benefit of data compression is lost.
- This invention is made in view of the above described state of affairs, and has as its object the provision of a speech synthesis device whereby sounds such as consonants and plosives which can only be realized with a short frame length can be synthesized, and in which a substantial amount of data compression is achieved.
- Sounds such as consonants and plosives included in the original speech which can only be realized with a 10 millisecond frame length are subjected to analysis with a frame length of 10 milliseconds, whereas normal sounds are subjected to analysis with a frame length of 20 milliseconds.
- To the frame data generated by the analysis there is appended, for each frame, a variable frame bit which indicates the frame length used for analysis, and this is stored in a memory device; in the speech generation circuit the speech synthesis is carried out using a frame length determined in accordance with the variable frame bit.
- Sounds such as consonants and plosives which cannot be synthesized using the conventional frame length of 20 milliseconds can be synthesized using the 10 millisecond frame length. Furthermore, the proportion of sounds such as consonants and plosives in the generated speech is low, and in general the same quality of speech synthesis can be achieved using a 20 millisecond frame length, so that substantial data compression can be achieved.
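The data-rate trade-off described above can be illustrated with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the frame size and the share of short frames passed in the usage example are assumed values, not measurements from the source.

```python
def bit_rate(frame_bits, frame_ms):
    """Bits per second when every frame has the same length."""
    return frame_bits * 1000 / frame_ms


def mixed_bit_rate(frame_bits, short_fraction):
    """Average bits per second when a fraction of the frames are 10 ms
    long and the remaining frames are 20 ms long."""
    average_ms = short_fraction * 10 + (1 - short_fraction) * 20
    return frame_bits * 1000 / average_ms


# Example with an assumed 48-bit frame and an assumed 10 % of short frames.
fixed_10 = bit_rate(48, 10)          # all frames 10 ms
fixed_20 = bit_rate(48, 20)          # all frames 20 ms
mixed = mixed_bit_rate(48, 0.10)     # mostly 20 ms, occasionally 10 ms
```

Under these assumptions the mixed rate stays close to the 20 millisecond figure, because the average frame duration is dominated by the long frames.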
- Fig. 1 is a block diagram showing the structure of a speech synthesis device of the present invention.
- Numeral 10 denotes a data memory, which stores the frame data (the analysis data for each frame generated by the PARCOR speech analysis method) together with the variable frame bit (VFB) corresponding to the frame length used in the analysis of each frame.
- This data memory 10 has an address specified by the output of an address counter 11, and previously stored data, that is a plurality of bits, is read out in parallel from the data area specified by this address counter. The data read out from this data memory 10 is applied to a parallel to serial conversion circuit 12.
- This parallel to serial conversion circuit 12 converts the parallel data read out from the data memory 10 to serial data and outputs it; in response to a control signal A1 output from a control circuit described below, the next frame data is output after a certain time interval.
- This serial data is applied to a serial to parallel conversion circuit 13.
- This serial to parallel conversion circuit 13 stores the serial data output by parallel to serial conversion circuit 12 and outputs the stored data in parallel at a fixed timing.
- The parallel data output from this serial to parallel conversion circuit 13 is applied to a control circuit 14 and a PARCOR speech synthesis circuit 15.
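The path from data memory 10 through conversion circuits 12 and 13 can be modeled in software as a pair of generators. This is a sketch under assumptions: the bit order (most significant bit first) and the function names are not given in the source and are chosen here purely for illustration.

```python
def parallel_to_serial(words, width=8):
    """Yield the bits of each word one at a time, MSB first (assumed order)."""
    for word in words:
        for shift in range(width - 1, -1, -1):
            yield (word >> shift) & 1


def serial_to_parallel(bits, width=8):
    """Collect a serial bit stream back into words of the given width."""
    word = 0
    count = 0
    for bit in bits:
        word = (word << 1) | bit
        count += 1
        if count == width:
            yield word
            word = 0
            count = 0


# Round trip: two 8-bit memory words survive serialization unchanged.
recovered = list(serial_to_parallel(parallel_to_serial([0xA5, 0x3C])))
```

Because the second stage can regroup bits with a different `width`, a model like this also shows how a byte-oriented memory can feed fields that are not byte-aligned.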
- PARCOR speech synthesis circuit 15 is provided with an input data temporary memory circuit 16 which stores the parallel data output from serial to parallel conversion circuit 13. PARCOR speech synthesis circuit 15 uses the data stored in memory circuit 16 and, selecting one of at least two different frame lengths, carries out PARCOR speech synthesis processing sequentially.
- Control circuit 14 comprises a timing generating circuit 14a and a discriminating circuit 14b and has four functions: a data read-out function, which outputs an increment signal to increment address counter 11; a variable frame bit (VFB) discrimination function, which discriminates the content of the variable frame bit (VFB) applied through serial to parallel conversion circuit 13; an output control function, which outputs the control signal that controls the interval before the next frame data is output from parallel to serial conversion circuit 12; and a frame length selection control function, which controls the selection of the frame length during speech synthesis in PARCOR speech synthesis circuit 15 according to the timing control of input data temporary memory circuit 16 within PARCOR speech synthesis circuit 15 and the result of the VFB discrimination function.
- Fig. 2 is a portion of a block diagram showing discriminating circuit 14b in detail.
- Discriminating circuit 14b is equipped with a VFB discriminating circuit 20 and another discriminating circuit 22.
- VFB discriminating circuit 20 is formed by a latch circuit. After a reset, VFB data is input to terminal D and latched in the latch circuit of VFB discriminating circuit 20, and the VFB data is output to timing generating circuit 14a according to the latch clock. The VFB data stored in VFB discriminating circuit 20 is held until a control signal is output from timing generating circuit 14a. Discrimination between unvoiced sounds, voiced sounds and so on is carried out in the other discriminating circuit 22.
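The latch behaviour of VFB discriminating circuit 20 can be mimicked by a small state-holding class. This is a behavioural sketch only: the class and method names are hypothetical, and treating the control signal from timing generating circuit 14a as a reset to 0 is an assumption, since the source does not state the reset value.

```python
class VfbLatch:
    """Behavioural model of a one-bit latch holding the variable frame bit."""

    def __init__(self):
        self.q = 0  # latched output seen by the timing generating circuit

    def reset(self):
        """Model of the control signal that releases the held value
        (assumed here to clear the latch to 0)."""
        self.q = 0

    def clock(self, d):
        """On the latch clock, capture the bit present at terminal D."""
        self.q = d & 1
        return self.q


latch = VfbLatch()
latch.clock(1)        # VFB of the current frame is captured
held = latch.q        # value stays available until reset() is called
```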
- Fig. 3 is a waveform diagram of the sound "pa", which includes a plosive.
- Fig. 4 illustrates an example of the way in which data is stored when the result of analysis according to the PARCOR speech analysis method of a sound having a waveform as shown in Fig. 3 is stored in data memory 10.
- A frame length of 10 milliseconds or 20 milliseconds is used selectively.
- VFB is the variable frame bit indicating whether the frame length used in the analysis is 10 milliseconds or 20 milliseconds. In the first frames t1 and t2, in which the 10 millisecond frame length is selected, its value is "1", while in frames t3 and thereafter, in which the 20 millisecond frame length is selected, its value is "0".
- The frame data comprises amplitude data (AMP data), frequency data (PITCH data) and a plurality of K parameter data values, which are the PARCOR coefficients.
- The K parameter data values are each of n bits, so that the plurality of K parameter data values indicated by K1 to Kj give a total K parameter data amount of n × j bits.
- 1 + m + n × j is 48 bits.
- The word length of data memory 10 is 8 bits (1 byte).
- The first byte holds the variable frame bit (VFB), the AMP data and a part of the PITCH data, a total of 8 bits.
- The second byte holds the remainder of the PITCH data and a part of the K parameter data, a total of 8 bits.
- The third byte holds the remainder of the K parameter data.
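The byte layout described above (VFB first, then AMP, PITCH and the K parameters, packed across 8-bit memory words) can be sketched as a bit-packing routine. The field widths below are hypothetical values chosen only so that the fields total 48 bits; the source does not give the individual widths, and the function names and MSB-first ordering are likewise assumptions.

```python
# Hypothetical field widths (assumptions): 1 + 4 + 7 + 4 * 9 = 48 bits.
AMP_BITS, PITCH_BITS, K_BITS, K_COUNT = 4, 7, 4, 9
FRAME_BITS = 1 + AMP_BITS + PITCH_BITS + K_BITS * K_COUNT


def pack_frame(vfb, amp, pitch, ks):
    """Pack one frame MSB first into bytes: VFB, AMP, PITCH, K1..Kj."""
    fields = [(vfb, 1), (amp, AMP_BITS), (pitch, PITCH_BITS)]
    fields += [(k, K_BITS) for k in ks]
    value = 0
    total = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError("field does not fit in its width")
        value = (value << width) | field
        total += width
    padding = -total % 8              # pad the last byte with zero bits
    return (value << padding).to_bytes((total + padding) // 8, "big")


def unpack_frame(data):
    """Inverse of pack_frame for the widths defined above."""
    value = int.from_bytes(data, "big") >> (len(data) * 8 - FRAME_BITS)
    position = FRAME_BITS

    def take(width):
        nonlocal position
        position -= width
        return (value >> position) & ((1 << width) - 1)

    vfb = take(1)
    amp = take(AMP_BITS)
    pitch = take(PITCH_BITS)
    ks = [take(K_BITS) for _ in range(K_COUNT)]
    return vfb, amp, pitch, ks


frame = pack_frame(1, 0b1010, 0b1100101, list(range(9)))
```

With these assumed widths the first byte carries the VFB, the 4 AMP bits and the top 3 PITCH bits, and the second byte carries the remaining 4 PITCH bits plus the first K parameter, which mirrors the arrangement the text describes.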
- Control circuit 14 applies an increment signal A0 to address counter 11. Thereby, an address corresponding to the sound to be produced is specified in data memory 10, and data is read out from the first 8 bit memory area and supplied to parallel to serial conversion circuit 12.
- A control signal A1 is supplied by timing generating circuit 14a within control circuit 14 to parallel to serial conversion circuit 12; at the timing at which this control signal A1 is supplied, parallel to serial conversion circuit 12 outputs the 8 bits of data to serial to parallel conversion circuit 13.
- Discriminating circuit 14b within control circuit 14 determines the value of this bit.
- For a 10 millisecond frame, the output timing is controlled by signal A1 so that the next frame data will be output from parallel to serial conversion circuit 12 after 10 milliseconds; a signal A2 to select 10 milliseconds as the frame length to be used in speech synthesis by PARCOR speech synthesis circuit 15 and a signal A3 to control the output timing of input data temporary memory circuit 16 are also output.
- Control circuit 14 determines whether the next frame data will be output from parallel to serial conversion circuit 12 after 20 milliseconds.
- A signal A2 to select 20 milliseconds as the frame length to be used in speech synthesis by PARCOR speech synthesis circuit 15 is output.
- PARCOR speech synthesis circuit 15 carries out speech synthesis for the next 10 millisecond or 20 millisecond time interval using the selected 10 millisecond or 20 millisecond frame length.
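The read-out sequence above amounts to a loop that inspects each frame's variable frame bit and schedules the next frame 10 or 20 milliseconds later. The sketch below models only that scheduling decision, assuming a VFB of 1 selects the 10 millisecond frame length and 0 selects 20 milliseconds; the function name and the tuple format are hypothetical, and the hardware signals are represented simply by the returned frame length.

```python
def playback_schedule(frames):
    """Yield (start_ms, frame_ms, payload) for each stored frame.

    frames: iterable of (vfb, payload) pairs in memory order, where
            vfb == 1 selects a 10 ms frame and vfb == 0 a 20 ms frame.
    """
    start_ms = 0
    for vfb, payload in frames:
        frame_ms = 10 if vfb == 1 else 20   # frame length selection
        yield start_ms, frame_ms, payload
        start_ms += frame_ms                # when the next frame is fetched


# Two short frames followed by two long frames.
schedule = list(playback_schedule([(1, "t1"), (1, "t2"), (0, "t3"), (0, "t4")]))
```

In this example the four frames start at 0, 10, 20 and 40 milliseconds, so the two short frames together occupy the same time span as one long frame.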
- Waveform a is a synthesized speech waveform of the original speech shown in Fig. 3, analyzed entirely with a frame length of 20 milliseconds and then synthesized; waveform b is a synthesized speech waveform similarly analyzed entirely with a frame length of 10 milliseconds and then synthesized; and waveform c is a synthesized speech waveform of the original speech shown in Fig. 3, analyzed with a variable frame length by the circuit of the above embodiment and then synthesized. It will be seen that waveform c reproduces characteristics which cannot be realized in waveform a, and that the characteristics of the sound after its onset, in which no difference is seen between waveforms a and b, are also rendered satisfactorily.
- The result is a speech synthesis device in which sounds such as consonants and plosives which can only be realized with a short frame length can be synthesized, while a substantial data compression ratio can be achieved.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Electrophonic Musical Instruments (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP60121704A JPS61278900A (ja) | 1985-06-05 | 1985-06-05 | Speech synthesis device |
JP121704/85 | 1985-06-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0205298A1 (de) | 1986-12-17 |
Family
ID=14817812
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP86304183A Withdrawn EP0205298A1 (de) | 1985-06-05 | 1986-06-02 | Speech synthesis device |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0205298A1 (de) |
JP (1) | JPS61278900A (de) |
KR (1) | KR870000673A (de) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0033510A2 (de) * | 1980-02-04 | 1981-08-12 | Texas Instruments Incorporated | Speech synthesis device and method for exciting the filter of this device |
EP0045813A1 (de) * | 1980-02-22 | 1982-02-17 | Nippon Telegraph and Telephone Public Corporation | Speech synthesis device |
EP0059832A2 (de) * | 1981-03-05 | 1982-09-15 | Texas Instruments Incorporated | Integrated circuit for speech synthesis allowing a variable frame length |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5872194A (ja) * | 1981-10-23 | 1983-04-30 | Matsushita Electric Industrial Co., Ltd. | Speech analysis coding system |
- 1985-06-05: JP JP60121704A, patent/JPS61278900A/ja, active, Pending
- 1986-06-02: EP EP86304183A, patent/EP0205298A1/de, not active, Withdrawn
- 1986-06-04: KR KR1019860004425A, patent/KR870000673A/ko, not active, IP Right Cessation
Non-Patent Citations (3)
Title |
---|
IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, Tulsa, Oklahoma, US, 10th-12th April 1978, pages 454-457, IEEE, New York, US; J.M. TURNER et al.: "A variable frame length linear predictive coder" * |
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-25, no. 4, August 1977, pages 322-330, New York, US; S. CHANDRA et al.: "Linear prediction with a variable analysis frame size" * |
PROCEEDINGS OF THE NATIONAL ELECTRONICS CONFERENCE, 27th-29th October 1980, pages 540-543, Chicago, US; A.S. YATAGAI: "Speech parameters: the driving force for synthesizer circuits" * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5630010A (en) * | 1992-04-20 | 1997-05-13 | Mitsubishi Denki Kabushiki Kaisha | Methods of efficiently recording an audio signal in semiconductor memory |
US5752221A (en) * | 1992-04-20 | 1998-05-12 | Mitsubishi Denki Kabushiki Kaisha | Method of efficiently recording an audio signal in semiconductor memory |
US5774843A (en) * | 1992-04-20 | 1998-06-30 | Mitsubishi Denki Kabushiki Kaisha | Methods of efficiently recording an audio signal in semiconductor memory |
US5864801A (en) * | 1992-04-20 | 1999-01-26 | Mitsubishi Denki Kabushiki Kaisha | Methods of efficiently recording and reproducing an audio signal in a memory using hierarchical encoding |
DE4345252B4 (de) * | 1992-04-20 | 2004-05-27 | Mitsubishi Denki K.K. | Method for reproducing digitized audio data from a semiconductor memory |
CN1127053C (zh) * | 1995-09-30 | 2003-11-05 | Samsung Electronics Co., Ltd. | Method and apparatus for discriminating non-speech and unvoiced sounds of a speech signal |
Also Published As
Publication number | Publication date |
---|---|
JPS61278900A (ja) | 1986-12-09 |
KR870000673A (ko) | 1987-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5524172A (en) | Processing device for speech synthesis by addition of overlapping wave forms | |
US4278838A (en) | Method of and device for synthesis of speech from printed text | |
US4709390A (en) | Speech message code modifying arrangement | |
US6125346A (en) | Speech synthesizing system and redundancy-reduced waveform database therefor | |
US4685135A (en) | Text-to-speech synthesis system | |
US4398059A (en) | Speech producing system | |
EP0059880A2 (de) | System for synthesizing speech from text | |
US5463715A (en) | Method and apparatus for speech generation from phonetic codes | |
EP0047175A1 (de) | Speech synthesizer | |
US20040054537A1 (en) | Text voice synthesis device and program recording medium | |
EP0351848A2 (de) | Speech synthesis device | |
EP0205298A1 (de) | Speech synthesis device | |
EP0042590A1 (de) | Phoneme extractor | |
EP0139419A1 (de) | Speech synthesis device | |
EP0107945B1 (de) | Speech synthesis device | |
US5729657A (en) | Time compression/expansion of phonemes based on the information carrying elements of the phonemes | |
EP0144731B1 (de) | Speech synthesizer | |
JP3087761B2 (ja) | Speech processing method and speech processing apparatus | |
JPS6053999A (ja) | Speech synthesizer | |
US5171928A (en) | Memory for electronic recording apparatus using standard melody note-length table | |
JPS5912188B2 (ja) | Speech information compression method | |
JPS58113992A (ja) | Speech signal compression system | |
JPS6028697A (ja) | Voiced/unvoiced speech switching device | |
JPS60113299A (ja) | Speech synthesis device | |
JPS59162597A (ja) | Speech synthesis device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | 17P | Request for examination filed | Effective date: 19860612 |
 | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): DE FR GB |
 | 17Q | First examination report despatched | Effective date: 19880905 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 19890117 |
 | RIN1 | Information on inventor provided before grant (corrected) | Inventor name: TAKAMORI, KAZUOC |