EP0144724A1 - Speech synthesizing apparatus - Google Patents

Speech synthesizing apparatus

Info

Publication number
EP0144724A1
EP0144724A1 EP84113148A EP84113148A EP0144724A1 EP 0144724 A1 EP0144724 A1 EP 0144724A1 EP 84113148 A EP84113148 A EP 84113148A EP 84113148 A EP84113148 A EP 84113148A EP 0144724 A1 EP0144724 A1 EP 0144724A1
Authority
EP
European Patent Office
Prior art keywords
data
speech
sampling
pitch
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP84113148A
Other languages
German (de)
French (fr)
Inventor
Kazuo c/o Patent Division Sumita
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Publication of EP0144724A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers

Definitions

  • the present invention relates to a speech synthesizing apparatus for generating synthetic speech with excellent quality.
  • this kind of speech synthesizing apparatus usually comprises a vocal-tract-approximation-digital filter 1 for setting the feature of a speech to be synthesized in accordance with synthesis parameters; a random noise generator 2; a periodic sound generator 3 for generating a sound signal at a pitch interval that depends upon the fundamental frequency of the speech to be synthesized; a selector 4 for selectively supplying output signals from the generators 2 and 3 to the digital filter depending upon whether the speech to be synthesized is voiced speech or unvoiced speech; and a control circuit 5 for respectively supplying synthesis parameters and pitch data to the digital filter 1 and to periodic sound generator 3 in accordance with the speech to be synthesized, and for supplying sampling pulses-to these circuits.
  • the digital filter 1 receives a linear predictive coefficient, line spectral pair (LSP) parameter, cepstrum, etc. as synthesis parameters depending upon the speech synthesizing method, e.g., linear predictive coding, LSP method, cepstrum method, etc. and generates the speech data corresponding to the speech to be synthesized.
  • LSP line spectral pair
  • the voiced sound signal such as an impulse, triangular wave, glottal pulse wave, etc. which is generated from the periodic sound generator 3 at the pitch interval corresponding to the above-mentioned pitch data is supplied to the digital filter 1 through the selector 4.
  • a random noise from the random noise generator 2 is supplied to the digital filter 1 through the selector 4.
  • the digital filter 1 filtering-processes the voiced sound signal or random noise that is selectively supplied through the selector 4 in accordance with the synthesis parameters from the control circuit 5, thereby making the speech.
  • the synthesis parameter is updated, for example, at every frame of about 10 msec or at every time interval synchronized with the pitch period. Since the synthetic speech signal which is generated from the digital filter 1 is a discrete signal, it is converted to an analog signal by a D/A converter 6, and thereafter it is supplied to an electric-acoustic converter 7 through a low-pass filter 8 having a cut-off frequency which is less than half of the frequency of a sampling pulse which is supplied from the control circuit 5.
  • the voiced sound signal is substantially equal to the interval determined by the input pitch data and is generated from the periodic sound generator 3 synchronously with the sampling pulse. Then it is subjected to the filtering processing in the digital filter 1 synchronously with the sampling pulse.
  • the periodic sound generator 3 generates an impulse.
  • the pulses must be spaced by an integer multiple of the sampling period that is most approximate to the interval determined by the input pitch data. This impulse is supplied through the selector 4 to the digital filter 1, where it is subjected to filtering-process, generating the speech as shown in Fig. 2B.
  • the pitch period which is equal to N times the sampling period can be set with a relatively high degree of accuracy in accordance with the input pitch data.
  • the compass is high and its pitch period is short and where this pitch period frequently varies such as in the voice of a woman or child, it is difficult to approximate the pitch period corresponding to the input pitch data by using N times the sampling period.
  • the vibrato component is mixed with the synthetic speech, causing the sound quality to deteriorate.
  • such an incontinuous change of frequency in the synthetic speech is sensed as an unpleasant sound by audience.
  • the interval determined by the input pitch data is set to have a length which is N times the sampling period with a high degree of approximation by increasing the frequency of the sampling pulse which is used.
  • a speech synthesizing apparatus comprising a control circuit for generating clock pulses, synthesis parameters and pitch data corresponding to a speech to be synthesized; a memory which has stored therein a predetermined number of sampling data corresponding to a predetermined number of sampling values within a predetermined time range of a continuous wave which is obtained by developing a voiced sound signal by use of an interpolation function; a readout control circuit for sequentially reading out the sampling data of the continuous wave within the predetermined time range that is synchronized with a pitch period represented by the input pitch data from the memory in response to the clock pulse; a digital filter for filtering-processing the sampling data read out synchronously with the clock pulse from that memory in accordance with the synthesis parameters generated from the control circuit; and a speech generator circuit for generating speech corresponding to the digital data from this digital filter.
  • the sampling values corresponding to the generation timings of the clock pulse of the continuous wave which is generated within a predetermined time range synchronously with the pitch period are sequentially generated. Therefore, irrespective of the length of the pitch period, a voiced sound signal corresponding to the continuous wave which is always synchronized with the pitch period is supplied to the digital filter so that a synthesized speech of excellent quality can be obtained.
  • Fig. 3 is a block diagram of a speech synthesizing apparatus according to one embodiment of the present invention.
  • This speech synthesizing apparatus is constituted similarly to the speech synthesizing apparatus shown in Fig. 1 except that it has a periodic sound generator circuit 10 in place of the periodic sound generator 3.
  • the periodic sound generator circuit 10 comprises a central processing circuit 100: a read only memory (ROM) 101; a random access memory (RAM) 102; an I/O port 103 which receives the pitch data from the control circuit 5; and an I/O port 104 which supplies the voiced sound data to the sound selector 4.
  • ROM read only memory
  • RAM random access memory
  • this interpolation function it is possible to use, for instance, a Lagrangean polynomial, a spline function or the like.
  • this continuous wave signal xa(t) is generated over the time interval from -∞ to +∞; however, the signal component xb(t) at a predetermined time range around time 0 of the continuous wave signal xa(t) can be substantially regarded to be equivalent to the continuous wave signal xa(t).
  • a time window ω(t) such as a square window or a Hamming window or the like which becomes 0 in the ranges of t ≤ -2T and t ≥ 2T and becomes 1 in the range of -2T < t < 2T
  • This signal component x b (t) is given by the following equation:
  • the sampling data SD(-2N)...SD(0)...SD(2N) correspond to the sampling data at the sampling points -2N...0...2N in Fig. 4D.
  • N is set to be 5
  • N can be set to another value.
  • the items of sampling data SD(-2N)...SD(0)...SD(2N) are respectively stored in memory areas M(-2N)...M(0)...M(2N) of the ROM 101. These memory areas M(-2N)...M(0)...M(2N) are respectively designated by address data A[0]...A[2N]...A[4N].
  • a pitch period PT of the voiced sound signal that should be synthesized is given by the following equation: where P1 and P2 are integers and 0 ≦ P2 < N.
  • This equation (5) denotes that the deviation time data DT2 is obtained by the sum of the time that is given by the deviation time data DT1, i.e., the time between the starting time of the pitch period in the present cycle and the leading edge of the sampling pulse generated simultaneously with or immediately after the start of this pitch period, and the time difference between the fraction interval (interval corresponding to P2 x (T/N) in equation (4)) of the pitch interval expressed by the pitch data given in the next operation cycle and one sampling period T.
  • DT2/T has a value that is equal to or greater than 0 but less than 2.
  • this deviation time data DT2 which is subtracted by one sampling period T is written in the memory area MR2 as new deviation time data.
  • the pitch count data PCD was written in the memory area MR3 in this way, the contents of the memory area MR4 are cleared. Further, thereafter, in STEP 3, a check is made to see if count data CD of the memory area MR4 has a predetermined value Z or less.
  • This predetermined value Z denotes the amount of sampling data which is read out from the ROM 101 in each operation cycle and is set to 3 in this embodiment.
  • STEP 3 when it is detected that the count data CD becomes larger than the predetermined value Z, "0" is supplied to the digital filter, and the pitch count data PCD is decreased by one count without increasing the count data CD. Then, the apparatus waits for the input of the next sampling pulse. When the next sampling pulse is supplied in this state, STEP 1 is again executed. When the pitch count data PCD is determined to be larger than 0 in STEP 1, STEP 2 is executed.
  • the pitch count data PCD stored in the memory area MR3 is 0, the deviation time data DT2 which is given by the following equation is stored in the memory area MR2:
  • the sampling data SD(-2N) is read out from the memory area M(-2N) of the ROM 101 and is supplied to the digital filter 1 through the I/O port 104 and selector 4.
  • the content of the memory area MR4 is changed from 0 to 1
  • the content of the memory area MR3 is changed from 26 to 25.
  • the CPU 100 executes STEP 1 in response to the next sampling pulse. Since the pitch count data PCD is now 25, STEP 2 is executed. Since the count data CD is 1, the address data A[X2] is given by the following equation:
  • the count data CD of the memory area MR4 is increased by one count, while the count data PCD of the memory area MR3 is decreased by one count.
  • the address data A[0], A[5], A[10], and A[15] are sequentially given to the ROM 101 in response to the first four sampling pulses, and then the sampling data SD(-10), SD(-5), SD(0), and SD(5) are sequentially read out from the ROM 101 and are given to the digital filter 1.
  • these items of sampling data SD(-10), SD(-5) and SD(5) are 0, while the sampling data SD(0) corresponds to an impulse of a predetermined amplitude X0.
  • the pitch count data PCD becomes 0 after the 26th sampling pulse is generated at time 25T.
  • the CPU 100 transfers the contents of the memory area MR2 to the memory area MR1, and the contents DT2 of the memory area MR2 are updated in accordance with the following equation:
  • the pitch count data PCD which is given by the following equation is stored in the memory area MR3.
  • sampling data SD(-2N + 3) is read out from the memory area M(-2N + 3) of the ROM 101.
  • the address data A[8], A[13] and A[18] are sequentially generated whenever the sampling pulses are supplied at times 27T to 29T.
  • a similar operation is executed whenever new pitch data is supplied. For instance, in the next operation cycle, as shown at the right of Fig. 6, the four items of sampling data corresponding to the impulses at the sampling points -8, -3, 2, and 7 in Fig. 4D are read out from the ROM 101 at the sampling timings 51T to 54T.
  • the synthesized impulse of the four impulses corresponds to the impulse having the amplitude of X0, as indicated by the broken line in the diagram.
  • the impulse is generated at time 27.4T, that is, at a time after the interval of 25.4T from the sampling timing 2T at which the impulse having the amplitude of X0 was generated in the first cycle.
  • the synthesized impulse of the four impulses corresponding to the four items of sampling data which are generated at the sampling timings 51T to 54T corresponds to the impulse having the amplitude of X0 which is generated at time 52.6T, that is, at the time after the elapse of the interval of 25.2T from the time 27.4T.
  • one or a predetermined number of impulses which are equivalent to the voiced speech signal that is generated synchronously with the pitch interval determined by the input pitch data are generated synchronously with the sampling pulses. Therefore, even if the pitch period is set to be short, or even if the pitch period rapidly changes, synthetic speech of relatively excellent quality can be generated.
  • Fig. 7 shows a block diagram of a periodic sound generator circuit of a speech synthesizing apparatus according to another embodiment of the present invention.
  • This periodic sound generator circuit comprises registers 110 to 112 for respectively storing the pitch data PX and deviation time data DT1 and DT2; a divider 113 for dividing the pitch data PX by the sampling period T; a register 114 for storing the output data of the divider 113; an integer detector 115 for detecting the integer part of the output data of the register 114; a DT2 calculation circuit 116 for calculating a new deviation time data DT2 on the basis of the output data of the registers 110 and 111; a divider 117 for dividing the output data of the calculation circuit 116 by T; a fraction detector 118 for detecting the fraction part of the output data of the divider 117; and a comparator 119 for comparing the output data of the divider 117 with a constant 1.
  • the comparator 119 generates an output data "1" when it is detected that the output data of the divider 117 is equal to or larger than 1. On the other hand, the comparator 119 generates an output data "0" when such an output data is detected to be smaller than 1.
  • a subtracter 120 subtracts the output data of the comparator 119 from the output data generated from the integer detector 115 and sets the result into a PCD down-counter 121. This PCD down-counter 121 executes the down-counting operation in response to the sampling pulse. When the contents of the down-counter 121 become 0, it sets the CD up-counter 122 to 0. The CD up-counter 122 executes the up-counting operation in response to the sampling pulse.
  • a comparator 123 checks the output data of the up-counter 122 to see if it is three or less. When it is detected that the output data is three or less, the comparator 123 generates an output pulse.
  • An address calculation circuit 124 calculates the address data on the basis of the output data of the register 111 and of the up-counter 122 in response to the output pulse from the comparator 123, supplies this calculated address data to a memory 125, and reads out the corresponding sampling data from the memory 125.
  • the memory 125 stores the sampling data SD(-2N)... SD(0)...SD(2N) as in the ROM 101 shown in Fig. 3.
  • the data (P1 + P2/N) which was obtained by dividing the pitch data by T in the divider 113 is stored in the register 114. Therefore, output data of P1 is generated from the integer detector 115.
  • the DT2 calculation circuit 116 calculates the new deviation time data DT2 in accordance with equation (5).
  • the output data of the calculation circuit 116 is divided by T in the divider 117. Thereafter, it is supplied to the fraction detector 118 and comparator 119. After the output data of the fraction detector 118 is increased by T times in the multiplier 126, it is stored in the DT2 register 112.
  • the output data of the divider 117 is less than that of the first cycle, the output data of 0 is generated from the comparator 119.
  • the output data of P1 of the integer detector 115 is directly set into the PCD down-counter 121 in the first operation cycle.
  • the down-counter 121 supplies inhibition signals to the registers 110 to 112, calculation circuit 116 and subtracter 120, thereby inhibiting the operations thereof.
  • the address data A[0] is generated in the address calculation circuit 124 in accordance with the following equation.
  • the corresponding sampling data is read out from the memory 125 in accordance with this address data A[0].
  • the address data A[5], A[10] and A[15] are generated from the address calculation circuit 124, while the corresponding sampling data is read out from the memory 125.
  • the contents of the CD up-counter 122 become 4, so that a "0" output signal is generated from the comparator 123.
  • the address calculation circuit 124 does not execute the calculation.
  • the PCD down-counter 121 executes the down-counting operation until the contents thereof become 0 in response to the sampling pulse.
  • the CD up-counter 122 is set to 0, and at the same time the registers 110 to 112, calculation circuit 116 and subtracter 120 are set into the operational states. Therefore, the pitch data is stored in the PX register 110 in response to the next sampling pulse, and at the same time the deviation time data from the DT2 register 112 is stored in the DT1 register 111.
  • the circuit shown in Fig. 7 generates the voiced sound signal corresponding to the input pitch data in response to the sampling circuit.
  • time window which becomes at a 1 level in the range from -T to T as the time window w(t).
  • other memory areas M(-N)...M(0)...M(N) for storing the sampling data SD(-N)...SD(0)...SD(N) may be provided in the ROM 101.

Abstract

@ A speech synthesizing apparatus including a periodic sound signal generator circuit (10) for generating a voiced sound signal synchronously with a clock pulse in accordance with pitch data corresponding to a speech to be synthesized; and a digital filter (1) for filter-processing the voiced sound signal generated from the periodic sound signal generator circuit (10) in accordance with synthesis parameters which are generated in correspondence to a speech to be synthesized. This periodic sound signal generator circuit (10) includes a memory (101) which has stored therein a predetermined number of sampling values within a predetermined time range of a continuous wave signal which is obtained by developing the voiced sound signal based on an interpolation function; and a processing unit (100, 102) for sequentially reading out the sampling values of the continuous wave signal within the predetermined time range that are synchronized with the pitch period represented by the input pitch data from the memory (101) in response to the clock pulse.

Description

  • The present invention relates to a speech synthesizing apparatus for generating synthetic speech with excellent quality.
  • Recently, various kinds of speech synthesizing apparatuses for generating desired synthetic speech have been developed. For example, as shown in Fig. 1, this kind of speech synthesizing apparatus usually comprises a vocal-tract-approximation digital filter 1 for setting the features of the speech to be synthesized in accordance with synthesis parameters; a random noise generator 2; a periodic sound generator 3 for generating a sound signal at a pitch interval that depends upon the fundamental frequency of the speech to be synthesized; a selector 4 for selectively supplying output signals from the generators 2 and 3 to the digital filter 1 depending upon whether the speech to be synthesized is voiced speech or unvoiced speech; and a control circuit 5 for supplying synthesis parameters to the digital filter 1 and pitch data to the periodic sound generator 3 in accordance with the speech to be synthesized, and for supplying sampling pulses to these circuits. The digital filter 1 receives a linear predictive coefficient, line spectral pair (LSP) parameter, cepstrum, etc. as synthesis parameters depending upon the speech synthesizing method, e.g., linear predictive coding, LSP method, cepstrum method, etc., and generates the speech data corresponding to the speech to be synthesized. In the case where the speech to be synthesized is what is called periodic voiced speech, the voiced sound signal such as an impulse, triangular wave, glottal pulse wave, etc., which is generated from the periodic sound generator 3 at the pitch interval corresponding to the above-mentioned pitch data, is supplied to the digital filter 1 through the selector 4. On the other hand, in the case where the speech to be synthesized is unvoiced, random noise from the random noise generator 2 is supplied to the digital filter 1 through the selector 4.
  • The digital filter 1 filters the voiced sound signal or random noise that is selectively supplied through the selector 4 in accordance with the synthesis parameters from the control circuit 5, thereby producing the speech data. The synthesis parameter is updated, for example, at every frame of about 10 msec or at every time interval synchronized with the pitch period. Since the synthetic speech signal which is generated from the digital filter 1 is a discrete signal, it is converted to an analog signal by a D/A converter 6, and thereafter it is supplied to an electric-acoustic converter 7 through a low-pass filter 8 having a cut-off frequency which is less than half of the frequency of the sampling pulse supplied from the control circuit 5.
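The conventional arrangement of Fig. 1 can be pictured with a minimal sketch. It assumes an impulse train as the voiced sound signal and a direct-form all-pole filter of the kind often used with linear predictive coding; the source does not specify the filter structure, and the names `excitation`, `all_pole_filter` and the coefficient value are hypothetical.

```python
import random


def excitation(voiced, pitch_samples, n, seed=0):
    """Role of generators 2/3 plus selector 4: impulse train or noise."""
    if voiced:
        # One unit impulse every `pitch_samples` sampling pulses.
        return [1.0 if k % pitch_samples == 0 else 0.0 for k in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def all_pole_filter(x, a):
    """Assumed filter form: y[k] = x[k] - sum_j a[j] * y[k-1-j]."""
    y = []
    for k, xk in enumerate(x):
        acc = xk
        for j, aj in enumerate(a):
            if k - 1 - j >= 0:
                acc -= aj * y[k - 1 - j]
        y.append(acc)
    return y


coeffs = [-0.9]  # hypothetical one-pole "vocal tract", for illustration only
speech = all_pole_filter(excitation(True, 4, 8), coeffs)
```

Here the pitch interval is necessarily a whole number of samples (`pitch_samples`), which is the limitation the following paragraphs discuss.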
  • In synthesizing voiced speech, the voiced sound signal is generated from the periodic sound generator 3 synchronously with the sampling pulse, at an interval substantially equal to the interval determined by the input pitch data. It is then filtered in the digital filter 1 synchronously with the sampling pulse. For instance, in the case where impulses are used as the voiced sound signal, as shown in Fig. 2A, the periodic sound generator 3 generates impulses spaced by the integer multiple of the sampling period that most closely approximates the interval determined by the input pitch data. Each impulse is supplied through the selector 4 to the digital filter 1, where it is filtered to generate the speech shown in Fig. 2B.
  • Generally, since the pitch range of a man's voice is low and its pitch period is long, a pitch period equal to N times the sampling period can be set with a relatively high degree of accuracy in accordance with the input pitch data. However, where the pitch range is high, the pitch period is short, and the pitch period varies frequently, as in the voice of a woman or child, it is difficult to approximate the pitch period corresponding to the input pitch data by N times the sampling period. Furthermore, a pitch period restricted to N times the sampling period cannot follow variations in the pitch period smoothly. In such a case, a vibrato component is mixed into the synthetic speech, causing the sound quality to deteriorate. In addition, such a discontinuous change of frequency in the synthetic speech is perceived as an unpleasant sound by listeners.
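The effect of restricting the pitch period to N sampling periods can be illustrated numerically. The sketch below assumes a sampling rate of 8 kHz, which is not stated in the source, and the function name `quantized_pitch` is hypothetical.

```python
def quantized_pitch(f0_hz, fs_hz):
    """Round the pitch period to a whole number N of sampling periods.

    Returns (N, pitch frequency actually produced).
    """
    period_samples = fs_hz / f0_hz
    n = max(1, round(period_samples))
    return n, fs_hz / n


FS = 8000.0  # assumed sampling rate; the source gives no value
for f0 in (100.0, 105.0, 300.0, 310.0):
    n, f_actual = quantized_pitch(f0, FS)
    err = 100.0 * (f_actual - f0) / f0
    print(f"target {f0:6.1f} Hz -> N = {n:3d}, actual {f_actual:7.2f} Hz ({err:+.2f} %)")
```

Under this assumption, neighbouring values of N near 100 Hz differ by about 1.2 Hz, whereas near 300 Hz they differ by about 11 Hz, so a smoothly gliding high pitch is reproduced in coarse steps.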
  • To solve this kind of problem, a method is considered whereby the interval determined by the input pitch data is set to have a length which is N times the sampling period with a high degree of approximation by increasing the frequency of the sampling pulse which is used. However, in this case, in order to impart the proper synthesis parameter to the digital filter from the control circuit 5 in accordance with each sampling pulse, it is required to store a great amount of synthesis parameters into a memory (not shown) in the control circuit 5, causing the readout control of these synthesis parameters to be complicated.
  • It is an object of the present invention to provide a speech synthesizing apparatus for generating synthetic speech with excellent quality irrespective of the length and variation of the pitch period.
  • This object is accomplished by a speech synthesizing apparatus comprising a control circuit for generating clock pulses, synthesis parameters and pitch data corresponding to a speech to be synthesized; a memory which has stored therein a predetermined number of sampling data corresponding to a predetermined number of sampling values within a predetermined time range of a continuous wave which is obtained by developing a voiced sound signal by use of an interpolation function; a readout control circuit for sequentially reading out the sampling data of the continuous wave within the predetermined time range that is synchronized with a pitch period represented by the input pitch data from the memory in response to the clock pulse; a digital filter for filtering-processing the sampling data read out synchronously with the clock pulse from that memory in accordance with the synthesis parameters generated from the control circuit; and a speech generator circuit for generating speech corresponding to the digital data from this digital filter.
  • In this invention, the sampling values corresponding to the generation timings of the clock pulse of the continuous wave which is generated within a predetermined time range synchronously with the pitch period are sequentially generated. Therefore, irrespective of the length of the pitch period, a voiced sound signal corresponding to the continuous wave which is always synchronized with the pitch period is supplied to the digital filter so that a synthesized speech of excellent quality can be obtained.
  • This invention can be more fully understood from the following detailed description when taken in conjunction with the accompanying drawings, in which:
    • Fig. 1 is a block diagram showing a conventional speech synthesizing apparatus;
    • Figs. 2A and 2B are signal waveform diagrams for explaining the operation of the speech synthesizing apparatus of Fig. 1;
    • Fig. 3 is a block diagram of a speech synthesizing apparatus of the present invention;
    • Figs. 4A to 4D are signal waveform diagrams for explaining the operation of the speech synthesizing apparatus shown in Fig. 3;
    • Fig. 5 is a flow chart explaining the operation of the voiced sound generator circuit shown in Fig. 3;
    • Fig. 6 is a signal waveform diagram explaining the operation of the voiced sound generator circuit shown in Fig. 3; and
    • Fig. 7 is a block diagram of the voiced sound generator circuit of a speech synthesizing apparatus according to another embodiment of the invention.
  • Fig. 3 is a block diagram of a speech synthesizing apparatus according to one embodiment of the present invention. This speech synthesizing apparatus is constituted similarly to the speech synthesizing apparatus shown in Fig. 1 except that it has a periodic sound generator circuit 10 in place of the periodic sound generator 3. The periodic sound generator circuit 10 comprises a central processing circuit (CPU) 100; a read only memory (ROM) 101; a random access memory (RAM) 102; an I/O port 103 which receives the pitch data from the control circuit 5; and an I/O port 104 which supplies the voiced sound data to the selector 4.
  • The fundamental operation of the periodic sound generator circuit 10 will be explained hereinbelow. It is now assumed that the pitch data representative of the pitch period which is equal to m (integer) times the sampling period T is supplied from the control circuit 5 and that the periodic sound generator circuit 10 generates the voiced sound data indicative of the impulse x0(n) at the n-th sampling timing as shown in Fig. 4A in response to this pitch data. A continuous wave signal xa(t) which is obtained by developing this impulse x0(n) by the interpolation function based on the sampling theorem is given by the following equation:
    Figure imgb0001
  • As this interpolation function, it is possible to use, for instance, a Lagrangean polynomial, a spline function or the like.
  • As shown in Fig. 4B, this continuous wave signal xa(t) is generated over the time interval from -∞ to +∞; however, the signal component xb(t) at a predetermined time range around time 0 of the continuous wave signal xa(t) can be substantially regarded to be equivalent to the continuous wave signal xa(t). For instance, when using a time window ω(t) such as a square window or a hamming window or the like which becomes 0 in the ranges of t ≤ -2T and t ≥ 2T and becomes 1 in the range of -2T < t < 2T, the signal component xb(t) as shown in Fig. 4C is obtained. This signal component xb(t) is given by the following equation:
    Figure imgb0002
  • By sampling this signal component xb(t) with respect to 4N points as shown in Fig. 4D in the time range of -2T < t < 0 and in the time range of 0 < t < 2T, respectively, 4N sampling data SD(-2N) to SD(0) or SD(0) to SD(2N) are given in each time range. These sampling data SD(i) (i is an integer in the range -2N ≤ i ≤ 2N) are given by the following equation:
    Figure imgb0003
  • Namely, the sampling data SD(-2N)...SD(0)...SD(2N) correspond to the sampling data at the sampling points -2N...0...2N in Fig. 4D. In Fig. 4D, although N is set to be 5, N can be set to another value.
  • The items of sampling data SD(-2N)...SD(0)...SD(2N) are respectively stored in memory areas M(-2N)...M(0)...M(2N) of the ROM 101. These memory areas M(-2N)...M(0)...M(2N) are respectively designated by address data A[0]...A[2N]...A[4N].
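The contents of such a table can be sketched as follows. Equations (1) to (3) appear only as images in this text, so the sketch assumes the standard sampling-theorem interpolation function sin(πt/T)/(πt/T) and the square window described above; `build_rom`, `address` and the default amplitude are hypothetical names and values.

```python
import math


def build_rom(n_sub=5, x0=1.0):
    """Return SD(-2N)..SD(2N) as a list indexed by address 0..4N.

    Assumes SD(i) = x0 * sinc(i / N) inside the window and 0 at |t| >= 2T.
    """
    rom = []
    for i in range(-2 * n_sub, 2 * n_sub + 1):
        u = i / n_sub  # sampling point expressed as t / T
        if abs(i) >= 2 * n_sub:
            rom.append(0.0)  # window is 0 for t <= -2T and t >= 2T
        elif i == 0:
            rom.append(x0)   # centre of the interpolated impulse
        else:
            rom.append(x0 * math.sin(math.pi * u) / (math.pi * u))
    return rom


def address(i, n_sub=5):
    """Map sampling point i (-2N..2N) to address index 0..4N."""
    return i + 2 * n_sub


rom = build_rom()
```

With N = 5 the table holds 21 entries, and the entries at sampling points -5 and 5 are zero up to floating-point rounding, so reading the table at whole sampling periods reproduces a single impulse at SD(0).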
  • The operation of the periodic sound generator circuit 10 shown in Fig. 3 will now be explained with reference to the flow chart shown in Fig. 5. It is now assumed that a pitch period PT of the voiced sound signal that should be synthesized is given by the following equation:
    $$PT = \left(P1 + \frac{P2}{N}\right) T \qquad (4)$$
    where P1 and P2 are integers and 0 ≦ P2 < N.
  • In the initial state, the contents of the memory areas MR1 to MR4 are all cleared.
  • When the pitch data indicative of an interval PX is supplied to the I/O port 103 under this state, a check is made synchronously with the sampling pulse to see if a pitch count data PCD stored in the memory area MR3 is 0 or less in STEP 1. When it is detected that the pitch count data PCD is 0 or less, deviation time data DT2 stored in the memory area MR2 is then stored in the memory area MR1 as deviation time data DT1. At the same time, new deviation time data DT2 which is given by the following equation is stored in the memory area MR2:
    Figure imgb0005
    where Re{X} is a function representing a value which is equal to an integer portion of X. This equation (5) denotes that the deviation time data DT2 is obtained by the sum of the time that is given by the deviation time data DT1, i.e., the time between the starting time of the pitch period in the present cycle and the leading edge of the sampling pulse generated simultaneously with or immediately after the start of this pitch period, and the time difference between the fraction interval (interval corresponding to P2 x (T/N) in equation (4)) of the pitch interval expressed by the pitch data given in the next operation cycle and one sampling period T. DT2/T has a value that is equal to or greater than 0 but less than 2.
  • Thereafter, in STEP 2, a check is made to see whether this deviation time data DT2 is T or more. When it is detected that DT2/T is less than 1, namely, when the deviation time data DT2 represents the time difference between the end of the pitch period of the present cycle and the sampling pulse generated simultaneously with or immediately after this end of the pitch period, the pitch count data PCD which is given by the following equation is written in the memory area MR3:
    Figure imgb0006
  • On the other hand, in STEP 2, when it is detected that DT2/T is 1 or more, i.e., that the deviation time data DT2 indicates the time difference between the end of the pitch period in the present cycle and the sampling pulse which is thereafter generated at the second time, the pitch count data PCD which is given by the following equation is written in the memory area MR3:
    Figure imgb0007
  • In this case, the value obtained by subtracting one sampling period T from this deviation time data DT2 is written in the memory area MR2 as new deviation time data. After the pitch count data PCD has been written in the memory area MR3 in this way, the contents of the memory area MR4 are cleared. Then, in STEP 3, a check is made to see whether the count data CD of the memory area MR4 is equal to or less than a predetermined value Z. This predetermined value Z determines the amount of sampling data which is read out from the ROM 101 in each operation cycle and is set to 3 in this embodiment. When it is detected in STEP 3 that the count data CD is equal to or less than the predetermined value Z, one of the address data A[0]...A[2N]...A[4N] corresponding to the sum of this count data CD and 1/T of the deviation time data DT1 stored in the memory area MR1 is supplied to the ROM 101. The corresponding one of the sampling data SD(-2N)...SD(0)...SD(2N) is read out from the ROM 101 and is supplied to the digital filter 1 through the I/O port 104 and selector 4. Thereafter, the count data CD in the memory area MR4 is increased by one count, the pitch count data PCD in the memory area MR3 is decreased by one count, and the apparatus stands by until the next sampling pulse is supplied. On the other hand, when it is detected in STEP 3 that the count data CD is larger than the predetermined value Z, "0" is supplied to the digital filter 1, and the pitch count data PCD is decreased by one count without increasing the count data CD. Then, the apparatus waits for the input of the next sampling pulse. When the next sampling pulse is supplied in this state, STEP 1 is again executed. When the pitch count data PCD is determined to be larger than 0 in STEP 1, STEP 2 is executed.
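The per-pulse procedure of STEP 1 to STEP 3 can be sketched as a short Python simulation. It is a reading of the prose rather than of the flow chart itself: the equations in the figures are not reproduced here, so the update DT2 = DT1 + T - P2 x T/N, the counts PCD = P1 + 1 (when DT2 < T) and PCD = P1 (when DT2 ≥ T), and the address index N x (CD + DT1/T) are assumptions inferred from the description and its worked example. The function name `simulate` and the third pitch value in the usage line are hypothetical.

```python
def simulate(pitches, num_pulses, N=5, Z=3):
    """Return, per sampling pulse, the index k of address A[k], or None
    when "0" is supplied to the digital filter instead of a ROM sample.

    pitches: iterable of (P1, P2) pairs, one consumed per pitch cycle.
    DT1 and DT2 are held as integers in units of T/N, so T corresponds to N.
    """
    pitch_iter = iter(pitches)
    dt1 = dt2 = pcd = cd = 0               # MR1 to MR4 are cleared initially
    out = []
    for _ in range(num_pulses):
        if pcd <= 0:                       # STEP 1: a new pitch cycle starts
            p1, p2 = next(pitch_iter)
            dt1 = dt2                      # MR2 -> MR1
            dt2 = dt1 + (N - p2)           # assumed form of equation (5)
            if dt2 < N:                    # STEP 2: DT2 < T
                pcd = p1 + 1
            else:                          # DT2 >= T
                pcd = p1
                dt2 -= N                   # subtract one sampling period T
            cd = 0                         # clear MR4
        if cd <= Z:                        # STEP 3: read a sample from the ROM
            out.append(N * cd + dt1)
            cd += 1
        else:                              # supply "0" to the digital filter
            out.append(None)
        pcd -= 1
    return out

addresses = simulate([(25, 2), (25, 1), (25, 0)], 55)
print([(t, a) for t, a in enumerate(addresses) if a is not None])
```

Holding the deviation times as integer multiples of T/N avoids rounding errors that would otherwise accumulate over many pitch cycles.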
  • It is now assumed that first and second pitch data representative of the pitch periods PX1 and PX2 of 25.4T (P1 = 25, P2 = 2) and 25.2T (P1 = 25, P2 = 1) are supplied to the CPU 100 through the I/O port 103 in this order.
  • The CPU 100 executes STEP 1 in response to the first sampling pulse (time t = 0), which is input immediately after the first pitch data is received. In this stage, since the pitch count data PCD stored in the memory area MR3 is 0, the deviation time data DT2 which is given by the following equation is stored in the memory area MR2:
    Figure imgb0008
  • Since the deviation time data DT2 of 0.6T stored in the memory area MR2 is smaller than T, the pitch count data PCD which is given by the following equation is stored in the memory area MR3:
    Figure imgb0009
  • Thereafter, the contents of the memory area MR4 are cleared, and it is detected in STEP 3 that the count data CD is smaller than the predetermined value Z(=3). Due to this, the address data A[Xl] which is given by the following equation is given to the ROM 101:
    Figure imgb0010
  • Due to this, the sampling data SD(-2N) is read out from the memory area M(-2N) of the ROM 101 and is supplied to the digital filter 1 through the I/O port 104 and selector 4. Next, the content of the memory area MR4 is changed from 0 to 1, and the content of the memory area MR3 is changed from 26 to 25.
  • The CPU 100 executes STEP 1 in response to the next sampling pulse. Since the pitch count data PCD is now 25, STEP 2 is executed. Since the count data CD is 1, the address data A[X2] is given by the following equation:
    Figure imgb0011
  • Thereafter, the count data CD of the memory area MR4 is increased by one count, while the count data PCD of the memory area MR3 is decreased by one count.
  • After that, a similar operation is repeatedly executed, so that address data A[10] and A[15] are supplied to the ROM 101 and the corresponding sampling data are read out from the ROM 101. After the sampling data designated by the address data A[15] has been read out, the count data CD becomes 4 and the pitch count data PCD becomes 22. In the following cycle, it is detected in STEP 3 that the count data CD is larger than 3. Therefore, 0 is supplied to the digital filter, and the pitch count data PCD is decreased by one count each time a sampling pulse is supplied. This operation is executed until the pitch count data PCD becomes 0.
  • As described above, after the first pitch data is input, the address data A[0], A[5], A[10], and A[15] are sequentially given to the ROM 101 in response to the first four sampling pulses, and the sampling data SD(-10), SD(-5), SD(0), and SD(5) are then sequentially read out from the ROM 101 and given to the digital filter 1. As shown in Fig. 6, the items of sampling data SD(-10), SD(-5) and SD(5) are 0, while the sampling data SD(0) corresponds to an impulse of a predetermined amplitude X0.
  • The pitch count data PCD becomes 0 after the 26th sampling pulse is generated at time 25T. When it is detected in STEP 1 that the pitch count data PCD is 0 or less in response to the sampling pulse generated at time 26T, the CPU 100 transfers the contents of the memory area MR2 to the memory area MR1, and the contents DT2 of the memory area MR2 are updated in accordance with the following equation:
    Figure imgb0012
  • Since the deviation time data DT2 is larger than T, the deviation time data of 0.4T is written in the memory area MR2. The pitch count data PCD which is given by the following equation is stored in the memory area MR3.
    Figure imgb0013
  • Thereafter, the address data A[X26] is obtained in accordance with the following equation in a similar manner as mentioned above:
    Figure imgb0014
  • Thus, the sampling data SD(-2N + 3) is read out from the memory area M(-2N + 3) of the ROM 101.
  • In a similar manner, the address data A[8], A[13] and A[18] are sequentially generated as the sampling pulses are supplied at times 27T to 29T. In response to the address data A[3], A[8], A[13], and A[18], the four items of sampling data corresponding to the impulses at the sampling points -7, -2, 3, and 8 (where N = 5) in Fig. 4D are read out from the ROM 101. Namely, the sampling data corresponding to the four impulses that are generated at times 26T to 29T shown in Fig. 6 is supplied to the digital filter 1.
  • A similar operation is executed whenever new pitch data is supplied. For instance, in the next operation cycle, as shown at the right of Fig. 6, the four items of sampling data corresponding to the impulses at the sampling points -8, -3, 2, and 7 in Fig. 4D are read out from the ROM 101 at the sampling timings 51T to 54T.
  • The synthesized impulse of the four impulses corresponding to the four items of sampling data that are generated at the sampling timings 26T to 29T in Fig. 6 corresponds to the impulse having the amplitude X0, as indicated by the broken line in the diagram. This impulse is generated at time 27.4T, that is, after an interval of 25.4T from the sampling timing 2T at which the impulse having the amplitude X0 was generated in the first cycle. Similarly, the synthesized impulse of the four impulses corresponding to the four items of sampling data which are generated at the sampling timings 51T to 54T corresponds to the impulse having the amplitude X0 which is generated at time 52.6T, that is, after an interval of 25.2T from the time 27.4T.
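The impulse times quoted above can be checked with a few lines of Python. The sketch assumes, as the description implies, that address A[2N] holds SD(0) and that neighbouring table entries are T/N apart. The function name `impulse_time` is hypothetical, and the first address of the third cycle, A[2], is derived here from the sampling point -8 rather than stated in the text.

```python
from fractions import Fraction

def impulse_time(first_pulse, first_address, N=5):
    """Time, in units of T, of the synthesized impulse of one pitch cycle.

    first_pulse: index of the sampling pulse that reads the first sample.
    first_address: index k of the address A[k] used at that pulse.
    """
    # The table centre SD(0) lies (2N - k) table steps of T/N after A[k].
    return first_pulse + Fraction(2 * N - first_address, N)

times = [impulse_time(0, 0), impulse_time(26, 3), impulse_time(51, 2)]
print([float(t) for t in times])                           # impulse positions
print([float(b - a) for a, b in zip(times, times[1:])])    # spacing between them
```

The spacings reproduce the two requested pitch periods even though every sample is emitted on the integer sampling grid.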
  • As described above, in this embodiment, one or a predetermined number of impulses which are equivalent to the voiced speech signal that is generated synchronously with the pitch interval determined by the input pitch data are generated synchronously with the sampling pulses. Therefore, even if the pitch period is set to be short, or even if the pitch period rapidly changes, synthetic speech of relatively excellent quality can be generated.
  • Fig. 7 shows a block diagram of a periodic sound generator circuit of a speech synthesizing apparatus according to another embodiment of the present invention. This periodic sound generator circuit comprises registers 110 to 112 for respectively storing the pitch data PX and deviation time data DT1 and DT2; a divider 113 for dividing the pitch data PX by the sampling period T; a register 114 for storing the output data of the divider 113; an integer detector 115 for detecting the integer part of the output data of the register 114; a DT2 calculation circuit 116 for calculating new deviation time data DT2 on the basis of the output data of the registers 110 and 111; a divider 117 for dividing the output data of the calculation circuit 116 by T; a fraction detector 118 for detecting the fraction part of the output data of the divider 117; and a comparator 119 for comparing the output data of the divider 117 with a constant 1. The comparator 119 generates output data "1" when it detects that the output data of the divider 117 is equal to or larger than 1, and generates output data "0" when that output data is smaller than 1. A subtracter 120 subtracts the output data of the comparator 119 from the output data generated from the integer detector 115 and sets the result into a PCD down-counter 121. This PCD down-counter 121 executes the down-counting operation in response to the sampling pulse. When the contents of the down-counter 121 become 0, it sets the CD up-counter 122 to 0. The CD up-counter 122 executes the up-counting operation in response to the sampling pulse. A comparator 123 checks the output data of the up-counter 122 to see if it is three or less. When it is detected that the output data is three or less, the comparator 123 generates an output pulse. An address calculation circuit 124 calculates the address data on the basis of the output data of the register 111 and of the up-counter 122 in response to the output pulse from the comparator 123, supplies this calculated address data to a memory 125, and reads out the corresponding sampling data from the memory 125. The memory 125 stores the sampling data SD(-2N)...SD(0)...SD(2N) as in the ROM 101 shown in Fig. 3.
  • When the pitch data representative of the pitch period PX (= P1 x T + P2 x T/N) is input during the initial state, the data (P1 + P2/N) obtained by dividing the pitch data by T in the divider 113 is stored in the register 114. Therefore, output data of P1 is generated from the integer detector 115. On the other hand, the DT2 calculation circuit 116 calculates the new deviation time data DT2 in accordance with equation (5). The output data of the calculation circuit 116 is divided by T in the divider 117 and is then supplied to the fraction detector 118 and comparator 119. The output data of the fraction detector 118 is multiplied by T in the multiplier 126 and is then stored in the DT2 register 112. In addition, since the output data of the divider 117 is less than 1 in the first cycle, output data of 0 is generated from the comparator 119. Thus, the output data P1 of the integer detector 115 is directly set into the PCD down-counter 121 in the first operation cycle. Although not shown, in the case where the contents of the PCD down-counter 121 are one or more, the down-counter 121 supplies inhibition signals to the registers 110 to 112, calculation circuit 116 and subtracter 120, thereby inhibiting the operations thereof.
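The datapath of Fig. 7 for one pitch cycle can be sketched in Python as follows. Since equation (5) is not reproduced in this text, the sketch assumes DT2 = DT1 + T - (fractional part of PX), which is how the description paraphrases it. All quantities are expressed in units of T, and the function name `fig7_cycle` is hypothetical.

```python
from fractions import Fraction
from math import floor

def fig7_cycle(px_over_t, dt1_over_t):
    """Return (value set into PCD down-counter 121, new DT2/T for register 112)."""
    px = Fraction(px_over_t)
    p1 = floor(px)                               # divider 113, integer detector 115
    dt2 = Fraction(dt1_over_t) + 1 - (px - p1)   # circuit 116, divider 117 (assumed eq. (5))
    carry = 1 if dt2 >= 1 else 0                 # comparator 119
    new_dt2 = dt2 - floor(dt2)                   # fraction detector 118 (times T in multiplier 126)
    return p1 - carry, new_dt2                   # subtracter 120

first = fig7_cycle("25.4", 0)                    # first cycle, DT1 = 0
second = fig7_cycle("25.2", first[1])            # DT2 of cycle 1 becomes DT1 of cycle 2
print(first, second)
```

In this reading the down-counter is loaded with P1 or P1 - 1, one less than the counts used in the flow chart of Fig. 5, because the counter reaches 0 on one sampling pulse and the next pitch data is latched on the following pulse.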
  • Since the contents of the DT1 register 111 and counter 122 are both 0, the address data A[0] is generated in the address calculation circuit 124 in accordance with the following equation.
    Figure imgb0015
  • The corresponding sampling data is read out from the memory 125 in accordance with this address data A[0]. In a similar manner, whenever the sampling pulse is input, the address data A[5], A[10] and A[15] are generated from the address calculation circuit 124, and the corresponding sampling data is read out from the memory 125. In response to the next sampling pulse, the contents of the CD up-counter 122 become 4, so that a "0" output signal is generated from the comparator 123. Thus, the address calculation circuit 124 does not execute the calculation.
  • On the other hand, the PCD down-counter 121 executes the down-counting operation in response to the sampling pulse until the contents thereof become 0. When the contents of the down-counter 121 become 0, the CD up-counter 122 is set to 0, and at the same time the registers 110 to 112, calculation circuit 116 and subtracter 120 are set into the operational states. Therefore, the pitch data is stored in the PX register 110 in response to the next sampling pulse, and at the same time the deviation time data from the DT2 register 112 is stored in the DT1 register 111.
  • In a similar manner, the circuit shown in Fig. 7 generates the voiced sound signal corresponding to the input pitch data in response to the sampling pulse.
  • Although the present invention has been described with respect to the embodiments, the invention is not limited to these embodiments. For example, in the flow chart shown in Fig. 5, although the determination regarding the generation of the address data is made by checking the count data CD to see if it is three or less in STEP 3, it is also possible to check whether or not the following condition is satisfied in STEP 3:
  • Figure imgb0016
  • Further, it is possible to use, as the time window ω(t), a time window which takes the value 1 in the range from -T to T. Also, other memory areas M(-N)...M(0)...M(N) for storing the sampling data SD(-N)...SD(0)...SD(N) may be provided in the ROM 101.

Claims (4)

1. A speech synthesizing apparatus comprising:
a control circuit (5) for generating synthesis parameters, pitch data and clock pulses corresponding to speech to be synthesized; a periodic-sound-signal-generating means (100 to 102; 110 to 125) for generating digital data representative of a voiced sound signal synchronously with said clock pulse; a digital filter circuit (1) for filter-processing the digital data supplied from said periodic sound signal generating means (100 to 102; 110 to 125) synchronously with said clock pulse in accordance with said synthesis parameters from said control circuit (5); and speech generating means (6 to 8) for generating the speech corresponding to the digital data from said digital filter circuit (1),

characterized in that said periodic sound signal generating means comprises memory means (101; 125) for storing a predetermined number of sampling data corresponding to a predetermined number of sampling values within a predetermined time range of a continuous wave signal which is obtained by developing the voiced sound signal based on an interpolation function; and readout control means (100, 102; 110 to 124) for sequentially reading out the sampling data of the continuous wave signal within said predetermined time range that are synchronized with the pitch period represented by said input pitch data from said memory means (101; 125) in response to said clock pulse and for supplying said sampling data to said digital filter circuit (1).
2. A speech synthesizing apparatus according to claim 1, characterized in that said readout control means comprises: a first memory (MR1) for storing deviation time data representative of the time between the start point of the pitch interval and a clock pulse which is generated simultaneously with or immediately after said start point of said pitch interval; and a data processing circuit (100, MR2) for sequentially reading out the sampling data at the sampling points corresponding to the sum of said deviation time data and the integer times the period of said clock pulse from said memory means (101) synchronously with said clock pulse.
3. A speech synthesizing apparatus according to claim 2, characterized in that said data processing circuit comprises: a second memory (MR2); and a data processing unit (100) for sequentially counting up the contents of said second memory (MR2) in response to the clock pulse which is generated simultaneously with or immediately after the start point of said pitch interval and for sequentially reading out the sampling data from said memory means (101) in response to the contents of said first and second memories (MR1 and MR2).
4. A speech synthesizing apparatus according to claim 1, 2 or 3, characterized in that said readout control means (100, 102; 110 to 124) produces a selection signal according to a speech to be synthesized, and characterized by further comprising a random noise generator (2) and a sound selector (4) for permitting one of output signals from said memory means (101) and random noise generator (2) to be selectively transferred to said digital filter circuit (1) in accordance with the selection signal.
EP84113148A 1983-11-04 1984-10-31 Speech synthesizing apparatus Ceased EP0144724A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP58207052A JPS6098498A (en) 1983-11-04 1983-11-04 Voice synthesizer
JP207052/83 1983-11-04

Publications (1)

Publication Number Publication Date
EP0144724A1 true EP0144724A1 (en) 1985-06-19

Family

ID=16533407

Family Applications (1)

Application Number Title Priority Date Filing Date
EP84113148A Ceased EP0144724A1 (en) 1983-11-04 1984-10-31 Speech synthesizing apparatus

Country Status (2)

Country Link
EP (1) EP0144724A1 (en)
JP (1) JPS6098498A (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5840596A (en) * 1981-09-02 1983-03-09 株式会社東芝 Sound source circuit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Vol. ASSP-29, No. 6, December 1981, pages 1113-1116, NEW YORK (US), W.T. HARTWELL et al.: "A pulse driving function generator for LPC synthesis of voiced segments of speech". * Paragraph III: "Generation of noninteger pitch periods"; figures 1-4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5321794A (en) * 1989-01-01 1994-06-14 Canon Kabushiki Kaisha Voice synthesizing apparatus and method and apparatus and method used as part of a voice synthesizing apparatus and method
EP0384587A1 (en) * 1989-01-31 1990-08-29 Canon Kabushiki Kaisha Voice synthesizing apparatus

Also Published As

Publication number Publication date
JPS6098498A (en) 1985-06-01

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19841031

AK Designated contracting states

Designated state(s): DE FR GB NL

17Q First examination report despatched

Effective date: 19860930

D17Q First examination report despatched (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 19871217

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SUMITA, KAZUO C/O PATENT DIVISION