US3328525A - Speech synthesizer - Google Patents

Speech synthesizer

Info

Publication number
US3328525A
Authority
US
United States
Prior art keywords
spectrum
speech
frequencies
frequency
formant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US334354A
Inventor
Jr John L Kelly
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
Bell Telephone Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
1963-12-30
Publication date
1967-06-27
Application filed by Bell Telephone Laboratories Inc
Priority to US334354A
Application granted
Publication of US3328525A
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00

Definitions

  • a particular antiresonant circuit, say 131-1
  • $s_1^* = (\sigma_1 - j\omega_1)$
  • A is a constant.
  • a suitable realization of a circuit having a transfer characteristic of the type specified by Equation 6 is shown in detail in FIG. 2, it being understood that other antiresonant circuits having a suitable transfer characteristic may be employed, if desired. Further, it is understood that the transfer characteristic required for the cancellation of other unwanted poles by antiresonant circuits 131-2, ..., 131-M may be obtained by substituting other poles $s_2$, $s_2^*$; ...; $s_M$, $s_M^*$ for the quantities $s_1$, $s_1^*$ in Equation 6.
  • the synthesized speech spectrum from synthesizer 12 is applied through a sufficiently high resistance R3 to pass a constant current through the series connected inductance, resistance and capacitance elements respectively denoted L1, R1, and C1, and to apply a constant voltage to cathode follower V1.
  • the output voltage of cathode follower V1 is differentiated by capacitance element C2 and resistance element R2, and the differentiated output signal is passed to the next antiresonant circuit.
  • C1 may be determined from the predetermined values of $\sigma$ and $\omega_1$ according to the following relations:
  • a source of a plurality of control signals including a pitch control signal, an amplitude control signal, and a group of formant control signals representative of the frequencies of selected low frequency formant peaks in the spectrum of an original speech wave
  • first synthesizing means responsive to said plurality of control signals for developing an artificial speech spectrum having a first group of peaks at frequencies represented by said formant control signals, and second synthesizing means for shaping said artificial speech spectrum to have a second group of peaks at selected fixed frequencies representative of high frequency speech formants, said second synthesizing means including a resonant circuit having a transfer characteristic with no zeros and an infinite number of poles at equally spaced predetermined frequencies, wherein the higher frequency poles of said transfer characteristic correspond in frequency to high frequency speech formants, and a plurality of series-connected antiresonant circuits preceding said resonant circuit, each of said antiresonant circuits having a transfer characteristic with no poles and a zero at a predetermined frequency corresponding to an unwanted pole in said transfer characteristic of said resonant circuit, and
  • apparatus for introducing additional peaks into said artificial speech spectrum at selected frequencies corresponding to selected high frequency formants which comprises a resonant circuit having a transfer characteristic with no zeros and an infinite number of poles at equally spaced fixed frequencies, wherein the higher frequency poles of said transfer characteristic correspond in frequency to selected high frequency speech formants, and a plurality of series-connected antiresonant circuits in preceding circuit relation with said resonant circuit for cancelling a corresponding plurality of unwanted poles in the transfer characteristic of said resonant circuit.
  • Apparatus for synthesizing an artificial spectrum having a plurality of peaks at selected high frequency locations so that said peaks in said artificial spectrum closely resemble the formant peaks at high frequency locations in the spectrum of an original speech wave which comprises means for developing an incoming artificial spectrum having peaks at selected low frequency locations corresponding to formant peaks at selected low frequency locations in said spectrum of said original speech wave,
  • a cathode follower provided with an input point and an output point
  • a resonant circuit including subtracting means provided with a minuend terminal
  • a delay element having a delay time of $\tau$ seconds and provided with an input terminal and an output terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Description

June 27, 1967 J. L. KELLY, JR. SPEECH SYNTHESIZER Filed Dec. 30, 1963 2 Sheets
United States Patent 3,328,525
SPEECH SYNTHESIZER
John L. Kelly, Jr., Berkeley Heights, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York
Filed Dec. 30, 1963, Ser. No. 334,354
4 Claims. (Cl. 179-1)
This invention relates to the synthesis of speech, and in particular to the synthesis of natural sounding speech in bandwidth compression systems.
Conventional speech communication systems, for example, commercial telephone systems, typically convey human speech by transmitting an electrical facsimile of the acoustic waveform produced by a human talker. Because of the redundancy of human speech, however, facsimile transmission is a relatively inefficient way to transmit speech information, and it is well known that the information contained in a typical speech sound may be transmitted over a channel of substantially narrower bandwidth than that required for facsimile transmission of the speech waveform.
A number of arrangements for compressing or reducing the amount of bandwidth employed in the transmission of speech information have been proposed, one of the best known of these arrangements being the so-called resonance vocoder. A specific version of the resonance vocoder is described in J. C. Steinberg Patent 2,635,146, issued Apr. 14, 1953.
The distinctive feature of resonance vocoder systems is the transmission of speech information in terms of narrow bandwidth control signals representative of the frequency locations of selected peaks or maxima in the speech amplitude spectrum which correspond to the principal formants or resonances of the human vocal tract. A typical resonance vocoder system includes at a transmitter station an analyzer for deriving from an incoming speech wave a group of narrow bandwidth control signals including formant control signals representative of the frequencies of selected formant peaks in the speech spectrum. After transmission to a receiver station, the control signals are applied to a synthesizer that is provided with controllable resonant circuits for shaping an artificial spectrum to have peaks at frequencies specified by the formant control signals, thereby reconstructing a replica of the spectrum of the original speech wave.
From the standpoint of efficiency, it is of course desirable to transmit the smallest number of control signals consistent with a desired level of intelligibility and naturalness in the reconstructed speech wave. It is well known that the first three formants in order of frequency contribute most to the intelligibility of speech; accordingly, it is common practice to transmit three formant control signals representative of the locations of these three principal formants. From the standpoint of speech quality, however, it has been found that higher frequency formants contribute significantly to the naturalness of reconstructed speech, but of course the transmission of additional formant control signals requires a greater transmission channel bandwidth, thereby decreasing the bandwidth efficiency of a resonance vocoder system.
One arrangement that improves the quality of reconstructed speech without decreasing bandwidth efficiency is described in an article by J. L. Flanagan and A. S. House, "Development and Testing of a Formant-Coding Speech Compression System," volume 28, Journal of the Acoustical Society of America, page 1099, (1957). In the Flanagan-House system, the three principal lower order formants are represented by control signals that are transmitted from an analyzer to a synthesizer, as in a conventional resonance vocoder, but in addition to shaping an artificial spectrum to have three peaks at frequencies specified by the three transmitted formant control signals, the synthesizer is provided with a separate fixed frequency resonant circuit that shapes the artificial spectrum to have a fourth peak at a fixed frequency corresponding to the average location of a fourth, high frequency formant. However, the human vocal tract is characterized by an infinite number of resonances or formants, and therefore the Flanagan-House arrangement does not specify completely the higher order speech formants.
The present invention improves the quality of speech reconstructed in a resonance vocoder without decreasing bandwidth efficiency by providing at a resonance vocoder receiver station a novel arrangement for shaping an artificial spectrum to have an infinite number of peaks at selected fixed frequencies corresponding to the locations of higher order speech formants. In the apparatus of this invention there is provided a novel resonant circuit having an infinite number of resonances at selected fixed frequencies, the higher frequency resonances of this resonant circuit corresponding to the frequencies of higher order speech formants. One or more of the lower frequency resonances of this circuit may lie within the frequency range of the formants represented by transmitted control signals, hence these lower frequency resonances are canceled or removed by separate antiresonant circuits having antiresonances at fixed frequencies corresponding to the unwanted lower frequency resonances of the resonant circuit. In the present invention an artificial spectrum is first shaped in conventional fashion by adjustable resonant circuits to have lower order formant peaks at frequencies specified by transmitted formant control signals, following which the artificial spectrum is further shaped by the apparatus of the present invention to have higher order formant peaks at fixed frequencies specified by the uncanceled resonances of the resonant circuit provided in this invention.
The invention will be fully understood from the following detailed description of illustrative embodiments thereof, taken in connection with the appended drawings, in which:
FIG. 1 is a block diagram showing a complete resonance vocoder system embodying the principles of this invention;
FIG. 2 is a circuit diagram showing in detail a specific embodiment of an antiresonant circuit employed in this invention; and
FIGS. 3A and 3B are diagrams of assistance in explaining the features of this invention.
Referring first to FIG. 1, elements 11 and 12 respectively represent the analyzer and synthesizer portions of a typical formant vocoder system. Formant vocoder analyzer 11, which is ordinarily located at a transmitter station, includes a pitch detector 111, a voiced amplitude detector 112, and a formant frequency detector 113, which respectively derive from a speech wave from source 10, for example, a conventional microphone, a group of narrow bandwidth control signals respectively representative of the fundamental glottal excitation frequency, F0, the amplitude of the glottal excitation, AV, and the frequencies F1, ..., FN, N = 2, 3, ..., of selected maxima in the spectrum of the incoming speech wave. The frequency locations of these maxima in the speech spectrum correspond to natural frequencies of the formants or normal modes of vibration of the human vocal tract, and as the shape of the vocal tract is deformed during the articulation of different speech sounds, the natural or formant frequencies and the corresponding locations of spectral maxima also change. It is generally accepted that the most important formants are the three having the lowest frequencies, and in most formant vocoders, the formant control signals derived by an analyzer represent the first three formants; that is, three is a suitable value for N in the apparatus of FIG. 1.
The control signals derived by analyzer 11 are delivered by way of a suitable transmission medium, indicated by broken lines, to a synthesizer 12 located at a receiver station. In synthesizer 12 there is provided a buzz source 121 that generates a relatively flat artificial amplitude spectrum comprising a plurality of relatively uniform amplitude harmonics of the fundamental frequency F0, and an amplitude modulator 122 that adjusts the uniform amplitude of the harmonics of the artificial spectrum from source 121 to represent the glottal excitation amplitude AV. Synthesizer 12 is further provided with a cascade of uncoupled resonant circuits 123-1 through 123-N, each of which has an adjustable resonance that is individually tuned by a corresponding one of the N formant control signals transmitted from analyzer 11. By successively passing the artificial spectrum of uniform amplitude harmonics from modulator 122 through circuits 123-1 through 123-N, the spectrum is shaped by the resonances of circuits 123-1 through 123-N to resemble the spectrum of the original speech wave from source 10. Thus the adjustable resonances of circuits 123-1 through 123-N shape the spectrum from modulator 122 to have N maxima at frequencies F1 through FN corresponding to the locations of the N formant peaks in the original speech spectrum which are represented by the N transmitted formant control signals.
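The signal flow through synthesizer 12 can be illustrated with a small discrete-time sketch. It is not the patent's analog circuitry: the sampling rate, the fixed 60 cycle per second bandwidth, the two-pole digital resonator, and all function names are assumptions chosen for illustration, and the control signals are held constant rather than varying with time.

```python
import math

def buzz_source(f0_hz, av, n_samples, fs_hz):
    """Impulse train at the pitch period: a flat harmonic spectrum scaled by AV."""
    period = max(1, round(fs_hz / f0_hz))
    return [av if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, f_hz, bw_hz, fs_hz):
    """Two-pole digital resonator with unity gain at zero frequency."""
    r = math.exp(-math.pi * bw_hz / fs_hz)             # pole radius from bandwidth
    b = 2.0 * r * math.cos(2.0 * math.pi * f_hz / fs_hz)
    c = -r * r
    a = 1.0 - b - c                                    # normalizes the gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y0 = a * sample + b * y1 + c * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out

def synthesize(f0_hz, av, formants_hz, n_samples, fs_hz=16000, bw_hz=60.0):
    """Buzz source, amplitude scaling, then one resonator per formant in cascade."""
    signal = buzz_source(f0_hz, av, n_samples, fs_hz)
    for f in formants_hz:
        signal = resonator(signal, f, bw_hz, fs_hz)
    return signal

wave = synthesize(f0_hz=120.0, av=1.0,
                  formants_hz=(500.0, 1500.0, 2500.0), n_samples=1600)
print(len(wave), max(abs(v) for v in wave))
```

Because the resonators are uncoupled and connected in cascade, the overall spectrum is the product of the individual resonances, which is why only N peaks appear when N formant control signals are supplied.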
It is evident that the synthesized spectrum developed at the output terminal of synthesizer 12 is limited in its resemblance to the original speech spectrum in that the number of peaks in the synthesized spectrum is determined by the number of transmitted formant control signals. Thus in the usual situation where three formant control signals representative of the three principal speech formants are transmitted, the synthesized spectrum developed by synthesizer 12 has only three maxima. A closer resemblance to the original speech spectrum is obtained in the present invention by passing the synthesized spectrum from synthesizer 12 through higher order formant synthesizer 13 in order to shape further the synthesized spectrum to have additional maxima at frequencies corresponding to the locations of maxima in the original speech spectrum which are not represented by transmitted control signals.
Within synthesizer 13, the synthesized spectrum from synthesizer 12 is passed through M series-connected antiresonant circuits 131-1 through 131-M to resonant circuit 133, which comprises a delay line 133a of length $\tau$ seconds in negative feedback relation through subtractor 132 with an amplifier 133b having a gain $e^{-\sigma\tau}$ less than unity. As described in detail below, resonant circuit 133 theoretically has an infinite number of resonances at fixed frequencies dependent upon the length $\tau$ of delay line 133a. The fixed resonances of feedback circuit 133 shape the incoming synthesized spectrum from synthesizer 12 to have maxima which correspond to desired higher order formant locations not specified by the transmitted formant control signals. Since feedback circuit 133 has fixed resonances at low frequencies as well as at high frequencies, and since the synthesized spectrum from synthesizer 12 has already been shaped to have N maxima corresponding to N low frequency formants, one or more of the low frequency resonances of feedback circuit 133 must be canceled. Cancellation of M selected low frequency resonances of feedback circuit 133 is accomplished by the M antiresonant circuits 131-1 through 131-M, M = 1, 2, ..., a detailed illustration of a suitable antiresonant circuit being shown in FIG. 2 and explained below. After the synthesized spectrum from synthesizer 12 has been further shaped by synthesizer 13, the spectrum may be converted into intelligible speech sounds by a suitable transducer 14, for example, a conventional loudspeaker.
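The feedback arrangement of resonant circuit 133 has a simple discrete-time analogue, sketched here on the assumption that the delay line and amplifier sit in the feedback path and the forward path has unity gain. The function name, the delay of 16 samples, and the gain of 2/3 are illustrative choices, not values fixed by this passage.

```python
def feedback_resonator(x, delay_samples, gain):
    """y[n] = x[n] - gain * y[n - delay_samples]: delayed, attenuated negative feedback."""
    if delay_samples < 1 or not 0.0 < gain < 1.0:
        raise ValueError("need a delay of at least one sample and a gain between 0 and 1")
    y = []
    for n, sample in enumerate(x):
        # The subtractor takes the input as minuend and the fed-back signal as subtrahend.
        fed_back = gain * y[n - delay_samples] if n >= delay_samples else 0.0
        y.append(sample - fed_back)
    return y

impulse = [1.0] + [0.0] * 47
response = feedback_resonator(impulse, delay_samples=16, gain=2.0 / 3.0)
print([round(response[k], 3) for k in (0, 16, 32)])
```

The impulse response is a train of echoes with alternating sign and geometrically decaying size (1, -2/3, 4/9, ...). The sign alternation is what places the resonances at odd multiples of half the reciprocal delay, and a gain below unity keeps the loop stable.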
Before describing the theoretical considerations underlying the construction of resonant circuit 133, it will be helpful at this point to describe the properties of the human vocal tract during vowel production in terms of Laplace transform notation. The ratio of the Laplace transform, $U_2(s)$, of the volume velocity of air through the lips, $U_2(t)$, to the Laplace transform, $U_1(s)$, of the volume velocity of air through the glottis, $U_1(t)$, this ratio being commonly known as the transfer characteristic of the vocal tract, can be approximated by a rational transfer function having the following form:
[Equation 1]
where $s$ is the complex frequency variable; $s_k = (\sigma_k + j\omega_k)$ is a complex number representing a formant or normal mode of vibration of the vocal tract; and $s_k^*$ is the complex conjugate of $s_k$. Equation 1 indicates that the glottis-to-mouth transfer function for vowel sounds has only poles, denoted $s_k$, and no zeros, and that the poles coincide with the normal modes of vibration. The locations of these poles are illustrated graphically in the pole diagram of FIG. 3A, where the Xs indicate the locations in the s-plane of the complex numbers $s_1$, $s_2$, $s_3$, representing the first three poles or formants, and the dashed lines indicate higher frequency poles.
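Equation 1 is not legible in this text. A form consistent with the description (conjugate pole pairs $s_k$, $s_k^*$ and no zeros) is the all-pole product often used in the acoustic theory of speech production. The normalization to unity at zero frequency is an assumption made here, not something the text confirms:

$$\frac{U_2(s)}{U_1(s)} \approx \prod_{k=1}^{\infty} \frac{s_k s_k^*}{(s - s_k)(s - s_k^*)}$$

Each factor contributes one formant peak near $\omega_k$, with a bandwidth set by the real part of $s_k$.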
Since the vocal tract is a distributed acoustic system, it has in theory an infinite number of natural frequencies which change in value with time as the vocal tract is deformed during the articulation of different speech sounds, and correspondingly, the poles of the transfer characteristic in Equation 1 also change with time. However, only at relatively low frequencies, usually including no more than the first three natural frequencies, do the natural frequencies change substantially in value with deformations of the vocal tract, whereas at high frequencies, the natural frequencies asymptotically approach a uniform spacing in frequency.
Synthesizer 13 of the present invention is provided with an infinite number of natural resonant frequencies having a uniform spacing in frequency corresponding t0 that of the higher order natural frequencies of the Vocal tract by constructing resonant circuit 133 and :antiresonant circuits 131-1 through 131-N in the following manner. As shown in FIG, 1, resonant circuit-133 comprises delay line 133a of length r in feedback relation through subtractor 132 with amplifier 13317 having gain fr", where a suitable value for e-ff may be on the order of 2/ 3. In Laplace transform notation, the transfer characteristic of delay line 133a is ers?, hence the combined f transfer characteristic of delay line 13'3a and amplifier 133b is the product.
Since the elements 133a and 133b are in negative feedback relation with each other through subtractor 132, the product in Equation 2 corresponds to the familiar in the transfer characteristic relating the incoming signal F1 applied to the minuend terminal of subtractor 132 to the outgoing signal F2 developed at the output terminal of circuit 133,
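and since $e^{j\pi} = -1$ and the exponential is periodic, the positive frequency poles of $F_2/F_1$ occur where $\tau(s+\sigma)$ is an odd integral multiple of $j\pi$,

$$\tau(s_n + \sigma) = j\,n\pi, \qquad s_n = -\sigma + j\,\frac{n\pi}{\tau}, \qquad (n = 1, 3, \ldots) \tag{5a}$$

From Equation 5a it is evident that $F_2/F_1$ has an infinite number of poles, the radian frequency of the nth pole being $\omega_n = n\pi/\tau$, or, since $\omega_n = 2\pi f_n$, where $f_n$ denotes frequency in cycles per second, $f_n = n/(2\tau)$.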
The spacing in frequency between poles is uniform, being $2\pi/\tau$ in radians per second and $1/\tau$ in cycles per second. Further, each pole has the same constant real part, $-\sigma$, which represents the so-called formant damping, and which is manifested in the speech spectrum by the bandwidth of the formant peaks.
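The pole locations described above can be checked numerically. The following is a minimal sketch, assuming the poles are the zeros of $1 + e^{-(s+\sigma)\tau}$ (the loop denominator implied by the negative feedback connection) and using illustrative values of one millisecond for $\tau$ and 400 nepers per second for $\sigma$. The function names are hypothetical and are not taken from the patent.

```python
import cmath
import math


def loop_denominator(s, tau, sigma):
    """Return 1 + e^{-(s + sigma) * tau}; the feedback loop has a pole where this is zero."""
    return 1 + cmath.exp(-(s + sigma) * tau)


def pole(n, tau, sigma):
    """Positive frequency pole s_n = -sigma + j * n * pi / tau for odd n."""
    return complex(-sigma, n * math.pi / tau)


TAU = 1e-3      # delay line length in seconds (illustrative value)
SIGMA = 400.0   # damping in nepers per second (illustrative value)

poles = [pole(n, TAU, SIGMA) for n in (1, 3, 5, 7)]

# Each residual should be numerically zero if s_n really is a pole.
residuals = [abs(loop_denominator(p, TAU, SIGMA)) for p in poles]

# Spacing between adjacent poles in cycles per second; expected to equal 1 / TAU.
spacings_hz = [(b.imag - a.imag) / (2 * math.pi) for a, b in zip(poles, poles[1:])]

print(residuals)
print(spacings_hz)
```

Every pole shares the real part `-SIGMA`, which is why all of the fixed formant peaks have the same bandwidth.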
FIG. 3B illustrates graphically the locations of the poles of $F_2/F_1$, in which it is noted that a particular uniform spacing in frequency may be obtained by suitably choosing the length $\tau$ of delay line 133a and the corresponding factor $\sigma$ in the gain of amplifier 133b. A suitable value for the length $\tau$ of delay line 133a may be on the order of one millisecond, which corresponds to the round trip delay of the human vocal tract, thereby placing the poles of feedback circuit 133 at frequencies of approximately 500, 1500, 2500 cycles per second. Similarly, a suitable value for the formant bandwidth factor $\sigma$ in the gain of amplifier 133b may be on the order of 400 nepers per second, corresponding to a formant bandwidth of about 130 cycles per second.
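The behaviour of the delay line feedback loop with these suggested values can be illustrated in discrete time. This is only a sketch under stated assumptions: the 16 kHz sample rate is hypothetical, the delay line is modelled as a whole number of samples, the output is taken at the subtractor, and the bandwidth is computed with the usual half-power relation $\sigma/\pi$, which is assumed here to be how the figure of about 130 cycles per second was obtained.

```python
import cmath
import math


def feedback_delay_response(x, delay_samples, gain):
    """Simulate y[n] = x[n] - gain * y[n - delay_samples] (negative feedback through a delay)."""
    y = []
    for n, xn in enumerate(x):
        fed_back = gain * y[n - delay_samples] if n >= delay_samples else 0.0
        y.append(xn - fed_back)
    return y


def magnitude_at(h, freq_hz, sample_rate):
    """Magnitude of the Fourier sum of the (truncated) impulse response h at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h)))


SAMPLE_RATE = 16000                   # hypothetical sample rate, not from the patent
TAU = 1e-3                            # delay line length suggested in the text
SIGMA = 400.0                         # damping suggested in the text
DELAY = round(TAU * SAMPLE_RATE)      # delay expressed in samples
GAIN = math.exp(-SIGMA * TAU)         # amplifier gain e^{-sigma * tau}, close to 2/3
BANDWIDTH_HZ = SIGMA / math.pi        # assumed half-power bandwidth relation

# Impulse response: echoes of alternating sign every DELAY samples, decaying by GAIN.
impulse = [1.0] + [0.0] * (10 * DELAY)
h = feedback_delay_response(impulse, DELAY, GAIN)

for f in (500, 1000, 1500, 2000, 2500):
    print(f, magnitude_at(h, f, SAMPLE_RATE))
```

The printed magnitudes are large at 500, 1500, and 2500 cycles per second and small midway between them, which matches the pole frequencies given in the text.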
Depending upon the value selected for the length of delay line 133a, one or more of the fixed poles or resonances of resonant circuit 133 may occur within the frequency range of formants represented by the control signals transmitted from analyzer 11 to synthesizer 12. For example, by selecting the length of delay line 133a to be one millisecond, the first three fixed poles of circuit 133 occur at 500, 1500, and 2500 cycles per second, which respectively lie within the frequency ranges of the first three speech formants typically represented by transmitted control signals. In this situation it is therefore necessary to remove or cancel the three lower-order poles of resonant circuit 133 which lie within the frequency range of the formants represented by transmitted control signals in order to prevent interference with the maxima previously synthesized in the artificial spectrum from synthesizer 12. FIG. 3B illustrates the situation in which the first three poles of circuit 133 are to be canceled, as indicated by the three X's enclosed in circles.
Antiresonant circuits 131-1 through 131-M, which precede feedback circuit 133, are designed to cancel M = 1, 2, ... unwanted lower order poles of feedback circuit 133 in the following manner. In order for a particular antiresonant circuit, say 131-1, to cancel a corresponding one of the poles of resonant circuit 133, say the pole denoted $s_1 = (-\sigma + j\omega_1)$, it is necessary for the transfer characteristic of circuit 131-1 to have a single zero at $s_1 = (-\sigma + j\omega_1)$ and no other poles or zeros. That is, if $Z_1$ denotes the transfer characteristic of circuit 131-1, then $Z_1$ must be proportional to $(s - s_1)(s - s_1^*)$,

$$Z_1 = A\,(s - s_1)(s - s_1^*), \tag{6}$$
where A is a constant. A suitable realization of a circuit having a transfer characteristic of the type specified by Equation 6 is shown in detail in FIG. 2, it being understood that other antiresonant circuits having a suitable transfer characteristic may be employed, if desired. Further, it is understood that the transfer characteristic required for the cancellation of other unwanted poles by antiresonant circuits 131-2, ..., 131-M may be obtained by substituting other poles $s_2$, $s_2^*$; ...; $s_M$, $s_M^*$ for the quantities $s_1$, $s_1^*$ in Equation 6.
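The cancellation can be sketched numerically by multiplying the resonator characteristic by one quadratic factor per unwanted pole. The code below is an illustration only. It assumes the resonator output is taken at the subtractor, so that its characteristic is $1/(1 + e^{-(s+\sigma)\tau})$, and it picks the constant A so that each antiresonant factor has unity gain at zero frequency. Both choices, and all function names, are assumptions rather than details from the patent.

```python
import cmath
import math

TAU = 1e-3      # delay line length in seconds (value suggested in the text)
SIGMA = 400.0   # damping in nepers per second (value suggested in the text)


def resonator(s):
    """Assumed characteristic of the feedback loop, with poles at -SIGMA + j*n*pi/TAU, n odd."""
    return 1 / (1 + cmath.exp(-(s + SIGMA) * TAU))


def antiresonator(s, omega_k):
    """A * (s - s_k)(s - s_k*), with A chosen (as an assumption) to give unity gain at s = 0."""
    s_k = complex(-SIGMA, omega_k)
    return (s - s_k) * (s - s_k.conjugate()) / abs(s_k) ** 2


def cascade(s, cancelled=(1, 3, 5)):
    """Antiresonant factors for the listed odd pole indices followed by the resonator."""
    out = resonator(s)
    for n in cancelled:
        out *= antiresonator(s, n * math.pi / TAU)
    return out


def on_axis(freq_hz):
    """Point s = j * 2 * pi * f on the frequency axis."""
    return complex(0.0, 2 * math.pi * freq_hz)


# Close to the first pole the resonator alone is very large, the cascade is not.
near_first_pole = complex(-SIGMA, math.pi / TAU) + 1e-3
print(abs(resonator(near_first_pole)), abs(cascade(near_first_pole)))

# The fourth pole (3500 cycles per second) is left in place and still produces a peak.
print(abs(cascade(on_axis(3000))), abs(cascade(on_axis(3500))))
```

With the first three poles cancelled, the cascade stays finite near 500, 1500, and 2500 cycles per second while the peak near 3500 cycles per second remains.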
Turning now to FIG. 2, the synthesized speech spectrum from synthesizer 12 is applied through a sufficiently high resistance R3 to pass a constant current through the series-connected inductance, resistance and capacitance elements respectively denoted L1, R1, and C1, and to apply a constant voltage to cathode follower V1. The output voltage of cathode follower V1 is differentiated by capacitance element C2 and resistance element R2, and the differentiated output signal is passed to the next antiresonant circuit.
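The impedance of elements $L_1$, $R_1$, $C_1$, in Laplace transform notation, may be written

$$L_1 s + R_1 + \frac{1}{C_1 s} = \frac{L_1}{s}\left(s^2 + \frac{R_1}{L_1}\,s + \frac{1}{L_1 C_1}\right),$$

and differentiating elements $R_2$ and $C_2$ change this impedance by a multiplicative factor s, so that the transfer characteristic of circuit 131-1 may be written

$$Z_1 = A\left(s^2 + \frac{R_1}{L_1}\,s + \frac{1}{L_1 C_1}\right). \tag{8}$$

From Equation 8 it is evident that the values of the inductance, resistance, and capacitance elements $L_1$, $R_1$, and $C_1$ may be determined from the predetermined values of $\sigma$ and $\omega_1$ according to the following relations:

$$\frac{R_1}{L_1} = 2\sigma, \qquad \frac{1}{L_1 C_1} = \sigma^2 + \omega_1^2.$$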
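As an illustration of how component values might follow from $\sigma$ and $\omega_1$, the sketch below matches $s$ times the series impedance $L_1 s + R_1 + 1/(C_1 s)$ against a quadratic with zeros at $-\sigma \pm j\omega_1$. That matching gives two conditions on three component values, so one value is free. Here the inductance is chosen arbitrarily as 1 henry, which is a hypothetical value, as are the function names.

```python
import math


def antiresonant_components(sigma, omega, inductance):
    """Return (R, C) so that L*s + R + 1/(C*s) is zero at s = -sigma +/- j*omega.

    Matching s^2 + (R/L)*s + 1/(L*C) with s^2 + 2*sigma*s + (sigma^2 + omega^2)
    gives R = 2*sigma*L and C = 1 / (L * (sigma^2 + omega^2)).
    """
    resistance = 2 * sigma * inductance
    capacitance = 1 / (inductance * (sigma ** 2 + omega ** 2))
    return resistance, capacitance


def series_impedance(s, inductance, resistance, capacitance):
    """Impedance of the series L, R, C branch at complex frequency s."""
    return inductance * s + resistance + 1 / (capacitance * s)


SIGMA = 400.0                    # nepers per second (value suggested in the text)
OMEGA_1 = 2 * math.pi * 500.0    # first unwanted pole, 500 cycles per second
L1 = 1.0                         # henries; arbitrary illustrative choice

R1, C1 = antiresonant_components(SIGMA, OMEGA_1, L1)
print(R1, C1)

# The branch impedance should vanish at the pole that is to be cancelled.
s1 = complex(-SIGMA, OMEGA_1)
print(abs(series_impedance(s1, L1, R1, C1)))
```

A different inductance simply scales the resistance up and the capacitance down in proportion, leaving the zero location unchanged.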
Although this invention has been described in terms of a resonance vocoder system of the type shown in FIG. 1, it is to be understood that applications of the principles of this invention are not limited to this particular system, but include other resonance vocoder systems as well as various kinds of speech processing equipment in which speech formants are synthesized. In addition, it is to be understood that the above-described embodiments are merely illustrative of the numerous arrangements that may be devised for the principles of this invention by those skilled in the art without departing from the spirit and scope of the invention.
What is claimed is:
1. In a resonance vocoder synthesizer,
a source of a plurality of control signals including a pitch control signal, an amplitude control signal, and a group of formant control signals representative of the frequencies of selected low frequency formant peaks in the spectrum of an original speech wave,
first synthesizing means responsive to said plurality of control signals for developing an artificial speech spectrum having a first group of peaks at frequencies represented by said formant control signals, and second synthesizing means for shaping said artificial speech spectrum to have a second group of peaks at selected fixed frequencies representative of high frequency speech formants, said second synthesizing means including a resonant circuit having a transfer characteristic with no zeroes and an infinite number of poles at equally spaced predetermined frequencies, wherein the higher frequency poles of said transfer characteristic correspond in frequency to high frequency speech formants, and a plurality of series-connected antiresonant circuits preceding said resonant circuit, each of said antiresonant circuits having a transfer characteristic with no poles and a zero at a predetermined frequency corresponding to an unwanted pole in said transfer characteristic of said resonant circuit, and
means for applying said artificial speech spectrum to said second synthesizing means.
2. In combination with a resonance vocoder synthesizer that generates an artificial speech spectrum having peaks at selected low frequencies corresponding to selected low frequency formant peaks in the spectrum of an original speech wave, apparatus for introducing additional peaks into said artificial speech spectrum at selected frequencies corresponding to selected high frequency formants which comprises a resonant circuit having a transfer characteristic with no zeros and an infinite number of poles at equally spaced fixed frequencies, wherein the higher frequency poles of said transfer characteristic correspond in frequency to selected high frequency speech formants, and a plurality of series-connected antiresonant circuits in preceding circuit relation with said resonant circuit for cancelling a corresponding plurality of unwanted poles in the transfer characteristic of said resonant circuit.
3. Apparatus for synthesizing a plurality of peaks of predetermined width at selected fixed frequencies in an incoming amplitude spectrum which comprises a plurality of series-connected antiresonant circuits each provided with a transfer characteristic having a single zero at $s_n = -\sigma + j\,n\pi/\tau$, (n = 1, 3, ...),
means for applying said incoming amplitude spectrum to said input point of said plurality of antiresonant circuits, and
means for connecting the output point of said plurality of antiresonant circuits to said resonant circuit.
4. Apparatus for synthesizing an artificial spectrum having a plurality of peaks at selected high frequency locations so that said peaks in said artificial spectrum closely resemble the formant peaks at high frequency locations in the spectrum of an original speech wave, which comprises means for developing an incoming artificial spectrum having peaks at selected low frequency locations corresponding to formant peaks at selected low frequency locations in said spectrum of said original speech wave,
a plurality M (M = 1, 2, ...) of series-connected antiresonant circuits for preventing the occurrence of peaks at a corresponding plurality of unwanted high frequency locations, wherein the nth of said antiresonant circuits, n = 1, 2, ..., M, comprises an input terminal,
a cathode follower provided with an input point and an output point,
a first resistance element connected between said input terminal and said input point of said cathode follower,
an inductance element $L_n$, a second resistance element $R_n$, and a first capacitance element $C_n$ connected in series between said input point of said cathode follower and ground,
an output terminal,
a second capacitance element connected between said output point of said cathode follower and said output terminal, and
a third resistance element connected between said output terminal and ground,
wherein the values of said inductance element $L_n$, said second resistance element $R_n$, and said first capacitance element $C_n$ are determined by the bandwidth $\sigma$ and the frequency $\omega_n$ of said unwanted peak at said high frequency location according to the following relationships:
means for applying said incoming artificial spectrum to said input terminal of the first of said antiresonant circuits, n = 1,
a resonant circuit including subtracting means provided with a minuend terminal,
a subtrahend terminal and an output terminal,
a delay element having a delay time of $\tau$ seconds and provided with an input terminal and an output terminal,
an amplifier having a gain $e^{-\sigma\tau}$ so that said resonant circuit has an infinite number of resonances at frequencies $\omega_i$, said amplifier being provided with an input terminal and an output terminal,
means for connecting said output terminal of the last of said antiresonant circuits, n=M, to said minuend terminal of said subtracting means,
means for connecting said output terminal of said subtracting means to said input terminal of said delay element,
means for connecting said output terminal of said delay element to said input terminal of said amplifier, and
means for connecting said output terminal of said amplifier to said subtrahend terminal of said subtracting means,
thereby developing from said incoming artificial spectrum an outgoing artificial spectrum at said output terminal of said delay element, wherein said outgoing artificial spectrum has peaks at selected low frequency locations corresponding to said peaks of said incoming artificial spectrum and peaks at selected high frequency locations corresponding to resonances of said resonant circuit at frequencies $\omega_i$, for i greater than M.
No references cited.
KATHLEEN H. CLAFFY, Primary Examiner.
R. MURRAY, Assistant Examiner.

US334354A 1963-12-30 1963-12-30 Speech synthesizer Expired - Lifetime US3328525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US334354A US3328525A (en) 1963-12-30 1963-12-30 Speech synthesizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US334354A US3328525A (en) 1963-12-30 1963-12-30 Speech synthesizer

Publications (1)

Publication Number Publication Date
US3328525A true US3328525A (en) 1967-06-27

Family

ID=23306845

Family Applications (1)

Application Number Title Priority Date Filing Date
US334354A Expired - Lifetime US3328525A (en) 1963-12-30 1963-12-30 Speech synthesizer

Country Status (1)

Country Link
US (1) US3328525A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3448216A (en) * 1966-08-03 1969-06-03 Bell Telephone Labor Inc Vocoder system
US3624302A (en) * 1969-10-29 1971-11-30 Bell Telephone Labor Inc Speech analysis and synthesis by the use of the linear prediction of a speech wave
US3649765A (en) * 1969-10-29 1972-03-14 Bell Telephone Labor Inc Speech analyzer-synthesizer system employing improved formant extractor
US3836717A (en) * 1971-03-01 1974-09-17 Scitronix Corp Speech synthesizer responsive to a digital command input

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Similar Documents

Publication Publication Date Title
US3649765A (en) Speech analyzer-synthesizer system employing improved formant extractor
EP0154381B1 (en) Digital speech coder with baseband residual coding
US2817711A (en) Band compression system
US4586193A (en) Formant-based speech synthesizer
US3328525A (en) Speech synthesizer
US3431362A (en) Voice-excited,bandwidth reduction system employing pitch frequency pulses generated by unencoded baseband signal
US2458227A (en) Device for artificially generating speech sounds by electrical means
US2766325A (en) Narrow band communication system
Flanagan Band width and channel capacity necessary to transmit the formant information of speech
US3394228A (en) Apparatus for spectral scaling of speech
US4459674A (en) Voice input/output apparatus
US3268660A (en) Synthesis of artificial speech
US3405237A (en) Apparatus for determining the periodicity and aperiodicity of a complex wave
US3139487A (en) Bandwidth reduction system
US2395159A (en) Electrical compressor method and system
US3381093A (en) Speech coding using axis-crossing and amplitude signals
Golden Improving Naturalness and Intelligibility of Helium‐Oxygen Speech, Using Vocoder Techniques
US3499991A (en) Voice-excited vocoder
US3124654A (en) Transmitter
CA2037326A1 (en) Communication apparatus for speech signal
US3330910A (en) Formant analysis and speech reconstruction
Campanella A survey of speech bandwidth compression techniques
US3280266A (en) Synthesis of artificial speech
Holmes A survey of methods for digitally encoding speech signals
US3328526A (en) Speech privacy system