US2928901A - Transmission and reconstruction of artificial speech - Google Patents

Publication number
US2928901A
Authority
US
United States
Prior art keywords
wave
speech
damped
signals
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US578097A
Inventor
Bruce P Bogert
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
Bell Telephone Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bell Telephone Laboratories Inc
Priority to US578097A
Application granted
Publication of US2928901A
Anticipated expiration
Expired - Lifetime

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/66 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00

Definitions

  • This invention relates in general to the modification of signals to facilitate their transmission, and particularly to the compression of the frequency band occupied by such signals. Its principal specific object is to compress the band of frequencies occupied by a telephone message wave. A more general object is to apply new principles to the band compression or other modification of a message wave.
  • One well known approach to the band compression problem is to divide the entire frequency band occupied by a complex message wave, e.g., a speech wave, into a number of constituent subbands, to determine the energy in each such subband, and to derive, for each subband, a control signal whose magnitude represents the subband energy.
  • The analysis is performed with a bank of filters, to all of which the speech wave is applied in parallel, while a rectifier connected to the output terminal of each filter delivers a signal representative of the energy passing through the filter.
  • The resulting low frequency control signals, after transmission to a receiver station, control the synthesis of artificial speech.
  • The analysis carried out by the apparatus of the Dudley patent is essentially an analysis according to Fourier's theorem: it postulates that the speech wave may be represented, to whatever degree of precision may be required, by a harmonic series of components, each of which, by itself, is a pure sinusoid, and thus orthogonal to each of the others. To the extent that this postulate is untrue of the speech or other complex wave undergoing analysis, to that extent the control signals derived by the apparatus fail accurately to represent the original speech wave, and the final synthetic speech fails to duplicate it.
  • The quality of a voiced sound stems from the energization of the damped resonant cavities of the vocal tract (throat, mouth, nasal cavity, etc.) by the successive puffs of air which are driven from the lungs through the vocal cords and which recur at the frequency to which the cords are tuned: namely, the fundamental or pitch frequency of the voice wave.
  • A principal feature of the voice wave is that, while it oscillates at frequencies determined by the tuning of the cavities, the oscillations are rapidly damped out, so that its greatest excursion within each fundamental period occurs early in the period, later excursions being generally of less magnitude.
  • The present invention attacks the problem partly in the time domain and partly in the frequency domain. Like the Dudley patent, it determines, by analysis, the contributions of each of a plurality of oscillatory component building blocks to the speech wave as a whole; but unlike Dudley, it does not take undamped sinusoids as these building blocks. Rather, it adopts the more realistic position that each building block is, and must be, a damped oscillation: a wave whose frequency is the natural frequency of one of the component cavities of the vocal tract, and whose decrement is likewise determined by the physiological structure of the tract.
  • Like the Bogert-Kock application, it takes a sequence of samples of the voice wave, in synchronism with the fundamental period; but unlike that application, it does not transmit these samples, directly or in code, to the receiver station. Rather, it employs them to determine the proportions in which each of the basic individual damped oscillations is present in the speech wave as a whole. This determination is in truth a computation, and it is carried out by a cross net or matrix of transfer elements, whose output conductors carry control signals which are severally and continuously representative of the extent to which the several basic damped oscillations are momentarily present in the speech.
  • The several transfer elements of the computing cross net should be proportional to the correspondingly numbered elements of a matrix N, which is the inverse of another matrix M, whose elements, in turn, are proportional to the several basic damped oscillation components as evaluated at the several sampling instants.
  • These control signals are transmitted, along with a pitch control signal, to a receiver station.
  • A train of energy pulses is held, by the pitch signal, in synchronism with the fundamental pitch of the speech wave.
  • These pulses are modulated in amplitude by the transmitted control signals and, as modulated, are applied as shocks to excite natural oscillations in each of a plurality of damped resonant circuits whose natural frequencies and decrements are the same as those of the postulated component building blocks.
  • The responses of these several tuned circuits to the shock pulses are gathered together and supplied, as artificially synthesized speech, to a common reproducer.
  • The signal transformation may be carried out at the receiver station, or it may be carried out in two steps, one of which takes place at the transmitter station and the other at the receiver station.
  • Each of these partial transformations may be effected by a cross net of input and output conductors having a transfer element at each cross point.
  • The elements of the first cross net may be proportional to the elements of a matrix A and those of the second cross net to the elements of a matrix B, wherein the matrices A and B are such that their product is equal to the matrix N.
  • Fig. 1 is a block schematic diagram showing narrow band transmission apparatus in accordance with the invention
  • Fig. 2 is a block schematic diagram showing receiver apparatus for the synthesis of speech from the signals delivered by the apparatus of Fig. 1;
  • Fig. 3 is a waveform diagram of assistance in the exposition of the invention.
  • Fig. 4 is a block schematic diagram illustrating a modification of the invention.
  • A speech wave, which may be derived through a vogad 2 from a source such as a microphone 1, is passed by way of a conductor into three parallel branches.
  • The upper branch 4, which comprises a band-pass filter 5, a rectifier 6, a low-pass filter 7 and the winding of a relay 8 connected in tandem, serves as a voiced sound recognizer.
  • If the bandwidth of the filter 5 be selected to embrace the principal components of a voiced sound, e.g., if the filter be proportioned to pass frequencies in the range 100 to 1,000 cycles per second, and if the low-pass filter 7 be adjusted to pass only syllabic frequencies, the relay 8 is operated by voiced sounds and remains unoperated when the sounds picked up by the microphone 1 are unvoiced. Closure of the contacts of the relay 8 by voiced sounds operates to establish a path for a pitch control signal, derived in the second path 11, through a low-pass filter 9, to a transmission channel 10.
  • The second path 11 comprises a period marker signal generator 12 which may advantageously be of the type which forms the subject matter of E. Peterson Patent 2,593,694 and which is further described by O. O. Gruenz and L. O. Schott in an article published in the Journal of the Acoustical Society of America for September 1949 (volume 21), page 487.
  • The principal feature of this generator 12 is a detector 13 followed in tandem by a shaping network 14 which accentuates the amplitudes of low frequency components at the expense of higher harmonic component amplitudes.
  • Each of these steps is carried out two or more times in succession, and all of them may be preceded by an auxiliary shaping step, as by a network 15.
  • With this arrangement, the output of the generator 12 comprises a single sharp spike of current which occurs at the instant of the major peak of each wave period, or between the principal zero and the major peak.
  • The principal zero is the last zero value or axis crossing of the speech wave preceding each of its major peaks.
  • This marker pulse thus indicates the instant of inception of each full period of the speech wave, and the frequency at which such pulses are repeated is the fundamental pitch frequency of the speech wave. It is transmitted, as stated above, by way of the contacts of the relay 8 and the low-pass filter 9 to a transmission channel for use at a receiver station as described below.
  • The same marker pulse is also transmitted by way of a conductor to the input terminal 21 of a wave transmission device 22, such as an electromagnetic or acoustic delay line, which is terminated at its far end, as by a resistor 24, to prevent reflections.
  • This line 22 is provided with a number n of lateral taps 23. It operates to produce a replica of the input marker pulse on these several taps in succession, each at an instant determined by the location of the tap.
  • sampling gates 25 operate in well known fashion to derive brief samples of the instantaneous amplitudes of the speech wave, each at an instant determined by the location on the delay line 22 of the tap 23 which controls it.
  • the output terminals of the several sampling gates are connected to the input terminals of a group of phase splitter amplifiers 28 each of which is provided with a positive output conductor 29 and a negative output conductor 30.
  • Amplifiers of this character are well known which deliver on one conductor a positive signal and on the other conductor a negative signal each of which is proportional to the signal applied to its input terminal.
  • The positive and negative output conductors 29, 30 of the first phase splitter amplifier 28 are bridged by a group of m resistors r, each of which is provided with an adjustable tap. Similar resistors r bridge the positive and negative conductors of each of the amplifiers 28a, 28b, etc., of the group. There may be a number m, such as ten, of such resistors for each amplifier. Thus if the delay line 22 is provided with ten taps 23, and each of these taps has associated with it a gate, a phase splitter amplifier and a pair, positive and negative, of vertical conductors, the result will be a square array of one hundred such tapped resistors.
  • The taps of the first resistors r of all of the vertical conductor pairs are connected together and to a first horizontal conductor 33.
  • The taps of the second resistors r of all of the vertical conductor pairs are connected to a second horizontal conductor 34, and so on, the taps of the last resistors r of all of the vertical conductor pairs being connected to a last horizontal conductor 42.
  • The first horizontal conductor 33 carries a signal which is a linear combination of the signals on all of the vertical conductor pairs 29, 30, due to the application of samples of the voice wave to the amplifiers 28 in succession. Each term of this linear combination is determined by the setting of the tap on one of the resistors.
  • The second horizontal conductor 34 carries a signal which is a linear combination of the outputs of all of the amplifiers 28 due to the application to these amplifiers of samples of the voice wave in succession. So, too, for each of the other horizontal conductors, the last of which, 42, is shown.
  • The signals on the m horizontal conductors 33 ... 42 are now smoothed by the interposition of a low-pass filter 44, 45 in tandem with each one, and the resulting low frequency control signals, m in number, are transmitted over channels 51 ... 60 of any desired sort to a receiver station.
  • Each of these control signals is applied by way of a conductor 51, etc., to the control terminal of one of a like number of modulators 61, etc.
  • the conduction terminals of these several modulators are supplied, by way of a conductor 71, with a sequence of sharp pulses whose recurrence rate is equal to the fundamental pitch of the voice wave.
  • These pulses may be derived in well known fashion from a pulse source 72 or 73 through a differentiator 74, a rectifier 75, and the contacts of a relay 76.
  • The transmitted pitch control signal maintains steady control of the frequency of oscillation of the source 72 in order that it shall continuously be substantially the same as the fundamental frequency of voiced sounds as determined by the period marker signal generator 12.
  • When the pitch signal fails, the coil of the relay 76 is deenergized and the noise source 73 is connected through the back contacts of the relay 76, the differentiator 74 and the rectifier 75 to the conductor 71.
  • The outputs of these several modulators 61 ... 70 thus comprise sharp pulses, recurring at the fundamental pitch rate and with various amplitudes as determined by the application of the control signals on the conductors 51 to the modulators.
  • The pulse output of the first modulator 61, with its amplitude thus determined, is applied by way of a padding resistor 81 to a damped resonant circuit 91 which may comprise an inductance coil, a condenser and a resistor all connected in parallel.
  • The output pulses of the second modulator 62, with their amplitudes determined by the second control signal on the conductor 52, are passed by way of a padding resistor 82 to a second damped resonant circuit 92 which may have the same configuration as the first damped resonant circuit 91, but a different natural frequency and a decrement controlled by its resistor; and so on for all ten of the damped resonant circuits 91 ... 100 of the group, of which only the last is shown.
  • The voltages which appear across the several damped resonant circuits 91 ... 100, due to the application of pulses to them in various amplitudes, are applied by way of isolating resistors 101 ... 110 to a common conductor 120 which gathers them together to form a composite wave. This wave is applied to a reproducer 121 which converts it into a synthetic speech sound.
  • Figs. 1 and 2 are treated as simplified to the extent that the delay line 22 is tapped at three points, that the cross net of Fig. 1 contains three pairs of vertical conductors, three horizontal conductors and therefore nine resistors; and that the synthesizer apparatus of Fig. 2 contains three modulators 61, etc. and three damped resonant circuits 91, etc.
  • Three waves having these frequencies and decrements are plotted in Fig. 3, all to the same arbitrary amplitude scale. It will be noted that these waves all increase in the positive direction starting from a common zero point which may be regarded as the principal zero of one period of a voice wave. On Fig. 3 are marked three sampling instants t1, t2 and t3, equally spaced from each other and commencing after the lapse of the preassigned interval measured from the principal zero of the voice wave.
  • At t1, an amplitude of 1.94, composed of .64 from the first component, .8 from the second component and .5 from the third component;
  • At t2, an amplitude of .3, composed of .8 from the first component, 0 from the second component and .5 from the third component;
  • At t3, an amplitude of .27, composed of .5 from the first component, .5 from the second component and .27 from the third component.
  • an arbitrary voice wave f(t) as a linear combination of three component damped oscillations, e.g., a(t), b(t), c(t) in whatever proportions, like or unlike, may be required.
  • the proportional contribution of the first component be designated x, that of the second component y, and that of the third component z.
  • The entire resultant voice wave f(t) may be written $f(t) = x\,a(t) + y\,b(t) + z\,c(t)$. This is a functional relation, containing time as a variable. It may be reduced to an arithmetic relation by selecting a particular instant for the time t.
  • Equation 3 constitutes a set of three simultaneous equations in three unknowns x, y and z.
  • $N = \begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ \vdots & \vdots & \vdots \end{pmatrix}$ and that this matrix N is the inverse of the matrix M of Equation 6. Accordingly, in order to determine the fractional contributions x, y and z of the various component damped oscillations a(t), b(t), c(t) to the original signal wave f(t), it is only necessary to provide a cross net of conductors, to each cross point of which a transfer element is connected whose magnitude is proportional to the magnitude of a corresponding element of the matrix N; to feed the conductors of one set, e.g., the vertical conductors of this cross net, with signal samples f(t1), f(t2), f(t3), etc., whereupon signals proportional to the contributions x, y and z may be withdrawn from the horizontal conductors.
  • Equation 7a produces on the output conductors 33 ... 42 signals which are representative of the fractional contributions x, y and z of the component waves a(t), b(t), c(t) to an arbitrary voice wave f(t).
  • The several damped resonant circuits 91 ... 100 are proportioned to duplicate, in their responses to shock excitation, the waves of Equations 1.
  • The first resonant circuit is tuned to 1,000 cycles per second, the second to 2,000 cycles per second and the third to 3,000 cycles per second, while the sharpness of resonance of each of these circuits is adjusted by proportioning its resistor to have the same Q as that of the corresponding wave a(t), b(t), c(t), namely, in the example of Equations 1 and Fig. 3, a Q of 3.5.
  • each of these resonant circuits is excited by a sharp pulse, occurring once for each fundamental period of the voice wave and of an amplitude proportional to the fractional contribution x, y or z of the component damped oscillation to the entire voice wave.
  • Each of these resonant circuits delivers a damped oscillation at the frequency to which it is tuned, with the decay rate for which it is adjusted and with an amplitude dependent on the control signal with which it is supplied and therefore on its required fractional contribution.
  • Because the tuned circuits are all shocked at the same instant, these several damped oscillations are built up from zero and in the same sense at the same time; i.e., they are, except for variations in amplitude, precisely as shown in Fig. 3.
  • additive combination of these damped oscillations in the reproducer 121 closely simulates the original voice wave.
  • Fig. 2 indicates 10 tuned circuits, for generating component damped oscillations of 10 different frequencies, and 10 modulators, each supplied with one of a group of 10 control signals.
  • these 10 control signals are indicated in Fig. 1 as being drawn from the 10 horizontal conductors of a cross net having 10 vertical conductor pairs and therefore a matrix of 10x10 or 100 transfer elements.
  • the natural frequencies of the component damped oscillations may be selected at 500, 700, 900, 1,200, 1,500, 1,800, 2,100, 2,400, 2,700 and 3,000 cycles per second, and each may decay at a rate 0.45 f, corresponding to a quality factor or Q of about 7.
  • the tuned circuits of Fig. 2 may be proportioned accordingly.
  • The incoming composite wave may be sampled at intervals following the inception of each full period, marked by the principal zero of the wave, e.g., at sampling instants t1, t2, ..., t10 having the following values: .263, .395, .526, .658, .789, .921, 1.053, 1.184, 1.316, 1.447 milliseconds.
  • $A \cdot B = N$ (8), where A, B and N are matrices and the dot indicates a matrix product.
  • another matrix B may be determined such that the product of the matrices A, B is equal to the matrix N.
  • Fig. 4 is a block schematic diagram showing a system in which the transformation is thus carried out in two steps.
  • a signal originating, for example, in a microphone 1 is first converted by an analyzer 150 which may be identical with that of Fig. 1 into a subgroup of signals. These are applied to the several input points of a cross net 151 whose transfer elements may be proportioned in accordance with Equation 10.
  • the signal resulting on each output conductor of this cross net is a linear combination of all the input signals.
  • These may be transmitted over a medium 152 to a receiver station where they are applied to the input points of another cross net 153 of which the transfer elements are proportioned in accordance with Equation 12.
  • the signals which thus appear on the output conductors of the second cross net 153 are thus linear combinations of the signals applied to its input points. Furthermore, they are identical in character with the signals derived at the output points of the cross net of Fig. 1. Hence, they have been subjected successively to two partial transformations which together constitute the transformation indicated by the matrix N. They are thus appropriate for application to synthesizing apparatus 154 which may be identical with that of Fig. 2.
  • the apparatus of Fig. 4 is believed to be of value for secret transmission of information.
  • Signal transmission apparatus which comprises, in combination with a source of a quasiperiodic wave constituted of a first plurality of distinct damped oscillatory components of substantially different frequencies, means for deriving samples of the amplitude of said wave at each of a second plurality of consecutive sampling instants in each wave period, a crossnet of input and output conductors, said input conductors being equal in number to said sampling instants, said output conductors being equal in number to said distinct components,
  • an unvarying transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying said samples to said input conductors in one-to-one relation, and means for withdrawing derived signals from said output conductors, the magnitudes of said several transfer elements being proportional to the several elements of a matrix N which is the inverse of another matrix M of which the elements are proportional to the respective contributions of said several damped oscillatory components to said wave at said sampling instants.
  • Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a plurality of samples of the wave of each period, each of said samples being derived at an instant following the inception of said period by a preassigned interval, less than said period, means for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, a plurality of resonant circuits, individually tuned to a like plurality of natural frequencies which together span the frequency range of a speech sound, each of said resonant circuits having a resistive element of a magnitude at least equal to one-tenth of its inductive reactance at resonance, means for generating a sequence of pulse
  • Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, means for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, a plurality of damped resonant circuits, individually tuned to a like plurality of resonant frequencies which together span the frequency range of a speech sound, means for generating a sequence of pulses under control of said pitch control signal, means for applying each of said generated pulses to all of said resonant circuits together, thereby to shock-excite them in phase coincidence, whereupon each such circuit undergoes a damped train of oscillations, means for combining said da
  • Apparatus for synthesizing artificial speech sounds from a set of speech-defining control signals, each of which is representative of the proportional contribution, to an original speech sound, of one of a set of preassigned wave components which comprises a plurality of damped resonant circuits, individually tuned to a like plurality of resonant frequencies which together span the frequency range of a speech sound, means for generating a sequence of pulses in synchronism with the pitch frequency of said speech sounds, means for applying each of said generated pulses to all of said resonant circuits together, thereby to shock-excite them in phase coincidence, whereupon each such circuit undergoes a damped train of oscillations, means for combining said damped oscillation trains under control of said speech-defining control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
  • Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, means for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, means for generating a train of pulses under control of said pitch control signal, a plurality of wave shaping networks, each proportioned to convert each pulse of said train into one of a plurality of wave components, means for applying each of said generated pulses to all of said networks together, thereby to produce all of said components, means for combining said components under control of said weighting control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
  • Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, a cross net of input conductors and output conductors, a transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying said samples to said input conductors in one-to-one relation, means including said cross net for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, a plurality of damped resonant circuits, individually tuned to a like plurality of natural frequencies which together span the frequency range of a speech sound, means for generating
  • Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, a cross net of input conductors and output conductors, a transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying said samples to said input conductors in one-to-one relation, means including said cross net for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, means for generating a train of pulses under control of said pitch control signal, a plurality of wave shaping networks, each proportioned to convert each pulse of said train into one of a plurality
  • a source of an original signal, means including an analyzer for deriving from said original signal a subset of component signals, a cross net of input and output conductors, an unvarying transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying the signals of said subset to said input conductors respectively, means for withdrawing modified signals from said output conductors, each of said modified signals thus being a linear combination of a plurality of said subset signals in various proportions as determined by the magnitudes of said transfer elements, means for transmitting said modified signals to a receiver station, and at said receiver station, means including a synthesizer for reconstituting a replica of said original signal from said modified signals.
  • said reconstituting means includes a second cross net of input and output conductors, an unvarying transfer element located at each crosspoint of said second net and interconnecting that input conductor with that output conductor which intersect at said crosspoint.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Description

United States Patent 2,928,901, patented March 15, 1960. TRANSMISSION AND RECONSTRUCTION OF ARTIFICIAL SPEECH. Bruce P. Bogert, Morristown, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York. Application April 13, 1956, Serial No. 578,097. (3 sheets of drawings.)
12 Claims. (Cl. 179-15.55)
This invention relates in general to the modification of signals to facilitate their transmission, and particularly to the compression of the frequency band occupied by such signals. Its principal specific object is to compress the band of frequencies occupied by a telephone message wave. A more general object is to apply new principles to the band compression or other modification of a message wave.
One well known approach to the band compression problem, exemplified by Dudley Patent 2,151,091, March 21, 1939, is to divide the entire frequency band occupied by a complex message wave, e.g., a speech wave, into a number of constituent subbands, to determine the energy in each such subband, and to derive, for each subband, a control signal whose magnitude represents the subband energy. The analysis is performed with a bank of filters to all of which the speech wave is applied in parallel, while a rectifier connected to the output terminal of each filter delivers a signal representative of the energy passing through the filter. The resulting low frequency control signals, after transmission to a receiver station, control the synthesis of artificial speech.
The analysis carried out by the apparatus of the Dudley patent is essentially an analysis according to Fourier's Theorem: It postulates that the speech wave may be represented, to whatever degree of precision may be required, by a harmonic series of components, each of which, by itself, is a pure sinusoid, and thus orthogonal to each of the others. To the extent that this postulate is untrue of the speech or other complex wave undergoing analysis, to that extent the control signals derived by the apparatus fail accurately to represent the original speech wave, and the final synthetic speech fails to duplicate it.
It is known that the quality of a voiced sound stems from the energization of the damped resonant cavities (throat, mouth, nasal cavity, etc.) of the vocal tract by the successive puffs of air which are driven from the lungs through the vocal cords and which recur at the frequency to which the cords are tuned: namely, the fundamental or pitch frequency of the voice wave. This is borne out by a principal feature of the voice wave, namely that, while it oscillates at frequencies determined by the tuning of the cavities, the oscillations are rapidly damped out, so that its greatest excursion within each fundamental period occurs early in the period, later excursions being generally of less magnitude.
A copending application of B. P. Bogert and W. E. Kock, Serial No. 542,702, filed October 25, 1955, and now matured into Patent 2,890,285, granted June 9, 1959, undertakes to turn these features of the voice wave to account. In accordance with the teachings of that application, the speech wave is sampled synchronously with its fundamental period, and control signals are derived which are proportional, respectively, to the samples. These control signals are transmitted to a receiver station where they determine the magnitudes of the pulses of a locally generated train. The resulting train of amplitude-modulated pulses, after smoothing by a filter, constitutes the artificially reconstructed voice. Thus both the analysis and the synthesis of the Bogert-Kock application are carried out exclusively in the time domain, as distinguished from the frequency domain in which Dudley's analysis and synthesis take place.
The present invention attacks the problem partly in the time domain and partly in the frequency domain. Like the Dudley patent, it determines, by analysis, the contributions of each of a plurality of oscillatory component building blocks to the speech wave as a whole; but unlike Dudley, it does not take undamped sinusoids as these building blocks. Rather, it adopts the more realistic position that each building block is, and must be, a damped oscillation: a wave whose frequency is the natural frequency of one of the component cavities of the vocal tract, and whose decrement is likewise determined by the physiological structure of the tract. Like the Bogert-Kock application, it takes a sequence of samples of the voice wave, in synchronism with the fundamental period; but unlike that application, it does not transmit these samples, directly or in code, to the receiver station. Rather, it employs them to determine the proportions in which each of the basic individual damped oscillations is present in the speech wave as a whole. This determination is in truth a computation, and it is carried out by a cross net or matrix of transfer elements, whose output conductors carry control signals which are severally and continuously representative of the extent to which the several basic damped oscillations are momentarily present in the speech.
It turns out that for this purpose, the several transfer elements of the computing cross net should be proportional to the correspondingly numbered elements of a matrix N, which is the inverse of another matrix M, whose elements, in turn, are proportional to the several basic damped oscillation components as evaluated at the several sampling instants.
These control signals are transmitted, along with a pitch control signal, to a receiver station. There, a train of energy pulses is held, by the pitch signal, in synchronism with the fundamental pitch of the speech wave. These pulses are modulated in amplitude by the transmitted control signals and, as modulated, are applied as shocks to excite natural oscillations in each of a plurality of damped resonant circuits whose natural frequencies and decrements are the same as those of the postulated component building blocks. The responses of these several tuned circuits to the shock pulses are gathered together and supplied, as artificially synthesized speech, to a common reproducer.
In accordance with a modification of the invention the signal transformation may be carried out at the receiver station or it may be carried out in two steps, one of which takes place at the transmitter station and the other at the receiver station. Each of these partial transformations may be effected by a cross net of input and output conductors having a transfer element at each cross point. In this event the elements of the first cross net may be proportional to the elements of a matrix A and those of the second cross net to the elements of a matrix B, wherein the matrices A and B are such that their product is equal to the matrix N.
The invention will be fully apprehended from the following detailed description of a preferred illustrative embodiment thereof taken in connection with the appended drawings in which:
Fig. 1 is a block schematic diagram showing narrow band transmission apparatus in accordance with the invention;
Fig. 2 is a block schematic diagram showing receiver apparatus for the synthesis of speech from the signals delivered by the apparatus of Fig. 1;
Fig. 3 is a waveform diagram of assistance in the exposition of the invention; and
Fig. 4 is a block schematic diagram illustrating a modification of the invention.
Referring now to Fig. 1, a speech wave which may be derived through a vogad 2 from a source such as a microphone 1 is passed by way of a conductor into three parallel branches. The upper branch 4, which comprises a band-pass filter 5, a rectifier 6, a low-pass filter 7 and the winding of a relay 8 connected in tandem, serves as a voiced sound recognizer. As is well known, if the bandwidth of the filter 5 be selected to embrace the principal components of a voiced sound, e.g., if the filter be proportioned to pass frequencies in the range 100 cycles per second to 1,000 cycles per second, and if the low-pass filter 7 be adjusted to pass only syllabic frequencies, the relay 8 is operated by voiced sounds and remains unoperated when the sounds picked up by the microphone 1 are unvoiced. Closure of the contacts of the relay 8 by voiced sounds operates to establish a path for a pitch control signal, derived in the second path 11, through a low-pass filter 9, to a transmission channel 10.
The second path 11 comprises a period marker signal generator 12 which may advantageously be of the type which forms the subject matter of E. Peterson Patent 2,593,694 and which is further described by O. O. Gruenz and L. O. Schott in an article published in the Journal of the Acoustical Society of America for September 1949 (volume 21), page 487. The principal feature of this generator 12 is a detector 13 followed in tandem by a shaping network 14 which accentuates the amplitudes of low frequency components at the expense of higher harmonic component amplitudes. Preferably, each of these steps is carried out two or more times in succession and all of them may be preceded by an auxiliary shaping step as by a network 15. With this arrangement, as is more fully explained in the patent and publication referred to above, the output of the generator 12 comprises a single sharp spike of current which occurs at the instant of the major peak of each wave period, or between the principal zero and the major peak. For present purposes, the principal zero is the last zero value or axis crossing of the speech wave preceding each of its major peaks.
This marker pulse thus indicates the instant of inception of each full period of the speech wave and the frequency at which such pulses are repeated is the fundamental pitch frequency of the speech wave. It is transmitted, as stated above, by way of the contacts of the relay 8 and the low-pass filter 9 to a transmission channel for use at a receiver station as described below.
The same marker pulse is also transmitted by way of a conductor to the input terminal 21 of a wave transmission device 22 such as an electromagnetic or acoustic delay line which is terminated at its far end as by a resistor 24 to prevent reflections. This line 22 is provided with a number n of lateral taps 23. It operates to produce a replica of the input marker pulse on these several taps in succession, each at an instant determined by the location of the tap.
The pulses thus appearing on these several taps 23 are applied to the control terminals, indicated by arrowheads, of a number of sampling gates 25 to whose conduction terminals the voice signal to be transmitted is applied by way of the third parallel path 26. These sampling gates 25 operate in well known fashion to derive brief samples of the instantaneous amplitudes of the speech wave, each at an instant determined by the location on the delay line 22 of the tap 23 which controls it.
The output terminals of the several sampling gates are connected to the input terminals of a group of phase splitter amplifiers 28 each of which is provided with a positive output conductor 29 and a negative output conductor 30. Amplifiers of this character are well known which deliver on one conductor a positive signal and on the other conductor a negative signal each of which is proportional to the signal applied to its input terminal.
The positive and negative output conductors 29, 30 of the first phase splitter amplifier 28 are bridged by a group of m resistors r, each of which is provided with an adjustable tap. Similar groups of resistors r bridge the positive and negative conductors of each of the amplifiers 28a, 28b, etc., of the group. There may be a number m, such as ten, of such resistors for each amplifier. Thus if the delay line 22 is provided with ten taps 23 and each of these taps has associated with it a gate, a phase splitter amplifier and a pair, positive and negative, of vertical conductors, the result will be a square array of one hundred such tapped resistors.
The taps of the first resistors r of all of the vertical conductor pairs are connected together and to a first horizontal conductor 33. Similarly, the taps of the second resistors r of all of the vertical conductor pairs are connected to a second horizontal conductor 34 and so on, the taps of the last resistors r of all of the vertical conductor pairs being connected to a last horizontal conductor 42. As a result the first horizontal conductor 33 carries a signal which is a linear combination of the signals on all of the vertical conductor pairs 29, 30, due to the application of samples of the voice wave to the amplifiers 28 in succession. Each term of this linear combination is determined by the setting of the tap on one of the resistors. Because of the balanced character of the outputs of the amplifier 28 a setting at the center of any of the resistors gives a zero value for the corresponding term. A setting to one side of the center gives a positive value and a setting to the other side of the center gives a negative value.
In the same way the second horizontal conductor 34 carries a signal which is a linear combination of the outputs of all of the amplifiers 28 due to the application to these amplifiers of samples of the voice wave in succession. So, too, for each of the other horizontal conductors, the last of which, 42, is shown.
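The action of the tapped resistors can be pictured as a matrix of signed weights applied to the samples. The sketch below is an idealization under stated assumptions: a tap position between 0 and 1 is mapped linearly to a weight between -1 and +1, loading effects and the scale factor of the resistive summation are ignored, and the choice of which end counts as positive is arbitrary. The names `tap_weight` and `cross_net_outputs` are hypothetical.

```python
def tap_weight(position):
    """Map a tap position in [0, 1] to a signed weight in [-1, +1].

    A centered tap (0.5) gives zero, as described for the balanced amplifier outputs.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("tap position must lie between 0 and 1")
    return 2.0 * position - 1.0


def cross_net_outputs(samples, tap_positions):
    """Signal on each horizontal conductor as a linear combination of the samples.

    tap_positions[i][j] is the tap setting of the resistor that joins horizontal
    conductor i to vertical conductor pair j.
    """
    return [sum(tap_weight(p) * s for p, s in zip(row, samples))
            for row in tap_positions]
```

For instance, `cross_net_outputs([1.0, 2.0], [[0.5, 0.5], [1.0, 0.0]])` gives a zero signal on the first conductor, because both of its taps are centered, and the difference of the two samples on the second.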
The signals thus appearing on each of the m horizontal conductors 33 ... 42 are now smoothed by the interposition of a low-pass filter 44, 45 in tandem with each one, and the resulting low frequency control signals, m in number, are transmitted over channels 51 ... 60 of any desired sort to a receiver station.
At the receiver station, shown in Fig. 2, after such detection, decoding or other operation as may be required to restore them to their original form, each of these control signals is applied by way of a conductor 51, etc., to the control terminal of one of a like number of modulators 61, etc. The conduction terminals of these several modulators are supplied, by way of a conductor 71, with a sequence of sharp pulses whose recurrence rate is equal to the fundamental pitch of the voice wave. These pulses may be derived in well known fashion from a pulse source 72 or 73 through a differentiator 74, a rectifier 75, and the contacts of a relay 76. Again, as is well known, the transmitted pitch control signal, arriving by way of the channel 10, maintains steady control of the frequency of oscillation of the source 72 in order that it shall continuously be substantially the same as the fundamental frequency of voiced sounds as determined by the period marker signal generator 12. When, as in the case of unvoiced sounds, the pitch signal fails, the coil of the relay 76 is deenergized and the noise source 73 is connected through the back contacts of the relay 76, the differentiator 74 and the rectifier 75 to the conductor 71.
The outputs of these several modulators 61 ... 70 thus comprise sharp pulses, recurring at the fundamental pitch rate and with various amplitudes as determined by the application of the control signals on the conductors 51 to the modulators. The pulse output of the first modulator 61 with its amplitude thus determined is applied by way of a padding resistor 81 to a damped resonant circuit 91 which may comprise an inductance coil, a condenser and a resistor all connected in parallel. Similarly, the output pulses of the second modulator 62 with their amplitudes determined by the second control signal on the conductor 52 are passed by way of a padding resistor 82 to a second damped resonant circuit 92 which may have the same configuration as the first damped resonant circuit 91, but a different natural frequency and a decrement controlled by its resistor; and so on for all ten of the damped resonant circuits 91 ... 100 of the group, of which only the last is shown.
The voltages which appear across the several damped resonant circuits 91 ... 100 due to the application of pulses to them in various amplitudes, are applied by way of isolating resistors 101 ... 110 to a common conductor 120 which gathers them together to form a composite wave. This wave is applied to a reproducer 121 which converts it into a synthetic speech sound.
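The receiver chain just described (pitch pulses, amplitude modulation, shock-excited damped circuits, summation) can be sketched in a few lines of Python. This is an idealized model under stated assumptions: each tuned circuit is represented by an exponentially decaying sine that starts at every pulse, the weights are held constant over the stretch being synthesized, and all names and the sample rate are hypothetical.

```python
import math


def damped_oscillation(freq, decay_per_cycle, t):
    """Idealized response of one damped resonant circuit to a shock at t = 0."""
    if t < 0.0:
        return 0.0  # no response before the pulse arrives
    return math.exp(-decay_per_cycle * freq * t) * math.sin(2.0 * math.pi * freq * t)


def synthesize(weights, freqs, decay_per_cycle, pitch_period, n_periods, sample_rate):
    """Sum the weighted responses of all circuits to a train of pitch pulses.

    Responses excited by earlier pulses are superposed on later ones, as they
    would be in a physical resonant circuit that is still ringing.
    """
    n_samples = int(round(n_periods * pitch_period * sample_rate))
    pulse_times = [m * pitch_period for m in range(n_periods)]
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        total = 0.0
        for p in pulse_times:
            for w, f in zip(weights, freqs):
                total += w * damped_oscillation(f, decay_per_cycle, t - p)
        out.append(total)
    return out
```

A call such as `synthesize([1.0, 0.5, 0.2], [1000.0, 2000.0, 3000.0], 0.9, 0.01, 3, 8000)` would produce three pitch periods of a 100-cycle voiced sound built from three components; the weights stand in for the transmitted control signals.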
The principles on which the invention is based and the manner in which the damped resonant circuits 91 ... 100 of Fig. 2 and the multiplier resistors r of Fig. 1 are proportioned will be explained in connection with a simplified example. In this example, Figs. 1 and 2 are treated as simplified to the extent that the delay line 22 is tapped at three points, that the cross net of Fig. 1 contains three pairs of vertical conductors, three horizontal conductors and therefore nine resistors; and that the synthesizer apparatus of Fig. 2 contains three modulators 61, etc. and three damped resonant circuits 91, etc.
With this understanding, consider three damped oscillatory waves, each with a preassigned oscillation frequency and a preassigned decrement. Consider, for example, that their oscillation frequencies f are 1,000, 2,000 and 3,000 cycles per second and that each of them decays at a rate 0.9 f, corresponding to a quality factor or Q of about 3.5. The equations of three such waves may be written as follows:

$$\begin{aligned}
a(t) &= e^{-1000ct}\,\sin(2\pi \cdot 1000\,t) \\
b(t) &= e^{-2000ct}\,\sin(2\pi \cdot 2000\,t) \\
c(t) &= e^{-3000ct}\,\sin(2\pi \cdot 3000\,t)
\end{aligned} \tag{1}$$

where c=0.9 and t is in seconds.
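The quoted quality factor can be checked with the usual relation for a decaying oscillation, assuming Q is defined as $\omega/2\alpha$ for a wave of the form $e^{-\alpha t}\sin\omega t$. The symbol $\alpha$ is introduced here only for illustration; a decay rate of 0.9 f is read as $\alpha = 0.9f$.

$$Q = \frac{\omega}{2\alpha} = \frac{2\pi f}{2 \cdot 0.9\,f} = \frac{\pi}{0.9} \approx 3.5$$

Because the decay rate is proportional to f, the same Q applies to all three waves.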
Three waves having these frequencies and decrements are plotted in Fig. 3, all to the same arbitrary amplitude scale. It will be noted that these waves all increase in the positive direction starting from a common zero point which may be regarded as the principal zero of one period of a voice wave. On Fig. 3 are marked three sampling instants $t_1$, $t_2$ and $t_3$, equally spaced from each other and commencing after the lapse of the preassigned interval measured from the principal zero of the voice wave.
If the three waves of Equation 1 and Fig. 3 were to be additively combined in like proportions it is evident that the resultant wave would have the following amplitudes:
At $t = 0$, an amplitude of 0;
At $t = t_1$, an amplitude of 1.94, composed of .64 from the first component, .8 from the second component and .5 from the third component;
At $t = t_2$, an amplitude of .3, composed of .8 from the first component, 0 from the second component and -.5 from the third component;
At $t = t_3$, an amplitude of .27, composed of .5 from the first component, -.5 from the second component and .27 from the third component.
Three such sample amplitudes would go far toward determining the subsequent behavior of the voice wave provided it were known to be the resultant of the three component waves a(t), b(t), c(t) in like proportions.
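The amplitudes listed above can be checked numerically. The sketch below assumes that each wave has the form $e^{-cft}\sin 2\pi ft$ with c = 0.9, and that the sampling instants fall 0.125, 0.250 and 0.375 millisecond after the principal zero. The names `component`, `M` and `row_sums` are illustrative only.

```python
import math

C = 0.9                                  # decay constant of the example
FREQS = (1000.0, 2000.0, 3000.0)         # cycles per second
T = (0.125e-3, 0.250e-3, 0.375e-3)       # assumed sampling instants, seconds


def component(freq, t, c=C):
    """Value at time t of a damped oscillation of frequency `freq`."""
    return math.exp(-c * freq * t) * math.sin(2.0 * math.pi * freq * t)


# One row per sampling instant, one column per component wave.
M = [[component(f, t) for f in FREQS] for t in T]

# Amplitude of the combined wave when the three components are taken in like proportions.
row_sums = [sum(row) for row in M]
```

Under these assumptions the computed sums fall within a few hundredths of the quoted 1.94, .3 and .27; the quoted figures appear to be rounded readings from Fig. 3.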
It is desired, however, to represent an arbitrary voice wave f(t) as a linear combination of three component damped oscillations, e.g., a(t), b(t), c(t), in whatever proportions, like or unlike, may be required. Suppose that the proportional contribution of the first component be designated x, that of the second component y, and that of the third component z. Then the entire resultant voice wave, f(t), may be written

$$f(t) = x\,a(t) + y\,b(t) + z\,c(t) \tag{2}$$

This is a functional relation, containing time as a variable. It may be reduced to an arithmetic relation by selecting a particular instant for the time t. If this be done for each of the sampling instants $t_1$, $t_2$, $t_3$ the result is

$$\begin{aligned}
f(t_1) &= x\,a(t_1) + y\,b(t_1) + z\,c(t_1) \\
f(t_2) &= x\,a(t_2) + y\,b(t_2) + z\,c(t_2) \\
f(t_3) &= x\,a(t_3) + y\,b(t_3) + z\,c(t_3)
\end{aligned} \tag{3}$$

Now because the waves a, b and c are known in advance and the sampling instants $t_1$, $t_2$ and $t_3$ are fixed, Equation 3 constitutes a set of three simultaneous equations in three unknowns x, y and z. Many schemes for solving such equations are well known, and it is equally well known that the solutions may be stated in the following form

$$\begin{aligned}
x &= A_1 f(t_1) + A_2 f(t_2) + A_3 f(t_3) \\
y &= B_1 f(t_1) + B_2 f(t_2) + B_3 f(t_3) \\
z &= C_1 f(t_1) + C_2 f(t_2) + C_3 f(t_3)
\end{aligned} \tag{4}$$

in which the coefficients $A_1$, $A_2$, etc., are determined by the elements of the matrix

$$M = \begin{bmatrix} a(t_1) & b(t_1) & c(t_1) \\ a(t_2) & b(t_2) & c(t_2) \\ a(t_3) & b(t_3) & c(t_3) \end{bmatrix} \tag{6}$$

It is also well established in text books dealing with matrix algebra that the coefficients $A_1$, $A_2$, etc. of Equations 4 may be written in matrix form:

$$N = \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \\ C_1 & C_2 & C_3 \end{bmatrix} \tag{7}$$

and that this matrix N is the inverse of the matrix M of Equation 6. Accordingly, in order to determine the fractional contributions x, y and z of the various component damped oscillations a(t), b(t), c(t) to the original signal wave f(t), it is only necessary to provide a cross net of conductors, to each cross point of which a transfer element is connected whose magnitude is proportional to the magnitude of a corresponding element of the matrix N; to feed the conductors of one set, e.g., the vertical conductors of this cross net, with signal samples $f(t_1)$, $f(t_2)$, $f(t_3)$, etc., whereupon signals proportional to the contributions x, y and z may be withdrawn from the horizontal conductors.
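The computation performed by the cross net can be imitated in software: invert M once, then multiply each set of samples by the inverse. The Python sketch below is illustrative only. It takes for M the rounded amplitudes quoted in the worked example (with the signs that the waves of the example imply), uses ordinary Gauss-Jordan elimination rather than any method named in the patent, and tests the result on a made-up set of proportions.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Assumed M: rows are sampling instants, columns are the components a, b, c.
M = [[0.64, 0.80, 0.50],
     [0.80, 0.00, -0.50],
     [0.50, -0.50, 0.27]]
N = invert(M)

true_xyz = [0.5, -1.0, 2.0]        # hypothetical proportions x, y, z
samples = mat_vec(M, true_xyz)     # what the sampling gates would deliver
recovered = mat_vec(N, samples)    # what the horizontal conductors would carry
```

Here `recovered` reproduces `true_xyz` to within rounding error, which is the property the cross net relies on.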
For the waves of Equation 1 and Fig. 3 and sampling instants spaced from the principal zero and from each other by .125 millisecond as shown in Fig. 3 the elements of the matrix M of Equation 6 turn out to be as follows:

$$M = \begin{bmatrix} .64 & .8 & .5 \\ .8 & 0 & -.5 \\ .5 & -.5 & .27 \end{bmatrix} \tag{6a}$$

Inversion of the matrix M gives, for the matrix N of Equation 7, the elements designated Equation 7a. Therefore adjustment of the taps of the resistors r of Fig. 1 to positions proportional to the numbers of Equation 7a produces on the output conductors 33 ... 42 signals which are representative of the fractional contributions x, y and z of the component waves a(t), b(t), c(t) to an arbitrary voice wave f(t).
In the receiver apparatus of Fig. 2 the several damped resonant circuits 91 ... 100 are proportioned to duplicate, in their responses to shock excitation, the waves of Equations 1. Thus, in this specific example the first resonant circuit is tuned to 1,000 cycles per second, the second to 2,000 cycles per second and the third to 3,000 cycles per second, while the sharpness of resonance of each of these circuits is adjusted by proportioning its resistor to have the same Q as that of the corresponding wave a(t), b(t), c(t), namely, in the example of Equations 1 and Fig. 3, a Q of 3.5. In operation each of these resonant circuits is excited by a sharp pulse, occurring once for each fundamental period of the voice wave and of an amplitude proportional to the fractional contribution x, y or z of the component damped oscillation to the entire voice wave. Upon receiving this shock each of these resonant circuits delivers a damped oscillation at the frequency to which it is tuned, with the decay rate for which it is adjusted and with an amplitude dependent on the control signal with which it is supplied and therefore on its required fractional contribution to the voice wave. Because the tuned circuits are all shocked at the same instant these several damped oscillations are built up from zero and in the same sense at the same time; i.e., they are, except for variations in amplitude, precisely as shown in Fig. 3. Hence, additive combination of these damped oscillations in the reproducer 121 closely simulates the original voice wave.
In practice, three component damped oscillations are unduly few. A more realistic number is ten. For this reason Fig. 2 indicates 10 tuned circuits, for generating component damped oscillations of 10 different frequencies, and 10 modulators, each supplied with one of a group of 10 control signals. By the same token these 10 control signals are indicated in Fig. 1 as being drawn from the 10 horizontal conductors of a cross net having 10 vertical conductor pairs and therefore a matrix of 10x10 or 100 transfer elements.
As an example of such a 10x10 system the natural frequencies of the component damped oscillations may be selected at 500, 700, 900, 1,200, 1,500, 1,800, 2,100, 2,400, 2,700 and 3,000 cycles per second, and each may decay at a rate 0.45 f, corresponding to a quality factor or Q of about 7. The tuned circuits of Fig. 2 may be proportioned accordingly. The incoming composite wave may be sampled at intervals following the inception of each full period, marked by the principal zero of the wave, e.g., at sampling instants $t_1$, $t_2$, ..., $t_{10}$ having the following values: .263, .395, .526, .658, .789, .921, 1.053, 1.184, 1.316, 1.447 milliseconds. With this understanding the transfer elements of the computing cross net, e.g., the settings of the taps on the various resistors of the cross net of Fig. 1, are given in the following table, wherein the numbers in each column represent the tap settings for the resistors connected in parallel to the corresponding pair of vertical conductors and the numbers in each row represent the settings of the taps connected to the corresponding horizontal conductor:
[Table of tap settings, 10 rows by 10 columns; values not reproduced.]
The signal conversion mathematically represented by the transformation of the matrix N is carried out in Fig. 1 by a single cross net of conductors, the apparatus being located at the transmitter station. Evidently, the conversion could be carried out instead at the receiver station, the cross net being located there. Furthermore, there is no reason in principle why the entire conversion should be carried out at one station or the other. If preferred the conversion operation may be carried out in two successive steps, one of these steps being taken, for example, at the transmitter station and the other at the receiver station. Each such partial conversion may be effected by a cross net of conductors and transfer elements, for example a cross net having the same configuration as that of Fig. 1 but with different magnitudes for the elements.
It is well known that the product of two matrices of a specified order is a third matrix of the same order. Thus:

$$A \cdot B = N \tag{8}$$

where A, B and N are matrices and the dot indicates a matrix product. Hence, for any nonsingular matrix A, another matrix B may be determined such that the product of the matrices A, B is equal to the matrix N. The explicit solution of Equation 8 for the matrix B may be written

$$B = A^{-1} \cdot N \tag{9}$$

where $A^{-1}$ is a matrix which is the inverse of the matrix A. As an illustration, consider a matrix A given by Equation 10. Its inverse, $A^{-1}$, is given by Equation 11. When, in accordance with Equation 9, the matrix product is formed of Equation 11 by Equation 7a the result is

$$B = A^{-1} \cdot N = \begin{bmatrix} -.680 & 3.553 & 3.669 \\ .686 & -4.429 & 3.084 \\ \cdots & \cdots & \cdots \end{bmatrix} \tag{12}$$

It may readily be determined by test that the matrix product of Equation 10 by Equation 12 is equal to Equation 7a. Hence, if a signal is first transformed by one cross net in accordance with Equation 10 and thereafter further transformed by another cross net in accordance with Equation 12, the result is identical with the transformation carried out in accordance with Equation 7a by the cross net of Fig. 1.
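The relation between Equations 8 and 9 can be illustrated with small made-up matrices. In the Python sketch below, `N` is a hypothetical matrix (roughly the inverse of the rounded three-wave matrix M, computed for this illustration), and `A` with its inverse `A_INV` is an arbitrary nonsingular choice; none of these are the numerical values of the patent's Equations 7a, 10 or 11.

```python
def mat_mul(p, q):
    """Product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical overall transformation N.
N = [[0.341, 0.636, 0.546],
     [0.636, 0.105, -0.983],
     [0.546, -0.983, 0.873]]

# Hypothetical first-stage matrix A and its inverse, chosen so the inverse is easy to verify.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0]]
A_INV = [[1.0, -1.0, 1.0],
         [0.0, 1.0, -1.0],
         [0.0, 0.0, 1.0]]

B = mat_mul(A_INV, N)      # Equation 9: B = A^-1 . N
product = mat_mul(A, B)    # Equation 8: A . B should reproduce N
```

The entries of `product` agree with those of `N` to within rounding error. Whether a cascade of two cross nets corresponds to the product A·B or B·A depends on whether the signals are treated as row or column vectors, so the order used here simply follows Equation 8 as written.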
Fig. 4 is a block schematic diagram showing a system in which the transformation is thus carried out in two steps. Here a signal originating, for example, in a microphone 1 is first converted by an analyzer 150 which may be identical with that of Fig. 1 into a subgroup of signals. These are applied to the several input points of a cross net 151 whose transfer elements may be proportioned in accordance with Equation 10. The signal resulting on each output conductor of this cross net is a linear combination of all the input signals. These may be transmitted over a medium 152 to a receiver station where they are applied to the input points of another cross net 153 of which the transfer elements are proportioned in accordance with Equation 12. The signals which thus appear on the output conductors of the second cross net 153 are thus linear combinations of the signals applied to its input points. Furthermore, they are identical in character with the signals derived at the output points of the cross net of Fig. 1. Hence, they have been subjected successively to two partial transformations which together constitute the transformation indicated by the matrix N. They are thus appropriate for application to synthesizing apparatus 154 which may be identical with that of Fig. 2. The apparatus of Fig. 4 is believed to be of value for secret transmission of information.
While the invention has been described as embodied in a system for frequency band compression it is not restricted to this use. By a different choice of component building blocks, sampling instants, etc., reflected in a different choice of the multiplying factors introduced by the several potentiometers or other admittance elements of the cross net of Fig. 1 a practically unlimited number of different output signals can be derived on the several horizontal conductors for any input signal applied to the system. Furthermore, it is contemplated that a cross net converter such as that of Fig. 1, or a pair of cross net converters as shown in Fig. 4, may be of use in connection with an analyzer and a synthesizer differing widely from those of Figs. 1 and 2. Indeed, occasions may arise in which it is required to transform a given set of input signals into a different set of output signals carrying the same information, quite apart from the derivation of such input signals from a single source such as a microphone, and quite apart from the synthesis of the output signal into a composite signal for reproduction. Hence the cross net of Fig. 1 or the pair of cross nets of Fig. 4 is useful quite aside from any signal analyzer which precedes the cross net transformation and quite aside from any synthesizer which follows it.
What is claimed is:
1. Signal transmission apparatus which comprises, in combination with a source of a quasiperiodic wave constituted of a first plurality of distinct damped oscillatory components of substantially different frequencies, means for deriving samples of the amplitude of said wave at each of a second plurality of consecutive sampling instants in each wave period, a crossnet of input and output conductors, said input conductors being equal in number to said sampling instants, said output conductors being equal in number to said distinct components, an unvarying transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying said samples to said input conductors in one-to-one relation, and means for withdrawing derived signals from said output conductors, the magnitudes of said several transfer elements being proportional to the several elements of a matrix N which is the inverse of another matrix M of which the elements are proportional to the respective contributions of said several damped oscillatory components to said wave at said sampling instants.
2. Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a plurality of samples of the wave of each period, each of said samples being derived at an instant following the inception of said period by a preassigned interval, less than said period, means for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, a plurality of resonant circuits, individually tuned to a like plurality of natural frequencies which together span the frequency range of a speech sound, each of said resonant circuits having a resistive element of a magnitude at least equal to one-tenth of its inductive reactance at resonance, means for generating a sequence of pulses under control of said pitch control signal, means for applying each of said generated pulses to all of said resonant circuits together, thereby to shock-excite them in phase coincidence, whereupon each such circuit undergoes a damped train of oscillations, means for combining said damped oscillation trains under control of said weighting control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
3. Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, means for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, a plurality of damped resonant circuits, individually tuned to a like plurality of resonant frequencies which together span the frequency range of a speech sound, means for generating a sequence of pulses under control of said pitch control signal, means for applying each of said generated pulses to all of said resonant circuits together, thereby to shock-excite them in phase coincidence, whereupon each such circuit undergoes a damped train of oscillations, means for combining said damped oscillation trains under control of said weighting control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
4. Apparatus for synthesizing artificial speech sounds from a set of speech-defining control signals, each of which is representative of the proportional contribution, to an original speech sound, of one of a set of preassigned wave components, which comprises a plurality of damped resonant circuits, individually tuned to a like plurality of resonant frequencies which together span the frequency range of a speech sound, means for generating a sequence of pulses in synchronism with the pitch frequency of said speech sounds, means for applying each of said generated pulses to all of said resonant circuits together, thereby to shock-excite them in phase coincidence, whereupon each such circuit undergoes a damped train of oscillations, means for combining said damped oscillation trains under control of said speech-defining control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
5. Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, means for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, means for generating a train of pulses under control of said pitch control signal, a plurality of wave shaping networks, each proportioned to convert each pulse of said train into one of a plurality of wave components, means for applying each of said generated pulses to all of said networks together, thereby to produce all of said components, means for combining said components under control of said weighting control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
6. Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, a cross net of input conductors and output conductors, a transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying said samples to said input conductors in one-to-one relation, means including said cross net for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, a plurality of damped resonant circuits, individually tuned to a like plurality of natural frequencies which together span the frequency range of a speech sound, means for generating a sequence of pulses under control of said pitch control signal, means for applying each of said generated pulses to all of said resonant circuits together, thereby to shock-excite them in phase coincidence, whereupon each such circuit undergoes a damped train of oscillations, means for combining said damped oscillation trains under control of said weighting control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
7. Apparatus for deriving speech-defining control signals and for synthesizing artificial speech sounds therefrom which comprises, in combination with a source of a speech sound of which the wave differs only slightly from each of a succession of periods to the next period, means for deriving a sequence of samples of the wave of each period, a cross net of input conductors and output conductors, a transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying said samples to said input conductors in one-to-one relation, means including said cross net for combining said samples to form a like plurality of weighting control signals, means for simultaneously deriving from said speech sound source a pitch control signal, means for transmitting said pitch control signal and said weighting control signals to a receiver station, and, at said receiver station, means for generating a train of pulses under control of said pitch control signal, a plurality of wave shaping networks, each proportioned to convert each pulse of said train into one of a plurality of wave components, means for applying each of said generated pulses to all of said networks together, thereby to produce all of said components, means for combining said components under control of said weighting control signals to form a weighted sum signal, and means for reproducing said weighted sum signal as an artificial speech sound.
8. In a signal transmission system, the combination which comprises a source of an original signal, means including an analyzer for deriving from said original signal a subset of component signals, a cross net of input and output conductors, an unvarying transfer element located at each crosspoint of said net and interconnecting that input conductor with that output conductor which intersect at said crosspoint, means for applying the signals of said subset to said input conductors respectively, means for withdrawing modified signals from said output conductors, each of said modified signals thus being a linear combination of a plurality of said subset signals in various proportions as determined by the magnitudes of said transfer elements, means for transmitting said modified signals to a receiver station, and at said receiver station, means including a synthesizer for reconstituting a replica of said original signal from said modified signals.
9. Apparatus as defined in claim 8 wherein said reconstituting means includes a second cross net of input and output conductors, an unvarying transfer element located at each crosspoint of said second net and interconnecting that input conductor with that output conductor which intersect at said crosspoint.
10. Apparatus as defined in claim 9 wherein the transfer elements of the first cross net are proportional to the elements of a matrix A, wherein the magnitudes of the transfer elements of the second cross net are proportional to those of the elements of a matrix B, and wherein the matrices A and B are such that their matrix product is equal to a matrix N, representative of a desired transformation from the signals of said first-named subset to the signals which are operative to control said synthesizer.
11. Apparatus for synthesizing an artificial message wave from a set of wave-defining signals, each of which is representative of the proportional contribution, to an original message wave, of one of a set of preassigned component waves, each said component wave being by itself a damped oscillatory wave of preassigned decrement and preassigned frequency, said preassigned frequencies together spanning the frequency range of said original message wave, and control signals, which comprises a plurality of damped wave generators, each proportioned to deliver, in response to a control signal, a wave having the same frequency and the same decrement as one of said damped component waves, means for generating control signals in synchronism with the fundamental periods of said original message wave, means for applying said control signals to all of said damped wave generators together, thereby to actuate them in phase coincidence, whereupon each such generator delivers a damped oscillatory wave of its preassigned frequency and decrement, means for combining said generated waves under control of said wave-defining signals to form a weighted sum wave, and means for reproducing said weighted sum wave as an artificial message wave.
12. Apparatus for synthesizing an artificial message wave from a set of wave-defining signals, each of which is representative of the proportional contribution, to an original message wave, of one of a set of preassigned component waves, each said component wave being by itself a damped oscillatory wave of preassigned decrement and preassigned frequency, said preassigned frequencies together spanning the frequency range of said original message wave, and control signals, which comprises a plurality of damped wave generators, each proportioned to deliver, in response to a control signal, a wave having the same frequency and the same decrement as one of said damped component waves, means for generating control signals that are temporarily coordinated with said original message wave, means for applying said control signals to all of said damped wave generators, thereby to actuate them similarly, whereupon each such generator delivers a damped oscillatory wave of its preassigned frequency and decrement, means for combining said generated waves under control of said wave-defining signals to form a weighted sum wave, and means for reproducing said weighted sum wave as an artificial message wave.
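The relationship recited in claim 1 between the sampled wave, the matrix M of component contributions, and its inverse N can be illustrated numerically. The following is only a minimal sketch and not the patented circuitry: the three component frequencies, decay rates, sampling instants, and weights are hypothetical values chosen for illustration, each component is assumed to be an exponentially damped sine started at the beginning of the pitch period, and the function names are invented here. The cross net is modeled as a fixed matrix-vector product.

```python
import math

# Hypothetical component parameters: (frequency in Hz, decay rate in 1/s).
COMPONENTS = [(500.0, 300.0), (1500.0, 400.0), (2500.0, 500.0)]
# Hypothetical sampling instants, in seconds after the start of a pitch period.
SAMPLE_TIMES = [0.0002, 0.0005, 0.0009]


def damped_wave(freq, decay, t):
    """One damped oscillatory component, assumed shock-excited at t = 0."""
    return math.exp(-decay * t) * math.sin(2.0 * math.pi * freq * t)


def contribution_matrix():
    """Matrix M: M[i][k] is the value of component k at sampling instant i."""
    return [[damped_wave(f, d, t) for (f, d) in COMPONENTS] for t in SAMPLE_TIMES]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; choose other sampling instants")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cross_net(samples, n_matrix):
    """Each output is a fixed linear combination of the input samples."""
    return [sum(n * s for n, s in zip(row, samples)) for row in n_matrix]


def synthesize(weights, times):
    """Weighted sum of the damped components at the given instants of one period."""
    return [sum(w * damped_wave(f, d, t) for w, (f, d) in zip(weights, COMPONENTS))
            for t in times]


# Transmitter side: sample one period of a wave built from known weights.
true_weights = [1.0, -0.5, 0.25]
samples = synthesize(true_weights, SAMPLE_TIMES)

# N is the inverse of M; the cross net applies it to the samples.
M = contribution_matrix()
N = invert(M)
recovered = cross_net(samples, N)
print([round(w, 6) for w in recovered])
```

In this sketch the recovered values serve as the weighting control signals, and calling `synthesize(recovered, times)` once per pitch pulse plays the role of the receiver's weighted sum of damped oscillation trains. The inversion only works when the chosen sampling instants make M nonsingular, which is why `invert` raises an error otherwise.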
References Cited in the file of this patent
UNITED STATES PATENTS
Dudley, Mar. 21, 1939
Dudley, May 27, 1941
Craib, June 11, 1946
Dickieson, Sept. 10, 1946
Mauchly et al., Dec. 4, 1951
Davis et al., July 21, 1953
OTHER REFERENCES
Analog Methods in Computation and Simulation, chap. 8, by Walter Soroka, published by McGraw-Hill, 1954.
Proceedings of the I.E.E. (London), vol. 99, part III, pp. 316-319.
British Journal of Applied Physics, vol. 1, pp. 93-103.
Proceedings of the I.R.E., January 1954, vol. 42, pp. 192-195.
US578097A 1956-04-13 1956-04-13 Transmission and reconstruction of artificial speech Expired - Lifetime US2928901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US578097A US2928901A (en) 1956-04-13 1956-04-13 Transmission and reconstruction of artificial speech

Publications (1)

Publication Number Publication Date
US2928901A true US2928901A (en) 1960-03-15

Family

ID=24311433

Family Applications (1)

Application Number Title Priority Date Filing Date
US578097A Expired - Lifetime US2928901A (en) 1956-04-13 1956-04-13 Transmission and reconstruction of artificial speech

Country Status (1)

Country Link
US (1) US2928901A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3069507A (en) * 1960-08-09 1962-12-18 Bell Telephone Labor Inc Autocorrelation vocoder
US3071652A (en) * 1959-05-08 1963-01-01 Bell Telephone Labor Inc Time domain vocoder
US3102928A (en) * 1960-12-23 1963-09-03 Bell Telephone Labor Inc Vocoder excitation generator
US3127476A (en) * 1964-03-31 david
US3278846A (en) * 1962-05-03 1966-10-11 Edgerton Germeshausen & Grier Apparatus for sampling electric waves
US3400216A (en) * 1964-01-31 1968-09-03 Nat Res Dev Speech recognition apparatus
US3573374A (en) * 1968-01-25 1971-04-06 Philco Ford Corp Formant vocoder utilizing resonator damping

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2151091A (en) * 1935-10-30 1939-03-21 Bell Telephone Labor Inc Signal transmission
US2243527A (en) * 1940-03-16 1941-05-27 Bell Telephone Labor Inc Production of artificial speech
US2402059A (en) * 1942-04-29 1946-06-11 Hazeltine Research Inc Secrecy communication system
US2407259A (en) * 1941-07-09 1946-09-10 Bell Telephone Labor Inc Transmission control in signaling systems
US2577141A (en) * 1948-06-10 1951-12-04 Eckert Mauchly Comp Corp Data translating apparatus
US2646465A (en) * 1953-07-21 Voice-operated system
US2681385A (en) * 1950-06-29 1954-06-15 Bell Telephone Labor Inc Reduction of signal redundancy
US2701274A (en) * 1950-06-29 1955-02-01 Bell Telephone Labor Inc Signal predicting apparatus

Similar Documents

Publication Publication Date Title
US2705742A (en) High speed continuous spectrum analysis
US3624302A (en) Speech analysis and synthesis by the use of the linear prediction of a speech wave
Dudley Remaking speech
US3649765A (en) Speech analyzer-synthesizer system employing improved formant extractor
US2098956A (en) Signaling system
US5689529A (en) Communications method and apparatus for digital information
Gold et al. Analysis of digital and analog formant synthesizers
US3344349A (en) Apparatus for analyzing the spectra of complex waves
US3069507A (en) Autocorrelation vocoder
US2928901A (en) Transmission and reconstruction of artificial speech
US3071652A (en) Time domain vocoder
US3102928A (en) Vocoder excitation generator
US5475629A (en) Waveform decoding apparatus
US3431362A (en) Voice-excited,bandwidth reduction system employing pitch frequency pulses generated by unencoded baseband signal
US2243526A (en) Production of artificial speech
US3127476A (en) david
US3139487A (en) Bandwidth reduction system
US3109070A (en) Pitch synchronous autocorrelation vocoder
US3715509A (en) Method and means for providing resolution level selection in a spectrum analyzer
US2890285A (en) Narrow band transmission of speech
US3381093A (en) Speech coding using axis-crossing and amplitude signals
David et al. Note on Pitch‐Synchronous Processing of Speech
US3190963A (en) Transmission and synthesis of speech
US2817707A (en) Synthesis of complex waves
US3471644A (en) Voice vocoding and transmitting system