US3190963A

US3190963A - Transmission and synthesis of speech

Info

Publication number: US3190963A
Application number: US235703A
Authority: US
Inventors: Jr Edward E David; James L Flanagan
Original assignee: Bell Telephone Laboratories Inc
Current assignee: AT&T Corp
Priority date: 1962-11-06
Filing date: 1962-11-06
Publication date: 1965-06-22
Anticipated expiration: 1982-06-22

Description

June 22, 1965 E. E. DAVID, JR., ETAL 3,190,963

TRANSMISSION AND SYNTHESIS OF SPEECH Filed Nov. 6, 1962 7 Sheets-Sheet 1


United States Patent 3,190,963

TRANSMISSION AND SYNTHESIS OF SPEECH

Edward E. David, Jr., Berkeley Heights, and James L. Flanagan, Warren Township, Somerset County, N.J., assignors to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York

Filed Nov. 6, 1962, Ser. No. 235,703

9 Claims. (Cl. 179-15.55)

This invention relates to the transmission and synthesis of speech, and in particular to the transmission and synthesis of speech in bandwidth compression systems.

In order to make more economic use of the frequency bandwidth of speech transmission channels, a number of bandwidth compression arrangements have been devised for transmitting the information content of a speech wave over a channel whose bandwidth is substantially narrower than that required for transmission of the speech wave itself. Bandwidth compression systems typically include at a transmitting terminal an analyzer for deriving from an incoming speech wave a group of narrow bandwidth control signals representative of selected information bearing characteristics of the speech wave, and at a receiving terminal, a synthesizer for reconstructing from the control signals a replica of the original speech wave.

One well-known bandwidth compression system is the so-called resonance vocoder, a specific form of which is described in E. S. Weibel Patent 2,817,707, issued December 24, 1957. In a resonance vocoder, the information bearing characteristics which are represented by the control signals and which are reconstructed at the receiving terminal are the locations in the speech amplitude spectrum of selected resonant or formant frequencies. These resonances or formants correspond to the principal vocal tract resonances, that is, they correspond to frequency regions of relatively effective transmission through a talker's vocal tract.

It has been determined, however, that although resonances describe a large and important class of speech sounds, the so-called voiced nonnasal sounds, there are at least two other large and important classes of speech sounds which are not adequately characterized by resonant frequencies alone, namely, voiced nasal sounds and unvoiced sounds. In particular, it has been recognized that an adequate description of the two latter classes of sounds requires the location of at least one principal antiresonant frequency in addition to the locations of selected resonant frequencies, where antiresonant frequencies correspond to regions of relatively ineffective transmission through a talker's vocal tract.

Recognition of the important role played by antiresonant frequencies in the accurate specification of voiced nasal sounds and unvoiced sounds has been turned to account in the bandwidth compression system of the present invention. At the transmitting terminal of this invention, an incoming speech wave is analyzed to determine the locations of selected resonant and antiresonant frequencies. These frequencies are represented by reduced bandwidth control signals that are transmitted to a receiving terminal where they are utilized to reconstruct a natural sounding replica of the original speech wave.

The location of an antiresonance is evidenced by a major valley or dip in the envelope of the amplitude spectrum of a particular sound, and in the present invention an antiresonance is determined by locating the corresponding valley in the spectral envelope. On the other hand, it is well known that locations of resonances correspond to major peaks in the spectral envelope of a sound, and resonances are determined in this invention by locating these peaks in the spectral envelope.

Peaks and valleys in the spectral envelope of an incoming speech wave are located at the analyzer of the present invention by comparing one of three locally generated, artificial spectra with the speech spectrum. The three artificial spectra correspond to the three classes of speech sounds, voiced nonnasal sounds, voiced nasal sounds, and unvoiced sounds, and selection of the appropriate artificial spectrum depends upon which of the three classes of speech sounds appears in the speech wave at a given instant. Each artificial spectrum is constructed from a set of resonance or resonance and antiresonance signals, and each set of resonance or resonance and antiresonance signals is varied cyclically over a wide range of values corresponding to a wide range of resonance or resonance and antiresonance locations. In this fashion, the comparison of the speech spectrum with an artificial spectrum during each cycle of variation produces a best matching version of the artificial spectrum whose resonances or resonances and antiresonances correspond closely to the true speech resonances or resonances and antiresonances. The best matching resonances or resonances and antiresonances for each of the three classes of speech sounds are represented by one of three groups of narrow band control signals which are successively transmitted to a receiving terminal, and at the receiving terminal these signals are used to construct a corresponding succession of accurate replicas of the spectra of the succession of different sounds in the original speech wave. By constructing each replica spectrum at the receiving terminal to correspond to one of the three above-mentioned major speech sound classes, a high degree of realism for a large number of sounds is attained in the speech reproduced at the receiving terminal.

The invention will be fully understood from the following descriptions of illustrative embodiments thereof, taken in connection with the appended drawings, in which:

FIG. 1 is a block diagram showing the complete bandwidth compression system of this invention;

FIGS. 2A and 2C are block diagrams illustrating in detail certain components of the system shown in FIG. 1;

FIGS. 2B and 2D are graphs illustrating particular features of the circuits shown in FIGS. 2A and 2C;

FIGS. 3A and 4A are schematic diagrams of circuits for synthesizing resonances and antiresonances, respectively;

FIGS. 3B and 4B are graphs of assistance in explaining the operation of the circuits shown in FIGS. 3A and 4A, respectively;

FIG. 5 is a block diagram showing apparatus for synthesizing speech sounds in accordance with the principles of this invention;

FIGS. 6A, 6B, and 6C illustrate spectral envelopes of typical voiced nasal, unvoiced, and voiced nonnasal sounds, respectively; and

FIG. 7 is a block schematic diagram showing in detail certain components of the system illustrated in FIG. 1.

Theoretical considerations

As explained in detail by J. L. Flanagan in A Reso- System for Speech Transmission, 1959 Institute of Radio Engineers Wescon Convention Record, part 7, pages -16, speech sounds may be divided into three classes, voiced nonnasal, voiced nasal, and unvoiced (nonnasal), each of which is characterized by a so-called vocal transmission function. In the case of voiced nonnasal sounds, the variable features of the corresponding vocal transmission function may be specified solely in terms of the first few poles or resonances of the human vocal tract, whereas in the case of voiced nasal sounds and unvoiced sounds, the variable features of the corresponding vocal transmission functions must be specified in terms of zeros or antiresonances as well as poles or resonances of the human vocal tract.

As explained in G. Fant, Acoustic Theory of Speech Production (1960), the production of human speech may be characterized in several different ways. Thus the production of human speech may be described in terms of the human vocal mechanism, which includes the glottis, the vocal tract, the nasal cavities, and other articulatory features. The vocal tract, for example, resembles a tube of non-uniform cross section which is characterized by a number of resonances or frequency regions of relatively effective transmission through the vocal tract. Another description of speech production is in terms of the properties of the speech sound wave, which has an amplitude spectrum characterized by a number of prominent peaks or formants that correspond to resonances of the vocal tract, as pointed out on page 20 of the Fant reference. Finally, by analogy with the terminology of acoustic and electrical engineering, the production of speech may be described as the response of the vocal tract filter systems to one or more sound sources, in which case the vocal tract is characterized by a transmission function that may be expressed as a rational function of the complex frequency variable, s, as shown on page 42 of the Fant text. This rational function expression has both zeros and poles, with the zeros corresponding to antiresonances of the vocal tract and the poles corresponding to resonances of the vocal tract, as pointed out on pages 48 through 62 of the Fant reference.
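The rational-function description can be written out explicitly. The following is only a sketch in a notation assumed here for illustration: the symbols K, F_i, B_i, p_i, z_j and the counts m and n do not appear in the patent text. It treats each resonance and each antiresonance as a complex-conjugate root pair, with F_i the center frequency and B_i the bandwidth of the i-th resonance.

```latex
T(s) \;=\; K \,
\frac{\displaystyle\prod_{j=1}^{m} (s - z_j)\,(s - z_j^{*})}
     {\displaystyle\prod_{i=1}^{n} (s - p_i)\,(s - p_i^{*})},
\qquad
p_i = -\pi B_i + \mathrm{j}\,2\pi F_i
```

Under this assumed form, the magnitude of T evaluated at s = j2πf tends to show a local maximum near each pole frequency F_i and a local minimum near the center frequency of each zero pair, which is the correspondence between poles and resonances and between zeros and antiresonances described above.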

Resonances and antiresonances of the various classes of sounds are manifested by maxima and minima, respectively, in the envelopes of the amplitude spectra of these sounds, as pointed out on page 24 of the Fant reference cited above, and as shown and described by J. L. Flanagan in Note on the Design of Terminal-Analog Speech Synthesizers, vol. 29, Journal of the Acoustical Society of America, pages 306, 310 (1957). FIGS. 6A, 6B, and 6C illustrate the envelopes of the amplitude spectra of typical voiced nasal, unvoiced, and voiced nonnasal sounds, respectively, in which the symbol x on the horizontal axis indicates the approximate frequency location of a spectral maximum or resonance, and the symbol O on the horizontal axis indicates the approximate frequency location of a spectral minimum or antiresonance.

It is observed in FIG. 6C that there are three principal maxima or resonances in the spectrum of a typical voiced nonnasal sound, while in FIG. 6A there are four principal [...] in the production of voiced nasal sounds, that is, in the production of voiced nasal sounds, the nasal tract is the main branch for the flow of air from the lungs, and the vocal tract acts as a side branch. For voiced nonnasal sounds of the type illustrated by FIG. 6C, the vocal tract acts as the main branch.

In the unvoiced sound spectrum shown in FIG. 6B it is noted that there are four resonances and four antiresonances, but that there are three pairs of resonances and antiresonances in which the resonance and antiresonance lie so close together that they essentially nullify each other, as evidenced by the small maxima and minima corresponding to these resonance-antiresonance pairs in the spectrum. Hence unvoiced sounds may be specified in terms of two resonances and a single antiresonance.
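The near-cancellation of a closely spaced resonance-antiresonance pair can be checked numerically. The sketch below is illustrative only: it assumes each resonance and antiresonance is a complex-conjugate root pair with a 100 cycles-per-second bandwidth, and the function names and frequency values are hypothetical rather than taken from the patent.

```python
import math


def pair_term(f_hz, center_hz, bandwidth_hz):
    """Factor (s - r)(s - r*) at s = j*2*pi*f for the conjugate
    root pair r = -pi*B + j*2*pi*F (assumed model, not from the patent)."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    r = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * center_hz)
    return (s - r) * (s - r.conjugate())


def magnitude_db(f_hz, resonances, antiresonances):
    """Level in dB of prod(zero terms) / prod(pole terms),
    normalised so that the level at f = 0 is 0 dB."""
    def raw(f):
        num = complex(1.0, 0.0)
        for center, bandwidth in antiresonances:
            num *= pair_term(f, center, bandwidth)
        den = complex(1.0, 0.0)
        for center, bandwidth in resonances:
            den *= pair_term(f, center, bandwidth)
        return abs(num / den)
    return 20.0 * math.log10(raw(f_hz) / raw(0.0))


# A lone resonance at 1000 c.p.s. gives a pronounced spectral peak.
lone_peak_db = magnitude_db(1000.0, [(1000.0, 100.0)], [])

# Adding an antiresonance only 50 c.p.s. away flattens most of that peak.
paired_peak_db = magnitude_db(1000.0, [(1000.0, 100.0)], [(1050.0, 100.0)])
```

With these assumed values the lone resonance stands roughly 20 dB above the zero-frequency level, while the paired case stays within a few decibels of it, which is the sense in which such a pair leaves only a small maximum and minimum in the envelope.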

In the following description of the apparatus of this invention, the three classes of sounds described above will be characterized in terms of their principal resonances and antiresonances as shown in FIGS. 6A, 6B, and 6C. However, it is to be understood that these sounds may be specified by additional resonances and antiresonances, if desired, with appropriate modification of the necessary components of this invention.

Complete system

Referring first to FIG. 1, this drawing illustrates a complete bandwidth compression system in which both resonances and antiresonances are utilized as the information bearing characteristics of speech to conserve transmission channel bandwidth. At the transmitter station of this system, an incoming speech wave from source 10, for example, a conventional microphone, is applied simultaneously to resonance and antiresonance locator 11, pitch detector and voiced amplitude detector 109, and unvoiced amplitude detector 110. As described in detail below, locator 11 produces at any given time one of three groups of narrow band control signals, one representative of selected resonant frequencies of voiced nonnasal sounds, one representative of selected resonant and antiresonant frequencies of voiced nasal sounds, and the other representative of selected resonant and antiresonant frequencies of unvoiced sounds. Thus the succession of these three different classes of sounds in the incoming speech wave causes locator 11 to produce a corresponding succession of groups of control signals, each representative of the corresponding sound. Detector 109, which is illustrated in detail in FIG. 2C, derives from the speech wave a pitch control signal indicative of the frequency of the fundamental component of voiced portions of the speech wave, and a voiced amplitude control signal representative of the amplitude of the fundamental component of voiced portions of the speech wave. Detector 110, which may comprise a conventional rectifier followed by a low pass filter, develops an unvoiced amplitude control signal indicative of the energy of unvoiced portions of the speech wave.

The control signals produced at the transmitter station specify the characteristics of the three classes of speech sounds referred to above with a high degree of accuracy, and the total frequency bandwidth of these control signals is substantially smaller than that of the speech wave from source 10. Hence these control signals may be transmitted to a distant receiver station over a reduced bandwidth transmission channel, a suitable channel being indicated in FIG. 1 by broken lines.

At the receiver station, the voiced nasal and nonnasal control signals, the pitch control signal, and the voiced amplitude control signal are applied to voiced spectrum synthesizer 113, which reconstructs replicas of the spectra of voiced nasal and nonnasal sounds. The unvoiced resonance and antiresonance control signals and the unvoiced amplitude control signal are delivered to unvoiced spectrum synthesizer 114, which reconstructs replicas of the spectra of unvoiced sounds. The structure of synthesizers 113 and 114 is illustrated in FIGS. 3A, 4A, and 5, and the reconstructed spectra developed by these synthesizers are combined in adder 115, which may be any one of a number of well-known adding circuits. The combined spectra are converted into audible speech sounds by reproducer 116, for example, a conventional loudspeaker.

Resonance and antiresonance locator 11 is an improvement upon the formant locator apparatus shown in Patent 3,127,477, issued March 31, 1964, to E. E. David, Jr., M. V. Mathews, and J. E. Miller, Serial No. 205,663, filed June 27, 1962, in that locator 11 determines both resonances and antiresonances in the spectrum of an incoming speech wave, whereas the formant locator apparatus shown in the David et al. patent determines only resonances or formants. The process of determining resonances and antiresonances embodied by locator 11 comprises a rapid succession of repetitive cycles initiated by a repetitive cycle control signal generated by control circuit 103 and delivered through switching circuit 104 to one of the three signal generators 105a, 105b, or 105c. Control circuit 103 may be similar in design to control circuit 23 of the above-mentioned E. E. David, Jr., et al. patent, but switching circuit 104 is shown in detail in FIG. 2A of the present application.

Switching circuit 104 is supplied with the incoming speech wave from source 10, and after determining which one of the three classes of sound is present in the speech wave, circuit 104 passes the cycle control signal to the appropriate signal generator. In response to the cycle control signal passed by switching circuit 104, the appropriate signal generator repetitively produces in each cycle a set of signals from which an artificial spectrum may be constructed. Voiced nonnasal control signal generator 105a produces a set of signals representing selected resonant frequencies of voiced nonnasal sounds, voiced nasal control signal generator 105b produces a set of signals representing selected resonant and antiresonant frequencies of voiced nasal sounds, and unvoiced control signal generator 105c produces a set of control signals representing selected resonant and antiresonant frequencies of unvoiced sounds. Each of the signals produced by generators 105a through 105c varies continuously over a range of values corresponding to the usual frequency range of a particular resonance or antiresonance. Further, the combination of values represented by each set of control signals is made to vary in a manner similar to that shown in the aforementioned E. E. David, Jr., et al. patent so that during each cycle of operation of locator 11, each set of control signals collectively represents all possible combinations of resonance or resonance and antiresonance locations.

The sets of control signals from generators 105a and 105b are applied to voiced spectrum synthesizer 106, together with the pitch and voiced amplitude control signals from detector 109. Synthesizer 106 is similar in design to synthesizer 113 at the receiver station, illustrated in detail and labeled synthesizer 50 in FIG. 5, and constructs from the incoming control signals an artificial signal having an amplitude spectrum whose resonances or resonances and antiresonances vary continuously in location in synchrony with the continuous variations in value of one or the other set of applied control signals. Similarly, the set of control signals from generator 105c is applied to unvoiced spectrum synthesizer 107, together with the unvoiced amplitude control signal from detector 110. Synthesizer 107 may be identical in structure with synthesizer 114 at the receiver station, illustrated in detail and labeled synthesizer 51 in FIG. 5, and constructs from the incoming control signals an artificial signal having an amplitude spectrum whose resonances and antiresonances vary continuously in location in synchrony with the continuous variations in value of the set of applied control signals.

In order to make the spectra of the artificial signals from synthesizers 106 and 107 resemble the spectra of the various sounds contained in the speech wave in every important respect save resonance and antiresonance locations, it is necessary that the frequency components of the artificial spectra be scaled up in frequency, as explained in the previously mentioned E. E. David, Jr., et al. patent. This is accomplished for the artificial signals produced by synthesizer 106 by passing the pitch control signal from detector 109 through a conventional voltage amplifier 112 having an appropriate gain constant. For the artificial signals produced by synthesizer 107 this is achieved by providing synthesizer 107 with a noise source that generates frequency components with sufficiently high frequencies; that is, as shown in FIG. 5, the noise generator of synthesizer 51 may be a conventional wide band gas diode source.

The artificial signals produced by synthesizers 106 and 107 are combined in adder 108, which may be of any well-known sort, to form at the output terminal of adder 108 a succession of artificial signals having spectra corresponding to the succession of different classes of sounds appearing in the incoming speech wave. From adder 108 the succession of artificial signals is passed to analyzer 102, which separates each successive artificial signal into the individual frequency components that constitute its amplitude spectrum. Analyzer 100 similarly separates the incoming speech wave from source 10 into the individual frequency components that constitute the speech spectrum, and the two sets of frequency components from analyzers 100 and 102 are sent to comparator 101. Analyzers 100 and 102 and comparator 101 may each be similar in construction to analyzers 21 and 27 and comparator 22, respectively, of the above-mentioned E. E. David, Jr., et al. patent.

Comparator 101 operates upon the two sets of frequency components from analyzers 100 and 102 to derive an error signal whose magnitude is representative of the difference between each artificial spectrum and the speech spectrum during each cycle. The difference between the two spectra, as represented by the error signal, arises from the difference between the resonance or resonance and antiresonance locations of the speech spectrum and the respective resonance or resonance and antiresonance locations of the artificial spectrum, because in every other important respect, that is, in pitch and in amplitude as derived by detectors 109 and 110, the speech spectrum and the artificial spectrum are identical.

Since the resonances or resonances and antiresonances of each artificial spectrum are made to occur at substantially all possible resonance or resonance and antiresonance locations during each cycle, theoretically there will be at least one point in each cycle when the resonance or resonance and antiresonance locations of the artificial spectrum will be identical with the resonance or resonance and antiresonance locations of the speech spectrum. Accordingly, the magnitude of the error signal developed by comparator 101 at such a point in each cycle would be zero. However, because of noise and other imperfections, the artificial spectrum will never be exactly identical with the speech spectrum, but there is at least one point in each cycle at which the difference between the two spectra is at a minimum, this minimum difference being indicated by a corresponding minimum in the magnitude of the error signal produced by comparator 101.

To determine the resonance or resonance and antiresonance locations of the artificial spectrum at the point in each cycle at which the difference between the artificial spectrum and the speech spectrum is a minimum, the error signal from comparator 101 is passed to control circuit 103, where the error signal is continuously examined to determine its minimum magnitude in each cycle. At the instant in each cycle that circuit 103 detects a minimum magnitude, it delivers a sample control signal to samplers 111a through 111c, which may be identical with sampler 23 of the above-mentioned E. E. David, Jr., et al. patent, and in response, the sampler that is receiving control signals from its corresponding signal generator 105a, 105b, or 105c samples and stores the values of the set of resonance or resonance and antiresonance signals which is being applied from the signal generator. When the cycle ends, these sampled and stored values are converted into control signals that indicate with a high degree of accuracy the locations of the resonances or resonances and antiresonances of the spectrum of the class of sound present at a given instant in the incoming speech wave.
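The cycle just described (sweep the candidate locations, compare spectra, and hold the values present at the error minimum) can be sketched in software. This is a minimal illustration under stated assumptions, not the patented circuit: the conjugate-pair spectrum model, the fixed 100 cycles-per-second bandwidth, the absolute-difference error measure, the discrete candidate grids, and all function names are hypothetical stand-ins for the analog synthesizer, comparator 101, control circuit 103, and the samplers.

```python
import itertools
import math


def envelope_db(freqs_hz, resonances_hz, antiresonances_hz, bandwidth_hz=100.0):
    """Artificial spectral envelope in dB (assumed conjugate-pair model)."""
    def term(f, center):
        s = complex(0.0, 2.0 * math.pi * f)
        r = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * center)
        return abs((s - r) * (s - r.conjugate()))
    levels = []
    for f in freqs_hz:
        gain = 1.0
        for center in antiresonances_hz:
            gain *= term(f, center)   # zeros pull the envelope down
        for center in resonances_hz:
            gain /= term(f, center)   # poles push the envelope up
        levels.append(20.0 * math.log10(gain))
    return levels


def spectral_error(speech_db, artificial_db):
    """One possible error measure: summed absolute level difference."""
    return sum(abs(a - b) for a, b in zip(speech_db, artificial_db))


def locate(speech_db, freqs_hz, resonance_grids, antiresonance_grids):
    """Try every candidate combination once per 'cycle' and keep the
    combination present when the error is smallest (sample and hold)."""
    best, best_error = None, math.inf
    for res in itertools.product(*resonance_grids):
        for anti in itertools.product(*antiresonance_grids):
            error = spectral_error(speech_db, envelope_db(freqs_hz, res, anti))
            if error < best_error:
                best, best_error = (res, anti), error
    return best, best_error


# Usage: recover two resonances and one antiresonance of a synthetic target.
freqs = [float(f) for f in range(100, 3001, 50)]
target = envelope_db(freqs, (500.0, 1500.0), (1000.0,))
grids_res = [[float(f) for f in range(300, 901, 100)],
             [float(f) for f in range(1200, 2001, 100)]]
grids_anti = [[float(f) for f in range(900, 1201, 100)]]
found, err = locate(target, freqs, grids_res, grids_anti)
```

An exhaustive grid is used here only because it mirrors the idea of covering substantially all combinations in each cycle; the apparatus obtains the same coverage with continuously varying ramp signals rather than discrete steps.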

As described in the E. E. David, Jr., et al. patent referred to above, the difference between an artificial spectrum and the speech spectrum, as represented by the error signal, may be measured in several ways. It is to be understood that any one of these measures of error may be utilized in this invention.

Pitch detector and voiced amplitude detector

Turning now to FIG. 2C, this drawing illustrates the structure of detector 109 employed at the transmitter station of this invention, as shown in FIG. 1. The incoming speech wave from source 10 is passed through bandpass filter 27 to obtain the speech frequency components lying in the frequency range between 70 and 250 cycles per second. Within this frequency range, the bandpass characteristic of filter 27 is designed in conventional fashion to decrease with frequency at the rate of 12 decibels per octave, as shown in FIG. 2D; that is, the amplitude of each component in this frequency range is reduced by a factor of $1/f^{2}$, where f denotes frequency. Since the speech frequency components lying within this pass band have a natural tendency to decrease in amplitude with increasing frequency at a rate of 12 decibels per octave, the amplitude of the first or fundamental frequency component of the speech wave is further enhanced by passage through filter 27.
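The link between a slope of 12 decibels per octave and an inverse-square dependence on frequency can be verified directly. The short check below assumes an amplitude characteristic A(f) proportional to 1/f^2 (the symbol A is introduced here for illustration) and compares the level at a frequency f with the level one octave higher, at 2f.

```latex
20\log_{10}\frac{A(2f)}{A(f)}
  \;=\; 20\log_{10}\frac{1/(2f)^{2}}{1/f^{2}}
  \;=\; -20\log_{10} 4
  \;\approx\; -12.04\ \text{dB}
```

Each doubling of frequency therefore lowers the level by about 12 decibels, so a characteristic falling at that rate scales each component by a factor proportional to 1/f^2.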

Because the chief constituent of the output signal of filter 27 is the fundamental frequency component, a pitch control signal representative of the fundamental speech frequency may be obtained from this output signal. For example, a conventional frequency meter 30 may be employed to determine the reciprocals of the intervals between successive positive-going axis crossings of the output signal of filter 27. The pitch control signal developed by frequency meter 30 is then utilized in the manner shown in FIG. 1.
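A discrete-time counterpart of this axis-crossing measurement might look like the following sketch. It assumes a sampled version of the filter output; the function name, the sampling rate, and the use of linear interpolation to place each crossing between samples are illustrative choices that do not come from the patent.

```python
import math


def pitch_from_axis_crossings(samples, sample_rate_hz):
    """Return one pitch estimate (in cycles per second) for each pair of
    successive positive-going axis crossings: the reciprocal of the interval."""
    crossing_times = []
    for n in range(1, len(samples)):
        previous, current = samples[n - 1], samples[n]
        if previous < 0.0 <= current:
            # Linear interpolation places the crossing between two samples.
            fraction = -previous / (current - previous)
            crossing_times.append((n - 1 + fraction) / sample_rate_hz)
    return [1.0 / (later - earlier)
            for earlier, later in zip(crossing_times, crossing_times[1:])]


# Usage: a 120 c.p.s. sinusoid standing in for the filtered fundamental.
rate = 8000.0
tone = [math.sin(2.0 * math.pi * 120.0 * n / rate + 0.3) for n in range(800)]
estimates = pitch_from_axis_crossings(tone, rate)
```

The method relies on the filtered signal being dominated by the fundamental; if higher harmonics survived the filtering, extra axis crossings would produce spuriously high estimates.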

A signal representative of the magnitude of the fundamental speech frequency component of voiced sounds may also be derived from the output signal of filter 27. This is accomplished by rectifying the output signal of filter 27 and averaging the rectified signal over an interval comparable to the period of the fundamental speech component, for example, over an interval on the order of 10 to milliseconds. A conventional rectifier 28 and low pass filter 29 may be utilized to perform the rectifying and averaging operations. However, the magnitude of the rectified and averaged signal developed at the output terminal of filter 29 is not indicative of the amplitude of the fundamental frequency component because the characteristic of filter 27 reduced the magnitude of the fundamental component by a factor $1/f^{2}$. To cause the magnitude of the rectified and averaged output signal of filter 29 to represent the magnitude of the fundamental speech component, it is therefore necessary to increase the magnitude of the rectified and averaged signal by a factor $f^{2}$, for example, by squaring the magnitude of the pitch control signal and multiplying the rectified and averaged signal by the squared pitch control signal. These operations may be realized in the fashion indicated in FIG. 2C by squaring circuit 31 and multiplier 32, both of which may be of well-known construction. The output signal of multiplier 32 is then suitable for use as a voiced amplitude control signal, as illustrated in FIG. 1.
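The rectify, average, and rescale chain can be illustrated numerically. In the sketch below the filter output is modelled as a sinusoid whose amplitude has already been scaled by 1/f^2; a plain mean over the whole frame stands in for low pass filter 29, and the function name, sampling rate, and test frequencies are assumptions made for illustration.

```python
import math


def voiced_amplitude(filtered, pitch_hz):
    """Rectify, average, then multiply by the squared pitch signal to undo
    the 1/f^2 weighting applied by the band-pass characteristic."""
    rectified = [abs(x) for x in filtered]        # rectifier 28
    average = sum(rectified) / len(rectified)     # stand-in for filter 29
    return average * pitch_hz ** 2                # squarer 31 and multiplier 32


def filtered_fundamental(amplitude, pitch_hz, rate_hz=8000.0, duration_s=0.1):
    """Model of the filter output: fundamental attenuated by 1/f^2."""
    count = int(rate_hz * duration_s)
    return [(amplitude / pitch_hz ** 2)
            * math.sin(2.0 * math.pi * pitch_hz * n / rate_hz)
            for n in range(count)]


# Two talkers with the same fundamental amplitude but different pitch.
low = voiced_amplitude(filtered_fundamental(1.0, 100.0), 100.0)
high = voiced_amplitude(filtered_fundamental(1.0, 200.0), 200.0)
```

Both results come out close to 2/π times the true amplitude (the mean of a rectified sinusoid), so after the correction the control signal no longer depends on the pitch of the talker, only on the strength of the fundamental.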

Switching circuit

Referring now to FIG. 2A, this drawing shows the structure of switching circuit 104 employed in locator 11 at the transmitter terminal of the system of FIG. 1. The cycle control signal from control circuit 103 is directed by relays R1 and R2 to one of the three signal generators 105a, 105b, or 105c according to the following logic. Relay R1, which is provided with a control terminal, an input terminal, and two output terminals, may be any one of a number of well-known devices for directing an incoming signal from its input terminal to one or the other of its output terminals in response to whether or not a control signal of sufficient magnitude is applied to its control terminal. In FIG. 2A, the voiced amplitude control signal is applied to the control terminal of relay R1, the cycle control signal is applied to the input terminal of relay R1, and the output terminals of relay R1 are connected to unvoiced signal generator 105c and to the input terminal of relay R2. Relay R1 is energized by the voiced amplitude control signal from detector 109 so that the cycle control signal is passed to the input terminal of relay R2 whenever a voiced sound, either nasal or nonnasal, is present in the incoming speech wave. When an unvoiced sound is present in the incoming speech wave, the magnitude of the voiced amplitude control signal falls below a predetermined level necessary to energize relay R1, and in its deenergized state, relay R1 passes the cycle control signal to unvoiced control signal generator 105c.

To understand the operation of circuit 104 in distinguishing between voiced nasal and voiced nonnasal sounds it is convenient at this point to refer to the graph in FIG. 2B. The upper dashed line in the graph of FIG. 2B illustrates the relatively small decrease of amplitude as a function of frequency for the spectrum of a typical nonnasal voiced sound, while the lower dashed line illustrates the relatively large decrease of amplitude as a function of frequency for the spectrum of a typical nasal voiced sound. It is therefore observed that in the frequency range from 0 to 3,000 cycles per second the difference in average amplitude between the components in the range from 0 to 1,500 cycles per second and the components in the range from 1,500 to 3,000 cycles per second is greater for nasal voiced sounds than for nonnasal voiced sounds. This characteristic difference in average amplitude is turned to account in the switching circuit apparatus shown in FIG. 2A to direct the cycle control signal to one of the two signal generators 105a or 105b.

The incoming speech wave from source 10 is applied simultaneously to two parallel paths, each path containing a bandpass filter, a rectifier, and a low pass filter connected in series. The bandpass filter 22a in the upper path is proportioned to pass speech components lying in the frequency range from 0 to 1,500 cycles per second, while bandpass filter 22b in the lower path is proportioned to pass speech components within the frequency range from 1,500 to 3,000 cycles per second. The output terminal of the upper path is connected to the minuend terminal of a conventional subtractor device 25, and the output terminal of the lower path is connected to the subtrahend terminal of subtractor 25. The magnitude of the difference signal developed at the output terminal of subtractor 25 is indicative of whether a nasal or a nonnasal voiced sound is present in the incoming speech wave. The difference signal from subtractor 25 is passed to a threshold detector 26, which may be of any well-known design, and if the magnitude of the difference signal exceeds the predetermined threshold of detector 26, thereby indicating the presence of a nasal voiced sound, an output signal from detector 26 is applied to the control terminal of relay R2. When relay R2 is energized by a control signal from detector 26, the incoming cycle control signal is passed to signal generator 105b. If the magnitude of the difference signal does not exceed the predetermined threshold of detector 26, thereby indicating the presence of a nonnasal voiced sound, relay R2 remains deenergized and the cycle control signal is delivered to signal generator 105a.
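The two-relay routing can be summarized as a small decision function. The sketch below is an illustrative restatement of the logic only: the enum, the function name, and both threshold parameters are hypothetical, and the band levels are assumed to be the rectified and smoothed outputs of the 0 to 1,500 and 1,500 to 3,000 cycles-per-second paths.

```python
from enum import Enum


class Generator(Enum):
    VOICED_NONNASAL = "105a"
    VOICED_NASAL = "105b"
    UNVOICED = "105c"


def route_cycle_control(voiced_amplitude, low_band_level, high_band_level,
                        voicing_threshold, nasal_threshold):
    """Pick the signal generator that receives the cycle control signal."""
    # Relay R1: deenergized when the voiced amplitude signal is too small.
    if voiced_amplitude < voicing_threshold:
        return Generator.UNVOICED
    # Subtractor 25: low band is the minuend, high band the subtrahend.
    difference = low_band_level - high_band_level
    # Threshold detector 26 energizes relay R2 for a large difference.
    if difference > nasal_threshold:
        return Generator.VOICED_NASAL
    return Generator.VOICED_NONNASAL


# Usage with made-up levels: strong voicing and a steep spectral tilt.
choice = route_cycle_control(voiced_amplitude=0.8, low_band_level=1.0,
                             high_band_level=0.2, voicing_threshold=0.1,
                             nasal_threshold=0.5)
```

Because relay R1 is evaluated first, the band-level comparison only matters for voiced sounds, which matches the series connection of the two relays.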

Signal generators 105a, 105b, 105c

The structure of signal generators 105a through 105c is based upon the structure of formant generator 24 shown in the above-mentioned E. E. David, Jr., et al. patent, modified, however, to take into account antiresonances where necessary. Thus in FIG. 7 signal generator 105a may be identical in construction with formant generator 24 illustrated in FIG. 2B of the E. E. David, Jr., et al. patent, that is, signal generator 105a comprises a monostable multivibrator 70a and free running multivibrators 71a and 72a, each of the two latter multivibrators generating a succession of positive-going pulses at rates on the order of 200 and 2,000 cycles per second, respectively. Monostable multivibrator 70a is triggered to its unstable state by the cycle control signal delivered from control circuit 103 through switching circuit 104, and the positive-going pulse generated by multivibrator 70a at the beginning of each cycle is applied through capacitor C1 to sampler 111a.

Each multivibrator 70a, 71a, 72a is connected to a ramp network 75a, 76a, 77a, respectively, which converts each positive-going output pulse into a ramp-shaped signal that increases continuously over a range of values corresponding to the range of frequencies within which a particular speech resonance is defined to occur. The ramp networks of signal generator 105a may also be identical in structure with the ramp networks described in the E. E. David, Jr., et al. patent, and during a single cycle the output signals of ramp networks 75a through 77a represent substantially all possible locations of the three principal resonances of voiced nonnasal sounds.
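The way three ramps running at different rates sweep through combinations of resonance locations can be sketched as follows. This is an illustrative model only: the cycle rate of 20 per second and the three frequency ranges are hypothetical values that do not come from the text, and the assignment of each ramp to a particular resonance is likewise an assumption. Only the 200 and 2,000 cycle per second rates are taken from the description.

```python
def ramp(t, rate, lo, hi):
    """Sawtooth that restarts `rate` times per second and rises from lo toward hi."""
    phase = (t * rate) % 1.0
    return lo + (hi - lo) * phase

# Hypothetical values, for illustration only.
CYCLE_RATE = 20.0                      # cycles of the cycle control signal per second
RATES = [CYCLE_RATE, 200.0, 2000.0]    # ramp networks 75a, 76a, 77a
RANGES = [(200.0, 900.0),              # assumed range scanned by ramp 75a, in c.p.s.
          (600.0, 2500.0),             # assumed range scanned by ramp 76a
          (1500.0, 3500.0)]            # assumed range scanned by ramp 77a

def control_signals(t):
    """Values of the three ramp outputs at time t (seconds from the start of a cycle)."""
    return tuple(ramp(t, r, lo, hi) for r, (lo, hi) in zip(RATES, RANGES))

if __name__ == "__main__":
    # During one slow ramp, the middle ramp repeats 10 times and the fast ramp 100 times,
    # so the three outputs pass through a dense grid of combinations.
    for k in range(5):
        t = k / 4000.0
        print(round(t, 5), [round(v, 1) for v in control_signals(t)])
```

Because each ramp completes ten sweeps during one sweep of the next slower ramp, the three outputs together visit a grid of combinations within a single cycle, which is the sense in which the generator covers substantially all possible locations.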

Signal generators 105b and 105c operate upon the same principles as signal generator 105a, but the set of signals generated by each of the two former circuits represents substantially all possible locations of the principal resonances and the principal antiresonance of voiced nasal sounds and unvoiced sounds, respectively. Thus signal generator 105b is provided with one monostable multivibrator 70b, four free running multivibrators 71b through 74b, and five corresponding ramp networks 75b through 79b, to produce a set of five voiced nasal control signals. Suitable frequencies of oscillation for multivibrators 71b through 74b are on the order of 200, 2,000, 20,000, and 200,000 cycles per second, respectively. Four of the five control signals represent the ranges of frequencies within which four of the principal resonances of voiced nasal sounds are defined to occur, and the fifth control signal represents the range of frequencies within which one of the principal antiresonances of voiced nasal sounds is defined to occur, for example, as shown in FIG. 6A.

Similarly, signal generator 105c is provided with one monostable multivibrator 70c, two free running multivibrators 71c and 72c, and three corresponding ramp networks 75c through 77c, to produce a set of three unvoiced control signals. The frequencies of oscillation for multivibrators 71c and 72c may be the same as those of multivibrators 71a and 72a in generator 105a. Two of the three control signals represent the ranges of frequencies within which two of the principal resonances of unvoiced sounds are defined to occur, and the third control signal represents the range of frequencies within which one of the principal antiresonances of unvoiced sounds is defined to occur, for example, as shown in FIG. 6B.

Spectrum synthesizers

Synthesizers suitable for reconstructing the spectra of voiced nasal, voiced nonnasal, and unvoiced sounds at both the transmitter and receiver terminals of this invention are illustrated in detail in FIG. 5. Voiced spectrum synthesizer 50 is designed to reconstruct replicas of the spectra of both voiced nasal and voiced nonnasal sounds, while unvoiced spectrum synthesizer 51 is designed to reconstruct replicas of the spectra of unvoiced sounds.

Within synthesizer 50, there is provided a pulse generator 501 that delivers a succession of brief pulses of uniform amplitude whose repetition frequency is controlled by a pitch control signal of the type derived by detector 109 of FIG. 2C. If desired, a relaxation oscillator of conventional design may be used as pulse generator 501. From pulse generator 501 the succession of pulses is passed to modulator 502, where the amplitudes of the incoming pulses are adjusted in response to a voiced amplitude control signal from detector 109 applied to the control terminal of modulator 502. The amplitude adjusted pulses appearing at the output terminal of modulator 502 are applied to a cascade of four electronically controllable resonance circuits 503 through 506 and one electronically controllable antiresonance circuit 507. Schematic diagrams of resonance and antiresonance circuits which may be utilized are shown in FIGS. 3A and 4A, respectively.

For the production of voiced nonnasal sounds, only the first three resonance circuits 503, 504, 505 are utilized, that is, voiced nonnasal control signals of the type developed by signal generator 105a are used to control resonance circuits 503, 504, 505 to reconstruct a replica of the spectrum of a voiced nonnasal sound. For the production of voiced nasal sounds, however, resonance circuit 506 and antiresonance circuit 507 are employed in addition to circuits 503 through 505, with a set of voiced nasal control signals of the type developed by signal generator 105b being furnished to control circuits 503 through 507. Thus synthesizer 50 reconstructs replicas of both voiced nasal and voiced nonnasal sounds, the particular spectrum depending upon the set of control signals which is applied to circuits 503 through 507.

The spectra of unvoiced sounds are reconstructed by synthesizer 51, which comprises a noise generator 511 for generating a noise voltage followed by a modulator 512. The bandwidth of the noise voltage depends upon the application of the unvoiced synthesizer, since as previously mentioned in connection with locator 11, it is necessary for the unvoiced synthesizer in locator 11 to have a wide band noise voltage, whereas at the receiver station, the bandwidth of the noise voltage may be on the order of the bandwidth of the incoming speech wave from source 10. The amplitude of the noise voltage is adjusted in modulator 512 in response to an unvoiced amplitude control signal supplied, for example, by an unvoiced amplitude detector of the type illustrated by element 110 in FIG. 1. From modulator 512 the amplitude adjusted noise voltage is applied to a cascade of two electronically controllable resonant circuits 513 and 514 and one electronically controllable antiresonant circuit 515, examples of suitable resonant and antiresonant circuits being shown in FIGS. 3A and 4A, respectively. Reconstruction of a replica of an unvoiced sound spectrum is controlled by a set of unvoiced control signals of the type developed by signal generator 105c shown in FIG. 7, with the two resonance control signals being applied to the respective control terminals of circuits 513 and 514, and the one antiresonance control signal being applied to the control terminal of circuit 515.

Synthesizers 50 and 51 are connected to the input terminals of adder 52 so that the two synthesizers may operate simultaneously, if desired, for example, to reconstruct so-called voiced fricative sounds. The output terminal of adder 52 is connected to reproducer 53, which may be a conventional loudspeaker, in order to convert the reconstructed spectra into audible speech sounds.
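The signal flow of FIG. 5 described above (excitation source, modulator, cascade of resonant and antiresonant sections, adder) can be sketched in discrete time. This is a hedged illustration rather than the patented circuitry: the analog circuits of FIGS. 3A and 4A are replaced by two-pole digital resonator sections and their exact inverses, a substitution made here for convenience. All function names, the 10,000 samples per second rate, and the example resonance frequencies are hypothetical; the bandwidth figures in the usage lines are those quoted elsewhere in the description.

```python
import math
import random

def section_coeffs(freq, bw, rate):
    """Coefficients of a two-pole digital resonator with unity gain at zero frequency."""
    c = -math.exp(-2.0 * math.pi * bw / rate)
    b = 2.0 * math.exp(-math.pi * bw / rate) * math.cos(2.0 * math.pi * freq / rate)
    return 1.0 - b - c, b, c

def resonance(x, freq, bw, rate):
    """Resonant section (digital stand-in for a circuit such as 503 or 513)."""
    a, b, c = section_coeffs(freq, bw, rate)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def antiresonance(x, freq, bw, rate):
    """Antiresonant section: the exact inverse of `resonance` (stand-in for 507 or 515)."""
    a, b, c = section_coeffs(freq, bw, rate)
    x1 = x2 = 0.0
    out = []
    for s in x:
        out.append((s - b * x1 - c * x2) / a)
        x1, x2 = s, x1
    return out

def pulse_generator(pitch, rate, n):
    """Uniform-amplitude pulses at the pitch frequency (role of generator 501)."""
    period = max(1, round(rate / pitch))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def noise_generator(n, seed=0):
    """Noise voltage (role of generator 511); seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def modulator(x, amplitude):
    """Amplitude adjustment (role of modulators 502 and 512)."""
    return [amplitude * s for s in x]

def cascade(x, resonances, antires, rate):
    """Pass x through each (freq, bw) resonance, then an optional (freq, bw) antiresonance."""
    for freq, bw in resonances:
        x = resonance(x, freq, bw, rate)
    if antires is not None:
        x = antiresonance(x, antires[0], antires[1], rate)
    return x

def synthesize(pitch, voiced_amp, voiced_res, voiced_anti,
               unvoiced_amp, unvoiced_res, unvoiced_anti, rate=10000, n=500):
    voiced = cascade(modulator(pulse_generator(pitch, rate, n), voiced_amp),
                     voiced_res, voiced_anti, rate)
    unvoiced = cascade(modulator(noise_generator(n), unvoiced_amp),
                       unvoiced_res, unvoiced_anti, rate)
    return [v + u for v, u in zip(voiced, unvoiced)]  # role of adder 52

if __name__ == "__main__":
    # Voiced nonnasal sound: three resonances only, unvoiced branch silenced.
    wave = synthesize(100.0, 1.0, [(500.0, 50.0), (1500.0, 50.0), (2500.0, 75.0)], None,
                      0.0, [(3000.0, 500.0), (4000.0, 500.0)], (1500.0, 1060.0))
    print(len(wave))
```

Passing three resonances and no antiresonance corresponds to the voiced nonnasal case; supplying a fourth resonance and an antiresonance corresponds to the voiced nasal case, and nonzero amplitudes on both branches correspond to simultaneous operation of the two synthesizers.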

Turning now to FIG. 3A, this is a schematic drawing of a resonant circuit comprising a resistive element R, an inductive element L, and a capacitive element C, all connected in series. The frequency of resonance of this circuit may be varied by changing the capacitance of capacitive element C, where if desired, C may be a well- [...] -creases [...] element R may be set to a preassigned value corresponding to any desired resonance bandwidth appropriate to one or more classes of human speech sounds. For example, element R of resonance circuits 503, 504, 505, and 506 of synthesizer 50 in FIG. 5 may be adjusted to produce half-power bandwidths on the order of 50, 50, 75 and 200 cycles per second, respectively, while element R of resonant circuits 513 and 514 of synthesizer 51 may be adjusted to produce half-power bandwidths of about 500 cycles per second.
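As a rough numerical companion to the bandwidth figures above, the standard relations for a series RLC circuit can be used: the resonant frequency is 1/(2π√(LC)) and the half-power bandwidth is approximately R/(2πL). The sketch below applies these textbook relations; the 1 henry inductance is a hypothetical value chosen only to make the numbers concrete, and the function names are illustrative.

```python
import math

def resonant_frequency(inductance, capacitance):
    """Resonant frequency, in cycles per second, of a series RLC circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))

def half_power_bandwidth(resistance, inductance):
    """Approximate half-power bandwidth, in cycles per second, of a series RLC circuit."""
    return resistance / (2.0 * math.pi * inductance)

def resistance_for(bandwidth, inductance):
    """Resistance that yields the requested half-power bandwidth."""
    return 2.0 * math.pi * bandwidth * inductance

if __name__ == "__main__":
    INDUCTANCE = 1.0  # henries; hypothetical value
    for circuit, bw in (("503", 50.0), ("504", 50.0), ("505", 75.0), ("506", 200.0)):
        print(circuit, round(resistance_for(bw, INDUCTANCE), 1), "ohms")
    # Reducing C to one quarter doubles the resonant frequency.
    print(resonant_frequency(INDUCTANCE, 1e-6), resonant_frequency(INDUCTANCE, 0.25e-6))
```

Because the bandwidth depends on R and L but not on C, the resonance can be moved by varying C alone while the preset bandwidth is retained.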

FIG. 4A is a schematic diagram of an antiresonant circuit suitable for use in voiced and unvoiced synthesizers 50 and 51 shown in FIG. 5. The voltage e2 across the series connected inductive element 42, resistive element 43 and capacitive element 44 is given by

$$e_2 = i\left(j\omega L + R + \frac{1}{j\omega C}\right) \qquad (4)$$

where $j=\sqrt{-1}$, ω denotes frequency, L is the inductance of element 42, R is the resistance of element 43, and C is the capacitance of element 44.

In order to produce an antiresonance at a desired frequency in the incoming signal to the circuit, denoted e1, it is necessary for the amplitude response curve for the circuit, as specified by the ratio e2/e1, to have the shape illustrated in FIG. 4B, that is, at zero frequency, ω=0, the gain must be unity, at the desired antiresonant frequency ω0, the gain must approach zero, and at frequencies higher than ω0 the gain must increase with increasing frequency. These conditions are satisfied if the current, i, is made equal to

$$i = j\omega C e_1 \qquad (5)$$

It is therefore evident from Equations 4 and 5 that the proper amplitude response is realized by passing the incoming signal e1 through a differentiator 40, which may be of any well-known construction, followed by an amplifier 41 having a high output impedance and delivering an output current that is proportional to its input voltage e_in according to the relationship

$$i = C e_{in} \qquad (6)$$

where, by the action of differentiator 40,

$$e_{in} = j\omega e_1$$

The antiresonant frequency is changed by varying the capacitance C of element 44. However, in order to preserve the amplitude response specified by Equation 5 for variations in the capacitance C of element 44, the factor C in Equation 6 requires that the output current delivered by amplifier 41 also be varied. This is accomplished by simultaneously applying the antiresonance control signal to amplifier 41 as well as to element 44.
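A minimal numerical sketch of the amplitude response described above, assuming the drive current is i = jωC·e1, so that the voltage across the series branch satisfies e2/e1 = 1 − ω²LC + jωRC. Under that assumption the gain is unity at zero frequency, falls to roughly ω0RC at the antiresonant frequency ω0 = 1/√(LC), and rises again above it. The component values (a 1 henry inductance, an antiresonance at 1,500 cycles per second, a bandwidth of 200 cycles per second obtained from R = 2πBL) and the helper names are illustrative assumptions.

```python
import math

def antiresonance_gain(freq, inductance, resistance, capacitance):
    """|e2/e1| for the series L-R-C branch driven by the assumed current i = j*w*C*e1."""
    w = 2.0 * math.pi * freq
    ratio = complex(1.0 - w * w * inductance * capacitance,
                    w * resistance * capacitance)
    return abs(ratio)

def tune(freq0, bandwidth, inductance=1.0):
    """Hypothetical helper: choose R and C for an antiresonance at freq0."""
    capacitance = 1.0 / ((2.0 * math.pi * freq0) ** 2 * inductance)
    resistance = 2.0 * math.pi * bandwidth * inductance
    return resistance, capacitance

if __name__ == "__main__":
    R, C = tune(1500.0, 200.0)
    for f in (0.0, 1500.0, 2000.0, 3000.0):
        print(f, round(antiresonance_gain(f, 1.0, R, C), 3))
    # Retuning C moves the antiresonance; the zero-frequency gain stays at unity
    # because the drive current scales with the same C.
    R2, C2 = tune(2500.0, 200.0)
    print(antiresonance_gain(0.0, 1.0, R2, C2))
```

The second tuning illustrates why the control signal is applied to both the amplifier and the capacitive element: the factor C appears in the drive current and in the branch impedance, so changing both together shifts the antiresonance without disturbing the low-frequency gain.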

The half-power bandwidth of the antiresonant frequency of the circuit of FIG. 4A may also be varied, if desired, by changing the resistance of element R in response to an appropriate control signal. However, in the absence of a bandwidth control signal, the resistance of element R may be adjusted to any desired value corresponding to the half-power bandwidth of one or more antiresonances of human speech sounds. For example, element R of antiresonant circuit 507 of synthesizer 50 in FIG. 5 may be adjusted to produce a half-power bandwidth of approximately 200 cycles per second, while element R of antiresonant circuit 515 in synthesizer 51 may be adjusted to produce a half-power bandwidth of approximately 1,060 cycles per second.

Although this invention has been described in terms of speech communications systems of the type shown in FIG. 1, it is to be understood that applications of the principles of this invention are not limited to such systems, but include the fields of automatic speech recognition, speech processing, and automatic message recording and reproduction. In addition, it is to be understood that the above-described embodiments are merely illustrative of the numerous arrangements which may be devised for the principles of this invention by those skilled in the art without departing from the spirit and scope of the invention.

What is claimed is:

1. A speech transmission system that comprises a transmitter station including a source of an incoming speech wave characterized by a succession of different speech sounds including voiced nonnasal sounds, voiced nasal sounds, and unvoiced sounds,

means supplied with said speech wave for deriving from said succession of different sounds a corresponding succession of groups of reduced bandwidth control signals representative of selected resonances and antiresonances in the spectra of said different sounds of said speech wave, and

means supplied with said speech wave for obtaining a set of control signals representative of the frequencies and the amplitudes of the fundamental components of said voiced nonnasal and voiced nasal sounds of said speech wave and the energy of said unvoiced sounds of said speech wave,

means for transmitting said corresponding succession of groups of control signals and said set of control signals to a receiver station, and

at said receiver station,

means for synthesizing from said corresponding succession of groups of control signals and said set of control signals a succession of spectra which is a replica of the spectra of said succession of different sounds of said speech wave.

2. A speech transmission system that comprises a transmitter station including a source of an incoming speech wave characterized by a succession of different speech sounds including voiced nonnasal sounds, voiced nasal sounds, and unvoiced sounds,

a resonance and antiresonance locator supplied with said speech wave for deriving from said succession of different sounds a corresponding succession of sets of control signals respectively representative of selected resonances and antiresonances in the spectrum of each of said different sounds of said speech wave, wherein a first one of said sets of control signals is representative of a plurality of selected resonances of voiced nonnasal sound spectra, a second one of said sets of control signals is representative of a plurality of selected resonances and a single selected antiresonance of voiced nasal sound spectra, and a third one of said sets of control signals is representative of a plurality of selected resonances and a single selected antiresonance of unvoiced sound spectra,

a first detector for deriving from said speech wave a pitch control signal indicative of the frequency of the fundamental component of said voiced nonnasal sounds and said voiced nasal sounds of said speech wave and a voiced amplitude control signal indicative of the amplitude of the fundamental component of said voiced nonnasal sounds and said voiced nasal sounds of said speech wave,

a second detector for deriving from said speech wave an unvoiced amplitude control signal indicative of the energy of said unvoiced sounds of said speech wave,

means for transmitting to a receiver station said succession of sets of control signals, said pitch control signal, said voiced amplitude control signal, and said unvoiced amplitude control signal,

and at said receiver station,

first synthesizing means supplied with said first and second sets of control signals, said pitch control signal, and said voiced amplitude control signal for reconstructing the spectra of said voiced nonnasal and said voiced nasal sounds of said speech wave,

second synthesizing means supplied with said third set of control signals and said unvoiced amplitude control signal for reconstructing the spectra of said unvoiced sounds of said speech wave, and

means for combining said reconstructed spectra to form a succession of spectra that is a replica of the spectra of said succession of different sounds of said incoming speech wave.

3. Apparatus for locating resonances and antiresonances in the spectra of a succession of different classes of speech sounds which comprises a source of an incoming speech wave characterized by a succession of different speech sounds including voiced nonnasal sounds, voiced nasal sounds, and unvoiced sounds,

a source of a first control signal representative of the amplitudes of the fundamental frequency components of voiced nonnasal and voiced nasal sound portions of said speech wave,

a source of a second control signal representative of the frequency of the fundamental frequency components of voiced nonnasal and nasal sound portions of said speech wave,

a source of a third control signal representative of the energy of unvoiced sound portions of said speech wave,

a source of a repetitive cycle control signal having a predetermined repetition rate,

three signal generators in one-to-one correspondence with said voiced nonnasal, voiced nasal, and unvoiced speech sounds, respectively, wherein each of said signal generators, in response to said cycle control signal, produces a set of resonance and antiresonance signals whose values collectively vary to represent substantially all possible combinations of locations of selected resonances and antiresonances of the corresponding one of said classes of speech sounds during each cycle of said cycle control signal, said first signal generator producing a set of resonance signals which correspond to a plurality of selected resonances of voiced nonnasal sounds, said second signal generator producing a set of resonance and antiresonance signals which correspond to a plurality of selected resonances and a single selected antiresonance of voiced nasal sounds, and said third signal generator producing a set of resonance and antiresonance signals which correspond to a plurality of selected resonances and a single selected antiresonance of unvoiced sounds,

a switching circuit provided with three input terminals and three output terminals in one-to-one correspondence with said plurality of signal generators for delivering said cycle control signal to a selected one of said signal generators according to the class of speech sound present in said speech wave,

means for applying said speech wave to a selected one of the input terminals of said switching circuit,

means for applying said first control signal to another of the input terminals of said switching circuit,

means for applying said cycle control signal to the third of the input terminals of said switching circuit,

means for connecting each output terminal of said switching circuit to one of said signal generators,

means supplied with said first, second, and third control signals for synthesizing from the selected set of signals produced by said selected one of said signal generators an artificial signal,

first analyzing means for deriving from said artificial signal a first group of signals representative of the amplitudes of the individual frequency components of the spectrum of said artificial signal,

second analyzing means for deriving from said speech wave a second group of signals representative of the amplitudes of the individual frequency components of the spectrum of said speech wave,

comparing means in circuit relation with said first and second analyzing means for obtaining from said first and second groups of amplitude signals an error signal indicative of a preassigned measure of the difference between said artificial spectrum and said speech spectrum, and

means responsive to said error signal for sampling the values of said selected set of signals which correspond to the minimum magnitude of said error signal during each cycle of said cycle control signal.

4. Apparatus as defined in claim 3 wherein said switching circuit comprises

a first relay means provided with an input terminal, a control terminal, and first and second output terminals,

a second relay means provided with an input terminal, a control terminal, and first and second output terminals,

means for applying said first control signal to the input terminal of said first relay means,

means for connecting the first output terminal of said first relay means to the input terminal of said third signal generator,

means for connecting the second output terminal of said first relay means to the input terminal of said second relay means,

means for connecting the first output terminal of said second relay means to the input terminal of said second signal generator,

means for connecting the second output terminal of said second relay means to the input terminal of said first signal generator,

means for distinguishing between voiced nasal sounds and voiced nonnasal sounds which includes

a first signal path provided with an input terminal, an output terminal, and comprising a first bandpass filter with a pass band extending from about zero to 1,500 cycles per second, a first rectifier, and a first low pass filter connected in series,

a second signal path provided with an input terminal, an output terminal, and comprising a second bandpass filter with a pass band extending from about 1,500 to 3,000 cycles per second, a second rectifier, and a second low pass filter connected in series,

means for simultaneously applying said speech wave to the input terminals of said first and second signal paths,

subtracting means provided with a minuend terminal, a subtrahend terminal, and an output terminal,

means for connecting the output terminal of said first signal path to the minuend terminal of said subtracting means,

means for connecting the output terminal of said second signal path to the subtrahend terminal of said subtracting means,

a threshold detector provided with an input terminal, an output terminal, and a threshold corresponding to the smallest expected difference in energy between the group of frequency components lying in the frequency range under about 1,500 cycles per second, and the group of frequency components lying within the frequency range that extends from about 1,500 to 3,000 cycles per second, and

means for connecting the output terminal of said subtracting means to the input terminal of said threshold detector, and

means for connecting the output terminal of said threshold detector to the control terminal of said second relay means.

5. Apparatus for deriving from a speech wave control signals indicative of the frequency and the amplitude of the fundamental speech frequency component which comprises

a source of an incoming speech wave,

filter means supplied with said speech wave for passing the fundamental component of said speech wave, wherein said filter means is provided with a 1/f² characteristic in the interval between about 70 and 250 cycles per second, where f denotes frequency,

a first subpath comprising a rectifier and a low pass filter connected in series for rectifying and averaging the fundamental component passed by said filter means,

a second subpath connected in parallel with said first subpath comprising frequency measuring means for obtaining a first control signal having a magnitude [...]

means supplied with said first control signal for squaring the magnitude of said first control signal,

multiplier means in circuit relationship with said first subpath and said squaring means for multiplying said rectified and averaged fundamental component by the squared magnitude of said first control signal to produce a second control signal indicative of the magnitude of the fundamental component passed by said filter means.

6. A speech synthesizer for reconstructing a replica of an incoming speech wave characterized by a succession of different sounds including voiced nonnasal sounds, voiced nasal sounds, and unvoiced sounds comprising

a source of a pitch control signal representative of the frequency of the fundamental component of each of said voiced nonnasal and voiced nasal speech sounds,

a source of a voiced amplitude control signal representative of the amplitude of the fundamental component of said voiced nonnasal sounds and said voiced nasal speech sounds,

a source of a first sequence of sets of control signals representative of selected resonance and antiresonance locations in the amplitude spectra of said corresponding succession of voiced nonnasal and voiced nasal speech sounds,

first synthesizer means responsive to said pitch control signal, said voiced amplitude control signal, and said first sequence of sets of control signals for reconstructing the amplitude spectra of said succession of voiced nonnasal and voiced nasal speech sounds,

a source of an unvoiced amplitude control signal representative of the energy of unvoiced speech sounds,

a source of a second set of control signals representative of selected resonance and antiresonance locations in the amplitude spectra of each of said unvoiced sounds in said succession of different sounds in said speech wave,

second synthesizer means responsive to said unvoiced amplitude control signal and said second set of resonance and antiresonance control signals for reconstructing the amplitude spectra of unvoiced speech sounds in said succession of different sounds in said speech wave,

adding means for combining the spectra reconstructed by said first and second synthesizer means to form a succession of spectra that is a replica of the spectra of said succession of voiced nonnasal sounds, voiced nasal sounds, and unvoiced sounds in said speech wave, and

reproducing means supplied with the succession of spectra from said adding means for converting said succession of spectra into a corresponding succession of audible speech sounds.

7. Apparatus as defined in claim 6 wherein said first synthesizer means for reconstructing the amplitude spectra of said succession of voiced nonnasal and voiced nasal speech sounds comprises

means responsive to said pitch control signal for generating a train of uniform amplitude pulses with a repetition rate equal to the frequency of said fundamental component,

means responsive to said voiced amplitude control signal and supplied with said train of pulses for adjusting the amplitudes of said pulses to correspond to the amplitude of said fundamental component, and

a cascade of four resonant circuits and one antiresonant circuit supplied with said train of amplitude adjusted pulses and responsive to said first sequence of sets of control signals for reconstructing a succession of voiced nonnasal amplitude spectra and voiced nasal amplitude spectra corresponding to said succession of voiced nonnasal sounds and voiced nasal sounds in said incoming speech wave.

8. Apparatus as defined in claim 6 wherein said second synthesizer means for reconstructing the amplitude spectra of said succession of unvoiced speech sounds comprises

[...] circuit supplied with said amplitude adjusted noise signal and responsive to said second set of control signals for reconstructing a succession of unvoiced amplitude spectra corresponding to said succession of unvoiced speech sounds in said incoming speech wave.

9. Apparatus for constructing an amplitude spectrum with an antiresonance that can be varied in location which comprises a source of an incoming signal,

a source of an antiresonance control signal having a magnitude which is representative of a relatively wide range of antiresonance locations,

a differentiator provided with an input terminal and an output terminal for developing from said incoming signal an output signal representative of the derivative of said incoming signal,

means for applying said incoming signal to the input terminal of said differentiator,

a variable transconductance amplifier provided with an input terminal, an output terminal and a control terminal,

an inductive element, a resistive element, and a variable capacitive element connected in series,

means for connecting the output terminal of said amplifier to said inductive element, and

means for applying said antiresonance control signal to the control terminal of said amplifier and to vary the capacitance of said variable capacitive element,

whereby the output signal developed across said series connected inductive, resistive, and variable capacitive elements contains an antiresonance whose location is determined by the magnitude of said antiresonance control signal.

References Cited by the Examiner

UNITED STATES PATENTS

2,817,707 12/57 Weibel 179-1

DAVID G. REDINBAUGH, Primary Examiner.

Claims

1. A SPEECH TRANSMISSION SYSTEM THAT COMPRISES A TRANSMITTER STATION INCLUDING A SOURCE OF AN INCOMING SPEECH WAVE CHARACTERIZED BY A SUCCESSION OF DIFFERENT SPEECH SOUNDS INCLUDING VOICED NONNASAL SOUNDS, VOICED NASAL SOUNDS, AND UNVOICED SOUNDS, MEANS SUPPLIED WITH SAID SPEECH WAVE FOR DERIVING FROM SAID SUCCESSION OF DIFFERENT SOUNDS A CORRESPONDING SUCCESSION OF GROUPS OF REDUCED BANDWIDTH CONTROL SIGNALS REPRESENTATIVE OF SELECTED RESONANCES AND ANTIRESONANCES IN THE SPECTRA OF SAID DIFFERENT SOUNDS OF SAID SPEECH WAVE, AND MEANS SUPPLIED WITH SAID SPEECH WAVE FOR OBTAINING A SET OF CONTROL SIGNALS REPRESENTATIVE OF THE FREQUENCIES AND THE AMPLITUDES OF THE FUNDAMENTAL COMPONENTS OF SAID VOICED NONNASAL AND VOICED NASAL SOUNDS OF SAID SPEECH WAVE AND THE ENERGY OF SAID UNVOICED SOUNDS OF SAID SPEECH WAVE, MEANS FOR TRANSMITTING SAID CORRESPONDING SUCCESSION OF GROUPS OF CONTROL SIGNALS AND SAID SET OF CONTROL SIGNALS TO A RECEIVER STATION, AND AT SAID RECEIVER STATION, MEANS FOR SYNTHESIZING FROM SAID CORRESPONDING SUCCESSION OF GROUPS OF CONTROL SIGNALS AND SAID SET OF CONTROL SIGNALS A SUCCESSION OF SPECTRA WHICH IS A REPLICA OF THE SPECTRA OF SAID SUCCESSION OF DIFFERENT SOUNDS OF SAID SPEECH WAVE.