US3069507A - Autocorrelation vocoder - Google Patents
Autocorrelation vocoder
- Publication number
- US3069507A (application US48422A)
- Authority
- US
- United States
- Prior art keywords
- speech
- autocorrelation
- samples
- signals
- control signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
Definitions
- the channel vocoder described in H. W. Dudley Patent 2,151,091, issued March 21, 1939.
- the speech amplitude spectrum is divided into frequency bands by a bank of band-pass filters, and the energy contained within each band is represented by a narrow-band control signal.
- the control signals adjust the energy of selected frequency bands of an excitation spectrum generated at the synthesizer.
- the energy-adjusted frequency bands are then combined to produce synthetic speech.
- Synthetic speech produced by the Dudley vocoder is distorted by the inherent limitations of the band-pass filters that it employs.
- the speech amplitude spectrum should be represented by a group of points; as practiced by the Dudley vocoder, however, the finite widths of the band-pass filters produce points that are in fact averages of many points within the frequency bands passed by the filters.
- an incoming speech wave is correlated with itself at an analyzer terminal to obtain a number of samples of each period of the speech autocorrelation function.
- the bandwidth required to transmit these samples is much smaller than that required to transmit the original speech wave, since the speech autocorrelation function changes very little from one period to the next.
- the samples are transmitted over a narrow-band channel to a synthesizer, where they are used in the reconstruction of artificial speech by serving as control signals to adjust the amplitude of an excitation signal generated at the synthesizer.
- the intelligibility of artificial speech reconstructed from autocorrelation control signals is impaired by the squaring of the speech amplitude spectrum inherent in the autocorrelation function representation of speech. Squaring the speech amplitude spectrum changes its shape, thereby altering the characteristics of most sounds.
- the amplitude spectrum of the autocorrelation samples is obtained by passing the samples through a first network of resistance elements, which performs a Fourier transformation upon the autocorrelation samples.
- the spectrum signals produced by this first network are then rooted by a group of square-root-taking circuits, whose rooted spectrum output signals represent an amplitude spectrum of the same shape as the amplitude spectrum of the original speech wave.
- Artificial speech reproduced from the rooted spectrum signals is therefore free of the distortion due to amplitude spectrum squaring.
- they are converted into autocorrelation samples by a second Fourier transformation network also composed of resistance elements.
- the second Fourier transformation network in the square-root-taking apparatus referred to above is replaced by a phase transformation network that converts the rooted spectrum signals into correlation signals having a predetermined phase spectrum.
- the operation of the phase transformation network is based upon a Fourier transformation in which the desired phase angles have been inserted.
- a fixed number of samples of the speech autocorrelation function is obtained at the analyzer terminal, the number of samples being kept small, consistent with good quality speech, in order to conserve transmission channel bandwidth.
- the samples transmitted to the synthesizer terminal represent a portion of each autocorrelation period, and by deriving from each sample two equal amplitude, symmetrically located points on each reconstructed period, an artificial speech wave whose symmetry approximates the symmetry of the original speech autocorrelation function is reconstructed.
- the weighting function thereby reducing the discontinuities and the associated distortion in the artificial speech.
- the amount by which the magnitude of each sample is reduced depends upon the particular weighting function that is chosen.
- An optimum weighting function minimizes distortion with a negligible loss in vocal characteristics. Since the previously mentioned amplitude unsquaring process is to be performed upon the samples after weighting, the weighting function must possess a real and nonnegative amplitude spectrum in order to assure physically realizable results from the square-root-taking process.
- FIGS. 1A, 1B, and 1C are a group of waveform diagrams of assistance in explaining this invention.
- FIG. 1D is a schematic block diagram showing a complete speech transmission system based upon the principles of this invention;
- FIG. 2 is a block schematic diagram showing apparatus for coding speech in terms of a fixed number of autocorrelation control signals;
- FIGS. 3A, 3B, 3C, 3D, 3E, 3F, and 3G are a group of waveforms of assistance in explaining the operation of the apparatus of FIG. 2;
- FIG. 4 is a block schematic diagram showing apparatus for reconstructing artificial speech from autocorrelation control signals;
- FIG. 5A is a schematic block diagram showing apparatus for unsquaring the amplitude spectrum of the autocorrelation control signals;
- FIG. 5B is a schematic block diagram showing apparatus for converting rooted amplitude spectrum signals into their autocorrelation function counterparts;
- FIG. 6 is a schematic block diagram showing apparatus for generating autocorrelation control signals having any desired phase spectrum from rooted amplitude spectrum signals.
- FIG. 7 is a schematic block diagram showing apparatus for reconstructing artificial speech having any desired phase spectrum from phase-transformed autocorrelation control signals.
- a speech wave g(t) with period T may be expanded in a Fourier series (Equation 1), where the coefficients G(f_n) constitute the amplitude spectrum of g(t) and the phase angles ψ_n constitute the phase spectrum.
- the autocorrelation function of g(t) is defined as the average $\varphi(\tau) = \frac{1}{T}\int_0^T g(t)\,g(t-\tau)\,dt$ (Equation 2), where φ(τ) has the same period as g(t) and τ represents the amount of time by which g(t) is delayed before being multiplied together with the undelayed speech wave.
- the autocorrelation function may also be expanded in a Fourier series (Equation 3).
- the amplitude spectrum of φ(τ) is the square of the amplitude spectrum of g(t). Further, it is noted in a comparison of Equations 1 and 3 that all of the phase angles of the original speech wave have become zero in the speech autocorrelation function.
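The squaring and phase-zeroing properties stated above can be checked numerically. The following is a minimal sketch, not the patent's apparatus: it assumes a sampled periodic wave with three harmonics whose amplitudes and phases are arbitrary illustrative values, and the helper names `periodic_autocorrelation` and `harmonic` are hypothetical. With the averaging convention of Equation 2, each harmonic of the autocorrelation comes out as half the squared amplitude of the corresponding speech harmonic, with zero phase.

```python
import math

def periodic_autocorrelation(g):
    """Average of g[t] * g[t - lag] over one full period, for every lag."""
    n = len(g)
    return [sum(g[t] * g[(t - lag) % n] for t in range(n)) / n
            for lag in range(n)]

def harmonic(x, k):
    """Amplitude and phase of the k-th harmonic of one period of x."""
    n = len(x)
    re = 2.0 / n * sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
    im = -2.0 / n * sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
    return math.hypot(re, im), math.atan2(im, re)

# Illustrative (assumed) harmonic amplitudes and phases of one speech period.
N = 64
amps = {1: 1.0, 2: 0.5, 3: 0.25}
phases = {1: 0.3, 2: -1.1, 3: 2.0}
g = [sum(a * math.cos(2 * math.pi * k * t / N + phases[k]) for k, a in amps.items())
     for t in range(N)]

phi = periodic_autocorrelation(g)
for k in amps:
    amp_g, phase_g = harmonic(g, k)
    amp_phi, phase_phi = harmonic(phi, k)
    print(k, round(amp_g, 6), round(phase_g, 6), round(amp_phi, 6), round(phase_phi, 6))
```

The printed columns are the harmonic number, the amplitude and phase of the wave, and the amplitude and phase of its autocorrelation function.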
- FIG. 1A there is shown the amplitude spectrum of a typical voiced sound.
- FIG. 1B shows the amplitude spectrum of the autocorrelation function corresponding to the speech wave of FIG. 1A. It is observed in a comparison of the curves that squaring the amplitude spectrum, as given in Equation 3, doubles the differences between peaks of the amplitude spectrum, thereby suppressing the relatively small peaks and changing the characteristics of the sound.
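The statement that squaring doubles the differences between peaks holds when the peaks are compared on a logarithmic (decibel) scale. A small sketch with made-up peak amplitudes (the values in `peaks` are illustrative, not taken from FIG. 1A):

```python
import math

def to_db(amplitude):
    """Amplitude expressed in decibels."""
    return 20.0 * math.log10(amplitude)

peaks = [1.0, 0.5, 0.1]            # assumed spectral peak amplitudes
squared = [a * a for a in peaks]   # the same peaks after spectrum squaring

gap_before = to_db(peaks[0]) - to_db(peaks[2])
gap_after = to_db(squared[0]) - to_db(squared[2])
print(gap_before, gap_after)
```

A peak that sits a given number of decibels below the strongest peak therefore sits twice as far below it after squaring, which is why the weaker peaks are suppressed.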
- FIG. 1C shows several periods of a typical speech autocorrelation function, where it is noted that the autocorrelation function is symmetrical about the center of each period.
- the amplitude spectrum of the autocorrelation function may also be expanded in a Fourier series where, from the symmetry of the autocorrelation function about the center of each period, the amplitude spectrum may be expressed in terms of autocorrelation function samples over either of the half periods −T/2 ≤ τ ≤ 0 or 0 ≤ τ ≤ T/2. Hence Equation 5 may be rewritten in terms of half-period samples.
- Complete Speech Transmission System. Referring first to FIG. 1D, an incoming speech wave from source 80 is applied to speech autocorrelation function analyzer 81, which derives samples of the speech autocorrelation function from the speech wave. The details of source 80, analyzer 81, and the other elements of FIG. 1D are described below. The samples are passed through weighting network 82 in order to reduce the magnitudes of discontinuities in the sampled autocorrelation function.
- the amplitude spectrum of the weighted samples from network 82 is unsquared by Fourier transformation network 83 and square-root-taking circuits 84.
- the rooted spectrum signals from circuits 84 are converted into autocorrelation function samples with zero phase angles by Fourier transformation network 85, and into correlation function samples having predetermined nonzero phase angles by phase transformation network 86.
- Switch 87 may be manually set to pass the output signals of either network 85 or network 86 to a speech synthesizer 88, depending upon which set of signals is desired for a particular application of this invention.
- a speech wave is reconstructed from the output signals of either network 85 or network 86 by speech synthesizer 88, and artificial speech is reproduced from the reconstructed speech wave by reproducer 89.
- Analyzer 81 is located at a transmitter terminal, while speech synthesizer 88 and phase transformation network 86 are located at a receiver terminal. Elements 82, 83, 84, and 85, however, may be located at either the transmitter terminal or the receiver terminal.
- Filter 210 is proportioned to pass only those frequencies in the band from 0 to W cycles per second, where W may be chosen to be 4,000.
- the band-limited speech wave output of filter 210 is applied simultaneously to a tapped delay line 231, for example, a tapped acoustic delay line, and to a bank of multipliers, for example, modulators M1, M2, ..., Mp, each of which is provided with two input terminals and one output terminal.
- Delay line 231, which is terminated in a matched impedance 211 to prevent reflection, is provided with taps P1, P2, ..., Pp, at which appear signals proportional to the speech wave at various delay times, g(t−τ1), g(t−τ2), ..., g(t−τp), where τ1, τ2, ..., τp are the various delay times corresponding to taps P1, P2, ..., Pp, respectively.
- Modulators M1, M2, ..., Mp, in addition to receiving the undelayed speech wave at one of their input terminals, have their second input terminals connected to delay line taps P1, P2, ..., Pp, respectively, to receive the variously delayed speech wave as a second input signal.
- the modulators develop at their output terminals signals proportional to the products g(t)g(t−τ1), g(t)g(t−τ2), ..., g(t)g(t−τp), which are passed to a bank of averaging devices, for example, low-pass filters F1, F2, ..., Fp, each having a cutoff of 25 cycles per second.
- Filters F1, F2, ..., Fp develop at their output points signals proportional to averages of the product signals received from modulators M1, M2, ..., Mp. From Equation 2, these signals are proportional to samples of the autocorrelation function at specific delay times, φ(τ1), φ(τ2), ..., φ(τp).
- a signal proportional to φ(0) is obtained by applying the undelayed speech wave to both input terminals of modulator M0, and by connecting the output terminal of M0 to low-pass filter F0.
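The delay-multiply-average chain of FIG. 2 can be imitated in discrete time. This is a rough sketch under stated assumptions, not the analog circuit: the 8,000 samples-per-second rate, the one-pole recursive averager standing in for each 25 cycles-per-second low-pass filter, and the function name `analyzer` are all choices made here for illustration.

```python
import math

def analyzer(g, lags, fs=8000.0, cutoff_hz=25.0):
    """Running averages of g[t] * g[t - lag], one per tap of the delay line."""
    # One-pole smoother used as a stand-in for the 25 cps low-pass filters.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    state = [0.0] * len(lags)
    for t in range(len(g)):
        for i, lag in enumerate(lags):
            delayed = g[t - lag] if t >= lag else 0.0   # tap output
            product = g[t] * delayed                    # modulator output
            state[i] += alpha * (product - state[i])    # averaging filter
    return state

# Test wave: a 200 cps cosine, one second long (assumed values).
fs = 8000.0
g = [math.cos(2 * math.pi * 200.0 * t / fs) for t in range(8000)]
controls = analyzer(g, lags=[0, 10, 20], fs=fs)
print([round(c, 2) for c in controls])
```

For a unit cosine the autocorrelation is 0.5·cos(2πfτ), so the three outputs settle near 0.5, 0, and −0.5, with a small residual ripple left by the simple smoother.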
- Weighting Network. The number of taps with which delay line 231 is provided, and the number of associated modulators and filters, determine the number of autocorrelation samples appearing at the output points of filters F1, F2, ..., Fp. From the previously noted symmetry of the speech autocorrelation function, transmission channel bandwidth is conserved by sampling half periods of the autocorrelation function, the symmetry of each period being restored at the synthesizer from the transmitted samples; for example, a 3 millisecond period need only be sampled over the delay interval 0 to 1.5 milliseconds.
- Variations in the period of the autocorrelation function prevent accurate sampling of each half period with a fixed number of taps, modulators, and filters.
- the samples obtained by the apparatus of FIG. 2 are passed through a weighting network N that reduces the magnitude of each sample, thereby reducing the discontinuities and the associated distortion in the artificial speech.
- the weighting network N is located at the analyzer terminal, but, if desired, it may be located at the synthesizer terminal.
- Weighting network N consists of a group of resistors R0, R1, R2, ..., Rp, one for each autocorrelation sample, connected to the output terminals of filters F0, F1, F2, ..., Fp, respectively.
- the resistance values of the elements of circuit N are determined by the particular weighting function selected on the basis of the following criteria: (a) discontinuities in periods reconstructed from the weighted samples must be very small, consistent with the preservation of important speech characteristics; and (b) the amplitude spectrum of the weighted samples must be real and nonnegative.
- Many suitable weighting functions satisfying these criteria are available, for example, the class of decreasing autocorrelation functions, the decreasing property satisfying (a), and the autocorrelation property satisfying (b).
- One such function is shown graphically in FIG. 3E.
- FIG. 3E illustrates an example of a suitable weighting function w(τ), which decreases in value with increasing delay time.
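One member of the class of decreasing autocorrelation functions is the triangular taper, which is the autocorrelation of a rectangular pulse. The sketch below assumes that choice (the patent does not say which function FIG. 3E depicts) and checks both criteria numerically; `triangular_weights` and `spectrum_of_symmetric` are hypothetical helper names, and the value p = 8 is arbitrary.

```python
import math

def triangular_weights(p):
    """w(k) = 1 - k/(p+1) for lags k = 0..p (autocorrelation of a rectangle)."""
    return [1.0 - k / (p + 1) for k in range(p + 1)]

def spectrum_of_symmetric(w, num_points=64):
    """Cosine transform of the symmetric sequence w(-p)..w(p) on a frequency grid."""
    out = []
    for j in range(num_points + 1):
        omega = math.pi * j / num_points
        out.append(w[0] + 2.0 * sum(w[k] * math.cos(omega * k)
                                    for k in range(1, len(w))))
    return out

p = 8
tri = triangular_weights(p)
flat = [1.0] * (p + 1)          # no weighting at all, for comparison

print(min(spectrum_of_symmetric(tri)))   # stays at or above zero
print(min(spectrum_of_symmetric(flat)))  # dips below zero
```

The unweighted case has a spectrum that goes negative, so a square root could not be taken everywhere; the triangular taper avoids this while still decreasing with delay.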
- the signals developed at the output terminals of the resistors are weighted samples
- Artificial symmetrical periods reconstructed from a fixed number of samples after weighting are shown in FIGS. 3F and 3G, corresponding to the original periods shown in FIGS. 3A and 3B, respectively.
- the smoothness with which these periods begin and end is to be compared with the abrupt discontinuities of artificial periods reconstructed from the same samples without weighting, as illustrated in FIGS. 3C and 3D.
- the weighted autocorrelation samples appearing at the output terminals of the weighting network N in FIG. 2 constitute a set of control signals from which artificial speech may be synthesized.
- the speech autocorrelation function changes very little from period to period, hence the variation of the control signals is very small.
- the control signals individually occupy relatively narrow frequency bands, on the order of 25 cycles per second, and the entire group of control signals may be transmitted over a much narrower frequency band than is required for transmission of the original speech wave.
- the control signals may be transmitted from the analyzer terminal to the synthesizer terminal by means of any well-known transmission medium to meet the requirements of the particular application of this invention.
- the weighted autocorrelation control signals must be supplemented with a signal whose characteristics indicate whether the instantaneous speech sound is voiced or unvoiced, and if voiced, its fundamental pitch frequency.
- an excitation signal is derived from the supplementary signal, the excitation signal characteristics being closely correlated with the characteristics of the supplementary signal.
- Artificial speech reconstructed from the excitation signal under the control of the autocorrelation samples thus preserves faithfully the characteristics of the original speech.
- the supplementary signal is derived by passing the speech wave output of source 20 through band-pass filter 2M.
- the supplementary signal may also be a conventional voiced-unvoiced pitch signal, if desired.
- Synthesizer. Artificial speech is reproduced from the transmitted autocorrelation control signals and supplementary signal at a synthesizer, a preferred embodiment of which is shown in FIG. 4.
- the synthesizer reconstructs an artificial speech wave with symmetrical periods by using the supplementary signal to generate an excitation signal and by using the autocorrelation samples as control signals to form from the excitation signal symmetrically located samples of an artificial period.
- the symmetry of the artificial periods reconstructed in this fashion approximates the symmetry of the original autocorrelation periods.
- each of the incoming control signals is applied to the control terminal of a modulator whose output terminal is connected to delay line 441.
- Delay line 441 is terminated in a matched impedance 421 to prevent reflection and is provided with a number of taps arranged in symmetrically located pairs about center tap S40.
- the output terminal of modulator L40 is connected to center tap S40, and the output terminal of each of the other modulators L41, L42, ..., L4p is connected to a pair of taps, S41 and s41, S42 and s42, ..., S4p and s4p, respectively, disposed at equal intervals about center tap S40.
- control signals φ(0), φ(τ1), φ(τ2), ..., φ(τp) are applied to the control terminals of modulators L40, L41, L42, ..., L4p, respectively, and adjust the amplitude of an excitation signal supplied in parallel to the input terminals of the modulators from excitation signal generator 491.
- Generator 491 derives the excitation signal from the incoming supplementary signal, which is first passed through an equalizing delay 41 to synchronize the supplementary signal with the control signals.
- Excitation signal generator 491 which is fully described in a patent application of M. R. Schroeder, Serial No.
- excitation signal generator 491 may be the usual buzz-hiss source.
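For the buzz-hiss alternative, a conventional arrangement is a pulse train at the pitch frequency for voiced sounds and random noise for unvoiced sounds. The sketch below shows that conventional idea only; it does not represent the generator of the Schroeder application, and the function name, sampling rate, and noise distribution are assumptions.

```python
import random

def buzz_hiss(voiced, pitch_hz, num_samples, fs=8000, seed=0):
    """Pulse train (buzz) when voiced, uniform noise (hiss) when unvoiced."""
    if voiced:
        period = max(1, round(fs / pitch_hz))   # samples per pitch period
        return [1.0 if t % period == 0 else 0.0 for t in range(num_samples)]
    rng = random.Random(seed)                   # seeded so runs are repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

buzz = buzz_hiss(True, 100.0, 800)    # 100 cps pitch, 0.1 second
hiss = buzz_hiss(False, 0.0, 800)
print(sum(buzz), len(hiss))
```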
- the incoming control signals adjust the amplitude of the excitation signal from generator 491, and the amplitude-adjusted excitation signals derived by the modulators are passed to delay line 441 via the various tap connections.
- the output signals of the modulators reappear at the output terminal of delay line 441 as samples of symmetrical periods of an artificial wave. These samples are smoothed to form an artificial wave by filter 442, proportioned to eliminate all frequencies greater than 4,000 cycles per second.
- the electrical wave formed by filter 442 is converted into audible and intelligible speech by conventional reproducer 443 connected to the output terminal of filter 442.
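The symmetric tap arrangement of FIG. 4 amounts to mirroring the half-period control signals about the center tap and letting each excitation pulse launch one such symmetric period down the delay line. A simplified sketch, assuming control signals that stay constant over the excerpt and omitting the final smoothing filter; `symmetric_period` and `synthesize` are hypothetical names and the numbers are illustrative.

```python
def symmetric_period(controls):
    """Mirror phi(0), phi(1), ..., phi(p) into phi(p)..phi(1), phi(0), phi(1)..phi(p)."""
    return controls[:0:-1] + controls

def synthesize(excitation, controls):
    """Sum the modulator outputs injected at the symmetric delay-line taps."""
    taps = symmetric_period(controls)
    out = [0.0] * (len(excitation) + len(taps) - 1)
    for t, e in enumerate(excitation):
        for k, c in enumerate(taps):
            out[t + k] += e * c   # excitation sample scaled by a control signal
    return out

controls = [1.0, 0.5, 0.2]                    # assumed phi(0), phi(tau1), phi(tau2)
excitation = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # two pitch pulses, five samples apart
wave = synthesize(excitation, controls)
print(wave)
```

Each pitch pulse yields one period that is symmetrical about its center, which is the property the synthesizer relies on to restore the half of each period that was not transmitted.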
- FIG. 5A. Apparatus for unsquaring the amplitude spectrum of the weighted autocorrelation control signals is shown in FIG. 5A. The apparatus of FIG. 5A may be located in its entirety at either the analyzer or the synthesizer, or, if desired, the component parts may be conveniently divided between the two stations.
- Network 50 of FIG. 5A converts the incoming control signals into signals representing specific values of the corresponding amplitude spectrum, in accordance with the Fourier transformation of Equation 5.
- the amplitude spectrum signals developed at the output terminals of network 50 are then passed through rooting circuits H51, H52, ..., H5q, which perform a square-root-taking operation upon the spectrum signals.
- the rooted spectrum signals derived by the rooting circuits represent an amplitude spectrum of approximately the same shape as the amplitude spectrum of the original speech wave, as given by Equation 7.
- the autocorrelation signals corresponding to the rooted spectrum signals are obtained by passing the latter through network 51 of FIG. 5B, which performs an inverse Fourier transformation in accordance with Equation 3.
- The autocorrelation signals appearing at the output terminals of network 51 constitute a group of control signals from which artificial speech may be reconstructed by applying them, together with a supplementary signal, to a synthesizer such as that shown in FIG. 4.
- Each row of resistors corresponds to a particular series in Equation 8, and the resistance values of the individual resistors in each row are proportional to the absolute values of the cosine factors appearing in the corresponding series.
- the exact configuration of network 50 depends upon two factors: the frequency resolution or number of amplitude spectrum values desired in the reconstructed speech wave; and the number of autocorrelation samples obtained at the analyzer.
- the frequency resolution determines the number of rows of resistors, q.
- the number of autocorrelation samples determines the number of resistors in each row, p.
- the incoming weighted autocorrelation samples φ(τ1)w(τ1), φ(τ2)w(τ2), ..., φ(τp)w(τp) are applied to the input points I1, I2, ..., Ip, respectively, of network 50, and the sample φ(0)w(0) is applied in parallel to the second input terminals of adders B51, B52, ..., Bq.
- the signal developed at the output terminal of each adder is a linear combination of the weighted autocorrelation samples, which, in accordance with Equation 8, is proportional to a particular amplitude spectrum value.
- the output signals Φw(f1), Φw(f2), ..., Φw(fq) of network 50 are applied to a bank of rooters H51, H52, ..., H5q, of any suitable variety. Each rooter develops at its output point a signal whose magnitude is proportional to the square root of the magnitude of the signal applied to its input point.
- these signals represent an amplitude spectrum of approximately the same shape as the amplitude spectrum of the original speech wave; hence, artificial speech reconstructed from either the rooted spectrum signals or their autocorrelation counterparts is substantially free of the distortion caused by amplitude spectrum squaring.
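The unsquaring step can be sketched as a cosine transform over half-period samples followed by a square root. The code below is an illustration under assumptions: Equation 8 is not legible in this text, so a standard cosine sum over uniformly spaced lags is used in its place, weighting is left out for clarity, and the harmonic amplitudes are made up. `half_period_spectrum` and `take_roots` are hypothetical names.

```python
import math

def half_period_spectrum(samples, harmonics):
    """Cosine transform of a symmetric period given only lags 0..m (half a period)."""
    m = len(samples) - 1          # half period, in lag steps
    n = 2 * m                     # full period, in lag steps
    out = []
    for h in harmonics:
        acc = samples[0] + samples[m] * math.cos(math.pi * h)
        acc += 2.0 * sum(samples[k] * math.cos(2 * math.pi * h * k / n)
                         for k in range(1, m))
        out.append(acc / n)
    return out

def take_roots(values):
    """Square-root-taking circuits; tiny negative values are clamped to zero."""
    return [math.sqrt(max(v, 0.0)) for v in values]

# Autocorrelation samples of a wave with assumed harmonic amplitudes 1, 0.5, 0.25.
amps = {1: 1.0, 2: 0.5, 3: 0.25}
m = 16
phi = [sum(a * a / 2 * math.cos(2 * math.pi * h * k / (2 * m)) for h, a in amps.items())
       for k in range(m + 1)]

rooted = take_roots(half_period_spectrum(phi, [1, 2, 3]))
print([round(r, 6) for r in rooted])
```

The rooted values stand in the ratio 4 : 2 : 1, the same as the original amplitudes, whereas the unrooted spectrum values stand in the ratio 16 : 4 : 1.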
- Each row of resistors corresponds to a particular series in Equation 9, and the resistance values of the individual resistors in each row are proportional to the absolute values of the cosine factors appearing in the corresponding series.
- the rooted spectrum signals from the rooters of FIG. 5A are applied to the input terminals of network 51, thereby developing at each common output point a linear combination of signals proportional to a particular series of Equation 9.
- a signal proportional to φ'(0) is obtained by applying each of the rooted spectrum signals to an input terminal of adder 53; in accordance with Equation 9, the linear combination of rooted spectrum signals formed at the output terminal 00 of adder 53 is proportional to φ'(0).
- the configuration of network 51 is determined by two quantities: the number of amplitude spectrum values produced by network 50; and the number of autocorrelation samples to be supplied to the synthesizer.
- the number of rows of resistors, p, is determined by the number of autocorrelation samples to be supplied to the synthesizer, and the number of resistors in each row is determined by the number of rooted spectrum signals from the rooters.
- the establishment of these quantities fixes both the resistance values and the positions of the switches in network 51. For a particular application in which these quantities have been established, the switches may be replaced by appropriate permanent connections, and network 51 need contain only those polarity inverters actually required by negative cosine values in Equation 9.
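One row of such a resistor network can be modeled as a weighted sum in which each branch carries the magnitude of a cosine factor and a three-position switch supplies the sign. This is a hedged sketch: the cosine factors of Equation 9 are assumed to be cos(2π·j·k/n) for harmonic j and lag k, the branch weight is treated as an abstract gain rather than an actual resistance value, and `switch_setting` and `network_row` are hypothetical names.

```python
import math

def switch_setting(cosine, eps=1e-12):
    """Switch position for one resistor: direct, through an inverter, or open."""
    if abs(cosine) < eps:
        return "open"
    return "direct" if cosine > 0 else "inverted"

def network_row(inputs, cosines):
    """Linear combination formed at the common output point of one row."""
    total = 0.0
    for x, c in zip(inputs, cosines):
        setting = switch_setting(c)
        if setting == "open":
            continue                      # zero cosine: branch disconnected
        branch = abs(c) * x               # branch weight set by |cosine|
        total += branch if setting == "direct" else -branch
    return total

rooted = [0.5, 0.25, 0.125, 0.0625]       # assumed rooted spectrum signals
lag, n = 2, 16
cosines = [math.cos(2 * math.pi * j * lag / n) for j in range(1, 5)]

settings = [switch_setting(c) for c in cosines]
row_output = network_row(rooted, cosines)
print(settings, row_output)
```

Once the number of lags and spectrum values is fixed, the list of settings is fixed too, which is why the switches can be replaced by permanent connections.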
- the autocorrelation function signals φ'(0), φ'(τ1), φ'(τ2), ..., φ'(τp) produced by network 51 constitute a set of control signals from which artificial speech may be reconstructed by a synthesizer of the type shown in FIG. 4 of this application.
- Artificial speech reconstructed from these autocorrelation control signals has an amplitude spectrum of approximately the same shape as that of the original speech wave, thus faithfully reproducing the original speech sounds.
- Phase Transformation Network. It is observed from Equation 3 that the output signals of network 51 represent an autocorrelation function with a zero phase spectrum. If desired, a function with a phase configuration other than zero may be obtained by a transformation (Equation 10) of the rooted spectrum values given by Equation 7.
- In phase transformation network 61 of FIG. 6 there is shown apparatus for deriving from the rooted spectrum signals produced, for example, by rooters H51, H52, ..., H5q, of FIG. 5A, a group of control signals having any desired phase spectrum in accordance with Equation 10.
- each row of resistors corresponds to a particular series in Equation 10, and the individual resistance values of the resistors in each row are proportional to the absolute values of the individual cosine factors in the corresponding series. The position in which the switch connecting a given resistor to its common conductor is placed depends upon whether the corresponding cosine factor is positive, negative or zero.
- For a positive cosine value, the switch connects the particular resistor directly to its common output point; for a negative cosine value, the switch connects the resistor to its common output point through a polarity inverter of any conventional design; and for a zero cosine value, the switch is placed in the open position.
- the rooted spectrum signals from rooters H51, H52, ..., H5q, of FIG. 5A are applied to input points I1", I2", ..., Iq", respectively, of network 61.
- the signal formed at each common output point of network 61 is a linear combination of the rooted spectrum signals passed through the resistors in each row, and each linear combination is proportional to a particular series of Equation 10.
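Inserting phase angles into the inverse transform can be sketched as follows. Equation 10 is not legible in this text, so the form used here, a sum of rooted amplitudes times cos(2π·j·τ/T + θ_j), is an assumption, as are the function name `phase_transform` and all numeric values. Because the result is no longer symmetrical, both negative and positive lags must be produced.

```python
import math

def phase_transform(rooted, phase_angles, max_lag, period):
    """Correlation samples at lags -max_lag..max_lag with inserted phase angles."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        out[lag] = sum(
            r * math.cos(2 * math.pi * (j + 1) * lag / period + theta)
            for j, (r, theta) in enumerate(zip(rooted, phase_angles))
        )
    return out

rooted = [0.5, 0.25, 0.125]                 # assumed rooted spectrum signals
zero_phase = phase_transform(rooted, [0.0, 0.0, 0.0], max_lag=4, period=32)
shifted = phase_transform(rooted, [0.3, -1.1, 2.0], max_lag=4, period=32)

print(zero_phase[1] - zero_phase[-1])   # symmetric: difference is zero
print(shifted[1] - shifted[-1])         # nonsymmetric: difference is nonzero
```

With all phase angles set to zero the output reduces to the symmetrical samples that network 51 would supply; any other choice gives a nonsymmetrical function of the kind the FIG. 7 synthesizer is arranged to accept.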
- the output signals φ"(−τp), ..., φ"(0), φ"(τ1), ..., φ"(τp) of network 61 constitute a set of control signals from which speech may be reconstructed by a synthesizer of the type shown in FIG. 7 of this invention.
- speech is reproduced from the output signals of network 61 of FIG. 6 by applying the control signals φ"(−τp), ..., φ"(−τ1), φ"(0), φ"(τ1), ..., φ"(τp), to the control terminals of a bank of modulators Lp, ..., L1, L0, l1, ..., lp, respectively.
- the incoming control signals adjust the amplitude of an excitation signal supplied to the modulators from excitation generator 791.
- the excitation signal is derived by generator 791 from the incoming supplementary signal,
- Generator 791 is identical in construction and operation to generator 491 of FIG. 4, and supplies the modulators with an excitation signal that is closely correlated with the voiced-unvoiced and fundamental frequency characteristics of the original speech wave.
- the amplitude-adjusted output signals of the modulators are passed to tapped delay line 741, which is terminated in a matched impedance 721 and is similar in construction to delay line 441 of FIG. 4, via taps Sp, ..., S1, S0, s1, ..., sp.
- the reconstructed signals appearing at the output terminal of delay line 741 represent samples of the periods of a nonsymmetrical correlation function having a nonzero phase spectrum. These samples are converted into a continuous wave by filter 742, which passes only those frequencies between 0 and 4,000 cycles per second. Audible speech is obtained from the output wave of filter 742 by conventional reproducer 743.
- 1. In a system for the narrow-band transmission of speech, the combination that comprises a source of a speech wave, means for correlating said speech wave with itself to obtain a constant number of control signals representative of portions of the autocorrelation function of said speech wave, means for reducing discontinuities in the speech autocorrelation function represented by said control signals by selectively reducing the magnitude of each of said control signals, means for transmitting said reduced magnitude control signals to a receiver station, and, at said receiver station, means for reconstructing an artificial speech wave from said transmitted control signals.
- Apparatus as defined in claim 1 wherein said means for selectively reducing the magnitude of each of said control signals comprises a plurality of resistance elements in one-to-one correspondence with said control signals whose resistance values are proportional to a decreasing function with a nonnegative amplitude spectrum.
- a source of a speech wave, means for correlating said speech wave with itself to obtain a fixed number of signals representative of portions of the autocorrelation function of said speech wave, means for selectively reducing the magnitude of each of said autocorrelation signals, means for unsquaring the amplitude spectrum of said reduced magnitude autocorrelation signals to produce a set of unsquared control signals, means for transmitting said unsquared control signals to a receiver station, and, at said receiver station, means for reconstructing an artificial speech wave from said unsquared control signals.
- Narrow-band speech transmission apparatus that comprises a source of a speech wave, means for correlating said speech wave with itself to obtain a fixed number of control signals representative of portions of the speech autocorrelation function, means for reducing discontinuities in the speech autocorrelation function represented by the control signals by reducing the magnitude of each of said control signals by a predetermined amount, means for transmitting said reduced magnitude control signals to a receiver station, and, at said receiver station, means for unsquaring the amplitude spectrum of said control signals, and means for reconstructing an artificial speech wave from said unsquared control signals.
- Apparatus as defined in claim 4 wherein said means for unsquaring the amplitude spectrum of said control signals comprises a first array of resistors arranged in
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Description
[Drawing sheets 1 through 7 omitted: E. E. David, Jr., Autocorrelation Vocoder, Patent No. 3,069,507, filed Aug. 9, 1960, patented Dec. 18, 1962. Legible figure labels: FIG. 3B, FIG. 4, FIG. 7.]
United States Patent Office 3,069,507
AUTOCORRELATION VOCODER
Edward E. David, Jr., Berkeley Heights, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York
Filed Aug. 9, 1960, Ser. No. 48,422
Claims. (Cl. 179-15.55)
This invention relates to the transmission of speech over narrow-band channels, and particularly to the narrow-band transmission of speech in terms of autocorrelation functions.
Among speech coding systems for the conservation of transmission channel bandwidth, one of the best known is the channel vocoder described in H. W. Dudley Patent 2,151,091, issued March 21, 1939. At the transmitter terminal of the Dudley vocoder, the speech amplitude spectrum is divided into frequency bands by a bank of band-pass filters, and the energy contained within each band is represented by a narrow-band control signal. After transmission over a reduced bandwidth channel to a receiver station, the control signals adjust the energy of selected frequency bands of an excitation spectrum generated at the synthesizer. The energy-adjusted frequency bands are then combined to produce synthetic speech.
Synthetic speech produced by the Dudley vocoder is distorted by the inherent limitations of the band-pass filters that it employs. For ideal reproduction of speech, the speech amplitude spectrum should be represented by a group of points; as practiced by the Dudley vocoder, however, the finite widths of the band-pass filters produce points that are in fact averages of many points within the frequency bands passed by the filters.
It is a specific object of this invention to reduce distortion and to eliminate the need for bandpass filters by transmitting speech in terms of nearly ideal points on its autocorrelation function.
In this invention, an incoming speech wave is correlated with itself at an analyzer terminal to obtain a number of samples of each period of the speech autocorrelation function. The bandwidth required to transmit these samples is much smaller than that required to transmit the original speech wave, since the speech autocorrelation function changes very little from one period to the next. The samples are transmitted over a narrow-band channel to a synthesizer, where they are used in the reconstruction of artificial speech by serving as control signals to adjust the amplitude of an excitation signal generated at the synthesizer.
The intelligibility of artificial speech reconstructed from autocorrelation control signals is impaired by the squaring of the speech amplitude spectrum inherent in the autocorrelation function representation of speech. Squaring the speech amplitude spectrum changes its shape, thereby altering the characteristics of most sounds.
It is a specific object of the present invention to improve the intelligibility of speech reconstructed from autocorrelation control signals by performing a square-root-taking operation upon the amplitude spectrum of the speech autocorrelation function samples.
The amplitude spectrum of the autocorrelation samples is obtained by passing the samples through a first network of resistance elements, which performs a Fourier transformation upon the autocorrelation samples. The spectrum signals produced by this first network are then rooted by a group of square-root-taking circuits, whose rooted spectrum output signals represent an amplitude spectrum of the same shape as the amplitude spectrum of the original speech wave. Artificial speech reproduced from the rooted spectrum signals is therefore free of the distortion due to amplitude spectrum squaring. In order to reproduce artificial speech from the rooted spectrum signals, they are converted into autocorrelation samples by a second Fourier transformation network also composed of resistance elements.
One of the properties of autocorrelation functions generally is an amplitude spectrum whose phase angles are all zero; hence speech reconstructed from autocorrelation signals will also have an amplitude spectrum with zero phase angles. Phase angles other than zero, however, may be desired in the amplitude spectrum of high quality artificial speech.
Accordingly, it is a specific object of this invention to reconstruct artificial speech having any desired phase spectrum.
In order to produce artificial speech with a given phase spectrum, the second Fourier transformation network in the square-root-taking apparatus referred to above is replaced by a phase transformation network that converts the rooted spectrum signals into correlation signals having a predetermined phase spectrum. The operation of the phase transformation network is based upon a Fourier transformation in which the desired phase angles have been inserted.
The symmetry of the speech autocorrelation function about the center of each period permits the function to be reproduced from samples of half of each period, thereby achieving a further reduction in transmission channel bandwidth. Variations in the period of the autocorrelation function, however, require variations in the number of half-period samples in order to reproduce the speech autocorrelation function exactly. To vary the number of half-period samples in synchrony with variations in the period of the autocorrelation function requires complex apparatus; for example, see the copending patent application of E. E. David, Jr., and J. R. Pierce filed this date, Serial No. 48,423.
Accordingly, it is a specific object of this invention to produce good quality artificial speech by reconstructing a symmetrical speech wave from a fixed number of autocorrelation samples, regardless of variations in the period of the speech autocorrelation function.
In this invention, a fixed number of samples of the speech autocorrelation function is obtained at the analyzer terminal, the number of samples being kept small, consistent with good quality speech, in order to conserve transmission channel bandwidth. The samples transmitted to the synthesizer terminal represent a portion of each autocorrelation period, and by deriving from each sample two equal amplitude, symmetrically located points on each reconstructed period, an artificial speech wave whose symmetry approximates the symmetry of the original speech autocorrelation function is reconstructed.
Because of variations in the period of the autocorrelation function, it is not possible to represent accurately each half period of the autocorrelation function with a fixed number of samples. A period that is either too long or too short with respect to the interval spanned by the fixed number of samples is truncated at some point other than the end of the period. The fixed number of samples thus represents truncated portions of each period of the autocorrelation function, and the points of truncation appear as abrupt discontinuities in artificial periods reconstructed from the samples, causing distortion in the artificial speech.
It is a further object of the present invention to minimize distortion in speech reconstructed from samples of truncated autocorrelation periods.
By passing the autocorrelation samples through a weighting network, the magnitudes of the individual samples are reduced by various predetermined amounts, thereby reducing the discontinuities and the associated distortion in the artificial speech. The amount by which the magnitude of each sample is reduced depends upon the particular weighting function that is chosen. An optimum weighting function minimizes distortion with a negligible loss in vocal characteristics. Since the previously mentioned amplitude unsquaring process is to be performed upon the samples after weighting, the weighting function must possess a real and nonnegative amplitude spectrum in order to assure physically realizable results from the square-root-taking process.
The invention will be fully understood from the following detailed description of preferred embodiments thereof taken in connection with the appended drawings, in which:
FIGS. 1A, 1B, and 1C are a group of waveform diagrams of assistance in explaining this invention;
FIG. 1D is a schematic block diagram showing a complete speech transmission system based upon the principles of this invention;
FIG. 2 is a block schematic diagram showing apparatus for coding speech in terms of a fixed number of autocorrelation control signals;
FIGS. 3A, 3B, 3C, 3D, 3E, 3F, and 3G are a group of waveforms of assistance in explaining the operation of the apparatus of FIG. 2;
FIG. 4 is a block schematic diagram showing apparatus for reconstructing artificial speech from autocorrelation control signals;
FIG. 5A is a schematic block diagram showing apparatus for unsquaring the amplitude spectrum of the autocorrelation control signals;
FIG. 5B is a schematic block diagram showing apparatus for converting rooted amplitude spectrum signals into their autocorrelation function counterparts;
FIG. 6 is a schematic block diagram showing apparatus for generating autocorrelation control signals having any desired phase spectrum from rooted amplitude spectrum signals; and
FIG. 7 is a schematic block diagram showing apparatus for reconstructing artificial speech having any desired phase spectrum from phase-transformed autocorrelation control signals.
Mathematical Foundations
A speech wave g(t) with period T may be expanded in a Fourier series
$$g(t) = \sum_{n} G(f_n)\cos(2\pi f_n t + \psi_n), \qquad f_n = n/T, \tag{1}$$
where the coefficients G(fn) constitute the amplitude spectrum of g(t) and the phase angles ψn constitute the phase spectrum.
The autocorrelation function of g(t) is defined as the average
$$\varphi(\tau) = \frac{1}{T}\int_{0}^{T} g(t)\,g(t-\tau)\,dt, \tag{2}$$
where φ(τ) has the same period as g(t) and τ represents the amount of time by which g(t) is delayed before being multiplied together with the undelayed speech wave.
From Wiener's theorem, the autocorrelation function may also be expanded in a Fourier series,
$$\varphi(\tau) = \sum_{n} \Phi(f_n)\cos 2\pi f_n \tau, \tag{3}$$
where
$$\Phi(f_n) \propto G^{2}(f_n); \tag{4}$$
that is, the amplitude spectrum of φ(τ) is, to within a constant factor, the square of the amplitude spectrum of g(t). Further, it is noted in a comparison of Equations 1 and 3 that all of the phase angles of the original speech wave have become zero in the speech autocorrelation function.
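The squaring and phase-cancelling behavior stated in Equations 1 through 4 can be checked numerically. The following is a minimal sketch, not part of the patent: it assumes a discrete wave of 64 samples per period with two harmonics whose amplitudes and phase angles are chosen arbitrarily for illustration, and the function names are hypothetical. With the cosine-amplitude convention used here, each autocorrelation coefficient comes out as half the squared amplitude, which is the constant factor mentioned above.

```python
import math

def autocorr_periodic(g, lag):
    """Average of g[i] * g[i - lag] over one period (circular indexing)."""
    n = len(g)
    return sum(g[i] * g[(i - lag) % n] for i in range(n)) / n

def cosine_coeff(phi, k):
    """k-th cosine-series coefficient (k >= 1) of one period of samples."""
    n = len(phi)
    return 2.0 / n * sum(phi[m] * math.cos(2 * math.pi * k * m / n) for m in range(n))

def sine_coeff(phi, k):
    """k-th sine-series coefficient; zero when the phase angle is zero."""
    n = len(phi)
    return 2.0 / n * sum(phi[m] * math.sin(2 * math.pi * k * m / n) for m in range(n))

N = 64                       # samples per period (assumed)
amps = {1: 1.0, 2: 0.5}      # harmonic number -> amplitude G(f_n) (assumed)
phases = {1: 0.7, 2: -1.1}   # arbitrary phase angles in radians (assumed)

# One period of the wave: sum of harmonics with nonzero phase angles.
g = [sum(a * math.cos(2 * math.pi * k * i / N + phases[k]) for k, a in amps.items())
     for i in range(N)]

# One period of the autocorrelation function and its series coefficients.
phi = [autocorr_periodic(g, lag) for lag in range(N)]
coeffs = {k: cosine_coeff(phi, k) for k in amps}
sines = {k: sine_coeff(phi, k) for k in amps}
```

Here the amplitude ratio of 2:1 in the wave becomes 4:1 in `coeffs`, the sine coefficients vanish, and `phi` is symmetrical about the center of the period, whatever phase angles are chosen.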
Referring now to FIG. 1A, there is shown the amplitude spectrum of a typical voiced sound. FIG. 1B, on the other hand, shows the amplitude spectrum of the autocorrelation function corresponding to the speech wave of FIG. 1A. It is observed in a comparison of the curves that squaring the amplitude spectrum, as given in Equation 3, doubles the differences between peaks of the amplitude spectrum when these are measured on a logarithmic scale, thereby suppressing the relatively small peaks and changing the characteristics of the sound.
FIG. 1C shows several periods of a typical speech autocorrelation function, where it is noted that the autocorrelation function is symmetrical about the center of each period.
The amplitude spectrum of the autocorrelation function may also be expanded in a Fourier series (Equation 5) where, from the symmetry of the autocorrelation function about the center of each period, the amplitude spectrum may be expressed in terms of autocorrelation function samples over either of the half periods
$$-\frac{T}{2} \le \tau \le 0 \quad \text{or} \quad 0 \le \tau \le \frac{T}{2}.$$
Hence Equation 5 may be rewritten in terms of the samples of a single half period (Equation 6).
Complete Speech Transmission System
Referring first to FIG. 1D, an incoming speech wave from source 80 is applied to speech autocorrelation function analyzer 81, which derives samples of the speech autocorrelation function from the speech wave. The details of source 80, analyzer 81, and the other elements of FIG. 1D are described below. The samples are passed through weighting network 82 in order to reduce the magnitudes of discontinuities in the sampled autocorrelation function.
The amplitude spectrum of the weighted samples from network 82 is unsquared by Fourier transformation network 83 and square-root-taking circuits 84. The rooted spectrum signals from circuits 84 are converted into autocorrelation function samples with zero phase angles by Fourier transformation network 85, and into correlation function samples having predetermined nonzero phase angles by phase transformation network 86. Switch 87 may be manually set to pass the output signals of either network 85 or network 86 to a speech synthesizer 88, depending upon which set of signals is desired for a particular application of this invention. A speech wave is reconstructed from the output signals of either network 85 or network 86 by speech synthesizer 88, and artificial speech is reproduced from the reconstructed speech wave by reproducer 89.
Analyzer 81 is located at a transmitter terminal, while speech synthesizer 88 and phase transformation network 86 are located at a receiver terminal. Elements 82, 83, 84, and 85, however, may be located at either the transmitter terminal or the receiver terminal.
Analyzer
Referring now to the analyzer apparatus of FIG. 2, an incoming speech wave g(t) from source 20, for example, a transducer of any suitable variety, is passed through low-pass filter 210. Filter 210 is proportioned to pass only those frequencies in the band from 0 to W cycles per second, where W may be chosen to be 4,000.
The band-limited speech wave output of filter 210 is applied simultaneously to a tapped delay line 231, for example, a tapped acoustic delay line, and to a bank of multipliers, for example, modulators M1, M2, ..., Mp, each of which is provided with two input terminals and one output terminal. Delay line 231, which is terminated in a matched impedance 211 to prevent reflection, is provided with taps P1, P2, ..., Pp, at which appear signals proportional to the speech wave at various delay times, g(t - τ1), g(t - τ2), ..., g(t - τp), where τ1, τ2, ..., τp are the various delay times corresponding to taps P1, P2, ..., Pp, respectively.
Modulators M1, M2, ..., Mp, in addition to receiving the undelayed speech wave at one of their input terminals, have their second input terminals connected to delay line taps P1, P2, ..., Pp, respectively, to receive the variously delayed speech wave as a second input signal. The modulators develop at their output terminals signals proportional to the products g(t)·g(t - τ1), g(t)·g(t - τ2), ..., g(t)·g(t - τp), which are passed to a bank of averaging devices, for example, low-pass filters F1, F2, ..., Fp, each having a cutoff of 25 cycles per second. Filters F1, F2, ..., Fp develop at their output points signals proportional to averages of the product signals received from modulators M1, M2, ..., Mp. From Equation 2, these signals are proportional to samples of the autocorrelation function at specific delay times, φ(τ1), φ(τ2), ..., φ(τp).
A signal proportional to φ(0) is obtained by applying the undelayed speech wave to both input terminals of modulator M0, and by connecting the output terminal of M0 to low-pass filter F0.
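The analyzer's multiply-and-average operation can be illustrated in discrete time. This is a rough sketch under stated assumptions rather than a model of the apparatus: the tapped delay line becomes an index offset, each modulator becomes a multiplication, and each 25-cycle low-pass filter is replaced by a plain mean over the whole test signal. The 8,000 samples-per-second rate, the 200-cycle test tone, the tap delays, and the function name are all hypothetical.

```python
import math

def autocorr_samples(wave, lags):
    """One averaged product per tap: mean of wave[n] * wave[n - lag]."""
    out = []
    for lag in lags:
        # Modulator: undelayed wave times the wave delayed by `lag` samples.
        products = [wave[n] * wave[n - lag] for n in range(lag, len(wave))]
        # Averaging device: a simple mean stands in for the low-pass filter.
        out.append(sum(products) / len(products))
    return out

RATE = 8000                         # samples per second (assumed, twice W = 4,000)
wave = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE // 10)]
lags = [0, 10, 20]                  # tap delays of 0, 1.25 and 2.5 ms (assumed)
phi = autocorr_samples(wave, lags)  # approximately [0.5, 0.0, -0.5]
```

For a sinusoid the result follows a cosine in the delay: half the squared amplitude at zero delay, near zero at a quarter period, and the negative of the zero-delay value at half a period.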
Weighting Network
The number of taps with which delay line 231 is provided, and the number of associated modulators and filters, determine the number of autocorrelation samples appearing at the output points of filters F1, F2, ..., Fp. From the previously noted symmetry of the speech autocorrelation function, transmission channel bandwidth is conserved by sampling half periods of the autocorrelation function, the symmetry of each period being restored at the synthesizer from the transmitted samples; for example, a 3 millisecond period need only be sampled over the delay interval 0 to 1 1/2 milliseconds.
Variations in the period of the autocorrelation function, however, prevent accurate sampling of each half period with a fixed number of taps, modulators, and filters. A period that is either too short, FIG. 3A, or too long, FIG. 3B, with respect to the sampling interval (τp - τ1), is truncated at the last sampling point τp, and symmetrical periods synthesized from samples of portions of truncated periods contain abrupt discontinuities at the points of truncation, as shown in FIGS. 3C and 3D, respectively. These abrupt discontinuities in the synthesized periods produce distortion in the artificial speech.
In order to reduce distortion due to discontinuities, the samples obtained by the apparatus of FIG. 2 are passed through a weighting network N that reduces the magnitude of each sample, thereby reducing the discontinuities and the associated distortion in the artificial speech.
As shown in FIG. 2, the weighting network N is located at the analyzer terminal, but, if desired, it may be located at the synthesizer terminal. Weighting network N consists of a group of resistors R0, R1, R2, ..., Rp, one for each autocorrelation sample, connected to the output terminals of filters F0, F1, F2, ..., Fp, respectively. The resistance values of the elements of circuit N are determined by the particular weighting function selected on the basis of the following criteria: (a) discontinuities in periods reconstructed from the weighted samples must be very small, consistent with the preservation of important speech characteristics; and (b) the amplitude spectrum of the weighted samples must be real and nonnegative. Many suitable weighting functions satisfying these criteria are available, for example, the class of decreasing autocorrelation functions, the decreasing property satisfying (a), and the autocorrelation property satisfying (b). One such function is shown graphically in FIG. 3E.
The requirement that the weighting function have a real and nonnegative amplitude spectrum deserves further explanation. As will be explained below in connection with the description of the apparatus of FIG. 5A, a square-root-taking operation is performed upon the amplitude spectrum of the weighted autocorrelation samples, thus requiring the spectral values to be real and nonnegative in order to obtain physically realizable results. Passing the autocorrelation samples through the weighting network is equivalent to multiplication in the time domain and convolution in the frequency domain. For example, if w(τ) is the weighting function and W(fp) is its amplitude spectrum, then the weighted autocorrelation function is φ(τ)·w(τ), and the amplitude spectrum of the weighted autocorrelation function is
$$\Phi_W(f_p) = \Phi(f_p) * W(f_p), \tag{6a}$$
where * denotes convolution. From Equation 4, Φ(fp) is real and nonnegative, hence for ΦW(fp) to be real and nonnegative, it is sufficient that W(fp) be real and nonnegative.
FIG. 3E illustrates an example of a suitable weighting function w(τ), which decreases in value with increasing delay time. By choosing the values of resistors R0, R1, R2, ..., Rp to correspond to the values of the weighting function at the same delay times as those of the incoming autocorrelation samples, the signals developed at the output terminals of the resistors are weighted samples φ(0)w(0), φ(τ1)w(τ1), ..., φ(τp)w(τp).
Artificial symmetrical periods reconstructed from a fixed number of samples after weighting are shown in FIGS. 3F and 3G, corresponding to the original periods shown in FIGS. 3A and 3B, respectively. The smoothness with which these periods begin and end is to be compared with the abrupt discontinuities of artificial periods reconstructed from the same samples without weighting, as illustrated in FIGS. 3C and 3D.
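The two criteria for a weighting function can be tried out on a concrete candidate. The sketch below assumes a triangular weighting function, which is one member of the class of decreasing autocorrelation functions (it is the autocorrelation of a rectangular pulse); the patent does not say that FIG. 3E has this shape, and the sample values, the count of seven nonzero-delay samples, and the function names are illustrative only. An unweighted (flat) set is included for comparison.

```python
import math

def triangular_weights(p):
    """w_i = 1 - i/(p+1) for i = 0..p: decreasing, and itself an autocorrelation."""
    return [1.0 - i / (p + 1) for i in range(p + 1)]

def cosine_spectrum(samples, num_freqs):
    """Spectrum of the even-symmetric extension: s_0 + 2 * sum_i s_i * cos(pi*i*j/num_freqs)."""
    return [samples[0] + 2.0 * sum(s * math.cos(math.pi * i * j / num_freqs)
                                   for i, s in enumerate(samples) if i > 0)
            for j in range(num_freqs + 1)]

p = 7                                   # nonzero-delay samples (assumed)
w = triangular_weights(p)
flat = [1.0] * (p + 1)                  # no weighting, i.e. plain truncation

spec_w = cosine_spectrum(w, 32)         # nonnegative everywhere
spec_flat = cosine_spectrum(flat, 32)   # dips below zero at some frequencies

# Illustrative truncated half period and its weighted counterpart.
samples = [1.0, 0.8, 0.5, 0.1, -0.2, -0.4, -0.5, -0.45]
weighted = [s * wi for s, wi in zip(samples, w)]
```

The last weighted sample is one eighth of its original size, so the jump at the truncation point shrinks accordingly, while the spectrum of the weights stays nonnegative as criterion (b) requires. The flat set fails that criterion, which is why simple truncation is not a suitable weighting.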
The weighted autocorrelation samples appearing at the output terminals of the weighting network N in FIG. 2 constitute a set of control signals from which artificial speech may be synthesized. As shown in FIG. 1C, the speech autocorrelation function changes very little from period to period, hence the variation of the control signals is very small. As a result, the control signals individually occupy relatively narrow frequency bands, on the order of 25 cycles per second, and the entire group of control signals may be transmitted over a much narrower frequency band than is required for transmission of the original speech wave. The control signals may be transmitted from the analyzer terminal to the synthesizer terminal by means of any well-known transmission medium to meet the requirements of the particular application of this invention.
Supplementary Signal
The weighted autocorrelation control signals must be supplemented with a signal whose characteristics indicate whether the instantaneous speech sound is voiced or unvoiced, and if voiced, its fundamental pitch frequency. At the synthesizer an excitation signal is derived from the supplementary signal, the excitation signal characteristics being closely correlated with the characteristics of the supplementary signal. Artificial speech reconstructed from the excitation signal under the control of the autocorrelation samples thus preserves faithfully the characteristics of the original speech.
As shown in FIG. 2, the supplementary signal is derived by passing the speech wave output of source 20 through band-pass filter 214. Filter 214 is proportioned to pass the frequency band from 100 to 350 cycles per second, spanning the range of fundamental pitch frequencies of typical human talkers. This subband of frequencies is then transmitted as a supplementary signal to the synthesizer. The supplementary signal may also be a conventional voiced-unvoiced pitch signal, if desired.
Synthesizer
Artificial speech is reproduced from the transmitted autocorrelation control signals and supplementary signal at a synthesizer, a preferred embodiment of which is shown in FIG. 4. The synthesizer reconstructs an artificial speech wave with symmetrical periods by using the supplementary signal to generate an excitation signal and by using the autocorrelation samples as control signals to form from the excitation signal symmetrically located samples of an artificial period. The symmetry of the artificial periods reconstructed in this fashion approximates the symmetry of the original autocorrelation periods.
Referring now to the apparatus of FIG. 4, each of the incoming control signals is applied to the control terminal of a modulator whose output terminal is connected to delay line 441. Delay line 441 is terminated in a matched impedance 421 to prevent reflection and is provided with a number of taps arranged in symmetrically located pairs about center tap S40. The output terminal of modulator L40 is connected to center tap S40, and the output terminal of each of the other modulators L41, L42, ..., L4p is connected to two taps, S41 and S41', S42 and S42', ..., S4p and S4p', respectively, disposed at equal intervals about center tap S40.
The control signals φ(0), φ(τ1), φ(τ2), ..., φ(τp) are applied to the control terminals of modulators L40, L41, L42, ..., L4p, respectively, and adjust the amplitude of an excitation signal supplied in parallel to the input terminals of the modulators from excitation signal generator 491. Generator 491 derives the excitation signal from the incoming supplementary signal, which is first passed through an equalizing delay 41 to synchronize the supplementary signal with the control signals. Excitation signal generator 491, which is fully described in a patent application of M. R. Schroeder, Serial No. 812,028, filed May 8, 1959, operates to provide the modulators with an excitation signal that is closely correlated with the voiced-unvoiced and fundamental frequency characteristics of the original speech wave as conveyed by the supplementary signal. If a conventional voiced-unvoiced pitch signal is employed as a supplementary signal, excitation signal generator 491 may be the usual buzz-hiss source.
At the modulators, the incoming control signals adjust the amplitude of the excitation signal from generator 491, and the amplitude-adjusted excitation signals derived by the modulators are passed to delay line 441 via the various tap connections. The output signals of the modulators reappear at the output terminal of delay line 441 as samples of symmetrical periods of an artificial wave. These samples are smoothed to form an artificial wave by filter 442, proportioned to eliminate all frequencies greater than 4,000 cycles per second. The electrical wave formed by filter 442 is converted into audible and intelligible speech by conventional reproducer 443 connected to the output terminal of filter 442.
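In discrete time, the tapped delay line of the synthesizer behaves like a filter whose tap weights are the control signals mirrored about the center tap. The following sketch makes that idealization explicit; the control-signal values, the pulse spacing, and the function names are hypothetical, and the smoothing filter and reproducer are left out.

```python
def symmetric_period(half):
    """Mirror [phi(0), phi(tau_1), ..., phi(tau_p)] about the center tap."""
    return half[:0:-1] + half

def synthesize(excitation, half):
    """Each excitation value, scaled by every tap weight, emerges at that tap's delay."""
    taps = symmetric_period(half)
    out = [0.0] * (len(excitation) + len(taps) - 1)
    for n, e in enumerate(excitation):
        for k, t in enumerate(taps):
            out[n + k] += e * t
    return out

half = [1.0, 0.6, 0.2, -0.1]     # illustrative control-signal values (assumed)
pulses = [1.0 if n % 12 == 0 else 0.0 for n in range(24)]   # pitch pulses (assumed)
wave = synthesize(pulses, half)
```

Each pitch pulse produces one symmetrical artificial period, `[-0.1, 0.2, 0.6, 1.0, 0.6, 0.2, -0.1]`, built from only four transmitted values.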
Amplitude Spectrum Unsquaring Network
The intelligibility of artificial speech synthesized directly from the weighted autocorrelation control signals produced by the analyzer of FIG. 2 is impaired by the amplitude spectrum squaring inherent in the autocorrelation function representation, as shown in Equation 6a and in FIGS. 1A and 1B. To improve quality and intelligibility, the shape of the amplitude spectrum of the artificial speech is made to resemble closely the shape of the amplitude spectrum of the original speech wave by subjecting the amplitude spectrum of the weighted autocorrelation control signals to a square-root-taking operation prior to synthesizing the artificial speech. From Equation 6a, taking the square root of the amplitude spectrum of the weighted autocorrelation samples produces the rooted spectrum of Equation 7, from which it is seen that both √ΦW(fn) and G(fn) have approximately the same shape.
Apparatus for unsquaring the amplitude spectrum of the weighted autocorrelation control signals is shown in FIG. 5A. The apparatus of FIG. 5A may be located in its entirety at either the analyzer or the synthesizer, or, if desired, the component parts may be conveniently divided between the two stations.
Recalling the Fourier transformation of Equation 5, the amplitude spectrum of the weighted autocorrelation control signals is represented by a group of series (Equation 8), one series for each amplitude spectrum value.
From Equation 8, it is seen that in order to obtain signals proportional to specific values of the amplitude spectrum of the weighted control signals, it is necessary to multiply the various control signals by selected values of the cosine function and to add the resulting products.
The exact configuration of network 50 depends upon two factors: the frequency resolution or number of amplitude spectrum values desired in the reconstructed speech wave; and the number of autocorrelation samples obtained at the analyzer. As given by Equation 8, the frequency resolution determines the number of rows of resistors, q, and the number of autocorrelation samples determines the number of resistors in each row, p. Once these two factors have been established, the resistance values and the positions of the switches are fixed by Equation 8. For a particular application of this invention, the switches may be replaced by appropriate permanent connections, and only those polarity inverters required by the particular negative cosine values of Equation 8 need be employed.
The incoming weighted autocorrelation samples φ(τ1)w(τ1), φ(τ2)w(τ2), ..., φ(τp)w(τp) are applied to the input points I1, I2, ..., Ip, respectively, of network 50, and the sample φ(0)w(0) is applied in parallel to the second input terminals of adders B51, B52, ..., B5q. The signal developed at the output terminal of each adder is a linear combination of the weighted autocorrelation samples, which, in accordance with Equation 8, is proportional to a particular amplitude spectrum value.
The output signals ΦW(f1), ΦW(f2), ..., ΦW(fq) of network 50 are applied to a bank of rooters H51, H52, ..., H5q, of any suitable variety. Each rooter develops at its output point a signal whose magnitude is proportional to the square root of the magnitude of the signal applied to its input point, that is, √ΦW(f1), √ΦW(f2), ..., √ΦW(fq).
From Equation 7, these signals represent an amplitude spectrum of approximately the same shape as the amplitude spectrum of the original speech wave; hence, artificial speech reconstructed from either the rooted spectrum signals or their autocorrelation counterparts is substantially free of the distortion caused by amplitude spectrum squaring.
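The cosine-sum network followed by the rooters can be sketched as two small functions. Several things here are assumptions rather than the patent's stated form: the factor of 2 on the nonzero-delay samples (taken from even symmetry), the unit weighting, the 15-sample period chosen so that the half-period sum can be checked by hand, and the function names. Delays are in samples and frequencies in cycles per sample.

```python
import math

def spectrum_network(samples, lags, freqs):
    """One output per frequency: zero-delay sample plus cosine-weighted sum of the rest."""
    return [samples[0] + 2.0 * sum(s * math.cos(2 * math.pi * f * t)
                                   for s, t in zip(samples[1:], lags[1:]))
            for f in freqs]

def rooters(spectrum):
    """Square root of each spectrum value (tiny negatives from rounding are clamped)."""
    return [math.sqrt(max(v, 0.0)) for v in spectrum]

N = 15                              # samples per period (assumed, odd for an exact check)
lags = list(range(8))               # delays 0..7 cover one half period
freqs = [k / N for k in range(4)]   # harmonics 0..3

# Autocorrelation samples whose spectrum has harmonics in the ratio 4 : 1.
samples = [4.0 * math.cos(2 * math.pi * m / N) + 1.0 * math.cos(4 * math.pi * m / N)
           for m in lags]

spectrum = spectrum_network(samples, lags, freqs)   # about [0, 30, 7.5, 0]
rooted = rooters(spectrum)
```

The squared spectrum has a 4:1 ratio between its first two harmonics, and the rooted spectrum restores the 2:1 ratio of the underlying wave, which is the purpose of the unsquaring step.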
In order to reconstruct speech in a synthesizer of the type shown in FIG. 4 of this application, the rooted spectrum signals must be converted into their autocorrelation function counterparts. Apparatus for performing this conversion is based upon Equation 3, and a preferred embodiment is shown in FIG. 5B.
Referring now to the inverse Fourier transformation of Equation 3, the autocorrelation function corresponding to the rooted amplitude spectrum values produced by the rooters of FIG. 5A is given by the following group of series:
$$\varphi'(\tau_i) = \sqrt{\Phi_W(f_1)}\cos 2\pi f_1\tau_i + \sqrt{\Phi_W(f_2)}\cos 2\pi f_2\tau_i + \cdots + \sqrt{\Phi_W(f_q)}\cos 2\pi f_q\tau_i, \qquad i = 1, 2, \ldots, p. \tag{9}$$
The operation of network 51 shown in FIG. 5B is based upon the above equation, and the network is similar in structure to network 50 of FIG. 5A. Network 51 consists of an array of p·q resistors rij, i = 1, 2, ..., p; j = 1, 2, ..., q, arranged in p rows and q columns, the input terminals of the resistors in each column being connected to a common input point Ij and the output terminals of the resistors in each row being connected to a common output point O'i through manually adjusted switches cij. Each row of resistors corresponds to a particular series in Equation 9, and the resistance values of the individual resistors in each row are proportional to the absolute values of the cosine factors appearing in the corresponding series.
The position in which the switch connecting the output terminal of each resistor to its common output point is placed depends upon whether the corresponding cosine factor is positive, negative or zero. For a positive cosine value, the switch connects the corresponding resistor directly to its common output point; for a negative value, the switch connects the corresponding resistor to its common output point through polarity inverter aij, i = 1, 2, ..., p; j = 1, 2, ..., q; and for a zero value, the switch is placed in the open position.
The rooted spectrum signals from the rooters of FIG. 5A are applied to the input terminals of network 51, thereby developing at each common output point a linear combination of signals proportional to a particular series of Equation 9.
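The rule for laying out network 51 (a magnitude from the absolute cosine value, and a switch position from its sign) can be written down directly. This is an illustrative sketch only: the delays and frequencies are hypothetical, the tolerance used to decide that a cosine factor is zero is an assumption, and the stored magnitude is treated simply as a signal weight without modelling the resistor circuit itself.

```python
import math

def network_layout(taus, freqs, eps=1e-12):
    """For each row (delay) and column (frequency): (|cos|, switch position)."""
    layout = []
    for t in taus:
        row = []
        for f in freqs:
            c = math.cos(2 * math.pi * f * t)
            if abs(c) < eps:
                row.append((0.0, "open"))        # zero factor: switch left open
            elif c > 0:
                row.append((c, "direct"))        # positive factor: direct connection
            else:
                row.append((-c, "inverted"))     # negative factor: via polarity inverter
        layout.append(row)
    return layout

def apply_network(layout, inputs):
    """Linear combination formed at each common output point."""
    sign = {"direct": 1.0, "inverted": -1.0, "open": 0.0}
    return [sum(sign[sw] * mag * x for (mag, sw), x in zip(row, inputs))
            for row in layout]

taus = [1, 2, 3]             # delays in samples (assumed)
freqs = [1 / 8, 2 / 8]       # cycles per sample (assumed)
layout = network_layout(taus, freqs)
out = apply_network(layout, [1.0, 1.0])   # rooted spectrum inputs (assumed)
```

With these values the middle row has one open switch and one inverted unit-weight connection, so its output is the negative of the second input, as the corresponding series of cosine factors requires.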
A signal proportional to p(0) is obtained by applying each of the rooted spectruml signals to an input terminal of adder 53; in accordance with Equation 9, the linear combination of rooted spectrum signals formed at the output terminal 00 of adder 53 is proportional to (0).
The configuration of network 51 is determined by two quantities: the number of amplitude spectrum values produced by network 50, and the number of autocorrelation samples to be supplied to the synthesizer. As given by Equation 9, the number of rows of resistors, p, is determined by the number of autocorrelation samples to be supplied to the synthesizer, and the number of resistors in each row is determined by the number of rooted spectrum signals from the rooters. The establishment of these quantities fixes both the resistance values and the positions of the switches in network 51. For a particular application in which these quantities have been established, the switches may be replaced by appropriate permanent connections, and network 51 need contain only those polarity inverters actually required by negative cosine values in Equation 9.
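The behavior of network 51 described above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions, not the patented circuit: it assumes ideal summing, so that each resistor contributes a weight equal to the absolute value of its cosine factor, and the function names `build_network`, `network_outputs` and `zero_lag`, together with the example frequencies and delays, are hypothetical.

```python
import math

def build_network(freqs_hz, delays_s, tol=1e-9):
    """Return (magnitudes, switches), one row per delay, one column per frequency.

    magnitudes[i][j] models the weight set by resistor r_ij (|cosine factor|).
    switches[i][j] is +1 (direct), -1 (through a polarity inverter), 0 (open).
    """
    magnitudes, switches = [], []
    for tau in delays_s:                      # one row per autocorrelation sample
        mag_row, sw_row = [], []
        for f in freqs_hz:                    # one column per rooted spectrum signal
            c = math.cos(2.0 * math.pi * f * tau)
            if abs(c) < tol:                  # zero cosine factor: switch left open
                mag_row.append(0.0)
                sw_row.append(0)
            else:
                mag_row.append(abs(c))
                sw_row.append(1 if c > 0 else -1)
        magnitudes.append(mag_row)
        switches.append(sw_row)
    return magnitudes, switches

def network_outputs(rooted, magnitudes, switches):
    """Linear combination formed at each common output point (one per row)."""
    return [sum(s * m * x for s, m, x in zip(sw_row, mag_row, rooted))
            for mag_row, sw_row in zip(magnitudes, switches)]

def zero_lag(rooted):
    """Adder 53: at zero delay every cosine factor is 1, so the signals simply add."""
    return sum(rooted)
```

For two hypothetical spectrum frequencies of 250 and 500 cycles per second and delays of 1 and 2 milliseconds, `build_network` leaves one switch open (the cosine factor is zero), routes two columns through inverters, and connects one directly; once the frequencies and delays are fixed, the switch table is fixed as well, which is why the text notes that permanent connections may replace the switches.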
The autocorrelation function signals φ'(0), φ'(τ1), φ'(τ2), ..., φ'(τp) produced by network 51 constitute a set of control signals from which artificial speech may be reconstructed by a synthesizer of the type shown in FIG. 4 of this application. Artificial speech reconstructed from these autocorrelation control signals has an amplitude spectrum of approximately the same shape as that of the original speech wave, thus faithfully reproducing the original speech sounds.
Phase Transformation Network
It is observed from Equation 3 that the output signals of network 51 represent an autocorrelation function with a zero phase spectrum. If desired, a function with a phase configuration other than zero may be obtained by the following transformation of the rooted spectrum values given by Equation 7:
where ψi, i = 1, 2, ..., q, are the desired phase angles.
It is noted in Equation 10 that samples of both halves of each period of the phase-transformed autocorrelation function must be computed, since for arbitrary phase angles other than 0° or 180°, the phase-transformed function is not symmetrical about the center of each period. Computing samples of the phase-transformed function is therefore performed at the synthesizer after transmission, in order to maintain the saving in transmission channel bandwidth effected by sampling portions of half periods of the original speech autocorrelation function at the analyzer terminal.
Referring now to phase transformation network 61 of FIG. 6, there is shown apparatus for deriving from the rooted spectrum signals produced, for example, by rooters H51, H52, ..., H5q of FIG. 5A, a group of control signals having any desired phase spectrum in accordance with Equation 10. Network 61 consists of an array of resistors rij, i = -p, ..., -2, -1, 0, 1, 2, ..., p; j = 1, 2, ..., q, arranged in 2p+1 rows and q columns, the input terminals of the resistors in the jth column being connected to a common input point Ij", and the output terminals of the resistors in the ith row being connected to a common output point Oi" through switches cij". Each row of resistors corresponds to a particular series in Equation 10, and the individual resistance values of the resistors in each row are proportional to the absolute values of the individual cosine factors in the corresponding series. The position in which the switch connecting a given resistor to its common conductor is placed depends upon whether the corresponding cosine factor is positive, negative or zero. For a positive cosine value, the switch connects the particular resistor directly to its common output point; for a negative cosine value, the switch connects the resistor to its common output point through a polarity inverter aij" of any conventional design; and for a zero cosine value, the switch is placed in the open position.
The rooted spectrum signals from rooters H51, H52, ..., H5q of FIG. 5A are applied to input points I1", I2", ..., Iq", respectively, of network 61. The signal formed at each common output point of network 61 is a linear combination of the rooted spectrum signals passed through the resistors in each row, and each linear combination is proportional to a particular series of Equation 10.
The output signals φ"(τ-p), ..., φ"(0), φ"(τ1), ..., φ"(τp) of network 61 constitute a set of control signals from which speech may be reconstructed by a synthesizer of the type shown in FIG. 7 of this invention.
Artificial speech reconstructed from these signals has an amplitude spectrum of approximately the same shape as that of the original speech wave and a phase spectrum of any desired configuration.
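The need for 2p+1 rows in network 61 can be checked with a small numerical sketch. The Python fragment below is illustrative only and rests on assumptions that are not taken from the patent text: it assumes each series has the form of a sum of rooted spectrum values times cos(2πf·τ + phase), with that sign convention for the phase, and it assumes uniformly spaced delays τi = i·Δτ. The names `phase_transformed_samples` and `is_symmetric` and all numeric values are hypothetical.

```python
import math

def phase_transformed_samples(rooted, freqs_hz, phases_rad, tau_step_s, p):
    """Samples for rows i = -p..p (2p+1 rows), keyed by the row index i.

    Assumed form of each series (not quoted from the patent):
        sum_j rooted[j] * cos(2*pi*f_j*tau_i + phase_j),  tau_i = i * tau_step_s
    """
    samples = {}
    for i in range(-p, p + 1):
        tau = i * tau_step_s
        samples[i] = sum(a * math.cos(2.0 * math.pi * f * tau + ph)
                         for a, f, ph in zip(rooted, freqs_hz, phases_rad))
    return samples

def is_symmetric(samples, p, tol=1e-9):
    """True when the sample at +i equals the sample at -i for every row pair."""
    return all(abs(samples[i] - samples[-i]) <= tol for i in range(1, p + 1))
```

Under these assumptions, phase angles of 0 or 180 degrees give samples that are symmetrical about the center row, so only one half would need to be computed, whereas arbitrary phase angles give an unsymmetrical set, which is consistent with the observation that both halves of each period must be computed.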
Referring now to FIG. 7, speech is reproduced from the output signals of network 61 of FIG. 6 by applying the control signals φ"(τ-p), ..., φ"(τ-1), φ"(0), φ"(τ1), ..., φ"(τp) to the control terminals of a bank of modulators L-p, ..., L-1, L0, L1, ..., Lp, respectively. The incoming control signals adjust the amplitude of an excitation signal supplied to the modulators from excitation generator 791. The excitation signal is derived by generator 791 from the incoming supplementary signal, which is first delayed by equalizing delay 71 to synchronize the supplementary signal with the control signals. Generator 791 is identical in construction and operation to generator 491 of FIG. 4, and supplies the modulators with an excitation signal that is closely correlated with the voiced-unvoiced and fundamental frequency characteristics of the original speech wave.
The amplitude-adjusted output signals of the modulators are passed via taps S-p, ..., S-1, S0, S1, ..., Sp to tapped delay line 741, which is terminated in a matched impedance 721 and is similar in construction to delay line 441 of FIG. 4. The reconstructed signals appearing at the output terminal of delay line 741 represent samples of the periods of a nonsymmetrical correlation function having a nonzero phase spectrum. These samples are converted into a continuous wave by filter 742, which passes only those frequencies between 0 and 4,000 cycles per second. Audible speech is obtained from the output wave of filter 742 by conventional reproducer 743.
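The modulator bank and tapped delay line of FIG. 7 can be modeled, very roughly, as a discrete-time sum of scaled and delayed copies of the excitation. The Python sketch below is an idealization under assumptions not stated in the patent: adjacent taps are taken to differ by exactly one sample of delay, the control signals are held constant over the excitation segment, the first control value is assigned to the tap with the shortest delay, and the low-pass filter and reproducer are omitted. The name `synthesize` is hypothetical.

```python
def synthesize(excitation, controls):
    """Sum of scaled, delayed copies of the excitation (idealized delay line).

    excitation: list of excitation samples (e.g. a pulse train for voiced speech).
    controls:   list of 2p+1 control values, one per modulator/tap, in tap order.
    """
    n_taps = len(controls)
    out = [0.0] * (len(excitation) + n_taps - 1)
    for k, gain in enumerate(controls):       # modulator k scales the excitation
        for n, x in enumerate(excitation):    # tap k delays it by k samples
            out[n + k] += gain * x
    return out
```

With a single excitation pulse the output of this model is just the sequence of control values, and with a periodic pulse train that sequence repeats once per pulse, which mirrors the statement that the delay line output represents samples of the periods of the correlation function.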
It is to be understood that the above-described arrangements are merely illustrative of applications of the principles of the invention. Numerous other arrangements may be devised by those skilled in the art without departing from the spirit and scope of the invention.
What is claimed is:
1. In a system for the narrow-band transmission of speech, the combination that comprises a source of a speech wave, means for correlating said speech wave with itself to obtain a constant number of control signals representative of portions of the autocorrelation function of said speech wave, means for reducing discontinuities in the speech autocorrelation function represented by said control signals by selectively reducing the magnitude of each of said control signals, means for transmitting said reduced magnitude control signals to a receiver station, and, at said receiver station, means for reconstructing an artificial speech wave from said transmitted control signals.
2. Apparatus as defined in claim 1 wherein said means for selectively reducing the magnitude of each of said control signals comprises a plurality of resistance elements in one-to-one correspondence with said control signals whose resistance values are proportional to a decreasing function with a nonnegative amplitude spectrum.
3. In a system for the narrow-band transmission of speech, the combination that comprises a source of a speech wave, means for correlating said speech wave with itself to obtain a fixed number of signals representative of portions of the autocorrelation function of said speech wave, means for selectively reducing the magnitude of each of said autocorrelation signals, means for unsquaring the amplitude spectrum of said reduced magnitude autocorrelation signals to produce a set of unsquared control signals, means for transmitting said unsquared control signals to a receiver station, and, at said receiver station, means for reconstructing an artificial speech wave from said unsquared control signals.
4. Narrow-band speech transmission apparatus that comprises a source of a speech wave, means for correlating said speech wave with itself to obtain a fixed number of control signals representative of portions of the speech autocorrelation function, means for reducing discontinuities in the speech autocorrelation function represented by the control signals by reducing the magnitude of each of said control signals by a predetermined amount, means for transmitting said reduced magnitude control signals to a receiver station, and, at said receiver station, means for unsquaring the amplitude spectrum of said control signals, and means for reconstructing an artificial speech wave from said unsquared control signals.
5. Apparatus as defined in claim 4 wherein said means for unsquaring the amplitude spectrum of said control signals comprises a first array of resistors arranged in
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US48422A US3069507A (en) | 1960-08-09 | 1960-08-09 | Autocorrelation vocoder |
Publications (1)
Publication Number | Publication Date |
---|---|
US3069507A true US3069507A (en) | 1962-12-18 |
Family
ID=21954482
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US48422A Expired - Lifetime US3069507A (en) | 1960-08-09 | 1960-08-09 | Autocorrelation vocoder |
Country Status (1)
Country | Link |
---|---|
US (1) | US3069507A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3270188A (en) * | 1959-12-28 | 1966-08-30 | Gen Electric | Correlation system |
US3344349A (en) * | 1963-10-07 | 1967-09-26 | Bell Telephone Labor Inc | Apparatus for analyzing the spectra of complex waves |
US3359409A (en) * | 1963-04-09 | 1967-12-19 | Hugh L Dryden | Correlation function apparatus |
US3366783A (en) * | 1964-04-10 | 1968-01-30 | Pan American Petroleum Corp | Computing device for seismic signals |
US3400216A (en) * | 1964-01-31 | 1968-09-03 | Nat Res Dev | Speech recognition apparatus |
US3610831A (en) * | 1969-05-26 | 1971-10-05 | Listening Inc | Speech recognition apparatus |
US3662115A (en) * | 1970-02-07 | 1972-05-09 | Nippon Telegraph & Telephone | Audio response apparatus using partial autocorrelation techniques |
US3742146A (en) * | 1969-10-21 | 1973-06-26 | Nat Res Dev | Vowel recognition apparatus |
US3947638A (en) * | 1975-02-18 | 1976-03-30 | The United States Of America As Represented By The Secretary Of The Army | Pitch analyzer using log-tapped delay line |
US4038503A (en) * | 1975-12-29 | 1977-07-26 | Dialog Systems, Inc. | Speech recognition apparatus |
US6035271A (en) * | 1995-03-15 | 2000-03-07 | International Business Machines Corporation | Statistical methods and apparatus for pitch extraction in speech recognition, synthesis and regeneration |
WO2005006972A2 (en) | 2003-07-14 | 2005-01-27 | Welch Allyn, Inc. | Motion management in a fast blood pressure measurement device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2908761A (en) * | 1954-10-20 | 1959-10-13 | Bell Telephone Labor Inc | Voice pitch determination |
US2928901A (en) * | 1956-04-13 | 1960-03-15 | Bell Telephone Labor Inc | Transmission and reconstruction of artificial speech |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US3624302A (en) | Speech analysis and synthesis by the use of the linear prediction of a speech wave | |
US2705742A (en) | High speed continuous spectrum analysis | |
US3069507A (en) | Autocorrelation vocoder | |
US3982070A (en) | Phase vocoder speech synthesis system | |
US3995116A (en) | Emphasis controlled speech synthesizer | |
US2908761A (en) | Voice pitch determination | |
US3360610A (en) | Bandwidth compression utilizing magnitude and phase coded signals representative of the input signal | |
KR950013372B1 (en) | Voice coding device and its method | |
US4038495A (en) | Speech analyzer/synthesizer using recursive filters | |
US3566035A (en) | Real time cepstrum analyzer | |
CA1242279A (en) | Speech signal processor | |
US3344349A (en) | Apparatus for analyzing the spectra of complex waves | |
US5073938A (en) | Process for varying speech speed and device for implementing said process | |
US3071652A (en) | Time domain vocoder | |
US3102928A (en) | Vocoder excitation generator | |
US5475629A (en) | Waveform decoding apparatus | |
US3127476A (en) | david | |
US3109070A (en) | Pitch synchronous autocorrelation vocoder | |
US3431362A (en) | Voice-excited,bandwidth reduction system employing pitch frequency pulses generated by unencoded baseband signal | |
US4034160A (en) | System for the transmission of speech signals | |
US2928901A (en) | Transmission and reconstruction of artificial speech | |
US2672512A (en) | System for analyzing and synthesizing speech | |
US3394228A (en) | Apparatus for spectral scaling of speech | |
US3746791A (en) | Speech synthesizer utilizing white noise | |
US3139487A (en) | Bandwidth reduction system |