US3746791A - Speech synthesizer utilizing white noise - Google Patents

Speech synthesizer utilizing white noise

Info

Publication number
US3746791A
Authority
US
United States
Prior art keywords
orthogonal
speech
sample
functions
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US00155988A
Inventor
A Wolf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters

Definitions

  • a. $s_1$ is selected equal to or greater than the bandwidth of the signal under consideration.
  • SIGNAL MULTIPLIERS: The signal multipliers shown in FIG. 1 for multiplying the orthogonal noise components of the white noise source with the speech signal are standard devices. In the frequency range of interest, i.e., bandwidths up to 20 kHz, the quarter Gauss square method was used. Other types can easily be used instead.
  • The quarter Gauss square multiplier is one that makes use of the identity $(A+B)^2 - (A-B)^2 = 4AB$ (26).
  • TEMPORAL AVERAGER: The temporal averager is a simple R-C low pass filter with a time constant that is very long compared to the period of the highest frequency component in the speech signal. If $f$ denotes this frequency, then
  • FIG. 4 gives an example of the configuration of the temporal averager.
  • the temporal averager consists of input resistor 60, feedback capacitor 62 and amplifier 64.
  • the Gaussian white noise source is a standard noise generator of the diode type.
  • Equation (27) is the partial sum of a generalized random orthogonal series which converges as $n \to \infty$ to the speech sample $x(t)$ in some probabilistic sense.
  • FIG. 5 is a block diagram of the speech reconstruction system. In FIG. 5, the white noise signal $k(t)$, of spectral density $1/N$ watts per cycle, is applied to an orthogonal filter which has a transfer function $H_n(s)$ identical to that of the orthogonal filter of the coding generator shown in FIG. 1.
  • The response of this orthogonal filter is the set of orthogonal noise sample functions $w_1(t), w_2(t), w_3(t), \ldots, w_n(t)$. Each of these is multiplied respectively by the coefficients $a_1, a_2, \ldots, a_n$ derived from the coding generator (FIG. 1).
  • The novelty of the invention consists in the utilization of orthogonal filters, white noise and temporal averaging devices connected in the unique arrangement shown in FIG. 1 that gives rise to the speech extraction coefficients $a_1, a_2, \ldots, a_n$.
  • the speech extraction parameters are very narrow band signals. These signals are nearly constant for T T,,. This also means that the system can be used to greatly compress speech.
  • a system for synthesizing speech from white noise comprising:
  • said means for multiplying each said speech extraction coefficient with a respective orthogonal noise sample function includes:
  • said means for producing said set of speech extraction coefficients includes means for temporally averaging each product sample function to produce said set of speech coefficients and wherein said coefficients at the outputs of said averaging means are described mathematically by the following equations,
  • T is the averaging time
  • x(t) is the speech signal sample function
  • V,(t) is the speech signal sample function
  • $V_1(t), \ldots, V_n(t)$ is the set of orthogonal noise sample functions.
  • said speech signal sample has a duration $T_s$ and said averaging time $T$ is at least equal to said speech sample time $T_s$.
  • said means for summing comprises an adder connected to the output of said means for multiplying said each coefficient with a respective sample function, to sum the outputs thereof in accordance with the following mathematical expression:
  • $a_n w_n(t)$ represents the set of products at the output of said means for multiplying.
  • an orthogonal filter coupled to a source of white noise for producing said orthogonal noise sample functions.
  • said source of white noise used to generate said speech coefficients has a spectral power density $N$ watts per cycle and where said source of white noise applied to said orthogonal filter for reconstructing said speech has a spectral power density $1/N$ watts per cycle.
  • a method for synthesizing speech comprising the steps of:
  • said steps of producing said orthogonal sample functions include the step of connecting a source of white noise with a spectral power density of N watts per cycle to an orthogonal filter to produce said orthogonal sample functions for producing said speech extraction coefficient and;
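The quarter-square identity quoted in the list above, $(A+B)^2 - (A-B)^2 = 4AB$, can be checked with a few lines of Python. This is only an illustrative sketch: the function name `quarter_square_multiply` is hypothetical, and the code models the arithmetic of the identity, not the analog multiplier circuit.

```python
def quarter_square_multiply(a, b):
    """Form the product a*b from a sum, a difference, two squarings and a scale of 1/4."""
    sum_squared = (a + b) ** 2       # output of the "sum" squarer
    diff_squared = (a - b) ** 2      # output of the "difference" squarer
    return (sum_squared - diff_squared) / 4.0

# Instantaneous product of one noise sample and one speech sample (made-up values).
product = quarter_square_multiply(3.0, -2.5)
print(product)
```

The appeal of the identity for analog hardware is that squaring a single signal is easier to realize than multiplying two independent signals directly.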

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a system and corresponding subsystem devices for synthesizing and codifying speech from broadband white noise. The system employs a gaussian white noise source, the output of which is fed through an orthogonal filter to produce a set of random orthogonal functions. These orthogonal functions when multiplied by a speech signal and temporally averaged, produce a set of slowly time-varying coefficients which can be used to reproduce a voice signal. This device is somewhat analogous to the use of generalized Fourier coefficients to define a periodic function.

Description

United States Patent 3,746,791
Wolf, July 17, 1973
[54] SPEECH SYNTHESIZER UTILIZING WHITE NOISE
[22] Filed: June 23, 1971
[21] Appl. No.: 155,988
[58] Field of Search: 179/15.55 R, 1 SA
[56] References Cited, UNITED STATES PATENTS:
3,036,157 5/1962 Franco 179/15 BC
3,025,350 3/1962 Lindner 179/15 BC
3,330,910 7/1967 Flanagan 179/15 BC
3,204,034 8/1965 Ballard 179/1 SA
3,488,445 1/1970 Chang 179/15 BC
3,548,107 12/1970 Webb 179/15 BC
Primary Examiner: Kathleen H. Claffy
Attorney: R. S. Sciascia, P. E. Hodges et al.
11 Claims, 5 Drawing Figures
[Drawings, sheets 2 to 4 of 4. FIG. 2: linear orthogonal filters as a cascade chain of linear filter sections. FIG. 3: RC orthogonal filters (first section and subsequent sections). FIG. 4: temporal averager. FIG. 5: speech reconstruction system (Gaussian white noise source of spectral density 1/N, orthogonal filter, multipliers, speech extraction coefficients, reconstructed speech signal). Inventor: Alfred A. Wolf.]
SPEECH SYNTHESIZER UTILIZING WHITE NOISE
The invention described herein may be manufactured and used by or for the Government of the United States of America for governmental purposes without the payment of any royalties thereon or therefor.
FIELD OF INVENTION
With the advent of systems in which there is a man/machine interface, such as computer systems or control systems, it is desirable for a man to be able to communicate with the machine as easily as possible. The ideal situation would be to have the man speak to the machine and to have the machine reply in human language. Several attempts have been made to produce such a system; however, all such schemes to date are generally very complicated and have a limited vocabulary.
DESCRIPTION OF THE PRIOR ART
One of the most popular man/machine interface schemes in which the man merely talks to a machine employs the fact that in each language there are a few basic sounds that make up the words in the language. These basic sounds are called phonemes. In more precise language, phonemes are the sound features which are common to all speakers of a given speech form and which are exactly reproduced in repetition. In any language there is a definite and small number of phonemes. In the English language there are 46 phonemes. These phonemes are known sound-wise, and any system employing phonemes records the basic phoneme sounds on magnetic tape or some other recording means. A computer program is then written to connect the proper phonemes to produce words that convey speech information in the form of recurrent word patterns. In such a system one can type information into, say, a computer, and have it speak back to the operator.
To date such systems have the disadvantage that, as the vocabulary of the system increases, the computer programs become more complex. Thus the speech vocabulary of such systems is usually very limited. Another disadvantage of the system is that the words spoken by the computer are generally of poor fidelity and difficult to understand. Also, such a system is not flexible, since the information that can be conveyed by the computer depends on the extent of the vocabulary of the computer, that is, on the complexity of the computer program.
Another type of speech synthesizer system is known as the Vocoder (Voice Coder). The Vocoder dates back to the 1920s. The standard Vocoder is a spectrum/channel vocoder. It consists of an analyzer which produces a signal proportional to the short term amplitude spectrum of the fundamental frequency of the speech input, and a synthesizer consisting of devices that reconstruct speech by means of electrical signals appearing at the analyzer output. In both the analyzer and the synthesizer, signals are generated that are proportional to both the voiced and unvoiced sounds and the pitch of the sounds.
There are, of course, other speech analyzer/synthesizer systems which will not be dealt with here.
Suffice it to say that in each of these systems the method of speech used by the human is imitated in one way or another or the linguistic properties of speech are capitalized upon.
It is an object of the present invention to produce a speech coding and synthesizing system that does not depend on a prestored vocabulary nor upon the linguistic properties of a natural language but instead makes use of the signal properties of speech, namely its probabilistic content.
It is another object of the present system to produce a speech synthesizing device which does not employ a complicated computer program.
It is another object of the present invention to produce a speech synthesizer which employs a technique not used for previous speech synthesizing systems.
These and other objects of the present invention are set forth in the following disclosure.
SUMMARY OF THE INVENTION
The present invention capitalizes on the fact that speech is a stochastic process. Speech is stochastic because long samples of speech convey information which is probabilistic in time. The present system employs a Gaussian white noise source, the output of which is passed through an orthogonal filter to produce a set of random orthogonal functions which, when multiplied by a speech signal and averaged, produce coefficients which can be used to synthesize speech information at a later time. This method is somewhat analogous to the use of generalized Fourier coefficients to define and reproduce a periodic function.
DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the system for obtaining the speech extraction coefficients.
FIG. 2 is a block diagram of a linear orthogonal filter as a cascaded chain of linear filter sections.
FIG. 3 is a schematic drawing of an RC orthogonal filter.
FIG. 4 is a schematic drawing of the temporal averager.
FIG. 5 is a block diagram of the speech reconstruction system.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
This invention relates to a system and the corresponding subsystem devices for synthesizing and codifying speech from broadband white noise. This invention makes it possible to transform a sample of speech of length $T_s$ into a set of speech coefficients designated by $a_1, a_2, \ldots, a_n$, each of which depends on time according to the sample length $T_s$ of the speech sample, which is explained further below. These coefficients $a_1, a_2, \ldots, a_n$ can be thought of as a set of generalized Fourier coefficients defined on a set of orthogonal noise sample functions obtained from a white noise source. If the record of the speech sample of length $T_s$ is denoted by $x(t)$, then by making use of the fact that $x(t)$ is a sample function from a stochastic process, it can be decomposed into an infinite series of orthogonal sample functions taken from a white noise source. Each orthogonal sample function is weighted by a coefficient the value of which depends on the speech sample under consideration. The set of coefficients $\{a_n : n = 1, 2, \ldots, \infty\}$ thus contains the necessary information from which the original speech sample can be reconstructed by weighting the orthogonal sample functions $\{w_n(t) : n = 1, 2, \ldots\}$ (see FIG. 5) with the appropriate corresponding coefficient $\{a_n : n = 1, 2, \ldots\}$, that is, $x(t) = \sum_{n=1}^{\infty} a_n w_n(t)$ for $t \in T$, in which $T$ is a strip of time running to infinity and $t$ is a given instant of time. The invention makes possible the maximum compression of speech by using the coefficients $\{a_n : n = 1, 2, \ldots\}$ to convey information. Using these coefficients, it is now possible for man to communicate with computers and to have them in turn communicate with man by means of speech. This invention opens up the possibility of new unforeseen innovations in machines and systems in which there is a speech communication interface with man. One such example, in addition to the possibility of talking to a computer, is the possibility of talking to a typewriter.
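The idea of expanding a record into weighted orthogonal noise functions and summing them back can be illustrated numerically. The sketch below is not the analog system of the patent: it assumes a sampled record, and it uses a QR factorization of sampled Gaussian noise (an assumption made here so that the functions are exactly orthonormal over the finite record) in place of the orthogonal filter. The names `orthonormal_noise_basis`, `analyze` and `synthesize` are hypothetical.

```python
import numpy as np

def orthonormal_noise_basis(num_functions, num_samples, rng):
    """Return rows w_n with (1/N) * sum_t w_n[t] * w_m[t] = 1 if n == m else 0.

    Assumption: QR orthonormalization of sampled Gaussian noise stands in
    for the analog orthogonal filter described in the text.
    """
    noise = rng.standard_normal((num_samples, num_functions))
    q, _ = np.linalg.qr(noise)              # columns of q are orthonormal
    return (q * np.sqrt(num_samples)).T     # rescale so the time average of w_n**2 is 1

def analyze(x, basis):
    """Coefficients a_n = time average of w_n(t) * x(t)."""
    return basis @ x / x.size

def synthesize(coeffs, basis):
    """Partial sum x_hat(t) = sum_n a_n * w_n(t)."""
    return coeffs @ basis

rng = np.random.default_rng(0)
num_samples = 256
t = np.arange(num_samples) / num_samples
# Two sinusoids as a simple stand-in for a speech record.
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

basis = orthonormal_noise_basis(num_samples, num_samples, rng)
errors = []
for n in (16, 64, 256):
    a = analyze(x, basis[:n])
    x_hat = synthesize(a, basis[:n])
    errors.append(float(np.mean((x - x_hat) ** 2)))
print(errors)   # mean squared error for 16, 64 and 256 terms
```

In this discrete setting the mean squared error cannot increase as terms are added, and it vanishes once the number of orthonormal functions equals the number of samples. How quickly the error falls for real speech with filter-generated noise functions is a separate question that this sketch does not address.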
DESCRIPTION OF THE COEFFICIENT OR CODING GENERATOR
FIG. 1 is a block diagram of the system for obtaining the speech extraction coefficients. The speech signal 2, denoted by the sample function $x(t)$ of sample length $T_s$, is multiplied (instant by instant) by each orthogonal noise signal, denoted respectively by the sample functions $v_1(t), v_2(t), v_3(t), \ldots, v_n(t)$, which are derived as the set of outputs of an orthogonal filter 12, described below, when the white noise signal 10, denoted by the sample function $g(t)$ of spectral density $N$ watts per cycle, is applied to the orthogonal filter's input. The sample functions that result from the multiplication of the set of orthogonal noise sample functions $v_1(t), \ldots, v_n(t)$ with the sample function $x(t)$ in the multipliers 4, 6, and 8 are the speech extraction samples of length $T_s$ seconds, denoted by $\{v_n(t)\,x(t) : n = 1, 2, \ldots\}$. This set of product sample functions $v_1(t)x(t), v_2(t)x(t), v_3(t)x(t), \ldots, v_n(t)x(t)$ is averaged temporally in the temporal averagers 14, 16, and 18 to give rise to the corresponding set of coefficients $a_1, a_2, \ldots, a_n$, where mathematically these averages are formally given by the equations:
$$a_1 = \frac{1}{T}\int_0^T v_1(t)\,x(t)\,dt \tag{1}$$
$$a_2 = \frac{1}{T}\int_0^T v_2(t)\,x(t)\,dt \tag{2}$$
$$a_3 = \frac{1}{T}\int_0^T v_3(t)\,x(t)\,dt \tag{3}$$
and so forth to
$$a_n = \frac{1}{T}\int_0^T v_n(t)\,x(t)\,dt \tag{4}$$
where $T$ is the averaging time and is at least equal to the sample length, $T_s$, of the speech sample. The integral over $(0, T)$ of each of the product sample functions $v_n x$, divided by $T$, is the temporal average of those sample functions. The resulting speech extraction coefficients $a_1, a_2, \ldots, a_n$ are dependent only on the sample length, $T_s$, and are the coefficients that represent the essential information needed to reconstruct the speech signal, $x(t)$. A rule of thumb for the value of the averaging time $T$ is about ten times the sample length $T_s$ of the speech sample.
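The temporal average of a product sample function can be sketched in discrete time as follows. This is an illustrative example under stated assumptions: the signals are sampled with a step `dt`, the "noise" function `v` is replaced by a deterministic cosine so that the expected coefficient is known in closed form, and `rc_average` models a first-order RC low-pass averager with a backward-Euler update. All function names and numeric values are hypothetical.

```python
import numpy as np

def temporal_average(product, dt, T):
    """Approximate a = (1/T) * integral_0^T product(t) dt with a rectangle-rule sum."""
    n = int(round(T / dt))
    return float(np.sum(product[:n]) * dt / T)

def rc_average(product, dt, tau):
    """Run the product through a first-order RC low-pass filter with time constant tau."""
    alpha = dt / (tau + dt)          # backward-Euler discretization of the RC section
    y = 0.0
    for p in product:
        y += alpha * (p - y)         # output relaxes toward the input at rate 1/tau
    return float(y)

dt, T = 1e-4, 5.0
t = np.arange(int(round(T / dt))) * dt
x = 2.0 * np.cos(2 * np.pi * 5 * t)              # stand-in for the speech sample x(t)
v = np.sqrt(2.0) * np.cos(2 * np.pi * 5 * t)     # stand-in for one function v_n(t)

a_exact = temporal_average(v * x, dt, T)         # ideal average over (0, T)
a_rc = rc_average(v * x, dt, tau=1.0)            # RC averager output at t = T
print(a_exact, a_rc)
```

For these stand-in signals the ideal average equals $\sqrt{2}$; the RC output approaches the same value, with a small residual set by the ratio of the time constant to the averaging time and by the ripple that the filter lets through.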
GENERATION OF THE ORTHOGONAL NOISE FUNCTIONS
In the generation of the speech code $a_1, a_2, a_3, \ldots, a_n$, it is necessary to generate a corresponding set of random sample functions $v_1(t), v_2(t), v_3(t), \ldots, v_n(t)$ that are pairwise orthogonal. By pairwise orthogonality we roughly mean that the information contained in each of the sample functions $v_1(t), v_2(t), v_3(t), \ldots, v_n(t)$ is unique. To put this another way, no overlap in information exists between any two sample functions $v_1(t), v_2(t), \ldots, v_n(t)$. For descriptive purposes, the generation of the orthogonal random sample functions is achieved by means of an orthogonal filter when white noise of power spectral density $N$ is applied to the input of the orthogonal filter.
The orthogonal filter is a linear filter with one pair of input terminals and many pairs of output terminals. The filter, which is described below in detail, may be roughly likened to an ideal prism on which white light is incident. The incident light may be thought of as the input to the prism. The output of an ideal prism is essentially the complementary colors in response to the incident white light upon it. The complementary colors are of course pairwise orthogonal. In our case however, the orthogonal functions are random within a band of frequencies whereas in the prism case the complementary functions are in principle roughly single frequency sinusoids with random amplitudes and random phases. The latter is due to the randomness of the white light.
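The prism analogy can be made concrete with a small numerical sketch: noise records confined to non-overlapping frequency bands have a time-averaged product of zero, which is one simple way for two random functions to be pairwise orthogonal. The band-splitting by FFT masking used here is an assumption chosen for illustration; it is not the cascaded filter of the patent, and the name `band_limited_noise` and the bin ranges are hypothetical.

```python
import numpy as np

def band_limited_noise(white, lo_bin, hi_bin):
    """Keep only FFT bins lo_bin .. hi_bin-1 of a white-noise record."""
    spectrum = np.fft.rfft(white)
    mask = np.zeros(spectrum.size, dtype=bool)
    mask[lo_bin:hi_bin] = True
    return np.fft.irfft(np.where(mask, spectrum, 0.0), n=white.size)

rng = np.random.default_rng(1)
white = rng.standard_normal(4096)          # sampled Gaussian white noise
v1 = band_limited_noise(white, 10, 60)     # first "color"
v2 = band_limited_noise(white, 60, 110)    # second, non-overlapping "color"

cross = float(np.mean(v1 * v2))   # time-averaged product of two different bands
power = float(np.mean(v1 * v1))   # time-averaged product of a band with itself
print(cross, power)
```

The cross term is zero up to rounding error because the two spectra share no bins, while the self term is the power in the band.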
CONSTRUCTION OF THE ORTHOGONAL FILTERS
Linear orthogonal filters can be constructed as a chain of linear filter sections in which the poles of any section of the filter, in the complex plane, are essentially cancelled by the zeros of the next immediate section. FIG. 2 shows this general scheme for constructing an orthogonal filter. The complex function $H_1(s)$ is the transfer function of the first section of the filter as a function of $s$, the complex frequency variable, in which $s = \sigma + j\omega$, where
$\omega = 2\pi f$,
$f$ = frequency in cycles per second,
$\sigma$ = decrement in amplitude in cycles per second,
$j = \sqrt{-1}$, the unit of the imaginary number system.
The numerator $B(s)$ is selected to satisfy the equation
$$B(s) = b_0 + b_1 s + \cdots + b_k s^k \tag{7}$$
and the denominator $C_1(s)$ is selected to satisfy the equation
$$C_1(s) = c_0 + c_1 s + \cdots + c_m s^m \tag{8}$$
where the constants $b_0, b_1, \ldots, b_k$ and the constants $c_0, c_1, \ldots, c_m$ are chosen such that $H_1(s)$ is physically realizable. In a like way the transfer function $H_2(s)$ of the second section is designed such that its numerator is exactly like the denominator $C_1(s)$ of the first section, $H_1(s)$, with $s$ replaced by $(-s)$. This makes the zeros of $H_2(s)$ occupy the same positions as the poles of $H_1(s)$, except that they are reflected about the $j\omega$-axis. Hence, the zeros of one transfer function in effect cancel the poles of the next section, giving rise to the orthogonality of the pair. In general, the n-th transfer function of the orthogonal filter from the input of the filter is given by the formula
$$H_n(s) = K_n\,B(s)\,\frac{\prod_{k=1}^{n-1} C_k(-s)}{\prod_{k=1}^{n} C_k(s)} \tag{10}$$
for $n \ge 2$, where the symbol $\prod_{k=1}^{n}$ is the product of all factors from 1 to $n$. For this speech device, a filter is developed using this process of construction of orthogonal filters having simple poles and zeros along the axis of the reals. This gives rise to a simple resistance-capacitance (RC) network with differential amplifiers. The RC orthogonal filter is shown in FIG. 3. The first section of the filter includes an input resistor 34, feedback resistor 36, designated as $R$ in equation 13, feedback capacitor 38, designated as $C$ in equation 13, and operational amplifier 40. The subsequent section of the filter includes input resistors 42, 44, designated as r and r in equation respectively; resistors 48, 50, 54, and 56; capacitors 46, 52; and differential amplifier 58. Resistors 48 and 50 are designated as $R_n$ in equation 17, resistors 54 and 56 are designated as R and R respectively in equation 16, and capacitors 46 and 52 are designated as $C_n$ in equation 17. For these filters the transfer function of the first section is given by equation (11) and, for $n \ge 2$, that of the subsequent sections by equation (12) [equations (11) and (12) are illegible in the source], where for design purposes the first pole is placed at $s_1 = 1/(RC)$ (13) for the first section and, for the subsequent tandem sections, the poles $s_2, \ldots, s_n$ are placed according to formulae (14) to (16) [illegible in the source] and $s_n = 1/(R_n C_n)$ (17). In the ordinary orthogonal filter, the poles $s_1, s_2, \ldots, s_n$ can be placed arbitrarily along the real axis. But for the speech synthesizer this procedure will fail to work. It is important to place the poles $s_1, s_2, \ldots, s_n$ in such a way as to make the reconstructed signal converge to the desired speech signal. In the process of constructing an orthogonal filter for the speech synthesizer, the first pole is selected approximately equal to the bandwidth of the speech signal. Since speech signals can cover a bandwidth of the order of between 200 Hz and 2,500 Hz depending on the speaker, $s_1$ can be selected to be of this order of frequencies.
Then, to make the reconstructed signal converge to the original speech signal in the mean squared sense, the remaining poles are chosen according to the formula

$$s_n = s_1/n, \quad n \ge 1 \qquad (18)$$

If the remaining poles are placed according to the formula

$$s_n = s_1/n^2, \quad n \ge 1 \qquad (19)$$

then the reconstructed signal will converge absolutely almost everywhere. In general,

$$s_n = s_1/n^p, \quad n \ge 1 \qquad (20)$$

where $p \ge 2$. The higher $p$, the faster the convergence and the fewer the terms of the series needed for convergence. There is a practical difficulty, however: a resolution problem arises in placing the poles at distinct positions along the real axis.
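The pole placement rule of the form $s_n = s_1/n^p$ can be illustrated numerically. The following is a minimal sketch, not part of the patent: the function names, the choice of a 2.5 kHz bandwidth, and its conversion to rad/s are assumptions made for illustration. It also prints the smallest gap between adjacent poles, which shows the resolution difficulty mentioned above: higher $p$ crowds the later poles together.

```python
import math


def place_poles(s1, n_sections, p=1):
    """Return [s_1, s_2, ..., s_n] with s_n = s1 / n**p."""
    if s1 <= 0 or n_sections < 1 or p < 1:
        raise ValueError("s1 must be positive, n_sections >= 1 and p >= 1")
    return [s1 / n ** p for n in range(1, n_sections + 1)]


def adjacent_spacing(poles):
    """Distance between each pole and the next one along the real axis."""
    return [a - b for a, b in zip(poles, poles[1:])]


# Assumption: a 2.5 kHz speech bandwidth, converted to rad/s for s_1.
s1 = 2.0 * math.pi * 2500.0

for p in (1, 2, 3):
    poles = place_poles(s1, 6, p)
    gaps = adjacent_spacing(poles)
    print(f"p={p}: last pole {poles[-1]:.1f} rad/s, smallest gap {min(gaps):.1f} rad/s")
```

In a hardware realization the component tolerances would presumably set a lower bound on the usable gap, which limits how large $p$ and the number of sections can be chosen together.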
SUMMARY OF THE GENERAL PROCESS FOR PRODUCING ORTHOGONAL FILTERS
1. Decide on the nature of the poles of the orthogonal filter, i.e., whether they are to be simple or complex, from the application for which the filters are to be used.
2. From Step 1 above, the degree of $B(s)$ in Equation (7) is determined, and hence the degree of $C_1(s)$ is determined from physical realizability considerations, i.e., if $k$ is the degree of $B(s)$ and $m$ is the degree of $C_1(s)$, then $m = k + 1$.
3. Check to see that the condition on $\int |\log H_n(\omega)|\,d\omega$ is satisfied, where $H_n(\omega) = |H_n(j\omega)|$ is the amplitude characteristic of the transfer function $H_n(s)$.
4. From each transfer function, i.e., $H_1(s)$ and $H_2(s)$, etc., where $H_2(s) = \frac{K_2}{K_1}\,\frac{C_1(-s)}{C_2(s)}$ (24), etc., the circuit is synthesized according to standard procedure, noting that the placement of the poles must follow the requirements of convergence if, for example, a speech signal is to be reconstructed.
5. It is therefore clear that, in terms of the circuit, all the $H_n(s)$ for $n \ge 2$ are the same with the exception of the circuit parameters, which depend only on n, the number of the section. Hence, the additional requirements on the poles depend on the convergence of the reconstruction process or on some other physical requirement. The resulting chain is now an orthogonal filter.
6. The design of the transfer functions $H_1(s), H_2(s), \ldots, H_n(s)$ depends on the placement of the poles $s_1, s_2, \ldots, s_n$. These in turn define the constants $b$ and $c$ of each section, where n is the number of the section. For an orthogonal filter used in reconstruction of signals, the poles $s_n$, $n = 1, 2, \ldots$, must be placed according to the rules:
a. $s_1$ is selected equal to or greater than the bandwidth of the signal under consideration.
b. For convergence in the mean-square sense, the poles are placed as in formula (18).
c. For convergence almost everywhere, the poles are placed as in formula (19).
d. For other types of convergence, the poles are placed as in formula (20).
... expansion using amplifiers and differential amplifiers to take account of the negative signs that may result from the partial fraction constants. The process described above is the one used to obtain the hardware configuration of the orthogonal filter shown in FIG. 3.
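The orthogonality produced by this construction can be checked numerically for the first two sections. The sketch below is an illustration under stated assumptions, not the patent's circuit: it takes simple real poles, sets all gains to 1, uses $1/(s+a)$ for the first section, and assumes the second section has numerator $(a - s)$, i.e. the first denominator with $s$ replaced by $-s$, over denominator $(s+b)$. The impulse response seen at the second tap then follows by partial fractions, and its inner product with the first tap's impulse response should vanish.

```python
import math


def h1(t, a):
    """Impulse response of the first section, 1 / (s + a)."""
    return math.exp(-a * t)


def h2(t, a, b):
    """Impulse response of the two-section cascade (a - s) / ((s + a)(s + b)).

    Obtained by partial fractions; requires a != b.
    """
    coeff_a = 2.0 * a / (b - a)
    coeff_b = (a + b) / (a - b)
    return coeff_a * math.exp(-a * t) + coeff_b * math.exp(-b * t)


def inner_product(f, g, t_end, steps):
    """Trapezoidal estimate of the integral of f(t) * g(t) over [0, t_end]."""
    dt = t_end / steps
    total = 0.5 * (f(0.0) * g(0.0) + f(t_end) * g(t_end))
    for i in range(1, steps):
        t = i * dt
        total += f(t) * g(t)
    return total * dt


a = 1.0        # first pole in normalized units (assumption)
b = a / 2.0    # second pole from s_n = s_1 / n with n = 2

cross = inner_product(lambda t: h1(t, a), lambda t: h2(t, a, b), 60.0, 60000)
energy = inner_product(lambda t: h1(t, a), lambda t: h1(t, a), 60.0, 60000)
print(f"<h1, h2> = {cross:.2e}, <h1, h1> = {energy:.4f}")
```

The cross term comes out near zero while the energy of the first response stays near $1/(2a)$, which is the sense in which the reflected zero of one section cancels the pole of the preceding one.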
SIGNAL MULTIPLIERS
The signal multipliers shown in FIG. 1 for multiplying the orthogonal noise components of the white noise source with the speech signal are standard devices. In the frequency range of interest, i.e., bandwidths up to 20 kHz, the quarter Gauss square method was used. Other types can easily be used instead.
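A quarter-square multiplier forms a product using only sums, differences, and squaring operations. The following minimal sketch shows the arithmetic in software; the function names and sample values are hypothetical and the analog hardware is of course not modeled.

```python
def quarter_square_multiply(a, b):
    """Multiply two values using only sums, differences and squares."""
    sum_squared = (a + b) ** 2
    diff_squared = (a - b) ** 2
    return (sum_squared - diff_squared) / 4.0


def multiply_signals(noise, speech):
    """Sample-by-sample product of a noise component and a speech sample."""
    return [quarter_square_multiply(w, x) for w, x in zip(noise, speech)]


# Hypothetical sample values, chosen only for illustration.
noise = [0.5, -1.0, 2.0]
speech = [0.25, 4.0, -3.0]
print(multiply_signals(noise, speech))
```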
The quarter Gauss square multiplier is one that makes use of the identity

$$(A+B)^2 - (A-B)^2 = 4AB \qquad (26)$$

TEMPORAL AVERAGER
The temporal averager is a simple R-C low pass filter with a time constant that is very long compared to the period of the highest frequency component in the speech signal. If $f$ denotes this frequency, then $RC \gg 1/f$.

FIG. 4 gives an example of the configuration of the temporal averager. The temporal averager consists of input resistor 60, feedback capacitor 62 and amplifier 64.
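The averaging action of a low-pass filter with a long time constant can be sketched in discrete time. This is an illustration only: it models a plain first-order RC low-pass with a backward Euler step and ignores the sign inversion an amplifier stage may introduce; the frequency, time constant, step size, and mean level are all assumed values.

```python
import math


def rc_lowpass(samples, dt, rc):
    """First-order RC low-pass, discretized with the backward Euler rule."""
    alpha = dt / (rc + dt)
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


f_max = 1000.0          # highest signal frequency in Hz (assumption)
rc = 50.0 / f_max       # time constant of 50 periods, so RC >> 1/f
dt = 1.0e-5             # simulation step in seconds
n = 50000               # 0.5 s of signal, i.e. 10 time constants

mean_level = 0.3        # hypothetical average value to be recovered
signal = [mean_level + math.sin(2.0 * math.pi * f_max * i * dt) for i in range(n)]
averaged = rc_lowpass(signal, dt, rc)
print(f"final output {averaged[-1]:.4f} for a true mean of {mean_level}")
```

After several time constants the output settles close to the mean of the input, with only a small ripple left from the oscillating component.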
GAUSSIAN WHITE NOISE SOURCE
The Gaussian white noise source is a standard noise generator of the diode type.
SYSTEM FOR THE RECONSTRUCTION OF THE SPEECH SIGNAL FROM SPEECH EXTRACTION COEFFICIENTS AND WHITE NOISE
Once the speech extraction coefficients $a_1, a_2, a_3, \ldots, a_n$ are obtained from the Speech Extraction Code Generator, shown in FIG. 1, it is possible to reconstruct from these coefficients the original speech sample, $x(t)$. The reconstruction of $x(t)$ is carried out by multiplying each coefficient $a_k$, $k = 1, 2, \ldots, n$, by the corresponding member of a set of orthogonal noise functions, which we shall denote by the symbols $w_k(t)$, $k = 1, 2, \ldots, n$ and $0 \le t \le T$, and then summing the resulting set of products. Thus, we form the set of products $a_1 w_1(t), a_2 w_2(t), a_3 w_3(t), \ldots, a_n w_n(t)$ and then sum to obtain the reconstructed speech signal $x(t)$ as the summation

$$x(t) = \sum_{k=1}^{n} a_k w_k(t) \qquad (27)$$

It is evident by analogy that the coefficients $a_1, a_2, \ldots, a_n$ correspond to the coefficients of a Fourier series and the orthogonal noise functions $w_1(t), w_2(t), w_3(t), \ldots, w_n(t)$ correspond to the orthogonal set of trigonometric functions. Equation (27) is the partial sum of a generalized random orthogonal series which converges as $n \to \infty$ to the speech sample $x(t)$ in some probabilistic sense. If the series converges rapidly, then by truncating the series at some finite number $n$ the partial sum will still approximate the speech sample with a predictable error. The mean squared error, $\overline{\epsilon_n^2}$, can be calculated from the equation

$$\overline{\epsilon_n^2} = \overline{x^2(t)} - \sum_{k=1}^{n} a_k^2$$

For a given speech sample, $\overline{x^2(t)}$ will be fixed. Then $\overline{\epsilon_n^2}$ is smallest when $\sum_{k=1}^{n} a_k^2$ is largest.
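The analysis and reconstruction steps can be tried out in software. The sketch below is an illustration under explicit assumptions rather than the patented system: instead of an orthogonal filter driven by white noise, it orthonormalizes Gaussian noise sequences with the Gram-Schmidt procedure, uses a sum of two sinusoids as a stand-in for a speech sample, and takes the time average over the sample as the inner product. All function names are hypothetical. It computes the coefficients, forms partial sums, and compares the measured mean squared error with the value predicted from the signal power minus the sum of squared coefficients.

```python
import math
import random


def mean_product(u, v):
    """Time average of u(t) * v(t) over the sample."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def orthonormal_noise(n_funcs, n_samples, seed=0):
    """Gaussian noise sequences made orthonormal by Gram-Schmidt.

    Assumption: this stands in for the orthogonal filter of the text.
    """
    rng = random.Random(seed)
    basis = []
    for _ in range(n_funcs):
        w = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for u in basis:
            c = mean_product(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(mean_product(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def reconstruct(coeffs, basis):
    """Partial sum x_n(t) = sum over k of a_k * w_k(t)."""
    return [sum(a * w[i] for a, w in zip(coeffs, basis))
            for i in range(len(basis[0]))]


n_samples = 200
# Stand-in "speech" sample: two sinusoids (assumption for illustration).
x = [math.sin(2 * math.pi * 3 * i / n_samples)
     + 0.5 * math.sin(2 * math.pi * 7 * i / n_samples)
     for i in range(n_samples)]

basis = orthonormal_noise(40, n_samples)
coeffs = [mean_product(x, w) for w in basis]   # analysis: a_k = avg(w_k * x)
power = mean_product(x, x)

results = []
for n in (10, 20, 40):
    x_n = reconstruct(coeffs[:n], basis[:n])
    residual = [xi - yi for xi, yi in zip(x, x_n)]
    measured = mean_product(residual, residual)
    predicted = power - sum(a * a for a in coeffs[:n])
    results.append((n, measured, predicted))
    print(f"n={n}: measured error {measured:.6f}, predicted {predicted:.6f}")
```

Because the noise functions here are not shaped to the signal in any way, the error falls slowly with n; the point of the sketch is only that the measured and predicted errors agree and that the error cannot increase as terms are added.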
THE PHYSICAL RECONSTRUCTION SYSTEM
FIG. 5 is a block diagram of the speech reconstruction system. In FIG. 5, the white noise signal, $k(t)$, of spectral density 1/N watts per cycle is applied to an orthogonal filter which has a transfer function, $H_n(s)$, identical to that of the orthogonal filter of the coding generator shown in FIG. 1. The response of this orthogonal filter is the set of orthogonal noise sample functions $w_1(t), w_2(t), w_3(t), \ldots, w_n(t)$. Each of these is multiplied respectively by the coefficients $a_1, a_2, \ldots, a_n$ derived from the coding generator (FIG. 1). The products thus formed in the multipliers are summed in the adder to give the reconstructed speech signal $x(t)$ of equation (27).

THE ADDER
The summing of the products $a_1 w_1(t), a_2 w_2(t), \ldots, a_n w_n(t)$ is accomplished in a conventional adder.
RELATIONS BETWEEN SPECTRAL DENSITIES
Since the white noise sources of the generation and reconstruction systems have reciprocal spectral density functions, the sample functions are related according to ...

SUMMARY
In the method presented here, the speech analysis and synthesis technique capitalizes on the fact that speech is a random signal that can be decomposed into a generalized orthogonal series, something like the Fourier series. This means that a speech signal can be represented by a set of coefficients which depend only on the nature of the speech information and on the length of the speech sample. This invention is an electronic system for generating these speech coefficients, or speech extraction parameters, and for utilizing them to reconstruct the speech into a spoken signal.
The novelty of the invention consists in the utilization of orthogonal filters, white noise and temporal averaging devices connected in the unique arrangement shown in FIG. 1 that gives rise to the speech extraction coefficients $a_1, a_2, \ldots, a_n$. It should be noted that the speech extraction parameters are very narrow band signals. These signals are nearly constant for $T \ge T_0$. This also means that the system can be used to greatly compress speech.
Obviously many modifications and variations of the present invention are possible in the light of the above teachings. It is therefore to be understood that within the scope of the appended claims the invention may be practiced otherwise than as specifically described.
What is claimed is:
1. A system for synthesizing speech from white noise, comprising:
means for producing a set of speech extraction coefficients responsive to an input speech signal sample;
means for producing a set of orthogonal noise sample functions;
means for multiplying each said speech extraction coefficient with a respective orthogonal noise sample function;
means connected to the output of said multiplier for summing the outputs of said multiplier to reproduce said input speech signal sample.
2. The system of claim 1, wherein: said means for generating said set of speech extraction coefficients includes:
means for producing another set of orthogonal noise sample functions;
means for multiplying each said orthogonal noise sample function of said another set with said input speech signal sample to produce a set of product sample functions from which said coefficients are derived.
3. The system of claim 1 wherein said means for producing said set of speech extraction coefficients includes means for temporally averaging each product sample function to produce said set of speech coefficients and wherein said coefficients at the outputs of said averaging means are described mathematically by the following equations,

$$a_1 = \frac{1}{T}\int_0^T V_1(t)\,x(t)\,dt, \quad a_2 = \frac{1}{T}\int_0^T V_2(t)\,x(t)\,dt, \quad \ldots$$

where $T$ is the averaging time, $x(t)$ is the speech signal sample function and $V_1(t), V_2(t), \ldots, V_n(t)$ is the set of orthogonal noise sample functions.
4. The system of claim 3, wherein: said speech signal sample has a duration $T_0$ and said averaging time $T$ is at least equal to said speech sample time $T_0$.
5. The system of claim 3, wherein: said means for summing comprises an adder connected to the output of said means for multiplying said each coefficient with a respective sample function, to sum the outputs thereof in accordance with the following mathematical expression:

$$s(t) = \sum_{k=1}^{n} A_k W_k(t)$$

for reconstructing said speech signal, where $A_k W_k(t)$ represents the set of products at the output of said means for multiplying.
6. The speech synthesizer of claim 5, wherein said means to generate said set and said means to generate said another set of orthogonal noise sample functions each include:
an orthogonal filter coupled to a source of white noise for producing said orthogonal noise sample functions.
7. The system of claim 6, wherein:
the response to the respective white noise sources of said orthogonal filters for producing said set of orthogonal noise functions and said orthogonal filters for producing said another set of orthogonal noise functions is substantially the same.
8. The system of claim 7, wherein:
said source of white noise used to generate said speech coefficients has a spectral power density N watts per cycle and where said source of white noise applied to said orthogonal filter for reconstructing said speech has a spectral power density 1/N watts per cycle.
9. A method for synthesizing speech comprising the steps of:
generating a speech signal sample;
generating a first set of orthogonal noise sample functions;
multiplying each of said orthogonal sample functions with said speech signal sample;
averaging the output of each said product of the multiplication of the orthogonal sample functions with the speech signal sample to generate a set of speech extraction coefficients;
generating a second set of orthogonal noise sample functions;
multiplying each said speech coefficient with a respective function of said second set of orthogonal noise sample functions to reconstruct said speech signal sample.
10. The method of claim 9, wherein:
said steps of producing said orthogonal sample functions include the step of connecting a source of white noise with a spectral power density of N watts per cycle to an orthogonal filter to produce said orthogonal sample functions for producing said speech extraction coefficient and;
connecting a white noise source having a spectral power density 1/N watts per cycle to an orthogonal filter for producing said orthogonal noise function, for reconstructing said speech signals.
11. The method of claim 10 wherein:
the steps of generating a first set of orthogonal noise functions is accomplished through an orthogonal filter having a transfer function substantially identical to said filter for producing said second set orthogonal filter sample functions;
multiplying each of said second set of orthogonal noise functions with said speech extraction coefficients; and
summing the output of each multiplier to construct said speech signal.

Claims (11)

1. A system for synthesizing speech from white noise, comprising: means for producing a set of speech extraction coefficients responsive to an input speech signal sample; means for producing a set of orthogonal noise sample functions; means for multiplying each said speech extraction coefficient with a respective orthogonal noise sample function; means connected to the output of said multiplier for summing the outputs of said multiplier to reproduce said input speech signal sample.
2. The system of claim 1, wherein: said means for generating said set of speech extraction coefficients includes: means for producing another set of orthogonal noise sample functions; means for multiplying each said orthogonal noise sample function of said another set with said input speech signal sample to produce a set of product sample functions from which said coefficients are derived.
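3. The system of claim 1 wherein said means for producing said set of speech extraction coefficients includes means for temporally averaging each product sample function to produce said set of speech coefficients and wherein said coefficients at the outputs of said averaging means are described mathematically by the following equations: $a_1 = \frac{1}{T}\int_0^T V_1(t)\,x(t)\,dt$, $a_2 = \frac{1}{T}\int_0^T V_2(t)\,x(t)\,dt$, $\ldots$, where $T$ is the averaging time, $x(t)$ is the speech signal sample function and $V_1(t), V_2(t), \ldots, V_n(t)$ is the set of orthogonal noise sample functions.
4. The system of claim 3, wherein: said speech signal sample has a duration $T_0$ and said averaging time $T$ is at least equal to said speech sample time $T_0$.
5. The system of claim 3, wherein: said means for summing comprises an adder connected to the output of said means for multiplying said each coefficient with a respective sample function, to sum the outputs thereof in accordance with the following mathematical expression: $s(t) = \sum_{k=1}^{n} A_k W_k(t)$ for reconstructing said speech signal, where $A_k W_k(t)$ represents the set of products at the output of said means for multiplying.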
6. The speech synthesizer of claim 5, wherein said means to generate said set and said means to generate said another set of orthogonal noise sample functions each include: an orthogonal filter coupled to a source of white noise for producing said orthogonal noise sample functions.
7. The system of claim 6, wherein: the response to the respective white noise sources of said orthogonal filters for producing said set of orthogonal noise functions and said orthogonal filters for producing said another set of orthogonal noise functions is substantially the same.
8. The system of claim 7, wherein: said source of white noise used to generate said speech coefficients has a spectral power density N2 watts per cycle and where said source of white noise applied to said orthogonal filter for reconstructing said speech has a spectral power density 1/N2 watts per cycle.
9. A method for synthesizing speech comprising the steps of: generating a speech signal sample; generating a first set of orthogonal noise sample functions; multiplying each of said orthogonal sample functions with said speech signal sample; averaging the output of each said product of the multiplication of the orthogonal sample functions with the speech signal sample to generate a set of speech extraction coefficients; generating a second set of orthogonal noise sample functions; multiplying each said speech coefficient with a respective function of said second set of orthogonal noise sample functions to reconstruct said speech signal sample.
10. The method of claim 9, wherein: said steps of producing said orthogonal sample functions include the step of connecting a source of white noise with a spectral power density of N2 watts per cycle to an orthogonal filter to produce said orthogonal sample functions for producing said speech extraction coefficient and; connecting a white noise source having a spectral power density 1/N2 watts per cycle to an orthogonal filter for producing said orthogonal noise function, for reconstructing said speech signals.
11. The method of claim 10 wherein: the steps of generating a first set of orthogonal noise functions is accomplished through an orthogonal filter having a transfer function substantially identical to said filter for producing said second set orthogonal filter sample functions; multiplying each of said second set of orthogonal noise functions with said speech extraction coefficients; and summing the output of each multiplier to construct said speech signal.
US00155988A 1971-06-23 1971-06-23 Speech synthesizer utilizing white noise Expired - Lifetime US3746791A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15598871A 1971-06-23 1971-06-23

Publications (1)

Publication Number Publication Date
US3746791A true US3746791A (en) 1973-07-17

Family

ID=22557604

Family Applications (1)

Application Number Title Priority Date Filing Date
US00155988A Expired - Lifetime US3746791A (en) 1971-06-23 1971-06-23 Speech synthesizer utilizing white noise

Country Status (1)

Country Link
US (1) US3746791A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3025350A (en) * 1957-06-05 1962-03-13 Herbert G Lindner Security communication system
US3036157A (en) * 1960-05-09 1962-05-22 Gen Dynamics Corp Orthogonal function communication system
US3204034A (en) * 1962-04-26 1965-08-31 Arthur H Ballard Orthogonal polynomial multiplex transmission systems
US3330910A (en) * 1964-05-06 1967-07-11 Bell Telephone Labor Inc Formant analysis and speech reconstruction
US3488445A (en) * 1966-11-14 1970-01-06 Bell Telephone Labor Inc Orthogonal frequency multiplex data transmission system
US3548107A (en) * 1968-04-30 1970-12-15 Webb James E Signal processing apparatus for multiplex transmission

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3833767A (en) * 1971-06-23 1974-09-03 A Wolf Speech compression system
US3975587A (en) * 1974-09-13 1976-08-17 International Telephone And Telegraph Corporation Digital vocoder
US4075424A (en) * 1975-12-19 1978-02-21 International Computers Limited Speech synthesizing apparatus
US4092495A (en) * 1975-12-19 1978-05-30 International Computers Limited Speech synthesizing apparatus
US4449231A (en) * 1981-09-25 1984-05-15 Northern Telecom Limited Test signal generator for simulated speech
US4545065A (en) * 1982-04-28 1985-10-01 Xsi General Partnership Extrema coding signal processing method and apparatus
US4644476A (en) * 1984-06-29 1987-02-17 Wang Laboratories, Inc. Dialing tone generation
US5133010A (en) * 1986-01-03 1992-07-21 Motorola, Inc. Method and apparatus for synthesizing speech without voicing or pitch information
US20160196526A1 (en) * 2015-01-06 2016-07-07 Verizon Patent And Licensing Inc. Smart hook for retail inventory tracking
US9818081B2 (en) * 2015-01-06 2017-11-14 Verizon Patent And Licensing Inc. Smart hook for retail inventory tracking
US20210319800A1 (en) * 2019-01-31 2021-10-14 Mitsubishi Electric Corporation Frequency band expansion device, frequency band expansion method, and storage medium storing frequency band expansion program
US11763828B2 (en) * 2019-01-31 2023-09-19 Mitsubishi Electric Corporation Frequency band expansion device, frequency band expansion method, and storage medium storing frequency band expansion program
