US3908085A - Voice synthesizer - Google Patents

Voice synthesizer

Info

Publication number
US3908085A
Authority
US
United States
Legal status
Expired - Lifetime
Application number
US486506A
Inventor
Richard T Gagnon
Current Assignee
Individual
Original Assignee
Individual
Filing date
July 1974
Publication date
Sept. 23, 1975
Application filed by Individual
Priority to US486506A (US3908085A)
Priority to GB27832/75A (GB1519004A)
Priority to CA230,923A (CA1070018A)
Priority to FR7521291A (FR2278127A1)
Priority to JP50083888A (JPS5140007A)
Priority to DE19752530380 (DE2530380A1)
Application granted
Publication of US3908085A
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00

Definitions

  • I pass the analog outputs through relatively slow-acting filters, i.e., filters whose transfer characteristics would, given sufficient time, duplicate the end parameters of the control signal but which, within ordinary phoneme duration intervals, prevent the analog control signals applied to the vocal tract model from reaching their steady-state or target values.
  • the sluggish acting filters tend to produce smooth, continuous glides between phonemes in much the same manner as the human vocal tract produces smooth transitions between phonemes in spoken speech.
  • variable duty cycle signals are generated by serializing digital signal quantities using a binary progression of weighted time intervals. Thereafter, I filter the serialized variable duty cycle waveforms to produce an analog signal for application to the analog control devices in the vocal tract model.
  • Variable speech rate is provided by variable timing means for varying the phoneme intervals, and thereby the speech rate, and further by tuning the slow-acting filters so that their response times track the varying speech rate; i.e., the response times are made shorter for higher speech rates and proportionately shorter phoneme intervals, so that the ratio of phoneme interval to response time remains about the same for a given phoneme.
  • I accomplish this by placing analog gates in series with the resistive components of the slow-acting filters and applying a variable duty cycle, high-frequency chopping signal to the gates, thereby simulating a variable electrical parameter, namely resistance.
  • FIG. 1 is a block diagram of a voice synthesizer system employing the present invention.
  • FIG. 2 is a more detailed block diagram of a portion of the system showing the specific elements of the vocal tract model.
  • FIG. 3 is a partial circuit diagram of a representative portion of a single control-signal-generating channel.
  • FIG. 4 is a timing diagram of waveforms relating to the circuits of FIGS. 1 through 3.
  • FIG. 5 is a circuit diagram of a delay system for producing the effects shown in FIG. 4.
  • FIG. 6 is a circuit diagram of a noise source.
  • In FIG. 1, the major portion of the system block diagram is represented.
  • This system, like the system of my copending application Ser. No. 274,029, may be operated using any of several types of input means, including business machines, computers, or phonetic keyboards capable of generating a sequence of digital signals representing the phonemes selected to make up a given speech pattern. Since such input means are described in the prior art as well as in my copending application, I have omitted a specific description from this text.
  • the phoneme selection signals are transmitted by way of the six input lines 10 to solid-state read-only memories 12 and 14, each having six input address lines and eight output lines.
  • the memories 12 and 14 respond to the phoneme selection addresses to generate phoneme parameter control signals of an analog character on the output lines 16.
  • the signals appearing on output lines 16 are sequences of serialized digital signal quantities wherein the various bits in the repeating series are time-weighted according to a binary progression.
  • the output signals have an average value which is the analog equivalent of the digital input quantities addressing those outputs.
  • Output lines 16a individually control eight phoneme parameters including output spectral frequencies, input forcing function frequencies, nasal closure, and transition rate.
  • the eight output lines 16b control eight more phoneme parameters including timing, amplitude, delay, spectral contour, closure, and bandwidth.
  • a total of 16 phoneme parameters, each capable of 16 different values, are employed in the system illustrated herein to control the phonetic output.
  • the analog control quantities on output lines 16a and 16b are connected through the filters 24a and 24b, respectively, the response times of filters 24a being tuned so as to be long relative to the typical phoneme intervals which are selected for a given speech rate.
  • Filters 24b also produce a damped response to step inputs but to a lesser degree than filters 24a.
  • the nominal setting of the filters 24a and 24b is such that their response times bear a predetermined relationship to the phoneme times peculiar to a given speech rate, with the result that the frequency parameters output from ROM 12 on lines 16a are seldom, if ever, realized during the phoneme interval to which they directly relate.
  • the step inputs to filters 24b are smoothed by the filter response characteristics. Therefore, the filters 24 produce, for all speech rates, the phoneme control signal smoothing described in my copending application. Again, I emphasize that this is most important for the vocal tract frequency control signals F1, F2, and F3.
  • the major difference here lies in the fact that these filters 24 are tunable for varying speech rates.
  • the transfer characteristics of filters 24 are such as to eventually produce an output which approximates the input, given enough time to respond; i.e., the output eventually reaches a steady state amplitude level related to the amplitude of the signal at the input.
  • the seven output signals from filters 24a and the five output signals from filters 24b are connected through the duty cycle converters 26 which convert the smooth slowly varying analog dc levels into fixed-frequency pulse trains wherein the duration or width of the pulses varies according to the input dc levels.
  • These duty cycle, or "pulse width modulated," signals are then applied to the various devices in the vocal tract model 28 to produce the audio speech output on line 29.
  • the lower five parameter control signals relating to amplitude, spectral contour, fricative frequency and spectral shape are applied to the filter bank in the vocal tract 28 through an excitation processor 30 comprising analog control devices for the control of the quantities indicated.
  • Source 32 is a source of audio signals which are used to produce the voiced phoneme constituents.
  • Source 32 is connected to the vocal tract filter bank 28 through the excitation processor 30.
  • the fricative excitation source 34 is a noise source hereinafter described in greater detail and is also connected to the vocal tract filter bank 28 through the excitation processor 30.
  • the duty cycle outputs on lines 16 are generated by basic timing apparatus including a 20 KHz clock source 18, which has output line 20 connected to the duty cycle conversion unit 22 and output lines 23 and 25 connected to the read-only memories 12 and 14.
  • the signals on lines 23 and 25 form part of the phoneme addresses and operate to serialize four selected bits of stored data onto each of the parameter output lines 16 in a binary progression wherein the first bit is assigned eight clock times, the second bit four clock times, the third bit two clock times, and the last bit a single clock time. More details on the manner in which this specific conversion is accomplished will be described with reference to FIG. 3.
  • Inflection signals are input from the programming means by way of lines 36 and connected to the inflection filter 38, the output of which is connected to the vocal excitation source 32 to vary the frequency or pitch of the vocal excitation source output. This produces inflection variations in the audio output on line 29.
  • the basic rate control signal is controlled by unit 40 having a manual tuning dial 42.
  • the speech rate signal is a duty cycle signal, i.e., a time-varying waveform of fixed frequency (20 KHz) but variable pulse width. It is connected to the slow-acting filter bank 24b, the inflection filter 38, a transition rate control unit 44, and a phoneme timer unit 46, which produces an output ramp falling from 5 volts to 0 volts over an interval that varies with phoneme interval timing.
  • the relative timing parameter from read-only memory 14 is one of the 16 parameters selected by the stored data in the memories and is applied to the phoneme timer 46 to establish the slope of the output ramp from the timer 46.
  • phoneme timing varies not only with speech rate on an across-the-board basis, but also from phoneme to phoneme at a given speech rate.
  • the ramp output from timer 46 is connected to a vocal delay generator 48, which delays the vocal amplitude control parameter, and to a closure delay generator 50', which controls a time delay of the vocal spectral contour, closure, bandwidth, and fricative amplitude control parameters, all of which are independent of the vocal tract resonant functions controlled by the memory 12.
  • the output of rate control unit 40 is also connected to the transition rate control unit 44 so that the transition rates are taken into account in controlling the response times of the filters 24a, whereby those response times track the phoneme intervals.
  • the vocal tract filter bank comprises cascaded resonant filters 50, 52, 54, 56, and 58 which produce the frequency poles in the output spectrum of any given phoneme.
  • Each of the filters may be implemented in the form of a two-pole filter as is more fully set forth in my copending application Ser. No. 274,029. It will be noted, however, that in the present embodiment of my invention the filters are connected in series rather than in parallel as in my previous embodiment. I find that this produces a certain advantage with respect to energy distribution between the poles and creates a more realistic speech output.
  • Filters 50, 52, and 54 are tunable so as to vary the positions of the poles in the output phoneme spectrum, whereas filters 56 and 58 are fixed pole filters.
  • the cascaded arrangement of filters is connected through a closure gate 60, which is subject to a control signal, and a KHz filter 62, which filters out the control signal carrier that might otherwise appear in the output waveform. Again, the output signal appears on line 29 and corresponds with FIG. 1.
  • the entire vocal tract model comprises the vocal oscillator 32, the noise source 34 and the respective control channels for the forcing functions.
  • Vocal oscillator 32 is connected through a filter 64 and a spectral contour filter 66 which is subject to tuning by an externally derived control, i.e., the vocal spectral contour control signal produced on the fourth output line of read-only memory 14.
  • the vocal oscillator signal is also connected to a vocal constituent amplitude control unit 68 which is an analog gate as hereinafter described in greater detail.
  • the unit 68 is also subject to the externally derived control signal; i.e., the second output signal of read-only memory 14.
  • the signal is passed through a nasal resonance filter 70 which is subject to two control signals, the "nasal closure" and "nasal frequency" signals, which are derived on the fourth and fifth output lines of read-only memory 12.
  • the output of the nasal resonance filter is connected by way of line 72 to the input of the vocal tract filter bank; i.e., at the input of filter 50 as shown in FIG. 2.
  • the fricative noise source 34 is connected through the fricative amplitude control device 74 which is subject to external control, the fricative band pass filter 76 which is subject to external control, and the fricative low pass filter 78 which is also subject to external control.
  • the amplitude controlled and filtered fricative forcing function is injected into both the F2 and F5 filters 52 and 58 in the vocal tract filter bank as shown.
  • the block diagram of FIG. 2 also contains a portion of a representative control signal generating channel, in this case the channel which generates the closure control signal applied to gate 60 in the vocal tract filter bank.
  • the control signal channel comprises the read-only memory 14, which receives the digital input signal and produces analog output signals on its various output lines.
  • the output line 80 of interest is connected through a buffer amplifier 82 to establish precision voltage limits on the duty cycle signal and is thereafter applied to the closure delay generator unit 50' as previously described.
  • the output is applied through the slow acting tunable filter 24b and thereafter through the duty cycle converter 26'. From there the duty cycle signal is applied directly to the closure gate 60 for control over the closure function.
  • the term duty cycle signal refers to a fixed frequency pulse train which varies between two relatively fixed amplitude levels with varying pulse widths.
  • the read-only memory unit 14 is shown divided into decoder and output matrix sections 84 and 86, respectively.
  • the decoder unit 84 receives the phoneme address on the six input address lines having the binary-weighted address values shown in the drawings. To select phoneme number 13, high input signal values are applied to the "eight," "four," and "one" input signal lines while low signal values are applied to the remaining lines. Other addresses are similarly selected.
  • the input signal polarity convention may, of course, be reversed depending upon the specific circuitry employed.
  • timing signals on lines 23 and 25 are applied to the decoder 84 as further address constituents, the "MSB" signal on line 23 having the timing characteristic illustrated, and the "LSB" signal on line 25 having the modified timing characteristic also illustrated. More specifically, both the MSB and LSB signals have 15 clock time periods broken into eight, four, two, and one clock time segments. The MSB signal has a high value for the first two segments and a low value for the last two segments, whereas the LSB signal has a high value for the first and third segments and a low value for the second and fourth segments. These nonvarying timing signals are applied to the decoder section 84 of memory 14 for all input signal combinations and complete the address inputs to the decoder 84.
  • the combination of signals applied to the six binary-weighted input lines selects groups of four output bits for each of the eight output lines from the matrix section 86 of read-only memory 14.
  • the time distribution or order of the four selected bits for each parameter is controlled by the time varying relationship between the MSB and LSB signals and functions to distribute the bits in an eight-four-two-one clock time sequence on each of the parameter output lines of which line 81 is the selected example.
  • the first bit selected by the six bit address appears for eight clock times
  • the second bit selected appears for four clock times
  • the third bit selected appears for two clock times
  • the last bit selected appears for one clock time.
  • each analog signal value is spaced from the adjacent analog signal values by approximately one-third volt in amplitude, an easily detected amplitude variation for control purposes.
  • Output line 81 is connected to the input of the buffer amplifier 82 which, as shown in FIG. 3, has the upper limit input pin connected to a precision 5 volt source and the lower limit input pin connected to ground.
  • the duty cycle signal which is input to the amplifier 82 is reproduced at the output but between precisely defined voltage limits of 5 volts and 0 volts so as to ensure accuracy in the average value of the duty cycle signal.
  • the advantages of the duty cycle signal conversion which is employed in the present embodiment of the invention are substantial in that it results in the generation of four bits for each phoneme parameter in series yet at the same time requires only two read-only memory units to generate all 64 bits.
  • the particular serialized generation of the 16 four-bit groups requires no latch devices to hold the four bits for simultaneous application to a digital-to-analog converter.
  • the approach of the present invention eliminates the need for latches as well as separate digital-to-analog conversion devices such as resistor ladder networks.
  • an alternative approach would be to employ a sufficient number of read-only memory units to generate all 64 bits at once in parallel but the economic as well as spatial requirements of this approach limit practicality as will be apparent to those skilled in the art.
  • the duty cycle signal comprising the binary-weighted distribution of four bits on the parameter output line 81 is connected from the buffer amplifier 82 to a tunable slow-acting filter 24'', which forms part of one of the filter banks, either 24a or 24b.
  • a delay device may be employed in the connection.
  • the filter comprises an analog gate 87 having the primary terminals connected to pass the duty cycle signal therebetween, a resistor 88, a second resistor 90, a second analog gate 92, and a shunt capacitor 94 connected to the positive input of an operational amplifier 96.
  • the output of the amplifier 96 is connected through feedback path 98 back to the negative input of the amplifier and also through a capacitor 100 to the junction between the resistors 88 and 90.
  • the transition rate signal from unit 40, chopped at the 20 KHz rate, is applied to the control terminals of the two analog gates 87 and 92.
  • the control signal applied to the analog gates 87 and 92 operates to render the gate conductive and non-conductive at a very high frequency relative to the highest frequency component in the input duty cycle signal and varies in its own duty cycle in accordance with the desired transition rate.
  • the average on-off time ratio of the gates 87 and 92 is varied in direct proportion to the transition rate signal. This has the effect of varying the apparent resistance of the tunable slow-acting filter 24'' in accordance with the transition rate signal, so that the response time of the filter tracks the desired speech rate; i.e., it will be recalled from the description of FIG. 1 that the setting of the rate control unit affects the transition rate control unit 44, which in turn controls the duty cycle or pulse width of the chopped signal applied to the control terminals of the gates 87 and 92.
  • the output of the amplifier 96 is a dc voltage the amplitude of which varies with the average value of the duty cycle signal which is input to the filter 24''.
  • the signal input to filter 24'' is changing, and thus the output dc level is substantially continuously varying as well.
  • the output of amplifier 96 is preferably connected through a "glitch" filter comprising a series resistor 102 and a shunt capacitor 104 to get rid of spurious signals.
  • the output of the glitch filter is connected to the comparator amplifier 106 which forms part of the unit 26 illustrated in FIG. 1.
  • This unit comprises a comparator amplifier having the positive input pin connected to receive the varying dc voltage level and the negative pin connected to a 20 KHz sawtooth voltage wave which varies between 0 and 5 volts. Again, it will be observed that the 20 KHz signal operates to phase-synchronize all duty cycle signals in the system.
  • the output of comparator amplifier 106 is a fixed frequency pulse train wherein the pulse widths vary in accordance with the portion of the 20 KHz sawtooth which exceeds the dc voltage level applied to the positive input terminal of the comparator amplifier 106. This is a function of the amplitude of the dc signal.
  • Such means to convert from dc levels to duty cycle signals are well known to those skilled in the art and will not be described in greater detail herein.
  • the duty cycle signal output from ROM 14 is converted to a dc voltage level and then back to a duty cycle signal only to facilitate the filtering function at unit 24. If satisfactory filtering can be accomplished by operating on the duty cycle signal directly, such conversion may be eliminated.
  • the reconverted duty cycle signal which represents the phoneme control signal is then applied directly to the device in the vocal tract model which is to be controlled to produce the desired contribution to the particular selected phoneme.
  • the signal might, for example, be the F1 signal, which is applied to control the position of the frequency pole of the filter; it could, however, be any one of the other 15 control signal quantities which are generated except, of course, the transition rate signal, the timing signal, the vocal delay signal, or the closure delay signal, none of which are connected directly through tunable slow-acting filters.
  • FIG. 4 discloses the saw-tooth shaped phonetic timer ramp signal which is output from the phonetic timer 46 in the circuit of FIG. 1 and which controls phoneme duration as previously described.
  • FIG. 4 also shows the typical relationship in a fricative-to-vocal transition, wherein the amplitude of the fricative forcing function drops sharply at exactly the same time as the amplitude of the vocal forcing function rises. The reverse is true at the end of the second phoneme time interval.
  • the fricative forcing function is delayed such that the amplitude rise occurs later in the phoneme time as shown in FIG. 4.
  • FIGS. 4 and 5 will be described as representative of either or both.
  • the forcing function or control parameter from the ROM is shown applied to the input of a switch 120 while the closure delay pedestal derived from the comparison between the phoneme timer ramp and the closure delay signal as previously described with reference to FIG. 4 is applied to the control terminal of the switch 120.
  • the portion of the forcing function, e.g., fricative amplitude, which is passed during the on time of the switch 120 is stored in a capacitor 122 and applied to the positive input of an operational amplifier 124 having feedback path 126.
  • the output of the operational amplifier is a signal related to the input control parameter but delayed by a sufficient interval as to properly synchronize the excitation events with the filtering events in the output of the synthesizer. It will be understood that a variety of approaches to this delay function can be employed and that the delay concept in general is one of several techniques described herein which may be applied in various combinations for the purpose of accomplishing more realistic speech in the synthesizer process. The vocal forcing function is similarly delayed.
  • FIG. 6 is a schematic circuit diagram of an improved noise source 34 as shown in FIGS. 1 and 2.
  • the circuit of FIG. 6 is capable of generating a pseudo-random mixture of frequencies having excellent spectrum and amplitude characteristics for use in voice synthesis.
  • the circuit comprises an 18-bit shift register 130 having a 20 KHz clock input 134. Taps nos. 4 and 5 are connected to opposite inputs of exclusive-OR gate 136, while taps 1 and 18 are connected to the inputs of similar gate 138. The outputs of gates 136 and 138 are ORed through gate 140. The output of gate 140 is exclusive-ORed through gate 142 with the 1.33 KHz input waveform on line 144. The register output may be taken from any tap and is shown taken from tap no. 14. The output is a pseudo-random function whose period of recurrence is long enough to look like purely random noise. The 1.33 KHz signal serves to avoid a "lockup" condition wherein all of the gate outputs and register inputs are the same and the sequence fails to progress.
  • a vocal tract model comprising tunable filters and amplitude control devices responsive to input signal quantities to determine the frequency and amplitude parameters of phonemes to be output therefrom;
  • input means for specifying selected phonemes to be output from said vocal tract model and said parameters thereof in a predetermined order
  • control signal forming means connected between said input means and said tunable filters and amplitude control devices of the vocal tract model for producing a plurality of variable duty cycle waveforms as input signals for controlling the frequency and amplitude characteristics of the selected phonemes.
  • said input means comprises means for specifying a phoneme code having a plurality of bits
  • said signal forming means between the input means and vocal tract model including means for cyclically outputting said bits in series according to a predetermined timing sequence having a numerically progressive order whereby said bits define said duty cycle waveforms, the average value of which varies according to the selected bits of said code.
  • said signal forming means comprises filter means for producing a dc signal related to the average value of the duty cycle waveforms.
  • Apparatus as defined in claim 3 further including means for converting said dc signal to a second variable duty cycle signal of fixed frequency and amplitude for application to said vocal tract model.
  • Apparatus as defined in claim 1 including buffer means connected in said signal forming means for establishing precision voltage levels for said variable duty cycle signals.
  • said input means includes timing control means for selecting a relative time interval for each specified phoneme, and timing control means for varying the overall rate of phonetic production while preserving the relative timing between said phonemes.
  • said signal forming means comprises filter means for producing a dc signal related to the average value of the duty cycle waveforms.
  • timing control means includes means for varying the response time of said filter to track with said overall rate of phonetic production.
  • said signal forming means comprises at least one storage facility having input address means and output control means, some but not all of the input means being connected to receive phoneme address signals, said facility having stored therein a plurality of phoneme control parameters which are applied to said output means according to the address signals, each phoneme address signal combination applied to said some of the input means being effective to select a plurality of parameter control signals for application to an output.
  • timing signal means connected to the remaining input means for applying thereto timing signals for applying the selected control signals to the output individually and in a sequence having a binary-weighted time order such that the average value of the total sequence is a function of the particular address signal combination.
  • control signals on said output comprise an electrical wave-form which varies between two relatively fixed amplitude levels.
  • Apparatus as defined in claim 12 further including buffer means connected to receive the control signal sequence for reproducing said sequence but with a close tolerance on said amplitude levels.
  • Apparatus as defined in claim 13 including a filter connected to receive the control signal sequence and to output a dc signal related to the average value thereof, the filter having a response time which is slow relative to the rate at which the average value varies from one selected value to another.
  • Apparatus as defined in claim 14 further including timing means for controlling the rate of phoneme selection, said means being connected to the filter for varying said response time to track with the rate selected.
  • the filter comprises gate means connected to transmit the parameter signal value, and a control terminal connected to receive a high frequency signal having a variable duty cycle thereby to control the ON-time of the gate means.
  • Apparatus as defined in claim 16 further including means connected to said filter for converting the dc signal to a variable duty cycle signal for application to the vocal model.
  • Apparatus as defined in claim 1 including as part of said vocal tract model a source of voiced phonetic excitation quantity and a source of unvoiced phonetic excitation quantity, and means responsive to the addressing of a consecutive phonetic sequence of voiced and unvoiced phonetic quantities for producing a predetermined delay in the excitation of at least one of said quantities.
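
The slow-acting filters 24 are described above as producing an output that eventually reaches the input level but falls short of each phoneme target within the phoneme interval. That behavior can be illustrated with a first-order low-pass model. This Python sketch is an idealization, not the patented circuit. It replaces the analog filter with a single exponential (RC) section stepped in discrete time. The function name `smooth_targets`, the 80 ms phoneme interval, and the 100 ms time constant are all hypothetical.

```python
import math


def smooth_targets(targets, phoneme_ms, tau_ms, dt_ms=0.05):
    """Drive a first-order low-pass filter with a staircase of phoneme targets.

    Returns the filter output at the end of each phoneme interval.
    Uses the exact discretization of an RC section:
        y += (1 - exp(-dt / tau)) * (x - y)
    """
    alpha = 1.0 - math.exp(-dt_ms / tau_ms)
    steps = int(round(phoneme_ms / dt_ms))
    y = 0.0
    ends = []
    for target in targets:
        for _ in range(steps):
            y += alpha * (target - y)  # move a fixed fraction toward the target
        ends.append(y)
    return ends


# Hypothetical values: 80 ms phonemes through a filter with a 100 ms time constant.
ends = smooth_targets([1.0, 0.2, 0.8], phoneme_ms=80.0, tau_ms=100.0)
print([round(v, 3) for v in ends])
```

With the response time longer than the phoneme interval, each end value lands only partway to its target (about 55 percent of the first step here). That shortfall is the glide effect the text attributes to the sluggish filters. Lengthening the interval lets the output settle.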
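
The text describes binary time-weighted serialization on lines 16 and claims that the average value equals the analog equivalent of the stored code. A short Python model can check both. It assumes the first (eight-clock) bit is the most significant bit. It also assumes the waveform switches between the 0 V and 5 V limits set by buffer amplifier 82. The names `serialize` and `average_volts` are illustrative only.

```python
SEGMENT_CLOCKS = (8, 4, 2, 1)  # clock times for bits 3, 2, 1, 0 (most significant first)


def serialize(code):
    """Serialize a 4-bit parameter value into one 15-clock duty cycle frame."""
    if not 0 <= code <= 15:
        raise ValueError("parameter value must fit in four bits")
    frame = []
    for position, clocks in enumerate(SEGMENT_CLOCKS):
        bit = (code >> (3 - position)) & 1  # position 0 holds the most significant bit
        frame.extend([bit] * clocks)
    return frame


def average_volts(frame, v_high=5.0):
    """Average value of a 0 V / v_high waveform over one frame."""
    return v_high * sum(frame) / len(frame)


frame = serialize(13)  # binary 1101
print(frame, round(average_volts(frame), 3))
```

For the code 13 (binary 1101), the frame contains 13 high clocks out of 15, so its average is 13/15 of 5 V, or about 4.33 V. Because the clock counts are weighted 8, 4, 2, 1, the number of high clocks always equals the code value.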
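
The text links the rate setting, the relative timing parameter, and the filter response times. That relationship can be sketched as below. The linear mapping from the 4-bit relative timing code to an interval is hypothetical. So are the base values of 20 ms and 25 ms and the function names. The sketch only demonstrates the property stated in the text: changing the overall rate rescales every phoneme interval and filter response time together. Relative timing between phonemes and the interval-to-response-time ratio are therefore preserved.

```python
def phoneme_schedule(relative_codes, rate, base_interval_ms=20.0, base_tau_ms=25.0):
    """Return (interval_ms, tau_ms) for each phoneme at a given overall rate.

    Hypothetical model: a relative timing code (0-15) lengthens the interval
    linearly, and the rate setting divides both the interval and the filter
    response time.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    tau_ms = base_tau_ms / rate
    return [(base_interval_ms * (code + 1) / rate, tau_ms) for code in relative_codes]


def timer_ramp(t_ms, interval_ms, v_start=5.0):
    """Phoneme timer output: falls linearly from 5 V to 0 V over one interval."""
    return max(0.0, v_start * (1.0 - t_ms / interval_ms))


normal = phoneme_schedule([3, 7], rate=1.0)
fast = phoneme_schedule([3, 7], rate=2.0)
print(normal, fast)
```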
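
The series (cascade) connection of resonant filters 50 through 58 can be illustrated digitally. The patent's filters are analog two-pole sections. This sketch swaps in a standard digital two-pole resonator with unity gain at DC: y[n] = a x[n] + b y[n-1] + c y[n-2]. The formant frequencies, bandwidths, and 20 KHz sample rate are assumed values, not figures from the patent.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of a digital two-pole resonator with unity gain at DC."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at zero frequency to 1
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Apply y[n] = a*x[n] + b*y[n-1] + c*y[n-2] to a list of samples."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs_hz=20000.0):
    """Pass the excitation through the resonators in series (F1 first)."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


# Hypothetical settings: F1-F3 would be retuned per phoneme; F4 and F5 stay fixed.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0), (3500.0, 175.0), (4500.0, 250.0)]
impulse = [1.0] + [0.0] * 399
response = cascade(impulse, FORMANTS)
```

Each section has unity DC gain. A constant input therefore passes through the whole cascade unchanged once transients decay, and only the spectral shaping changes as F1 to F3 are retuned.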
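
The MSB and LSB timing lines complete the ROM address and select which stored bit appears during each segment. This can be modeled as follows. The (MSB, LSB) levels per segment follow the description of lines 23 and 25. The mapping from those levels to a particular stored bit is an assumption consistent with that description, not the actual ROM layout. The names `timing_lines` and `output_line` are likewise illustrative.

```python
SEGMENT_CLOCKS = (8, 4, 2, 1)
# (MSB, LSB) levels during the four segments, per the description of lines 23 and 25.
SEGMENT_LEVELS = ((1, 1), (1, 0), (0, 1), (0, 0))


def timing_lines():
    """Yield the (MSB, LSB) pair for each of the 15 clock times in a frame."""
    for (msb, lsb), clocks in zip(SEGMENT_LEVELS, SEGMENT_CLOCKS):
        for _ in range(clocks):
            yield msb, lsb


def rom_bit(stored_bits, msb, lsb):
    """One ROM output line: the timing lines pick one of the four selected bits.

    Assumed mapping: (1, 1) selects the first bit, (0, 0) the last.
    """
    index = 3 - (2 * msb + lsb)
    return stored_bits[index]


def output_line(stored_bits):
    """Waveform on one parameter output line over a 15-clock frame."""
    return [rom_bit(stored_bits, msb, lsb) for msb, lsb in timing_lines()]


STEP_VOLTS = 5.0 / 15  # spacing between adjacent average values

bits_for_13 = [1, 1, 0, 1]
print(output_line(bits_for_13))
```

Adjacent codes differ by one high clock out of 15. Their average values therefore differ by 5/15 V, which is the roughly one-third volt spacing cited in the text.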
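
The economy argued for the serialized readout can be checked with simple arithmetic. This assumes, as the text implies, that each ROM has eight address inputs (the six phoneme address lines plus the two timing lines) and eight output lines. The eight-ROM figure for the parallel alternative further assumes the same eight-output parts.

```latex
\begin{align*}
16 \text{ parameters} \times 4 \text{ bits} &= 64 \text{ bits per phoneme} \\
2^{6+2} = 256 &= 2^{6} \text{ phonemes} \times 4 \text{ segments (words per ROM)} \\
256 \text{ words} \times 8 \text{ outputs} &= 2048 \text{ bits per ROM} \\
\text{parallel readout: } \frac{64 \text{ output lines}}{8 \text{ outputs per ROM}} &= 8 \text{ ROMs, versus } 2
\end{align*}
```

Total storage is the same either way (2 × 2048 = 64 × 64 = 4096 bits). The saving is in output lines, latches, and digital-to-analog hardware, not in stored bits.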
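
The chopped analog gates 87 and 92 make the filter resistors look larger by the inverse of the gate duty cycle D. With the gate conducting only a fraction D of each chopping period, the average charging current is D times its always-on value. The effective time constant therefore becomes RC/D, provided the chopping period is short compared with RC. The Python sketch below checks this for a single RC section. That is a simplification of the second-order filter of FIG. 3, though scaling every resistor of an RC filter by the same factor scales all of its time constants by that factor. The 10 kilohm and 1 microfarad values and the function names are hypothetical.

```python
import math


def chopped_rc_step(r_ohms, c_farads, duty, t_end_s, chop_hz=20000.0, steps_per_chop=100):
    """Step response of an RC section whose resistor is in series with a chopped gate.

    While the gate conducts, the capacitor charges toward 1 V through R;
    while the gate is open, the capacitor simply holds its voltage.
    """
    dt = 1.0 / (chop_hz * steps_per_chop)
    decay = math.exp(-dt / (r_ohms * c_farads))
    on_steps = int(round(duty * steps_per_chop))
    v = 0.0
    for k in range(int(round(t_end_s / dt))):
        if k % steps_per_chop < on_steps:  # gate conducting in this part of the chop period
            v = 1.0 - (1.0 - v) * decay
    return v


def effective_tau(r_ohms, c_farads, duty):
    """Average-current argument: the gate makes R look like R / duty."""
    return r_ohms * c_farads / duty


# Hypothetical component values: 10 kilohms and 1 microfarad (RC = 10 ms).
for duty in (0.25, 0.5, 1.0):
    simulated = chopped_rc_step(10e3, 1e-6, duty, t_end_s=0.01)
    predicted = 1.0 - math.exp(-0.01 / effective_tau(10e3, 1e-6, duty))
    print(duty, round(simulated, 4), round(predicted, 4))
```

Raising D from 0.25 to 1.0 shortens the effective time constant fourfold. This is how a larger transition rate signal yields a faster filter for faster speech.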
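
Comparator 106 converts a dc level back into a duty cycle signal by comparing it against a 0 to 5 V sawtooth. The sketch assumes the usual comparator convention: the output is high while the positive input exceeds the negative input. The fraction of each period spent high is then approximately the dc level divided by 5 V. Sampling one period at 100 points and the function names are illustrative choices.

```python
def comparator_pwm(v_dc, samples_per_period=100, v_max=5.0):
    """One period of the comparator output, sampled samples_per_period times.

    The sawtooth rises from 0 V toward v_max; the output is 1 while the dc
    level on the positive input exceeds the sawtooth on the negative input.
    """
    return [1 if v_dc > v_max * k / samples_per_period else 0
            for k in range(samples_per_period)]


def duty_cycle(pulses):
    """Fraction of the period during which the output is high."""
    return sum(pulses) / len(pulses)


# Round trip for parameter value 13: its serialized average is 13/15 of 5 V (about 4.33 V).
v_dc = 5.0 * 13 / 15
print(round(duty_cycle(comparator_pwm(v_dc)), 2))
```

Feeding it the 13/15 average from a serialized frame gives a duty cycle within sampling resolution of 13/15. This illustrates why converting the serialized signal to dc and back to a duty cycle signal, solely to ease filtering, preserves the parameter value.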
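
The delay circuit of FIG. 5 behaves like a sample-and-hold gated by the pedestal. While the pedestal is high, switch 120 passes the new parameter value to capacitor 122; otherwise the previous value is held. The sketch assumes the pedestal goes high once the phoneme timer ramp has fallen below the delay level. That polarity, the 100 samples per phoneme, and the function name are assumptions, not details from the patent.

```python
def delayed_parameter(targets, delay_volts, samples_per_phoneme=100, v_top=5.0):
    """Sample-and-hold delay of one control parameter (switch 120, capacitor 122).

    For each phoneme the timer ramp falls from v_top toward 0 V. The pedestal is
    assumed high once the ramp is below that phoneme's delay level; only then
    does the switch pass the new target to the holding capacitor.
    """
    held = 0.0
    out = []
    for target, v_delay in zip(targets, delay_volts):
        for k in range(samples_per_phoneme):
            ramp = v_top * (1.0 - k / samples_per_phoneme)
            if ramp < v_delay:  # pedestal high: switch closed, capacitor tracks input
                held = target
            out.append(held)  # amplifier 124 buffers the held value
    return out


# Hypothetical transition: a parameter stepping from 0 to 1 in the second phoneme,
# with a 2.5 V delay level that holds the step back to roughly mid-phoneme.
vocal = delayed_parameter([0.0, 1.0], [5.0, 2.5])
```

With a 2.5 V delay level, the step in the second phoneme appears only after the ramp passes its halfway point. This corresponds to the delayed rise shown in FIG. 4.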
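
The feedback logic of the FIG. 6 noise source can be simulated bit by bit. The tap and gate connections follow the text. Three details are assumptions: the shift direction (feedback entering at tap 1), the function name, and the model of the 1.33 KHz waveform. That waveform is treated as a square wave high for 8 of every 15 clock times, since 1.33 KHz is close to 20 KHz divided by 15.

```python
def noise_source(n_clocks, inject_square=True, square_period=15, square_high=8):
    """Bit stream from the FIG. 6 noise source, one bit per 20 KHz clock.

    reg[0] models tap 1 and reg[17] tap 18; the feedback bit enters at tap 1.
    Feedback = ((tap 4 XOR tap 5) OR (tap 1 XOR tap 18)) XOR square wave.
    """
    reg = [0] * 18  # start in the all-zero state that would otherwise lock up
    out = []
    for t in range(n_clocks):
        square = 1 if inject_square and (t % square_period) < square_high else 0
        feedback = ((reg[3] ^ reg[4]) | (reg[0] ^ reg[17])) ^ square
        out.append(reg[13])  # output taken from tap 14
        reg = [feedback] + reg[:-1]  # shift one position toward tap 18
    return out


bits = noise_source(1000)
locked = noise_source(1000, inject_square=False)
```

Started from the all-zero state, the register stays at zero forever when the square wave is withheld, which is the lockup condition. With the square wave injected, the sequence progresses and produces a mixture of ones and zeros.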

Abstract

A voice synthesizer of the type set forth in U.S. Pat. No. 3,836,717 wherein the control signals applied to the devices in the vocal tract model take the form of variable pulse width "duty cycle" waveforms. A novel system for producing the duty cycle signals is disclosed. Variable speech rate is provided.

Description

United States Patent 3,908,085
Gagnon, Sept. 23, 1975
VOICE SYNTHESIZER
Inventor: Richard T. Gagnon, Birmingham, Mich. 48010
Attorney, Agent, or Firm: Thomas N. Young
Filed: July 1974
Appl. No.: 486,506
Field of Search: 179/1 SA, 1 SG, 1 SM, 15.55 R, 15.55 T
18 Claims, 6 Drawing Figures
[Drawing sheet 4 of 4: FIG. 4 (phoneme timer ramp, vocal delay parameter, pedestal, vocal amplitude, fricative amplitude, fricative amplitude delayed); FIG. 5 (parameter control from ROM, closure pedestal, output to filter); FIG. 6 (clock, data, register taps).]

VOICE SYNTHESIZER

INTRODUCTION
This invention relates to voice synthesizers and particularly to improvements in voice synthesizers of the type disclosed in my copending application for U.S. Pat. Ser. No. 274,029, filed July 21, 1972, now U.S. Pat. No. 3,836,717, issued Sept. 17, 1974.
BACKGROUND OF THE INVENTION The synthesis of human speech by way of readily programmable electronic means presents a vast spectrum of opportunities for application in the field of information transfer or communications. One essential requirement for synthesized speech in any but novelty applications is intelligibility. I have found that one of the critical factors in producing intelligible speech is the generation of proper dynamic transitions from one phoneme to the next. I have found that in ordinary intelligible human speech the transitions are at least equal in significance to the steady-state vocal tract phonetic conditions, since few if any phonemes achieve a steady-state condition for any appreciable time interval. Thus, the quality of the transitions contributes to intelligibility as well as realism.
Other prior art practitioners have also apparently recognized the importance of interphoneme transitions. In one prior art system, elaborate electronic means are provided for producing piecewise linear approximations of the phoneme transition waveforms of synthesized speech. At least one other system involves a highly complex electronics system for making a running comparison between adjacent phonemes so that specific and predefined phoneme interaction waveforms may be generated in response to coded representations of the various phoneme sequences which occur in a given speech pattern.
I have found that it is not necessary to resort to elaborate and complex electronics to produce satisfactory interphoneme transitions. I have greatly simplified the prior art systems while, at the same time, producing superior interphoneme transitions by generating a succession of analog control signals defining steady-state phoneme parameters and by passing these signals through slow-acting filters which prevent the steady-state values from being achieved. As is more fully set forth in my copending application for patent Ser. No. 274,029, I employ a vocal tract model comprising tunable resonant filters and amplitude control devices responsive to analog control signals for defining the various constituents of each phoneme in a speech pattern. I generate the control signals first in digital form and, in my illustrated embodiment, convert them to dc voltage levels by means of digital-to-analog converters. Thereafter, I pass the analog outputs through relatively slow-acting filters; i.e., filters whose transfer characteristics would, given sufficient time, duplicate the end parameters of the control signal but which, within the ordinary phoneme duration intervals, prevent the analog control signals applied to the vocal tract model from reaching the steady-state or target values. In this fashion, the sluggish filters produce smooth, continuous glides between phonemes in much the same manner as the human vocal tract produces smooth transitions between phonemes in spoken speech.
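The smoothing effect of a slow-acting filter can be illustrated numerically. The following Python sketch is a simplification rather than the patented circuit: it models each filter as a single first-order RC low-pass (the FIG. 3 filter actually uses two resistors and two capacitors), assumes every phoneme lasts the same interval, and uses hypothetical target voltages. With a time constant equal to the phoneme interval, the output covers only about 63 percent (1 - e^-1) of each step before the next phoneme begins, so targets are approached but not reached.

```python
import math


def smooth_targets(targets, phoneme_interval, tau, steps_per_phoneme=100):
    """Return the first-order low-pass output at the end of each phoneme.

    targets           -- steady-state control value for each phoneme (volts)
    phoneme_interval  -- duration of each phoneme in seconds (assumed uniform)
    tau               -- filter time constant R*C in seconds (assumed)
    """
    dt = phoneme_interval / steps_per_phoneme
    # Exact per-step update for dy/dt = (x - y) / tau with x held constant.
    alpha = 1.0 - math.exp(-dt / tau)
    y = targets[0]  # assume the filter has settled on the first target
    end_values = []
    for x in targets:
        for _ in range(steps_per_phoneme):
            y += alpha * (x - y)
        end_values.append(y)
    return end_values


# Hypothetical F1-like control targets for three phonemes, 100 ms each.
print(smooth_targets([3.0, 1.0, 4.0], phoneme_interval=0.1, tau=0.1))
```

Shortening `tau` relative to `phoneme_interval` lets the output get closer to each target; lengthening it produces broader glides.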
BRIEF SUMMARY OF THE INVENTION I have now found that it is possible to further improve upon the electronics system in speech synthesizers of the type set forth in my copending patent application Ser. No. 274,029, particularly with respect to the generation of the analog control signals from digital input quantities. In general, I accomplish this by providing means responsive to digital phoneme parameter specification signals for generating variable duty cycle waveforms, the average values of which are the analog equivalents of the digital signals. In my preferred embodiment, hereinafter described in detail, the variable duty cycle signals are generated by serializing digital signal quantities using a binary progression of weighted time intervals. Thereafter, I filter the serialized variable duty cycle waveforms to produce an analog signal for application to the analog control devices in the vocal tract model.
I have also found that it is possible to provide variable speech rates in a synthesizer of the type set forth in my copending application for patent Ser. No. 274,029 while preserving all of the advantages of the simple filters for producing the interphoneme transitions. In general, I accomplish this by providing variable timing means for varying the phoneme intervals, and thereby the speech rate, and by tuning the slow-acting filters such that their response times track with the varying speech rates; i.e., the response times are made shorter for higher speech rates and proportionately shorter phoneme intervals, but the ratio of phoneme interval to response time remains about the same for a given phoneme. In the specific embodiment hereinafter described, I accomplish this through the use of analog gates in series with the resistive components of the slow-acting filters and by applying a variable duty cycle, high-frequency chopping signal to the analog gates, thereby simulating a varying electrical parameter; i.e., resistance. In this fashion, I avoid the problem which might otherwise occur in my prior system if the actual speech rate were set either too low or too high in relation to the filter response time.
I have also discovered other improvements which might be made to my prior voice synthesizer including the use of simple analog gates in the vocal tract model so as to permit control by means of a smoothly varying duty cycle signal, and the delay of certain parameters such as voice constituent amplitude, fricative closures and certain other parameters thereby to produce still more realistic speech. I also disclose herein the use of a highly simplified noise source for generating unvoiced phoneme constituents. These improvements may, of course, be employed in various combinations with or without the other aspects of this invention as described herein. The various features and advantages of the invention will be best understood from a reading of the following specification.
BRIEF DESCRIPTION OF THE DRAWING FIG. 1 is a block diagram of a voice synthesizer system employing the present invention;
FIG. 2 is a more detailed block diagram of a portion of the system showing the specific elements of the vocal tract model;
FIG. 3 is a partial circuit diagram of a representative portion of a single control signal generating channel;
FIG. 4 is a timing diagram of waveforms relating to the circuits of FIGS. 1 through 3;
FIG. 5 is a circuit diagram of a delay system for producing the effects shown in FIG. 4; and
FIG. 6 is a circuit diagram of a noise source.
DETAILED DESCRIPTION OF THE SPECIFIC EMBODIMENT Looking now to FIG. 1, the major portion of a system block diagram is represented. This system, like the system of my copending application Ser. No. 274,029, may be operated by any of several types of input means, including business machines, computers, or phonetic keyboards capable of generating a sequence of digital signals representing the various phonemes to be selected to make up a given speech pattern. Since such input means are described in the prior art as well as in my copending application, I have omitted a specific description from this text. The phoneme selection signals are transmitted by way of the six input lines 10 to solid-state read-only memories 12 and 14, each having six input address lines and eight output lines. It will be understood that I use two separate memories only because my specific input/output requirements were not met by a currently available device; there is no technical reason why a single unit would not suffice if available. The memories 12 and 14 respond to the phoneme selection addresses to generate phoneme parameter control signals of an analog character on the output lines 16. As hereinafter described in greater detail, the signals appearing on output lines 16 are sequences of serialized digital signal quantities wherein the various bits in the repeating series are time-weighted according to a binary progression. Thus, the output signals have an average value which is the analog equivalent of the digital input quantities addressing those outputs. Output lines 16a individually control eight phoneme parameters including output spectral frequencies, input forcing function frequencies, nasal closure, and transition rate. The eight output lines 16b control eight more phoneme parameters including timing, amplitude, delay, spectral contour, closure, and bandwidth.
A total of 16 phoneme parameters, each capable of 16 different values, are employed in the system illustrated herein to control the phonetic output.
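These figures fix the size of the stored data: six address lines give 64 phoneme codes, and each code selects 16 parameters of 4 bits, or 64 bits per phoneme, split evenly between the two memories. The Python sketch below models one phoneme's entry as a 64-bit integer. The nibble ordering and the assignment of parameters 0-7 to ROM 12 and 8-15 to ROM 14 are illustrative assumptions; the patent does not specify a storage layout.

```python
N_ADDRESS_LINES = 6
N_PHONEMES = 2 ** N_ADDRESS_LINES  # 64 selectable phoneme codes
N_PARAMS = 16                      # 8 per memory, on lines 16a and 16b
BITS_PER_PARAM = 4                 # 16 values per parameter


def pack_phoneme(params):
    """Pack 16 four-bit parameter values into one 64-bit integer.

    Parameter 0 occupies the lowest nibble; this order is hypothetical.
    """
    if len(params) != N_PARAMS:
        raise ValueError("expected 16 parameter values")
    word = 0
    for k, p in enumerate(params):
        if not 0 <= p < 2 ** BITS_PER_PARAM:
            raise ValueError("parameter out of 4-bit range")
        word |= p << (BITS_PER_PARAM * k)
    return word


def unpack_parameter(word, k):
    """Return parameter k (0..15) from a packed phoneme word."""
    return (word >> (BITS_PER_PARAM * k)) & 0xF


def source_rom(k):
    """Assumed split: parameters 0-7 from ROM 12, 8-15 from ROM 14."""
    return 12 if k < N_PARAMS // 2 else 14


total_bits = N_PHONEMES * N_PARAMS * BITS_PER_PARAM  # 4096 bits in all
```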
The analog control quantities on output lines 16a and 16b are connected through the filters 24a and 24b, respectively, the response times of filters 24a being tuned so as to be long relative to the typical phoneme intervals selected for a given speech rate. Filters 24b also produce a damped response to step inputs, but to a lesser degree than filters 24a. The nominal setting of the filters 24a and 24b is such that their response times bear a predetermined relationship to the phoneme times peculiar to a given speech rate, the result being that the frequency parameters output from ROM 12 on lines 16a are seldom, if ever, realized during the phoneme interval to which they directly relate. Similarly, the step inputs to filters 24b are smoothed by the filter response characteristics. Therefore, the filters 24 produce the phoneme control signal smoothing described in my copending application, for all speech rates. Again, I emphasize that this is most important for the vocal tract frequency control signals F1, F2, and F3. The major difference here lies in the fact that these filters 24 are tunable for varying speech rates. As in the system of my copending application, the transfer characteristics of filters 24 are such as to eventually produce an output which approximates the input, given enough time to respond; i.e., the output eventually reaches a steady-state amplitude level related to the amplitude of the signal at the input.
The seven output signals from filters 24a and the five output signals from filters 24b are connected through the duty cycle converters 26, which convert the smooth, slowly varying analog dc levels into fixed-frequency pulse trains wherein the duration or width of the pulses varies according to the input dc levels. These duty cycle, or "pulse width modulated," signals are then applied to the various devices in the vocal tract model 28 to produce the audio speech output on line 29. The lower five parameter control signals, relating to amplitude, spectral contour, fricative frequency, and spectral shape, are applied to the filter bank in the vocal tract model 28 through an excitation processor 30 comprising analog control devices for the control of the quantities indicated.
The two basic constituents of synthesized speech are the vocal and fricative forcing functions, which are provided by sources 32 and 34, respectively. Source 32 is a source of audio signals used to produce the voiced phoneme constituents. Source 32 is connected to the vocal tract filter bank 28 through the excitation processor 30. The fricative excitation source 34 is a noise source, hereinafter described in greater detail, and is also connected to the vocal tract filter bank 28 through the excitation processor 30.
The duty cycle outputs on lines 16 are generated by means of basic timing apparatus including a 20 KHz clock source 18 having output line 20 connected to the duty cycle conversion unit 22 and having output lines 23 and 25 connected to the read-only memories 12 and 14. As will be hereinafter described in greater detail, the signals on lines 23 and 25 form part of the phoneme addresses and operate to serialize four selected bits of stored data onto each of the parameter output lines 16 in a binary progression wherein the first bit is assigned eight clock times, the second bit four clock times, the third bit two clock times, and the final bit a single clock time. The manner in which this conversion is accomplished is described in more detail with reference to FIG. 3.
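The 8-4-2-1 assignment means that one complete frame occupies 8 + 4 + 2 + 1 = 15 clock times, or 0.75 ms at the 20 KHz clock, and that the number of clock times during which a line is high equals the 4-bit value it carries. A minimal Python sketch, assuming the first (eight-clock) bit is the most significant bit of the stored parameter value:

```python
CLOCK_HZ = 20_000
SEGMENT_CLOCKS = (8, 4, 2, 1)       # clock times for the first..last bit
FRAME_CLOCKS = sum(SEGMENT_CLOCKS)  # 15 clock times per repeating frame


def serialize(value):
    """Return one frame (list of 0/1 per clock time) for a 4-bit value.

    The most significant bit is assumed to occupy the first eight clocks.
    """
    if not 0 <= value <= 15:
        raise ValueError("value must fit in 4 bits")
    frame = []
    for bit_position, clocks in zip((3, 2, 1, 0), SEGMENT_CLOCKS):
        frame.extend([(value >> bit_position) & 1] * clocks)
    return frame


frame_seconds = FRAME_CLOCKS / CLOCK_HZ
print(serialize(13), frame_seconds)
```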
Inflection signals are input from the programming means by way of lines 36 and connected to the inflection filter 38, the output of which is connected to the vocal excitation source 32 to vary the frequency or pitch of the vocal excitation source output. This produces inflection variations in the audio output on line 29.
As previously described, a feature of the present system is the ability to produce speech at rates from relatively fast to relatively slow without loss of intelligibility. The basic rate control signal is set by unit 40, which has a manual tuning dial 42. The speech rate signal is a duty cycle signal; i.e., a time-varying waveform of fixed frequency (20 KHz) but variable pulse width. It is connected to the slow-acting filter bank 24b, the inflection filter 38, a transition rate control unit 44, and a phoneme timer unit 46, which produces an output ramp falling from 5 volts to 0 volts over an interval which varies with phoneme interval timing. The relative timing parameter from read-only memory 14 is one of the 16 parameters selected by the stored data in the memories and is applied to the phoneme timer 46 to establish the slope of its output ramp. Thus, phoneme timing varies not only with speech rate on an across-the-board basis, but also from phoneme to phoneme at a given speech rate. The ramp output from timer 46 is connected to a vocal delay generator 48, which delays the vocal amplitude control parameter, and to a closure delay generator 50, which controls a time delay of the vocal spectral contour, closure, bandwidth, and fricative amplitude control parameters, all of which are independent of the vocal tract resonant functions controlled by the memory 12.
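The phoneme timer combines two influences: the speech rate scales all phoneme intervals together, while the relative timing parameter from ROM 14 sets each phoneme's share. One way to express that relationship is sketched below in Python. The linear mapping, the 0.2 s base interval, and the function names are hypothetical; the patent gives no formula. Because the rate factor divides every interval equally, the ratio between any two phoneme durations is the same at every speech rate.

```python
V_RAMP_START = 5.0  # timer ramp falls from 5 V to 0 V


def phoneme_duration(timing_code, rate, base_seconds=0.2):
    """Hypothetical phoneme interval in seconds.

    timing_code  -- 4-bit relative timing parameter from ROM 14 (0..15)
    rate         -- overall speech-rate factor from unit 40 (1.0 = nominal)
    base_seconds -- illustrative interval for code 15 at nominal rate
    """
    if not 0 <= timing_code <= 15:
        raise ValueError("timing code must fit in 4 bits")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return base_seconds * (timing_code + 1) / 16 / rate


def ramp_voltage(t, duration, v_start=V_RAMP_START):
    """Timer ramp at time t into the phoneme; slope is v_start / duration."""
    return max(0.0, v_start * (1.0 - t / duration))


for rate in (0.5, 1.0, 2.0):
    short, long_ = phoneme_duration(3, rate), phoneme_duration(7, rate)
    print(rate, short, long_, short / long_)
```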
The output of rate control unit 40 is also connected to the transition rate control unit 44 so that the transition rates are taken into account in controlling the response times of the filters 24a, whereby those response times track with the phoneme intervals.
Looking now to the block diagram of FIG. 2, the vocal tract model 28 will be shown with greater specificity. The vocal tract filter bank comprises cascaded resonant filters 50, 52, 54, 56, and 58, which produce the frequency poles in the output spectrum of any given phoneme. Each of the filters may be implemented in the form of a two-pole filter, as is more fully set forth in my copending application Ser. No. 274,029. It will be noted, however, that in the present embodiment of my invention the filters are connected in series rather than in parallel as in my previous embodiment. I find that this produces a certain advantage with respect to energy distribution between the poles and creates a more realistic speech output. It is to be understood, however, that I do not intend the present invention to be limited to any particular arrangement of resonant filters; rather, I presently find the cascaded arrangement of filters to produce superior results. Filters 50, 52, and 54 are tunable so as to vary the positions of the poles in the output phoneme spectrum, whereas filters 56 and 58 are fixed-pole filters. The cascaded arrangement of filters is connected through a closure gate 60, which is subject to a control signal, and a 20 KHz filter 62, which filters out the control signal carrier that might otherwise appear in the output waveform. Again, the output signal appears on line 29 and corresponds with FIG. 1.
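Connecting two-pole resonators in series means their complex gains multiply, so the relative heights of the spectral peaks follow from the pole frequencies and bandwidths without separate gain settings for each pole; this is one common explanation for the energy distribution advantage of cascade arrangements. The Python sketch below assumes a generic unity-dc-gain two-pole low-pass resonator for each filter (the actual circuit is in the copending application) and uses hypothetical pole frequencies and bandwidths.

```python
import math


def resonator(f_hz, f0_hz, bandwidth_hz):
    """Complex gain of a two-pole resonator with unity gain at dc:
    H(s) = w0^2 / (s^2 + (w0 / Q) s + w0^2), with Q = f0 / bandwidth."""
    w0 = 2 * math.pi * f0_hz
    q = f0_hz / bandwidth_hz
    s = 2j * math.pi * f_hz
    return w0 ** 2 / (s ** 2 + (w0 / q) * s + w0 ** 2)


def cascade_gain_db(f_hz, poles):
    """Magnitude (dB) of resonators in series: the complex gains multiply."""
    h = 1.0 + 0.0j
    for f0, bw in poles:
        h *= resonator(f_hz, f0, bw)
    return 20 * math.log10(abs(h))


# Hypothetical settings: three tunable poles (like filters 50, 52, 54)
# followed by two fixed poles (like filters 56 and 58), in Hz.
POLES = [(500, 60), (1500, 90), (2500, 120), (3500, 150), (4500, 200)]
for f in (100, 500, 1000, 1500, 2500):
    print(f, round(cascade_gain_db(f, POLES), 1))
```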
The entire vocal tract model, of course, comprises the vocal oscillator 32, the noise source 34, and the respective control channels for the forcing functions. Vocal oscillator 32 is connected through a filter 64 and a spectral contour filter 66, which is subject to tuning by an externally derived control; i.e., the vocal spectral contour control signal produced on the fourth output line of read-only memory 14. The vocal oscillator signal is also connected to a vocal constituent amplitude control unit 68, which is an analog gate as hereinafter described in greater detail. The unit 68 is also subject to an externally derived control signal; i.e., the second output signal of read-only memory 14. Finally, the signal is passed through a nasal resonance filter 70, which is subject to two control signals, the "nasal closure" and the "nasal frequency" signals, derived on the fourth and fifth output lines of read-only memory 12. The output of the nasal resonance filter is connected by way of line 72 to the input of the vocal tract filter bank; i.e., at the input of filter 50 as shown in FIG. 2.
The fricative noise source 34 is connected through the fricative amplitude control device 74 which is subject to external control, the fricative band pass filter 76 which is subject to external control, and the fricative low pass filter 78 which is also subject to external control. The amplitude controlled and filtered fricative forcing function is injected into both the F2 and F5 filters 52 and 58 in the vocal tract filter bank as shown.
The block diagram of FIG. 2 also contains a portion of a representative control signal generating channel, in this case the channel which generates the closure control signal applied to gate 60 in the vocal tract filter bank. The control signal channel comprises the read-only memory 14, which receives the digital input signal and produces analog output signals on its various output lines. The output line 80 of interest is connected through a buffer amplifier 82, which establishes precision voltage limits on the duty cycle signal, and is thereafter applied to the closure delay generator unit 50' as previously described. The output is applied through the slow-acting tunable filter 24b and thereafter through the duty cycle converter 26'. From there the duty cycle signal is applied directly to the closure gate 60 for control over the closure function. It is to be understood that the term "duty cycle signal" as used herein refers to a fixed-frequency pulse train which varies between two relatively fixed amplitude levels with varying pulse widths.
Looking now to FIG. 3, another representative control signal channel will be described in still greater detail. In FIG. 3, the read-only memory unit 14 is shown divided into decoder and output matrix sections 84 and 86, respectively. The decoder unit 84 receives the phoneme address on the six input address lines having the binary weighted address values shown in the drawings. To select phoneme number 13, high input signal values are applied to the "eight," "four," and "one" input signal lines while low signal values are applied to the remaining lines. Other addresses are similarly selected. The input signal polarity convention may, of course, be reversed depending upon the specific circuitry employed. In addition, the timing signals on lines 23 and 25 are applied to the decoder 84 as further address constituents, the "MSB" signal on line 23 having the timing characteristic illustrated, and the "LSB" signal on line 25 having the modified timing characteristic also illustrated. More specifically, both the MSB and LSB signals have 15 clock time periods broken into eight, four, two, and one clock time segments. The MSB signal has a high value for the first two segments and a low value for the last two segments, whereas the LSB signal has a high value for the first and third segments and a low value for the second and fourth segments. These nonvarying timing signals are applied to the decoder section 84 of memory 14 for all input signal combinations and complete the address inputs to the decoder 84.
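The two timing signals can be tabulated directly from this description. Taken together, the (MSB, LSB) pair steps through four distinct states during the 15 clock times, one per segment, which is what lets the decoder pick a different stored bit in each segment. In the Python sketch below, the mapping from state to bit index is an assumption consistent with the eight-four-two-one order; the actual decoder wiring is not given.

```python
SEGMENT_CLOCKS = (8, 4, 2, 1)
# (MSB, LSB) levels in each segment, as described for lines 23 and 25.
SEGMENT_LEVELS = ((1, 1), (1, 0), (0, 1), (0, 0))


def timing_signals():
    """Return per-clock MSB and LSB lists for one 15-clock frame."""
    msb, lsb = [], []
    for clocks, (msb_level, lsb_level) in zip(SEGMENT_CLOCKS, SEGMENT_LEVELS):
        msb += [msb_level] * clocks
        lsb += [lsb_level] * clocks
    return msb, lsb


def selected_bit(msb_level, lsb_level):
    """Assumed decode: index (0 = first, eight-clock bit) of the stored bit
    output for a given (MSB, LSB) state."""
    return 3 - (2 * msb_level + lsb_level)


msb, lsb = timing_signals()
print([selected_bit(a, b) for a, b in zip(msb, lsb)])
```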
Specifically, the combination of signals applied to the six binary weighted input lines selects a group of four output bits for each of the eight output lines from the matrix section 86 of read-only memory 14. The time distribution, or order, of the four selected bits for each parameter is controlled by the time-varying relationship between the MSB and LSB signals, which distributes the bits in an eight-four-two-one clock time sequence on each of the parameter output lines, of which line 81 is the selected example. In other words, the first bit selected by the six-bit address appears for eight clock times, the second bit for four clock times, the third bit for two clock times, and the last bit for one clock time. This produces a serialized, binary-weighted duty cycle signal on the output line 81, the average value of which varies between 0/15 and 15/15 of the maximum output voltage, i.e., 5 volts, in increments of 1/15. Thus, each analog signal value is spaced from the adjacent analog signal values by approximately one-third volt in amplitude, an easily detected amplitude variation for control purposes.
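The resulting analog levels follow from counting high clock times: a 4-bit value n is high for n of the 15 clock times, so once the levels are held at exactly 0 and 5 volts the average is 5n/15 volts, and adjacent levels differ by 5/15, about 0.33 volt. A short Python check, assuming ideal 0 V and 5 V levels:

```python
V_MAX = 5.0
SEGMENT_CLOCKS = (8, 4, 2, 1)


def frame(value):
    """One 15-clock frame for a 4-bit value, most significant bit first."""
    bits = [(value >> p) & 1 for p in (3, 2, 1, 0)]
    return [b for b, n in zip(bits, SEGMENT_CLOCKS) for _ in range(n)]


def average_voltage(value, v_max=V_MAX):
    """Mean of the 0/v_max waveform over one frame."""
    f = frame(value)
    return v_max * sum(f) / len(f)


levels = [average_voltage(v) for v in range(16)]
print([round(v, 3) for v in levels])
```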
Output line 81 is connected to the input of the buffer amplifier 82 which, as shown in FIG. 3, has its upper limit input pin connected to a precision 5 volt source and its lower limit input pin connected to ground. Thus, the duty cycle signal input to the amplifier 82 is reproduced at the output, but between precisely defined voltage limits of 5 volts and 0 volts, so as to ensure accuracy in the average value of the duty cycle signal. The advantages of the duty cycle signal conversion employed in the present embodiment are substantial in that it generates four bits for each phoneme parameter in series yet requires only two read-only memory units to generate all 64 bits. Moreover, the particular serialized generation of the 16 four-bit groups requires no latch devices to hold the four bits for simultaneous application to a digital-to-analog converter. In other words, the approach of the present invention eliminates the need for latches as well as separate digital-to-analog conversion devices such as resistor ladder networks. Of course, an alternative approach would be to employ a sufficient number of read-only memory units to generate all 64 bits at once in parallel, but the economic as well as spatial requirements of this approach limit its practicality, as will be apparent to those skilled in the art.
The duty cycle signal comprising the binary weighted distribution of four bits on the parameter output line 81 is connected from the buffer amplifier 82 to a tunable slow-acting filter 24', which forms part of one of the filter banks, either 24a or 24b. A delay device may be employed in the connection. The filter comprises an analog gate 87 having its primary terminals connected to pass the duty cycle signal therebetween, a resistor 88, a second resistor 90, a second analog gate 92, and a shunt capacitor 94 connected to the positive input of an operational amplifier 96. The output of the amplifier 96 is connected through feedback path 98 back to the negative input of the amplifier and also through a capacitor 100 to the junction between the resistors 88 and 90. The transition rate signal from unit 44, chopped at the 20 KHz rate, is applied to the control terminals of the two analog gates 87 and 92. This control signal renders the gates conductive and non-conductive at a very high frequency relative to the highest frequency component in the input duty cycle signal, and its own duty cycle varies in accordance with the desired transition rate. Thus, the average on-off time ratio of the gates 87 and 92 is varied in direct proportion to the transition rate signal. This has the effect of varying the apparent resistance of the tunable slow-acting filter 24' in accordance with the transition rate signal so that the response time of the filter tracks with the desired speech rate; it will be recalled from the description of FIG. 1 that the setting of the rate control unit 40 affects the transition rate control unit 44, which in turn controls the duty cycle or pulse width of the chopped signal applied to the control terminals of the gates 87 and 92.
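The effect of chopping a series resistor can be estimated from averages: if the gate conducts for a fraction D of each 20 KHz period, current flows through the resistor only during that fraction, so on average the resistor behaves like R/D and each RC time constant stretches to roughly RC/D. Since gates 87 and 92 share one control signal, both time constants of the filter scale together. The Python sketch below checks the single-resistor case by simulation; it idealizes the gate as a perfect switch, and the component values are hypothetical.

```python
import math


def chopped_rc_step(r_ohms, c_farads, duty, t_end,
                    chop_hz=20_000.0, steps_per_chop=20):
    """Capacitor voltage at t_end for a 1 V step applied at t = 0 through a
    resistor whose series analog gate is ON for `duty` of each chop period.
    The gate is modeled as an ideal switch (an assumption)."""
    dt = 1.0 / (chop_hz * steps_per_chop)
    on_steps = round(duty * steps_per_chop)
    decay = math.exp(-dt / (r_ohms * c_farads))  # per-step decay while ON
    v = 0.0
    for i in range(round(t_end / dt)):
        if i % steps_per_chop < on_steps:
            v = 1.0 - (1.0 - v) * decay  # charge toward 1 V
        # while OFF, no current flows and the capacitor holds its voltage
    return v


def effective_time_constant(r_ohms, c_farads, duty):
    """Average-model time constant: the resistance appears as R / duty."""
    return r_ohms * c_farads / duty


R, C = 100e3, 1e-6  # hypothetical 100 kOhm and 1 uF, so RC = 0.1 s
for duty in (0.25, 0.5, 1.0):
    tau = effective_time_constant(R, C, duty)
    print(duty, tau, chopped_rc_step(R, C, duty, t_end=tau))
```

Each printed voltage comes out near 0.632, i.e. one effective time constant, confirming that a larger duty cycle (a faster transition rate) gives a proportionally shorter response time.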
Accordingly, the output of the amplifier 96 is a dc voltage whose amplitude varies with the average value of the duty cycle signal input to the filter 24'. Of course, the signal input to filter 24' is changing, and thus the output dc level is substantially continuously varying as well. The output of amplifier 96 is preferably connected through a "glitch" filter comprising a series resistor 102 and a shunt capacitor 104 to remove spurious signals. The output of the glitch filter is connected to the comparator amplifier 106, which forms part of the unit 26 illustrated in FIG. 1. This unit comprises a comparator amplifier having its positive input pin connected to receive the varying dc voltage level and its negative pin connected to a 20 KHz sawtooth voltage wave which varies between 0 and 5 volts. Again, it will be observed that the 20 KHz signal operates to phase-synchronize all duty cycle signals in the system. The output of comparator amplifier 106 is a fixed-frequency pulse train wherein the pulse widths vary in accordance with the portion of each 20 KHz sawtooth period during which the dc voltage level applied to the positive input terminal exceeds the sawtooth. This is a function of the amplitude of the dc signal. Such means to convert from dc levels to duty cycle signals are well known to those skilled in the art and will not be described in greater detail herein.
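The comparator stage is ordinary pulse-width modulation: with the dc level on the positive input and a 0 to 5 volt sawtooth on the negative input, the output is high whenever the dc level exceeds the sawtooth, so the duty cycle is approximately the dc level divided by 5 volts. The sampled Python sketch below assumes an ideal comparator and a linear sawtooth; the sample count is arbitrary.

```python
V_SAW_MAX = 5.0


def comparator_pwm(v_dc, samples_per_period=1000, v_max=V_SAW_MAX):
    """One period of comparator output (1 = high) for a dc input compared
    against a linear 0..v_max sawtooth on the inverting input."""
    out = []
    for k in range(samples_per_period):
        sawtooth = v_max * k / samples_per_period  # rising ramp, resets each period
        out.append(1 if v_dc > sawtooth else 0)
    return out


def duty_cycle(v_dc):
    """Fraction of the period during which the comparator output is high."""
    period = comparator_pwm(v_dc)
    return sum(period) / len(period)


for v in (0.0, 1.25, 2.5, 5.0):
    print(v, duty_cycle(v))
```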
It is to be understood that the duty cycle signal output from ROM 14 is converted to a dc voltage level and then back to a duty cycle signal only to facilitate the filtering function at unit 24. If satisfactory filtering can be accomplished by operating on the duty cycle signal directly, such conversion may be eliminated.
The reconverted duty cycle signal, which represents the phoneme control signal, is then applied directly to the device in the vocal tract model which is to be controlled to produce the desired contribution to the particular selected phoneme. The signal might, for example, be the F1 signal, which is applied to control the position of the frequency pole of the F1 filter, or it could be any one of the other 15 control signal quantities which are generated, except, of course, the transition rate signal, the timing signal, the vocal delay signal, or the closure delay signal, none of which are connected directly through tunable slow-acting filters. The advantage of the reconversion from dc to duty cycle is that it eliminates the requirement for the complex analog multipliers disclosed in my copending patent application and permits the use of simple analog gates to perform proportional control and effective signal multiplication on a duty cycle basis. The economies of this approach in terms of cost, complexity, and spatial requirements will be apparent to those skilled in the electronics arts.
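The effective signal multiplication can be seen by gating a signal with a pulse train: averaged over each 20 KHz control period, the gate passes the fraction D of the input, so a duty cycle control acts as a gain of D without an analog multiplier. The following Python sketch models the gate as an ideal switch, assumes 20 samples per control period, and uses a simple per-period average in place of a real filter. Any chop-rate ripple left in the gated signal is the kind of carrier that a filter such as filter 62 of FIG. 2 removes.

```python
import math

SAMPLES_PER_CHOP = 20  # samples in one 20 KHz control period (assumed)


def pwm_control(duty, n_samples, samples_per_chop=SAMPLES_PER_CHOP):
    """Fixed-frequency pulse train: 1 for the first `duty` of each period."""
    on = round(duty * samples_per_chop)
    return [1 if i % samples_per_chop < on else 0 for i in range(n_samples)]


def analog_gate(signal, control):
    """Ideal series gate: passes the signal while control is 1, else 0."""
    return [x if c else 0.0 for x, c in zip(signal, control)]


def period_average(samples, samples_per_chop=SAMPLES_PER_CHOP):
    """Average over each control period (a crude stand-in for filtering)."""
    return [sum(samples[i:i + samples_per_chop]) / samples_per_chop
            for i in range(0, len(samples), samples_per_chop)]


# A 200 Hz tone sampled at 400 kHz, gated with a 30 percent duty cycle.
fs = 20_000 * SAMPLES_PER_CHOP
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(4000)]
gated = analog_gate(tone, pwm_control(0.3, len(tone)))
print(period_average(gated)[:5])
```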
Another important feature of my voice synthesizer as disclosed herein is the delay of excitation functions so as to match the filtering events with respect to the timed relationship between adjacent phonemes or phoneme constituents of different types. For example, in a transition from vocal to fricative phonetic constituents, such as one finds in the pronunciation of the letter "s," it is desirable to delay the excitation of the fricative portion so that it is spaced from the decaying-amplitude vocal portion of the letter. The same may be true in reverse; i.e., in fricative-to-vocal transitions as well. FIG. 4 shows the sawtooth-shaped phoneme timer ramp signal which is output from the phoneme timer 46 in the circuit of FIG. 1 and which controls phoneme duration as previously described. When this ramp is compared to a dc voltage level representing the vocal delay command from ROM 14, a "pedestal" can be generated, as shown in the second line of FIG. 4. This pedestal signal is used in the circuit of FIG. 5, as hereinafter described.
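The pedestal is simply the result of comparing the falling timer ramp with a dc delay level. The Python sketch below assumes a linear ramp from 5 volts to 0 volts across the phoneme interval and a pedestal that goes high once the ramp has fallen to the delay level (the polarity is an assumption). Under those assumptions the pedestal starts a fraction 1 - level/5 of the way into the phoneme, so the stored delay command selects the delay.

```python
V_RAMP_START = 5.0


def timer_ramp(n_samples, v_start=V_RAMP_START):
    """Linear phoneme timer ramp falling from v_start toward 0 V."""
    return [v_start * (1.0 - k / n_samples) for k in range(n_samples)]


def pedestal(ramp, delay_level):
    """1 once the ramp has fallen to the delay level, else 0 (assumed polarity)."""
    return [1 if v <= delay_level else 0 for v in ramp]


def delay_fraction(delay_level, v_start=V_RAMP_START):
    """Fraction of the phoneme interval before the pedestal begins."""
    return 1.0 - delay_level / v_start


p = pedestal(timer_ramp(100), delay_level=2.5)
print(p.index(1), delay_fraction(2.5))
```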
FIG. 4 also shows the typical relationship in a fricative-to-vocal transition, wherein the amplitude of the fricative forcing function is shown to drop sharply at exactly the same time as the amplitude of the vocal forcing function rises. The reverse is true at the end of the second phoneme time interval. Using these excitation functions without modification would produce unrealistic speech, as the forcing functions would not correlate properly in time with the control parameters derived through the slow-acting filters as previously described. Accordingly, the fricative forcing function is delayed such that the amplitude rise occurs later in the phoneme time, as shown in FIG. 4.
To accomplish the delay function, the closure delay generator 50 and the vocal delay generator 48 of FIG. 1 are employed. Since these are similar, if not identical, in implementation, FIGS. 4 and 5 will be described as representative of either or both. In FIG. 5 the forcing function or control parameter from the ROM is applied to the input of a switch 120, while the closure delay pedestal, derived from the comparison between the phoneme timer ramp and the closure delay signal as previously described with reference to FIG. 4, is applied to the control terminal of the switch 120. The portion of the forcing function, e.g., fricative amplitude, which is passed during the on-time of the switch 120 is stored in a capacitor 122 and applied to the positive input of an operational amplifier 124 having feedback path 126. The output of the operational amplifier is a signal related to the input control parameter but delayed by an interval sufficient to properly synchronize the excitation events with the filtering events in the output of the synthesizer. It will be understood that a variety of approaches to this delay function can be employed and that the delay concept in general is one of several techniques described herein which may be applied in various combinations to produce more realistic speech in the synthesizer. The vocal forcing function is similarly delayed.
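Functionally, switch 120, capacitor 122, and amplifier 124 form a sample-and-hold: while the pedestal is high the output follows the ROM parameter, and while it is low the capacitor holds the last value, so a parameter change at a phoneme boundary reaches the output only when the pedestal next goes high. A minimal Python model with an ideal switch and a lossless capacitor (both simplifying assumptions):

```python
def delayed_parameter(parameter, pedestal, initial=0.0):
    """Sample-and-hold: track `parameter` while pedestal is 1, hold while 0."""
    held = initial
    out = []
    for value, gate_on in zip(parameter, pedestal):
        if gate_on:
            held = value  # switch 120 closed: capacitor 122 charges to input
        out.append(held)  # follower amplifier 124 reproduces capacitor voltage
    return out


# Fricative amplitude steps up at sample 5 (a phoneme boundary), but the
# pedestal does not go high until sample 8, so the output rises at sample 8.
fricative_amplitude = [0.0] * 5 + [1.0] * 10
ped = [0] * 8 + [1] * 7
print(delayed_parameter(fricative_amplitude, ped))
```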
FIG. 6 is a schematic circuit diagram of an improved noise source 34 as shown in FIGS. 1 and 2. The circuit of FIG. 6 is capable of generating a pseudo-random mixture of frequencies having excellent spectrum and amplitude characteristics for use in voice synthesis.
The circuit comprises an 18-bit shift register 130 having a 20 KHz clock input 134. Taps nos. 4 and 5 are connected to opposite inputs of exclusive-OR gate 136, while taps 1 and 18 are connected to the inputs of a similar gate 138. The outputs of gates 136 and 138 are ORed through gate 140. The output of gate 140 is exclusive-ORed through gate 142 with the 1.33 KHz input waveform on line 144. The register output may be taken from any tap and is shown taken from tap no. 14. The output is a pseudo-random function whose period of recurrence is long enough to look like purely random noise. The 1.33 KHz signal serves to avoid a "lockup" condition wherein all of the gate outputs and register inputs are the same and the sequence fails to progress.
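The feedback logic of FIG. 6 can be simulated bit by bit. The Python sketch below follows the connections described (taps 4 and 5 into one exclusive-OR, taps 1 and 18 into another, the two results ORed, then exclusive-ORed with the 1.33 KHz wave, output from tap 14). Two details are assumptions: that the output of gate 142 is what is shifted into tap 1, and that the 1.33 KHz wave is a square wave synchronous with the clock, repeating every 15 clock times (20 KHz / 15 is about 1.33 KHz). Starting from the all-zero state shows the square wave pushing the register out of lockup.

```python
N_STAGES = 18
SQUARE_PERIOD_CLOCKS = 15  # 20 KHz / 15 = 1.33 KHz (assumed synchronous)


def noise_bits(n_clocks, state=None):
    """Return n_clocks output bits from tap 14 of the 18-stage register.

    reg[i] holds tap i + 1; new bits enter at tap 1 on each clock.
    """
    reg = list(state) if state is not None else [0] * N_STAGES
    out = []
    for t in range(n_clocks):
        # Hypothetical square wave: high for the first 7 of every 15 clocks.
        square = 1 if t % SQUARE_PERIOD_CLOCKS < 7 else 0
        g136 = reg[3] ^ reg[4]    # exclusive-OR of taps 4 and 5
        g138 = reg[0] ^ reg[17]   # exclusive-OR of taps 1 and 18
        g140 = g136 | g138        # OR gate 140
        feedback = g140 ^ square  # exclusive-OR gate 142
        out.append(reg[13])       # output taken from tap 14
        reg = [feedback] + reg[:-1]
    return out


bits = noise_bits(2000)
print(sum(bits), len(bits))
```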
It is to be understood that the invention has been described with reference to specific aspects of a specific embodiment and accordingly the foregoing description is not to be construed in a limiting sense.

Claims (18)

1. In a voice synthesizer for phonetically synthesizing human speech: a vocal tract model comprising tunable filters and amplitude control devices responsive to input signal quantities to determine the frequency and amplitude parameters of phonemes to be output therefrom; input means for specifying selected phonemes to be output from said vocal tract model and said parameters thereof in a predetermined order; and control signal forming means connected between said input means and said tunable filters and amplitude control devices of the vocal tract model for producing a plurality of variable duty cycle waveforms as input signals for controlling the frequency and amplitude characteristics of the selected phonemes.
2. Apparatus as defined in claim 1 wherein said input means comprises means for specifying a phoneme code having a plurality of bits, said signal forming means between the input means and vocal tract model including means for cyclically outputting said bits in series according to a predetermined timing sequence having a numerically progressive order whereby said bits define said duty cycle waveforms, the average value of which varies according to the selected bits of said code.
3. Apparatus as defined in claim 1 wherein said signal forming means comprises filter means for producing a dc signal related to the average value of the duty cycle waveforms.
4. Apparatus as defined in claim 3 further including means for converting said dc signal to a second variable duty cycle signal of fixed frequency and amplitude for application to said vocal tract model.
5. Apparatus as defined in claim 1 including buffer means connected in said signal forming means for establishing precision voltage levels for said variable duty cycle signals.
6. Apparatus as defined in claim 1 wherein said input means includes timing control means for selecting a relative time interval for each specified phoneme, and timing control means for varying the overall rate of phonetic production while preserving the relative timing between said phonemes.
7. Apparatus as defined in claim 6 wherein said signal forming means comprises filter means for producing a dc signal related to the average value of the duty cycle waveforms.
8. Apparatus as defined in claim 7 wherein said filter has a response time which is slow relative to the phoneme repetition rate.
9. Apparatus as defined in claim 8 wherein said timing control means includes means for varying the response time of said filter to track with said overall rate of phonetic production.
10. Apparatus as defined in claim 1 wherein said signal forming means comprises at least one storage facility having input address means and output control means, some but not all of the input means being connected to receive phoneme address signals, said facility having stored therein a plurality of phoneme control parameters which are applied to said output means according to the address signals, each phoneme address signal combination applied to said some of the input means being effective to select a plurality of parameter control signals for application to an output, timing signal means connected to the remaining input means for applying thereto timing signals for applying the selected control signals to the output individually and in a sequence having a binary-weighted time order such that the average value of the total sequence is a function of the particular address signal combination.
11. Apparatus as defined in claim 10 wherein the sequence timing is in the order 8-4-2-1 such that the average value may occur in the range of 0 to 15.
12. Apparatus as defined in claim 10 wherein the control signals on said output comprise an electrical wave-form which varies between two relatively fixed amplitude levels.
13. Apparatus as defined in claim 12 further including buffer means connected to receive the control signal sequence for reproducing said sequence but with a close-tolerance on said amplitude levels.
14. Apparatus as defined in claim 13 including a filter connected to receive the control signal sequence and to output a dc signal related to the average value thereof, the filter having a response time which is slow relative to the rate at which the average value varies from one selected value to another.
15. Apparatus as defined in claim 14 further including timing means for controlling the rate of phoneme selection, said means being connected to the filter for varying said response time to track with the rate selected.
16. Apparatus as defined in claim 15 wherein the filter comprises gate means connected to transmit the parameter signal value, and a control terminal connected to receive a high frequency signal having a variable duty cycle thereby to control the ON-time of the gate means.
17. Apparatus as defined in claim 16 further including means connected to said filter for converting the dc signal to a variable duty cycle signal for application to the vocal model.
18. Apparatus as defined in claim 1 including as part of said vocal tract model a source of voiced phonetic excitation quantity and a source of unvoiced phonetic excitation quantity, and means responsive to the addressing of a consecutive phonetic sequence of voiced and unvoiced phonetic quantities for producing a predetermined delay in the excitation of at least one of said quantities.
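Claims 2, 10, 11 and 14 describe a 4-bit parameter sent out serially with its bits held for 8, 4, 2 and 1 time slots, so that the average over one 15-slot frame is proportional to the code (0 to 15), and then smoothed by a slow filter into a dc level. A minimal sketch under stated assumptions: the two amplitude levels of claim 12 are normalized to 0 and 1, the filter is modeled as a first-order discrete low-pass whose coefficient `alpha` is hypothetical, and `dc_to_pwm` is a hypothetical model of the fixed-frequency re-encoding of claims 4 and 17. None of these names come from the patent.

```python
from typing import List, Sequence

# 8-4-2-1 time order: (bit position, number of time slots the bit is held).
WEIGHTS = ((3, 8), (2, 4), (1, 2), (0, 1))


def weighted_frame(code: int) -> List[int]:
    """Serialize a 4-bit code so its high time in one 15-slot frame equals the code."""
    if not 0 <= code <= 15:
        raise ValueError("code must fit in 4 bits")
    frame: List[int] = []
    for bit, slots in WEIGHTS:
        frame.extend([(code >> bit) & 1] * slots)
    return frame


def rc_average(samples: Sequence[float], alpha: float) -> List[float]:
    """First-order low-pass y += alpha * (x - y); a smaller alpha responds more slowly."""
    y = 0.0
    out: List[float] = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def dc_to_pwm(level: float, period: int) -> List[int]:
    """Re-encode a level in [0, 1] as one fixed-period, fixed-amplitude PWM cycle."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    high = round(level * period)
    return [1] * high + [0] * (period - high)


frame = weighted_frame(11)  # 0b1011
print(frame, sum(frame))  # high for 8 + 2 + 1 = 11 of 15 slots
dc = rc_average(frame * 400, alpha=0.01)[-1]
print(dc)  # close to 11/15 (about 0.733), apart from a small ripple
print(sum(dc_to_pwm(dc, 15)))  # 11
```

In this model, making the filter track the selected speech rate (claims 9 and 15) amounts to scaling `alpha` with the rate. The gated filter of claim 16 suggests one analog way to do this: switching a resistor with duty cycle d scales its effective time constant by roughly 1/d. This is an interpretation, not a circuit detail stated in the claims.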

Priority Applications (6)

Application Number Priority Date Filing Date Title
US486506A US3908085A (en) 1974-07-08 1974-07-08 Voice synthesizer
GB27832/75A GB1519004A (en) 1974-07-08 1975-07-02 Voice synthesizer
CA230,923A CA1070018A (en) 1974-07-08 1975-07-07 Voice synthesizer
FR7521291A FR2278127A1 (en) 1974-07-08 1975-07-07 VOCAL SIGNAL SYNTHESIS APPARATUS REPRESENTING THE HUMAN SPEECH
JP50083888A JPS5140007A (en) 1974-07-08 1975-07-08 Onseishinsesaiza (Voice synthesizer)
DE19752530380 DE2530380A1 (en) 1974-07-08 1975-07-08 VOICE SYNTHETIZER SYSTEM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US486506A US3908085A (en) 1974-07-08 1974-07-08 Voice synthesizer

Publications (1)

Publication Number Publication Date
US3908085A (en) 1975-09-23

Family

ID=23932158

Family Applications (1)

Application Number Title Priority Date Filing Date
US486506A Expired - Lifetime US3908085A (en) 1974-07-08 1974-07-08 Voice synthesizer

Country Status (6)

Country Link
US (1) US3908085A (en)
JP (1) JPS5140007A (en)
CA (1) CA1070018A (en)
DE (1) DE2530380A1 (en)
FR (1) FR2278127A1 (en)
GB (1) GB1519004A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4210781A (en) * 1977-12-16 1980-07-01 Sanyo Electric Co., Ltd. Sound synthesizing apparatus

Cited By (21)

Publication number Priority date Publication date Assignee Title
US4092495A (en) * 1975-12-19 1978-05-30 International Computers Limited Speech synthesizing apparatus
US4075424A (en) * 1975-12-19 1978-02-21 International Computers Limited Speech synthesizing apparatus
FR2362462A1 (en) * 1976-08-16 1978-03-17 Federal Screw Works SPEECH SYNTHESIZER
US4128737A (en) * 1976-08-16 1978-12-05 Federal Screw Works Voice synthesizer
US4301328A (en) * 1976-08-16 1981-11-17 Federal Screw Works Voice synthesizer
US4153068A (en) * 1976-11-08 1979-05-08 Hokushin Electric Works, Ltd. Pneumatic controller
US4179584A (en) * 1977-02-28 1979-12-18 Sharp Kabushiki Kaisha Synthetic-speech calculators
US4344148A (en) * 1977-06-17 1982-08-10 Texas Instruments Incorporated System using digital filter for waveform or speech synthesis
US4209844A (en) * 1977-06-17 1980-06-24 Texas Instruments Incorporated Lattice filter for waveform or speech synthesis circuits using digital logic
US4163120A (en) * 1978-04-06 1979-07-31 Bell Telephone Laboratories, Incorporated Voice synthesizer
WO1979000892A1 (en) * 1978-04-06 1979-11-15 Western Electric Co Voice synthesizer
US4338490A (en) * 1979-03-30 1982-07-06 Sharp Kabushiki Kaisha Speech synthesis method and device
US4352162A (en) * 1979-06-25 1982-09-28 Matsushita Electric Industrial Co., Ltd. Digital filter
EP0074444A1 (en) * 1980-03-05 1983-03-23 Jerome Hal Lemelson Rechargeable electric battery system
US4433210A (en) * 1980-06-04 1984-02-21 Federal Screw Works Integrated circuit phoneme-based speech synthesizer
US4363050A (en) * 1980-07-28 1982-12-07 Rca Corporation Digitized audio record and playback system
US4470150A (en) * 1982-03-18 1984-09-04 Federal Screw Works Voice synthesizer with automatic pitch and speech rate modulation
US4589132A (en) * 1982-09-13 1986-05-13 Botbol Joseph M Emergency synthesized voice generator method and apparatus
US5748838A (en) * 1991-09-24 1998-05-05 Sensimetrics Corporation Method of speech representation and synthesis using a set of high level constrained parameters
US5463715A (en) * 1992-12-30 1995-10-31 Innovation Technologies Method and apparatus for speech generation from phonetic codes
US6317713B1 (en) * 1996-03-25 2001-11-13 Arcadia, Inc. Speech synthesis based on cricothyroid and cricoid modeling

Also Published As

Publication number Publication date
FR2278127B1 (en) 1980-07-11
JPS5140007A (en) 1976-04-03
CA1070018A (en) 1980-01-15
DE2530380A1 (en) 1976-01-22
GB1519004A (en) 1978-07-26
FR2278127A1 (en) 1976-02-06
