US3624301A - Speech synthesizer utilizing stored phonemes - Google Patents
- Publication number
- US3624301A
- Authority
- US
- United States
- Prior art keywords
- transistor
- circuit
- latch
- pulse
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
Definitions
- Sound synthesizers and particularly speech synthesizers, have a variety of potential applications and uses.
- an operator can read a book or set of instructions and type the instructions on paper and/or punched tape.
- the punched tape may be coded so as to indicate sounds which in turn are indicative of the instructions.
- This tape could then be used with a sound synthesizer to produce speech or sound which indicates or conveys the information typed by the operator.
- a system of this type is illustrated in the patent to Gerstman et al. U.S. Pat. No. 3,158,685, dated Nov. 24, 1964.
- a photoelectric reader for bank checks could supply electrical signals indicative of the numbers on a check to a sound synthesizer which could then produce speech or sound which audibly indicates the numbers to an operator who may be locating or sorting checks in a particular manner.
- a person with speech difficulties could use a speech synthesizer to express himself or to improve his speech.
- data indicative of sound or speech can be transmitted by a circuit or a communication channel which has a limited or narrow band width, and then applied to a synthesizer at the receiving end of the circuit.
- narrow band transmission systems are the patents to David et al. U.S. Pat. No. 3,127,477, Mathews et al. U.S. Pat. No. 3,083,266, Schroeder U.S. Pat. No. 3,071,652, and David et al. U.S. Pat. No. 3,190,963.
- an object of my invention is to provide an improved sound or speech synthesizer.
- Another object of my invention is to provide an improved and relatively simple sound or speech synthesizer which produces sound or speech having relatively high quality and fidelity.
- Another object of my invention is to provide a speech synthesizer that produces concatenated speech in accordance with applied input signals which may be coded in any suitable form.
- a storage and playback system for a plurality of separate sounds or phonemes.
- a phoneme is a distinct sound which, with other phonemes, makes up or forms a language.
- the number of separate phonemes stored and available for playback may vary, depending upon the number of different sounds to be synthesized. In one preferred embodiment, 32 separate phonemes were stored and available for playback.
- the phonemes are preferably recorded on separate channels on a magnetic tape or such storage media which may be driven at a controlled speed. Each of the recorded phonemes is derived and applied to a respective switch.
- the switches are connected to a common output circuit and gated or turned on in the desired sequence in response to control signals to produce concatenated speech or sound at the output circuit from a plurality of recorded phonemes. Each of the switches is gated or turned on in response to and for the duration of the respective control or command signal.
- the command signal applied to any given switch is produced by a decoder circuit which, in a preferred embodiment, can produce a command signal on any of a plurality of lines (corresponding to the number of recorded phonemes) in response to a digital command which is preferably in binary coded form.
- command signals may be, and preferably are, applied to respective latch-in circuits which have a common circuit so that each new command signal applied to its respective latch-in circuit blocks the other latch-in circuits so that no other or previous command signal is effective.
- the command signals from the latch-in circuits may be applied to respective delay and duration circuits which insure that the command signals occur after a predetermined time delay and, if desired, have a predetermined time duration.
- These command signals are then shaped in a predetermined fashion.
- the shaped command signals are applied to respective switches to gate the switches on in a predetermined sequence so as to provide stored phonemes at the output circuit. As each switch is gated on, it permits its respective phoneme to pass to the common output circuit. If the switches are gated in the desired predetermined sequence as indicated by the decoder, the recorded phonemes are supplied to the common output circuit in concatenated fashion to produce the desired speech or sound.
- FIG. 1 shows a block diagram of a speech synthesizer in accordance with my invention;
- FIG. 2 shows a circuit diagram of one of the latch-in circuits shown in FIG. 1;
- FIG. 3 shows a circuit diagram of one of the delay and duration circuits shown in FIG. 1;
- FIG. 4 shows a circuit diagram of one of the tone shaper circuits shown in FIG. 1;
- FIG. 5 shows a circuit diagram of one of the audio switches shown in FIG. 1;
- FIG. 6 shows waveforms for explaining the operation of the circuits of FIGS. 2 and 3;
- FIG. 7 shows waveforms for explaining the operation of the circuit of FIG. 4.
- FIG. 1 shows a block diagram of a speech synthesizer in accordance with my invention.
- This system shown in FIG. 1 is based on utilizing 32 phonemes for synthesizing speech. It is to be understood that more or fewer phonemes may be used in accordance with my invention.
- a phoneme is a distinct sound or element of speech which, with other distinct sounds or phonemes, makes up or forms a language. Sounds other than phonemes may be utilized to synthesize other sounds.
- the phonemes are recorded in the phoneme storage and playback system which comprises 32 recording channels or storage mediums which can be played back.
- I contemplate a plurality of closed magnetic tapes or belts, or tracks on a magnetic drum.
- Each tape, belt, or track carries one or more channels of a particular phoneme.
- Each channel has a playback head which is constantly sensing the phoneme recorded on its respective channel, and producing an electrical signal indicative of this recorded phoneme.
- All channels of the recording medium preferably move relative to the playback heads at the same rate, which may be fixed or which may be variable in accordance with a glottal rate control signal.
- This glottal rate control signal may indicate the glottal rate of a particular sound or speech being synthesized, and produces an electrical signal indicative of this rate which controls the speed of the motor or drive device for the phoneme storage and playback system.
- the derivation of such glottal rate control signals is illustrated in U.S. Pat. Nos.
- Each electrical signal produced by the playback heads is applied to a respective audio switch.
- the audio switches do not pass the electrical signals to output resistors and a common summing amplifier unless the switches receive respective gating signals.
- when an audio switch receives its gating signal, the electrical signal provided by its respective playback head is passed by the audio switch through the summing amplifier to a volume control.
- This volume control may be manually or automatically controlled, and produces the final output of synthesized speech or sound.
- An automatic amplitude control scheme is shown in the above-mentioned Gerstman et al. U.S. Pat. No.
- the audio switches are respectively gated by pulses from a decoder.
- the pulses may have a predetermined time occurrence and duration or may last as long as supplied.
- the decoder shown in FIG. 1 simultaneously receives a plurality of coded signals at its inputs and produces a gating signal at any one of its 32 output lines in response to a particular coded signal applied to its inputs.
- the decoder shows five inputs, to which binary coded signals are simultaneously applied for indicating a particular output line to be gated. The five inputs permit 2^5, or 32, possible combinations if binary coded input signals are utilized. For example, if all input lines are at a logic zero, this could indicate that the phoneme for line 1 is to be gated.
- the decoder shown in block diagram form is known in the art, and converts input signals in binary coded form to output signals at any one of the 32 output lines in accordance with the input binary coded signal.
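The decoding step described above can be sketched in Python. This is only an illustrative model, not the patent's logic circuit: the function names `decode` and `one_hot`, the most-significant-bit-first ordering, and the rule that the binary value plus one gives the line number are assumptions, chosen so that all-zero inputs select line 1 as in the example above.

```python
def decode(bits):
    """Map five binary inputs (assumed most significant bit first)
    to a 1-based output line number in the range 1..32."""
    if len(bits) != 5 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected exactly five binary inputs")
    value = 0
    for b in bits:
        value = (value << 1) | b  # accumulate the binary value
    return value + 1  # assumption: all zeros selects line 1


def one_hot(line, n_lines=32):
    """Return the state of all output lines with only `line` energized."""
    if not 1 <= line <= n_lines:
        raise ValueError("line out of range")
    return [1 if i == line else 0 for i in range(1, n_lines + 1)]
```

Under this convention, the inputs 0, 0, 0, 1, 1 would select line 4, and exactly one of the 32 output lines is energized for any input code.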
- the 32 output lines and associated circuits are not all shown in FIG. 1 in order to reduce crowding. Lines 1 through 5 and 28 through 32 and associated circuits are shown, and lines 6 through 27 are indicated. Assume that line 1 is energized in response to an input code calling for line 1 to be energized or gated.
- the decoder produces a negative-going pulse which is applied to its respective latch-in circuit.
- the latch-in circuit of each output line is connected by a latch-in bus to all other latch-in circuits.
- When a particular latch-in circuit receives a negative-going pulse, it produces a positive pulse at its output on the latch-in bus which prevents all other latch-in circuits from becoming operative in response to a negative-going decoder pulse.
- the latch-in circuit receiving a negative-going pulse provides a pulse to a delay and duration circuit which is optional and which shapes this pulse so that it has a predetermined point of beginning and ending. This is particularly desirable for the phonemes forming the sounds for B, D, G, K, P, and T.
- This pulse is then applied to a tone shaper circuit which provides a slope for the rise and fall portions of the pulse and an abrupt transition at the beginning of the rise and fall portions. (Different rise and fall times are more appropriate for various concatenated sounds.) The abrupt transitions reduce or eliminate erroneous gating because of transients and the slopes provide a gradual transition for gating the audio switches.
- the tone shaper circuit output pulse is applied to the audio switch, and during the time that the tone shaper pulse is applied to the audio switch, the audio switch permits the phoneme being played back by its respective phoneme playback head to be supplied to the common summing amplifier.
- When the next input coded signal is applied, the decoder produces a negative-going pulse at a particular output line, say the output line 4.
- the latch-in circuit for the output line 4 then produces a latch-in signal at the latch-in bus which prevents the other latch-in circuits from passing a gating signal.
- the line 4 then supplies a pulse to its audio switch which then permits the phoneme on channel 4 to pass through the audio switch to the summing amplifier.
- the system shown in FIG. 1 provides selected ones of 32, or any plurality, of phonemes in concatenated or series fashion. The advantage of the system of FIG. 1 is that only one of a selected number of phonemes is applied or provided at any one time.
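The latch-in behavior can be modeled as a small state machine. The sketch below is a behavioral assumption, not a model of the transistor circuit: the class name `LatchBank` and its methods are hypothetical, and the latch-in bus pulse is reduced to the rule that the newest command wins and every other latch is restored to its normal condition.

```python
class LatchBank:
    """Behavioral model of interconnected latch-in circuits:
    at most one line is latched at any time."""

    def __init__(self, n_lines=32):
        self.n_lines = n_lines
        self.active = None  # no line latched, as after a reset

    def command(self, line):
        """A decoder pulse on `line` latches it and, through the
        shared bus, unlatches whichever line was latched before."""
        if not 1 <= line <= self.n_lines:
            raise ValueError("line out of range")
        self.active = line
        return self.active

    def reset(self):
        """Model of the reset bus: restore every latch to normal."""
        self.active = None

    def outputs(self):
        """Gating state of every line (1 = gated, 0 = blocked)."""
        return [1 if i == self.active else 0
                for i in range(1, self.n_lines + 1)]
```

Commanding line 1 and then line 4 leaves only line 4 gated, which mirrors the walk-through above.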
- the system does not need or require extensive filtering, as in prior art systems, and thus provides synthesized speech or sound with a relatively simple arrangement. While the system of FIG. 1 contemplates 32 phonemes, and hence 32 storage and playback channels and 32 gating lines and circuits, a system with a different number of phonemes may be readily and easily utilized in accordance with my invention. For example, if relatively simple sounds are to be synthesized, a smaller number of phonemes would be satisfactory. However, if relatively complex sounds are to be synthesized, for example various pronunciations and accents for citizens of the United States, then more than 32 phonemes would be needed and desirable. However, any number of phonemes may be easily concatenated by use of the decoder which is capable of decoding the input signals and gating a particular output line.
- the decoder is known in the art, and may take any one of a number of forms of well-known logic circuits.
- the phoneme storage and playback system may take any one of a number of forms.
- the phoneme storage and playback system may be a single magnetic tape, preferably in the form of a loop or belt, with the predetermined number of channels recorded thereon, and with the corresponding number of playback heads, one for each channel.
- the recording medium may be driven by any suitable means, for example, a synchronous motor whose speed can be relatively easily varied by changing its applied input frequency. This change of speed is desirable in some applications where the speech to be synthesized requires that the recording medium be played at a slower or faster speed determined by the glottal rate of the speech.
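As a rough illustration of this speed control, assume that the speed of the recording medium is proportional to the frequency applied to the synchronous motor, and that the pitch of a played-back phoneme scales linearly with that speed. Under those assumptions the required drive frequency is a simple ratio. The function name and the example figures (a 60 Hz nominal drive and a 100 Hz recorded glottal rate) are hypothetical and do not come from the patent.

```python
def drive_frequency(nominal_drive_hz, recorded_glottal_hz, desired_glottal_hz):
    """Drive frequency that shifts the recorded glottal rate to the
    desired one, assuming speed and pitch both scale linearly."""
    if recorded_glottal_hz <= 0 or desired_glottal_hz <= 0:
        raise ValueError("glottal rates must be positive")
    return nominal_drive_hz * desired_glottal_hz / recorded_glottal_hz
```

For instance, raising a phoneme recorded at a 100 Hz glottal rate to 120 Hz would call for driving a nominally 60 Hz motor at 72 Hz under these assumptions. All channels shift together, since they share one drive.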
- the summing amplifier may be any conventional type amplifier.
- the volume control may be any conventional type volume control, but is preferably one whose volume is electrically controlled or varied.
- FIG. 2 shows the latch-in circuit for line 1, indicated in block diagram form in FIG. 1.
- suitable direct current voltages are supplied at the terminals having a + for a positive voltage and a - for a negative voltage relative to ground.
- the negative-going pulse shown at time t1 in FIG.
- the transistor Q4 is turned on and the transistor Q3 is turned off.
- the transistor Q4 is turned off as shown in FIG. 6d.
- When the transistor Q4 is turned off, its collector voltage becomes relatively negative. This negative voltage is coupled to the base of the transistor Q3 as shown in FIG. 6g. This negative voltage causes the transistor Q3 to be turned on as indicated at the time t1 in FIG. 6h.
- the multivibrator circuit action makes this switching relatively sharp and fast.
- When the transistor Q4 is turned off, its negative collector also supplies a negative voltage to the base of a PNP-type transistor Q5.
- This voltage is applied to the delay and duration circuit, and is the pulse which eventually causes the audio switch to gate or pass phoneme signals.
- This positive-going voltage at the collector of the transistor Q5 is coupled through a capacitor 20 to the base of an NPN-type transistor Q6.
- the emitter of the transistor Q6 is coupled through a diode rectifier 21 to the latch-in bus which is coupled to the other latch-in circuits.
- This positive pulse is shown at the time t1 in FIG. 6f.
- This same positive pulse is applied through a capacitor 22 to the base of a PNP-type transistor Q2 to cause the emitter voltage of the transistor Q2 to go from a negative value towards zero as shown in FIG. 6i.
- This emitter voltage is coupled by a capacitor 25 to the base of the transistor Q3 and tends to turn the transistor Q3 off.
- the power of this pulse is insufficient to overcome the switched condition of the transistors Q3 and Q4 because the pulse applied to the base of the transistor Q4 predominates.
- this pulse shown in FIG. 6i does restore all other multivibrators of the other latch-in circuits to the normal condition with the transistor Q3 turned off and the transistor Q4 turned on.
- FIG. 2 also shows a reset switch which may be a manually operated switch for applying a positive voltage to a reset bus.
- This reset bus is coupled to all of the latch-in circuits, and may be utilized to restore all multivibrators to their normal condition with the transistor Q3 turned off and with the transistor Q4 turned on.
- the command signal from the decoder returns to its normal or positive condition as shown in FIG. 6a.
- This turns the transistor Q1 off again, as shown in FIG. 6b.
- the collector voltage of the transistor Q1 becomes relatively negative again, but this has no effect on the multivibrator transistors Q3 and Q4 because the negative voltage on the collector of the transistor Q1 is not passed by the rectifier 16.
- the remainder of the circuit shown in FIG. 2 remains in the condition shown at the time t1 in FIGS. 6c through 6i. However, at some later time a command is received for line 2.
- the multivibrator circuit for this line 2 switches in the manner described for the multivibrator circuit of line 1, and a positive pulse is received on the latch-in bus from the transistor corresponding to the transistor Q6.
- This positive pulse is shown in FIG. 6f at the time i
- This positive pulse is coupled through the capacitor 22 to the base of the transistor Q2 and turns the transistor Q2 off.
- the emitter of the transistor Q2 goes from a negative value toward zero, and this positive-going voltage is applied through the capacitor 25 to the base of the transistor Q3.
- This pulse is effective to turn the transistor Q3 off, so that its collector voltage goes toward a negative value.
- This negative voltage is applied to the base of the transistor Q4 and turns the transistor Q4 on.
- Multivibrator action causes this switching to take place in a relatively short time, and restores the multivibrator transistors Q3 and Q4 to their normal condition with the transistor Q3 turned off and the transistor Q4 turned on. With the transistor Q4 turned on, the transistor Q5 is turned off, and its collector voltage becomes negative again. If the pulse supplied to the delay and duration circuit has not already been terminated, this negative pulse terminates the gating signal for the audio switch associated with line 1. However, as will be explained in connection with FIG. 3, a delay and duration circuit is provided for line 1 so that this gating signal is terminated previously by the delay and duration circuit.
- FIG. 3 shows a schematic diagram of the delay and duration circuit.
- This circuit is supplied with the direct current voltages as indicated.
- This circuit is desirable for producing certain phonemes, such as for the sounds of B, D, G, K, P, and T, at a particular time, and for limiting the duration of such phonemes. If this circuit is not needed or desired in any line, it may be eliminated or bypassed with a direct connection between the latch-in circuit and the tone shaper circuit. In such a case, the pulse from the collector of the transistor Q5 (of the latch-in circuit) remains in the gating condition until a command pulse for another line is provided.
- the delay and duration circuit comprises an input NPN-type transistor Q7 to which the pulse from the transistor Q5 of the latch-in circuit of FIG. 2 is applied.
- This pulse is coupled through a capacitor 30 to the base of a transistor Q7 and is differentiated by the capacitor 30.
- the collector of the transistor Q7 is coupled to a leading or first multivibrator comprising PNP-type transistors Q10 and Q11 coupled together in regenerative fashion as a monostable or one shot multivibrator.
- the emitter of the transistor Q7 is coupled to a trailing or second multivibrator comprising NPN-type transistors Q8 and Q9 also connected as a monostable or one shot multivibrator. In these multivibrators, the transistors Q9 and Q11 are normally on.
- When a positive pulse is applied to the base of the transistor Q7, the transistor Q7 is turned on for the duration of the pulse, and its emitter voltage rises from a negative voltage toward zero, and its collector voltage falls from zero toward a negative voltage as shown in FIGS. 6j and 6k.
- the emitter of the transistor Q7 is coupled to the base of the transistor Q8, and the collector of the transistor Q7 is coupled to the base of the transistor Q10 so that the transistors Q8 and Q10 are turned on by these voltages. This switches the two multivibrators so that the transistors Q9 and Q11 are turned off as shown at the time t1 in FIGS. 6l and 6m.
- the time constant (or unstable time) of the first multivibrator with the transistors Q10 and Q11 is made shorter than the time constant (or unstable time) of the second multivibrator with the transistors Q8 and Q9.
- the first multivibrator with the transistors Q10 and Q11 returns to its normal condition (with the transistor Q11 on) at a first time, as shown in FIG. 6m, and the second multivibrator with the transistors Q8 and Q9 returns to its normal condition (with the transistor Q9 on) at a later time, as shown in FIG. 6l.
- the collector of the normally on transistor Q11 is coupled to the base of a PNP-type transistor Q13, and the collector of the normally on transistor Q9 is coupled to the base of a PNP-type transistor Q12.
- when the transistor Q11 is turned off, the transistor Q13 is turned on so that its collector voltage approaches zero as shown in FIG. 6o.
- when the transistor Q9 is turned off, the transistor Q12 is also turned off.
- the collector voltage of the transistor Q12 remains at zero or near zero because the collector voltage of the transistor Q13 is near zero.
- When the collector voltage of the transistor Q13 becomes negative again, the collector voltage of the transistor Q12 may also become negative if the transistor Q12 was turned off as indicated at the time t1.
- Both of the collectors of the transistors Q12 and Q13 become negative at that time. Their collectors are coupled to the base of a combining PNP-type transistor Q14, and cause the base of the transistor Q14 to become negative as shown in FIG. 6p.
- This turns the transistor Q14 on as indicated in FIG. 6q.
- the transistor Q9 turns on again by multivibrator action.
- This supplies a negative voltage to the base of the transistor Q12 which turns the transistor Q12 on.
- the base of the transistor Q14 becomes positive again and the transistor Q14 is turned off.
- a fixed duration voltage or pulse is supplied by the collector of the transistor Q14.
- This pulse has a leading edge occurring at a predetermined time (determined by the time constant of the first or leading multivibrator comprising the transistors Q10 and Q11) and has a trailing edge occurring at a later predetermined time (determined by the time constant of the second multivibrator comprising the transistors Q8 and Q9).
- this pulse shown in FIG. 6q is independent, both in time and duration, of the command pulse for line 1 shown in FIG. 6a.
- this delay and fixed duration pulse may be desirable for certain phonemes.
- the pulse delay and time duration may be set or adjusted by the variable resistors 31, 32 respectively coupled to the bases of the transistors Q8 and Q9. Other time constant adjusting elements may be utilized as well.
- the phonemes associated with a particular delay and duration circuit will be gated only as long as determined by the delay and duration circuit. In the absence of a subsequent command, there will be silence until a command is received. If a particular line does not use or have a delay and duration circuit, then the phoneme for that line will last until a subsequent command is provided for another line.
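The timing produced by the two one-shot multivibrators can be summarized with a small model. This is a hedged sketch rather than a circuit simulation: the function names are hypothetical, both one-shots are assumed to trigger at the instant the command arrives, and the time unit is arbitrary.

```python
def gate_window(t_command, lead_delay, trail_delay):
    """Return (start, end) of the gating pulse. The leading one-shot
    sets the delay before the gate opens; the trailing one-shot,
    which has the longer unstable time, sets when it closes."""
    if not 0 <= lead_delay < trail_delay:
        raise ValueError("leading one-shot must time out before the trailing one")
    return (t_command + lead_delay, t_command + trail_delay)


def gate_is_open(t, window):
    """True while the phoneme for this line is being gated."""
    start, end = window
    return start <= t < end
```

With a command at time 100, a leading delay of 10 and a trailing delay of 50, the gate would be open from 110 up to 150 regardless of how long the command itself lasts, and there is silence afterward until the next command.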
- the pulse from each delay and duration circuit (or from each latch-in circuit) is applied to a tone shaper circuit shown in FIG. 4.
- This circuit provides a slope on the leading and trailing edges of the pulse by a controlled charging and discharging of a capacitor 40.
- the charging time is determined by the magnitude of a variable resistor 41, and the discharging time is determined by the magnitude of a variable resistor 42.
- FIG. 7a shows the pulse as applied to the tone shaper circuit
- FIG. 7b shows the voltage appearing across the capacitor 40.
- the voltage has a sloped leading edge and trailing edge. This voltage with the sloped leading and trailing edges is applied to the base of a PNP-type transistor Q15.
- the transistor Q15 and the circuit shown provide a first abrupt change from a lower value to an intermediate value on the leading edge of the voltage, and a second abrupt change from an upper value to an intermediate value on the trailing edge of the voltage as shown in FIG. 7c.
- the first abrupt change is provided by the bias current supplied by a diode rectifier 43 and the voltage network 44
- the second abrupt change is provided by the bias current in the resistor-capacitor network 45.
- After the voltage has been shaped to provide the form shown in FIG. 7c, it is supplied to the audio switch or gate circuit shown in FIG. 5. These abrupt changes are desirable to eliminate transient conditions from operating the audio switch, and the sloped leading and trailing edges are desirable to fade the audio switch in and out so that the phoneme is supplied and then removed from the common output in a controlled manner with prescribed rates of change.
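The sloped edges produced by charging and discharging the capacitor 40 can be approximated with a first-order model. The sketch below is an assumption-laden illustration: the function name `shape` is hypothetical, the gate is normalized to the range 0 to 1, `tau_rise` and `tau_fall` stand in for the time constants set by the variable resistors 41 and 42, and the abrupt offset steps added by the transistor Q15 are deliberately left out.

```python
import math


def shape(gate, dt, tau_rise, tau_fall):
    """Turn a rectangular gate (sequence of 0/1 samples spaced dt apart)
    into an envelope with exponential rise and fall, as a capacitor
    charged and discharged through separate resistances would give."""
    if dt <= 0 or tau_rise <= 0 or tau_fall <= 0:
        raise ValueError("dt and time constants must be positive")
    v = 0.0
    envelope = []
    for g in gate:
        target = 1.0 if g else 0.0
        tau = tau_rise if g else tau_fall
        # exact first-order step response over one sample interval
        v += (target - v) * (1.0 - math.exp(-dt / tau))
        envelope.append(v)
    return envelope
```

Choosing different values for `tau_rise` and `tau_fall` gives the different rise and fall rates that the text describes as appropriate for various concatenated sounds.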
- each audio switch has signals from its respective storage and playback channel applied to the PNP-type transistor Q16. These signals are normally bypassed to ground by parallel connected and normally conducting PNP-type transistors Q17 and Q18. However, when a positive voltage pulse is provided by the transistor Q15, the transistors Q17 and Q18 are turned off, and the signals at the emitter of the transistor Q16 are applied to the PNP-type output transistor Q19. The output transistor Q19 amplifies these signals and applies them to the summing amplifier as indicated. As soon as the voltage from the tone shaper circuit or latch-in circuit falls again, the transistors Q17 and Q18 become conductive and bypass or short circuit the signals.
- signals from the phoneme storage and playback system are applied to the output only when the transistors Q17 and Q18 are turned off.
- signals from the audio switches are selectively applied to the common summing amplifier where they are added in series or concatenated with other signals to provide the synthesized sound or speech.
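The combined action of the audio switches and the summing amplifier amounts to a gated sum of the playback channels. The following sketch assumes sampled signals and gate values between 0 and 1; the function name `mix` and the list-based representation are illustrative choices, not part of the patent.

```python
def mix(channels, gates):
    """Sum the channels sample by sample, each weighted by its gate.
    channels: list of equal-length sample lists, one per phoneme channel.
    gates: list of equal-length gate lists (0 = bypassed, 1 = passed)."""
    if len(channels) != len(gates):
        raise ValueError("need one gate sequence per channel")
    n = len(channels[0])
    if any(len(seq) != n for seq in channels + gates):
        raise ValueError("all sequences must have the same length")
    return [
        sum(ch[k] * g[k] for ch, g in zip(channels, gates))
        for k in range(n)
    ]
```

Because only one gate is open at a time, the sum reduces to whichever phoneme is currently selected, so the output is the phonemes in series rather than a blend.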
- the synthesizer may accommodate any number of phoneme channels, as opposed to frequency selective systems, and these phoneme channels are selectively concatenated or added in series to a common output.
- Persons skilled in the art will appreciate that modification may be made to my invention.
- other storage and playback systems may be used, and other decoder circuits may be used.
- the time delay and duration circuit may or may not be used in each line.
- Other latch-in circuits, time delay and duration circuits, and shaper circuits may also be used.
- the circuits shown in schematic diagrams represent preferred embodiments. Therefore, while my invention has been described with reference to a particular embodiment, it is to be understood that modifications may be made without departing from the spirit of the invention or from the scope of the claims.
- a speech synthesizer comprising: a plurality of gates, each having multiple inputs and an output; a variable speed storage and playback system having a plurality of playback channels each of which is coupled to an input of a respective one of said gates, each said channel adapted to supply a distinct continuous signal indicative of a distinct continuous phonemic sound to said respective gate, said phonemic sounds when grouped in specified manners in series producing word sounds; a summing amplifier having an input coupled to the output of each said gate and an output for providing a series of signals indicative of word sounds;
- a decoder having n inputs and 2^n outputs where n is any natural number, said inputs adapted to receive binary coded signals and to energize not more than one of the 2^n outputs in response thereto;
- each of the other latch means connected to a distinct decoder output and to each of the other latch means and effective when energized to prevent the energization of any of the remaining latch means;
- circuit means connecting each said latch means with a corresponding gate whereby energization of a specific latch means results in the energization of only the corresponding gate thereby passing a corresponding distinct continuous signal to said summing amplifier.
- variable speed storage and playback system further comprises an input terminal for varying the storage system speed.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)
Abstract
A system for synthesizing sounds is disclosed which has a plurality of phonemes recorded for example one phoneme per track in each of several tracks of a recording medium. A plurality of latch circuits interconnected so that only one of the latch circuits may be energized at any given time are effective to gate out of the storage medium a prescribed phoneme in response to a digital code input specifying that phoneme. A plurality of phonemes are then concatenated to form intelligible sounds.
Description
United States Patent
Inventor: William E. Richeson, Fort Wayne, Ind.
Appl. No.: 24,360
Filed: Apr. 15, 1970
Patented: Nov. 30, 1971
Assignee: The Magnavox Company, Fort Wayne, Ind.
Continuation of application Ser. No. 586,521, Oct. 13, 1966. This application Apr. 15, 1970, Ser. No. 24,360
SPEECH SYNTHESIZER UTILIZING STORED PHONEMES
4 Claims, 7 Drawing Figs.
U.S. Cl.: 179/15 A; Int. Cl.: G10l 1/; Field of Search: 179/1 AS,
[56] References Cited
UNITED STATES PATENTS
3,102,165 8/1963 Clapper 179/1 AS
3,221,420 12/1965 Heinberg 35/353
3,253,263 5/1966 Lee 340/152
3,367,045 2/1968 Mendez 179/1 AS
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Jon Bradford Leaheey
Attorney: Richard T. Seeger
[FIG. 1: block diagram showing the glottal rate control signal, phoneme storage and playback system, decoder inputs and output lines, latch-in circuits, delay and duration circuits, tone shaper circuits, and audio switches]
[Drawing sheets: FIG. 1 continues with the summing amplifier, volume control, and control signal; the remaining sheets carry FIGS. 2 through 7, including the (b) tone shaper output and (c) offset output waveforms of FIG. 7. Inventor: William E. Richeson. Attorneys: Jeffers & Young.]
SPEECH SYNTHESIZER UTILIZING STORED PHONEMES
My invention relates to a sound synthesizer, and particularly to a sound synthesizer that selects recorded or stored sounds in a predetermined sequence, and concatenates the selected sounds to form speech or other intelligible sounds.
Sound synthesizers, and particularly speech synthesizers, have a variety of potential applications and uses. For example, an operator can read a book or set of instructions and type the instructions on paper and/or punched tape. The punched tape may be coded so as to indicate sounds which in turn are indicative of the instructions. This tape could then be used with a sound synthesizer to produce speech or sound which indicates or conveys the information typed by the operator. A system of this type is illustrated in the patent to Gerstman et al. U.S. Pat. No. 3,158,685, dated Nov. 24, 1964. Or, a photoelectric reader for bank checks could supply electrical signals indicative of the numbers on a check to a sound synthesizer which could then produce speech or sound which audibly indicates the numbers to an operator who may be locating or sorting checks in a particular manner. Or, a person with speech difficulties could use a speech synthesizer to express himself or to improve his speech. Or, data indicative of sound or speech can be transmitted by a circuit or a communication channel which has a limited or narrow bandwidth, and then applied to a synthesizer at the receiving end of the circuit. Illustrative of such narrow band transmission systems are the patents to David et al. U.S. Pat. No. 3,127,477, Mathews et al. U.S. Pat. No. 3,083,266, Schroeder U.S. Pat. No. 3,071,652, and David et al. U.S. Pat. No. 3,190,963. The synthesizer can then produce the sound or speech. There are many other applications for a sound or speech synthesizer. However, previously known sound or speech synthesizers have not been satisfactory for various reasons. One reason has been that the quality of the synthesized sound or speech has been poor. Another has been that the sound or speech has been synthesized on the basis of a plurality of narrow bands of frequencies which require relatively complex and extensive equipment.
Accordingly, an object of my invention is to provide an improved sound or speech synthesizer.
Another object of my invention is to provide an improved and relatively simple sound or speech synthesizer which produces sound or speech having relatively high quality and fidelity.
Another object of my invention is to provide a speech synthesizer that produces concatenated speech in accordance with applied input signals which may be coded in any suitable form.
Briefly, these and other objects are achieved in accordance with my invention by a storage and playback system for a plurality of separate sounds or phonemes. As used herein, a phoneme is a distinct sound which, with other phonemes, makes up or forms a language. The number of separate phonemes stored and available for playback may vary, depending upon the number of different sounds to be synthesized. In one preferred embodiment, 32 separate phonemes were stored and available for playback. The phonemes are preferably recorded on separate channels on a magnetic tape or such storage media which may be driven at a controlled speed. Each of the recorded phonemes is derived and applied to a respective switch. The switches are connected to a common output circuit and gated or turned on in the desired sequence in response to control signals to produce concatenated speech or sound at the output circuit from a plurality of recorded phonemes. Each of the switches is gated or turned on in response to and for the duration of the respective control or command signal. The command signal applied to any given switch is produced by a decoder circuit which, in a preferred embodiment, can produce a command signal on any of a plurality of lines (corresponding to the number of recorded phonemes) in response to a digital command which is preferably in binary coded form. These command signals may be, and preferably are, applied to respective latch-in circuits which have a common circuit so that each new command signal applied to its respective latch-in circuit blocks the other latch-in circuits so that no other or previous command signal is effective. The command signals from the latch-in circuits may be applied to respective delay and duration circuits which insure that the command signals occur after a predetermined time delay and, if desired, have a predetermined time duration. These command signals are then shaped in a predetermined fashion.
The shaped command signals are applied to respective switches to gate the switches on in a predetermined sequence so as to provide stored phonemes at the output circuit. As each switch is gated on, it permits its respective phoneme to pass to the common output circuit. If the switches are gated in the desired predetermined sequence as indicated by the decoder, the recorded phonemes are supplied to the common output circuit in concatenated fashion to produce the desired speech or sound.
The subject matter which I regard as my invention is particularly pointed out and distinctly claimed in the claims. The structure and operation of my invention, together with further objects and advantages, may be better understood from the following description given in connection with the accompanying drawings, in which:
FIG. 1 shows a block diagram of a speech synthesizer in accordance with my invention;
FIG. 2 shows a circuit diagram of one of the latch-in circuits shown in FIG. 1;
FIG. 3 shows a circuit diagram of one of the delay and duration circuits shown in FIG. 1;
FIG. 4 shows a circuit diagram of one of the tone shaper circuits shown in FIG. 1;
FIG. 5 shows a circuit diagram of one of the audio switches shown in FIG. 1;
FIG. 6 shows waveforms for explaining the operation of the circuits of FIGS. 2 and 3; and
FIG. 7 shows waveforms for explaining the operation of the circuit of FIG. 4.
FIG. 1 shows a block diagram of a speech synthesizer in accordance with my invention. The system shown in FIG. 1 is based on utilizing 32 phonemes for synthesizing speech. It is to be understood that more or fewer phonemes may be used in accordance with my invention. As used in this application, a phoneme is a distinct sound or element of speech which, with other distinct sounds or phonemes, makes up or forms a language. Sounds other than phonemes may be utilized to synthesize other sounds. The phonemes are recorded in the phoneme storage and playback system which comprises 32 recording channels or storage mediums which can be played back. In the embodiment shown in FIG. 1, I contemplate a plurality of closed magnetic tapes or belts, or tracks on a magnetic drum. Each tape, belt, or track carries one or more channels of a particular phoneme. Each channel has a playback head which is constantly sensing the phoneme recorded on its respective channel, and producing an electrical signal indicative of this recorded phoneme. All channels of the recording medium preferably move relative to the playback heads at the same rate, which may be fixed or which may be variable in accordance with a glottal rate control signal. This glottal rate control signal may indicate the glottal rate of a particular sound or speech being synthesized, and produces an electrical signal indicative of this rate which controls the speed of the motor or drive device for the phoneme storage and playback system. The derivation of such glottal rate control signals is illustrated in U.S. Pat. Nos. 3,158,685; 3,127,477; 3,083,266; 3,071,652; and 3,190,963, all aforementioned. Each electrical signal produced by the playback heads is applied to a respective audio switch. However, the audio switches do not pass the electrical signals to output resistors and a common summing amplifier unless the switches receive respective gating signals.
When an audio switch does receive a gating signal, the electrical signal provided by its respective playback head is passed by the audio switch through the summing amplifier to a volume control. This volume control may be manually or automatically controlled, and produces the final output of synthesized speech or sound. An automatic amplitude control scheme is shown in the above-mentioned Gerstman et al. U.S. Pat. No. 3,158,685. As will be explained, only one audio switch passes or gates a phoneme at any one time to the common output summing amplifier, so that a series or concatenation of phonemes is provided to the summing amplifier in response to the audio switches being gated.
The audio switches are respectively gated by pulses from a decoder. The pulses may have a predetermined time occurrence and duration or may last as long as supplied. The decoder shown in FIG. 1 simultaneously receives a plurality of coded signals at its inputs and produces a gating signal at any one of its 32 output lines in response to a particular coded signal applied to its inputs. The decoder is shown with five inputs, to which binary coded signals are simultaneously applied for indicating a particular output line to be gated. The five inputs permit 2^5 or 32 possible combinations if binary coded input signals are utilized. For example, if all input lines are at a logic zero, this could indicate that the phoneme for line 1 is to be gated. If input 1 is at a logic 1 and all other inputs are at a logic zero, this could indicate that the phoneme for line 2 is to be gated. The decoder shown in block diagram form is known in the art, and converts input signals in binary coded form to output signals at any one of the 32 output lines in accordance with the input binary coded signal. The 32 output lines and associated circuits are not all shown in FIG. 1 in order to reduce crowding. Lines 1 through 5 and 28 through 32 and associated circuits are shown, and lines 6 through 27 are indicated. Assume that line 1 is energized in response to an input code calling for line 1 to be energized or gated. The decoder produces a negative-going pulse which is applied to its respective latch-in circuit. The latch-in circuit of each output line is connected by a latch-in bus to all other latch-in circuits. When a particular latch-in circuit receives a negative-going pulse, it produces a positive pulse at its output on the latch-in bus which prevents all other latch-in circuits from becoming operative in response to a negative-going decoder pulse. Thus, the possibility of two lines having gating signals at the same time is eliminated.
The latch-in circuit receiving a negative-going pulse provides a pulse to a delay and duration circuit which is optional and which shapes this pulse so that it has a predetermined point of beginning and ending. This is particularly desirable for the phonemes forming the sounds for B, D, G, K, P, and T. This pulse is then applied to a tone shaper circuit which provides a slope for the rise and fall portions of the pulse and an abrupt transition at the beginning of the rise and fall portions. (Different rise and fall times are more appropriate for various concatenated sounds.) The abrupt transitions reduce or eliminate erroneous gating because of transients and the slopes provide a gradual transition for gating the audio switches. The tone shaper circuit output pulse is applied to the audio switch, and during the time that the tone shaper pulse is applied to the audio switch, the audio switch permits the phoneme being played back by its respective phoneme playback head to be supplied to the common summing amplifier.
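The decoding step described above can be illustrated with a short Python sketch. It is not part of the disclosure, which states only that such a decoder is known in the art. The function names `decode_line` and `one_hot` are hypothetical, and the bit ordering (input 1 treated as the least significant bit) is an assumption chosen to agree with the two examples in the text: all inputs at zero select line 1, and input 1 alone at logic 1 selects line 2.

```python
def decode_line(inputs):
    """Map five binary inputs to one of 32 output lines, numbered 1..32.

    inputs[0] stands for input 1 and is treated as the least significant
    bit. This ordering is an assumption; the text fixes only two examples.
    """
    if len(inputs) != 5 or any(bit not in (0, 1) for bit in inputs):
        raise ValueError("expected five binary inputs")
    value = sum(bit << position for position, bit in enumerate(inputs))
    return value + 1  # output lines are numbered from 1


def one_hot(line, n_lines=32):
    """State of all output lines: exactly one line is energized."""
    return [1 if index + 1 == line else 0 for index in range(n_lines)]
```

Under these assumptions, `decode_line([0, 0, 0, 0, 0])` selects line 1 and `decode_line([1, 0, 0, 0, 0])` selects line 2, matching the examples given in the text.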
When the next input coded signal is applied, the decoder produces a negative-going pulse at a particular output line, say the output line 4. The latch-in circuit for the output line 4 then produces a latch-in signal at the latch-in bus which prevents the other latch-in circuits from passing a gating signal. In the manner described for line 1, the line 4 then supplies a pulse to its audio switch which then permits the phoneme on channel 4 to pass through the audio switch to the summing amplifier. Thus, the system shown in FIG. 1 provides selected ones of 32, or any plurality, of phonemes in concatenated or series fashion. The advantage of the system of FIG. 1 is that only one of a selected number of phonemes is applied or provided at any one time. The system does not need or require extensive filtering, as in prior art systems, and thus provides synthesized speech or sound with a relatively simple arrangement. While the system of FIG. 1 contemplates 32 phonemes, and hence 32 storage and playback channels and 32 gating lines and circuits, a system with a different number of phonemes may be readily and easily utilized in accordance with my invention. For example, if relatively simple sounds are to be synthesized, a smaller number of phonemes would be satisfactory. However, if relatively complex sounds are to be synthesized, for example various pronunciations and accents for citizens of the United States, then more than 32 phonemes would be needed and desirable. In either case, any number of phonemes may be easily concatenated by use of the decoder which is capable of decoding the input signals and gating a particular output line.
In FIG. 1, the decoder is known in the art, and may take any one of a number of forms of well-known logic circuits. Similarly, the phoneme storage and playback system may take any one of a number of forms. For example, the phoneme storage and playback system may be a single magnetic tape, preferably in the form of a loop or belt, with the predetermined number of channels recorded thereon, and with the corresponding number of playback heads, one for each channel. The recording medium may be driven by any suitable means, for example, a synchronous motor whose speed can be relatively easily varied by changing its applied input frequency. This change of speed is desirable in some applications where the speech to be synthesized requires that the recording medium be played at a slower or faster speed determined by the glottal rate of the speech. The summing amplifier may be any conventional type amplifier. The volume control may be any conventional type volume control, but is preferably one whose volume is electrically controlled or varied.
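The dependence of a synchronous motor's speed on its supply frequency follows the standard relation of 120 times the frequency divided by the number of poles, in revolutions per minute. The Python sketch below illustrates that relation only. The pole count and the 60 Hz nominal frequency are assumptions for illustration; the text specifies neither, and the function names are hypothetical.

```python
def synchronous_rpm(supply_frequency_hz, poles):
    """Synchronous speed in revolutions per minute: 120 * f / poles."""
    if poles <= 0 or poles % 2:
        raise ValueError("pole count must be a positive even number")
    return 120.0 * supply_frequency_hz / poles


def playback_ratio(new_frequency_hz, nominal_frequency_hz=60.0):
    """Ratio of playback speed (and hence pitch) to the nominal speed.

    The 60 Hz nominal value is an assumption, not taken from the text.
    """
    return new_frequency_hz / nominal_frequency_hz
```

For example, a hypothetical four-pole motor turns at 1800 rev/min on a 60 Hz supply, and raising the supply to 66 Hz would play every channel ten percent faster.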
The latch-in circuits, the delay and duration circuits, the tone shaper circuits, and the audio switches are not necessarily conventional, and circuit diagrams of these circuits are shown in FIGS. 2, 3, 4, and 5 respectively. Waveforms for explaining the operation of these circuits are shown in FIGS. 6 and 7, which are respectively plotted on common time axes. FIG. 2 shows the latch-in circuit for line 1, indicated in block diagram form in FIG. 1. In FIG. 2 suitable direct current voltages are supplied at the terminals having + for a positive voltage and - for a negative voltage relative to ground. For the latch-in circuit shown, the negative-going pulse (shown at time t1 in FIG. 6a) from the decoder reverse biases a diode rectifier 10 so that a voltage divider network of resistors 11, 12, 13 applies a negative voltage to the base of the PNP-type transistor Q1. The transistor Q1 is turned on by this negative voltage at the time t1 (see FIG. 6b). Its collector voltage rises toward zero, and a capacitor 15 and rectifier 16 provide a positive pulse (shown in FIG. 6c) to a multivibrator circuit comprising PNP-type transistors Q3 and Q4. The transistors Q3 and Q4 are regeneratively coupled so that when one of the transistors is conducting, the other of the transistors is turned off. When the reset switch is closed, all multivibrators are switched such that the transistor Q4 is turned on. Normally, that is if no gating signal is being supplied, the transistor Q4 is turned on and the transistor Q3 is turned off. When the positive pulse is supplied to the base of the transistor Q4, the transistor Q4 is turned off as shown in FIG. 6d. When the transistor Q4 is turned off, its collector voltage becomes relatively negative. This negative voltage is coupled to the base of the transistor Q3 as shown in FIG. 6g. This negative voltage causes the transistor Q3 to be turned on as indicated at the time t1 in FIG. 6h.
The multivibrator circuit action makes this switching relatively sharp and fast. At the time t1 when the transistor Q4 is turned off, its negative collector also supplies a negative voltage to the base of a PNP-type transistor Q5. This causes the transistor Q5 to be turned on, and its collector voltage rises from a negative value toward zero as shown in FIG. 6e. This voltage is applied to the delay and duration circuit, and is the pulse which eventually causes the audio switch to gate or pass phoneme signals. This positive-going voltage at the collector of the transistor Q5 is coupled through a capacitor 20 to the base of an NPN-type transistor Q6. This causes the transistor Q6 to be momentarily turned on, so that its emitter becomes momentarily positive. As shown in FIG. 2, the emitter of the transistor Q6 is coupled through a diode rectifier 21 to the latch-in bus which is coupled to the other latch-in circuits. This positive pulse is shown at the time t1 in FIG. 6f. This same positive pulse is applied through a capacitor 22 to the base of a PNP-type transistor Q2 to cause the emitter voltage of the transistor Q2 to go from a negative value toward zero as shown in FIG. 6i. This emitter voltage is coupled by a capacitor 25 to the base of the transistor Q3 and tends to turn the transistor Q3 off. However, the power of this pulse is insufficient to overcome the switched condition of the transistors Q3 and Q4 because the pulse applied to the base of the transistor Q4 predominates. However, this pulse shown in FIG. 6i does restore all other multivibrators of the other latch-in circuits to the normal condition with the transistor Q3 turned off and the transistor Q4 turned on.
FIG. 2 also shows a reset switch which may be a manually operated switch for applying a positive voltage to a reset bus. This reset bus is coupled to all of the latch-in circuits, and may be utilized to restore all multivibrators to their normal condition with the transistor Q3 turned off and with the transistor Q4 turned on.
At some later time the command signal from the decoder returns to its normal or positive condition as shown in FIG. 6a. This turns the transistor Q1 off again, as shown in FIG. 6b. The collector voltage of the transistor Q1 becomes relatively negative again, but this has no effect on the multivibrator transistors Q3 and Q4 because the negative voltage on the collector of the transistor Q1 is not passed by the rectifier 16. Thus, the remainder of the circuit shown in FIG. 2 remains in the condition shown at the time t1 in FIGS. 6c through 6i. However, at some still later time a command is received for line 2. The multivibrator circuit for this line 2 switches in the manner described for the multivibrator circuit of line 1, and a positive pulse is received on the latch-in bus from the transistor corresponding to the transistor Q6. This positive pulse is shown in FIG. 6f at that later time. This positive pulse is coupled through the capacitor 22 to the base of the transistor Q2 and turns the transistor Q2 off. The emitter of the transistor Q2 goes from a negative value toward zero, and this positive-going voltage is applied through the capacitor 25 to the base of the transistor Q3. This pulse is effective to turn the transistor Q3 off, so that its collector voltage goes toward a negative value. This negative voltage is applied to the base of the transistor Q4 and turns the transistor Q4 on. Multivibrator action causes this switching to take place in a relatively short time, and restores the multivibrator transistors Q3 and Q4 to their normal condition with the transistor Q3 turned off and the transistor Q4 turned on. With the transistor Q4 turned on, the transistor Q5 is turned off, and its collector voltage becomes negative again. If the pulse supplied to the delay and duration circuit has not already been terminated, this negative pulse terminates the gating signal for the audio switch associated with line 1. However, as will be explained in connection with FIG. 3, a delay and duration circuit is provided for line 1 so that this gating signal is terminated previously by the delay and duration circuit.
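The interlocking behavior of the latch-in circuits can be summarized by a small behavioral model in Python. This sketch abstracts away the transistor-level operation entirely. The class name `LatchBank` and its methods are hypothetical, and the model assumes ideal, instantaneous switching.

```python
class LatchBank:
    """Behavioral model of the interlocked latch-in circuits.

    At most one latch is set at any time. Setting a latch pulses a shared
    bus, which restores every other latch to its normal (reset) condition.
    """

    def __init__(self, n_lines=32):
        self.n_lines = n_lines
        self.active = None  # number of the line whose latch is set, or None

    def command(self, line):
        """Model a negative-going decoder pulse arriving on `line`."""
        if not 1 <= line <= self.n_lines:
            raise ValueError("no such line")
        # The bus pulse resets the previously set latch; the commanded
        # latch stays set because its own set pulse predominates.
        self.active = line
        return self.active

    def command_ended(self, line):
        """The decoder pulse returning to normal does not reset the latch."""
        return self.active

    def reset(self):
        """Model the reset switch: every latch returns to normal."""
        self.active = None

    def gates(self):
        """Gating level of every line (1 means that line's latch is set)."""
        return [1 if self.active == i + 1 else 0 for i in range(self.n_lines)]
```

A command for line 1 followed by a command for line 2 leaves only line 2 gated, and the end of a command pulse changes nothing, which mirrors the sequence described above.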
FIG. 3 shows a schematic diagram of the delay and duration circuit. This circuit is supplied with the direct current voltages as indicated. This circuit is desirable for producing certain phonemes, such as for the sounds of B, D, G, K, P, and T, at a particular time, and for limiting the duration of such phonemes. If this circuit is not needed or desired in any line, it may be eliminated or bypassed with a direct connection between the latch-in circuit and the tone shaper circuit. In such a case, the pulse from the collector of the transistor Q5 (of the latch-in circuit) remains in the gating condition until a command pulse for another line is provided. The delay and duration circuit comprises an input NPN-type transistor Q7 to which the pulse from the transistor Q5 of the latch-in circuit of FIG. 2 is applied. This pulse is coupled through a capacitor 30 to the base of the transistor Q7 and is differentiated by the capacitor 30. The collector of the transistor Q7 is coupled to a leading or first multivibrator comprising PNP-type transistors Q10 and Q11 coupled together in regenerative fashion as a monostable or one shot multivibrator. The emitter of the transistor Q7 is coupled to a trailing or second multivibrator comprising NPN-type transistors Q8 and Q9 also connected as a monostable or one shot multivibrator. In these multivibrators, the transistors Q9 and Q11 are normally on. When a positive pulse is applied to the base of the transistor Q7, the transistor Q7 is turned on for the duration of the pulse, and its emitter voltage rises from a negative voltage toward zero, and its collector voltage falls from zero toward a negative voltage as shown in FIGS. 6j and 6k. The emitter of the transistor Q7 is coupled to the base of the transistor Q8, and the collector of the transistor Q7 is coupled to the base of the transistor Q10 so that the transistors Q8 and Q10 are turned on by these voltages.
This switches the two multivibrators so that the transistors Q9 and Q11 are turned off as shown at the time t1 in FIGS. 6l and 6m. The time constant (or unstable time) of the first multivibrator with the transistors Q10 and Q11 is made shorter than the time constant (or unstable time) of the second multivibrator with the transistors Q8 and Q9. Hence, the first multivibrator with the transistors Q10 and Q11 returns to its normal condition (with the transistor Q11 on) at a first time, as shown in FIG. 6m, and the second multivibrator with the transistors Q8 and Q9 returns to its normal condition (with the transistor Q9 on) at a later time, as shown in FIG. 6l. The collector of the normally on transistor Q11 is coupled to the base of a PNP-type transistor Q13, and the collector of the normally on transistor Q9 is coupled to the base of a PNP-type transistor Q12. When the transistor Q11 is turned off, the transistor Q13 is turned on so that its collector voltage approaches zero as shown in FIG. 6o. When the transistor Q9 is turned off, the transistor Q12 is also turned off. However, the collector voltage of the transistor Q12 remains at zero or near zero because the collector voltage of the transistor Q13 is near zero. However, when the collector voltage of the transistor Q13 becomes negative again, the collector voltage of the transistor Q12 may also become negative if the transistor Q12 was turned off as indicated at the time t1. Both of the collectors of the transistors Q12 and Q13 become negative at that time. Their collectors are coupled to a base of a combining PNP-type transistor Q14, and cause the base of the transistor Q14 to become negative as shown in FIG. 6p. This turns the transistor Q14 on as indicated in FIG. 6. At the later time, the transistor Q9 turns on again by multivibrator action. This supplies a negative voltage to the base of the transistor Q12 which turns the transistor Q12 on.
With the transistor Q12 on, the base of the transistor Q14 becomes positive again and the transistor Q14 is turned off. Thus, a fixed duration voltage or pulse is supplied by the collector of the transistor Q14. This pulse has a leading edge occurring at a predetermined time (determined by the time constant of the first or leading multivibrator comprising the transistors Q10 and Q11) and has a trailing edge occurring at a later predetermined time (determined by the time constant of the second multivibrator comprising the transistors Q8 and Q9). It will be seen that this pulse, shown in FIG. 6, is independent, both in time and duration, of the command pulse for line 1 shown in FIG. 6a. As pointed out previously, this delay and fixed duration pulse may be desirable for certain phonemes. The pulse delay and time duration may be set or adjusted by the variable resistors 31, 32 respectively coupled to the bases of the transistors Q8 and Q9. Other time constant adjusting elements may be utilized as well. The phonemes associated with a particular delay and duration circuit will be gated only as long as determined by the delay and duration circuit. In the absence of a subsequent command, there will be silence until a command is received. If a particular line does not use or have a delay and duration circuit, then the phoneme for that line will last until a subsequent command is provided for another line.
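The timing produced by the two one-shot multivibrators can be expressed in a few lines of Python. The sketch below is a timing model only, assuming ideal one-shots that are both triggered at the start of the command. The function names and the example values are hypothetical; the text gives no numerical time constants.

```python
def gate_window(command_start, lead_time, trail_time):
    """Return (gate_on, gate_off) times for one command.

    lead_time  -- unstable time of the leading one-shot (Q10, Q11)
    trail_time -- unstable time of the trailing one-shot (Q8, Q9)
    Both one-shots are triggered at command_start. The gate opens when the
    leading one-shot times out and closes when the trailing one does.
    """
    if not 0 <= lead_time < trail_time:
        raise ValueError("leading time must be shorter than trailing time")
    return command_start + lead_time, command_start + trail_time


def gate_level(t, command_start, lead_time, trail_time):
    """1 while the fixed-duration pulse is present, otherwise 0."""
    gate_on, gate_off = gate_window(command_start, lead_time, trail_time)
    return 1 if gate_on <= t < gate_off else 0
```

With hypothetical values of 0.02 s and 0.08 s, a command starting at time zero would gate its phoneme from 0.02 s to 0.08 s regardless of how long the command itself lasts.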
The pulse from each delay and duration circuit (or from each latch-in circuit) is applied to a tone shaper circuit shown in FIG. 4. This circuit provides a slope on the leading and trailing edges of the pulse by a controlled charging and discharging of a capacitor 40. The charging time is determined by the magnitude of a variable resistor 41, and the discharging time is determined by the magnitude of a variable resistor 42. FIG. 7a shows the pulse as applied to the tone shaper circuit, and FIG. 7b shows the voltage appearing across the capacitor 40. In FIG. 7b, it will be seen that the voltage has a sloped leading edge and trailing edge. This voltage with the sloped leading and trailing edges is applied to the base of a PNP-type transistor Q15. The transistor Q15 and the circuit shown provide a first abrupt change from a lower value to an intermediate value on the leading edge of the voltage, and a second abrupt change from an upper value to an intermediate value on the trailing edge of the voltage as shown in FIG. 7c. The first abrupt change is provided by the bias current supplied by a diode rectifier 43 and the voltage network 44, and the second abrupt change is provided by the bias current in the resistor-capacitor network 45. After the voltage has been shaped to provide the form shown in FIG. 7c, it is supplied to the audio switch or gate circuit shown in FIG. 5. These abrupt changes are desirable to eliminate transient conditions from operating the audio switch, and the sloped leading and trailing edges are desirable to fade the audio switch in and out so that the phoneme is supplied and then removed from the common output in a controlled manner with prescribed rates of change.
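The shaping described above can be approximated numerically. The following Python sketch models the capacitor voltage as a first-order charge and discharge with separate time constants, then adds an abrupt step at each edge. It is a simplified illustration under stated assumptions: the normalized 0 to 1 levels, the step size of 0.2, and the function names are hypothetical, and the actual circuit of FIG. 4 is not simulated.

```python
import math


def capacitor_envelope(pulse, dt, rise_tau, fall_tau):
    """Voltage on the shaping capacitor for a 0/1 gating pulse.

    pulse    -- sequence of 0/1 samples (the pulse of FIG. 7a)
    dt       -- sample spacing in seconds
    rise_tau -- charging time constant (set by one variable resistor)
    fall_tau -- discharging time constant (set by the other)
    Returns values between 0 and 1 with sloped edges, as in FIG. 7b.
    """
    level = 0.0
    out = []
    for sample in pulse:
        tau = rise_tau if sample else fall_tau
        target = 1.0 if sample else 0.0
        # Exact first-order step response over one sample interval.
        level = target + (level - target) * math.exp(-dt / tau)
        out.append(level)
    return out


def add_edge_steps(envelope, pulse, step=0.2):
    """Add an abrupt jump at each edge (hypothetical step size).

    While the pulse is present the output starts at `step` and slopes up
    to 1; when the pulse ends it drops by `step` and then slopes to 0.
    """
    return [step * p + (1.0 - step) * e for p, e in zip(pulse, envelope)]
```

Choosing different values for `rise_tau` and `fall_tau` corresponds to the different rise and fall times that the text says suit different concatenated sounds.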
As shown in FIG. 5, each audio switch has signals from its respective storage and playback channel applied to the PNP-type transistor Q16. These signals are normally bypassed to ground by parallel connected and normally conducting PNP-type transistors Q17 and Q18. However, when a positive voltage pulse is provided by the transistor Q15, the transistors Q17 and Q18 are turned off, and the signals at the emitter of the transistor Q16 are applied to the PNP-type output transistor Q19. The output transistor Q19 amplifies these signals and applies them to the summing amplifier as indicated. As soon as the voltage from the tone shaper circuit or latch-in circuit falls again, the transistors Q17 and Q18 become conductive and bypass or short circuit the signals. Thus, signals from the phoneme storage and playback system are applied to the output only when the transistors Q17 and Q18 are turned off. As described before, signals from the audio switches are selectively applied to the common summing amplifier where they are added in series or concatenated with other signals to provide the synthesized sound or speech.
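The combined effect of the audio switches and the summing amplifier can be illustrated by multiplying each channel by its gate and adding the results. The Python sketch below assumes sampled signals and treats each switch as an ideal multiplier, which is a simplification of the shunt arrangement of FIG. 5. The function name `mix` and the sample values are hypothetical.

```python
def mix(channels, gates):
    """Sum gated channel samples into one output sample stream.

    channels -- list of equal-length sample lists, one per playback channel
    gates    -- list of equal-length gate lists (0.0 to 1.0), one per channel
    A gate of 0 models the shunt transistors conducting (signal bypassed to
    ground); a gate of 1 models them turned off (signal passed on).
    """
    length = len(channels[0])
    out = [0.0] * length
    for signal, gate in zip(channels, gates):
        for i in range(length):
            out[i] += gate[i] * signal[i]
    return out
```

With two channels gated one after the other, for example gates `[1, 1, 0, 0]` and `[0, 0, 1, 1]`, the output carries the first channel followed by the second, which is the concatenation the text describes.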
It will thus be seen that my invention provides a new and improved sound or speech synthesizer. The synthesizer may accommodate any number of phoneme channels, as opposed to frequency selective systems, and these phoneme channels are selectively concatenated or added in series to a common output. Persons skilled in the art will appreciate that modifications may be made to my invention. For example, other types of storage and playback systems may be used, and other decoder circuits may be used. The time delay and duration circuit may or may not be used in each line. Other latch-in circuits, time delay and duration circuits, and shaper circuits may also be used. However, the circuits shown in schematic diagrams represent preferred embodiments. Therefore, while my invention has been described with reference to a particular embodiment, it is to be understood that modifications may be made without departing from the spirit of the invention or from the scope of the claims.
Claims (4)
1. A speech synthesizer comprising: a plurality of gates, each having multiple inputs and an output; a variable speed storage and playback system having a plurality of playback channels each of which is coupled to an input of a respective one of said gates, each said channel adapted to supply a distinct continuous signal indicative of a distinct continuous phonemic sound to said respective gate, said phonemic sounds when grouped in specified manners in series producing word sounds; a summing amplifier having an input coupled to the output of each said gate and an output for providing a series of signals indicative of word sounds; a decoder having n inputs and 2^n outputs where n is any natural number, said inputs adapted to receive binary coded signals and to energize not more than one of the 2^n outputs in response thereto; a plurality of latch means, each connected to a distinct decoder output and to each of the other latch means and effective when energized to prevent the energization of any of the remaining latch means; circuit means connecting each said latched means with a corresponding gate whereby energization of a specific latch means results in the energization of only the corresponding gate thereby passing a corresponding distinct continuous signal to said summing amplifier.
2. The speech synthesizer of claim 1 wherein said variable speed storage and playback system further comprises an input terminal for varying the storage system speed.
3. The speech synthesizer of claim 1 wherein said latch means provides a gating pulse when energized, said circuit means comprising means for modifying said pulse shape so as to extend its duration whereby some overlapping of contiguous phonemic sound signals occurs.
4. The speech synthesizer of claim 1 wherein there are 2^n gates, 2^n playback channels, 2^n latch means, and 2^n circuit means.
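The signal path recited in claim 1 can be illustrated with a minimal software sketch. It is not the patented circuit, which is analog hardware, and every name in it (`decode`, `LatchBank`, `summing_amplifier`) is hypothetical. The sketch assumes the binary code is presented most significant bit first, that each playback channel is represented by a single sample value, and that a latch is cleared by an explicit `release` step, which the claims do not specify. The pulse stretching of claim 3 is not modelled.

```python
from typing import List, Optional, Sequence


def decode(bits: Sequence[int]) -> int:
    """n-input decoder: return which one of the 2**n outputs is energized.

    Assumption: bits are given most significant bit first.
    """
    index = 0
    for bit in bits:
        index = (index << 1) | (bit & 1)
    return index


class LatchBank:
    """2**n latches; an energized latch blocks all of the others."""

    def __init__(self, n: int) -> None:
        self.size = 2 ** n
        self.active: Optional[int] = None  # index of the energized latch

    def energize(self, index: int) -> bool:
        """Try to set one latch; fails while another latch is energized."""
        if self.active is not None:
            return False
        self.active = index
        return True

    def release(self) -> None:
        """Hypothetical reset step (not taken from the claims)."""
        self.active = None

    def gate_enables(self) -> List[int]:
        """One enable line per gate; only the latched gate is open."""
        return [1 if i == self.active else 0 for i in range(self.size)]


def summing_amplifier(channels: Sequence[float], enables: Sequence[int]) -> float:
    """Sum the gate outputs; closed gates contribute nothing."""
    return sum(sample * enable for sample, enable in zip(channels, enables))


if __name__ == "__main__":
    bank = LatchBank(n=2)                 # 4 gates, 4 channels, 4 latches
    bank.energize(decode([1, 0]))         # binary code 10 selects channel 2
    samples = [0.1, 0.2, 0.3, 0.4]        # one sample per phoneme channel
    print(summing_amplifier(samples, bank.gate_enables()))
```

With n = 2 the sketch has four gates, channels and latches, matching the 2^n count of claim 4; only the sample from the latched channel reaches the output.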
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US2436070A | 1970-04-15 | 1970-04-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
US3624301A (en) | 1971-11-30 |
Family
ID=21820182
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US24360A Expired - Lifetime US3624301A (en) | 1970-04-15 | 1970-04-15 | Speech synthesizer utilizing stored phonemes |
Country Status (1)
Country | Link |
---|---|
US (1) | US3624301A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3253263A (en) * | 1961-04-10 | 1966-05-24 | Ibm | Code to voice inquiry system and twospeed multi-unit buffer mechanism |
US3221420A (en) * | 1961-11-27 | 1965-12-07 | Paul J Heinberg | Audio-visual teaching machine and method |
US3102165A (en) * | 1961-12-21 | 1963-08-27 | Ibm | Speech synthesis system |
US3367045A (en) * | 1965-05-28 | 1968-02-06 | Joseph R. Mendez | Key operated phonetic sound and reproducing device |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3794753A (en) * | 1971-09-16 | 1974-02-26 | Weston D | Synthesis of speech from a magnetic tape matrix storage of phonetic segments |
US3865982A (en) * | 1973-05-15 | 1975-02-11 | Belton Electronics Corp | Digital audiometry apparatus and method |
US3908288A (en) * | 1973-11-19 | 1975-09-30 | Jr Cecil Brown | Teaching device |
JPS5735479B2 (en) * | 1974-11-20 | 1982-07-29 | ||
JPS5159207A (en) * | 1974-11-20 | 1976-05-24 | Shurago Moza Fuoresuto | |
JPS564195A (en) * | 1974-11-20 | 1981-01-17 | Mozer Forrest Shrago | Voice synthesizer |
JPS5737079B2 (en) * | 1974-11-20 | 1982-08-07 | ||
JPS52122004A (en) * | 1975-11-14 | 1977-10-13 | Mozer Forrest Shrago | Method and device for synthesizing audio |
JPS573960B2 (en) * | 1975-11-14 | 1982-01-23 | ||
US4694496A (en) * | 1982-05-18 | 1987-09-15 | Siemens Aktiengesellschaft | Circuit for electronic speech synthesis |
WO1989003573A1 (en) * | 1987-10-09 | 1989-04-20 | Sound Entertainment, Inc. | Generating speech from digitally stored coarticulated speech segments |
US20130080176A1 (en) * | 1999-04-30 | 2013-03-28 | At&T Intellectual Property Ii, L.P. | Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus |
US8788268B2 (en) * | 1999-04-30 | 2014-07-22 | At&T Intellectual Property Ii, L.P. | Speech synthesis from acoustic units with default values of concatenation cost |
US9236044B2 (en) | 1999-04-30 | 2016-01-12 | At&T Intellectual Property Ii, L.P. | Recording concatenation costs of most common acoustic unit sequential pairs to a concatenation cost database for speech synthesis |
US9691376B2 (en) | 1999-04-30 | 2017-06-27 | Nuance Communications, Inc. | Concatenation cost in speech synthesis for acoustic unit sequential pair using hash table and default concatenation cost |
US6513007B1 (en) * | 1999-08-05 | 2003-01-28 | Yamaha Corporation | Generating synthesized voice and instrumental sound |
US20040098266A1 (en) * | 2002-11-14 | 2004-05-20 | International Business Machines Corporation | Personal speech font |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US3624301A (en) | Speech synthesizer utilizing stored phonemes | |
US4121058A (en) | Voice processor | |
US4412306A (en) | System for minimizing space requirements for storage and transmission of digital signals | |
US3588353A (en) | Speech synthesizer utilizing timewise truncation of adjacent phonemes to provide smooth formant transition | |
US4280192A (en) | Minimum space digital storage of analog information | |
US3575555A (en) | Speech synthesizer providing smooth transistion between adjacent phonemes | |
US3647951A (en) | Edit control circuit for video tape record system | |
US3247328A (en) | Automatic tape programming | |
US3660616A (en) | Dictating and transcribing systems featuring random sentence arrangement with recognition and location of sentences in a preferred sequence | |
US3108265A (en) | Magnetic data recording system | |
GB1391686A (en) | Magnetic recording and reproducing method and system | |
US3936610A (en) | Dual delay line storage sound signal processor | |
US3761888A (en) | Broadcast station logger and printout system | |
US3668648A (en) | Data processing system | |
US3932886A (en) | Method and apparatus for mixing and recording multi-track stereo audio signals which have been recorded as several individual audio signals | |
GB1317565A (en) | Rotary drive systems | |
GB974850A (en) | Speech recognition system | |
US3234332A (en) | Acoustic apparatus and method for analyzing speech | |
US3760388A (en) | Audio waveform for digital recording | |
CA1133124A (en) | Method of recording on magnetic tape attached to a card | |
US2499603A (en) | Rerecording method and system | |
DE69029435T2 (en) | Decoding device for digital signals | |
Harris | A speech synthesizer | |
US3881072A (en) | Audible indexing for dictation apparatus | |
US2405246A (en) | Sound recording system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: MAGNAVOX ELECTRONIC SYSTEMS COMPANY. Free format text: CHANGE OF NAME; ASSIGNOR: MAGNAVOX GOVERNMENT AND INDUSTRIAL ELECTRONICS COMPANY A CORP. OF DELAWARE; REEL/FRAME: 005900/0278. Effective date: 1991-09-16 |