US3067288A - Phonetic typewriter of speech - Google Patents


Info

Publication number
US3067288A
Authority
US
United States
Prior art keywords
signals
phonetic
sound
groups
resonances
Legal status
Expired - Lifetime
Application number
US45327A
Inventor
Meguer V Kalfaian
Current Assignee
Individual
Original Assignee
Individual
Priority date
1960-07-26
Filing date
1960-07-26
Publication date
1962-12-04
Application filed by Individual
Priority to US45327A
Application granted
Publication of US3067288A
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility

Definitions

  • the present invention contemplates first a system of frequency normalization, so that all the resonances of importance are first shifted to locations where their frequency ratios with respect to a fixed reference fundamental frequency remain constant. These frequency-normalized waves are then applied to a plurality of tuned circuits, which are tuned to different harmonic frequencies of the reference fundamental frequency, and their outputs are further detected to obtain unidirectional signals. These detected signals are grouped in preknown combinations, and the signals of each group are combined in such amplitudes and polarities as to effect a zero, or minimum, output signal when all the resonances in a particular phonetic sound are present with their proper amplitude ratios with respect to one another; this minimum output signal represents the particular phonetic sound.
  • the output of only one group will assume minimum signal, representing an incoming phonetic sound, and the remaining groups will have substantially large signal outputs.
  • the output of the group having minimum signal amplitude is then selected for translation into a representative symbol of the originally spoken phonetic sound.
  • FIG. 1 and FIG. 2 are mostly in block-diagram form, with references to existing inventions.
  • the original speech sound waves are spectrum-normalized in amplifier block 11, and the output of this amplifier is applied to resonant circuits in blocks 12 through 20, which are tuned to harmonic frequencies f1 through f9 of the reference fundamental frequency of the spectrum-normalized speech sound waves in block 11; the frequency f1 is the second harmonic of said fundamental.
  • the output oscillations of blocks 12 through 20 are rectified and detected in blocks 21 through 29 in positive polarity, and in blocks 22 through 38 in negative polarity; these positive and negative polarity detected signals being impressed across output load resistors R1 through R18 respectively.
  • the resistive values of R1 to R18 are preferably chosen to be very low, and they are preferably arranged in cathode follower circuits of vacuum tubes, or in emitter follower circuits of transistors. These resistors are further tapped at various pre-known values, for example, each resistor is shown in the drawing with four taps; but the number of these taps varies on each resistor. As stated in the foregoing, each phonetic sound consists of a set of resonances having definite frequency ratios and amplitude levels with respect to the fundamental; each set comprising three or four resonances.
  • the output of one group may assume zero, or minimum, voltage level when a phonetic sound containing these given conditions is present, while the outputs of other groups will have large signals due to inability of cancellation.
  • in the drawing there is shown, for one group, a combination of four terminals from the taps across resistors R1, R10, R11 and R18, which are coupled to the common output load resistor R19 through coupling capacitors C1 to C4, respectively.
  • for the second group there is shown a combination of four terminals from the taps across resistors R3, R6, R9 and R18, which are coupled to the common output load resistor R20 through coupling capacitors C5 to C8, respectively.
  • assume that the incoming phonetic sound comprises the resonances at f1, f5, f6 and f9 in such amplitude ratios one with respect to another that the capacitors C1 and C2 see voltages of equal amplitude and opposite polarity, and the capacitors C3 and C4 likewise see voltages of equal amplitude and opposite polarity.
  • the voltage across output load resistor R19 will then be zero, while the voltage across output load resistor R20 must be substantially high, due to lack of such cancellation.
  • these outputs are first full-wave rectified and amplified in blocks 39, 40, etc.
  • the reason for full-wave rectification in these blocks is that the output signal voltages across R19, R20, etc., may be either in positive or negative polarity, in random fashion, due to said cancellations by oppositely poled signals.
  • the outputs of amplifiers in blocks 39, 40 are connected to relays RY1, RY2, respectively, and these outputs are normally adjusted to draw minimum currents, so that the relays will normally be deenergized.
  • Each of these relays contains an armature 41, a normally closed circuit contact (to said armature) 42, and a normally open circuit contact (to said armature) 43.
  • the armatures, for example, 41 and 44 through 47, etc., of all the relays are connected in parallel, with no further connection to any other source.
  • the open circuit contacts, for example, 43 and 48 through 51, etc. are connected in parallel, and further to one terminal of battery B1.
  • the closed circuit contacts, for example, contacts 42 and 52 through 55, are connected to solenoids RY3 through RY7, respectively.
  • the armatures 44 to 47 will establish electrical contact from battery B1 to the solenoid RY3, via the parallel-connected contacts 48 through 51, for operation of said solenoid RY3, which in turn pulls the predesignated key of the typewriter in block 56, for printing a visual symbol representative of the incoming spoken phonetic sound.
  • the inductive coils of relays RY1, RY2, etc. may be shunted by capacitors, for example, C9, C10, etc., to delay the release time of their armatures, for example, armatures 41 and 44 through 47, etc., so as to allow time for the relatively slowly operating solenoids, for example, RY3 through RY7, etc., when the output currents of amplifiers, for example, in blocks 39, 40, etc., are arranged to be in pulses.
  • the relay arrangement, as shown, and the system of combining the various detected signals can be modified, and various substitutions of parts can be made without departing from the true spirit and scope of the invention.
  • the various grouped signals may be combined in two dimensions by deflecting a cathode ray beam, such as shown in my Patent No. 2,673,893, March 30, 1954.
  • the angular deflection of the beam may be represented as the arriving phonetic sound.
  • the mechanical relays RY1, RY2, etc. may not be suitable due to their sluggish operation. For this reason, it may be necessary to first prolong these pulses, for example, by an arrangement as shown in the block diagram of FIG. 2.
  • the block 58 may represent, for example, the block 39 of FIG. 1.
  • the output pulses of amplifier 58 are applied to the block 59, which is a gate having two input terminals 60 and 61.
  • This gate may be either a vacuum tube having first and second intensity control electrodes, such as the type 5915; or a transistor having first and second base elements, such as the type 3N36; or two triode transistors connected in series so that the two respective base elements can be used as first and second control elements.
  • the main purpose of these devices having two control elements is that they may be used as gates in off-condition when either or both of the two control elements are biased in the backward direction, and in on-condition only when both of the control elements are biased in the forward direction.
  • assume that the gate 59 can be set into on-condition only when the input terminals 60 and 61 are simultaneously biased in positive polarity, and also that the input terminal 60 is normally biased in positive polarity and the input terminal 61 is normally biased in negative polarity.
  • a positive pulse from block 62 applied to the input terminal 61 will energize the gate 59, which in turn will produce an output pulse and operate the one-shot multivibrator in block 63.
  • This block 63 can then prolong the output pulse to any desired time period for the operation of a relay or a solenoid of a typewriter.
  • the amplifier 58 produces a negative pulse and applies it to the input terminal 60 of gate block 59, so that the gate becomes inactive even though a positive pulse arrives at its input terminal 61.
  • the gate 59 operates for representation of the incoming phonetic sound.
  • the pulses generated in block 62 may be derived from block 2 in FIG. 1, and so phased that they are produced coincident with the output pulses of amplifier block 58, etc.
  • the speech sound waves originating from block 1 are first applied to the block of fundamental (pitch) frequency selector 2, and to the block of stepwise automatic gain control device 3.
  • the function of the fundamental frequency selector is to produce at its output pulse-signals coincident with the termination of each arriving wave train (wave pattern of speech sound waves).
  • These pulse-signals are applied to an alternate switch 4, which alternates its state of operation at each arriving signal-pulse and imparts the operation of a two-section scanning system comprising blocks scan-record 5, scan-read 6, and scan-read 7, scan-record 8, alternately in relative time period with the arriving wave patterns in block 1.
  • with each section of the scanning system is associated a memory device, block 9 or block 10, which is provided for recording and reproducing the original sound waves by the scanning action, for example, by the scan-record block 5 and scan-read block 7.
  • the memory devices 9 and 10 are set into recording or reproducing (reading) action by the alternate output voltages of the alternate switch 4, in an arrangement such that, when memory device 9 is in recording position, for example by scan-record block 5, the memory device 10 is in reading position of a previously recorded wave pattern, for example by scan-read block 6, and vice versa.
  • the original wave patterns are successively recorded and read in alternate sequence by the memory devices 9 and 10.
  • prior to recording action, the original speech sound waves in block 1 are first applied to a stepwise gain control block 3, which equalizes the peak output amplitude of each successive input wave pattern individually before application to the memory devices 9 and 10 for recording.
  • the combined output of memory tubes 9 and 10 is then applied to the input of amplifier 11 for amplification to a useful magnitude, and finally applied to the resonant circuits in blocks 12 through 20, etc.
  • a phonetic sound consists of a train of substantially replica wave patterns.
  • the repetition frequency rate of these wave patterns is determined by repeated puffs of air from the glottis, which is variable and ranges from 60 to 600 repetitions per second.
  • the minimum time that the physical elements can take to go through a complete cycle of change in position is not less than 1/10 second. Consequently, the puffs of air must stop for at least 1/10 of a second before a succeeding word is pronounced.
  • since the fundamental frequency selector in block 2 produces marker signals at the arrival of each succeeding wave pattern, these marker signals are applied to the block 57, which in turn measures the time period between them. When the time period between these marker signals exceeds 1/10 second, the word-advance block 57 transmits a current pulse to the typewriter block 56 for operating and advancing its carriage a letter space.
  • the system for identifying these phonetic sound waves which comprises: means for producing said phonetic sound waves; means for selecting said groups of resonances from the produced sound waves; means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances, respectively; means for combining the unlike poled signals in each of said groups; means for preadjusting the magnitudes of said signals so that the resultant output signals of said combined signals in each of said groups will have minimum value by virtue of cancellation only when the amplitude ratios of said group of resonances in the produced sound waves are in accord with said amplitude preadjustments; and means for deriving discrete signals from the states of last said signals of minimum amplitudes as identifications of preknown phonetic sounds.
  • the system for identifying these phonetic sounds which comprises: means for producing said phonetic sound waves; frequency transposer means and means therefor for re-shifting the frequency positions of the grouped resonances of the produced sound waves to standard frequency locations; means for selecting said groups of resonances from the re-shifted sound waves; means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances, respectively; means for combining the unlike poled signals in each of said groups; means for preadjusting the magnitudes of said derived signals so that the resultant output signals of said combined signals in each of said groups will have minimum value by virtue of cancellation only when the amplitude ratios of said group of resonances in the produced sound waves are in accord with said amplitude preadjustments; and means for deriving discrete signals from the states of last said signals of minimum
  • said means for deriving discrete signals comprises: means for converting the states of said signals of minimum amplitudes to auxiliary first signals; means for deriving second auxiliary signals from said produced phonetic sound waves; a plurality of normally inoperative gates, one gate for each phonetic sound to be identified, respectively, each gate preadjusted to be operated only by simultaneous applications of said first and second auxiliary signals; and means for applying the first and second auxiliary signals simultaneously upon respective said gates for operation, whereby the operating states of said gates may be represented as identifications of preknown phonetic sounds.
  • said means for deriving discrete signals comprises: means for converting the states of said signals of minimum amplitudes to auxiliary first signals; means for deriving second auxiliary signals from said produced phonetic sound waves; a plurality of normally inoperative gates, one gate for each phonetic sound to be identified, respectively, each gate preadjusted to be operated only by simultaneous applications of said first and second auxiliary signals; and means for applying the first and second auxiliary signals simultaneously upon respective said gates for operation, whereby the operating states of said gates may be represented as identifications of preknown phonetic sounds.
  • said means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances; said means for combining the unlike poled signals in each of said groups; and said means for preadjusting the magnitudes of said derived signals comprise: first and second sets of plurality of groups of detectors for detecting said selected groups of resonances, respectively, the first sets prearranged to detect the signals in positive polarity and the second sets to detect in negative polarity; resistance loads at the outputs of said first and second sets of detectors, respectively, each of last said resistances having plurality of voltage dividing taps; plurality of groups of capacitors, each capacitor having first and second terminals, the first terminals of the groups of capacitors connected to respective groups of taps of said first and second sets of resistance loads, and the second terminals of each of the groups of said capacitors connected in parallel, the magnitudes of said connections of the taps being so proportioned as to obtain said minimum signals at said parallel connections.
  • said means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances; said means for combining the unlike poled signals in each of said groups; and said means for preadjusting the magnitudes of said derived signals comprise: first and second sets of plurality of groups of detectors for detecting said selected groups of resonances, respectively, the first sets prearranged to detect the signals in positive polarity and the second sets to detect in negative polarity; resistance loads at the outputs of said first and second sets of detectors, respectively, each of last said resistances having plurality of voltage dividing taps; and plurality of groups of capacitors, each capacitor having first and second terminals, the first terminals of the groups of capacitors connected to respective groups of taps of said first and second sets of resistance loads, and the second terminals of each of the groups of said capacitors connected in parallel, the magnitudes of said connections of the taps being so proportioned as to obtain minimum signals at said parallel connections.
  • each phonetic sound is identified by a group of resonances having definite frequency ratios and approximate amplitude levels one with respect to another, and wherein the frequency locations of these groups are shifted in the entire voice spectrum
  • the system for identifying these phonetic sounds which comprises: means for producing said phonetic sound waves; frequency transposer means and means therefor for reshifting the frequency positions of the grouped resonances of the produced sound waves to standard frequency locations; means for selecting said group of resonances from the re-shifted sound waves; first and second sets of plurality of groups of detectors for detecting said selected groups of resonances, respectively, the first sets prearranged to detect the signals in positive polarity and the second sets to detect in negative polarity; resistance loads at the outputs of said first and second sets of detectors, respectively, each of last said resistances having plurality of voltage dividing taps; plurality of groups of capacitors, each capacitor having first and second terminals, the first terminals of the groups of capacitors connected to respective groups of taps of said first and second sets of resistance loads, and the second terminals of each of the groups of said capacitors connected in parallel, the magnitudes of said connections of the taps being so proportioned as to obtain minimum signals at said parallel connections by way of cancellation only when
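The decision stage described in the bullets above (gate 59 driven by amplifier 58 and by the pitch pulses of block 62, plus the word-advance block 57 that spaces the carriage after a pause longer than 1/10 second) can be sketched in software roughly as follows. This is an illustration under stated assumptions, not the patented circuit: the `threshold` that decides whether a group output counts as "minimum" is a hypothetical parameter, printing a label only once per run of identical wave patterns is our own assumption (the text does not say how repeated patterns are suppressed), and a pause is detected when the next marker arrives rather than by a running timer.

```python
WORD_GAP_S = 0.1  # per the text: voicing stops for at least 1/10 s between words


def gate(enable_level, pitch_pulse):
    """Two-input gate (like gate 59): on only if both inputs are positive."""
    return enable_level > 0 and pitch_pulse > 0


def typewriter_events(markers, results, threshold):
    """Turn per-pattern recognition results into key and space events.

    markers:   times in seconds of successive major-peak marker signals.
    results:   one (label, min_output) pair per marker from the recognizer.
    threshold: hypothetical level below which an output counts as "minimum".
    """
    events = []
    last_label = None
    prev_time = None
    for t, (label, min_out) in zip(markers, results):
        if prev_time is not None and t - prev_time > WORD_GAP_S:
            events.append(" ")  # word advance: move the carriage one space
            last_label = None
        # Amplifier 58 analogue: stays positive for a match, goes negative otherwise.
        enable = 1.0 if min_out < threshold else -1.0
        # The pitch pulse is always present here because we only act at markers.
        if gate(enable, pitch_pulse=1.0) and label != last_label:
            events.append(label)  # one-shot 63 would hold the key solenoid
            last_label = label
        prev_time = t
    return events
```

For example, three voiced patterns followed by a 0.18 s pause and two more patterns yield one key, one space and one key; the pattern whose best output is too large is ignored by the gate.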

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Description

[Drawings, 2 sheets: M. V. Kalfaian, Phonetic Typewriter of Speech, filed July 26, 1960, Dec. 4, 1962. Sheet 1: FIG. 1. Sheet 2: FIG. 2 (gate 59, one-shot 63, relay). Inventor.]
United States Patent Office 3,067,288 Patented Dec. 4, 1962
PHONETIC TYPEWRITER OF SPEECH
Meguer V. Kalfaian, 962 Hyperion Ave., Los Angeles 29, Calif.
Filed July 26, 1960, Ser. No. 45,327
7 Claims. (Cl. 179-1)

The present invention relates to the analysis of speech sound waves, and more particularly to methods and means for translating spoken phonetic sounds into discrete signals, during propagation of the sound, for the actuation of symbol printing keys, for example, the keys of a modified electric typewriter or the slotted code bars of teletypewriter devices, so that spoken words may be translated into visual typed words. Its main object is to provide methods and means for the translation of spoken phonetic sounds into typed phonetic symbols, responsive to all qualities and ranges of voices.
In order that a machine, or the like, may be devised to simulate the interpretive mechanism of human intelligence in printing spoken words, as spoken by all qualities and ranges of voices, without environmental control adjustments or preadjustments to any particular voice, it is necessary that all environmental variables first be standardized during propagation of the sound waves, so that standard sets of parameters may be derived to collectively define the different phonetic sounds of the spoken words. Of the many existing complexities, there are two separate conditions that must be considered in the analysis of spoken sound waves. The first condition is a function of producing pure phonetic sounds, and the second condition is a function of adding character, or quality, to the sound. For example, a pure phonetic sound consists of a set of resonances whose ratios in frequency positions and amplitude levels one with respect to another remain constant, no matter what band of the voice spectrum they are produced in. This statement is true only when the sound is produced intelligibly. The word intelligible defines a condition in which the brain understands the phonetic sound without resorting to imagination. Such imagination is common among average people, as it takes lifelong practice to control precisely the various movable elements of the vocal system in producing precise musical notes or characteristically different phonetic sounds. The characterization frequency components, however, are inconsistent in form, and they vary in a complex manner with the varying pitch (fundamental frequency) of the same speaker's voice.
Definition of Characterization Complexities
To define characterization complexities, let one speaker pronounce a certain phonetic sound (in natural voice) at first and second fundamental frequencies. The listener can easily recognize the characteristic quality of the sound as that of the same speaker, while at the same time recognizing the phonetic sound. But when the sound at the first fundamental is recorded and reproduced at the second fundamental (by speeding or retarding the time base of reproduction), the listener can easily detect the phonetic sound but cannot recognize the characteristic quality of the voice. It is thus seen that the enormous variation of ordinary speech sound waves, with regard to complexities, is mostly caused by the characterization components, which change in form as the speaker's pitch changes. If by some means these characterization frequency components could be removed from normal speech sound waves, then the wave patterns of each phonetic sound would have the same shape, regardless of the speaker's pitch; except, of course, that these wave patterns would have different time bases.
Information in a Wave Pattern and Isolation of Same
Each pure phonetic sound comprises a repetition of substantially replica wave patterns (except some plosive sounds, such as the sound k, the wave pattern of which is also distinguishable for selection), which are formed by the presence of a particular set of resonances. The succession of these wave patterns is effected by fairly regular puffs of air from the glottis, which are set into vibration in the momentarily formed resonant cavities of the vocal system. As each puff of air enters these cavities, an initial surge of pressure is formed, and each pattern is commenced by a high peaked wave, the series of occurrences of which, for convenience, we may call the major peaks of the propagated sound wave. Since the brain recognizes a phonetic sound regardless of the length of its duration, and since a phonetic sound comprises a repetition of replica wave patterns, it is then logical to say that a single wave pattern contains all the information necessary for phonetic sound analysis. Thus, a wave pattern may be isolated from the propagated speech sound waves, for analysis, by selecting the wave portion between two major peaks.
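As a rough software analogue of isolating one wave pattern between two major peaks, the sketch below marks local maxima that rise above a fraction of the strongest peak and slices the signal between successive marks. The function names, the 0.6 threshold ratio and the minimum spacing in samples are hypothetical choices for illustration; the patent performs this selection with an analog fundamental frequency selector (block 2), not with this algorithm.

```python
import math


def find_major_peaks(samples, threshold_ratio=0.6, min_gap=20):
    """Return indices of local maxima above threshold_ratio * the global maximum.

    threshold_ratio and min_gap (in samples) are illustrative assumptions.
    """
    if not samples:
        return []
    limit = threshold_ratio * max(samples)
    peaks = []
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i] >= samples[i - 1] and samples[i] > samples[i + 1]
        if is_local_max and samples[i] >= limit:
            if peaks and i - peaks[-1] < min_gap:
                # Two candidates too close together: keep only the taller one.
                if samples[i] > samples[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks


def isolate_patterns(samples, peaks):
    """Slice the wave between successive major peaks, one pattern per slice."""
    return [samples[a:b] for a, b in zip(peaks, peaks[1:])]


# A made-up "voiced" signal: a decaying resonance restarted every 80 samples.
signal = [math.exp(-(n % 80) / 15) * math.cos(2 * math.pi * (n % 80) / 10)
          for n in range(400)]
```

On this synthetic signal the detector finds a major peak at the start of each 80-sample period (except the very first sample, which has no left neighbour), and each isolated pattern is exactly one period long.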
Necessity for Standardizing the Time Bases of Wave Patterns
The problem, therefore, is to provide methods and means for stretching and compressing these isolated wave patterns in such a manner that all wave patterns of different phonetic sounds, as spoken by differently pitched voices, will have the same time base period. In this manner, the basic resonances of each phonetic sound, regardless of the speaker's pitch, would be located in a standard region of the voice spectrum, for analysis and recognition of the phonetic sound.
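Why stretching or compressing places the resonances in a standard region can be written as a short scaling relation. The notation is ours, not the patent's: an isolated pattern x(t) of natural period T = 1/f_0 contains a resonance at f_r, and it is reproduced as y(t) at the standard period T_s = 1/F_s.

```latex
\begin{aligned}
y(t) &= x(\alpha t), \qquad \alpha = \frac{T}{T_s} = \frac{F_s}{f_0},\\
\cos(2\pi f_r t) &\;\longmapsto\; \cos(2\pi \alpha f_r t)
  \quad\Longrightarrow\quad f_r' = \alpha f_r = \frac{f_r}{f_0}\,F_s,\\
y(t + T_s) &= x(\alpha t + T) = y(t)
  \quad\Longrightarrow\quad \text{the spectrum of } y \text{ lies only at } kF_s,\ k = 0, 1, 2, \dots
\end{aligned}
```

So a resonance sitting at ten times the speaker's fundamental lands at about the tenth harmonic of F_s whether that fundamental is 100 Hz or 200 Hz; only the ratio f_r/f_0 survives the normalization.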
Method for Standardizing the Time Bases of Wave Patterns
In one mode of practice, there may be used two recorders and two reproducers. In operation, assume that one incoming wave pattern is selected (during propagation of the sound wave) and recorded on the number one recorder, and the succeeding wave pattern is selected and recorded on the number two recorder. While the first recording is processed, its time length (from inception to termination of the wave pattern) is measured and stored in the form of a first signal quantity. Then, while the second recording is processed on the number two recorder, the first recorded wave pattern is reproduced by the first reproducer under control of the first signal quantity, so adjusted that the first recorded wave pattern is reproduced in a predetermined standard time base period. The same process is repeated with the second recorded wave pattern, so that the end result is a cyclic reproduction of the wave patterns of the propagated sound wave at a standard time base period. In order to allow time for reproduction of the recorded wave patterns prior to the arrival of successive wave patterns, the standard time base period may be adjusted to be several times shorter than the shortest time base period occurring in ordinary speech sound waves; the number of reproduced wave patterns will then be many more than the actual recorded wave patterns, which is desirable for more accurate resonance analysis.
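The two-recorder scheme can be mimicked with two buffers used in alternation: while one pattern is written into one buffer, the previous pattern is read from the other, stretched or compressed to a fixed number of samples, and emitted several times in a row because the standard period is shorter than any natural one. Linear interpolation here stands in for variable-speed reproduction, and `std_len` and `repeats` are hypothetical parameters; this is a sketch under those assumptions, not the memory-device hardware of blocks 5 to 10.

```python
def resample_to_length(pattern, target_len):
    """Linearly interpolate one period onto target_len evenly spaced points.

    The pattern is treated as one period of a repeating wave, so the last
    sample interpolates back toward the first (wrap-around).
    """
    n = len(pattern)
    if n == 0 or target_len <= 0:
        return []
    if n == 1:
        return [pattern[0]] * target_len
    out = []
    for j in range(target_len):
        pos = j * n / target_len          # map output index onto [0, n)
        i = int(pos)
        frac = pos - i
        nxt = pattern[(i + 1) % n]        # wrap to the start of the period
        out.append(pattern[i] * (1 - frac) + nxt * frac)
    return out


def standardize_time_base(patterns, std_len, repeats):
    """Alternate two buffers: record pattern k while reproducing pattern k-1."""
    buffers = [None, None]                # stand-ins for memory devices 9 and 10
    output = []
    for k, pattern in enumerate(patterns):
        buffers[k % 2] = pattern          # record into one buffer...
        previous = buffers[(k + 1) % 2]   # ...while reading the other
        if previous is not None:
            output.extend(resample_to_length(previous, std_len) * repeats)
    if patterns:                          # flush the last recorded pattern
        output.extend(resample_to_length(buffers[(len(patterns) - 1) % 2], std_len) * repeats)
    return output
```

Whatever the natural length of each pattern (80 or 40 samples, say), every reproduced copy has exactly `std_len` samples, which is the property the text relies on.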
Limited Bandwidth in Which Groups of Basic Resonances Are Found
In most of the vowel sounds, the groups of basic resonances representing these sounds are repeated twice, and sometimes three times, particularly in very low-pitched voices. This condition may be demonstrated by filtering out the upper or lower half of the entire frequency bandwidth in which the particular vowel sound has been spoken. The listener can easily recognize the spoken phonetic sound from either of the unfiltered portions of the sound wave, but cannot recognize the characteristic quality of the voice since, as stated previously, eliminating or shifting an appreciable number of frequency components of the voiced sound waves deteriorates the voice color. Also, as stated in the foregoing, the characterization frequency components are different from the repeated basic resonances and are intermixed with them. The first group of basic resonances, however, is more distinguishable than the secondary ones, and accordingly, selection of the first group is more desirable for translation into visible intelligible indicia. We may thus conclude that, with the adoption of spectrum normalization, the basic resonances will be spaced much more closely with respect to one another than would be expected in view of the large number of frequency components associated with each spoken phonetic sound. In actual tests, I have found that with spectrum normalization fewer than fifteen harmonics of the fixed reference fundamental are needed for recognition of all phonetic sounds.
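Once every pattern spans the same standard period, the amplitude of each harmonic of the fixed reference fundamental can be read off directly. The sketch below uses a direct discrete Fourier transform evaluated at the first fifteen harmonics as a stand-in for a bank of tuned circuits; that substitution, and the function name `harmonic_amplitudes`, are assumptions for illustration. In the patent's numbering, f1 is the second harmonic, so it would correspond to index 1 of the returned list.

```python
import cmath
import math
from typing import List, Sequence


def harmonic_amplitudes(period: Sequence[float], n_harmonics: int = 15) -> List[float]:
    """Peak amplitude of harmonics 1..n_harmonics of one standardized period.

    A direct DFT evaluated only at the harmonic bins stands in for a bank of
    tuned circuits; element h-1 holds harmonic h of the reference fundamental."""
    n = len(period)
    if n_harmonics >= n / 2:
        raise ValueError("period has too few samples for that many harmonics")
    amps: List[float] = []
    for h in range(1, n_harmonics + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * h * k / n) for k, x in enumerate(period))
        amps.append(2.0 * abs(acc) / n)  # scale so a cosine of amplitude a reads a
    return amps
```

For a standardized period containing 0.8 of the second harmonic and 0.3 of the fifth, the list holds about 0.8 at index 1, about 0.3 at index 4, and essentially zero elsewhere.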
System of Phonetic Sound Recognition
With the adoption of spectrum normalization, as described in the foregoing, it is only necessary to employ groups of resonant circuits, each group tuned to the particular group of basic resonances representing a phonetic sound. By detecting the outputs of these circuits and preadjusting their amplitude ratios, a relay system may be associated with them and made to act in such a manner that one particular relay is operated when all the basic resonances of a group are present simultaneously at their proper amplitude levels; that relay interprets the phonetic sound in terms of visual indicia, or of coded signals for transmission purposes.
In its broader aspects, the present invention contemplates first a system of frequency normalization, in which all the resonances of importance are shifted to locations where their frequency ratios with respect to a fixed reference fundamental frequency remain constant. These frequency-normalized waves are then applied to a plurality of tuned circuits, which are tuned to different harmonic frequencies of the reference fundamental frequency, and their outputs are detected to obtain unidirectional signals. These detected signals are grouped in preknown combinations, and the signals of each group are combined in such amplitudes and polarities as to produce a zero, or minimum, output signal when all the resonances of a particular phonetic sound are present with their proper amplitude ratios with respect to one another; this minimum output signal represents the particular phonetic sound. In this mode of operation, the output of only one group assumes a minimum signal, representing the incoming phonetic sound, while the remaining groups have substantially large signal outputs. During this signal-on time period, the output of the group having minimum signal amplitude is selected for translation into a symbol representing the originally spoken phonetic sound.
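The cancellation principle can be illustrated in a few lines: each group is a list of signed tap weights applied to the detected amplitudes, and the group whose weighted sum lies closest to zero names the sound. The group labels, weights, and the `threshold` tolerance are hypothetical; the patent implements the sum with resistor taps and coupling capacitors rather than arithmetic.

```python
from typing import Dict, List, Optional, Tuple

# One group = the taps feeding one common load resistor (e.g. R19):
# a list of (tuned-circuit index i for f_i, signed tap weight).
# Positive weights stand for positive-polarity detectors, negative for negative ones.
Group = List[Tuple[int, float]]


def group_output(amps: Dict[int, float], group: Group) -> float:
    """Signed sum of the tapped detector voltages for one group."""
    return sum(weight * amps.get(i, 0.0) for i, weight in group)


def recognize(amps: Dict[int, float], groups: Dict[str, Group],
              threshold: float = 0.05) -> Optional[str]:
    """Return the label whose group output is closest to zero, or None if no
    group comes within `threshold` (an assumed tolerance)."""
    residues = {label: abs(group_output(amps, g)) for label, g in groups.items()}
    best = min(residues, key=residues.get)
    return best if residues[best] <= threshold else None
```

In this sketch a completely silent input would cancel every group at once, so it presumes that recognition is attempted only while sound is present, as with the timing pulses the patent derives from the pitch selector.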
Having briefly described the broader embodiments of the invention, I will now give a more detailed specification in connection with the accompanying drawings of FIG. 1 and FIG. 2, which are mostly in block-diagram form with references to existing inventions.
Referring now to FIG. 1, the original speech sound waves are spectrum-normalized in amplifier block 11, and the output of this amplifier is applied to resonant circuits in blocks 12 through 20, which are tuned to harmonic frequencies f1 through f9 of the reference fundamental frequency of the spectrum-normalized speech sound waves in block 11; the frequency f1 is the second harmonic of said fundamental. The output oscillations of blocks 12 through 20 are rectified and detected in blocks 21 through 29 in positive polarity, and in blocks 30 through 38 in negative polarity; these positive- and negative-polarity detected signals are impressed across output load resistors R1 through R18, respectively. The resistance values of R1 to R18 are preferably chosen to be very low, and the resistors are preferably arranged in cathode follower circuits of vacuum tubes or in emitter follower circuits of transistors. These resistors are further tapped at various preknown values; for example, each resistor is shown in the drawing with four taps, but the number of taps varies from resistor to resistor. As stated in the foregoing, each phonetic sound consists of a set of resonances having definite frequency ratios and amplitude levels with respect to the fundamental, each set comprising three or four resonances. Thus, by combining the voltages across resistors R1 through R18 in preknown groups and at preknown polarities and tap levels, the output of one group may assume a zero, or minimum, voltage level when a phonetic sound satisfying these conditions is present, while the outputs of the other groups will carry large signals because they cannot cancel. For example, the drawing shows for one group a combination of four terminals from the taps across resistors R1, R10, R11 and R18, which are coupled to the common output load resistor R19 through coupling capacitors C1 to C4, respectively.
For the second group, the drawing shows a combination of four terminals from the taps across resistors R3, R6, R9 and R18, which are coupled to the common output load resistor R20 through coupling capacitors C5 to C8, respectively. Assume then that the incoming phonetic sound comprises the resonances at f1, f5, f6 and f9 in such amplitude ratios with respect to one another that capacitors C1 and C2 see voltages of equal amplitude and opposite polarity, and capacitors C3 and C4 likewise see voltages of equal amplitude and opposite polarity. The voltage across output load resistor R19 will then be zero, while the voltage across output load resistor R20 must be substantially high, owing to the lack of such cancellation. The voltage across R20 will indeed be substantially high because, in ordinary speech, a spoken phonetic sound is never pure and, as stated in the foregoing, characterization frequency components are always present. Accordingly, the output of only one group of these combinations will assume zero, or minimum, when the incoming phonetic sound satisfies said conditions, while the remaining groups, as combined from across resistors R1 through R18, will be unbalanced and produce large output signals. Thus, the zero, or minimum, output signal of one group may be distinguished from all of the other combinations at any one time as representing the originally spoken phonetic sound. As stated in the foregoing, a phonetic sound may have three resonances for recognition. With such an odd number, it is also simple to obtain a zero output signal by combining two signals across resistors R1 to R18 in one polarity and adjusting the amplitude of the third signal, in the opposite polarity, so as to cancel the output signal to a zero, or minimum, value. Thus, the system just described is flexible for any number of combinations of signals, as desired.
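The tap levels themselves must be preset for each phonetic sound. Assuming a measured reference set of detected amplitudes for the sound, one way to choose them is to split the group into balanced subgroups (one positive and one negative resonance each, as in the four-resonance example, or two positive against one negative, as in the three-resonance case) and tap down whichever side is larger. The balancing rule and the names `tap_settings` and `residue` are assumptions of this sketch, which also assumes each tuned circuit appears in only one subgroup.

```python
from typing import Dict, List, Sequence, Tuple

# A balanced subgroup: (circuits on positive detectors, circuits on negative detectors).
Subgroup = Tuple[Sequence[int], Sequence[int]]


def tap_settings(ref_amps: Dict[int, float], subgroups: List[Subgroup]) -> Dict[int, float]:
    """Choose signed tap fractions (magnitude <= 1, since a tap can only
    divide the voltage) so that every subgroup cancels for a reference sound.
    The larger side of each subgroup is tapped down to match the smaller side."""
    taps: Dict[int, float] = {}
    for positive, negative in subgroups:
        p = sum(ref_amps[i] for i in positive)
        n = sum(ref_amps[i] for i in negative)
        if p <= 0 or n <= 0:
            raise ValueError("each side needs a nonzero reference amplitude")
        for i in positive:
            taps[i] = min(1.0, n / p)
        for i in negative:
            taps[i] = -min(1.0, p / n)  # sign records the detector polarity
    return taps


def residue(amps: Dict[int, float], taps: Dict[int, float]) -> float:
    """Voltage left on the common load resistor for a given input."""
    return sum(t * amps.get(i, 0.0) for i, t in taps.items())
```

For a reference sound with f1 = 0.6, f5 = 0.3, f6 = 0.5 and f9 = 0.25, pairing f1 with f5 and f6 with f9 gives taps of 0.5 on the positive side and full taps on the negative side, and the residue for that sound is zero to within rounding.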
In order to distinguish the group of combined signals having zero output from the groups of combined signals having large outputs, these outputs, for example those across output load resistors R19 and R20, are first full-wave rectified and amplified in blocks 39, 40, etc. Full-wave rectification is used in these blocks because the output signal voltages across R19, R20, etc., may be of either positive or negative polarity, in random fashion, owing to said cancellations by oppositely poled signals. The outputs of the amplifiers in blocks 39 and 40 are connected to relays RY1 and RY2, respectively, and these outputs are normally adjusted to draw minimum currents, so that the relays are normally deenergized. The one relay receiving zero, or minimum, current will not operate, whereas the remaining relays, receiving amplified output currents, will operate simultaneously. Each of these relays, for example relay RY1, contains an armature 41, a normally closed circuit contact 42 (to said armature), and a normally open circuit contact 43 (to said armature). The armatures of all the relays, for example 41 and 44 through 47, etc., are connected in parallel, with no further connection to any other source. The open circuit contacts, for example 43 and 48 through 51, etc., are connected in parallel and further to one terminal of battery B1. The closed circuit contacts, for example contacts 42 and 52 through 55, are connected to solenoids RY3 through RY7, respectively. With the arrangement as shown, it is evident that solenoids RY3 through RY7, etc., remain normally deenergized, owing to the incomplete circuit to battery B1. Assume, however, that one of the relays, for example relay RY1, remains deenergized during an incoming signal, while the rest of the relays, for example relay RY2, etc., are energized simultaneously.
The armature 41 of relay RY1 will remain in its neutral position, and the armatures of the rest of the relays, for example relay RY2, etc., will be pulled to make electrical contact with their respective open circuit contacts, for example armatures 44 through 47 to open circuit contacts 48 through 51, respectively. The armatures 44 to 47 will thus establish electrical contact from battery B1 to solenoid RY3, via the parallel-connected contacts 48 through 51, the common armature connection, armature 41 and its closed contact 42, for operation of said solenoid RY3, which in turn pulls the predesignated key of the typewriter in block 56, printing a visual symbol representative of the incoming spoken phonetic sound. The inductive coils of relays RY1, RY2, etc., may be shunted by capacitors, for example C9, C10, etc., to delay the release of their armatures, for example armatures 41 and 44 through 47, etc., so as to allow time for the relatively slowly operating solenoids, for example RY3 through RY7, etc., when the output currents of the amplifiers, for example in blocks 39, 40, etc., are arranged to be in pulses. Of course, the relay arrangement as shown and the system of combining the various detected signals can be modified, and various substitutions of parts can be made without departing from the true spirit and scope of the invention. Other adaptations may also be made; for example, the various grouped signals may be combined in two dimensions by deflecting a cathode ray beam, as shown in my Patent No. 2,673,893, March 30, 1954. In such an arrangement, the angular deflection of the beam may represent the arriving phonetic sound.
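The relay contact network can be checked with a small Boolean model. The sketch assumes the current path implied by the description (battery B1, through the open contact and armature of any energized relay, along the common armature connection, then through the armature and closed contact of a de-energized relay to its solenoid); the function name is hypothetical.

```python
from typing import List, Sequence


def energized_solenoids(relay_energized: Sequence[bool]) -> List[int]:
    """Which typewriter solenoids receive current in the FIG. 1 contact network.

    relay_energized[k] is True when relay k's amplifier output is large
    (its group did not cancel). Assumed current path: battery B1 -> open
    contact of any energized relay -> its armature -> common armature bus ->
    armature of a de-energized relay -> its closed contact -> its solenoid."""
    bus_live = any(relay_energized)  # at least one pulled-in armature feeds the bus
    if not bus_live:
        return []  # no relay pulled in: the circuit to B1 is incomplete
    return [k for k, on in enumerate(relay_energized) if not on]
```

For instance, `energized_solenoids([False, True, True, True])` returns `[0]`, while `energized_solenoids([False, False, True])` returns `[0, 1]`: if two groups cancelled at the same time, two solenoids would be energized, and the sketch makes no attempt to resolve such a tie.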
When the outputs of amplifiers 39, 40, etc., are in narrow pulses, the mechanical relays RY1, RY2, etc., may not be suitable owing to their sluggish operation. For this reason, it may be necessary first to prolong these pulses, for example by the arrangement shown in the block diagram of FIG. 2. In this case, block 58 may represent, for example, block 39 of FIG. 1. The output pulses of amplifier 58 are applied to block 59, which is a gate having two input terminals 60 and 61. This gate may be a vacuum tube having first and second intensity control electrodes, such as the type 5915; a transistor having first and second base elements, such as the type 3N36; or two triode transistors connected in series so that their two respective base elements can be used as first and second control elements. The main purpose of these devices having two control elements is that they may be used as gates that are in the off-condition when either or both of the two control elements are biased in the backward direction, and in the on-condition only when both control elements are biased in the forward direction. Assume, for example, that gate 59 can be set into the on-condition only when input terminals 60 and 61 are simultaneously biased in positive polarity, and that input terminal 60 is normally biased in positive polarity while input terminal 61 is normally biased in negative polarity. Under these conditions, a positive pulse from block 62 applied to input terminal 61 will turn on gate 59, which in turn will produce an output pulse and trigger the one-shot multivibrator in block 63. Block 63 can then prolong the output pulse to any desired time period for the operation of a relay or of a typewriter solenoid.
At the time that block 62 produces a positive pulse, amplifier 58 (when its group is unbalanced) produces a negative pulse and applies it to input terminal 60 of gate block 59, so that the gate remains inactive even though a positive pulse arrives at its input terminal 61. However, when the output of amplifier 58 is zero, or minimum, as is the case when it indicates the presence of an incoming phonetic sound, gate 59 operates to represent the incoming phonetic sound. The pulses generated in block 62 may be derived from block 2 in FIG. 1, and so phased that they are produced coincident with the output pulses of amplifier block 58, etc.
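The pulse-prolonging arrangement of FIG. 2 amounts to an AND gate followed by a one-shot. In the discrete-time sketch below, `veto_level` (how small the amplifier output must be to count as "zero, or minimum") and `hold` (the one-shot period in samples) are assumed parameters, and the one-shot is taken to be non-retriggering.

```python
from typing import List, Sequence


def gated_one_shot(amp_output: Sequence[float], strobe: Sequence[bool],
                   veto_level: float = 0.1, hold: int = 5) -> List[bool]:
    """Discrete-time sketch of FIG. 2 for one group.

    amp_output[t]: output of amplifier 58 (large when the group did NOT cancel).
    strobe[t]: True while block 62 emits its positive pulse.
    Gate 59 conducts only when terminal 60 stays positive (|amp_output| at or
    below veto_level, an assumed threshold) AND terminal 61 is driven positive
    by the strobe. A conducting gate triggers one-shot 63, which holds its
    output high for `hold` samples and ignores new triggers until it times out."""
    out: List[bool] = []
    remaining = 0
    for level, pulse in zip(amp_output, strobe):
        gate_on = pulse and abs(level) <= veto_level
        if gate_on and remaining == 0:
            remaining = hold
        out.append(remaining > 0)
        if remaining > 0:
            remaining -= 1
    return out
```

For example, `gated_one_shot([0.0] * 6, [False, True, False, False, False, False], hold=3)` returns `[False, True, True, True, False, False]`; if the amplifier output is large at the strobe instant, the result stays all `False`.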
Detailed circuitry for spectrum normalization has been disclosed in my Patent No. 2,921,133, January 12, 1960; accordingly, further specification is not necessary herein, as reference may be made to said patent. However, the system of spectrum normalization may be briefly described by way of block diagrams, as follows.
Referring to FIG. 1, the speech sound waves originating from block 1 are first applied to the fundamental (pitch) frequency selector block 2 and to the stepwise automatic gain control block 3. The function of the fundamental frequency selector is to produce at its output pulse signals coincident with the termination of each arriving wave train (wave pattern of the speech sound waves). These pulse signals are applied to an alternate switch 4, which changes its state at each arriving pulse signal and thereby controls the operation of a two-section scanning system, comprising blocks scan-record 5, scan-read 6, and scan-read 7, scan-record 8, alternately in step with the arriving wave patterns in block 1. With each section of the scanning system is associated a memory device, block 9 or block 10, which is provided for recording and reproducing the original sound waves by the scanning action, for example by scan-record block 5 and scan-read block 7. The memory devices 9 and 10 are set into recording or reproducing (reading) action by the alternate output voltages of alternate switch 4, in such an arrangement that when memory device 9 is in the recording position, for example by scan-record block 5, memory device 10 is in the reading position for a previously recorded wave pattern, for example by scan-read block 6, and vice versa. Thus, the original wave patterns are successively recorded and read in alternate sequence by memory devices 9 and 10.
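The alternating record/read scheme resembles what would now be called double buffering. The sketch below assumes the wave patterns have already been isolated and uses a crude nearest-sample readout for the standardization; `n_std`, `repeats`, and the function names are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def read_standardized(pattern: Sequence[float], n_std: int) -> List[float]:
    """Nearest-sample readout of a stored pattern at the standard length n_std
    (a deliberately crude stand-in for the scan-read blocks)."""
    n = len(pattern)
    return [pattern[(j * n) // n_std] for j in range(n_std)]


def alternate_record_read(patterns: Sequence[Sequence[float]], n_std: int = 32,
                          repeats: int = 4) -> List[float]:
    """Two memories (9 and 10) toggled by alternate switch 4 at every marker pulse:
    pattern k is written into memory k % 2 while the other memory, which holds
    pattern k - 1, is read out `repeats` times at the standard length.
    The final pattern would be read while the next one arrives; this sketch stops there."""
    memories: List[Optional[List[float]]] = [None, None]
    output: List[float] = []
    for k, pattern in enumerate(patterns):
        write_slot = k % 2          # the switch flips state on each new pattern
        read_slot = 1 - write_slot
        stored = memories[read_slot]
        if stored is not None:
            output.extend(read_standardized(stored, n_std) * repeats)
        memories[write_slot] = list(pattern)
    return output
```

Each pattern is thus read back only after it has been completely recorded, while the other memory is free to take the next arriving pattern.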
Prior to recording, the original speech sound waves in block 1 are first applied to the stepwise gain control block 3, which equalizes the peak output amplitude of each successive input wave pattern individually before it is applied to memory devices 9 and 10 for recording. The combined output of memory tubes 9 and 10 is then applied to the input of amplifier 11 for amplification to a useful magnitude, and finally applied to the resonant circuits in blocks 12 through 20, etc. When the number of reproductions of each recorded wave pattern is to be fixed by counting, reference may be made to the disclosures in my patent applications Serial No. 857,121, filed December 3, 1959, and Serial No. 3,350, filed January 19, 1960.
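The per-pattern equalization performed by block 3 can be sketched as scaling each isolated pattern so that its peak magnitude equals a common target. The target level, the function name, and the treatment of an all-zero pattern are assumptions of this sketch.

```python
from typing import List, Sequence


def stepwise_gain_control(patterns: Sequence[Sequence[float]],
                          target_peak: float = 1.0) -> List[List[float]]:
    """Scale every wave pattern by its own gain so that its largest
    magnitude equals target_peak. An all-zero pattern is left at zero."""
    out: List[List[float]] = []
    for pattern in patterns:
        peak = max((abs(x) for x in pattern), default=0.0)
        gain = target_peak / peak if peak > 0 else 0.0  # gain chosen per pattern
        out.append([x * gain for x in pattern])
    return out
```

Because the gain is recomputed for every pattern, a loud pattern and a quiet one reach the memories with the same peak amplitude.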
In typing spoken words, it is usually desirable to provide spacing between words. This is accomplished by the word-advance block 57. As described in the foregoing, a phonetic sound consists of a train of substantially replica wave patterns. The repetition rate of these wave patterns is determined by the repeated puffs of air from the glottis; this rate is variable and ranges from 60 to 600 repetitions per second. In pronouncing a whole word, however, the minimum time in which the physical elements can go through a complete cycle of change in position is not less than 1/10 second. Consequently, the puffs of air must stop for at least 1/10 of a second before a succeeding word is pronounced. Since the fundamental frequency selector in block 2 produces marker signals at the arrival of each succeeding wave pattern, these marker signals are applied to block 57, which measures the time period between them. When the time period between these marker signals exceeds 1/10 second, the word-advance block 57 transmits a current pulse to the typewriter block 56, operating it to advance its carriage one letter space.
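The word-spacing rule reduces to comparing the interval between successive pitch markers with 1/10 second. A minimal sketch, assuming marker arrival times in seconds are available as a list (the function name is hypothetical):

```python
from typing import List, Sequence


def word_space_positions(marker_times: Sequence[float], min_gap: float = 0.1) -> List[int]:
    """Indices of marker pulses that arrive more than min_gap seconds after the
    previous one; block 57 would send a carriage-advance pulse at each.

    Voiced markers arrive every 1/600 to 1/60 s, far below 0.1 s, so only
    pauses between words exceed the threshold. This offline sketch notices a
    pause when the next marker arrives; a real-time timer would fire as soon
    as min_gap had elapsed."""
    return [i for i in range(1, len(marker_times))
            if marker_times[i] - marker_times[i - 1] > min_gap]
```

With markers at 0.00, 0.01, 0.02, 0.30, 0.31, 0.32 and 0.60 s, the function returns `[3, 6]`, one space for each pause.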
Having described the various features and objects of the present invention, what I claim is:
1. In spoken sound waves where each phonetic sound is identified by a group of resonances having definite frequency ratios and approximate amplitude levels one with respect to another, the system for identifying these phonetic sound waves which comprises: means for producing said phonetic sound waves; means for selecting said groups of resonances from the produced sound waves; means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances, respectively; means for combining the unlike poled signals in each of said groups; means for preadjusting the magnitudes of said signals so that the resultant output signals of said combined signals in each of said groups will have minimum value by virtue of cancellation only when the amplitude ratios of said group of resonances in the produced sound waves are in accord with said amplitude preadjustments; and means for deriving discrete signals from the states of last said signals of minimum amplitudes as identifications of preknown phonetic sounds.
2. In spoken sound waves where each phonetic sound is identified by a group of resonances having definite frequency ratios and approximate amplitude levels one with respect to another, and wherein the frequency locations of these groups are shifted in the entire voice spectrum, the system for identifying these phonetic sounds which comprises: means for producing said phonetic sound waves; frequency transposer means and means therefor for re-shifting the frequency positions of the grouped resonances of the produced sound waves to standard frequency locations; means for selecting said groups of resonances from the re-shifted sound waves; means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances, respectively; means for combining the unlike poled signals in each of said groups; means for preadjusting the magnitudes of said derived signals so that the resultant output signals of said combined signals in each of said groups will have minimum value by virtue of cancellation only when the amplitude ratios of said group of resonances in the produced sound waves are in accord with said amplitude preadjustments; and means for deriving discrete signals from the states of last said signals of minimum amplitudes as identifications of preknown phonetic sounds.
3. Apparatus in a system as defined in claim 1, wherein said means for deriving discrete signals comprises: means for converting the states of said signals of minimum amplitudes to auxiliary first signals; means for deriving second auxiliary signals from said produced phonetic sound waves; a plurality of normally inoperative gates, one gate for each phonetic sound to be identified, respectively, each gate preadjusted to be operated only by simultaneous applications of said first and second auxiliary signals; and means for applying the first and second auxiliary signals simultaneously upon respective said gates for operation, whereby the operating states of said gates may be represented as identifications of preknown phonetic sounds.
4. Apparatus in a system as defined in claim 2, wherein said means for deriving discrete signals comprises: means for converting the states of said signals of minimum amplitudes to auxiliary first signals; means for deriving second auxiliary signals from said produced phonetic sound waves; a plurality of normally inoperative gates, one gate for each phonetic sound to be identified, respectively, each gate preadjusted to be operated only by simultaneous applications of said first and second auxiliary signals; and means for applying the first and second auxiliary signals simultaneously upon respective said gates for operation, whereby the operating states of said gates may be represented as identifications of preknown phonetic sounds.
5. Apparatus in a system as defined in claim 1, wherein said means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances; said means for combining the unlike poled signals in each of said groups; and said means for preadjusting the magnitudes of said derived signals comprise: first and second sets of plurality of groups of detectors for detecting said selected groups of resonances, respectively, the first sets prearranged to detect the signals in positive polarity and the second sets to detect in negative polarity; resistance loads at the outputs of said first and second sets of detectors, respectively, each of last said resistances having plurality of voltage dividing taps; plurality of groups of capacitors, each capacitor having first and second terminals, the first terminals of the groups of capacitors connected to respective groups of taps of said first and second sets of resistance loads, and the second terminals of each of the groups of said capacitors connected in parallel, the magnitudes of said connections of the taps being so proportioned as to obtain said minimum signals at said parallel connections.
6. Apparatus in a system as defined in claim 2, wherein said means for deriving individually identified unidirectional but of unlike poled signals from said selected groups of resonances; said means for combining the unlike poled signals in each of said groups; and said means for preadjusting the magnitudes of said derived signals comprise: first and second sets of plurality of groups of detectors for detecting said selected groups of resonances, respectively, the first sets prearranged to detect the signals in positive polarity and the second sets to detect in negative polarity; resistance loads at the outputs of said first and second sets of detectors, respectively, each of last said resistances having plurality of voltage dividing taps; and plurality of groups of capacitors, each capacitor having first and second terminals, the first terminals of the groups of capacitors connected to respective groups of taps of said first and second sets of resistance loads, and the second terminals of each of the groups of said capacitors connected in parallel, the magnitudes of said connections of the taps being so proportioned as to obtain minimum signals at said parallel connections.
7. In spoken sound waves where each phonetic sound is identified by a group of resonances having definite frequency ratios and approximate amplitude levels one with respect to another, and wherein the frequency locations of these groups are shifted in the entire voice spectrum, the system for identifying these phonetic sounds which comprises: means for producing said phonetic sound waves; frequency transposer means and means therefor for re-shifting the frequency positions of the grouped resonances of the produced sound waves to standard frequency locations; means for selecting said group of resonances from the re-shifted sound waves; first and second sets of plurality of groups of detectors for detecting said selected groups of resonances, respectively, the first sets prearranged to detect the signals in positive polarity and the second sets to detect in negative polarity; resistance loads at the outputs of said first and second sets of detectors, respectively, each of last said resistances having plurality of voltage dividing taps; plurality of groups of capacitors, each capacitor having first and second terminals, the first terminals of the groups of capacitors connected to respective groups of taps of said first and second sets of resistance loads, and the second terminals of each of the groups of said capacitors connected in parallel, the magnitudes of said connections of the taps being so proportioned as to obtain minimum signals at said parallel connections by way of cancellation only when the amplitude ratios of said group of resonances in the re-shifted sound waves are in accord with said magnitude preadjustments; means for converting the states of said signals of minimum amplitudes to auxiliary first signals; means for deriving second auxiliary signals from said produced phonetic sound waves; a plurality of normally inoperative gates, one gate for each phonetic sound to be identified, respectively, each gate preadjusted to be operated only by simultaneous applications of said first

References Cited in the file of this patent
UNITED STATES PATENTS
2,646,465 Davis et al. July 21, 1953
2,824,906 Miller Feb. 25, 1958
2,921,133 Kalfaian Jan. 12, 1960
2,938,079 Flanagan May 24, 1960
2,971,058 Olson Feb. 7, 1961
OTHER REFERENCES
"The Analysis and Automatic Recognition of Speech Sound," C. P. Smith, Electronic Engineering, August 1952, pp. 368-373.

Priority Applications (1)

Application Number (Publication) Priority Date Filing Date Title
US45327A (US3067288A) 1960-07-26 1960-07-26 Phonetic typewriter of speech

Publications (1)

Publication Number Publication Date
US3067288A 1962-12-04
