US3846586A

US3846586A - Single oral input real time analyzer with written print-out

Info

Publication number: US3846586A
Application number: US00346173A
Authority: US
Inventors: D Griggs
Original assignee: Individual
Current assignee: Griggs Talkwriter Corp
Priority date: 1973-03-29
Filing date: 1973-03-29
Publication date: 1974-11-05
Anticipated expiration: 1991-11-05

Abstract

An improved device having a first step of automatically instantaneous conversion of speech into writing by separating the speech into various types of its components, such as fricatives, vowels, plosives, nasals, etc., by the use of only a single oral input, distinguished from the original talkwriter development disclosed in Ser. No. 1,739, now U.S. Pat. No. 3,646,576, wherein two inputs were used, one from the throat and one oral. Having separated out these appropriate components of the speech, switches, gates and other circuit mechanisms are used to actuate other circuitry and a typewriter which records the input sounds. The device accomplishes the over-all goals and objectives achieved in the forementioned U.S. Pat. No. 3,646,576, except with a circuit replacement for FIG. 2 thereof and a modification of FIG. 9 thereof disclosed and provided herein. Thus the invention allows a single oral input to be analyzed in real time with its verbal message being printed out.

Description

United States Patent [19] Griggs, Nov. 5, 1974

[54] SINGLE ORAL INPUT REAL TIME ANALYZER WITH WRITTEN PRINT-OUT
[76] Inventor: David Thurston Griggs, 5128 So. Rolling Rd., Baltimore, Md. 21227
[22] Filed: Mar. 29, 1973
[21] Appl. No.: 346,173
[52] U.S. Cl. 179/1 SA
[51] Int. Cl. G10L 1/16
[58] Field of Search 179/1 SA, 1 SB, 1 VS, 15.55 R

[56] References Cited

UNITED STATES PATENTS
3,158,685 11/1964 Gerstman 179/1 SA
3,198,884 3/1965 Dersch 179/1 SA
3,225,141 12/1965 Dersch 179/1 SA
3,395,249 7/1968 Clapper 179/1 SA
3,423,530 1/1969 Coulter 179/1 SA
3,470,321 9/1969 Dersch 179/1 SA

OTHER PUBLICATIONS
IBM Technical Disclosure Bulletin, Harper, Vowel Separation by Time Ratio Measurements, 3/1963.
IBM Technical Disclosure Bulletin, Ferrier, Plosive Measurement, 3/1963.
IBM Technical Disclosure Bulletin, Harper, Friction Voicing Separator, 2/1962.
IBM Technical Disclosure Bulletin, Dersch, Voiced Sound Detector, 8/1962.

Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Jon Bradford Leaheey
Attorney, Agent, or Firm: Misegades, Douglas & Levy

5 Claims, 2 Drawing Figures

[FIG. 1 labels recoverable from the drawing: fricative, nasal and vowel separator 10; to unvoiced fricative transducers; to diphthong indicator; to vowel detector; silence; threshold adj.; stress]

SINGLE ORAL INPUT REAL TIME ANALYZER WITH WRITTEN PRINT-OUT

BRIEF SUMMARY OF THE INVENTION

The present invention relates to apparatus and method for analysis of a single-track voice signal which is detected and processed for its component speech sounds, and more particularly the invention is directed to elimination of dual microphone inputs, i.e., a microphone for voice and a microphone for throat signals. The single-track voice input method of the present invention, for some purposes, substitutes measurement of energy fluctuations and levels in the 300 to 800 Hz band for tactile sensing of phonation, but does not require such substitution of energy fluctuations as a means for differentiation between different types of speech sounds, which is the case where the dual-track throat-mike method described in the parent application is used.

BACKGROUND OF THE INVENTION

In order to aid the examiner in understanding the background of the present invention, as distinguished from the invention of the parent application, there is submitted herewith Attachment A, titled "Delineation of Types of Speech Sounds from Single-Track Input," which is a narrative write-up exemplary of the invention and its background leading to a further understanding of the present invention.

FIELD OF THE INVENTION

An object of the invention is to provide a real-time detection and analysis of speech sounds by circuit analyzing and switching of sounds according to their manner of production, which in this application is taken as six in number, for separate subsequent analysis for identifying specific phonemes of each given category of speech sounds. The six categories are vowels (and semi-vowels), nasals, unvoiced fricatives, voiced fricatives, unvoiced stops and voiced stops. The switching and analysis contemplated in the present invention are accomplished from a single-track oral input, such as from a single microphone, recorder source or amplifier device, as are well known in the art.

A further object and accomplishment of the invention is to provide means for producing distinctions between voiced stops and unvoiced and unreleased stops (plosives) which are detected independent of voicing, when necessary, by means of rate of change of signal strength and durational timing circuit means.

BRIEF DESCRIPTION OF THE SEVERAL FIGURES OF THE DRAWINGS

The above and other objects and advantages of the invention will become apparent upon full consideration of the following detailed description and accompanying drawings in which:

FIG. 1 is a circuit schematic diagram for a sound separator apparatus or unit for a transducer module for separation into types of characterizations or sounds from a given single sonic source; and

FIG. 2 is a block diagram of the transcriber module as modified from FIG. 9 of application Ser. No. 1,739, now U.S. Pat. No. 3,646,576, according to the preferred and best mode of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring now to the drawings, there is shown a sound separator apparatus in FIG. 1, and it is seen to use identical means for processing the characterized six sounds of stops (plosives), both voiced and unvoiced, and undifferentiated. It also uses the same method and apparatus for distinguishing silence from the speech elements. The figure shows a substituted component identified as fricative, nasal and vowel separator 10 for processing nasals, vowels (and semi-vowels) or fricatives, thus working differently in this respect.

Since the present invention is restricted to a single-track oral input and cannot use a comparator of inputs of undifferentiated nature, it differentiates the sustained speech elements principally by measuring and comparing rates of change, or by identifying and comparing intensities in certain bandwidths, as described below.

A 300 Hz high-pass filter 13 is inserted into the input oral signal line 12 so as to remove any residual low-level line hum, and the output of the high-pass filter is connected to a sensor element 32 and to a vogad 36 by a conductor 37. The sensor element 32 amplifies the input signal applied thereto for subsequent analysis as to its kinds of speech sounds, so that they may be shunted through or conveyed to different subsequent analytical circuits according to the kinds of speech determined and their constituent components, for detection of individual speech sounds in several determined categories.

From the sensor device 32, an output is coupled to a linear amplifier or vogad 36, from whence it is fed to a set of switches or gates 40, 42, 44, 46, 48, 50, so that each passes the oral input when appropriate for its category or kind of speech sound being analyzed, as described in application Ser. No. 1,739, now U.S. Pat. No. 3,646,576. These categories are six in number, and relate in the following manner to the switches 40-50:

Switch 40: unvoiced stops
Switch 42: voiced stops
Switch 44: unvoiced fricatives
Switch 46: voiced fricatives
Switch 48: nasals
Switch 50: vowels (and semi-vowels)

By these divisions or separations of a conventional electric analog of an oral input, there are derived signals from switches 40-50 that provide means of preswitching nasals, vowels and voiced fricatives.
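By way of illustration only, the correspondence between the six categories and switches 40-50 may be sketched in software as follows. The patent describes analog gates, not a program; the names SpeechCategory and route are hypothetical, and the mutual exclusivity of the gates is taken from claim 1.

```python
from enum import Enum


class SpeechCategory(Enum):
    """Six categories keyed by the reference numeral of their switch in FIG. 1."""
    UNVOICED_STOP = 40
    VOICED_STOP = 42
    UNVOICED_FRICATIVE = 44
    VOICED_FRICATIVE = 46
    NASAL = 48
    VOWEL = 50  # includes semi-vowels


def route(signal, category):
    """Pass the signal through exactly one gate; every other gate yields None.

    Hypothetical helper: models the gates as mutually exclusive, as in claim 1.
    """
    return {c.value: (signal if c is category else None) for c in SpeechCategory}
```

For example, route(frame, SpeechCategory.NASAL) would present the frame at gate 48 only, leaving the other five gates closed.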

Signal means of opening each of the gates or switches 40-50 are also shown in FIG. 1. The oral input line is supplied also by a connector 87 to a bandpass filter 15 that passes 300 to 800 Hz. An output here indicates the presence of low-frequency sounds of fundamental frequencies of voice activity. The presence or absence of voicing is shown in the existence or null of this output through the filter 15. The OFF and ON indications are then supplied by connectors 72, 70, both to the stops-or-silence detector 74 and to the unvoiced fricative gate 44 or to the voiced fricative gate 46, respectively, by means of connectors 78 or 76. The actual filtered output 19 is also supplied to three additional components 43, 45, 47, as part of the identification of nasal sounds, described below. A rate generator 17 also is connected to the output from filter 15 to measure the rate of change of amplitude of signal in this bandwidth. This differential signal is supplied by connector 80 to the stops-or-silence detector 74.
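The voicing path through filter 15 and rate generator 17 may be approximated digitally as in the following minimal sketch. It assumes the signal has already been divided into short frames, substitutes a naive discrete Fourier transform for the analog bandpass filter, and uses a hypothetical threshold parameter to decide voicing ON or OFF; none of these details appear in the patent.

```python
import cmath
import math


def band_energy(frame, sample_rate, lo_hz, hi_hz):
    """Energy of one frame inside [lo_hz, hi_hz], via a naive DFT (stand-in for a bandpass filter)."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if lo_hz <= freq <= hi_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(frame))
            total += abs(coeff) ** 2
    return total / n


def voicing_and_rate(frames, sample_rate, threshold):
    """Return (voicing ON/OFF per frame, frame-to-frame change of 300-800 Hz energy).

    `threshold` is an assumed tuning parameter, not a value given in the patent.
    """
    energies = [band_energy(f, sample_rate, 300.0, 800.0) for f in frames]
    voiced = [e > threshold for e in energies]
    # First frame has no predecessor, so its rate of change is reported as 0.0.
    rate = [0.0] + [b - a for a, b in zip(energies, energies[1:])]
    return voiced, rate
```

In this sketch the boolean list plays the role of the ON and OFF indications on connectors 70 and 72, and the difference list plays the role of the differential signal on connector 80.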

The oral input signal also is supplied through connector 87 to a network consisting of three bandpass filters 21, 23, 25 and a comparator 29 which serve to delineate fricative sounds. The connector is supplied to the three different bandpass filters simultaneously: filter 21 passes 600 to 1,700 Hz; filter 23 passes 1,700 to 2,100 Hz; and filter 25 passes 6,500 to 10,000 Hz. The outputs through filters 23 and 25 are added in connector 27 and then supplied to comparator 29. In comparator 29, these combined amplitudes are divided by the output of filter 21, and if the result is unity or greater, the comparator passes a signal through switch 31 by connector 52a to fricative gates 44, 46 showing the presence of fricative sounds. Voicing OFF or ON from filter 15 determines the opening of gate 44 or gate 46, respectively.
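The comparator stage just described reduces to a single ratio test, which may be sketched as follows. The function names are hypothetical, the band energies are assumed to have been measured already, and the handling of a zero output from filter 21 is an assumption, since the patent does not address it. The sketch covers only comparator 29 and the voicing selection, not the further conditions imposed through switch 31.

```python
def fricative_present(e_600_1700, e_1700_2100, e_6500_10000):
    """Comparator 29: (filter 23 + filter 25) / filter 21 >= 1 indicates a fricative."""
    combined = e_1700_2100 + e_6500_10000
    if e_600_1700 <= 0.0:
        # Assumption: with no output from filter 21, any high-band energy counts.
        return combined > 0.0
    return combined / e_600_1700 >= 1.0


def fricative_gate(fricative, voicing_on):
    """Select gate 46 (voiced) or gate 44 (unvoiced); None when no fricative is indicated."""
    if not fricative:
        return None
    return 46 if voicing_on else 44
```

For instance, band outputs of 1.0, 0.3 and 0.9 give a ratio of 1.2, so a fricative is indicated, and the voicing signal from filter 15 then chooses between gates 44 and 46.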

Connector 87 also supplies the oral input signal to a network that distinguishes nasals and vowels from fricatives. Nasal distinction is a two-stage process to be described below, and vowels are identified in the absence of the second-stage differentiation of nasals. The vowel-nasal network consists of two bandpass filters 33, 35 and a rate generator 37. The input is supplied from connector 87 simultaneously to two narrow bandpass filters: filter 33 passes 380 to 420 Hz, which is desirable for low voices, and filter 35 passes 680 to 720 Hz, suited to high voices. Since neither must be exclusive of the other, their outputs are added, to show the presence of activity in either bandwidth, and supplied to a rate generator 37. The rate generator 37 takes readings at intervals of 0.01 second or less, sensing successively whether in each succeeding period there is a measurable disruption of the steady energy level that is characteristic of a vowel or nasal. This rate generator has two outputs 39, 41. One output 39 is connected to switch 31 to allow opening of the fricative gates 44 and 46 by means of connector 52a only when there is a rate of change output from the generator 37 in the absence of a vowel or nasal. The other output from it (41) passes a signal to permit opening of either the nasal or the vowel gate, 48 or 50, when there is an input but no change of rate with presence of a vowel or nasal, during two or more successive measured periods of 0.01 second or less.
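The behavior of rate generator 37 may be sketched as a scan over successive readings, each taken over one period of 0.01 second or less. This is a simplified illustration under stated assumptions: `tolerance` (what counts as a measurable disruption) and `floor` (what counts as an input being present) are hypothetical parameters, and the sketch reports a change flag and a steadiness flag without modeling the full conditions on output 39.

```python
def rate_generator(levels, tolerance, floor):
    """Scan summed 380-420 Hz and 680-720 Hz levels, one reading per period.

    Returns one (changed, steady) pair per reading:
      changed -> a measurable disruption since the previous reading (cf. output 39)
      steady  -> input present with no change for two or more successive
                 periods (cf. output 41)
    `tolerance` and `floor` are assumed tuning values, not given in the patent.
    """
    results = []
    steady_run = 0
    prev = None
    for level in levels:
        changed = prev is not None and abs(level - prev) > tolerance
        if prev is not None and level > floor and not changed:
            steady_run += 1
        else:
            steady_run = 0
        results.append((changed, steady_run >= 2))
        prev = level
    return results
```

A steady flag in this sketch corresponds to permission for the nasal or vowel gate to open; which of the two opens is decided by the separate nasal network.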

Connector 87 also is supplied to still another network to give positive identification for nasal sounds. It consists of two bandpass filters 43, 47, a comparator 45, and a switch 49. Bandpass filter 43 passes 800 to 1,300 Hz, and the resulting energy is compared in a comparator 45 with that derived simultaneously from filter 15, mentioned earlier, which passes the 300 to 800 Hz band. When the output of filter 15 is two or more times stronger than the output of filter 43, a signal is passed into switch 49 for possible positive indication of a nasal sound (including /l/). The final use of input 87 is to supply a high-pass filter 47 for over 1,700 Hz. Absence of any output from it will activate switch 49 so that indication of a nasal sound passes through connector 54a to the nasal gate 48. While the nasal gate is thus activated, an activity signal is passed by connector 51 to the vowel gate, switch 50, so that it will be inactivated only when the nasal gate is open. Specific vowels will be identified, of course, only when their individually required parameters are met in subsequent phoneme analysis, which is beyond the scope of this invention but is described in the parent case.
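The two-stage nasal test may be sketched as follows, again on band energies assumed to have been measured already. The function names, the `silence_floor` parameter, and the requirement that the 300 to 800 Hz band be non-zero are assumptions made for illustration; the patent states only the two-to-one ratio and the absence of output above 1,700 Hz.

```python
def nasal_indicated(e_300_800, e_800_1300, e_above_1700, silence_floor=0.0):
    """Two-stage nasal test on pre-measured band energies.

    Stage 1 (comparator 45): filter 15 output at least twice filter 43 output.
    Stage 2 (filter 47, switch 49): no output above 1,700 Hz.
    `silence_floor` and the non-zero check on e_300_800 are assumptions.
    """
    low_dominant = e_300_800 > 0.0 and e_300_800 >= 2.0 * e_800_1300
    no_high = e_above_1700 <= silence_floor
    return low_dominant and no_high


def vowel_or_nasal_gate(steady, nasal):
    """Open gate 48 for a nasal, else gate 50 for a vowel; None if the level is not steady."""
    if not steady:
        return None
    return 48 if nasal else 50
```

Because the vowel gate is inactivated only while the nasal gate is open, a steady sound that fails either nasal stage falls through to gate 50 in this sketch.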

The stops switches 40, 42 and the fricative switches 44, 46 are activated not alone by the processes described above, but in conjunction with and in response to additional processes. The stops-or-silence indicator 74 is a component almost identical to that of the Talkwriter apparatus, modified here in only one respect: the source of inputs 70, 72, 80, instead of deriving from a throat sensor, derives from the network described above associated with the 300 to 800 Hz bandpass filter 15. Other inputs are a signal conductor 82 from the oral differentiator 62, signal conductor 84 for oral OFF, the signal conductor 86 for oral ON, and an oral input from vogad 36 on conductor 88.

A threshold adjustment means 90 is connected from the output of the sensor 32 to a stress signal terminal 92.

There are seven general output signals from the stops-or-silence indicator 74: three of which are applied to the unvoiced stops switch 40, i.e., conductors 94 (two of them) and 169; two of which are applied as output signals to the voiced stops switch 42 over conductors 96, 96; an output to a silence terminal 98; and an output from a switch that shows probable presence of a stop, which output passes by conductor 128 to close the unvoiced fricatives gate 44.

An output 169 used as an indication of an undifferentiated unvoiced stop is used also to pass through the unvoiced stop switch 40 and then into the detection circuit of other components of a Talkwriter apparatus. The outputs through gates 40-50, namely 88, 88, 42a, 44a, 46a, 48a and 50a, serve other components of a Talkwriter apparatus also. The output of the sensor 32 of the oral input signal strength is supplied through connectors 58 for use in other components of the Talkwriter apparatus where an indication of oral input signal strength is required. Finally, the output of the rate generator 62 of the oral input signal is supplied through connector 82 for similar use in other components of the Talkwriter apparatus where that information is needed.

Additional embodiments of the invention in this specification will occur to others and therefore it is intended that the true spirit of the invention be limited only by the appended claims and not by the embodiment described hereinabove. Accordingly, reference should be made to the following claims in determining the true spirit of the invention.

What is claimed is:

1. Sound separator device for speech-to-writer apparatus, using single-channel input through an amplifier circuit for connected speech, and separating in real time the speech elements into each of the types of speech sounds of, for example, English: (1) vowels and semi-vowels, (2) nasals, (3) unvoiced fricatives, (4) voiced fricatives, (5) unvoiced stops, (6) voiced stops; said separator device comprising (FIG. 1):

stops-or-silence detector means for distinguishing voiced and unvoiced stops from each other and from the interstitial silences of running speech (74);

detection means for nasal, fricative and vowel sounds as they occur in random speech sequence (10);

network means of six mutually exclusive logic gates responsive to said foregoing means for deriving the said types of speech sounds (40,42,44,46,48,50);

said stops-or-silence detector means (74) receiving inputs of the oral signal (12) from: (1) a linear amplifier (88); (2) oral ON and oral OFF signals from an oral sensor gate (32); (3) oral differentiator outputs from a differentiator means (82); and (4) signals representing voicing as detected in the range of about 300 to 800 Hz (80); and means providing outputs (1) to logic gates for unvoiced (40) or voiced (42) stops, (2) to a gate for unvoiced fricatives (44), and (3) to a silence signal (98);

said detection means (10) consisting of four networks including filters, timers, rate generators, comparators and switches that differentiate instantaneously voiced fricatives, unvoiced fricatives, vowels and semi-vowels, and nasals from each other, at least two network-activated switching means (31,49) activated in response to outputs of said networks for enabling individual identification and processing for phoneme outputs, at least one of said filters (15) producing an output representing detected voicing in the range of about 300 to 800 Hz.;

said network means of logic gates (40,42,44,46,48,50) operated by said stops-or-silence detector means (74) or by the nasal, fricative and vowel detection means (10) in random real-time sequence according to the sounds of the vocal input.

2. Sound separator device for speech-to-writer apparatus comprising:

a microphone for accepting oral signals; and

a detection and analysis transducer module component receiving said oral signals, said component having sound separation means for first-stage detecting, differentiating, processing and sorting speech sound inputs derived from said microphone, in at least the following stated categories: (1) vowels and semi-vowels, (2) nasals, (3) unvoiced fricatives, (4) voiced fricatives, (5) unvoiced stops, and (6) voiced stops, said sound separation means (FIG. 1) including:

an oral sensor gate (32) receiving an input of said oral signal and producing digital outputs indicative of oral ON and oral OFF gaps;

differentiator means (62) responsive to said oral sensors;

a linear amplifier (36) receiving for processing said oral signal;

four networks comprising a fricative nasal and vowel separator including filters, timers, rate generators, comparators and switches that differentiate instantaneously voiced fricatives, unvoiced fricatives, vowels and semi-vowels, nasals from each other to be identified for each of said categories, at least two network-activated switch means (31),(49) activated in response to outputs of said networks for enabling individual identification and processing for phoneme outputs, at least one of said filters (15) producing an output representing detected voicing in the range of about 300 to 800 Hz.;

a stops-or-silence unit (74) receiving inputs of (1) said oral signal from said linear amplifier, (2) oral ON and oral OFF signals from said oral sensor gate, (3) oral differentiator outputs from said differentiator means, and (4) signals representing voicing as detected in the range of about 300 to 800 Hz. from said detected voicing filter;

logic gate means (40-44) for each of said categories of speech sound signals, each at least receiving outputs from said linear amplifier and said stops-or-silence unit, including:

an unvoiced stop gate (40);

a voiced stop gate (42);

an unvoiced fricative gate (44) receiving signals also from said oral sensor gate, said detected voicing filter and one of said network-activated switch means;

further logic gates means (46-50) for each of said categories of speech sound signals, each at least receiving outputs from said linear amplifier, including:

a voiced fricative gate (46) receiving signals also from said oral sensor gate, and said one of said network-activated switch means,

a vowel gate (50) receiving signals also from an output of one of said networks and said network-activated switch means, and

a nasal gate (48) receiving signals also from an output of one of said networks, said vowel gate, and one said network-activated switch means, said vowel gate receiving an output from said nasal gate.

3. The invention according to claim 2, wherein said sound separation means (FIG. 1) has sensor means, band-pass filters, rate generators, and comparators responsive to the oral input signals from said microphone for passing through a single channel, and sorting speech sounds according to said six categories.

4. The invention according to claim 2, wherein four networks comprising said fricative, nasal and vowel separators include:

a series of low frequency band-pass, intermediate frequency band-pass, and higher frequency band-pass filters fed from said linear amplifier;

a comparator (29) having inputs from each of said series of filters, a further series of different low filters fed from said linear amplifiers, a rate generator responsive to the outputs of said further series of band-pass filters;

said one of said network-activated switch means (31) responsive to the outputs of said comparator and said rate generator, a last series of band-pass filters including a high band-pass filter and a lower band-pass filter, each fed from said linear amplifier, said detected voicing filter (15) being also fed from said linear amplifier;

a comparator responsive to the output of said detected voicing filter and said last-mentioned bandpass filter;

said other one of said network-activated switch means (49) responsive to said last-mentioned comparator and said last-mentioned high band-pass filter;

said detected voicing filter producing an ON and OFF signal, each applied to said stops-or-silence unit; and

a rate generator responsive to said detected voicing filter for producing a signal to said stops-or-silence unit.

5. The invention according to claim 4, wherein a phoneme sequence sensor and designator is provided with a series of time chopper units, each for receiving inputs and for thence providing outputs to said phoneme sequence sensor and designator, said time choppers receiving input signals identified, respectively, as fricatives, stops, vowels and nasals;

sensor means responsive to signals identified as stops;

switch means responsive to said sensor means, said switch means providing outputs to said first and third and fourth-mentioned time choppers.

Claims

1. Sound separator device for speech-to-writer apparatus, using single-channel input through an amplifier circuit for connected speech, and separating in real time the speech elements into each of the types of speech sounds of, for example, English: (1) vowels and semi-vowels, (2) nasals, (3) unvoiced fricatives, (4) voiced fricatives, (5) unvoiced stops, (6) voiced stops; said separator device comprising (FIG. 1): stops-or-silence detector means for distinguishing voiced and unvoiced stops from each other and from the interstitial silences of running speech (74); detection means for nasal, fricative and vowel sounds as they occur in random speech sequence (10); network means of six mutually exclusive logic gates responsive to said foregoing means for deriving the said types of speech sounds (40,42,44,46,48,50); said stops-or-silence detector means (74) receiving inputs of the oral signal (12) from: (1) a linear amplifier (88); (2) oral ON and oral OFF signals from an oral sensor gate (32); (3) oral differentiator outputs from a differentiator means (82); and (4) signals representing voicing as detected in the range of about 300 to 800 Hz (80); and means providing outputs (1) to logic gates for unvoiced (40) or voiced (42) stops, (2) to a gate for unvoiced fricatives (44), and (3) to a silence signal (98); said detection means (10) consisting of four networks including filters, timers, rate generators, comparators and switches that differentiate instantaneously voiced fricatives, unvoiced fricatives, vowels and semi-vowels, and nasals from each other, at least two network-activated switching means (31,49) activated in response to outputs of said networks for enabling individual identification and processing for phoneme outputs, at least one of said filters (15) producing an output representing detected voicing in the range of about 300 to 800 Hz.; said network means of logic gates (40,42,44,46,48,50) operated by said stops-or-silence detector means (74) or by the nasal, fricative and vowel detection means (10) in random real-time sequence according to the sounds of the vocal input.

2. Sound separator device for speech-to-writer apparatus comprising: a microphone for accepting oral signals; and a detection and analysis transducer module component receiving said oral signals, said component having sound separation means for first-stage detecting, differentiating, processing and sorting speech sound inputs derived from said microphone, in at least the following stated categories: (1) vowels and semi-vowels, (2) nasals, (3) unvoiced fricatives, (4) voiced fricatives, (5) unvoiced stops, and (6) voiced stops, said sound separation means (FIG. 1) including: an oral sensor gate (32) receiving an input of said oral signal and producing digital outputs indicative of oral ON and oral OFF gaps; differentiator means (62) responsive to said oral sensors; a linear amplifier (36) receiving for processing said oral signal; four networks comprising a fricative nasal and vowel separator including filters, timers, rate generators, comparators and switches that differentiate instantaneously voiced fricatives, unvoiced fricatives, vowels and semi-vowels, nasals from each other to be identified for each of said categories, at least two network-activated switch means (31),(49) activated in response to outputs of said networks for enabling individual identification and processing for phoneme outputs, at least one of said filters (15) producing an output representing detected voicing in the range of about 300 to 800 Hz.; a stops-or-silence unit (74) receiving inputs of (1) said oral signal from said linear amplifier, (2) oral ON and oral OFF signals from said oral sensor gate, (3) oral differentiator outputs from said differentiator means, and (4) signals representing voicing as detected in the range of about 300 to 800 Hz. from said detected voicing filter; logic gate means (40-44) for each of said categories of speech sound signals, each at least receiving outputs from said linear amplifier and said stops-or-silence unit, including: an unvoiced stop gate (40); a voiced stop gate (42); an unvoiced fricative gate (44) receiving signals also from said oral sensor gate, said detected voicing filter and one of said network-activated switch means; further logic gates means (46-50) for each of said categories of speech sound signals, each at least receiving outputs from said linear amplifier, including: a voiced fricative gate (46) receiving signals also from said oral sensor gate, and said one of said network-activated switch means, a vowel gate (50) receiving signals also from an output of one of said networks and said network-activated switch means, and a nasal gate (48) receiving signals also from an output of one of said networks, said vowel gate, and one said network-activated switch means, said vowel gate receiving an output from said nasal gate.

3. The invention according to claim 2, wherein said sound separation means (FIG. 1) has sensor means, band-pass filters, rate generators, and comparators responsive to the oral input signals from said microphone for passing through a single channel, and sorting speech sounds according to said six categories.

4. The invention according to claim 2, wherein four networks comprising said fricative, nasal and vowel separators include: a series of low frequency band-pass, intermediate frequency band-pass, and higher frequency band-pass filters fed from said linear amplifier; a comparator (29) having inputs from each of said series of filters, a further series of different low filters fed from said linear amplifiers, a rate generator responsive to the outputs of said further series of band-pass filters; said one of said network-activated switch means (31) responsive to the outputs of said comparator and said rate generator, a last series of band-pass filters including a high band-pass filter and a lower band-pass filter, each fed from said linear amplifier, said detected voicing filter (15) being also fed from said linear amplifier; a comparator responsive to the output of said detected voicing filter and said last-mentioned band-pass filter; said other one of said network-activated switch means (49) responsive to said last-mentioned comparator and said last-mentioned high band-pass filter; said detected voicing filter producing an ON and OFF signal, each applied to said stops-or-silence unit; and a rate generator responsive to said detected voicing filter for producing a signal to said stops-or-silence unit.

5. The invention according to claim 4, wherein a phoneme sequence sensor and designator is provided with a series of time chopper units, each for receiving inputs and for thence providing outputs to said phoneme sequence sensor and designator, said time choppers receiving input signals identified, respectively, as fricatives, stops, vowels and nasals; sensor means responsive to signals identified as stops; switch means responsive to said sensor means, said switch means providing outputs to said first and third and fourth-mentioned time choppers.