US3387090A

US3387090A - Method and apparatus for displaying speech

Info

Publication number: US3387090A
Application number: US395876A
Authority: US
Inventors: Cecil C Bridges
Original assignee: Tracor Inc
Current assignee: Tracor Inc
Priority date: 1964-09-11
Filing date: 1964-09-11
Publication date: 1968-06-04
Anticipated expiration: 1985-06-04

Description

June 4, 1968. C. C. BRIDGES. 3,387,090

METHOD AND APPARATUS FOR DISPLAYING SPEECH. Filed Sept. 11, 1964. 4 Sheets (drawing sheets 1 to 4, figures not reproduced). Inventor: Cecil C. Bridges.

ABSTRACT OF THE DISCLOSURE

This application discloses apparatus for analyzing speech sounds by converting the acoustic energy to electrical signals, preemphasizing the high frequencies, and producing a voltage related to the phase reversals or zero axis crossings in the signal per unit time. This voltage is displayed as a function of time, for visual correlation with a standard, or the deviation from a standard voltage may be detected.

This invention concerns an electronic means and apparatus for correlating acoustic speech to visual patterns and more specifically to such means and apparatus for presenting such patterns to display distinguishable characteristic differences in phonemes and variations in phonemes.

Sounds produced by the human voice are a composite of frequencies that include, in addition to a resonant fundamental frequency, a grouping of frequencies over the entire articulated range. This range is from a very low frequency of only a few cycles per second to an upper frequency in the neighborhood of 10,000 cycles per second. When a sound made by the human voice is analyzed spectrographically, it has been discovered that the intensity level of the frequencies becomes predominant in about three ranges, or formants, none of which is in the vicinity of the fundamental.

These formants are determined by the resonant characteristics of the laryngeal tone through the vocal tract. Although it should be noted that the larynx harmonic is not exactly at a formant, the various resonant frequencies of the cavities of the throat, mouth, lips, and nasal passages do determine where the formants occur. Of course, as the fundamental changes and as the tongue wags and the mouth and lips are shaped in different configurations, the formants change in relative position and amplitude. The combination of these varying elements is detectable by the ear, which is capable of discriminating these changes and interpreting such changes into aural recognition.

In the science of phonology, certain expressions have come into existence that carry a more technical meaning than is commonly applied to such terms. Many of these specialized terms pertain to the phonemes of speech, that is, the smallest discernible element of speech having a characteristic of its own that is distinct from all other such elements. These are not to be confused with syllables in that even a monosyllable word will include multiple phonemes. Each phoneme may be thought of as beginning and ending with the formative process involved for uttering it. This means that for the word "is," there is a phoneme identified with the pronunciation of the "i" and a phoneme identified with the pronunciation of the "s," since the mouth, lips, and tongue actually reshape themselves to produce a new sound.

Two terms of art that have special meanings in describing the origin of phonemes are voiced and unvoiced sounds. A voiced sound is derived mainly from the vibrating vocal folds. The unvoiced sounds are derived without parts of the body vibrating but through making air vibrate. Thus, the unvoiced "t" may be distinguished from the voiced "d," the unvoiced "p" may be distinguished from the voiced "b," the unvoiced "k" may be distinguished from the voiced "g," and so forth.

Unvoiced sounds are further subclassified into two categories, the fricative sounds, generally made by the tongue on the upper palate near the front of the mouth, and the plosive sounds, generally made by the lips coming completely together or by the tongue on the upper palate near the back of the mouth.

Although the normal ear has little difficulty in distinguishing between the various sounds, it has not heretofore been possible to obtain a visual representation of the differences that effectively captures the nuances of pronunciation differences caused by such things as nasal qualities, braces, absence of teeth, as well as the more subtle differences caused by variations in accent.

Common attempts at presenting the voice spectrum for visual display have been unsatisfactory since to show the entire spectrum is to show so much garble and static that little besides duration and relative amplitude of the sound are discernible. Even room noise and other environmental sounds cause responses that interfere with interpretation.

Because the high frequency sounds add confusion to the spectrographic picture, attempts have been made to filter out the high frequencies. But when this occurs, often the third, the second, and sometimes the first formants are filtered out, discarding much of the frequency range carrying the distinguishing intelligence. The very highest frequencies made by the human voice are almost always filtered out, eliminating the important unvoiced sounds that are in the frequency spectrum even above the third formant.

Therefore, what is described and illustrated herein is an inexpensive and convenient speech display device, comprising first means for detecting and converting virtually the entire range of speech frequencies, roughly up to 10,000 cycles per second, to electrical energy such that the energy level of the higher frequencies is emphasized relative to the lower frequencies;

second means connected to said first means and principally responsive to the number of time axis crossings of the composite range of speech frequencies (and relatively nonresponsive to the amplitude of the speech frequencies) for producing a discrete number of spiked pulses (at virtually constant amplitude if said second means is independent of amplitude) per a unit of time, and

third means connected to said second means to produce per the unit of time a voltage amplitude proportional to the number and amplitude of the pulses produced by said second means,

said voltage amplitude being characteristic of the speech frequencies and capable of being recorded and observed by any convenient means;

Whereby each speech sound has its own distinct shape relatively independent of the loudness with which the sound is enunciated and the frequency spectrum (such as in a male or female voice spectrum) in which the sound is enunciated.

What is shown and described herein is a novel method of handling spoken sounds and taking advantage of the intelligence contained in the higher frequencies as a basis for distinguishing phonemes and not discarding these high frequencies in filters or the like. The illustrated embodiment reveals an electronic arrangement comprising a microphone, an amplifier with a preamplifier including a pad for emphasizing the high frequencies relative to the low frequencies, a peak clipper, a differentiating circuit, an integrating circuit, and a display unit.

The microphone may be of any commercial type that is capable of passing high frequency sounds without appreciable attenuation. The amplifier may be any type of audio or wide band amplifier in combination with either a preemphasis network capable of amplifying the high frequency components of the signal received from the microphone or deemphasizing the low frequency spectrum relative to the high frequency components. The peak clipper preferably clips the signal received from the amplifier at an amplitude level just above the environmental noise conditions so as to make the remainder of the circuit substantially independent of amplitude. As will be explained, this is not true for all embodiments, but for simplification of explanation, the signal from the peak clipper essentially carries its entire intelligence in the number of times the signal crosses over from negative to positive per unit of time rather than in amplitude variations in the signal.

The differentiating circuit receives the signal from the peak clipper and converts the number of time-axis crossings of the clipped signal to an equal number of spiked pulses. The integrator circuit, having a time constant equal to the selected unit of time, receives the pulses from the differentiating circuit and produces a voltage amplitude proportional to the number of pulses.

An oscilloscope, strip recorder, etc. may be connected to the output from the integrator circuit so as to present an image that is characteristic of the sound spoken into the microphone.

It is important to note that two voices, the first being higher pitched than the second, speaking the same expression will be reflected in similar traces. The higher pitched voice will contain more high frequency sounds throughout than the lower pitched voice, and therefore the voltage trace will be at a higher overall amplitude on the display unit, but the relative shape of the trace will be similar.

Because of the fact that a trace of an identically enunciated speech sound will be essentially the same regardless of the volume or voice frequency range in which it is spoken, standards of properly pronounced words or phonemes can be made with which to compare the imperfect speech of a student. The imperfect speech may be that of a child just learning the language, that of an adult with a speech impediment, that of a person with a regional or foreign accent, or that of any other person or speech simulator.

Various methods have been devised to compare the standard display, i.e. the display of a preferred standard, with that of the student. Among the cheapest and easiest is to place a drawn or otherwise reproduced standard display for reference so that the student can attempt to match it. This display may be on transparent film and placed over the oscilloscope, or it may be placed on any convenient means so that the student can observe the standard while practicing the speech sound to be learned.

Another means is to present the reference sound through an electronic channel, much the same as that described above for the basic display device. The channel may be simplified by eliminating many of the circuits required in the basic channel and recording the reference display on electronic recording tape or other means.

A double-trace oscilloscope may then be used to simultaneously display to the student the reference display and the one he makes himself. In addition to providing a comparison, such a device would act to discipline the student in urging him to repeat the sound the same number of times as the recorded standard and help him conform to the duration of the standard as well as the general shape.

Another method that may be used in connection with the reference and the basic channels would be the addition of a voltage comparator. Such a comparator would produce an output voltage dependent upon the average voltage difference between the reference and the basic channels. Such difference could be either simultaneous or a total difference over the entire duration of a sound cycle. The voltage may either be metered or displayed as on an oscilloscope. The object for the student, then, would be to try to reduce the voltage to as low a value as possible.
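The comparator idea can be sketched numerically. The following is a minimal illustration, assuming both channels have already been sampled into equal-rate lists of voltages; the function name `mean_abs_difference` and the sample values are hypothetical and are not taken from the specification.

```python
def mean_abs_difference(reference, student):
    """Average absolute voltage difference between two sampled traces.

    Only the overlapping portion of the two traces is compared.
    """
    n = min(len(reference), len(student))
    if n == 0:
        raise ValueError("both traces must be non-empty")
    return sum(abs(r - s) for r, s in zip(reference, student)) / n


# Hypothetical sampled traces (arbitrary voltage units).
reference_trace = [0.0, 1.0, 2.0, 1.0]
student_trace = [0.0, 2.0, 2.0, 0.0]

score = mean_abs_difference(reference_trace, student_trace)
```

A lower score corresponds to a closer match, which is the quantity the student would try to drive toward zero.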

A refinement of the above arrangement may be the addition of a limit trigger circuit that would turn on an out-of-limits device when the voltage difference from the comparator exceeded a predetermined value, or condition. If the value is not exceeded then the predetermined condition may be tightened up a bit by an appropriate logic circuit.
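One way to picture the limit trigger and the tightening logic is the sketch below. It assumes each attempt has already been reduced to a single difference score, and that "tightening" means multiplying the limit by a fixed factor after every in-limits attempt; the factor `tighten`, the function name, and the scores are illustrative assumptions, since the specification does not say how the condition is tightened.

```python
def run_limit_trigger(scores, limit, tighten=0.9):
    """Return (out_of_limits flags, final limit) for a sequence of attempts.

    An attempt whose score exceeds the limit turns on the out-of-limits flag.
    An attempt within the limit leaves the flag off and tightens the limit.
    """
    flags = []
    for score in scores:
        if score > limit:
            flags.append(True)
        else:
            flags.append(False)
            limit *= tighten  # assumed tightening rule, not from the source
    return flags, limit


# Hypothetical difference scores from four successive attempts.
flags, final_limit = run_limit_trigger([0.5, 0.3, 0.35, 0.2], limit=0.4)
```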

More particular description of the invention may be had by reference to the embodiments thereof which are illustrated in the appended drawings, which form a part of this specification. It is to be noted however, that the appended drawings illustrate only typical embodiments of the invention and therefore are not to be considered limiting of its scope for the invention will admit to other equally effective embodiments.

In the drawings:

FIG. 1 is a frequency spectrum chart of the frequencies commonly present in a spoken phoneme.

FIG. 2 is a frequency spectrum chart of the phoneme shown in FIG. 1 after preemphasis in accordance with this invention.

FIG. 3 is a block diagram of one system embodiment of this invention.

FIG. 4 is a schematic diagram of one embodiment of this invention.

FIG. 5 is a partial schematic diagram of a portion of one embodiment of this invention.

FIG. 6 is a waveform diagram of sample traces produced in accordance with this invention.

FIG. 7 is a pictorial interconnection diagram in accordance with one system embodiment of this invention.

FIG. 8 is a block diagram of various alternate arrangements of system components in accordance with this invention.

Refer now to FIG. 1, which is a typical representation of a spectrographic analysis of frequency components plotted to compare their relative amplitudes in a speech sound. The fundamental frequency 2 of the originating vibrating folds for the sound is very small with respect to the many other component frequencies in the diagram. For ordinary voices, both male and female, the fundamental frequency is in the to 500 c.p.s. range.

Looking at the frequencies as they become higher, it is observed that they become greater in amplitude until a certain maximum value is reached. The frequencies in the range near this first peak value are defined as occurring at the first formant 4. The maximum frequency is not necessarily either the harmonic frequency of the large cavity of the mouth or throat or an even multiple of the fundamental, although it is strongly influenced by both.

Looking at the frequency spectrum of FIG. 1, the amplitude falls off to a relatively small value and then increases to a second peak, smaller than the first in amplitude. The frequencies in the range near this second peak are classed together and identified as the second formant 6. Similarly, a third group of frequencies is defined as the third formant 8, the peak amplitude value of this group being typically smaller in amplitude than the second formant peak value.

At still higher frequencies than the third formant, beginning roughly in the vicinity of 6000 c.p.s., is the group of frequencies caused by the unvoiced sounds. These frequencies have no relation at all to the fundamental and very little relation to the natural harmonic resonant qualities of the various physiological cavities. These frequencies probably can best be thought of as a random mixture merely represented by an envelope 10. Meaningful information is carried in these frequencies up to, and possibly above, 10,000 c.p.s., although possibly as low as 8000 c.p.s. is sufficient for distinguishing purposes with respect to the present invention.

The sound to be examined on the display of the embodiment of this invention shown in FIG. 3 may be spoken into any detection and converting means such as a microphone 12, which may be any conventional electroacoustic transducer that responds to sound waves and which delivers essentially equivalent electric waves over the wide band of frequencies necessary for examination and discrimination. Although not a requirement for operability, it is desirable that any microphone used have a certain amount of directivity and background noise discrimination qualities so that there is little interference with the spoken expression under study. The analysis diagrammed in FIG. 1 is of a representative signal produced from the microphone 12.

Microphone 12 may be connected to any conventional wideband amplifier 14 with proper impedance-matching characteristics. One such amplifying means is the Preferred Circuit No. PSC 19 found in NAVWEPS 16-1-519-2, printed Apr. 1, 1962. Included within the amplifier may be any type of common preemphasis network means capable of emphasizing the high frequencies with respect to the low frequencies. Such a network may conveniently take the form of a tunable high pass filter comprised of passive components that attenuates the lows more than the highs, although more sophisticated networks containing active components are in common usage.

A filtering system with selectable filter pads or with a variable component, so that the degree of preemphasis may be easily controlled, is desirable in some applications, such as adjusting the waveshape during initial setup conditions.

The output from amplifier 14 is represented by waveform 16 in FIG. 3. Actually, waveform 16 may best be examined with reference to FIG. 2, where a spectrographic analysis similar to FIG. 1 is shown. As mentioned above, the normal speech sounds commonly have a first formant 4, a second formant 6 and a third formant 8 in descending amplitudes.

After preemphasis, the first formant 4 is relatively deemphasized, either by actual deemphasis, less emphasis, or by absence of emphasis so as to provide essentially flat amplification over the range of frequencies of the first formant, and appears now as first formant 18. The second formant 20, by contrast, is amplified and is essentially the same amplitude as (or possibly even a little higher than) the amplitude of first formant 18. Similarly, third formant 22 is amplified to be as high as or higher than second formant 20. Finally, the envelope of the unvoiced frequency sounds is amplified greatly so that the peak portion of this envelope, also, is as high as, or possibly a little higher than, the peaks of the various formants.

A satisfactory amount of preemphasis has been obtained with a circuit that preemphasizes decibels at 5 kilocycles and 20 decibels at 20 kilocycles.
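The effect of preemphasis can be illustrated with a digital stand-in. The sketch below is an assumption made for illustration only: it uses a first-order difference filter y[n] = x[n] - a*x[n-1], not the passive analog network of the specification, and the coefficient 0.95 and the 40,000 samples-per-second rate are arbitrary choices. It shows only the qualitative behavior described above, namely that gain rises with frequency.

```python
import cmath
import math


def preemphasize(samples, coeff=0.95):
    """First-order difference filter: y[n] = x[n] - coeff * x[n-1]."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def gain_db(freq_hz, sample_rate, coeff=0.95):
    """Magnitude response of the filter above, in decibels."""
    w = 2 * math.pi * freq_hz / sample_rate
    h = 1 - coeff * cmath.exp(-1j * w)
    return 20 * math.log10(abs(h))


sample_rate = 40_000  # assumed sampling rate
low_gain = gain_db(500, sample_rate)
high_gain = gain_db(5_000, sample_rate)
filtered = preemphasize([1.0, 1.0, 1.0])
```

Here `high_gain` exceeds `low_gain`, so the low frequencies are attenuated relative to the highs, as the text describes for the filter pad.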

From the amplifier circuit 14, the signal is applied to a circuit means for determining the ultimate shape of the voltage representation for the visual display. In essence, this circuit means translates the number of signal crossings per unit of time into a voltage amplitude, so that the more numerous the crossings the larger the voltage.
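The quantity being measured can be stated directly in software terms. The following sketch counts positive-going time-axis crossings per second over fixed windows of a sampled signal; the sampling rate, the window length, and the pure test tones are assumptions chosen for illustration and do not come from the specification.

```python
import math


def positive_crossing_rate(samples, sample_rate, window_s):
    """Return, per window, the number of negative-to-positive crossings per second."""
    window = max(1, round(sample_rate * window_s))
    rates = []
    for start in range(0, len(samples) - window + 1, window):
        # One extra sample so a crossing on the window boundary is seen.
        chunk = samples[start:start + window + 1]
        count = sum(1 for a, b in zip(chunk, chunk[1:]) if a < 0 <= b)
        rates.append(count / window_s)
    return rates


def tone(freq_hz, sample_rate, duration_s):
    """Sine test tone with a small phase offset so crossings fall between samples."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + 0.1) for i in range(n)]


sample_rate = 40_000
low = positive_crossing_rate(tone(200, sample_rate, 0.5), sample_rate, 0.1)
high = positive_crossing_rate(tone(4_000, sample_rate, 0.5), sample_rate, 0.1)
```

The higher tone yields the larger value in every window, which corresponds to the larger display voltage for sounds rich in high frequencies.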

One such means may conveniently take the form shown in FIG. 4, which depicts, successively, an impedance matching circuit 26, a peak clipper 28, a differentiator 30, and an integrator 32. The impedance matching circuit 26 is shown as a matching transformer 34, although other commonly known types of circuits may be used with equal success. Since the output impedance of the amplifier may be only a few ohms, a transformer is used to provide the necessary isolation for the remainder of the circuit. The output, or secondary, winding of the transformer normally has an impedance on the order of a few hundred ohms.

The secondary of transformer 34 may be connected to an amplitude limiting or clipping means such as a peak clipper 28, a convenient form of which may be a pair of diodes 36 and 38 as shown in FIG. 4. The diodes are connected so that the cathode of diode 36 is connected to the more positive side of the transformer and diode 38 is connected so that its cathode is connected to the more negative side of the transformer. To predetermine which side of the transformer is the more positive connection, a ground 40 may be connected to the bottom line.

If an internal resistance of diode 38 is assumed, then when the voltage level on positive cycles of the voltage exceeds a threshold level the diode will conduct and appear to be the internal resistance of the diode, thereby establishing a maximum equivalent to the voltage drop across this resistance, as explained starting on page 583 of Electronic Fundamentals and Applications by John D. Ryder, copyright 1950 by Prentice-Hall, Inc. Because the threshold level of the diode may be assumed to be very small, on the order of a fraction of a volt, the clipping level is very small, ideally just above the environmental noise level of the room or background.

The negative portion of the applied voltage is clamped to ground through diode 36, leaving a waveform that varies from ground to a positive voltage.
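An idealized version of this clipping action is sketched below. It treats the two diodes as perfect limiters, which is an assumption: negative excursions are clamped to ground and positive excursions are held at a small clip level. The clip level of 0.05 and the test sine waves are illustrative values only.

```python
import math


def peak_clip(samples, clip_level=0.05):
    """Clamp negative values to ground and limit positive values to clip_level."""
    return [min(max(x, 0.0), clip_level) for x in samples]


# Two sine waves differing tenfold in amplitude (two cycles of 100 samples each).
quiet = [1.0 * math.sin(2 * math.pi * i / 100) for i in range(200)]
loud = [10.0 * math.sin(2 * math.pi * i / 100) for i in range(200)]

clipped_quiet = peak_clip(quiet)
clipped_loud = peak_clip(loud)
```

Both clipped signals swing between ground and the same small positive level, so what remains is essentially a squared wave whose information lies in when it crosses the axis rather than in how loud the input was.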

The signal from the peak clipper 28 may then be applied to a differentiating means such as a differentiator 30. One such circuit may simply be a combination of a capacitor 42 and resistor 44 connected so that the capacitor is in series with the positive line from the peak clipper and the resistor is shunted between the output side of the capacitor and ground. The theory of this differentiator is given starting on page 569 of the Ryder reference cited above.

The effect of a differentiating means on a squared wave is to produce a peaked pulse for every positive level reached. The peak of such a pulse is at the same amplitude as the amplitude level of the squared wave into the differentiator. Hence, the output from differentiator 30 may be considered to be responsive to phase reversals and may be represented by a series of pulses at the voltage level of the clipped voltage height, there being a pulse for each positive-going signal, or change of signal from negative-going to positive-going. Since the clipping level is at a very small value, as explained above, the number of pulses is equivalent to the number of time axis crossings of the signal in the positive direction.

The duration of each pulse is established by the time constant of capacitor 42 and resistor 44, and not by the duration of the applied signal. Therefore, each pulse is uniform in duration as well as amplitude.
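The behavior of the capacitor and resistor pair on a squared wave can be sketched with a discrete-time approximation of a series-C, shunt-R network. The recurrence used here, y[n] = alpha * (y[n-1] + x[n] - x[n-1]) with alpha = RC / (RC + dt), is a standard first-order high-pass approximation; the sampling rate, time constant, and threshold are assumed values. In this discrete model the spike peak is alpha times the step height rather than exactly the step height.

```python
def rc_differentiate(samples, sample_rate, time_constant_s):
    """Discrete approximation of a series-capacitor, shunt-resistor differentiator."""
    dt = 1.0 / sample_rate
    alpha = time_constant_s / (time_constant_s + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def count_positive_pulses(pulses, threshold):
    """Count the times the signal rises above the threshold."""
    count = 0
    above = False
    for v in pulses:
        if v > threshold and not above:
            count += 1
        above = v > threshold
    return count


# Squared wave from the clipper: 10 cycles, each 50 samples low then 50 samples high.
square = ([0.0] * 50 + [0.05] * 50) * 10
pulses = rc_differentiate(square, sample_rate=40_000, time_constant_s=1e-4)
n_pulses = count_positive_pulses(pulses, threshold=0.025)
```

Each positive-going edge produces one short positive spike whose width depends on the time constant and not on how long the squared wave stays high; the negative spikes on the falling edges are ignored by the counter.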

The output from the differentiator 30 may then be applied to an integrating means such as integrator 32, a convenient form of which may be merely a combination of a diode 46, a capacitor 48 and a resistor 50. Diode 46 conducts each time a positive pulse is received from the differentiator and applies the voltage to the capacitor 48 of the integrator 32, which, in turn, bleeds off across resistor 50. If a number of pulses pass through diode 46 before the voltage can bleed off across resistor 50, then the voltage level will be an effective measurement of the number of pulses, or an effective voltage summing means. That is, the voltage level will be proportional to the number of pulses received.

A satisfactory time constant for the operation of the integrator has been found to be 0.2 second.

Since the voltage is constantly being replenished by successive pulses and constantly being drained off by resistor 50, there is a voltage output which is relatively smooth, reflecting a voltage amplitude for an ever-changing series of pulses that overlap one another as time progresses.
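The replenish-and-drain behavior can be approximated by a leaky integrator. In the sketch below each positive pulse adds to a stored level (the diode blocks negative pulses), and the level decays exponentially with the 0.2 second time constant mentioned above. The sampling rate, the unit pulse height, and the pulse spacings are assumptions made for illustration, and the model ignores the diode's forward drop.

```python
import math


def leaky_integrate(pulses, sample_rate, time_constant_s=0.2):
    """Accumulate positive pulses while the stored level decays exponentially."""
    decay = math.exp(-1.0 / (sample_rate * time_constant_s))
    level = 0.0
    out = []
    for p in pulses:
        level = level * decay + max(p, 0.0)  # only positive pulses charge the capacitor
        out.append(level)
    return out


sample_rate = 1_000  # assumed
# Two pulse trains lasting 2 seconds: 50 pulses per second and 100 pulses per second.
slow = [1.0 if i % 20 == 0 else 0.0 for i in range(2_000)]
fast = [1.0 if i % 10 == 0 else 0.0 for i in range(2_000)]

slow_level = sum(leaky_integrate(slow, sample_rate)[-500:]) / 500
fast_level = sum(leaky_integrate(fast, sample_rate)[-500:]) / 500
ratio = fast_level / slow_level
```

Once the output has settled, doubling the pulse rate roughly doubles the average level, which is the proportionality the text relies on.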

This voltage level may be represented on a display unit, a convenient one of such units being an oscilloscope 52, although a strip recorder or similar device may be equally acceptable. Each voice sound is represented with a trace having unique and identifiable amplitude characteristics. Since an oscilloscope has a high input resistance, and since a relatively long time constant including a high resistance value is required for the integrator, resistor 50 may be merely the input resistance of the oscilloscope.

Although the peak clipper and differentiator circuit was described above as being purely responsive to phase reversals or time-axis crossings, this is not the case for the actually built equipment. And the fact that it is not makes the operation of the circuit more practical than if the circuit were responsive strictly to the number of time-axis crossings and ignored the frequency at which these time axis crossings occurred.

Once the high frequencies are emphasized in the amplifier 14, then it is not necessarily desirable to have a linear response throughout the frequency range so that the voltage on the display unit is linear with the number of time-axis crossings. If it were, the very high frequency sounds would be reflected in a voltage much higher than the voltage from the low frequency sounds, so that it would be difficult to display these voltages on the same display unit scale.

The diodes used in the peak clipper inherently overcome this difficulty. This is because the voltage drop across a conducting diode is somewhat a function of current, so that instead of the procurement of a truly flat output when the diode conducts during the clipping operation, the output voltage is slightly humped or rounded. The wider the waveform (as for the low frequency signals) the higher the peak of the hump is allowed to go and hence the higher the peak will be on the subsequent and corresponding differentiator pulse. This means that the time axis crossings response is not linear, but tends to be logarithmic and allows the same scale on the display unit to be used for meaningful results of high and low frequencies.
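The practical benefit of a compressed response can be seen with a small numeric comparison. The sketch below does not model the diode; it simply contrasts a linear mapping with a logarithmic one for two assumed crossing rates, 200 and 8,000 per second, both chosen for illustration.

```python
import math


def linear_deflection(crossings_per_s):
    """Display deflection directly proportional to the crossing rate."""
    return float(crossings_per_s)


def log_deflection(crossings_per_s):
    """Display deflection following a logarithmic law (illustrative only)."""
    return math.log1p(crossings_per_s)


low_rate, high_rate = 200, 8_000  # assumed example rates

linear_ratio = linear_deflection(high_rate) / linear_deflection(low_rate)
log_ratio = log_deflection(high_rate) / log_deflection(low_rate)
```

Under the linear law the high-rate sound deflects the trace forty times as far as the low-rate sound, while under the logarithmic law the two deflections differ by less than a factor of two, so both fit comfortably on one display scale.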

Several of the component parts of the system described above with reference to FIG. 4 have alternative structures that are well known in the electronics art and which may be substituted for those components recited in the above example circuit. For example, it may be desirable to use a Schmitt trigger for the pulse-producing means instead of the peak clipper and differentiator. Such a circuit turns on when a predetermined voltage threshold level is reached. Such a circuit may have an output which produces a spike, similar to the output from the capacitor 42 and resistor 44 arrangement.
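A software analogue of the Schmitt trigger alternative is sketched below. It emits one unit pulse each time the input rises through an upper threshold and re-arms only after the input falls below a lower threshold, which is the hysteresis that characterizes a Schmitt trigger. The threshold values and the test signal are assumptions for illustration.

```python
import math


def schmitt_pulses(samples, upper, lower):
    """Emit 1.0 at each turn-on of a two-threshold (hysteresis) trigger, else 0.0."""
    state = False
    pulses = []
    for x in samples:
        if not state and x >= upper:
            state = True
            pulses.append(1.0)  # spike at the moment the trigger turns on
        else:
            if state and x <= lower:
                state = False  # re-arm once the input has dropped far enough
            pulses.append(0.0)
    return pulses


# Five cycles of a sine wave, 100 samples per cycle.
signal = [math.sin(2 * math.pi * i / 100) for i in range(500)]
pulses = schmitt_pulses(signal, upper=0.1, lower=-0.1)
pulse_count = int(sum(pulses))
```

Because every pulse has the same height regardless of the input shape, the count responds linearly to frequency, which matches the remark below that this substitute gives a more linear response than the two-diode clipper.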

Another alternate circuit that may be used in place of the peak clipper and differentiator shown in FIG. 4 is a tunnel diode. Characteristic of a tunnel diode is that it produces a spiked output when a certain applied voltage level is exceeded, which output is similar to the output from differentiator 30.

It should be mentioned, however, that either of these two suggested substitutes results in a more linear response to frequency than does the two-diode arrangement and hence the use of either makes it a little harder to display the resulting waveform on a display unit. Of course, the response can be altered by the use of a logarithmic-responsive circuit on the output of the amplifier 14 so that a linear response for the rest of the circuit may be perfectly desirable.

The integrator circuit, in addition to the one shown in FIG. 4, may also conveniently take the form of an RC combination or an LR combination, as described on page 571 of the Ryder text referenced above, both of which are common in the art. A preferred structure of the RC combination that has been used successfully is the Daniels type integrator. This arrangement shows two sections of resistors and capacitors, as shown in FIG. 5, in which resistor 58 is three times the value of resistor 54 and capacitor 60 is one-third the value of capacitor 56. This circuit arrangement has extremely good drop-off voltage characteristics so that it can operate with a relatively short time constant and be responsive to high frequencies.
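The two-section arrangement can be explored with a simple simulation. The sketch below assumes that resistor 54 and capacitor 56 form the first section and resistor 58 and capacitor 60 the second, connected as an ordinary RC ladder; that topology, the component values, and the forward-Euler integration are all assumptions, since FIG. 5 is not reproduced here. Only the stated ratios (second resistor three times the first, second capacitor one-third the first) come from the text.

```python
def two_section_rc(samples, sample_rate, r1=10e3, c1=1e-6):
    """Forward-Euler simulation of a two-section RC ladder with R2 = 3*R1, C2 = C1/3."""
    r2 = 3.0 * r1
    c2 = c1 / 3.0
    dt = 1.0 / sample_rate
    v1 = 0.0  # voltage on the first capacitor
    v2 = 0.0  # voltage on the second capacitor (the output)
    out = []
    for vin in samples:
        i1 = (vin - v1) / r1  # current through the first resistor
        i2 = (v1 - v2) / r2   # current through the second resistor
        v1 += dt * (i1 - i2) / c1
        v2 += dt * i2 / c2
        out.append(v2)
    return out


# Response to a 1 volt step applied for 0.2 second.
step = two_section_rc([1.0] * 20_000, sample_rate=100_000)
```

With these assumed values each section alone has a 10 millisecond time constant, and the output settles to the input level well within the simulated interval.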

As explained above, each sound is represented by a voltage that has a distinct shape relatively independent of the loudness of the sound (although an extremely loud signal will have an effect on the maximum height of the hump of the voltage waveform from the diodes in the clipper circuit). This shape, depending on the components used, may or may not be linear and may or may not be logarithmic in response. However, it will be dependent on time axis crossings of the generating sound signal.

Another characteristic of the shape is that the overall frequency of the sound has little effect on the resulting waveshape. This means that a deep male voice enunciating an expression produces a trace that will be similar to that of a woman or a child with a high voice enunciating the same expression. The only difference will be the height of the trace (function of voltage) and the slight amount of variation caused by the non-linear response. But since this response is a relative response, the overall appearances of the traces are highly similar in form or shape.

Resulting waveforms for a few example expressions are shown in FIG. 6. The relatively unvoiced expression "ka" shown in waveform 62 has a much steeper onset than the relatively voiced expression "ga" shown in waveform 64. This is to be expected since there are a greater number of high frequencies in the unvoiced sounds than in the voiced sounds.

The same type of reasoning applies to the difference in waveforms between the relatively unvoiced sound "ta," waveform 66, and the relatively voiced sound "da," waveform 68.

Waveforms 70 and 72 show the comparison between the indication of the pronunciation of the phonemes "bee" and "bi" as in "bit." Waveform 70 shows that there is a somewhat slow build up of frequency at the beginning and an even slower, or more gradual, drop off of frequency at the end. A careful pronunciation of this phoneme confirms the truth of the trace.

Waveform 72, by comparison, starts and stays at virtually the same frequency throughout its trace. This, too, is confirmed by a careful pronunciation of the expression, which reveals that the vowel sound is initiated toward the front of the mouth and not so much toward the rear as in "bee."

Finally, waveforms 74 and 76 show the traces of the expressions "cha" and "jaw," respectively. Both sounds start approximately at the same frequency, but the "cha" trace 74 maintains the vowel sound at the same frequency for a much longer duration than does the "jaw" trace 76.

Also, notice that there is a characteristic dip following the initial peak in the jaw trace that clearly distinguishes the two traces. It is just such things as this, which are often subtle and unsuspected, that give each expressed sound a shape that distinguishes it from all others.

One convenient use of the invention may be the displaying of the speech sounds from a student with those of a reference used as a standard of excellence. The student just learning the language, or trying to correct or perfect his speech affected by some defect or accent, may merely attempt to match the display with his own spoken sounds.

One technique for accomplishing such a comparison may be through the use of tracing the standard on a piece of tracing paper and overlaying the face of a display oscilloscope therewith. Of course, the trace may be traced by any convenient means, such as photographically. Also, it may not be desirable to overlay the oscilloscope for some reason, so the trace may be put on any type 0f con-1 venient medium and merely placed alongside the oscilloscope for easy reference.

More sophisticated methods of making comparison between a reference, or standard, and that of the basic channel of the student may Ibe desired for a particular application. One such arrangement that may be conveniently employed is shown in FIG. 7. Here, a microphone 77, a preamplifier 78, an amplifier 80, a character determining circuit 82, and one channel of a doubletrace oscilloscope 86 form the 4basic channel and reference 9 channel 84 and the other channel of the double-trace oscilloscope 86 form the com-plete display reference channel.

The boxes of the basic channel are shown arranged in the manner shown to obtain maximum use of standard component packaging. The microphone 77, the preamplifier 80, and the double-trace oscilloscope 86 are standard units and are commercially avail-able as complete entities. Notice that the preamplifier must have means for emphasizing the high frequencies and preferably should also have means for de-emphasizing the low frequencies. Also, the emphasis should preferably be adjustable in various frequency ranges. Such preamplifers are standard components for hi-fidelity equipment.

The character determining circuit may be any convenient circuit that produces a voltage responsive to time-axis crossings of the frequency, as explained above, for displaying on one of the two display channels of the oscilloscope.
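A minimal software sketch of such a character determining function is given here, assuming a sampled signal rather than the analog circuit of the disclosure. It counts sign changes in consecutive windows, so each output value stands in for the displayed voltage at one instant. The function names, the window length, and the test tones are assumptions made for illustration.

```python
import math

def zero_crossing_trace(samples, window=160):
    """Count sign changes in each consecutive window of samples.

    Each returned value is proportional to time-axis crossings per unit
    time. window is an assumed value (10 ms at a 16 kHz sample rate).
    Crossings that fall exactly on a window boundary are not counted.
    """
    trace = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        crossings = sum(
            1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0)
        )
        trace.append(crossings)
    return trace

def tone(freq_hz, rate_hz=16000, n=1600):
    """Sine test tone with a small phase offset so no sample is exactly zero."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz + 0.1) for i in range(n)]

low_trace = zero_crossing_trace(tone(500))
high_trace = zero_crossing_trace(tone(3000))
```

Every value in `high_trace` exceeds every value in `low_trace`, so a sound dominated by higher frequencies would deflect the trace further.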

The reference channel may be a duplicate of the basic channel starting with a tape recording of the reference sound. However, the channel can be simplified by merely taping the voltage that results from the action of such a channel of components so that it may be displayed on the second trace channel of the oscilloscope.

In use the reference sound may be repeated several times so that the student can watch the reference trace while speaking the sound himself and attempting to conform as closely as possible therewith.

An improved comparison system is shown in FIG. 8, which shows a reference channel 88 and a basic channel 90 similar to the ones for FIG. 7. But, instead of an oscilloscope connected to the respective outputs from these two channels, a voltage comparator 92 is shown. A voltage comparator similar to the one discussed starting on page 459 of Pulse and Digital Circuits, by Millman and Taub and published by McGraw-Hill Book Company, Inc., copyright 1956, may be used. Such a voltage comparator produces an output voltage proportional to the difference between two applied signals.

In FIG. 8, when the two traces are identical, there will be no output. When there is a difference, there will be an output. In addition, the greater the difference, the greater the output. The output from voltage comparator 92 may be applied directly to an oscilloscope 94 or it may be applied to an integrator 96, and then to oscilloscope 94 or a voltage meter (not shown). The advantage of using an integrator is that a long time constant may be used so that the total amount of voltage difference may be shown for a whole phoneme, rather than the relatively instantaneous voltage difference that would otherwise be shown.
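The comparator and integrator arrangement can be sketched in software as follows. This is an illustration under stated assumptions, not the circuit of FIG. 8: the traces are made-up sample values, the difference is rectified with an absolute value so that positive and negative errors do not cancel (an assumption not stated in the text), and `leak` is a hypothetical parameter imitating a long but finite time constant.

```python
def compare_traces(reference, student):
    """Magnitude of the instantaneous difference between two traces."""
    return [abs(r - s) for r, s in zip(reference, student)]

def integrate(difference, leak=0.0):
    """Running sum of the difference; leak=0.0 gives a pure accumulator."""
    total = 0.0
    history = []
    for d in difference:
        total = total * (1.0 - leak) + d
        history.append(total)
    return history

reference = [0, 4, 9, 6, 3, 1]   # made-up reference trace
good_try = [0, 4, 8, 6, 3, 1]    # close match
poor_try = [2, 7, 4, 9, 0, 5]    # poor match

# The last integrator value is the total difference over the whole sound.
good_score = integrate(compare_traces(reference, good_try))[-1]
poor_score = integrate(compare_traces(reference, poor_try))[-1]
```

Here `good_score` is 1.0 and `poor_score` is 20.0, so the integrated value gives a single figure for the whole sound rather than a moment-by-moment reading.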

Note that the use of the system as described requires the student to start his voice sound in synchronism with the standard and maintain the same duration of the sound. This teaches the student not only proper pronunciation, but also teaches him the proper metering of the words and the duration of the sounds.

A more sophisticated system is shown in the dotted portion of FIG. 8. Here the output from the voltage comparator is applied to limit trigger circuit 98, such as a Schmitt trigger, so that when a preset threshold voltage amplitude (reflecting unacceptable error from the standard) is reached an impulse triggers an out-of-limits indicating device 104, which may conveniently be a light. If the limit is not reached then there is no output pulse on line 106. But, there is an output directly from the comparator on line 108 each time a word is spoken so that NOR gate 100 is conditioned to allow the variable level device 102 to change.

This variable level device 102 may be any convenient device, such as a counter and logic arrangement that selects progressively larger (or smaller) impedances for changing the control voltage for the limit trigger circuit 98. What is established, in any event, is a tightening, or resetting, of the limits of circuit 98 so that a subsequent voltage difference from the comparator at the same level as the time before, which failed to trigger the out-of-limits device 104, this time triggers device 104. This makes it harder and harder for the student to mispronounce a sound without device 104 indicating, having the effect of forcing him into closer match of the reference.
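The tightening behavior described above can be sketched as a small state machine. This is a hedged software analogy only: the class name `LimitTrainer`, the starting limit, the multiplicative `step`, and the `floor` are hypothetical choices, whereas the text describes a counter and logic arrangement selecting impedances.

```python
class LimitTrainer:
    """Out-of-limits indicator whose limit tightens after each passing attempt."""

    def __init__(self, limit=10.0, step=0.8, floor=1.0):
        # limit, step and floor are hypothetical values for illustration.
        self.limit = limit
        self.step = step
        self.floor = floor

    def attempt(self, error):
        """Return True if the indicator fires for this error level."""
        if error >= self.limit:
            return True  # out of limits; the limit is left unchanged
        # Within limits: tighten the limit for the next attempt.
        self.limit = max(self.floor, self.limit * self.step)
        return False

trainer = LimitTrainer()
# The same error level is presented four times in a row.
results = [trainer.attempt(7.0) for _ in range(4)]
```

With these assumed values `results` is `[False, False, True, True]`: an error that passed twice triggers the indicator on the third attempt because the limit has meanwhile dropped below it.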

While various embodiments of the invention have been described, it is obvious that various substitutes or modifications of structure may be made without varying from the scope of the invention. One such variation, not already mentioned above, is a circuit that triggers on the change of slope (slope detection), rather than on the number of time axis crossings, for producing the pulses to the integrator 32. Also, an arrangement may be devised wherein the negative as well as, or instead of, the positive clipped peaks could be used for triggering related circuits for producing information-shaped voltage traces.
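One possible reading of the slope-detection variation is sketched below: a pulse is counted whenever the sample-to-sample slope changes sign, and the result is contrasted with a count of time axis crossings. The function names and the sample values are assumptions introduced for illustration, and the text does not specify how such a circuit would be built.

```python
def slope_reversals(samples):
    """Count points where the sample-to-sample slope changes sign."""
    count = 0
    prev_slope = 0
    for a, b in zip(samples, samples[1:]):
        slope = (b > a) - (b < a)  # +1 rising, -1 falling, 0 flat
        if slope != 0:
            if prev_slope != 0 and slope != prev_slope:
                count += 1
            prev_slope = slope
    return count

def zero_crossings(samples):
    """Count sign changes between neighboring samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

# A slow swing with a small ripple riding on it (made-up samples).
wave = [1, 3, 2, 4, 3, 5, -1, -3, -2, -4, -3, -5]

reversal_count = slope_reversals(wave)
crossing_count = zero_crossings(wave)
```

For this made-up waveform the slope reverses nine times while the axis is crossed only once, which shows how a slope detector can respond to ripple that never reaches the time axis.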

What is claimed is:

1. In a system for analyzing speech sounds means for converting substantially all of the frequency components contained in a speech sound into corresponding electrical energy,

means for emphasizing the frequency components of said energy above an intermediate frequency relative to those frequency components below said intermediate frequency, and

means for producing a voltage primarily related to phase reversals of the resulting energy,

said voltage having a unique and identifiable variable amplitude characteristic as a function of time for every uttered speech sound, and

means for producing a representation of said voltage as a function of time.

2. In a system according to claim 1, the means for producing a representation of said voltage including means for producing a visual display of such voltage as a function of time.

3. In a speech analyzing system,

detecting means for converting sounds in the range of speech frequencies to electrical energy, amplifying means connected to said detecting means such that the frequency components of the electrical energy above an intermediate frequency are emphasized relative to the frequency components of the electrical energy below the intermediate frequency,

means connected to said amplifier means responsive to time axis crossings of the energy signal from said amplifier means so as to produce an output comprising a series of spiked pulses at substantially constant amplitude regardless of the amplitude of the energy signal, the number of said pulses being proportional to the number of time axis crossings,

voltage summing means connected to the last said means for producing a voltage the amplitude of which corresponds to the number of said spiked pulses per unit of time, and

display means connected to said voltage summing means for producing a display of said voltage as a function of time whereby a unique and identifiable trace for every uttered speech sound is provided.

4. A speech display device, comprising detecting means for converting the range of speech frequencies including up to approximately 8000 cycles per second to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above an intermediate frequency, which is below approximately 5000 cycles per second, are emphasized relative to the frequency components of the electrical energy below such intermediate frequency,

means connected to said amplifying means for limiting the amplitude of the energy signal from said amplifier means just above the environmental noise level,

differentiating means connected to said limiting means for producing a series of pulses, the number of said pulses being proportional to the number of time axis crossings,

integrating means connected to said differentiating means for producing a voltage amplitude dependent upon the number of pulses per unit of time and the amplitude of said pulses, and

display means connected to said integrating means for producing a unique and identifiable trace for every uttered speech sound.

5. A speech display device in accordance with claim 4,

wherein said limiting means is connected to said amplifying means via a positive and a negative line and said limiting means comprises a pair of diodes, one of said diodes connected with its cathode to the positive line and its anode to the negative line and the other of said diodes connected with its cathode to the negative line and its anode to the positive line.

6. A speech display device in accordance with claim 4,

wherein said limiting means is approximately logarithmically responsive to the frequencies of the energy signal from said amplifier means such that the lower frequencies are not so limited as the higher frequencies, and

said differentiating means produces pulses with higher pulses for the lower frequencies than for the higher frequencies.

7. A speech display device, comprising detecting means for converting the range of speech frequencies including up to about 8000 Hz. to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above about 5000 Hz. are emphasized relative to the frequency components of the electrical energy below about 5000 Hz.,

means connected to said amplifying means for producing a pulse each time the amplitude level of the energy reaches an established threshold level, each produced pulse being of about equal duration and amplitude dimension with every other produced pulse,

integrating means connected to said pulse-producing means for producing a voltage amplitude dependent upon the number of pulses per unit of time and the amplitude of said pulses, and

display means connected to said integrating means for producing a visual display of said voltage as a function of time whereby a unique and identifiable trace is produced for every uttered speech sound.

8. A speech display device comprising detecting means for converting the range of speech frequencies including up to about 8000 Hz. to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above about 5000 Hz. are emphasized relative to the frequency components of the electrical energy below about 5000 Hz.,

means connected to said amplifying means for producing a pulse each time the amplitude level of the energy reaches an established threshold level, each produced pulse being of approximately equal duration and amplitude dimension with every other produced pulse,

integrating means connected to said pulse-producing means for producing a voltage amplitude dependent upon the number of pulses per unit of time and the amplitude of said pulses, said voltage having a unique and identifiable variable amplitude characteristic for every uttered speech sound,

wherein said integrating means comprises a diode with its anode connected to the output of said pulse producing means and its cathode to the input of the display means, and a parallel combination of a capacitance and the input resistance of said display means connected to the cathode of said diode, the opposite end of said parallel combination being connected to a signal return path, said capacitance and said resistance establishing the time constant for said integrating means.

9. A speech display device comprising detecting means for converting the range of speech frequencies including up to about 8000 Hz. to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above about 5000 Hz. are emphasized relative to the frequency components of the electrical energy below about 5000 Hz.,

means connected to said amplifying means for producing a pulse each time the amplitude level of the energy reaches an established threshold level, each produced pulse being of approximately equal duration and amplitude dimension with every other produced pulse,

integrating means connected to said pulse-producing means for producing a voltage amplitude dependent upon the number of pulses per unit of time and the amplitude of said pulses, said voltage having a unique and identifiable variable amplitude characteristic for every uttered speech sound,

wherein said integrating means comprises a first resistance connected to the output of said pulse producing means,

a first capacitance connected to the output of said first resistance with its opposite end being connected to a signal return path,

a second resistance with a first end connected to the connection between said first resistance and said first capacitance and with its second end being connected to the input of said display means, and having a resistance value approximately three times that of said first resistance, and

a second capacitance connected to the end of said second resistance that is connected to the display means with the opposite end of said second capacitance connected to the signal return path, and having a capacitance value approximately one-third that of said first capacitance,

said capacitance and said resistances establishing the time constant for said integrating means.

10. A system for providing a comparison of a displayed speech sound with a recognized reference, comprising [...] as a function of time for every uttered speech sound,

means for displaying said voltage adjacent a visual reference for convenient comparison therewith.

11. A speech display device for comparing a displayed speech sound with a recognized reference, comprising first electronic means for presenting a recorded reference voltage being characteristic of the recognized reference sound;

second electronic means for presenting a voltage being characteristic of the speech sound to be compared, including detecting means for converting the range of frequencies of said speech sound including up to approximately 800 cycles per second to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above an intermediate frequency, which is below approximately 5000 cycles per second, are emphasized relative to the frequency components of the electrical energy below said intermediate frequency, means connected to said amplifier means responsive to time axis crossings of the energy signal from said amplifier means so as to produce an output comprising a series of spiked pulses at substantially constant amplitude, the number of said pulses being proportional to the number of time axis crossings,

voltage summing means connected to the last said means for producing a voltage the amplitude of which corresponds to the number of said spiked pulses per unit of time, said voltage having a unique and identifiable shape for every uttered speech sound; and

a display device having dual display channels, one display channel being connected to the output of said first electronic means presenting the recorded reference voltage and the other display channel being connected to the second electronic means presenting the speech sound to be compared so that the two channels are viewable substantially simultaneously.

12. A device for comparing a speech sound of unknown quality with a recognized reference sound, comprising a first electronic means for presenting a recorded reference voltage being characteristic of the recognized reference sound, second electronic means for presenting a voltage being characteristic of the speech sound to be compared,

comparator means connected to said first electronic means and second electronic means for producing a voltage which is a measure of the voltage difference between said first and second means,

indicating means connected to said comparator means triggered when the voltage from said comparator means exceeds a preset amplitude value, said indicating means comprising a limit circuit connected to said comparator means that produces an output signal when a preset voltage amplitude value from said comparator means is exceeded, and

warning means connected to said limit circuit and activated by said output signal.

13. A device for comparing a speech sound of unknown quality with a recognized reference sound, comprising a first electronic means for presenting a recorded reference voltage being characteristic of the recognized reference sound, second electronic means for presenting a voltage being characteristic of the speech sound to be compared,

comparator means connected to said first electronic means and second electronic means for producing a voltage which is a measure of the voltage difference between said first and second means,

indicating means connected to said comparator means triggered when the voltage from said comparator means exceeds a preset amplitude value, and

means connected to said comparator means and to said electronic means for resetting the preset limit to a new value when the voltage from said comparator means does not exceed the preset amplitude value, said resetting reducing the preset limit of said indicating means.

14. A speech display device for comparing a speech sound of unknown quality with a recognized reference sound, comprising first electronic means for presenting a recorded reference voltage being characteristic of the recognized reference sound,

second electronic means for presenting a voltage being characteristic of the speech sound to be compared, said voltage produced by the second electronic means being related to the phase reversals per unit time of an electrical signal corresponding to such speech sound with higher frequencies preemphasized,

comparator means connected to said first electronic means and said second electronic means for producing a voltage which is a measure of the voltage difference between the outputs of said first and second means, and

a display device connected to said comparator means for displaying the voltage level as an indication of quality of the speech sound as related to the reference sound.

15. A device for comparing a speech sound of unknown quality with a recognized reference sound, comprising a first electronic means for presenting a recorded reference voltage being characteristic of the recognized reference sound, including detecting means for converting the range of frequencies in said reference sound including up to approximately 8000 cycles per second to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above an intermediate frequency, which is below approximately 5000 cycles per second, are emphasized relative to the frequency components of the electrical energy below said intermediate frequency,

means connected to said amplifier means responsive to time axis crossings of the energy signal from said amplifier means so as to produce an output comprising a series of spiked pulses at substantially constant amplitude, the number of said pulses being proportional to the number of time axis crossings,

voltage summing means connected to the last said means for producing a voltage the amplitude of which corresponds to the number of said spiked pulses per unit of time;

a second electronic means for presenting a recorded reference voltage being characteristic of the speech sound to be compared, including detecting means for converting the range of speech frequencies including up to approximately 8000 cycles per second to electrical energy,

amplifying means connected to said detecting means such that the frequency components of the electrical energy above an intermediate frequency, which is below approximately 5000 cycles per second, are emphasized relative to the frequency components of the electrical energy below said intermediate frequency,

means connected to said amplifier means responsive to time axis crossings of the energy signal from said amplifier means so as to produce an output comprising a series of spiked pulses at substantially constant amplitude, the number of said pulses being proportional to the number of time axis crossings,

voltage summing means connected to the last said means for producing a voltage the amplitude of which corresponds to the number of said spiked pulses per unit of time;

comparator means connected to said first electronic means and said second electronic means for producing a voltage which is a measure of the voltage difference between said first and second electronic means; and

voltage summing means connected to said comparator means for producing a voltage the amplitude of which corresponds to the total voltage difference taken over the entire sound interval.

16. A device in accordance with claim 11, and including measurement means connected to said voltage summing means for producing an indication of conformity of said speech sound of unknown quality to said reference sound.

References Cited

UNITED STATES PATENTS

3,202,761 8/1965 Bibbero 179-1
3,294,918 12/1966 Gold 179-1
3,316,353 4/1967 Dersch 179-1

KATHLEEN H. CLAFFY, Primary Examiner.

R. MURRAY, Assistant Examiner.