US3522376A - Single-equivalent formant prenormalizer utilizing feedback - Google Patents

Publication number
US3522376A
Authority
US
United States
Prior art keywords
formant
amplitude
frequency
equivalent
formants
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US702623A
Inventor
Louis R Focht
Charles F Teacher
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Space Systems Loral LLC
Original Assignee
Philco Ford Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1968-02-02
Filing date
1968-02-02
Publication date
1970-07-28
Application filed by Philco Ford Corp
Application granted
Publication of US3522376A
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00


Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Description

United States Patent Office 3,522,376
SINGLE-EQUIVALENT FORMANT PRENORMALIZER UTILIZING FEEDBACK
Louis R. Focht, Huntingdon Valley, and Charles F. Teacher, Philadelphia, Pa., assignors to Philco-Ford Corporation, Philadelphia, Pa., a corporation of Delaware
Filed Feb. 2, 1968, Ser. No. 702,623
Patented July 28, 1970
Int. Cl. G01l 1/00
U.S. Cl. 179-1
5 Claims
ABSTRACT OF THE DISCLOSURE
A single-equivalent formant speech analysis system in which the amplitudes of the low frequency formants of a sound are decreased when a dominant high frequency formant is detected or increased when a dominant low frequency formant is detected. This is accomplished by using the output of the single-equivalent formant detector as a feedback signal to regulate the amplitude of the low frequency formants.
by a single signal the amplitude of which is representative of the frequency of the dominant formant of the sound. As used herein, the dominant formant of a sound is the formant of largest amplitude.
It has been found that for some speakers, the amplitudes of the high frequency formants of some sounds tend to deviate from the optimum or standard (based on statistical determinations) for which the aforementioned single-equivalent formant speech analysis system is calibrated. This deviation can produce errors in operation and hence makes the analyzer unreliable for these speakers.
It is therefore an object of the present invention to provide a single-equivalent formant speech analysis system of the type described in the aforementioned copending application which will respond reliably to a wider range of speech sounds.
It is a further object of the present invention to provide a single-equivalent formant speech analysis system of the type described in the aforementioned copending application capable of responding reliably to all speakers.
According to the present invention, the differences between the amplitudes of the high frequency formants of a sound and the amplitudes of its low frequency formants are increased whenever a dominant high frequency formant is detected. This increase is controlled in a feedback arrangement by a signal having an amplitude representative of the frequency of the single-equivalent formant of the sound.
In a preferred embodiment of the present invention an electrical representation of a speech wave is supplied to a high-pass filter channel and to a low-pass filter channel. The high-pass filter channel passes information unaltered to a voltage summation network, the output of which is supplied to a single-equivalent formant frequency analyzer of the type described in the aforementioned copending application. The output of the low-pass filter channel, on the other hand, is supplied to a variable gain network, the output of which is supplied also to the voltage summation network. The gain of the variable gain network is controlled by the output of the single-equivalent formant frequency analyzer in a feedback arrangement.
The above objects and other objects inherent in the present invention will become more apparent when considered in conjunction with the following specification and drawings in which:
FIG. 1 is a block diagram of a single-equivalent formant frequency analyzer in accordance with the present invention;
FIG. 2 is a graph showing the relative formant amplitudes for the vowel sounds u (boot) and i (eve); and
FIG. 3 is a waveform representation of the single-equivalent formant frequency voltage when the word "buoy" is uttered.
Referring now to the drawings, the block diagram of FIG. 1 shows a single-equivalent formant frequency analysis system in accordance with the present invention. An electrical representation of a speech wave, such as produced by a standard telephone carbon microphone (not shown), is supplied to a low-pass filter 2 and to a high-pass filter 4. In accordance with a preferred embodiment of the present invention, filter 2 is designed to pass energy in a frequency band extending from approximately cycles per second (the lower limit of ordinary telephone transmission) to 1500 cycles per second and filter 4 is designed to pass energy in a frequency band extending from approximately 1500 cycles per second to approximately 3200 cycles per second (the upper limit of ordinary telephone transmission). It is not intended that the invention be limited to the frequency bands set forth above.
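The band split performed by filters 2 and 4 can be illustrated with a minimal digital sketch. It is not the analog circuit of the patent: it assumes a sampled signal, a hypothetical 16 kHz sample rate, and a one-pole low-pass whose complement serves as the high band. The function names `split_bands`, `rms`, and `tone` are illustrative only, and the outer band limits of the preferred embodiment are not modeled.

```python
import math


def split_bands(samples, sample_rate=16000.0, crossover_hz=1500.0):
    """Split samples into (low_band, high_band) around crossover_hz.

    Assumption: a one-pole low-pass stands in for filter 2, and the
    residue (input minus low band) stands in for filter 4.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low_band, high_band = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # one-pole low-pass update
        low_band.append(state)
        high_band.append(x - state)    # complementary high band
    return low_band, high_band


def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq_hz, n=1600, sample_rate=16000.0):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
```

With these assumptions, a 500 cycle per second tone ends up mostly in the low band and a 2500 cycle per second tone mostly in the high band, which is the property the rest of the system relies on.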
Filter 4 is coupled through a voltage summation network 6 of a type well known in the electronics art, for example a simple resistive adder network, to a single-equivalent formant frequency detector 8. Detector 8 produces a signal the amplitude of which is representative of the single-equivalent formant frequency of the input speech wave by measuring the period of the first major oscillation of the input speech wave. Detector 8 may consist of a bistable switching device coupled to a pulse width to amplitude converter. The construction and operation of detector 8 are described in detail in the aforementioned copending application.
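As a rough sketch of the measurement attributed to detector 8, the period of the first oscillation can be estimated from zero crossings and then converted to a level. The details of the actual detector are in the copending application and are not reproduced here; the use of upward zero crossings, the function names, and the `volts_per_second` scale factor are all assumptions made for illustration.

```python
import math


def first_oscillation_period(samples, sample_rate):
    """Return the time in seconds between the first two upward zero
    crossings, or None if fewer than two are found.

    Assumption: this interval stands in for the width of the pulse
    produced by the bistable switching device.
    """
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0.0 <= samples[i]:
            crossings.append(i)
            if len(crossings) == 2:
                return (crossings[1] - crossings[0]) / sample_rate
    return None


def pulse_width_to_amplitude(period_s, volts_per_second=1000.0):
    """Convert a measured period to an output level.

    The level grows with the period, so it falls as the frequency of
    the measured oscillation rises.
    """
    return volts_per_second * period_s
```

Under these assumptions a 500 cycle per second oscillation yields a larger output than a 2000 cycle per second oscillation, matching the inverse relation between detector output and dominant formant frequency described later in the specification.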
Filter 2 is coupled through a variable gain (multiplier) network 10 to a second input of voltage summation network 6. The gain of network 10 is controlled by the output signal of detector 8 which is applied to network 10 by connection 12. The direction of the change in gain of network 10 corresponds directly to the direction of the change in amplitude of the output signal of detector 8. That is, the gain of network 10 will be decreased when the amplitude of the output signal from detector 8 decreases and will be increased when the amplitude of the output signal from detector 8 increases. A suitable multiplier network for the system of the present invention is illustrated (FIG. 3) and described in U.S. Pat. No. 3,017,019, issued to V. R. Briggs on Jan. 16, 1962, entitled, Pulse Width Signal Multiplying System.
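The relationship between the detector output, the gain of network 10, and the summation in network 6 can be sketched as follows. The source specifies only that the gain moves in the same direction as the detector output; the linear mapping, the voltage and gain limits, and the function names below are hypothetical choices for illustration.

```python
def low_band_gain(detector_volts, volts_min=0.5, volts_max=2.0,
                  gain_min=0.25, gain_max=1.0):
    """Map the detector output to a gain for the low band.

    Assumption: a clamped linear map. The only property taken from
    the source is that gain rises and falls with the detector output.
    """
    fraction = (detector_volts - volts_min) / (volts_max - volts_min)
    fraction = min(1.0, max(0.0, fraction))
    return gain_min + fraction * (gain_max - gain_min)


def summation_network(low_band, high_band, gain):
    """Add the unaltered high band to the gain-scaled low band."""
    return [h + gain * l for l, h in zip(low_band, high_band)]
```

A small detector output (dominant high frequency formant) therefore attenuates the low band before summation, while a large detector output passes it at close to full level.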
The advantages achieved by the circuit of FIG. 1 will be apparent when the circuit of FIG. 1 is analyzed in conjunction with the articulation of the word "buoy." This word contains the vowel sounds u (boot) and i (eve). FIG. 2 shows the frequencies of the first three formants of these vowel sounds plotted against their relative formant amplitudes in db after a 9 db per octave high frequency emphasis. The 9 db per octave high frequency emphasis is necessary to illustrate the effect of the formants on the human hearing mechanism because it is believed that a high frequency emphasis of approximately 9 db per octave is performed in the human hearing mechanism. In FIG. 2 the solid line 22 illustrates that the amplitude of the first formant F1 of the vowel sound u (boot) is larger than the amplitude of the second and third formants F2 and F3 for this vowel sound while the solid line 24 illustrates that the amplitudes of the second and third formants F2 and F3 of the vowel sound i (eve) are substantially larger (for most speakers) than the first formant F1 for this vowel sound.
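A 9 db per octave emphasis means that each doubling of frequency adds 9 db of relative level. The short sketch below computes that weighting; the 1000 cycle per second reference point and the function name are assumptions, since the source does not say where the emphasis curve is anchored.

```python
import math


def emphasis_db(freq_hz, reference_hz=1000.0, slope_db_per_octave=9.0):
    """Relative emphasis in db at freq_hz for a constant slope per octave.

    Assumption: zero emphasis at reference_hz (hypothetical anchor).
    """
    octaves = math.log2(freq_hz / reference_hz)
    return slope_db_per_octave * octaves
```

With this anchor, a formant one octave above the reference is raised by 9 db and one an octave below is lowered by 9 db, which is why the higher formants of a vowel such as i can appear dominant after emphasis.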
As explained in the above-mentioned copending application, the amplitude of the signal from formant frequency detector 8 is inversely proportional to the frequency of the dominant formant. Therefore, ideally, when the word "buoy" is spoken, the output of detector 8 will have the waveform indicated as A in FIG. 3. Region x of waveform A represents articulation of the vowel sound u (boot) of the word "buoy." Since the first formant F1 of this sound is of greater amplitude than the second and third formants F2 and F3 of this sound (FIG. 2), region x is of relatively high amplitude. Region y of waveform A represents articulation of the vowel sound i (eve) of the word "buoy." Since the largest amplitude formant F2 of this sound is of greater frequency than that of the largest amplitude formant F1 of the vowel sound u (FIG. 2), region y is of smaller amplitude than region x.
For some speakers, the amplitudes of the dominant high frequency formants for some vowel sounds are lower than those of most speakers. The formant amplitude-frequency distribution of the vowel sound i for such a speaker may be as shown by the dashed curve 26 in FIG. 2. When a signal having this formant amplitude-frequency distribution is supplied to detector 8, detector 8 usually makes the initial determination that the second formant F2 is dominant, that is, of largest amplitude, but, due to the small difference between the amplitudes of the first and second formants F1 and F2, this determination is often only temporary and detector 8 often produces an oscillating output signal such as waveform B of FIG. 3. Waveform B indicates that the detector 8 is designating in a random fashion the second formant F2 and then the first formant F1 as the dominant formant.
According to the present invention this random fluctuation of the signal representative of the single-equivalent formant frequency is corrected by decreasing the amplitude of the low frequency formants of a sound when a dominant high frequency formant is detected. This correction is achieved by using the output of detector 8 as a gain control for variable gain network 10. When a dominant high frequency formant is detected initially (represented by a small amplitude signal from detector 8), the gain of network 10 is decreased, thus decreasing the amplitude of the low frequency formants of the signal supplied to detector 8 relative to the high frequency formants of this signal (dotted curve 28 of FIG. 2). This increases the amplitude difference between the high and low frequency formants, providing prolonged detection of the high frequency formants as the dominant formants and thereby increasing the stability and reliability of the analysis system for all speakers.
This decrease in the amplitude of the low frequency formants of a sound in order to detect a dominant high frequency formant of the same sound does not prevent the subsequent detection of a dominant low frequency formant of a different sound because the amplitude of the latter formant will generally be high enough so that it produces an increase in the amplitude of the output signal of detector 8 even though its amplitude is decreased initially by network 10. This increase in the amplitude of the output signal of detector 8 increases the gain of network 10 which further increases the amplitude of the output signal of detector 8. The result of these increases is an output signal from detector 8 which represents articulation of a sound having a dominant low frequency formant.
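The stabilizing effect of the feedback, and the recovery when a sound with a dominant low frequency formant follows, can be illustrated with a highly idealized frame-by-frame model. It assumes the detector simply picks the larger of the two band amplitudes after the low band has been scaled, and that the gain switches between two hypothetical values; the real network varies its gain continuously. All names and numbers are illustrative.

```python
def run_feedback_loop(frames, use_feedback=True,
                      gain_when_high=0.5, gain_when_low=1.0):
    """Return one 'low' or 'high' dominance decision per frame.

    frames: list of (low_amplitude, high_amplitude) pairs, standing in
    for the formant levels in the two filter channels.
    Assumption: a two-level gain replaces the continuous gain control.
    """
    gain = gain_when_low
    decisions = []
    for low_amp, high_amp in frames:
        # Idealized detector 8: the larger band after scaling wins.
        decision = "high" if high_amp > gain * low_amp else "low"
        decisions.append(decision)
        if use_feedback:
            # Idealized network 10: reduce low-band gain after a
            # 'high' decision, restore it after a 'low' decision.
            gain = gain_when_high if decision == "high" else gain_when_low
    return decisions


# A speaker whose F1 and F2 levels are nearly equal, with slight jitter.
weak_f2_frames = [(0.95, 1.0), (1.05, 1.0), (0.95, 1.0), (1.05, 1.0)]
# The same speaker followed by a sound with a strong low formant.
then_low_frames = weak_f2_frames[:2] + [(3.0, 1.0), (3.0, 1.0)]
```

Without feedback the decisions for `weak_f2_frames` alternate between the two formants, as in waveform B. With feedback the first "high" decision lowers the low-band gain and the decision holds. A subsequent strong low frequency formant still wins despite the reduced gain, after which the gain is restored.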
While the invention has been described with reference to a particular embodiment thereof, it will be apparent that various modifications and other embodiments thereof will occur to those skilled in the art within the scope of the invention. Accordingly, we desire the scope of our invention to be limited only by the appended claims.
What we claim is:
1. An improved system for analyzing a speech wave comprising first means for supplying an electrical representation of an acoustic speech wave, the formants of said speech wave having a given frequency-amplitude relationship; second means for producing a signal representative of the frequency of the single-equivalent formant of a speech wave applied thereto; and third means coupled between said first means and said second means, said third means including control means for changing the frequency-amplitude relationship of the formants of said speech wave supplied to said second means whenever a signal indicative of the presence in said speech wave of a dominant high frequency formant appears at the output of said second means.
2. The system of claim 1 wherein said control means increases the difference between the amplitudes of the high frequency formants of said speech wave and the amplitudes of the low frequency formants of said speech wave in response to said signal indicative of the presence in said speech wave of a dominant high frequency formant.
3. The system of claim 2 wherein said control means includes a low pass filter coupled to said first means, a high pass filter coupled to said first means, a variable gain network having an input coupled to said low pass filter and an input coupled to the output of said second means, and a signal summation network having inputs coupled to the outputs of said high pass filter and said variable gain network, and an output coupled to said second means.
4. The system of claim 3 wherein said low pass filter has a bandpass extending from approximately cycles per second to approximately 1500 cycles per second and said high pass filter has a bandpass extending from approximately 1500 cycles per second to approximately 3200 cycles per second.
5. The system of claim 1 wherein said control means decreases the amplitudes of the low-frequency formants of said speech wave in response to said signal indicative of the presence in said speech wave of a dominant high frequency formant.
No references cited.
KATHLEEN H. CLAFFY, Primary Examiner
J. B. LEAHEEY, Assistant Examiner
US702623A 1968-02-02 1968-02-02 Single-equivalent formant prenormalizer utilizing feedback Expired - Lifetime US3522376A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US70262368A 1968-02-02 1968-02-02

Publications (1)

Publication Number Publication Date
US3522376A (en) 1970-07-28

Family

ID=24821982

Family Applications (1)

Application Number Title Priority Date Filing Date
US702623A Expired - Lifetime US3522376A (en) 1968-02-02 1968-02-02 Single-equivalent formant prenormalizer utilizing feedback

Country Status (1)

Country Link
US (1) US3522376A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Similar Documents

Publication Publication Date Title
US4151368A (en) Music synthesizer with breath-sensing modulator
US5243660A (en) Directional microphone system
US3429976A (en) Electrical woodwind musical instrument having electronically produced sounds for accompaniment
US4322579A (en) Sound reproduction in a space with an independent sound source
KR890005973A (en) Frequency response characteristics adjustment method and device
KR100260224B1 (en) Howling preventing apparatus
US4162461A (en) Apparatus for extracting the fundamental frequency from a complex audio wave form
US4164626A (en) Pitch detector and method thereof
US4506379A (en) Method and system for discriminating human voice signal
US3522376A (en) Single-equivalent formant prenormalizer utilizing feedback
US3213199A (en) System for masking information
GB1309700A (en) Speech recognition apparatus
US3518566A (en) Audio system with modified output
JP2979119B2 (en) Automatic dynamic range control circuit
US3668322A (en) Dynamic presence equalizer
US3573374A (en) Formant vocoder utilizing resonator damping
DE69608822T2 (en) HEARING AID WITH IMPROVED PERCENTAGE GENERATOR
US2219729A (en) Device employed in the conversion of electrical energy into acoustic energy and vice versa
US3548100A (en) Formant frequency extractor
JPH06338746A (en) Agc circuit for audio apparatus
JPS6119297A (en) Low frequency correction circuit
US5220287A (en) Voice processing apparatus
US6633847B1 (en) Voice activated circuit and radio using same
JPS607848B2 (en) automatic volume adjustment device
US3491205A (en) Plural formant speech synthesizer