US2691137A - Device for extracting the excitation function from speech signals - Google Patents

Publication number
US2691137A
Authority
US
United States
Prior art keywords
channels
output
frequency
plurality
low frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US296101A
Inventor
Caldwell P Smith
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
US Air Force
Original Assignee
US Air Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Description

C. P. SMITH DEVICE FOR EXTRACTING THE EXCITATION FUNCTION FROM SPEECH SIGNALS Filed June 27, 1952 INVENTOR. CALDWELL P. SMITH

Patented Oct. 5, 1954 DEVICE FOR EXTRACTING THE EXCITATION FUNCTION FROM SPEECH SIGNALS Caldwell P. Smith, Boston, Mass., assignor to the United States of America as represented by the Secretary of the Air Force Application June 27, 1952, Serial No. 296,101

(Granted under Title 35, U. S. Code (1952), sec. 266) 4 Claims.

The invention described herein may be manufactured and used by or for the Government for governmental purposes without the payment to me of any royalty thereon.

This invention relates to a device for producing electrical signals, one of which is indicative of the total amount of energy in a plurality of channels, another of which is available for distinguishing between voiced and unvoiced speech sounds, and a third of which is available as a pitch signal.

The device shown in the single figure of the accompanying drawing and contemplated hereby comprises an input bus 35 of two conductors on which is impressed an electrical signal that may be indicative of sound, such as human speech.

The bus 35 is connected in parallel to a plurality of channels 1 to 32, inclusive, of which the channels 3 to 30 are not shown for simplicity. The components within the channels are substantially duplicates of each other and hence corresponding channel components are designated by corresponding numerals primed.

The channels are of narrow frequency bandwidths which divide the speech or other frequencies into contiguous bands. Each filter is of the tuned audio band-pass type shown in the drawing in the dashed block designated 36 as a variable Q audio filter or more simply as a single tuned circuit. The filters are arranged with a frequency separation from each other corresponding to the Koenig scale. Each filter passes its output to a balanced full-wave rectifier 31.
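One channel of this kind, a single tuned circuit followed by a full-wave rectifier, can be approximated on sampled data. The sketch below is only an illustration under stated assumptions, not the patented circuit: it substitutes a two-pole digital band-pass section (the common audio "cookbook" band-pass form with unity gain at the center frequency) for the analog tuned circuit 36, and takes the absolute value in place of the balanced rectifier. The function names, the sample rate and the use of the tabulated bandwidth to set Q are all hypothetical choices.

```python
import math

def bandpass_coeffs(center_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole band-pass section with unity gain at center_hz (assumed stand-in
    for the single tuned circuit; Q is taken as center / bandwidth)."""
    q = center_hz / bandwidth_hz
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def rectified_channel(signal, center_hz, bandwidth_hz, sample_rate_hz):
    """Filter the signal in one channel, then full-wave rectify the result."""
    b, a = bandpass_coeffs(center_hz, bandwidth_hz, sample_rate_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(abs(y))  # full-wave rectification
    return out

if __name__ == "__main__":
    fs = 16000  # hypothetical sample rate
    tone = [math.sin(2.0 * math.pi * 448.0 * n / fs) for n in range(fs)]
    in_band = rectified_channel(tone, 448.0, 100.0, fs)
    out_of_band = rectified_channel(tone, 2481.0, 223.0, fs)
    print(sum(in_band[fs // 2:]) / (fs // 2))
    print(sum(out_of_band[fs // 2:]) / (fs // 2))
```

A tone at a channel's center frequency yields a large rectified output in that channel and a small one in a distant channel, which is the behavior the summation network relies on.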

Electrical signals suitable for distinguishing between voiced and unvoiced sounds are obtained by the division of the network into two sections, one embracing low frequency balanced outputs and the other embracing high frequency balanced outputs from the rectifiers 31, 31', etc. The division between these two sections is at approximately 3,000 cycles per second for male voices. The frequency division is so connected that three pairs of outputs numbered 42, 43 and 44 are obtained therebetween from a plurality of interconnected resistances. This plurality of interconnected resistances is connected with two pairs of input terminals, one from the low frequency filters and one from the high frequency filters, and has three pairs of output terminals 42, 43 and 44.

The first output 44 from the plurality of interconnected resistances is in accordance with the summation of all the positive and negative voltages as a measure of the total energy in the speech signal. The second output 42 is in accordance with the difference between the positive and negative voltages from the low frequency filter network, which is indicative of a pitch signal and uninfluenced by the positive and negative voltages from the high frequency channels. The third output 43 is equal to the algebraic difference of the high frequency and low frequency outputs going to a detector not shown, and designated on the drawing as voiced unvoiced detector. The outputs of the low frequency and high frequency filter groups are mixed through the plurality of interconnected resistors at contacts 43 to produce a difference voltage.
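The arithmetic performed by the resistance network can be sketched in a few lines. This is a simplified illustration that ignores the resistor scaling and the balanced (positive and negative) wiring: it assumes one rectified level per channel at a given instant, and a hypothetical zero-based index `first_high_channel` marking where the high frequency group begins. The function name is likewise an assumption.

```python
def matrix_outputs(channel_levels, first_high_channel):
    """Combine per-channel rectified levels into the three outputs.

    Returns (total_energy, pitch, voicing), standing in for terminal
    pairs 44, 42 and 43 respectively.
    """
    low_sum = sum(channel_levels[:first_high_channel])
    high_sum = sum(channel_levels[first_high_channel:])
    total_energy = low_sum + high_sum  # sum of both groups (terminals 44)
    pitch = low_sum                    # low group only (terminals 42)
    voicing = high_sum - low_sum       # high minus low (terminals 43)
    return total_energy, pitch, voicing

if __name__ == "__main__":
    # 32 channels with most of the energy in the first 20 (a voiced-like case).
    levels = [1.0] * 20 + [0.25] * 12
    print(matrix_outputs(levels, 20))
```

With the sign convention chosen here, a negative third value corresponds to energy concentrated in the low group and a positive value to energy concentrated in the high group.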

The channel frequencies of 1,000 cycles per second and below are increased by hundreds. Above 1,000 cycles per second the frequencies increase in a logarithmic relation. Voiced sounds provide exponential pulses that are periodic at the frequency of the fundamental pitch. Unvoiced sounds are shown by wave traces having ragged edges. The detector to be connected with the terminal 43 is commonly available and hence is not shown.

The heart of the invention resides in the combination of the filters 36, the full-wave rectifiers 31 and the matrix or bridge system of the plurality of interconnected resistors 10 to 19, inclusive, and the capacitors 80, 81 and 82, between the members of the contact pairs 42, 43 and 44, whereby the low and high frequency inputs yield the three respective outputs at the terminals 42, 43 and 44.

The full-wave rectifier 31 gives a balanced D. C. output from an A. C. input. The plurality of interconnected resistances designated in the drawings 10 to 19, inclusive, is a highly complex device with an apparent mode of operation.

Illustrative filter frequencies and bandwidths for the channels are shown below:

Filter frequencies and bandwidths

Channel   f (cycles/sec.)   Δf (cycles/sec.)
1         97                50
2         158               75
3         246               [illegible]
4         347               100
5         448               100
6         548               100
7         649               100
8         749               100
9         849               100
10        949               100
11        1,040             [illegible]
12        1,117             [illegible]
13        1,287             [illegible]
14        1,42?             138
15        1,5??             150
16        1,720             102
17        1,888             176
18        2,070             [illegible]
19        2,270             [illegible]
20        2,481             223
21        2,715             243
22        2,968             [illegible]
23        3,244             285
24        3,540             [illegible]
25        3,803             34?
26        4,212             362
27        4,591             304
28        5,000             427
29        5,446             463
30        5,???             500
31        6,430             500
32        6,930             500

In the above chart, channels 1 through 20 cover the low frequency field up to approximately 2,481 cycles per second. Signals above this frequency are handled by the remaining channels in the device.
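Because each filter is described as a variable Q circuit, it can be useful to see what Q the tabulated values imply. The short sketch below takes Q as center frequency divided by bandwidth, which is the usual definition; treating the tabulated Δf as that bandwidth is an assumption, since the text does not say how Δf is measured. Only rows that are clearly legible in the chart are used, and the function name is hypothetical.

```python
def quality_factor(center_hz, bandwidth_hz):
    """Q of a tuned circuit, taken as center frequency over bandwidth."""
    return center_hz / bandwidth_hz

if __name__ == "__main__":
    # (center, bandwidth) pairs copied from legible rows of the chart.
    legible_rows = [(97, 50), (158, 75), (548, 100), (2481, 223), (5000, 427)]
    for center, bandwidth in legible_rows:
        print(center, bandwidth, round(quality_factor(center, bandwidth), 2))
```

Under this reading the implied Q rises from roughly 2 at the lowest channel to roughly 11 or 12 in the upper channels.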

Each single tuned circuit 36 comprises tuned circuit tube sections 50 and 50' with an inductor 51 shunted by a variable capacitor 52 between the grids of the tube sections. The cathodes of the tube sections 50 and 50' are shunted by an adjustable Q control potentiometer resistor 53, to which a variable tap 54 is adjustably applied. A two section cathode follower tube has one section 60 connected directly with one of the input leads from the bus 35 that also is connected through a resistor 55 with the grid of the tuned circuit first tube section 50 and through a resistor 58 with a B+ current source. The grid of the second section 60' of the cathode follower tube is connected directly to the other lead from the bus 35 that is connected directly to the plate of the tuned circuit tube section 50 and through resistor 57 with the B+ power supply. The B+ power supply is applied directly to the plates of the cathode follower tube sections 60 and 60' and through resistors 57 and 58 with the plates of the tuned circuit tube sections 50 and 50' respectively. The grid of the second section 50' of the tuned circuit tube is connected through a resistor 58 with the second lead from the bus 35 and with the grid of the cathode follower tube section 60. Output from the single tuned circuit 36 is from the two cathodes of the cathode follower tube sections 60 and 60' to the full-wave rectifier 31.

In the measurement of the excitation function, the plurality of interconnected resistances in the matrix containing the contacts 42, 43 and 44 is connected directly to the rectified output from the filters to provide a measure of the energy envelope as it fluctuates in time, averaged over one three-hundredth of a second by means of the resistance-capacitance smoothing circuit shown. The signal generated by this summation provides a representation of the excitation emitted from the vocal cords. The exponential wave rises steeply and decays gradually in synchronization with the opening and the closing of the vocal flaps during phonation.
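The resistance-capacitance smoothing can be imitated on sampled data with a one-pole low-pass filter. The sketch below assumes that "averaged over one three-hundredth of a second" corresponds to an RC time constant of 1/300 second; the text does not give component values, so that reading, the sample rate and the function name are assumptions.

```python
import math

def rc_smooth(samples, sample_rate_hz, time_constant_s=1.0 / 300.0):
    """One-pole low-pass filter, a discrete stand-in for an RC smoothing circuit."""
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
    level = 0.0
    out = []
    for x in samples:
        level += alpha * (x - level)  # move a fraction of the way toward the input
        out.append(level)
    return out

if __name__ == "__main__":
    fs = 48000  # hypothetical sample rate; 1/300 s is 160 samples here
    step = [1.0] * 480
    smoothed = rc_smooth(step, fs)
    print(smoothed[159])  # value after one time constant, about 63% of the step
```

A time constant this short smooths the ripple from the rectifiers while still letting the envelope follow individual pitch pulses, which recur much more slowly than 300 times per second for typical voices.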

In order to differentiate between voiced and unvoiced sounds, at the contacts 43 the summation network is divided into two sections, one of which sums the energy envelopes from all the low frequency filters and the other sums the high frequency filters. As the formants generally lie below 3,000 cycles per second the cross-over point has been set at this frequency. The two summation signals are mixed to produce a detection signal proportional to the difference between high and low frequency energy. In the resulting voltage wave form the sibilants produce upward deflection having ragged edges and the voiced sounds produce exponential pulses deflecting downward and periodic at the frequency of the fundamental pitch.
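The decision rule implied by this detection signal can be sketched as follows. This is an illustration under assumptions, not the circuit of the drawing: a direct discrete Fourier sum over one short frame replaces the analog filter bank and rectifiers, squared magnitude replaces the rectified envelope, and the sign of the high-minus-low difference is used as the decision. The function names, frame length and test signals are hypothetical.

```python
import math

def band_energies(frame, sample_rate_hz, crossover_hz=3000.0):
    """Energy below and above the cross-over, from a direct Fourier sum."""
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2):
        w = 2.0 * math.pi * k / n
        re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
        im = sum(x * math.sin(w * i) for i, x in enumerate(frame))
        energy = re * re + im * im
        if k * sample_rate_hz / n < crossover_hz:
            low += energy
        else:
            high += energy
    return low, high

def detection_signal(frame, sample_rate_hz):
    """High frequency energy minus low frequency energy."""
    low, high = band_energies(frame, sample_rate_hz)
    return high - low

def classify(frame, sample_rate_hz):
    """Positive difference is read as unvoiced, otherwise voiced."""
    return "unvoiced" if detection_signal(frame, sample_rate_hz) > 0.0 else "voiced"

if __name__ == "__main__":
    fs, n = 16000, 256
    # Voiced stand-in: harmonics of 120 cycles per second, all below 3,000.
    voiced = [sum(math.sin(2.0 * math.pi * 120.0 * h * i / fs) / h
                  for h in range(1, 21)) for i in range(n)]
    # Sibilant stand-in: a few components well above 3,000 cycles per second.
    hiss = [sum(math.sin(2.0 * math.pi * f * i / fs)
                for f in (4100.0, 5300.0, 6700.0)) for i in range(n)]
    print(classify(voiced, fs), classify(hiss, fs))
```

A practical detector would also need a rule for silence, where both sums are near zero; that is outside what the text describes and is omitted here.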

The low frequency summation signal provides a means of measuring the fundamental pitch of voiced sounds which is independent of the presence or absence of low-frequency components in the original speech signal. This is in contrast with conventional techniques which use a low-pass filter to separate the fundamental pitch.
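The independence from the fundamental component can be demonstrated with a signal that contains only the third, fourth and fifth harmonics of 120 cycles per second and no energy at 120 itself. The sketch below is a simplified illustration: it rectifies the whole signal as if it were a single wide channel instead of summing many narrow channels, and it picks the pitch period from the autocorrelation of the rectified wave, a measurement method the text does not specify. The function name, search range and sample rate are assumptions.

```python
import math

def estimate_pitch_hz(signal, sample_rate_hz, min_hz=60.0, max_hz=400.0):
    """Estimate pitch from the periodicity of the full-wave rectified signal."""
    envelope = [abs(x) for x in signal]  # full-wave rectification
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]
    n = len(centered)
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = int(sample_rate_hz / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag

if __name__ == "__main__":
    fs = 12000  # hypothetical sample rate
    # Harmonics 3, 4 and 5 of 120 cycles per second; the fundamental is absent.
    missing_fundamental = [
        sum(math.sin(2.0 * math.pi * f * i / fs) for f in (360.0, 480.0, 600.0))
        for i in range(1200)
    ]
    print(estimate_pitch_hz(missing_fundamental, fs))
```

A low-pass filter tuned to pass 120 cycles per second would find nothing in this signal, whereas the rectified wave still repeats once per pitch period.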

The pitch signal may be used as excitation for variable resonators in order to generate synthetic formants which can be added to the original speech in synchronism with the original excitation. This becomes more effective if the pitch signal is first passed through a nonlinear device in order to replace the high frequency harmonic structure extending up to 3,000 cycles per second which was contained in the original excitation pulses from the vocal cords and lost through progressive attenuation in the vocal resonators, the analyzing filters and the smoothing circuits.

The sum of the low- and high-frequency summation signals smoothed in a resistance-capacitance filter provides a running average of the total speech energy weighted by pre-emphasis of the high frequencies before analysis. This signal is used to control an automatic-gain-control amplifier ahead of the speech analyzer in order to provide an automatic normalization of the speech energy level.
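The normalizing action can be sketched as a gain that varies inversely with a running average of the rectified signal. This is a feed-forward simplification chosen for brevity: the text describes the control signal as derived after the analyzer and applied to an amplifier ahead of it, and it includes high frequency pre-emphasis, neither of which is modeled here. The target level, time constant, gain limit and function name are all assumptions.

```python
import math

def normalize_level(signal, sample_rate_hz, target_level=0.5,
                    time_constant_s=0.05, max_gain=100.0):
    """Scale the signal so its running average rectified level nears target_level."""
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
    level = 0.0
    out = []
    for x in signal:
        level += alpha * (abs(x) - level)  # running average of the rectified signal
        gain = min(target_level / level, max_gain) if level > 0.0 else max_gain
        out.append(x * gain)
    return out

if __name__ == "__main__":
    fs = 8000  # hypothetical sample rate
    quiet = [0.1 if n % 2 == 0 else -0.1 for n in range(fs)]
    loud = [2.0 if n % 2 == 0 else -2.0 for n in range(fs)]
    print(abs(normalize_level(quiet, fs)[-1]), abs(normalize_level(loud, fs)[-1]))
```

After the average has settled, inputs of very different strength come out at about the same level, which is the normalization the paragraph describes.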

What I claim is:

1. A circuit for producing a plurality of electrical signals, comprising a plurality of channels of contiguous frequency bands, and each channel containing a single tuned circuit, a balanced full-wave rectifier rectifying the output from said tuned circuit and having a positive and negative balanced output, a pair of smoothing resistors in the balanced output of each of said rectifiers and the plurality of said channels divided into two positive and negative outputs from low frequency channels and two outputs from high frequency channels, and a plurality of interconnected resistances between the balanced outputs from the pair of end resistors in the low frequency channels and from the end resistors in the high frequency channels leading to a pair of total energy signal terminals with a capacitor therebetween and a negative terminal resistively connected to the negative output from the low frequency channels and resistively connected to the negative output from the high frequency channels and a positive terminal resistively connected to the positive output from the low frequency channels and resistively connected to the positive output from the high frequency channels.

2. A circuit for producing a plurality of electrical signals, comprising a plurality of channels of contiguous frequency bands, and each channel containing a single tuned circuit, a balanced full-wave rectifier rectifying the output from said tuned circuit, and each channel terminating in a pair of resistors receiving their inputs from said balanced rectifier, said plurality of channels divided into low frequency channels and high frequency channels each having an output at a pair of channel leads of opposite polarity, and a plurality of interconnected resistances deriving their input from the pairs of channel leads and leading to a pair of pitch signal terminals with a capacitor therebetween and with a negative terminal resistively connected to the low frequency negative channel lead and with a positive terminal resistively connected to the low frequency positive channel lead.

3. A circuit for producing a plurality of electrical signals, comprising a plurality of channels of contiguous frequency bands, and each channel containing a single tuned circuit, a balanced full-wave rectifier rectifying the output from said tuned circuit, and a pair of resistors at the output end of each channel, said plurality of channels divided into low frequency channels increasing by cycles per second between channels up to 1,000 cycles per second and said high frequency channels increasing between channels in a logarithmic relation above 1,000 cycles per second, a pair of low frequency channel terminal leads of opposite polarity, a pair of high frequency channel terminal leads of opposite polarity, and a plurality of interconnected resistances deriving their input from the two pairs of channel leads and providing three pairs of output circuit terminals inclusive of a pair of voiced unvoiced detector terminals capacitively coupled together and with each separate detector terminal resistively connected to the low frequency channel lead of one polarity and to the high frequency channel lead of opposite polarity.

4. A circuit for producing a plurality of electrical signals comprising a plurality of tuned circuits each having a center frequency differing from the center frequency of each of the other circuits and having a predetermined bandwidth, said tuned circuits having a common input to which a complex electrical signal representative of sounds may be applied, those circuits having a center frequency at or immediately adjacent to the pitch of the sound comprising a low frequency group and said tuned circuits having center frequencies higher than those circuits comprising the low frequency group making up a higher frequency group, means to produce a balanced D. C. signal proportional to the sum of the outputs of those tuned circuits comprising the low frequency group, means to produce a balanced D. C. signal proportional to the sum of the outputs of those tuned circuits making up the high frequency group, a plurality of resistances defining a circuit having two pairs of inputs and three pairs of outputs and so interconnected that when the balanced D. C. signal proportional to the sum of the outputs of the low frequency group of tuned circuits is applied to one of said pair of input terminals and the balanced D. C. signal proportional to the sum of the outputs making up the high frequency group is applied to the other pair of input terminals, the first pair of output terminals that are capacitively coupled together and each of which first output terminals is resistively connected to both the low frequency and high frequency D. C. signal of the same polarity such that the first pair of output terminals will produce a total energy signal which is the sum of the two input signals, the second pair of output terminals that are capacitively coupled together and each of which second output terminals is resistively connected to an output from the low frequency group of tuned circuits such that the second pair of output terminals will produce a signal proportional only to the input signal representative of the output of the low frequency group and the third pair of output terminals that are capacitively coupled together and each of which third output terminals is resistively connected to an output terminal of one polarity from the low frequency group of tuned circuits and to an output terminal of opposite polarity from the high frequency group of tuned circuits such that the third pair of output terminals will produce a signal proportional to the difference between the first input signal and the second input signal.

References Cited in the file of this patent

UNITED STATES PATENTS

Number      Name            Date
2,575,909   Davis et al.    Nov. 20, 1951
2,575,910   Mathes          Nov. 20, 1951

US296101A 1952-06-27 1952-06-27 Device for extracting the excitation function from speech signals Expired - Lifetime US2691137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US296101A US2691137A (en) 1952-06-27 1952-06-27 Device for extracting the excitation function from speech signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US296101A US2691137A (en) 1952-06-27 1952-06-27 Device for extracting the excitation function from speech signals
US58015256 USRE24670E (en) 1952-06-27 1956-04-23 Device for extracting the excitation function from speech signals

Publications (1)

Publication Number Publication Date
US2691137A (en) 1954-10-05

Family

ID=23140608

Family Applications (1)

Application Number Title Priority Date Filing Date
US296101A Expired - Lifetime US2691137A (en) 1952-06-27 1952-06-27 Device for extracting the excitation function from speech signals

Country Status (1)

Country Link
US (1) US2691137A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3171406A (en) * 1961-09-26 1965-03-02 Melpar Inc Heart beat frequency analyzer
US3198884A (en) * 1960-08-29 1965-08-03 Ibm Sound analyzing system
US3600516A (en) * 1969-06-02 1971-08-17 Ibm Voicing detection and pitch extraction system
US3789399A (en) * 1961-09-28 1974-01-29 Us Air Force Frequency discriminator device
US10278637B2 (en) 2012-08-29 2019-05-07 Brown University Accurate analysis tool and method for the quantitative acoustic assessment of infant cry

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2575910A (en) * 1949-09-21 1951-11-20 Bell Telephone Labor Inc Voice-operated signaling system
US2575909A (en) * 1949-07-01 1951-11-20 Bell Telephone Labor Inc Voice-operated system

