US3600516A - Voicing detection and pitch extraction system - Google Patents

Voicing detection and pitch extraction system

Info

Publication number
US3600516A
Authority
US
United States
Prior art keywords
output
frequency
speech
signal
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US829414A
Inventor
John H King Jr
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US829414A (US3600516A)
Priority to FR7015370A (FR2045772A6)
Priority to JP45038773A (JPS508602B1)
Priority to DE19702025233 (DE2025233A1)
Priority to GB25406/70A (GB1246079A)
Application granted
Publication of US3600516A
Anticipated expiration
Expired - Lifetime

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • a frequency discriminator whose input is provided by the band-pass filtered output of the limiter provides a voltage waveform whose spectral energy distribution is utilized for discrimination between voiced and unvoiced sounds.
  • the present invention is directed toward voicing detection and voice pitch extraction.
  • the system embodiment employs a plurality of individual band-pass filters each having a bandpass width greater than the highest fundamental frequency of the voice, and sufficient to pass at least two harmonics.
  • a measure of the speech waveform power spectrum periodicity, for all voiced sounds issues as a modulated waveform having a periodicity equal to the voice fundamental.
  • the periodicity of the speech waveform spectrum may be measured with a high degree of accuracy and reliability because the outputs corresponding to voiced sounds are highly correlated, whereas random noise, background noises and nonvoiced speech sounds provide complex waveforms that have low correlations.
  • the strength of the signal representative of the voice fundamental is greatly enhanced relative to other components of the modulated waveform by virtue of the signal processing properties of a hard limiter.
  • a voltage level is also rendered representing the voice fundamental pitch by means of a frequency discriminator.
  • the spectral energy distribution of the output of the frequency discriminator is utilized for discriminating between voiced and nonvoiced sounds.
  • the invention is, accordingly, directed to overcome the inability of prior art systems by being more accurately responsive to a wider variety of speech signals, especially those in which rapid fluctuations in the overall spectral energy distribution occur due to changes in the vocal tract cavity during production of a connected sequence of vowel sounds.
  • the capability of the present invention is generally achieved by means of a system which obtains a measure of the speech power spectrum periodicity signal, by suitable nonlinear signal processing, and renders a substantially DC signal representation of the voice fundamental frequency substantially independent of the absolute amplitude of the voice signal.
  • the primary object of the invention is directed to a voicing detection system and voice fundamental pitch extraction system, which has a higher degree of accuracy and reliability and is less costly than voicing detection and voice pitch tracking systems of the prior art.
  • Another object resides in the capabilities of the present invention to provide more meaningful data at lower costs than the prior art systems.
  • Another object resides in the provision of a highly sophisticated system which derives meaningful voicing data predicated upon detecting and measuring the speech waveform power spectrum periodicity.
  • Yet another object resides in the provision of a voicing detection system which provides a high degree of discrimination between voiced and unvoiced sounds over a wide dynamic range.
  • Still another object resides in the provision of a voice fundamental pitch extraction system capable of operation over a wide dynamic range.
  • FIG. 1 is a schematic representation showing the arrangement of the principal means constituting the voicing detection and pitch extraction system.
  • FIG. 2 is a detailed drawing of the voicing detection and pitch extraction system.
  • FIG. 1 shows a schematic arrangement of the principal means constituting the voicing detection and [...] individual full-wave rectifiers in a rectifier bank 4.
  • the rectified outputs from the rectifier bank 4 are transmitted to a signal processing network 14 by means of lines 4-1a through 4-15a.
  • by means of the signal processing network 14, the speech waveform is reduced to a substantially pure sinusoidal waveform, the frequency of which is proportionally related to the fundamental pitch of the input speech waveform whenever the latter results from voiced speech.
  • the output signal from the processing network 14 is passed on by line 10-1 to a frequency discriminator 11 which translates the instantaneous frequency of the waveform on line 10-1 into a substantially DC signal, the level of which is indicative of the instantaneous frequency of the voice fundamental pitch during intervals of voiced speech.
  • during intervals of no speech or unvoiced speech typical of consonant and fricative sounds, the outputs of the signal processing network 14 and the frequency discriminator 11 are essentially random waveforms. The presence and absence of these random waveforms are utilized to discriminate between intervals of voiced and unvoiced speech as follows.
  • the output of the frequency discriminator is passed by means of line 11-2 to a voice-no voice decision network 13 whose output line 13-1 provides a DC level of a given value when the input line 11-2 issues a pattern of random waveforms which in effect represents the presence of unvoiced sounds in the speech waveform or, a DC level of another value when the input line 11-2 issues said substantially DC signal which in effect represents the presence of voiced sounds in the speech spectrum.
  • the output from the frequency discriminator is substantially a DC level which rises and falls in response to relatively slow variations of the voice pitch.
  • This output is passed on to the input of a low-pass filter 12 by means of line 11-1, the voltage V on line 12-1 representing the output of the filter.
  • the function of this lowpass filter is to remove any small and rapid fluctuations superimposed on the slowly varying level.
  • the voltage V represents the instantaneous fundamental pitch of the voice during intervals of voiced speech.
  • FIG. 2 shows in more detail the preferred embodiment.
  • sound waves entering the system by way of the microphone 1, are converted into electrical waveform signals by means of the transducing properties of the microphone.
  • These electrical waveform signals enter an amplifier 2 by means of line 1a and are amplified to a suitable level.
  • the amplified signals enter, by way of line 2a, a filter bank 3 comprised of 15 individual filters, three of which are shown, namely, 3-1, 3-2 and 3-15.
  • the filters employed are of the active network type, each having a bandwidth of approximately 300 Hz., the topmost filter 3-1 having a center frequency of 300 Hz. and the lowermost filter 3-15 a center frequency of 3,000 Hz.
  • the filter bank 3 thus provides a plurality of orthogonal signal channels, controlled by the contiguously tuned filters, each providing, during an interval of voiced speech, a modulated waveform, the envelope of which has a period equal to the period of the fundamental of the voice. Modulation of the envelope of these waveforms results from the linear combination of waveforms constituting the harmonic components of the fundamental voice frequency.
  • the high degree of periodicity of the power spectrum of voiced speech waveforms results from the fact that the predominant mode of excitation of the vocal tract, during intervals of voiced speech, is by means of the glottal vibrator (vocal cords) which is known to possess a substantially sawtooth variation in the opening of the glottis.
  • the voiced sound waveforms are predominantly rich in harmonics that are integer multiples of the fundamental frequency which, for the male voice, extends from about 70 Hz. to 150 Hz. in normal speech, and the meaningful spectrum of which extends from about 300 Hz. to somewhat beyond 3,000 Hz.
  • a minimum of two harmonic components (for the highest fundamental pitch) will be spanned by the passband of each band-pass filter in the filter bank.
  • the modulated waveforms, issuing from the band-pass filters 3-1, 3-2, through 3-15 are passed through full-wave rectifiers, or detectors, 4-1, 4-2, through 4-15, by way of lines 3-1a, 3-2a, through 3-15a.
  • the function of the rectifiers is to provide a set of signals representative of the time variation of envelopes of the signals issuing from the band-pass filters 3-1, 3-2, through 3-15.
  • the outputs from the rectifiers are transmitted by way of lines 4-1a, 4-2a, through 4-15a to the fifteen inputs of the signal summing network which includes DC blocking capacitors 5-1a, 5-2a, through 5-15a and resistors 5-1b, 5-2b, through 5-15b.
  • the output of the signal summing network is passed through a band-pass filter 6, by means of line 50, having a passband extending from 70 Hz. to 250 Hz. that is more than sufficient to span the frequency range of the male voice fundamental frequency.
  • the output of band-pass filter 6 appearing on line 6-1, during intervals of voiced speech, reflects a fundamental frequency including possible second and third harmonics weaker than the fundamental.
  • During intervals of unvoiced speech the output of band-pass filter 6 is essentially a band of random noise having significant energy confined to the frequency range 70 Hz. to 250 Hz.
  • the signal issuing from the band-pass filter 6 is passed on to a balanced modulator 7 by way of line 6-1.
  • the function of the balanced modulator 7 is to shift the frequency range of the signal issuing from band-pass filter 6 to a considerably higher range of frequencies, thereby yielding a frequency-translated signal whose percentage bandwidth is quite narrow with respect to its expected frequency range.
  • the desired action is achieved by driving the balanced modulator 7 with a reference signal provided by local oscillator 6a connected by means of line 6a-1, the local oscillator frequency being typically 15 kHz.
  • the output of the balanced modulator consists of a double sideband suppressed carrier modulated waveform which is passed on to band-pass filter 8 by means of line 7-1.
  • The function of band-pass filter 8 is to select either the upper or lower sideband signal and reject the other sideband signal and any residual carrier signal at the local oscillator frequency of 15 kHz. which may be present in the output from the balanced modulator due to slight imbalance.
  • the band-pass filter 8 would have a passband extending from 15.070 kHz. to 15.250 kHz., when designed to select the upper sideband signal.
  • the signal output of band-pass filter 8 is substantially a frequency translated version of the signal issuing from band-pass filter 6, but with the distinguishing property of being narrow band.
  • This signal is passed on to a hard limiter 9 by way of line 8-1.
  • the output of the limiter 9 is passed on, by way of line 9-1, to a second band-pass filter 10, having essentially the same passband as the filter 8.
  • The combined action of limiter 9 and band-pass filter 10 is such that the signal issuing from the band-pass filter 10 will be of substantially constant amplitude with an average frequency linearly related to the voice fundamental during intervals of voiced speech.
  • during intervals of unvoiced speech the output of band-pass filter 10 is substantially a random noise signal with a significant energy spectrum extending from roughly 15.070 kHz. to 15.250 kHz.
  • This desirable signal processing property just described is the result of signal capture phenomenon exhibited by a hard limiting process followed by band-pass filtering.
  • the output of filter 10 is passed on to a frequency discriminator 11 by way of line 10-1.
  • One function of discriminator 11 is to detect the quasi instantaneous frequency of the signal issuing from filter 10 during intervals of voiced speech. During intervals of unvoiced speech the output of the frequency discriminator 11 is substantially a random noise signal with significant energy content extending from about 0 Hz. to around 180 Hz.
  • the output of the frequency discriminator is passed to a low-pass filter 12 by way of line 11-1.
  • the low-pass filter 12, by virtue of having a cutoff frequency of 15 Hz., serves to remove minor high frequency fluctuations from the output of the frequency discriminator so that the output V of low-pass filter 12, on line 12-1, provides a voltage level representation of the quasi instantaneous (short term average) voice fundamental frequency during intervals of voiced speech.
  • the output signal from the frequency discriminator is of random character.
  • the distinct difference in the character of the signals issuing from the frequency discriminator during intervals of voiced speech and intervals of unvoiced speech, is utilized by the voice-no voice decision network 13 to provide appropriate outputs in the following manner.
  • the output from the frequency discriminator 11 is passed on, by way of line 11-2, to a high-pass filter 17 having a cutoff frequency of about 50 Hz.
  • during intervals of voiced speech, the output signal from the frequency discriminator has a spectral energy distribution confined to frequencies below 50 Hz., so that the output of high-pass filter 17 is substantially zero.
  • during intervals of unvoiced speech, the signal output from the frequency discriminator is of substantial amplitude and of random character with a spectral energy distribution concentrated in the frequency range above 50 Hz.
  • the output of high-pass filter 17 is passed on to rectifier 15 by way of line 17-1 and the output from rectifier 15 is then passed on, by way of line 15-1, to low-pass filter 16 having a cutoff frequency of 15 Hz. From the foregoing it is seen that the output from low-pass filter 16 will be substantially a DC level during intervals of unvoiced speech that is different from the DC level output during intervals of voiced speech. These different DC signal levels are utilized in a decision rendering function by passing the output of low-pass filter 16 to a threshold detector circuit 21, by way of line 16-1.
  • the threshold detector circuit, which effects the actual decision, is comprised of a high gain differential DC amplifier 20, input resistor 18, and positive feedback resistor 19.
  • the detection threshold (i.e., the decision threshold) is controlled by the level of reference voltage V applied to the positive input of the differential amplifier 20.
  • the actual voice or no voice decision is indicated by which of two possible signal levels exists at the output V of the threshold detector circuit, on line 13-1.
  • a voicing detection apparatus for detecting voiced sounds present in speech waveforms comprising:
  • a filter bank constituted of a plurality of contiguously tuned band-pass filters responsive to said speech waveforms to provide modulated waveforms, the periodicity of the envelope of each of the latter waveforms corresponding to the periodicity of the voice fundamental;
  • a processing network responsive to said time variant signals to provide a substantially pure sinusoidal waveform whose frequency is proportionally related to the fundamental pitch of voiced sounds in the speech waveforms
  • said processing network comprising a summing network connected to the output of said detectors, a broad bandpass filter connected to the output of said summing network, and a modulator, connected to the output of said broad band-pass filter, to provide a frequency translated signal whose percentage bandwidth is reduced in relation to its expected frequency range;
  • a frequency discriminator interconnected to the output of said network, providing a substantially DC signal output, the level of which being a function of the instantaneous voice pitch during voiced speech, and
  • a decision network interconnected to the frequency discriminator output and providing a DC signal level of one value in response to signals representing unvoiced sounds and a DC signal level of another value in response to signals representing voiced sounds.
  • the voicing detection apparatus as in claim 1 wherein said processing network further includes a limiter for enhancing the ratio of the voice fundamental frequency to the harmonic frequencies, and a narrow band-pass filter connected to said limiter for rejecting unwanted higher harmonic frequencies.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

Voicing detection and pitch extraction from speech sounds are achieved by means of an embodiment including a plurality of band-pass filters, each having sufficient bandwidth to pass at least two harmonics of the fundamental voice frequency, whereby each provides a signal for all voiced sounds in the form of modulated waves, the envelopes of which have a periodicity equal to the voice fundamental. This periodicity is further enhanced by means of a hard limiter. A frequency discriminator whose input is provided by the band-pass filtered output of the limiter provides a voltage waveform whose spectral energy distribution is utilized for discrimination between voiced and unvoiced sounds.

Description

United States Patent 3,600,516
[72] Inventor: John H. King, Jr., Endwell, N.Y.
[21] Appl. No.: 829,414
[22] Filed: June 2, 1969
[45] Patented: Aug. 17, 1971
[73] Assignee: International Business Machines Corporation, Armonk, N.Y.
[54] VOICING DETECTION AND PITCH EXTRACTION SYSTEM
3 Claims, 2 Drawing Figs.
[51] Int. Cl.: G10l 1/04
[50] Field of Search: 179/1 AS, 15.55 R
[56] References Cited, UNITED STATES PATENTS
2,243,526 5/1941 Dudley 179/1 AS
2,340,364 2/1944 Bedford 179/1 AS
2,561,478 7/1951 Mitchell 179/1 AS
2,691,137 10/1954 Smith 179/1 AS
2,927,969 3/1960 Miller 179/1 AS
3,488,446 1/1970 Miller 179/1 AS
Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Jon Bradford Leaheey
Attorneys: Hanifin and Jancin and Andrew Taras
[Drawing sheet: FIG. 1 (block diagram: amplifier, filter bank, rectifier bank, signal processing network, frequency discriminator, low-pass filter, voice-no voice decision network) and FIG. 2 (detailed arrangement of the same system).]
VOICING DETECTION AND PITCH EXTRACTION SYSTEM

BACKGROUND OF THE INVENTION

The present invention is directed toward voicing detection and voice pitch extraction. The system embodiment employs a plurality of individual band-pass filters each having a bandpass width greater than the highest fundamental frequency of the voice, and sufficient to pass at least two harmonics. By virtue of the present invention, a measure of the speech waveform power spectrum periodicity, for all voiced sounds, issues as a modulated waveform having a periodicity equal to the voice fundamental. As a consequence, the periodicity of the speech waveform spectrum may be measured with a high degree of accuracy and reliability because the outputs corresponding to voiced sounds are highly correlated, whereas random noise, background noises and nonvoiced speech sounds provide complex waveforms that have low correlations. In addition, the strength of the signal representative of the voice fundamental is greatly enhanced relative to other components of the modulated waveform by virtue of the signal processing properties of a hard limiter. A voltage level is also rendered representing the voice fundamental pitch by means of a frequency discriminator. Finally, the spectral energy distribution of the output of the frequency discriminator is utilized for discriminating between voiced and nonvoiced sounds.
The invention is, accordingly, directed to overcome the inability of prior art systems by being more accurately responsive to a wider variety of speech signals, especially those in which rapid fluctuations in the overall spectral energy distribution occur due to changes in the vocal tract cavity during production of a connected sequence of vowel sounds. The capability of the present invention is generally achieved by means of a system which obtains a measure of the speech power spectrum periodicity signal, by suitable nonlinear signal processing, and renders a substantially DC signal representation of the voice fundamental frequency substantially independent of the absolute amplitude of the voice signal.
OBJECTS

The primary object of the invention is directed to a voicing detection system and voice fundamental pitch extraction system, which has a higher degree of accuracy and reliability and is less costly than voicing detection and voice pitch tracking systems of the prior art.
Another object resides in the capabilities of the present invention to provide more meaningful data at lower costs than the prior art systems.
Another object resides in the provision of a highly sophisticated system which derives meaningful voicing data predicated upon detecting and measuring the speech waveform power spectrum periodicity.
Yet another object resides in the provision of a voicing detection system which provides a high degree of discrimination between voiced and unvoiced sounds over a wide dynamic range.
Still another object resides in the provision of a voice fundamental pitch extraction system capable of operation over a wide dynamic range.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more detailed description of the preferred embodiment of the invention as illustrated in the accompanying drawings.
In the drawings:
FIG. 1 is a schematic representation showing the arrangement of the principal means constituting the voicing detection and pitch extraction system.
FIG. 2 is a detailed drawing of the voicing detection and pitch extraction system.
A general understanding of the present invention may now be had from FIG. 1 which shows a schematic arrangement of the principal means constituting the voicing detection and [...] individual full-wave rectifiers in a rectifier bank 4. The rectified outputs from the rectifier bank 4 are transmitted to a signal processing network 14 by means of lines 4-1a through 4-15a. By means of the signal processing network 14 the speech waveform is reduced to a substantially pure sinusoidal waveform, the frequency of which is proportionally related to the fundamental pitch of the input speech waveform whenever the latter results from voiced speech. The output signal from the processing network 14 is passed on by line 10-1 to a frequency discriminator 11 which translates the instantaneous frequency of the waveform on line 10-1 into a substantially DC signal, the level of which is indicative of the instantaneous frequency of the voice fundamental pitch during intervals of voiced speech. During intervals of no speech or unvoiced speech typical of consonant and fricative sounds, the outputs of the signal processing network 14 and the frequency discriminator 11 are essentially random waveforms. The presence and absence of these random waveforms are utilized to discriminate between intervals of voiced and unvoiced speech as follows.
The output of the frequency discriminator is passed by means of line 11-2 to a voice-no voice decision network 13 whose output line 13-1 provides a DC level of a given value when the input line 11-2 issues a pattern of random waveforms which in effect represents the presence of unvoiced sounds in the speech waveform or, a DC level of another value when the input line 11-2 issues said substantially DC signal which in effect represents the presence of voiced sounds in the speech spectrum.
During intervals of voiced speech, the output from the frequency discriminator is substantially a DC level which rises and falls in response to relatively slow variations of the voice pitch. This output is passed on to the input of a low-pass filter 12 by means of line 11-1, the voltage V on line 12-1 representing the output of the filter. The function of this lowpass filter is to remove any small and rapid fluctuations superimposed on the slowly varying level. Thus the voltage V represents the instantaneous fundamental pitch of the voice during intervals of voiced speech.
To appreciate more fully the manner in which the bank of rectifiers and the signal processing network combine to extract signals representative of the voice fundamental pitch, reference is invited to FIG. 2 which shows in more detail the preferred embodiment. In FIG. 2, sound waves entering the system, by way of the microphone 1, are converted into electrical waveform signals by means of the transducing properties of the microphone. These electrical waveform signals enter an amplifier 2 by means of line 1a and are amplified to a suitable level. The amplified signals enter, by way of line 2a, a filter bank 3 comprised of 15 individual filters, three of which are shown, namely, 3-1, 3-2 and 3-15. The filters employed are of the active network type, each having a bandwidth of approximately 300 Hz., the topmost filter 3-1 having a center frequency of 300 Hz. and the lowermost filter 3-15 a center frequency of 3,000 Hz. The filter bank 3 thus provides a plurality of orthogonal signal channels, controlled by the contiguously tuned filters, each providing, during an interval of voiced speech, a modulated waveform, the envelope of which has a period equal to the period of the fundamental of the voice. Modulation of the envelope of these waveforms results from the linear combination of waveforms constituting the harmonic components of the fundamental voice frequency. The high degree of periodicity of the power spectrum of voiced speech waveforms results from the fact that the predominant mode of excitation of the vocal tract, during intervals of voiced speech, is by means of the glottal vibrator (vocal cords) which is known to possess a substantially sawtooth variation in the opening of the glottis. During these intervals the voiced sound waveforms are predominantly rich in harmonics that are integer multiples of the fundamental frequency which, for the male voice, extends from about 70 Hz. to 150 Hz. in normal speech, and the meaningful spectrum of which extends from about 300 Hz. to somewhat beyond 3,000 Hz. Thus it is seen that, in this particular embodiment, a minimum of two harmonic components (for the highest fundamental pitch) will be spanned by the passband of each band-pass filter in the filter bank.
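The harmonic count stated above can be checked with simple arithmetic. The following sketch is not part of the patent. It assumes ideal brick-wall passbands and evenly spaced center frequencies between 300 Hz and 3,000 Hz (the text gives only the two end points), and the names `harmonics_in_band` and `fewest_harmonics` are illustrative.

```python
import math

BANDWIDTH_HZ = 300.0   # per-filter bandwidth stated in the text
NUM_FILTERS = 15       # filters 3-1 through 3-15

# Assumption: centers evenly spaced from 300 Hz to 3,000 Hz.
CENTERS_HZ = [300.0 + i * (3000.0 - 300.0) / (NUM_FILTERS - 1)
              for i in range(NUM_FILTERS)]


def harmonics_in_band(f0_hz, center_hz, bandwidth_hz=BANDWIDTH_HZ):
    """Count harmonics k*f0 (k >= 1) that fall inside an ideal passband."""
    low = center_hz - bandwidth_hz / 2.0
    high = center_hz + bandwidth_hz / 2.0
    first = max(1, math.ceil(low / f0_hz))
    last = math.floor(high / f0_hz)
    return max(0, last - first + 1)


def fewest_harmonics(f0_hz):
    """Smallest harmonic count seen by any filter in the assumed bank."""
    return min(harmonics_in_band(f0_hz, c) for c in CENTERS_HZ)
```

Under these assumptions a 300 Hz passband always holds at least two multiples of a 150 Hz fundamental, and more for lower-pitched voices, which is the condition needed for each channel's envelope to beat at the fundamental.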
The modulated waveforms, issuing from the band-pass filters 3-1, 3-2, through 3-15 are passed through full-wave rectifiers, or detectors, 4-1, 4-2, through 4-15, by way of lines 3-1a, 3-2a, through 3-15a. The function of the rectifiers is to provide a set of signals representative of the time variation of envelopes of the signals issuing from the band-pass filters 3-1, 3-2, through 3-15. The outputs from the rectifiers are transmitted by way of lines 4-1a, 4-2a, through 4-15a to the fifteen inputs of the signal summing network which includes DC blocking capacitors 5-1a, 5-2a, through 5-15a and resistors 5-1b, 5-2b, through 5-15b. The output of the signal summing network is passed through a band-pass filter 6, by means of line 50, having a passband extending from 70 Hz. to 250 Hz. that is more than sufficient to span the frequency range of the male voice fundamental frequency. The output of band-pass filter 6 appearing on line 6-1, during intervals of voiced speech, reflects a fundamental frequency including possible second and third harmonics weaker than the fundamental.
During intervals of unvoiced speech the output of band-pass filter 6 is essentially a band of random noise having significant energy confined to the frequency range 70 Hz. to 250 Hz.
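The role of the rectifiers can be illustrated numerically. The sketch below is an illustration only, not the patent's circuit: it assumes a 48 kHz sample rate, a 150 Hz fundamental, and a single channel that passes exactly the 4th and 5th harmonics at equal amplitude. Subtracting the mean stands in for a DC blocking capacitor, and `tone_amplitude` is a hypothetical helper that measures one frequency component.

```python
import math

FS_HZ = 48000    # assumed sample rate
N = 9600         # 0.2 s, so 150, 600 and 750 Hz complete whole cycles
F0_HZ = 150.0    # assumed voice fundamental


def tone_amplitude(samples, freq_hz, fs_hz=FS_HZ):
    """Amplitude of the component at freq_hz (single-bin Fourier sum)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs_hz)
             for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


# One band-pass channel carrying the 4th and 5th harmonics of F0.
channel = [math.sin(2 * math.pi * 4 * F0_HZ * i / FS_HZ)
           + math.sin(2 * math.pi * 5 * F0_HZ * i / FS_HZ)
           for i in range(N)]

rectified = [abs(x) for x in channel]           # full-wave rectifier
mean = sum(rectified) / N
ac_coupled = [x - mean for x in rectified]      # stand-in for DC blocking

f0_before = tone_amplitude(channel, F0_HZ)      # no energy at F0 itself
f0_after = tone_amplitude(ac_coupled, F0_HZ)    # rectifier creates an F0 term
```

The channel itself contains no component at the fundamental, only its harmonics; the nonlinearity of the rectifier produces the difference-frequency term that the 70 Hz to 250 Hz band-pass filter then selects.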
The signal issuing from the band-pass filter 6 is passed on to a balanced modulator 7 by way of line 6-1. The function of the balanced modulator 7 is to shift the frequency range of the signal issuing from band-pass filter 6 to a considerably higher range of frequencies, thereby yielding a frequency-translated signal whose percentage bandwidth is quite narrow with respect to its expected frequency range. The desired action is achieved by driving the balanced modulator 7 with a reference signal provided by local oscillator 6a connected by means of line 6a-1, the local oscillator frequency being typically 15 kHz. The output of the balanced modulator consists of a double sideband suppressed carrier modulated waveform which is passed on to band-pass filter 8 by means of line 7-1. The function of band-pass filter 8 is to select either the upper or lower sideband signal and reject the other sideband signal and any residual carrier signal at the local oscillator frequency of 15 kHz. which may be present in the output from the balanced modulator due to slight imbalance. Typically, the band-pass filter 8 would have a passband extending from 15.070 kHz. to 15.250 kHz., when designed to select the upper sideband signal. The signal output of band-pass filter 8 is substantially a frequency translated version of the signal issuing from band-pass filter 6, but with the distinguishing property of being narrow band. This signal is passed on to a hard limiter 9 by way of line 8-1. The output of the limiter 9 is passed on, by way of line 9-1, to a second band-pass filter 10, having essentially the same passband as the filter 8.
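The effect of the frequency translation on percentage bandwidth can be worked through directly. This is a small sketch under one assumption not spelled out in the text: percentage bandwidth is taken as the band's width divided by its center frequency. The function names are illustrative.

```python
LOCAL_OSCILLATOR_HZ = 15000.0   # typical value given in the text
BASEBAND_HZ = (70.0, 250.0)     # passband of band-pass filter 6


def percentage_bandwidth(low_hz, high_hz):
    """Bandwidth as a percentage of the band's center frequency (assumed definition)."""
    center = (low_hz + high_hz) / 2.0
    return 100.0 * (high_hz - low_hz) / center


def upper_sideband(band_hz, lo_hz=LOCAL_OSCILLATOR_HZ):
    """Band occupied by the upper sideband after modulation."""
    low, high = band_hz
    return (lo_hz + low, lo_hz + high)


def lower_sideband(band_hz, lo_hz=LOCAL_OSCILLATOR_HZ):
    """Band occupied by the lower sideband (spectrum is mirrored)."""
    low, high = band_hz
    return (lo_hz - high, lo_hz - low)
```

With these numbers the 70 Hz to 250 Hz band has a percentage bandwidth of 112.5, while its upper sideband image at 15,070 Hz to 15,250 Hz has a percentage bandwidth of roughly 1.2, which is why the translated signal can be treated as narrow band by the limiter and discriminator that follow.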
The combined action of limiter 9 and band-pass filter 10 is such that the signal issuing from the band-pass filter 10 will be of substantially constant amplitude with an average frequency linearly related to the voice fundamental during intervals of voiced speech. During intervals of unvoiced speech the output of band-pass filter 10 is substantially a random noise signal with a significant energy spectrum extending from roughly 15,070 Hz. to 15,250 Hz. The desirable signal processing property just described is the result of the signal capture phenomenon exhibited by a hard limiting process followed by band-pass filtering.
The output of filter 10 is passed on to a frequency discriminator 11 by way of line 10-1. One function of discriminator 11 is to detect the quasi instantaneous frequency of the signal issuing from filter 10 during intervals of voiced speech. During intervals of unvoiced speech the output of the frequency discriminator 11 is substantially a random noise signal with significant energy content extending from about 0 Hz. to around 180 Hz. The output of the frequency discriminator is passed to a low-pass filter 12 by way of line 11-1. The low-pass filter 12, by virtue of having a cutoff frequency of 15 Hz., serves to remove minor high frequency fluctuations from the output of the frequency discriminator so that the output V, of low-pass filter 12, on line 12-1, provides a voltage level representation of the quasi instantaneous (short term average) voice fundamental frequency during intervals of voiced speech.
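The idea of recovering the fundamental from the frequency of a hard-limited, frequency-translated tone can be sketched as follows. This is not the analog discriminator of the text: timing the rising edges of the limited signal is a swapped-in digital technique, and the simulation sample rate, the 120 Hz fundamental, and all names are assumptions made for illustration.

```python
import math

FS = 200_000        # simulation sample rate in Hz (assumed)
N_SAMPLES = 10_000  # 0.05 s of signal
LO_HZ = 15_000.0    # local oscillator frequency from the text
F0_HZ = 120.0       # assumed voice fundamental

# Idealized voiced-speech signal after translation to the upper sideband.
translated = [math.sin(2 * math.pi * (LO_HZ + F0_HZ) * n / FS)
              for n in range(N_SAMPLES)]

# Hard limiter: keep only the sign of the waveform.
limited = [1 if v >= 0.0 else -1 for v in translated]

# Sample indices where the limited signal steps from -1 to +1.
rising = [n for n in range(1, N_SAMPLES)
          if limited[n - 1] < 0 and limited[n] > 0]

# Average period between the first and the last rising edge.
cycles = len(rising) - 1
period_s = (rising[-1] - rising[0]) / FS / cycles

# Subtracting the oscillator frequency undoes the frequency translation.
estimated_f0_hz = 1.0 / period_s - LO_HZ
print(f"estimated fundamental: {estimated_f0_hz:.1f} Hz")
```

Because only the edge timing is used, the estimate does not depend on the amplitude of the input, which mirrors the constant-amplitude property attributed to the limiter and band-pass filter 10. The accuracy here is limited by the sample spacing of the simulation.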
During the presence of speech sounds, or any other sounds for that matter, that are not harmonic in character, but instead have a structure more akin to random noise, the output signal from the frequency discriminator is of random character. The distinct difference in the character of the signals issuing from the frequency discriminator during intervals of voiced speech and intervals of unvoiced speech, is utilized by the voice-no voice decision network 13 to provide appropriate outputs in the following manner.
The output from the frequency discriminator 11 is passed on to a high-pass filter 17, by way of line 11-2, having a cutoff frequency of about 50 Hz. During voiced speech the output signal from the frequency discriminator has a spectral energy distribution confined to frequencies below 50 Hz., so that the output of high-pass filter 17 is substantially zero. During intervals of unvoiced speech the signal output from the frequency discriminator is of substantial amplitude and of random character with a spectral energy distribution concentrated in the frequency range above 50 Hz.
The output of high-pass filter 17 is passed on to rectifier 15 by way of line 17-1 and the output from rectifier 15 is then passed on, by way of line 15-1, to low-pass filter 16 having a cutoff frequency of 15 Hz. From the foregoing it is seen that the output from low-pass filter 16 will be substantially a DC level during intervals of unvoiced speech that is different from the DC level output during intervals of voiced speech. These different DC signal levels are utilized in a decision rendering function by passing the output of low-pass filter 16 to a threshold detector circuit 21, by way of line 16-1. The threshold detector circuit, which effects the actual decision, is comprised of a high gain differential DC amplifier 20, input resistor 18, and positive feedback resistor 19. The detection threshold (i.e., the decision threshold) is controlled by the level of reference voltage V, applied to the positive input of the differential amplifier 20. The actual voice or no voice decision is indicated by which of two possible signal levels exists at the output V, of the threshold detector circuit on line 13-1.
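A two-level decision with positive feedback behaves like a comparator with hysteresis. The sketch below is a minimal software model under stated assumptions: the reference level, the hysteresis width, the sample values, and the choice of which output state corresponds to unvoiced speech are all hypothetical and are not taken from the circuit values of the text.

```python
def threshold_detector(levels, v_ref, hysteresis):
    """Two-state comparator with hysteresis.

    Returns a list of booleans, True meaning the smoothed level is judged
    noise-like (assumed here to indicate unvoiced speech). The switching
    point moves by half the hysteresis width depending on the current
    state, which is the effect of positive feedback around a comparator.
    """
    noise_like = False
    decisions = []
    for v in levels:
        if not noise_like and v > v_ref + hysteresis / 2:
            noise_like = True
        elif noise_like and v < v_ref - hysteresis / 2:
            noise_like = False
        decisions.append(noise_like)
    return decisions


# Hypothetical smoothed levels from the rectifier and low-pass filter.
levels = [0.1, 0.45, 0.55, 0.7, 0.55, 0.45, 0.3]
decisions = threshold_detector(levels, v_ref=0.5, hysteresis=0.2)
print(decisions)
```

With these values the detector switches on only above 0.6 and switches off only below 0.4, so small fluctuations of the level around the reference do not cause the decision to chatter.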
While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
What I claim is:
1. A voicing detection apparatus for detecting voiced sounds present in speech waveforms comprising:
a filter bank constituted of a plurality of contiguously tuned band-pass filters responsive to said speech waveforms to provide modulated waveforms, the periodicity of the envelope of each of the latter waveforms corresponding to the periodicity of the voice fundamental;
a plurality of detectors each responsive to a specific modulated waveform to provide appropriate linearly summed time variant signals;
a processing network responsive to said time variant signals to provide a substantially pure sinusoidal waveform whose frequency is proportionally related to the fundamental pitch of voiced sounds in the speech waveforms,
said processing network comprising a summing network connected to the output of said detectors, a broad band-pass filter connected to the output of said summing network, and a modulator, connected to the output of said broad band-pass filter, to provide a frequency translated signal whose percentage bandwidth is reduced in relation to its expected frequency range;
a frequency discriminator, interconnected to the output of said network, providing a substantially DC signal output, the level of which being a function of the instantaneous voice pitch during voiced speech, and
a decision network interconnected to the frequency discriminator output and providing a DC signal level of one value in response to signals representing unvoiced sounds and a DC signal level of another value in response to signals representing voiced sounds.
2. The voicing detection apparatus as in claim 1 wherein said processing network further includes a limiter for enhancing the ratio of the voice fundamental frequency to the harmonic frequencies, and a narrow band-pass filter connected to said limiter for rejecting unwanted higher harmonic frequencies.
3. The voicing detection apparatus as in claim 1 wherein said contiguously tuned band-pass filters are each of the active network type having a passband width of approximately 300 Hz., and said filter bank is adapted to accommodate a spectral bandwidth extending from approximately 300 Hz. to approximately 3,000 Hz.

US829414A 1966-09-29 1969-06-02 Voicing detection and pitch extraction system Expired - Lifetime US3600516A (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US829414A US3600516A (en) 1969-06-02 1969-06-02 Voicing detection and pitch extraction system
FR7015370A FR2045772A6 (en) 1966-09-29 1970-04-28 voice detection system
JP45038773A JPS508602B1 (en) 1969-06-02 1970-05-08
DE19702025233 DE2025233A1 (en) 1966-09-29 1970-05-23 Arrangement for determining the voiced components of speech sounds
GB25406/70A GB1246079A (en) 1969-06-02 1970-05-27 Speech analysis system

Publications (1)

Publication Number Publication Date
US3600516A true US3600516A (en) 1971-08-17

Family

ID=25254478

Country Status (3)

Country Link
US (1) US3600516A (en)
JP (1) JPS508602B1 (en)
GB (1) GB1246079A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2243526A (en) * 1940-03-16 1941-05-27 Bell Telephone Labor Inc Production of artificial speech
US2340364A (en) * 1942-08-22 1944-02-01 Rca Corp Audio transmission circuit
US2561478A (en) * 1948-05-28 1951-07-24 Bell Telephone Labor Inc Analyzing system for determining the fundamental frequency of a complex wave
US2691137A (en) * 1952-06-27 1954-10-05 Us Air Force Device for extracting the excitation function from speech signals
US2927969A (en) * 1954-10-20 1960-03-08 Bell Telephone Labor Inc Determination of pitch frequency of complex wave
US3488446A (en) * 1966-10-31 1970-01-06 Bell Telephone Labor Inc Apparatus for deriving pitch information from a speech wave

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5986198A (en) * 1995-01-18 1999-11-16 Ivl Technologies Ltd. Method and apparatus for changing the timbre and/or pitch of audio signals
US6046395A (en) * 1995-01-18 2000-04-04 Ivl Technologies Ltd. Method and apparatus for changing the timbre and/or pitch of audio signals
US6336092B1 (en) * 1997-04-28 2002-01-01 Ivl Technologies Ltd Targeted vocal transformation
US5970441A (en) * 1997-08-25 1999-10-19 Telefonaktiebolaget Lm Ericsson Detection of periodicity information from an audio signal
US20020073417A1 (en) * 2000-09-29 2002-06-13 Tetsujiro Kondo Audience response determination apparatus, playback output control system, audience response determination method, playback output control method, and recording media
US7555766B2 (en) * 2000-09-29 2009-06-30 Sony Corporation Audience response determination
US20110084953A1 (en) * 2009-10-12 2011-04-14 Chia-Yu Lee Organic light emitting display having a power saving mechanism

Also Published As

Publication number Publication date
GB1246079A (en) 1971-09-15
JPS508602B1 (en) 1975-04-05
