US3600516A - Voicing detection and pitch extraction system - Google Patents
- Publication number: US3600516A (application US829414A)
- Authority: US (United States)
- Prior art keywords: output, frequency, speech, signal, voice
- Legal status: Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
Definitions
- A frequency discriminator, whose input is provided by the band-pass filtered output of the limiter, provides a voltage waveform whose spectral energy distribution is utilized for discrimination between voiced and unvoiced sounds.
- The present invention is directed toward voicing detection and voice pitch extraction.
- The system embodiment employs a plurality of individual band-pass filters, each having a passband width greater than the highest fundamental frequency of the voice and sufficient to pass at least two harmonics.
- For all voiced sounds, a measure of the periodicity of the speech waveform power spectrum issues as a modulated waveform whose periodicity equals that of the voice fundamental.
- The periodicity of the speech waveform spectrum may be measured with a high degree of accuracy and reliability because the outputs corresponding to voiced sounds are highly correlated, whereas random noise, background noise and unvoiced speech sounds produce complex waveforms with low correlation.
- The strength of the signal representative of the voice fundamental is greatly enhanced relative to the other components of the modulated waveform by virtue of the signal processing properties of a hard limiter.
- A voltage level representing the voice fundamental pitch is also rendered by means of a frequency discriminator.
- The spectral energy distribution of the output of the frequency discriminator is utilized for discriminating between voiced and unvoiced sounds.
- The invention is accordingly directed to overcoming the shortcomings of prior art systems by being more accurately responsive to a wider variety of speech signals, especially those in which rapid fluctuations in the overall spectral energy distribution occur because of changes in the vocal tract cavity during production of a connected sequence of vowel sounds.
- This capability is generally achieved by a system which obtains a measure of the periodicity of the speech power spectrum by suitable nonlinear signal processing and renders a substantially DC signal representation of the voice fundamental frequency that is substantially independent of the absolute amplitude of the voice signal.
- The primary object of the invention is a voicing detection and voice fundamental pitch extraction system which has a higher degree of accuracy and reliability, and is less costly, than the voicing detection and voice pitch tracking systems of the prior art.
- Another object resides in the capabilities of the present invention to provide more meaningful data at lower costs than the prior art systems.
- Another object resides in the provision of a highly sophisticated system which derives meaningful voicing data predicated upon detecting and measuring the speech waveform power spectrum periodicity.
- Yet another object resides in the provision of a voicing detection system which provides a high degree of discrimination between voiced and unvoiced sounds over a wide dynamic range.
- Still another object resides in the provision of a voice fundamental pitch extraction system capable of operation over a wide dynamic range.
- FIG. 1 is a schematic representation showing the arrangement of the principal means constituting the voicing detection and pitch extraction system.
- FIG. 2 is a detailed drawing of the voicing detection and pitch extraction system.
- FIG. 1 shows a schematic arrangement of the principal means constituting the voicing detection and pitch extraction system. … individual full-wave rectifiers in a rectifier bank 4.
- The rectified outputs from the rectifier bank 4 are transmitted to a signal processing network 14 by means of lines 4-1a through 4-15a.
- In the signal processing network 14, the speech waveform is reduced to a substantially pure sinusoidal waveform whose frequency is proportionally related to the fundamental pitch of the input speech waveform whenever the latter results from voiced speech.
- The output signal from the processing network 14 is passed by line 10-1 to a frequency discriminator 11, which translates the instantaneous frequency of the waveform on line 10-1 into a substantially DC signal whose level indicates the instantaneous frequency of the voice fundamental pitch during intervals of voiced speech.
- During intervals of unvoiced speech, the outputs of the signal processing network 14 and the frequency discriminator 11 are essentially random waveforms. The presence or absence of these random waveforms is used to discriminate between intervals of voiced and unvoiced speech, as follows.
- The output of the frequency discriminator is passed by line 11-2 to a voice-no voice decision network 13, whose output line 13-1 provides a DC level of one value when line 11-2 carries a pattern of random waveforms, indicating unvoiced sounds in the speech waveform, and a DC level of another value when line 11-2 carries the substantially DC signal, indicating voiced sounds.
- During intervals of voiced speech, the output from the frequency discriminator is substantially a DC level which rises and falls in response to relatively slow variations of the voice pitch.
- This output is passed to the input of a low-pass filter 12 by line 11-1; the voltage V on line 12-1 is the output of the filter.
- The function of this low-pass filter is to remove any small, rapid fluctuations superimposed on the slowly varying level.
- The voltage V represents the instantaneous fundamental pitch of the voice during intervals of voiced speech.
- FIG. 2 shows the preferred embodiment in more detail.
- Sound waves entering the system by way of the microphone 1 are converted into electrical waveform signals by the transducing properties of the microphone.
- These electrical waveform signals enter an amplifier 2 by means of line 1a and are amplified to a suitable level.
- The amplified signals enter, by way of line 2a, a filter bank 3 comprising 15 individual filters, three of which are shown, namely 3-1, 3-2 and 3-15.
- The filters are of the active network type, each having a bandwidth of approximately 300 Hz, with the topmost filter 3-1 having a center frequency of 300 Hz.
- The filter bank 3 thus provides a plurality of orthogonal signal channels, defined by the contiguously tuned filters, each of which provides, during an interval of voiced speech, a modulated waveform whose envelope has a period equal to the period of the voice fundamental. Modulation of the envelope of these waveforms results from the linear combination of the waveforms constituting the harmonic components of the fundamental voice frequency.
- The high degree of periodicity of the power spectrum of voiced speech waveforms results from the fact that the predominant mode of excitation of the vocal tract during voiced speech is the glottal vibrator (vocal cords), which is known to produce a substantially sawtooth variation in the opening of the glottis.
- Voiced sound waveforms are predominantly rich in harmonics that are integer multiples of the fundamental frequency. For the male voice, the fundamental extends from about 70 Hz to 150 Hz in normal speech, and the meaningful spectrum extends from about 300 Hz to somewhat beyond 3,000 Hz.
- Because each filter is about 300 Hz wide, at least twice the highest expected fundamental of about 150 Hz, a minimum of two harmonic components is spanned by the passband of each band-pass filter in the filter bank.
- The modulated waveforms issuing from the band-pass filters 3-1, 3-2, through 3-15 are passed, by way of lines 3-1a, 3-2a, through 3-15a, through full-wave rectifiers, or detectors, 4-1, 4-2, through 4-15.
- The function of the rectifiers is to provide a set of signals representing the time variation of the envelopes of the signals issuing from the band-pass filters 3-1, 3-2, through 3-15.
- The outputs from the rectifiers are transmitted by way of lines 4-1a, 4-2a, through 4-15a to the fifteen inputs of the signal summing network, which includes DC blocking capacitors 5-1a, 5-2a, through 5-15a and resistors 5-1b, 5-2b, through 5-15b.
- The output of the signal summing network is passed, by means of line 5a, through a band-pass filter 6 having a passband extending from 70 Hz to 250 Hz, which is more than sufficient to span the frequency range of the male voice fundamental.
- During intervals of voiced speech, the output of band-pass filter 6 on line 6-1 consists of the fundamental frequency together with possible second and third harmonics that are weaker than the fundamental.
- During intervals of unvoiced speech, the output of band-pass filter 6 is essentially a band of random noise with significant energy confined to the range 70 Hz to 250 Hz.
- The signal issuing from band-pass filter 6 is passed to a balanced modulator 7 by way of line 6-1.
- The function of the balanced modulator 7 is to shift the signal issuing from band-pass filter 6 to a considerably higher range of frequencies, yielding a frequency-translated signal whose bandwidth is small relative to its center frequency; in other words, the percentage bandwidth of the signal is reduced.
- The desired action is achieved by driving the balanced modulator 7 with a reference signal from a local oscillator 6a connected by line 6a-1, the local oscillator frequency typically being 15 kHz.
- The output of the balanced modulator is a double-sideband suppressed-carrier modulated waveform, which is passed to band-pass filter 8 by line 7-1.
- The function of band-pass filter 8 is to select either the upper or the lower sideband and to reject the other sideband, together with any residual carrier at the 15 kHz local oscillator frequency that may be present in the modulator output because of slight imbalance.
- When designed to select the upper sideband, band-pass filter 8 would have a passband extending from 15,070 Hz to 15,250 Hz.
- The signal output of band-pass filter 8 is substantially a frequency-translated version of the signal issuing from band-pass filter 6, but with the distinguishing property of being narrowband.
- This signal is passed to a hard limiter 9 by way of line 8-1.
- The output of the limiter 9 is passed, by way of line 9-1, to a second band-pass filter 10 having essentially the same passband as filter 8.
- The combined action of limiter 9 and band-pass filter 10 is such that, during intervals of voiced speech, the signal issuing from band-pass filter 10 is of substantially constant amplitude, with an average frequency linearly related to the voice fundamental.
- During intervals of unvoiced speech, the output of band-pass filter 10 is substantially a random noise signal with a significant energy spectrum extending from roughly 15,070 Hz to 15,250 Hz.
- This desirable signal processing property is the result of the signal capture phenomenon exhibited by hard limiting followed by band-pass filtering.
- The output of filter 10 is passed to a frequency discriminator 11 by way of line 10-1.
- One function of discriminator 11 is to detect the quasi-instantaneous frequency of the signal issuing from filter 10 during intervals of voiced speech. During intervals of unvoiced speech, the output of the frequency discriminator 11 is substantially a random noise signal with significant energy content extending from about 0 Hz to …
- The output of the frequency discriminator is passed to a low-pass filter 12 by way of line 11-1.
- With a cutoff frequency of 15 Hz, the low-pass filter 12 removes minor high-frequency fluctuations from the output of the frequency discriminator, so that the output V of low-pass filter 12, on line 12-1, provides a voltage level representing the quasi-instantaneous (short-term average) voice fundamental frequency during intervals of voiced speech.
- During intervals of unvoiced speech, the output signal from the frequency discriminator is of random character.
- The voice-no voice decision network 13 uses the distinct difference in character between the discriminator output during voiced speech and during unvoiced speech to provide appropriate outputs, in the following manner.
- The output from the frequency discriminator 11 is passed, by way of line 11-2, to a high-pass filter 17 having a cutoff frequency of about 50 Hz.
- During intervals of voiced speech, the output signal from the frequency discriminator has a spectral energy distribution confined to frequencies below 50 Hz, so the output of high-pass filter 17 is substantially zero.
- During intervals of unvoiced speech, the signal output from the frequency discriminator is of substantial amplitude and random character, with a spectral energy distribution concentrated in the frequency range above 50 Hz.
- The output of high-pass filter 17 is passed to rectifier 15 by way of line 17-1, and the output from rectifier 15 is then passed, by way of line 15-1, to low-pass filter 16, which has a cutoff frequency of 15 Hz. The output from low-pass filter 16 is therefore substantially a DC level during intervals of unvoiced speech that differs from the DC level output during intervals of voiced speech. These different DC levels drive the decision: the output of low-pass filter 16 is passed by way of line 16-1 to a threshold detector circuit 21.
- The threshold detector circuit 21, which effects the actual decision, comprises a high-gain differential DC amplifier 20, an input resistor 18 and a positive feedback resistor 19.
- The detection (decision) threshold is controlled by the level of the reference voltage applied to the positive input of the differential amplifier 20.
- The voice or no-voice decision is indicated by which of two possible signal levels is present at the output of the threshold detector circuit on line 13-1.
- A voicing detection apparatus for detecting voiced sounds present in speech waveforms, comprising:
- a filter bank constituted of a plurality of contiguously tuned band-pass filters responsive to said speech waveforms to provide modulated waveforms, the periodicity of the envelope of each of the latter waveforms corresponding to the periodicity of the voice fundamental;
- a processing network responsive to said time variant signals to provide a substantially pure sinusoidal waveform whose frequency is proportionally related to the fundamental pitch of voiced sounds in the speech waveforms,
- said processing network comprising a summing network connected to the output of said detectors, a broad band-pass filter connected to the output of said summing network, and a modulator, connected to the output of said broad band-pass filter, to provide a frequency-translated signal whose percentage bandwidth is reduced in relation to its expected frequency range;
- a frequency discriminator interconnected to the output of said network, providing a substantially DC signal output, the level of which is a function of the instantaneous voice pitch during voiced speech; and
- a decision network interconnected to the frequency discriminator output and providing a DC signal level of one value in response to signals representing unvoiced sounds and a DC signal level of another value in response to signals representing voiced sounds.
- The voicing detection apparatus as in claim 1, wherein said processing network further includes a limiter for enhancing the ratio of the voice fundamental frequency to the harmonic frequencies, and a narrow band-pass filter connected to said limiter for rejecting unwanted higher harmonic frequencies.
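The harmonic-coverage argument for filter bank 3 (15 filters about 300 Hz wide, the first centered at 300 Hz) can be checked with a few lines of Python. This is a sketch under stated assumptions: the channels are taken to be contiguous and uniformly spaced (centers at 300, 600, …, 4,500 Hz) with ideal rectangular passbands, and the function names are illustrative, not taken from the patent.

```python
"""Harmonic coverage of a contiguous 15-channel, 300 Hz filter bank.

Assumption (not stated in the patent): channels are uniformly spaced by
their bandwidth, so the centers are 300, 600, ..., 4500 Hz.
"""

BANDWIDTH_HZ = 300.0
N_FILTERS = 15
FIRST_CENTER_HZ = 300.0


def center_frequencies() -> list[float]:
    """Center frequency of each channel under the uniform-spacing assumption."""
    return [FIRST_CENTER_HZ + i * BANDWIDTH_HZ for i in range(N_FILTERS)]


def harmonics_in_band(f0: float, center: float, bw: float = BANDWIDTH_HZ) -> int:
    """Count harmonics k*f0 (k >= 1) lying in the half-open band [lo, hi)."""
    lo, hi = center - bw / 2, center + bw / 2
    count, k = 0, 1
    while k * f0 < hi:
        if k * f0 >= lo:
            count += 1
        k += 1
    return count


def min_harmonics(f0: float) -> int:
    """Fewest harmonics passed by any channel for fundamental f0."""
    return min(harmonics_in_band(f0, c) for c in center_frequencies())
```

Across the 70 to 150 Hz male range quoted in the text, `min_harmonics` returns 4 at 70 Hz, 3 at 100 Hz and 2 at 150 Hz, so every channel passes at least two harmonics.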
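A small numerical experiment illustrates why the summed, rectified channel envelopes (filters 3-1 to 3-15, rectifiers 4-1 to 4-15, summing network, band-pass filter 6) recover the fundamental. The sketch below is not the patented circuit: it assumes numpy, a 16 kHz sample rate, ideal FFT-mask filters in place of the active filters, subtraction of the mean in place of the blocking capacitors, and a synthetic source whose harmonics fall off as 1/k.

```python
import numpy as np

FS = 16_000  # assumed sample rate (Hz); one second of signal gives 1 Hz FFT bins
N = FS


def ideal_bandpass(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Zero every FFT bin outside [lo, hi): an idealized stand-in for an analog filter."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def voiced_test_signal(f0: float, top_hz: float = 4_000.0) -> np.ndarray:
    """Synthetic voiced sound: harmonics of f0 with 1/k amplitudes (sawtooth-like)."""
    t = np.arange(N) / FS
    x = np.zeros(N)
    for k in range(1, int(top_hz // f0) + 1):
        x += np.cos(2 * np.pi * k * f0 * t) / k
    return x


def envelope_fundamental(x: np.ndarray) -> float:
    """Filter bank -> full-wave rectifiers -> DC blocking -> sum -> 70-250 Hz filter.

    Returns the frequency (Hz) of the strongest component left after filter 6.
    """
    total = np.zeros_like(x)
    for i in range(15):
        center = 300.0 + 300.0 * i                      # assumed uniform spacing
        channel = ideal_bandpass(x, center - 150.0, center + 150.0)
        rectified = np.abs(channel)                     # full-wave rectifier
        total += rectified - rectified.mean()           # DC blocking capacitor
    out = ideal_bandpass(total, 70.0, 250.0)            # band-pass filter 6
    spectrum = np.abs(np.fft.rfft(out))
    freqs = np.fft.rfftfreq(len(out), d=1.0 / FS)
    return round(float(freqs[np.argmax(spectrum)]), 1)
```

With these assumptions, `envelope_fundamental(voiced_test_signal(120.0))` reports 120 Hz even though the 120 Hz harmonic itself lies below the lowest channel edge (150 Hz) and never enters the filter bank.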
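The reduction in percentage bandwidth produced by balanced modulator 7 is simple arithmetic. The sketch below measures percentage bandwidth against the arithmetic center of the band, which is one common convention and an assumption here, since the patent gives no numerical definition.

```python
LO_HZ = 15_000.0  # typical local-oscillator frequency given for oscillator 6a


def percentage_bandwidth(lo_hz: float, hi_hz: float) -> float:
    """Bandwidth as a percentage of the band's arithmetic center frequency."""
    center = (lo_hz + hi_hz) / 2.0
    return 100.0 * (hi_hz - lo_hz) / center


# Passband of band-pass filter 6, before translation.
before = percentage_bandwidth(70.0, 250.0)
# Upper sideband selected by band-pass filter 8, after translation.
after = percentage_bandwidth(LO_HZ + 70.0, LO_HZ + 250.0)

print(f"before: {before:.1f} %   after: {after:.2f} %")
```

The same 180 Hz-wide band goes from 112.5 % of its center frequency to about 1.19 %.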
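The double-sideband suppressed-carrier output of balanced modulator 7 and the sideband selection by band-pass filter 8 can be illustrated with an idealized numpy sketch. The multiplier stands in for the balanced modulator and an FFT mask stands in for filter 8; the 48 kHz sample rate and the 130 Hz test tone are assumptions chosen so that every component falls on an FFT bin.

```python
import numpy as np

FS = 48_000       # assumed sample rate (Hz), comfortably above 2 x 15.25 kHz
N = FS            # one second, so every tone used here falls on an FFT bin
LO_HZ = 15_000.0  # typical local-oscillator frequency quoted for oscillator 6a


def balanced_modulator(x: np.ndarray) -> np.ndarray:
    """Ideal balanced modulator: product of the input and the carrier (DSB-SC)."""
    t = np.arange(len(x)) / FS
    return x * np.cos(2 * np.pi * LO_HZ * t)


def ideal_bandpass(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Keep only FFT bins in [lo, hi]; an idealized stand-in for filter 8."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def spectral_lines(x: np.ndarray, rel: float = 1e-3) -> list[float]:
    """Frequencies (Hz) whose magnitude exceeds rel times the strongest line."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    return [round(float(f), 1) for f in freqs[mag > rel * mag.max()]]


t = np.arange(N) / FS
pitch_tone = np.cos(2 * np.pi * 130.0 * t)      # a 130 Hz fundamental from filter 6
dsb = balanced_modulator(pitch_tone)            # sidebands at 15,000 -/+ 130 Hz, no carrier
usb = ideal_bandpass(dsb, 15_070.0, 15_250.0)   # upper sideband only
```

`spectral_lines(dsb)` shows only the two sidebands, 14,870 Hz and 15,130 Hz, with no line at the 15 kHz carrier; after the 15,070 to 15,250 Hz mask only 15,130 Hz remains.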
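The capture behavior attributed to hard limiter 9 followed by band-pass filter 10 can be demonstrated numerically. The sketch assumes the translated signal contains only the fundamental (120 Hz above a 15 kHz oscillator) and a second harmonic at 0.3 of its amplitude; the limiter is modeled as `np.sign` and filter 10 as an ideal FFT mask over 15,070 to 15,250 Hz. These modeling choices are assumptions, not circuit details from the patent.

```python
import numpy as np

FS = 48_000       # assumed sample rate (Hz)
N = FS            # one second, so the test tones sit exactly on FFT bins
LO_HZ = 15_000.0  # typical local-oscillator frequency quoted for oscillator 6a


def ideal_bandpass(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Keep only FFT bins in [lo, hi]; an idealized stand-in for filter 10."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def tone_amplitude(x: np.ndarray, f_hz: float) -> float:
    """Amplitude of the sinusoid at f_hz, assuming f_hz falls on an FFT bin."""
    k = int(round(f_hz * len(x) / FS))
    return 2.0 * abs(np.fft.rfft(x)[k]) / len(x)


def limiter_then_filter(x: np.ndarray) -> np.ndarray:
    """Hard limiter 9 (keep only the sign) followed by band-pass filter 10."""
    return ideal_bandpass(np.sign(x), LO_HZ + 70.0, LO_HZ + 250.0)


t = np.arange(N) / FS
f0 = 120.0
fund_hz = LO_HZ + f0          # translated fundamental: 15,120 Hz
second_hz = LO_HZ + 2 * f0    # translated second harmonic: 15,240 Hz
ratio_in = 0.3                # assumed second-harmonic level relative to the fundamental
x = np.cos(2 * np.pi * fund_hz * t) + ratio_in * np.cos(2 * np.pi * second_hz * t)

y = limiter_then_filter(x)
ratio_out = tone_amplitude(y, second_hz) / tone_amplitude(y, fund_hz)
```

With these values the weaker component falls from 0.3 of the fundamental to roughly half that after limiting, consistent with the well-known small-signal suppression of an ideal hard limiter.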
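The patent does not say how frequency discriminator 11 is built. As one possible software stand-in, the sketch below estimates instantaneous frequency from the phase of an FFT-based analytic signal, smooths it with a moving average in place of the 15 Hz low-pass filter 12, and subtracts the assumed 15 kHz oscillator frequency so the result reads directly as pitch in hertz (assuming the upper sideband was selected). The function names are hypothetical.

```python
import numpy as np

FS = 48_000       # assumed sample rate (Hz)
LO_HZ = 15_000.0  # typical local-oscillator frequency quoted for oscillator 6a


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (zero negative frequencies, double positive)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def discriminator(x: np.ndarray) -> np.ndarray:
    """Instantaneous frequency (Hz) from the phase slope of the analytic signal."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * FS / (2 * np.pi)


def smooth(v: np.ndarray, cutoff_hz: float = 15.0) -> np.ndarray:
    """Moving average about 1/cutoff_hz long, standing in for low-pass filter 12."""
    n = max(1, int(FS / cutoff_hz))
    return np.convolve(v, np.ones(n) / n, mode="valid")


def pitch_from_translated(x: np.ndarray) -> np.ndarray:
    """Short-term pitch estimate: discriminator -> low-pass -> remove the LO offset."""
    return smooth(discriminator(x)) - LO_HZ
```

For a constant-amplitude tone at 15,130 Hz, such as the voiced output of filter 10 for a 130 Hz fundamental, `pitch_from_translated` returns values that stay at 130 Hz.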
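The decision chain of network 13 (high-pass filter 17 at about 50 Hz, rectifier 15, low-pass filter 16 at 15 Hz) can be sketched with idealized blocks to show why voiced and unvoiced discriminator outputs yield different DC levels. Everything here is an assumption for illustration: a 4 kHz sample rate, an FFT-mask high-pass, a moving average for the low-pass, a slowly wandering 130 Hz pitch voltage for voiced speech, and Gaussian noise for unvoiced speech.

```python
import numpy as np

FS = 4_000  # assumed sample rate (Hz) for the discriminator output
N = 4 * FS  # four seconds of signal


def highpass(x: np.ndarray, cutoff_hz: float = 50.0) -> np.ndarray:
    """Ideal FFT high-pass standing in for high-pass filter 17."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def moving_average(x: np.ndarray, cutoff_hz: float = 15.0) -> np.ndarray:
    """Moving average about 1/cutoff_hz long, standing in for low-pass filter 16."""
    n = max(1, int(FS / cutoff_hz))
    return np.convolve(x, np.ones(n) / n, mode="valid")


def decision_level(disc_out: np.ndarray) -> np.ndarray:
    """High-pass filter 17 -> full-wave rectifier 15 -> low-pass filter 16."""
    return moving_average(np.abs(highpass(disc_out)))


t = np.arange(N) / FS
rng = np.random.default_rng(0)
# Voiced: a slowly wandering pitch voltage, with energy well below 50 Hz.
voiced = 130.0 + 5.0 * np.sin(2 * np.pi * 3.0 * t)
# Unvoiced: a random, broadband discriminator output.
unvoiced = 130.0 + 60.0 * rng.standard_normal(N)

voiced_level = decision_level(voiced)
unvoiced_level = decision_level(unvoiced)
```

The voiced input produces an essentially zero decision level, while the random input produces a level that stays far above zero, giving the two distinct DC values the threshold detector needs.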
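Positive feedback around a high-gain differential amplifier (resistor 19 around amplifier 20) is the classic way to build a comparator with hysteresis, which presumably keeps the voice/no-voice output from chattering when the decision level hovers near the threshold. The Python class below is a behavioral sketch of that idea, not a model of the actual circuit, whose component values the patent does not give. The class name, the hysteresis width and the mapping of a high decision level to "unvoiced" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThresholdDetector:
    """Comparator with hysteresis, loosely modeling amplifier 20 with feedback resistor 19.

    `reference` plays the role of the reference voltage on the positive input;
    `hysteresis` (the width of the switching band) is an assumed parameter.
    A high decision level is taken to mean random (unvoiced) discriminator output.
    """

    reference: float
    hysteresis: float
    unvoiced: bool = False

    def update(self, level: float) -> bool:
        """Feed one sample of the low-pass filter 16 output; return True if voiced."""
        upper = self.reference + self.hysteresis / 2
        lower = self.reference - self.hysteresis / 2
        if self.unvoiced and level < lower:
            self.unvoiced = False
        elif not self.unvoiced and level > upper:
            self.unvoiced = True
        return not self.unvoiced
```

With `reference=0.4` and `hysteresis=0.2`, the input sequence 0.1, 0.6, 0.45, 0.35, 0.2 yields voiced, unvoiced, unvoiced, unvoiced, voiced: the 0.35 sample lies below the reference but does not switch the output back, which a comparator without feedback would do.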
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US829414A US3600516A (en) | 1969-06-02 | 1969-06-02 | Voicing detection and pitch extraction system |
FR7015370A FR2045772A6 (fr) | 1966-09-29 | 1970-04-28 | Voice detection system |
JP45038773A JPS508602B1 (ja) | 1969-06-02 | 1970-05-08 | |
DE19702025233 DE2025233A1 (de) | 1966-09-29 | 1970-05-23 | Arrangement for determining the voiced components of speech sounds |
GB25406/70A GB1246079A (en) | 1969-06-02 | 1970-05-27 | Speech analysis system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US829414A US3600516A (en) | 1969-06-02 | 1969-06-02 | Voicing detection and pitch extraction system |
Publications (1)
Publication Number | Publication Date |
---|---|
US3600516A (en) | 1971-08-17 |
Family
ID=25254478
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US829414A (Expired - Lifetime; US3600516A (en)) | 1966-09-29 | 1969-06-02 | Voicing detection and pitch extraction system |
Country Status (3)
Country | Link |
---|---|
US (1) | US3600516A |
JP (1) | JPS508602B1 |
GB (1) | GB1246079A |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2243526A (en) * | 1940-03-16 | 1941-05-27 | Bell Telephone Labor Inc | Production of artificial speech |
US2340364A (en) * | 1942-08-22 | 1944-02-01 | Rca Corp | Audio transmission circuit |
US2561478A (en) * | 1948-05-28 | 1951-07-24 | Bell Telephone Labor Inc | Analyzing system for determining the fundamental frequency of a complex wave |
US2691137A (en) * | 1952-06-27 | 1954-10-05 | Us Air Force | Device for extracting the excitation function from speech signals |
US2927969A (en) * | 1954-10-20 | 1960-03-08 | Bell Telephone Labor Inc | Determination of pitch frequency of complex wave |
US3488446A (en) * | 1966-10-31 | 1970-01-06 | Bell Telephone Labor Inc | Apparatus for deriving pitch information from a speech wave |
- 1969-06-02: US application US829414A filed; patent US3600516A not active (expired, lifetime)
- 1970-05-08: JP application JP45038773A filed; JPS508602B1 active (pending)
- 1970-05-27: GB application GB25406/70A filed; GB1246079A not active (expired)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5986198A (en) * | 1995-01-18 | 1999-11-16 | Ivl Technologies Ltd. | Method and apparatus for changing the timbre and/or pitch of audio signals |
US6046395A (en) * | 1995-01-18 | 2000-04-04 | Ivl Technologies Ltd. | Method and apparatus for changing the timbre and/or pitch of audio signals |
US6336092B1 (en) * | 1997-04-28 | 2002-01-01 | Ivl Technologies Ltd | Targeted vocal transformation |
US5970441A (en) * | 1997-08-25 | 1999-10-19 | Telefonaktiebolaget Lm Ericsson | Detection of periodicity information from an audio signal |
US20020073417A1 (en) * | 2000-09-29 | 2002-06-13 | Tetsujiro Kondo | Audience response determination apparatus, playback output control system, audience response determination method, playback output control method, and recording media |
US7555766B2 (en) * | 2000-09-29 | 2009-06-30 | Sony Corporation | Audience response determination |
US20110084953A1 (en) * | 2009-10-12 | 2011-04-14 | Chia-Yu Lee | Organic light emitting display having a power saving mechanism |
Also Published As
Publication number | Publication date |
---|---|
JPS508602B1 (ja) | 1975-04-05 |
GB1246079A (en) | 1971-09-15 |
Similar Documents
Publication | Title |
---|---|
EP0091466B1 (en) | Method and apparatus for registering the use of a television receiver in connection with at least one video tape player |
US4039754A (en) | Speech analyzer |
JPH07302092A (ja) | Singing score grading device for users of a karaoke system |
ES450719A1 (es) | An arrangement for use in the recognition of sounds |
US3855417A (en) | Method and apparatus for phonation analysis lending to valid truth/lie decisions by spectral energy region comparison |
US3600516A (en) | Voicing detection and pitch extraction system |
JPH04150252A (ja) | Voice/voice-band data discrimination device |
US3546584A (en) | Apparatus for analyzing a complex waveform containing pitch synchronous information |
US4164626A (en) | Pitch detector and method thereof |
US2810787A (en) | Compressed frequency communication system |
US2857465A (en) | Vocoder transmission system |
DE69132081D1 (de) | Discrimination between information and noise in a communication signal |
GB978303A (en) | Improvements in or relating to means for processing signals composed of components of different frequencies |
US3838217A (en) | Amplitude regulator means for separating frequency variations and amplitude variations of electrical signals |
Miller | Performance characteristics of an experimental harmonic identification pitch extraction (HIPEX) system |
US3509281A (en) | Voicing detection system |
US3321582A (en) | Wave analyzer |
US3196212A (en) | Local amplitude detector |
US3448216A (en) | Vocoder system |
USRE24670E (en) | Device for extracting the excitation function from speech signals |
JPS5491009A (en) | Audio recognition unit |
US3507999A (en) | Speech-noise discriminator |
US2561478A (en) | Analyzing system for determining the fundamental frequency of a complex wave |
US5134657A (en) | Vocal demodulator |
JPS6223878B2 (ja) | |