US3723667A - Apparatus for speech compression - Google Patents

Apparatus for speech compression

Info

Publication number
US3723667A
Authority
US
United States
Prior art keywords
speech
power supply
vowel
recording
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US00214615A
Inventor
J Park
W Mortimore
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PKM Corp
Original Assignee
PKM Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PKM Corp
Application granted
Publication of US3723667A

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/22Means responsive to presence or absence of recorded information signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04Time compression or expansion
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B15/00Driving, starting or stopping record carriers of filamentary or web form; Driving both such record carriers and heads; Guiding such record carriers or containers therefor; Control thereof; Control of operating function
    • G11B15/18Driving; Starting; Stopping; Arrangements for control or regulation thereof
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/00007Time or data compression or expansion
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/022Electronic editing of analogue information signals, e.g. audio or video signals
    • G11B27/029Insert-editing

Definitions

  • a vowel detector is provided and is coupled to the drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals.
  • the vowel detector is adapted to regularly and periodically interrupt the drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in the input.
  • the present invention relates generally to a means for recording and compressing speech sound, and more particularly to a method and apparatus for recording and selectively deleting pauses as well as certain portions of normal speech sound from the recording. It has been found that controlled and selective deletion of certain portions of normal speech render the recorded message highly intelligible, even when compressed to a time of less than one-half of the actual speech.
  • the speech input by periodically discarding a fixed segment of the input and bringing the ends of the retained input together to make a continuous, time-shortened signal. If the length of the retained segment is sufficiently long with respect to the fundamental pitch period of the voice, then the voice will retain most of its natural quality. The length of deleted segment must be sufficiently long with respect to the retained segment so as to effect the desired or required time compression, but not so long so as to obscure the important transitional elements or consonants in speech which are normally of short duration.
  • the input medium must either be played in at a faster than normal rate, or the alternative, the output must be arranged to be played back after processing at an increased rate.
  • the device described by Fairbanks et al. attains the necessary frequency shifting by utilizing a rotating head assembly.
  • Other devices utilizing similar techniques may employ tapped delay lines in which the input is provided from tapes which are being sampled at a suitable rate to receive the desired shift and bring the ends of the retained segments together.
  • a tape recording device is employed utilizing the speech input from a microphone, phonograph, tape recorder, or other similar structures which function in real-time, with a time-compressed reproduction being produced which may be played back on any standard playing apparatus.
  • the structure includes a recording means for receiving and recording speech signals from an input, with drive means being provided for the recording means, and with a power supply being coupled to the drive means.
  • Selective deletion of portions of the speech sound is accomplished by a substantial elimination of pauses, as well as a means for eliminating periodic portions of vowel sounds.
  • the apparatus of the present invention permits compression of speech to be undertaken to a substantial degree, with intelligible results being obtained with a compression providing a resultant play-back time of less than about 30 percent of the original speech time.
  • the apparatus of the present invention provides a means for expanding recorded speech as well.
  • Prior techniques included the use of a slow play-back with a resulting frequency shift, but changes in pitch make the speech unintelligible if slow rates are employed. While systematic repetition of short segments of recorded speech may be utilized to preserve the pitch, the character of such a recording is diminished because of apparent breaks in the speech provided at arbitrary points.
  • the apparatus of the present invention may function by selectively inserting additional pauses where pauses will normally occur, and thus allow play-back at the recorded time or greater, resulting in minimal, if any, loss in intelligibility.
  • FIG. 1 is a block diagram illustrating the fundamental components utilized in a speech compressor apparatus prepared pursuant to the present invention
  • FIG. 2 is a characteristic plot of frequency versus relative amplitude for the pre-filter structure utilized in connection with the present invention
  • FIG. 3 is a plot of the frequency versus the relative amplitude for the spectrum shaping apparatus of the speech detector portion of the apparatus
  • FIG. 4 is a plot of the frequency versus relative amplitude for the spectrum shaping for the vowel detector
  • FIG. 5 is a schematic diagram of a speech detector system which may be employed in the apparatus utilized to practice the present invention, and capable of delivering a response curve similar to that shown in FIG. 3;
  • FIG. 6 is a schematic diagram of the vowel detector which may be utilized to achieve the resultant curve shown in FIG. 4;
  • FIG. 7 is a typical timing diagram showing how speech-compression is achieved through a combination of pause deletion and vowel shortening
  • FIG. 8 is a block diagram of a speech expansion structure which may be utilized in connection with the apparatus of the present invention.
  • FIG. 9 is a timing diagram showing how speech expansion is attained through the expander system shown in FIG. 8;
  • FIG. 10 is a schematic diagram of a vowel chopper which may be employed in connection with the present invention.
  • FIG. 11 is a schematic diagram illustrating the pause indicator which may be utilized in connection with the apparatus of the present invention
  • FIG. 12 is a compression (or expansion) meter which may be utilized in connection with the apparatus of the present invention, and particularly for achieving an adjustable compression (or expansion) with a visual indication of the extent of compression;
  • FIG. 13 is a schematic diagram of a portion of the speech expander concept illustrated in FIGS. 8 and 9.
  • FIG. 1 of the drawings wherein the speech compressor apparatus fabricated pursuant to the present invention is illustrated in block diagram form.
  • the system includes an input 20 which delivers a speech signal to a preamplifier 21.
  • the preamplified signal then passes to the pre-filter 22, and thence to a vowel detector system 23 and a speech detector 24.
  • the speech detector is, in turn, coupled to tape transport 25, so as to interrupt flow of power to the tape transport upon occurrence of a pause in the speech.
  • the output of vowel detector 23 is delivered to vowel chopper 26, and ultimately to tape transport 25 where the power supply for the tape transport is controllably regulated by vowel chopper 26.
  • the minimum pause to be retained may be adjustably preset in the speech detector.
  • the amount of vowel compression may be adjustably set in vowel chopper 26.
  • a pause indicator, either visual or audible, such as is illustrated in FIG. 1 at 27 and 28 may also be employed if desired.
  • a visual indication of the compression occurring in the speech signal is provided as indicated at 29.
  • the record electronic section 30 represents a bias oscillator, record amplifier and record driver.
  • the purpose of this portion of the system is to supply the appropriate electrical signal to the record and erase heads of a tape recorder, when a tape recorder is being employed.
  • Such electronic systems are well known in the industry and are commercially available.
  • the tape transport 25 is indicated as having a fast start/stop capability. This transport incorporates a read/write head, erase head, as well as drive means for moving the tape across the heads.
  • a power supply is provided for the drive means, with the power supply being actuated electrically for starting and stopping the tape.
  • the tape start-up time from full stop to full speed should be no greater than about 40 milliseconds for pause shortening operations, and no greater than about 20 milliseconds for vowel shortening. Start-up times of about 30 milliseconds and 10 milliseconds respectively are preferred. Furthermore, the stop time from full speed to full stop must be substantially the same. Tape transports having such start/stop capabilities are commercially available, and are widely used in the electronic data processing industry.
  • an important feature of the present invention is the generation of a control signal for the power supply to control the drive means for the recording mechanisms. As indicated, this signal is based on pause elimination and vowel shortening.
  • speech signals are recorded by way of the tape transport whenever the control signal is on. Such a signal exists whenever an appropriate voltage or current level is available to place the transport in the operational mode.
  • When a speech signal is not present, the control signal will not be present and the transport will not be moving the tape.
  • If a speech signal is detected and it is not a vowel sound, then the transport is operative and tape is being carried across the record head.
  • If a speech signal is present and it is ascertained that it is a vowel sound, then a first predetermined portion of the sound is recorded, and thereafter the sound is recorded on a periodic, cyclic, or "chopped" basis.
  • A vowel sound is recorded for the first t1 seconds, and for the next t2 seconds, the sound is not being recorded. Thereafter, if the vowel sound is continuing, the next t1 seconds are recorded, followed by a period of t2 seconds of no recording. This cycle continues until the speech sound becomes a non-vowel in which case it is fully recorded, or, in the alternative, until the speech signal is no longer present in which case the power supply is interrupted and the transport stops.
  • the input signal derived from a microphone, tape head, phonograph, radio or other transducer providing an electrical signal representing the speech sound.
  • This signal is initially amplified in the preamplifier 21 to bring it up to standard levels, such as, for example, a peak at 0 VU at the record head.
  • the signal is preferably filtered. It has been found that the filter utilized should have the characteristics shown in the diagram of FIG. 2, with frequencies below about 250 Hz being reduced to eliminate hum and rumble, and to insure that the envelope detector does not follow the natural pitch-period resonance of certain speakers. Furthermore, frequencies substantially above approximately 6,000 Hz are reduced or eliminated in order to minimize the effects of hiss and background room noise.
  • This filtered signal is then passed into the vowel detector and speech detector, as indicated.
  • FIG. 5 of the drawings wherein a typical speech detector system is illustrated.
  • This detector includes components for accomplishing three basic functions, spectrum shaping, envelope detection, and threshold detection.
  • Spectrum shaping is necessary in order that low energy speech sounds necessary for good intelligibility, are weighted the same as the high energy vowel sounds.
  • the weighting shown in FIG. 3 of the drawings has been found to provide a nearly flat spectrum at the output of the spectrum shaping circuit for most speakers.
  • Capacitor 35 charges rapidly when speech energy is present, and when the voltage reaches a threshold (about 2 volts for the circuit shown), the output signal goes to a logical level indicating speech being present.
  • When a pause occurs, transistor 36 is turned off, and the charge on capacitor 35 discharges through variable resistor 37. When the voltage falls below a second threshold, in this case about 0.7 volts, the output signal immediately drops to a level indicating speech being absent. The circuit indicates that the time to reach this threshold determines the length of the pause that is retained, and accordingly adjustment of variable resistor 37 may be utilized to control this time. In the circuit illustrated in FIG. 5, it is simple to utilize times as short as 10 milliseconds or less, or utilize times as long as tens of seconds or even longer. When a signal is again present, capacitor 35 charges and an output is indicated.
  • FIG. 6 of the drawings wherein the schematic illustration of the vowel detector is shown.
  • vowel sounds have their primary energy (first formants) between about 250 and 800 Hz.
  • Most consonants have their primary energy in frequencies above approximately 1,000 Hz.
  • voice signals are filtered by the vowel spectrum selector, the circuit shown in FIG. 6.
  • This filter has the characteristic as is indicated in FIG. 4, and provides energy in the area of between about 250 Hz and 800 Hz.
  • the output of this filter will provide consonant sounds having voltage levels that are 30 db or lower in intensity than vowel sounds.
  • the envelope detector and threshold device operate similarly to the speech detector discussed above, with one important difference being that when a vowel sound ends, the circuit is designed so that the no-vowel level appears at the output within less than about 20 milliseconds delay. It is, of course, necessary to retain a portion of the vowel sound, hence the output of the vowel detector goes to the vowel chopper shown in FIG. 10.
  • the purpose of this circuit of FIG. 10 is to produce an output level for the power supply to the drive means for a period of t1 seconds, and interrupt this power for the next succeeding t2 seconds alternately as is illustrated in FIG. 7 until the vowel sounds terminate. When the vowel sounds terminate, the output again returns to a level indicating no vowels present.
  • the system illustrated in FIG. 10 consists of two one-shot multivibrators and several logic gates.
  • the time constant R1C1 in the first one-shot multivibrator determines the time period t1, and the time constant R2C2 in the second one-shot multivibrator determines the time period t2.
  • the percent of the vowel sound that is deleted is, of course, equivalent to t2/(t1 + t2) × 100.
  • the time t1 should be chosen to contain at least several cycles of the lowest resonant voice sound anticipated for the device, this frequency typically being on the order of 100 Hz, and accordingly having a period of 10 milliseconds.
  • t1 should be at least about 30 milliseconds.
  • t1 should be smaller than the shortest vowel sounds so some shortening will, in fact, occur. In general, vowel sounds are seldom shorter than about [value illegible] milliseconds for most speakers.
  • the time t2 is selected in conjunction with t1 in order to obtain the desired vowel shortening. It is noted that the output of the vowel chopper and the speech detector are arranged in AND configuration to form the control signal to the drive means power supply. This is illustrated in FIG. 7.
  • the control signal is off whenever speech is absent or during the time t2 when vowels are present in the speech signal.
  • This control signal activates the tape recorder so that the signal derived from the vowel detector and its chopper element, together with the speech detector, are utilized to activate the recorder and interrupt or stop the recorder as appropriate.
  • any style of recorder may be utilized, including magnetic tape, wire, disc, or the like, the primary requirement being that it have a capability of starting and stopping rapidly, as indicated hereinabove.
  • the signal level to the system is set at the preamplifier 21, as indicated, so that the recorder peaks are approximately at 0 VU, as a standard practice.
  • the preamplifier is, of course, a standard type structure which is commercially available.
  • the level set into the controller determines the signal levels that will activate the speech and vowel detectors.
  • this level can be set so that signals as low as 30 db below 0 VU activate the speech and vowel detectors.
  • this level must be set so that such noise does not trigger the speech and vowel detectors, for example, the arrangement being such that only signals at [value illegible] db below 0 VU or greater will trigger the speech and vowel detectors.
  • A technique to accomplish such an arrangement is illustrated in FIG. 11.
  • the light driver is activated to light a lamp when the speech indicator is off.
  • an audible tone may be generated utilizing the oscillator as illustrated. It will be appreciated that any form of oscillator will suffice for generating an audible tone.
  • When no signal is present at the output of the speech detector, the oscillator will be activated so as to generate the audible tone at this time. The resulting tone is available by way of a speaker or head phone to the operator. This arrangement is illustrated in FIG.
  • FIG. 9 illustrates a timing diagram showing the method of approach for speech expansion.
  • the speech signal which is being played from a recorded medium is monitored utilizing the speech detector, and when speech is absent, as indicated by the detector, a control signal is generated which stops the play-back of the recorded signal for a period of time t, whereupon play-back resumes. The play-back continues until the speech detector goes from a speech indication level to a speech absence level, whereupon the process is repeated.
  • One method of realizing this method of speech expansion pursuant to the present invention is shown in the block diagram of FIG. 8. In this embodiment, a tape transport is in the play-back mode and the signal to be expanded is recorded on a magnetic tape.
  • the tape head picks up the recorded speech signal and on the one hand it is passed through the usual play-back electronics, and presented to the listener by way of a speaker or head phone. On the other hand, it is also played into the speech detector described in detail hereinabove, whereupon the output of the speech detector is on when speech is present, and off when speech is absent. When this output signal ceases, a one-shot multivibrator is triggered which produces a control signal. Normally the output of this one-shot multivibrator indicates that the transport is in the operational mode.
  • When the speech detector output goes from an indication of speech to no-speech, the one-shot multivibrator is triggered and the control signal is lost, with the transport stopping for a period of t seconds, after which the transport resumes normal play-back until the speech detector output again falls to a no-speech level, whereupon the process is repeated.
  • A possible method of generating the interval of time of t seconds is shown in FIG. 13.
  • Two methods are provided for adjusting the amount of expansion. The first of these is by changing the time constant RC in FIG. 13, thus changing the time t. It is appreciated that with this circuit, one is able to vary t from as low as about 20 milliseconds to as long as several seconds or more. Of course, the longer t, the more the speech is expanded.
  • the second method of varying the amount of expansion is simply by adjusting the minimum pause before the speech detector indicates a condition of no speech. This is accomplished by adjusting RC of FIG. 5. If this time constant is sufficiently long, then short pauses will not be detected and hence not expanded, and the amount of expansion will be decreased. If even the very shortest pauses are detected (RC of FIG. 5)
  • the drive means and power supply for the recording means are standard and conventional in the art. Obviously, battery or AC driven units may be employed.
  • the pause elimination and vowel shortening occurs by means of controlling the current flow from the power supply to the drive means.
  • Means for recording and selectively deleting portions of normal speech sound comprising:
  • a. input means, recording means for receiving and recording speech signals from said input means, drive means for said recording means, and a power supply delivering energy to said drive means;
  • speech detector means coupled to said drive means power supply for detecting the presence of a speech signal in said input means and for energizing said drive means power supply only in response to the presence of a speech signal therein;
  • vowel detector means coupled to said drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals in said input means, said vowel detector means being adapted to regularly and periodically interrupt said drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in said input, with means being provided for periodically chopping said power supply into a plurality of substantially regularly spaced apart power pulses having predetermined time duration, with said periodic chopping of drive means power supply commencing after a certain predetermined time interval following initial detection of vowel presence and continuing during the presence of vowel sounds in said input.
  • the speech compression means as defined in claim 1 being particularly characterized in that filter means are provided in the speech input for passing signals of between about 250 Hz and 6,000 Hz.
  • the speech compression means as defined in claim 1 being particularly characterized in that said periodic chopping of drive means power supply provides for power pulses of about 60 milliseconds followed by an idle period of about 30 milliseconds.
  • the speech compression means as defined in claim 1 being particularly characterized in that said recording means has a start up time capability of less than about 10 milliseconds.
  • the speech compression means as defined in claim 1 being particularly characterized in that said speech detector means continues to energize said drive means power supply for a predetermined period of time greater than approximately 10 milliseconds following the termination of each speech signal.
  • the recording means as defined in claim 1 being particularly characterized in that filter means are provided for speech detection, the filter being adapted to pass signals of modest amplitude at frequencies less than about 1,000 Hz, with the amplitude increasing substantially uniformly until an input frequency of about 8,000 Hz is reached.
  • the recording means as defined in claim 6 being particularly characterized in that said increase is at a level of about 24 db./octave at frequencies from between 1,000 Hz and 8,000 Hz.
  • the recording means as defined in claim 1 being particularly characterized in that vowel detector means are provided in the speech input for passing signals having a frequency of between about 250 Hz and 1,200 Hz.
  • the speech compression means as defined in claim 1 being particularly characterized in that control means are provided for controllably adjusting the extent of compression.
  • Means for recording and selectively modifying portions of normal speech sound comprising:
  • the speech compression means as defined in claim 10 being particularly characterized in that said recording means includes first and second serially coupled recording means with drive means for each of said recording means, means for continuing the energization of said second recording means upon each occurrence of the termination of the presence of a speech signal in said first recording means.
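The record/delete logic described above and recited in claim 1 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the patented circuit: the input is taken to be a sequence of already-classified 10 ms frames, the names `control_signal`, `percent_deleted`, `t1_ms` and `t2_ms` are hypothetical, and the default 60 ms on / 30 ms off values are borrowed from the dependent claim on chopping intervals. The sketch also drops pauses entirely, whereas the apparatus retains an adjustable minimum pause.

```python
def control_signal(labels, frame_ms=10, t1_ms=60, t2_ms=30):
    """Return one bool per frame: True means the transport runs (frame recorded).

    labels: sequence of 'pause', 'consonant' or 'vowel', one entry per frame.
    The frame labels, frame length and default timings are assumptions made
    for illustration only.
    """
    period = t1_ms + t2_ms
    vowel_elapsed = 0  # milliseconds since the current vowel began
    signal = []
    for label in labels:
        if label == "vowel":
            # Chopper: on for the first t1 of each cycle, off for the next t2.
            chopper_on = (vowel_elapsed % period) < t1_ms
            vowel_elapsed += frame_ms
        else:
            chopper_on = True
            vowel_elapsed = 0
        speech_present = label != "pause"
        # AND of speech detector output and vowel chopper output.
        signal.append(speech_present and chopper_on)
    return signal


def percent_deleted(t1_ms, t2_ms):
    """Share of a long vowel that is skipped: 100 * t2 / (t1 + t2)."""
    return 100.0 * t2_ms / (t1_ms + t2_ms)


frames = ["pause"] * 3 + ["consonant"] * 2 + ["vowel"] * 18 + ["pause"] * 2
recorded = control_signal(frames)
print(sum(recorded), "of", len(frames), "frames recorded")
print(round(percent_deleted(60, 30), 1), "percent of each long vowel deleted")
```

Because the chopper phase restarts with each new vowel, the first `t1_ms` of every vowel is always kept, which matches the requirement that chopping begin only after an initial recorded interval.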

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Means for recording and selectively deleting portions of normal speech sound which includes a recorder for receiving and recording speech signals from an input, with a drive means being provided for the recorder, and with a power supply being provided for the drive means. A speech detector is coupled to the power supply for the drive means and is arranged to energize the drive means only in response to the presence of a speech signal in the input. A vowel detector is provided and is coupled to the drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals. The vowel detector is adapted to regularly and periodically interrupt the drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in the input.
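The speech detector's on/off decision can be sketched in Python as a comparator with two thresholds. The description gives a turn-on level of about 2 volts and a turn-off level of about 0.7 volts on the envelope capacitor; everything else here is an assumption for illustration, including the function names, the sampled envelope, the component values, and the simple exponential RC discharge model that starts exactly at the turn-on level.

```python
import math


def hold_time_s(r_ohms, c_farads, v_on=2.0, v_off=0.7):
    """Time for an RC discharge to fall from v_on to v_off.

    Uses v(t) = v_on * exp(-t / (R * C)), so t = R * C * ln(v_on / v_off).
    Assumes the capacitor starts at v_on; a real envelope may start higher.
    """
    return r_ohms * c_farads * math.log(v_on / v_off)


def speech_flags(envelope_volts, v_on=2.0, v_off=0.7):
    """Hysteresis comparator: switch on at >= v_on, switch off below v_off."""
    flags = []
    speaking = False
    for v in envelope_volts:
        if not speaking and v >= v_on:
            speaking = True
        elif speaking and v < v_off:
            speaking = False
        flags.append(speaking)
    return flags


# Hypothetical values: 100 kilohm resistor setting and 1 microfarad capacitor.
print(hold_time_s(100e3, 1e-6))
print(speech_flags([0.1, 2.5, 1.5, 0.9, 0.6, 1.0, 2.1]))
```

In this model a larger resistance lengthens the discharge and therefore the pause that is retained before the transport stops, which is the adjustment the description attributes to the variable resistor.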

Description

United States Patent
Park, Jr. et al.
Mar. 27, 1973

[54] APPARATUS FOR SPEECH COMPRESSION
[75] Inventors: John H. Park, Jr., St. Paul; William Mortimore, Minneapolis, both of Minn.
[73] Assignee: PKM Corporation, St. Paul, Minn.
[22] Filed: Jan. 3, 1972
[21] Appl. No.: 214,615
[52] U.S. Cl.: [illegible]
[51] Int. Cl.: G11b 19/20 [remainder illegible]
[58] Field of Search: 179/100 VC, 1 VC, 15.55

[56] References Cited
UNITED STATES PATENTS
2,411,501  11/1946  Brubaker  179/100.1 VC
3,532,821  10/1970  Nakata et al.  179/1 SA
3,428,748  2/1969  Flanagan  179/1 SA
2,115,803  5/1938  Dudley  179/15.55 R
[further entries illegible]

Primary Examiner: [name illegible]
Attorney: Orrin M. Haugen

11 Claims, 13 Drawing Figures

[FIG. 1: Realization of speech compressor. Block diagram: speech signal input (20), preamplifier (21), pre-filter (22), vowel detector (23), speech detector (24), tape transport with fast start/stop (25), vowel chopper (26), pause indicator (27), tone generator (28), compression meter (29), record electronics (30), speaker or headphone. Controls: set level, set amount of vowel compression, set minimum pause retained.]
[Sheet 2 of 7. FIG. 2: Pre-filter characteristic. FIG. 3: Spectrum shaping for speech detector. FIG. 4: Spectrum shaping for vowel detector. Each plot shows relative amplitude in db against frequency in Hz, with frequency ticks from 62.5 Hz to 16,000 Hz.]
[Sheet 3 of 7. FIG. 5: Speech detector (spectrum shaping, envelope detector, threshold detector, output). FIG. 6: Vowel detector (spectrum shaping, envelope detector, threshold detector, output).]
[Sheet 4 of 7. FIG. 7: Timing diagram for speech compression. Traces: original speech (vowel and pause segments marked), speech detector output, vowel detector output, vowel chopper output, record/delete control signal, compressed speech. Original elapsed time: 1.12 sec. Compressed elapsed time: 0.54 sec.]
[Sheet 5 of 7. FIG. 8: Realization of speech expander. Blocks: tape transport, speech detector (minimum pause detected, adjustable), one-shot (length of pause inserted, adjustable), manual interrupt, logic level, playback electronics, speaker or headphone. FIG. 9: Timing diagram for speech expansion. Traces: recorded speech, speech detector output, stop action of playback, expanded speech with inserted pauses. Elapsed time on playback: 2.7 sec.]
sum 6 or 7 INPUT- OUTPUT TO ADJUST RI FOR 1 FOR 1 POWE I 2 SUPPLY VOWEL CHOPPER FIGJO INPUT 0 OUTPUT FROM SPEECH CONTROL SIGNAL DETECTOR ADJUST R FOR PORTION OF SPEECH EXPANDER FIG. 13
PATENTEUHARZ'IIUYS SHEET 7 BF 7 TONE GENERATOR +l7 SPEECH I00 SI8NAL 0 F. seon PAUSE To D INDlCATOR (audio) \Nr 0 SPEECH IOM HEADPHONE OR ZLZ SPEAKER AMPLIFIER all resistors Kohms all capacitors pf all transistors 2N3392 unless specified PAUSE INDICATOR (visual) F I G. I I
PAUSE INDICATORS FULL SCALE ADJUST CONTROL 39K 25 K SGML 200m COMPRESSION movem t INDICATOR COMPRESSION METER FIG.I2
APPARATUS FOR SPEECH COMPRESSION

BACKGROUND OF THE INVENTION

The present invention relates generally to a means for recording and compressing speech sound, and more particularly to a method and apparatus for recording and selectively deleting pauses as well as certain portions of normal speech sound from the recording. It has been found that controlled and selective deletion of certain portions of normal speech renders the recorded message highly intelligible, even when compressed to a time of less than one-half of the actual speech.
Studies have indicated that the normal human ear and brain are rarely, if ever, overtaxed when listening to human speech at normal rates. Furthermore, studies have indicated that a normal listener is able to understand and comprehend speech even when delivered at a rate at least 3 times as rapid as natural speech. Accordingly, in recording lectures, business memoranda, or the like, much time can be saved by compressing the speech in terms of time, without deleting any significant portions of the spoken words, or detracting from the intelligibility.
In the past, speech compression has been accomplished by means of systematic or periodic deletion of certain portions of the spoken message. Such a device is described in an article by Fairbanks, et al., Method for Time or Frequency Compression-Expansion of Speech, Transactions of the I.R.E., PG on Audio, Vol. AU-2, No. 1, January-February, 1954, pages 7-11, and achieves a time compression of the speech input by periodically discarding a fixed segment of the input and bringing the ends of the retained input together to make a continuous, time-shortened signal. If the length of the retained segment is sufficiently long with respect to the fundamental pitch period of the voice, then the voice will retain most of its natural quality. The length of deleted segment must be sufficiently long with respect to the retained segment so as to effect the desired or required time compression, but not so long so as to obscure the important transitional elements or consonants in speech which are normally of short duration. Inasmuch as the technique or practice of bringing the ends of the retained segments together results in an apparent low-range frequency of the voice, the input medium must either be played in at a faster than normal rate, or, in the alternative, the output must be arranged to be played back after processing at an increased rate. The device described by Fairbanks et al. attains the necessary frequency shifting by utilizing a rotating head assembly.
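The periodic discard-and-splice scheme described above can be sketched in a few lines. This is only an illustration of the general idea, not the Fairbanks apparatus itself: the function name `periodic_delete` and the parameters `keep` and `drop` (segment lengths counted in samples) are assumptions introduced here, and the rate change needed to restore pitch is not modeled.

```python
def periodic_delete(samples, keep, drop):
    """Keep `keep` samples, discard the next `drop`, and repeat.

    The retained segments are abutted to form a time-shortened signal.
    """
    if keep <= 0 or drop < 0:
        raise ValueError("keep must be positive and drop non-negative")
    out = []
    period = keep + drop
    for start in range(0, len(samples), period):
        out.extend(samples[start:start + keep])
    return out


# Hypothetical input: 20 samples, keeping 3 of every 5.
signal = list(range(20))
shortened = periodic_delete(signal, keep=3, drop=2)
```

With these illustrative values the output holds 12 of the 20 input samples, so the shortened signal occupies 60 percent of the original time.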
Other devices utilizing similar techniques may employ tapped delay lines in which the input is provided from tapes which are being sampled at a suitable rate to receive the desired shift and bring the ends of the retained segments together.
Those speech compression devices which utilize systematic or periodic deletion of input suffer from a number of disadvantages. For example, those mechanical devices which utilize rotating head assemblies require careful adjustment and maintenance, and are considered complex and expensive. Mechanical delay lines, which have been utilized in the past, are sensitive to mechanical shock. Electronic type delay lines have also been utilized. Furthermore, the extent of time compression which can be derived from systematic deletion is limited to no less than about 60 percent of the original time, since if additional compression is undertaken, the portion retained is such that many of the transitional elements of the sound are either blurred or deleted, thereby reducing intelligibility.
The time compression which is obtained from systematic deletion is frequently unnatural when compared with the normal human production of rapid speech. Studies have shown that the normal speaker, when attempting to speak more rapidly, will initially shorten the pauses between phonemes by bringing spoken sounds up more closely together without shortening the spoken sounds proportionally. It has been further found that the shortening that does occur when the speaker is attempting to speak at a more rapid rate takes place in the voiced or vowel-like sounds. It is believed that the transitional elements, particularly unvoiced consonants, cannot be appreciably shortened in duration since manipulation of the vocal apparatus is more intricate and involved for these sounds than for the longer vowel sounds. Accordingly, rapid human speech is characterized by shortened or minimal pauses along with shortened vowel-like sounds in the speech. To remain reasonably intelligible, transitional elements including the unvoiced consonants are shortened only very slightly, if at all.
It follows, therefore, that there is no reasonable relationship between the normal or natural reactions of a speaker attempting to speak at a more rapid rate, and the technique of systematic deletion. It is appreciated, of course, that systematic deletion produces a result in which the pauses in the speech appear to be unnaturally long, and the consonants unnaturally short, the combined effect of which renders the compressed speech somewhat unintelligible.
SUMMARY OF THE INVENTION

In order to carry out the method of time compression in accordance with the present invention, a tape recording device is employed utilizing the speech input from a microphone, phonograph, tape recorder, or other similar structures which function in real-time, with a time-compressed reproduction being produced which may be played back on any standard playing apparatus. Essentially, the structure includes a recording means for receiving and recording speech signals from an input, with drive means being provided for the recording means, and with a power supply being coupled to the drive means. Selective deletion of portions of the speech sound is accomplished by a substantial elimination of pauses, as well as a means for eliminating periodic portions of vowel sounds. The apparatus of the present invention permits compression of speech to be undertaken to a substantial degree, with intelligible results being obtained with a compression providing a resultant play-back time of less than about 30 percent of the original speech time.
In addition to compression of speech, the apparatus of the present invention provides a means for expanding recorded speech as well. Prior techniques included the use of a slow play-back with a resulting frequency shift, but changes in pitch make the speech unintelligible if slow rates are employed. While systematic repetition of short segments of recorded speech may be utilized to preserve the pitch, the character of such a recording is diminished because of apparent breaks in the speech provided at arbitrary points. The apparatus of the present invention may function by selectively inserting additional pauses where pauses will normally occur, and thus allow play-back at the recorded time or greater, resulting in minimal, if any, loss in intelligibility.
Therefore, it is a primary object of the present invention to provide an improved speech compression apparatus which functions on the elimination or drastic shortening of pauses, coupled with the deletion of certain portions of vowel or vowel-like sounds.
It is a further object of the present invention to provide an improved apparatus for modifying speech timing including selective speech compression and expansion which is simple in construction, rugged, and relatively inexpensive.
It is yet a further object of the present invention to provide an improved speech compression apparatus which functions on the basis of shortening or eliminating pauses, and deleting certain portions of vowels or vowel-like sounds, this speech compression being accomplished with very little loss of intelligibility.
Other and further objects of the present invention will become apparent to those skilled in the art upon a study of the following specification, appended claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the fundamental components utilized in a speech compressor apparatus prepared pursuant to the present invention;
FIG. 2 is a characteristic plot of frequency versus relative amplitude for the pre-filter structure utilized in connection with the present invention;
FIG. 3 is a plot of the frequency versus the relative amplitude for the spectrum shaping apparatus of the speech detector portion of the apparatus;
FIG. 4 is a plot of the frequency versus relative amplitude for the spectrum shaping for the vowel detector;
FIG. 5 is a schematic diagram of a speech detector system which may be employed in the apparatus utilized to practice the present invention, and capable of delivering a response curve similar to that shown in FIG. 3;
FIG. 6 is a schematic diagram of the vowel detector which may be utilized to achieve the resultant curve shown in FIG. 4;
FIG. 7 is a typical timing diagram showing how speech-compression is achieved through a combination of pause deletion and vowel shortening;
FIG. 8 is a block diagram of a speech expansion structure which may be utilized in connection with the apparatus of the present invention;
FIG. 9 is a timing diagram showing how speech expansion is attained through the expander system shown in FIG. 8;
FIG. 10 is a schematic diagram of a vowel chopper which may be employed in connection with the present invention;
FIG. 11 is a schematic diagram illustrating the pause indicator which may be utilized in connection with the apparatus of the present invention;
FIG. 12 is a compression (or expansion) meter which may be utilized in connection with the apparatus of the present invention, and particularly for achieving an adjustable compression (or expansion) with a visual indication of the extent of compression; and
FIG. 13 is a schematic diagram of a portion of the speech expander concept illustrated in FIGS. 8 and 9.
DESCRIPTION OF THE PREFERRED EMBODIMENT

Attention is now directed to FIG. 1 of the drawings wherein the speech compressor apparatus fabricated pursuant to the present invention is illustrated in block diagram form. The system includes an input 20 which delivers a speech signal to a preamplifier 21. The preamplified signal then passes to the pre-filter 22, and thence to a vowel detector system 23 and a speech detector 24. The speech detector is, in turn, coupled to tape transport 25, so as to interrupt flow of power to the tape transport upon occurrence of a pause in the speech. The output of vowel detector 23 is delivered to vowel chopper 26, and ultimately to tape transport 25 where the power supply for the tape transport is controllably regulated by vowel chopper 26.
As is indicated in the drawing of FIG. 1, the minimum pause to be retained may be adjustably preset in the speech detector. Also, the amount of vowel compression may be adjustably set in vowel chopper 26. A pause indicator, either visual or audible, such as is illustrated in FIG. 1 at 27 and 28 may also be employed if desired. Also, a visual indication of the compression occurring in the speech signal is provided as indicated at 29.
With continued attention being directed to FIG. 1 of the drawings, the record electronic section 30 represents a bias oscillator, record amplifier and record driver. The purpose of this portion of the system is to supply the appropriate electrical signal to the record and erase heads of a tape recorder, when a tape recorder is being employed. Such electronic systems are well known in the industry and are commercially available. The tape transport 25 is indicated as having a fast start/stop capability. This transport incorporates a read/write head, erase head, as well as drive means for moving the tape across the heads. In addition, a power supply is provided for the drive means, with the power supply being actuated electrically for starting and stopping the tape. For a structure to be fully compatible with the various objects and methods utilized in the present invention, the tape start-up time from full stop to full speed should be no greater than about 40 milliseconds for pause shortening operations, and no greater than about 20 milliseconds for vowel shortening. Start-up times of about 30 milliseconds and 10 milliseconds respectively are preferred. Furthermore, the stop time from full speed to full stop must be substantially the same. Tape transports having such start/stop capabilities are commercially available, and are widely used in the electronic data processing industry.
As can be appreciated, an important feature of the present invention is the generation of a control signal for the power supply to control the drive means for the recording mechanisms. As indicated, this signal is based on pause elimination and vowel shortening.
As is indicated in FIG. 1, speech signals are recorded by way of the tape transport whenever the control signal is on. Such a signal exists whenever an appropriate voltage or current level is available to place the transport in the operational mode. When a speech signal is not present, the control signal will not be present and the transport will not be moving the tape. When a speech signal is detected and it is not a vowel sound, then the transport is operative and tape is being carried across the record head. When a speech signal is present and it is ascertained that it is a vowel sound, then a first predetermined portion of the sound is recorded, and thereafter the sound is recorded on a periodic, cyclic, or "chopped" basis. For example, a vowel sound is recorded for the first $t_1$ seconds, and for the next $t_2$ seconds, the sound is not being recorded. Thereafter, if the vowel sound is continuing, the next $t_1$ seconds are recorded, followed by a period of $t_2$ seconds of no recording. This cycle continues until the speech sound becomes a non-vowel in which case it is fully recorded, or, in the alternative, until the speech signal is no longer present in which case the power supply is interrupted and the transport stops.
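The recording rule just described can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the circuit of the patent: time is quantized into equal steps, the detector outputs are given as ready-made 0/1 lists, and the name `record_gate` and its parameters `t1` (steps recorded) and `t2` (steps skipped) are introduced here for illustration only.

```python
def record_gate(speech, vowel, t1, t2):
    """Return one True/False per time step: True while the transport should run.

    speech, vowel: per-step detector outputs (truthy when detected).
    t1: steps recorded in each vowel cycle; t2: steps skipped in each cycle.
    """
    gate = []
    phase = 0  # steps elapsed since the current vowel began
    for s, v in zip(speech, vowel):
        if s and v:
            # Within a vowel: record for t1 steps, skip t2 steps, repeat.
            gate.append(phase % (t1 + t2) < t1)
            phase += 1
        else:
            # Non-vowel speech is fully recorded; a pause stops the transport.
            phase = 0
            gate.append(bool(s))
    return gate


# Hypothetical detector outputs for ten time steps.
speech = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
vowel = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
gate = record_gate(speech, vowel, t1=2, t2=1)
```

In this example the six-step vowel is recorded for two steps, skipped for one, and so on, while the consonant steps on either side are kept in full and the pauses at both ends are dropped.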
With continued attention being directed to FIG. 1, the input signal is derived from a microphone, tape head, phonograph, radio or other transducer providing an electrical signal representing the speech sound. This signal is initially amplified in the preamplifier 21 to bring it up to standard levels, such as, for example, a peak at 0 VU at the record head. In order to reduce noise and other unwanted signals which have a frequency spectrum falling outside of the voice spectrum, the signal is preferably filtered. It has been found that the filter utilized should have the characteristics shown in the diagram of FIG. 2, with frequencies below about 250 Hz being reduced to eliminate hum and rumble, and to insure that the envelope detector does not follow the natural pitch-period resonance of certain speakers. Furthermore, frequencies substantially above approximately 6,000 Hz are reduced or eliminated in order to minimize the effects of hiss and background room noise. This filtered signal is then passed into the vowel detector and speech detector, as indicated.
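The band-limiting role of the pre-filter can be illustrated numerically. The sketch below assumes a cascade of one first-order high-pass section and one first-order low-pass section with corner frequencies at the 250 Hz and 6,000 Hz figures given in the text; the actual slopes of FIG. 2 are not reproduced, and the function name `prefilter_gain_db` is hypothetical.

```python
import math

LOW_CUTOFF_HZ = 250.0    # from the text: frequencies below about 250 Hz are reduced
HIGH_CUTOFF_HZ = 6000.0  # from the text: frequencies above about 6,000 Hz are reduced


def prefilter_gain_db(freq_hz):
    """Gain in dB of an assumed first-order high-pass/low-pass cascade."""
    hp = freq_hz / LOW_CUTOFF_HZ
    lp = freq_hz / HIGH_CUTOFF_HZ
    high_pass = hp / math.sqrt(1.0 + hp * hp)
    low_pass = 1.0 / math.sqrt(1.0 + lp * lp)
    return 20.0 * math.log10(high_pass * low_pass)


# Print the assumed response at the octave points used on the figure axes.
for f in (62.5, 250.0, 1000.0, 6000.0, 16000.0):
    print(f"{f:8.1f} Hz  {prefilter_gain_db(f):6.2f} dB")
```

Under these assumptions the response is nearly flat around 1,000 Hz and is about 3 dB down at each corner frequency; a steeper filter would be needed to match a sharper characteristic.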
Attention is now directed to FIG. 5 of the drawings wherein a typical speech detector system is illustrated. This detector includes components for accomplishing three basic functions: spectrum shaping, envelope detection, and threshold detection. Spectrum shaping is necessary in order that low energy speech sounds, necessary for good intelligibility, are weighted the same as the high energy vowel sounds. The weighting shown in FIG. 3 of the drawings has been found to provide a nearly flat spectrum at the output of the spectrum shaping circuit for most speakers. After spectrum shaping, the resulting signal is detected as indicated. Capacitor 35 charges rapidly when speech energy is present, and when the voltage reaches a threshold (about 2 volts for the circuit shown), the output signal goes to a logical level indicating speech being present. When a pause occurs, transistor 36 is turned off, and the charge on capacitor 35 discharges through variable resistor 37. When the voltage falls below a second threshold, in this case about 0.7 volts, the output signal immediately drops to a level indicating speech being absent. The circuit indicates that the time to reach this threshold determines the length of the pause that is retained, and accordingly adjustment of variable resistor 37 may be utilized to control this time. In the circuit illustrated in FIG. 5, it is simple to utilize times as short as 10 milliseconds or less, or utilize times as long as tens of seconds or even longer. When a signal is again present, capacitor 35 charges and an output is indicated.
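The two-threshold behavior of the speech detector can be sketched in software. The 2 volt and 0.7 volt thresholds come from the text; everything else is an assumption made for illustration: the capacitor is taken to charge instantaneously to the envelope level, the discharge is modeled as a simple exponential with time constant `tau`, and the names `speech_detector` and `hold_time` are hypothetical.

```python
import math

ON_THRESHOLD = 2.0   # volts, from the description of FIG. 5
OFF_THRESHOLD = 0.7  # volts, from the description of FIG. 5


def hold_time(tau, v_start):
    """Seconds for an assumed RC discharge from v_start to reach OFF_THRESHOLD."""
    return tau * math.log(v_start / OFF_THRESHOLD)


def speech_detector(envelope, dt, tau):
    """Return per-step True/False speech-present flags with hysteresis.

    envelope: rectified input level in volts at each step.
    dt: step length in seconds; tau: assumed discharge time constant in seconds.
    """
    v = 0.0
    present = False
    out = []
    decay = math.exp(-dt / tau)
    for level in envelope:
        # Fast charge (assumed instantaneous), slow exponential discharge.
        v = max(level, v * decay)
        if not present and v >= ON_THRESHOLD:
            present = True
        elif present and v < OFF_THRESHOLD:
            present = False
        out.append(present)
    return out


# Hypothetical envelope: a 3 V burst followed by silence, 10 ms steps.
flags = speech_detector([0, 3, 3, 0, 0, 0, 0, 0, 0], dt=0.01, tau=0.02)
```

Lengthening `tau` in this sketch plays the part of raising variable resistor 37: the detector output stays on longer after speech ends, so a longer portion of each pause is retained.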
Attention is now directed to FIG. 6 of the drawings wherein the schematic illustration of the vowel detector is shown. It is known that vowel sounds have their primary energy (first formants) between about 250 and 800 Hz. Most consonants have their primary energy in frequencies above approximately 1,000 Hz. Accordingly, voice signals are filtered by the vowel spectrum selector, the circuit shown in FIG. 6. This filter has the characteristic as is indicated in FIG. 4, and provides energy in the area of between about 250 Hz and 800 Hz. The output of this filter will provide consonant sounds having voltage levels that are 30 db or lower in intensity than vowel sounds. The envelope detector and threshold device operate similarly to the speech detector discussed above, with one important difference being that when a vowel sound ends, the circuit is designed so that the no-vowel level appears at the output within less than about 20 milliseconds delay. It is, of course, necessary to retain a portion of the vowel sound, hence the output of the vowel detector goes to the vowel chopper shown in FIG. 10. The purpose of this circuit of FIG. 10 is to produce an output level for the power supply to the drive means for a period of $t_1$ seconds, and interrupt this power for the next succeeding $t_2$ seconds alternately as is illustrated in FIG. 7 until the vowel sounds terminate. When the vowel sounds terminate, the output again returns to a level indicating no vowels present. This function insures that consonants occurring immediately after a vowel sound are not lost. The system illustrated in FIG. 10 consists of two one-shot multivibrators and several logic gates. The time constant $R_1 C_1$ in the first one-shot multivibrator determines the time period $t_1$, and the time constant $R_2 C_2$ in the second one-shot multivibrator determines the time period $t_2$. The percent of the vowel sound that is deleted is, of course, equivalent to $t_2/(t_1 + t_2) \times 100$.
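As a worked example of the deletion percentage, take the recorded interval of each chopping cycle as t1 and the interrupted interval as t2. The sketch below evaluates t2/(t1 + t2) x 100 for the values recited in claim 3 (power pulses of about 60 milliseconds followed by an idle period of about 30 milliseconds). The function name `percent_vowel_deleted` is an assumption, and the figure applies to a sustained vowel, ignoring any partial final cycle.

```python
def percent_vowel_deleted(t1_ms, t2_ms):
    """Percentage of a sustained vowel removed by chopping: t2 / (t1 + t2) * 100."""
    if t1_ms <= 0 or t2_ms < 0:
        raise ValueError("t1 must be positive and t2 non-negative")
    return 100.0 * t2_ms / (t1_ms + t2_ms)


# Values recited in claim 3: 60 ms recorded, 30 ms idle.
claim3_percent = percent_vowel_deleted(60.0, 30.0)
print(f"{claim3_percent:.1f} percent of the vowel is deleted")
```

With these values one third of each sustained vowel is removed; raising t2 relative to t1 increases the vowel compression.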
The time $t_1$ should be chosen to contain at least several cycles of the lowest resonant voice sound anticipated for the device, this frequency typically being on the order of 100 Hz, and accordingly having a period of 10 milliseconds. Hence $t_1$ should be at least about 30 milliseconds. On the other hand, $t_1$ should be smaller than the shortest vowel sounds so some shortening will, in fact, occur. In general, vowel sounds are seldom shorter than about milliseconds for most speakers. Thus, the time $t_2$ is selected in conjunction with $t_1$ in order to obtain the desired vowel compression.

It will be noted that the output of the vowel chopper and the speech detector are arranged in AND configuration to form the control signal to the drive means power supply. This is illustrated in FIG. 7. Thus, the control signal is off whenever speech is absent or during the time $t_2$ when vowels are present in the speech signal. This control signal activates the tape recorder so that the signal derived from the vowel detector and its chopper element, together with the speech detector, are utilized to activate the recorder and interrupt or stop the recorder as appropriate. It will be appreciated that any style of recorder may be utilized, including magnetic tape, wire, disc, or the like, the primary requirement being that it have a capability of starting and stopping rapidly, as indicated hereinabove.

The signal level to the system is set at the preamplifier 21, as indicated, so that the recorder peaks are approximately at 0 VU, as a standard practice. The preamplifier is, of course, a standard type structure which is commercially available. The level set into the controller determines the signal levels that will activate the speech and vowel detectors. Thus, when the background noise is of low volume (40 db below 0 VU), this level can be set so that signals as low as 30 db below 0 VU activate the speech and vowel detectors. When the background noise increases so as to achieve a level of about 20 db below 0 VU, this level must be set so that such noise does not trigger the speech and vowel detectors, for example, the arrangement being such that only signals at db below 0 VU or greater will trigger the speech and vowel detectors.
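The AND arrangement of the two detector paths can be illustrated as follows. The sketch assumes that both outputs are available as per-step 0/1 lists and that the chopper output is high whenever power is to be delivered (no vowel present, or the recorded part of a vowel cycle); the names `control_signal` and `retained_fraction` are hypothetical.

```python
def control_signal(speech_out, chopper_out):
    """AND the speech detector output with the vowel chopper output."""
    return [bool(s) and bool(c) for s, c in zip(speech_out, chopper_out)]


def retained_fraction(control):
    """Fraction of time steps during which the recorder is running."""
    return sum(control) / len(control) if control else 0.0


# Hypothetical outputs for ten time steps.
speech_out = [0, 1, 1, 1, 1, 1, 1, 0, 0, 1]
chopper_out = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
control = control_signal(speech_out, chopper_out)
fraction = retained_fraction(control)
```

Here the recorder runs during half of the steps, which corresponds to a play-back time of one-half of the original for this made-up input.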
In order to facilitate the setting of the level control and the pause length control, it is, of course, desirable to have visual and audible signals to indicate times when the speech detector output is off. A technique to accomplish such an arrangement is illustrated in FIG. 11. As can be seen from the schematic of FIG. 11, the light driver is activated to light a lamp when the speech indicator is off. Also, an audible tone may be generated utilizing the oscillator as illustrated. It will be appreciated that any form of oscillator will suffice for generating an audible tone. When no signal is present at the output of the speech detector, the oscillator will be activated so as to generate the audible tone at this time. The resulting tone is available by way of a speaker or head phone to the operator. This arrangement is illustrated in FIG. 1, wherein this indicated function is added to the incoming voice signal, and hence played through the monitor speaker or head phone. At this time, the operator may simultaneously monitor what is being recorded and the indication of what portions are to be deleted due to the function of the speech detector and its effect on the power supply for the drive means.
Another feature of the present invention is the use of the structure for speech expansion. FIG. 9 illustrates a timing diagram showing the method of approach for speech expansion. The speech signal which is being played from a recorded medium is monitored utilizing the speech detector, and when speech is absent, as indicated by the detector, a control signal is generated which stops the play-back of the recorded signal for a period of time t, whereupon play-back resumes. The play-back continues until the speech detector goes from a speech indication level to a speech absence level, whereupon the process is repeated. One method of realizing this method of speech expansion pursuant to the present invention is shown in the block diagram of FIG. 8. In this embodiment, a tape transport is in the play-back mode and the signal to be expanded is recorded on a magnetic tape. The tape head picks up the recorded speech signal and on the one hand it is passed through the usual play-back electronics, and presented to the listener by way of a speaker or head phone. On the other hand, it is also played into the speech detector described in detail hereinabove, whereupon the output of the speech detector is on when speech is present, and off when speech is absent. When this output signal ceases, a one-shot multivibrator is triggered which produces a control signal. Normally the output of this one-shot multivibrator indicates that the transport is in the operational mode. When the speech detector output goes from an indication of speech to no-speech, the one-shot multivibrator is triggered and the control signal is lost, with the transport stopping for a period of t seconds, after which the transport resumes normal play-back until the speech detector output again falls to a no-speech level, whereupon the process is repeated.
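The expansion procedure can be sketched as a pass over the recorded material that inserts a fixed stop at every speech-to-no-speech transition. This is an illustration under assumptions, not the circuit of FIG. 8: the recording is represented as a list of frames, the speech detector output is supplied as a parallel list, a stopped transport is represented by `None`, and the names `expand` and `pause_steps` are hypothetical.

```python
def expand(frames, speech_present, pause_steps):
    """Insert `pause_steps` silent steps whenever speech changes to no-speech.

    frames: recorded material, one item per time step.
    speech_present: per-step speech detector output (truthy when speech is present).
    Returns the play-back sequence; None marks steps with the transport stopped.
    """
    out = []
    previous = False
    for frame, present in zip(frames, speech_present):
        if previous and not present:
            # One-shot fires: transport halts for the inserted pause.
            out.extend([None] * pause_steps)
        out.append(frame)
        previous = bool(present)
    return out


# Hypothetical six-frame recording with two speech-to-pause transitions.
frames = ["a", "b", "c", "d", "e", "f"]
present = [1, 1, 0, 1, 0, 0]
expanded = expand(frames, present, pause_steps=2)
```

The six recorded frames play back in ten steps, with the added time falling only where the recording already contained a pause.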
A possible method of generating the interval of time of t seconds is shown in FIG. 13. Two methods are provided for adjusting the amount of expansion. The first of these is by changing the time constant R C in FIG. 13, thus changing the time t. It is appreciated that with this circuit, one is able to vary t from as low as about 20 milliseconds to as long as several seconds or more. Of course, the longer t, the more the speech is expanded. The second method of varying the amount of expansion is simply by adjusting the minimum pause before the speech detector indicates a condition of no speech. This is accomplished by adjusting R C of FIG. 5. If this time constant is sufficiently long, then short pauses will not be detected and hence not expanded, and the amount of expansion will be decreased. If even the very shortest pauses are to be detected, R C of FIG. 5 will be very small, and in this case there will be a greater amount of expansion.
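The combined effect of the two adjustments can be estimated with simple arithmetic: the play-back time grows by the inserted interval once for every pause long enough to be detected. The sketch below is a rough model under assumptions; the function name `expanded_duration`, its parameters, and the sample pause lengths are all hypothetical.

```python
def expanded_duration(original_s, pause_lengths_s, min_pause_s, inserted_s):
    """Estimate play-back time after expansion.

    original_s: duration of the recording in seconds.
    pause_lengths_s: lengths of the pauses present in the recording.
    min_pause_s: shortest pause the speech detector will report.
    inserted_s: stop interval added at each detected pause.
    """
    detected = [p for p in pause_lengths_s if p >= min_pause_s]
    return original_s + inserted_s * len(detected)


# Hypothetical 10-second recording containing four pauses.
pauses = [0.05, 0.2, 0.4, 0.03]
coarse = expanded_duration(10.0, pauses, min_pause_s=0.1, inserted_s=0.5)
fine = expanded_duration(10.0, pauses, min_pause_s=0.02, inserted_s=0.5)
```

With the longer minimum pause only two pauses are expanded, while with the shorter setting all four are, so the second case yields the greater expansion.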
As has been indicated, the drive means and power supply for the recording means are standard and conventional in the art. Obviously, battery or AC driven units may be employed. The pause elimination and vowel shortening occurs by means of controlling the current flow from the power supply to the drive means.
We claim:
1. Means for recording and selectively deleting portions of normal speech sound comprising:
a. input means, recording means for receiving and recording speech signals from said input means, drive means for said recording means, and a power supply delivering energy to said drive means;
b. speech detector means coupled to said drive means power supply for detecting the presence of a speech signal in said input means and for energizing said drive means power supply only in response to the presence of a speech signal therein;
c. vowel detector means coupled to said drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals in said input means, said vowel detector means being adapted to regularly and periodically interrupt said drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in said input, with means being provided for periodically chopping said power supply into a plurality of substantially regularly spaced apart power pulses having predetermined time duration, with said periodic chopping of drive means power supply commencing after a certain predetermined time interval following initial detection of vowel presence and continuing during the presence of vowel sounds in said input.
2. The speech compression means as defined in claim 1 being particularly characterized in that filter means are provided in the speech input for passing signals of between about 250 Hz and 6,000 Hz.
3. The speech compression means as defined in claim 1 being particularly characterized in that said periodic chopping of drive means power supply provides for power pulses of about 60 milliseconds followed by an idle period of about 30 milliseconds.
4. The speech compression means as defined in claim 1 being particularly characterized in that said recording means has a start up time capability of less than about 10 milliseconds.
5. The speech compression means as defined in claim 1 being particularly characterized in that said speech detector means continues to energize said drive means power supply for a predetermined period of time greater than approximately 10 milliseconds following the termination of each speech signal.
6. The recording means as defined in claim 1 being particularly characterized in that filter means are provided for speech detection, the filter being adapted to pass signals of modest amplitude at frequencies less than about 1,000 Hz, with the amplitude increasing substantially uniformly until an input frequency of about 8,000 Hz is reached.
7. The recording means as defined in claim 6 being particularly characterized in that said increase is at a level of about 24 db./octave at frequencies from between 1,000 Hz and 8,000 Hz.
8. The recording means as defined in claim 1 being particularly characterized in that vowel detector means are provided in the speech input for passing signals having a frequency of between about 250 Hz and 1,200 Hz.
9. The speech compression means as defined in claim 1 being particularly characterized in that control means are provided for controllably adjusting the extent of compression.
10. Means for recording and selectively modifying portions of normal speech sound comprising:
a. input means, recording means for receiving and recording speech signals from said input means, drive means for said recording means, and a power supply delivering energy to said drive means;
b. speech detector means coupled to said drive means power supply for detecting the presence of a speech signal in said input means and for energizing said drive means power supply only in response to the presence of a speech signal therein;
c. vowel detector means coupled to said drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals in said input means, said vowel detector means being adapted to regularly and periodically interrupt said drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in said input, with means being provided for periodically chopping said power supply into a plurality of substantially regularly spaced apart power pulses having predetermined time duration, with said periodic chopping of drive means power supply commencing after a certain predetermined time interval following initial detection of vowel presence and continuing during the presence of vowel sounds in said input; and
d. means for selectively continuing the energization of said drive means for predetermined periods of time upon detection of termination of the presence of a speech signal in said speech detector means.
11. The speech compression means as defined in claim 10 being particularly characterized in that said recording means includes first and second serially coupled recording means with drive means for each of said recording means, means for continuing the energization of said second recording means upon each occurrence of the termination of the presence of a speech signal in said first recording means.

Claims (11)

1. Means for recording and selectively deleting portions of normal speech sound comprising: a. input means, recording means for receiving and recording speech signals from said input means, drive means for said recording means, and a power supply delivering energy to said drive means; b. speech detector means coupled to said drive means power supply for detecting the presence of a speech signal in said input means and for energizing said drive means power supply only in response to the presence of a speech signal therein; c. vowel detector means coupled to said drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals in said input means, said vowel detector means being adapted to regularly and periodically interrupt said drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in said input, with means being provided for periodically chopping said power supply into a plurality of substantially regularly spaced apart power pulses having predetermined time duration, with said periodic chopping of drive means power supply commencing after a certain predetermined time interval following initial detection of vowel presence and continuing during the presence of vowel sounds in said input.
2. The speech compression means as defined in claim 1 being particularly characterized in that filter means are provided in the speech input for passing signals of between about 250 Hz and 6,000 Hz.
3. The speech compression means as defined in claim 1 being particularly characterized in that said periodic chopping of drive means power supply provides for power pulses of about 60 milliseconds followed by an idle period of about 30 milliseconds.
4. The speech compression means as defined in claim 1 being particularly characterized in that said recording means has a start up time capability of less than about 10 milliseconds.
5. The speech compression means as defined in claim 1 being particularly characterized in that said speech detector means continues to energize said drive means power supply for a predetermined period of time greater than approximately 10 milliseconds following the termination of each speech signal.
6. The recording means as defined in claim 1 being particularly characterized in that filter means are provided for speech detection, the filter being adapted to pass signals of modest amplitude at frequencies less than about 1,000 Hz, with the amplitude increasing substantially uniformly until an input frequency of about 8,000 Hz is reached.
7. The recording means as defined in claim 6 being particularly characterized in that said increase is at a level of about 24 db./octave at frequencies from between 1,000 Hz and 8,000 Hz.
8. The recording means as defined in claim 1 being particularly characterized in that vowel detector means are provided in the speech input for passing signals having a frequency of between about 250 Hz and 1,200 Hz.
9. The speech compression means as defined in claim 1 being particularly characterized in that control means are provided for controllably adjusting the extent of compression.
10. Means for recording and selectively modifying portions of normal speech sound comprising: a. input means, recording means for receiving and recording speech signals from said input means, drive means for said recording means, and a power supply delivering energy to said drive means; b. speech detector means coupled to said drive means power supply for detecting the presence of a speech signal in said input means and for energizing said drive means power supply only in response to the presence of a speech signal therein; c. vowel detector means coupled to said drive means power supply for detecting the initiation and continuing presence of vowel sounds in speech signals in said input means, said vowel detector means being adapted to regularly and periodically interrupt said drive means power supply for certain predetermined time intervals in response to the initiation and continued presence of vowel sounds in said input, with means being provided for periodically chopping said power supply into a plurality of substantially regularly spaced apart power pulses having predetermined time duration, with said periodic chopping of drive means power supply commencing after a certain predetermined time interval following initial detection of vowel presence and continuing during the presence of vowel sounds in said input; and d. means for selectively continuing the energization of said drive means for predetermined periods of time upon detection of termination of the presence of a speech signal in said speech detector means.
11. The speech compression means as defined in claim 10 being particularly characterized in that said recording means includes first and second serially coupled recording means with drive means for each of said recording means, means for continuing the energization of said second recording means upon each occurrence of the termination of the presence of a speech signal in said first recording means.
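
As an illustration only, the drive-gating behavior recited in claims 1, 3 and 5 can be sketched in software. The claims describe analog circuitry, so the following is a frame-based approximation under stated assumptions, not the patented implementation. The names GateConfig, drive_gate and retained_fraction are hypothetical, as are the 10 ms frame step, the 60 ms delay before chopping begins (the claims say only "predetermined") and the 20 ms hold after speech ends (claim 5 requires only more than approximately 10 ms). The 60 ms power pulse and 30 ms idle period come from claim 3; starting each chop cycle with the power pulse is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GateConfig:
    frame_ms: int = 10        # hypothetical analysis step, not from the patent
    vowel_delay_ms: int = 60  # assumed; the claims only say "predetermined"
    on_ms: int = 60           # power pulse length (claim 3)
    off_ms: int = 30          # idle period length (claim 3)
    hangover_ms: int = 20     # assumed; claim 5 asks for more than about 10 ms


def drive_gate(speech: Sequence[bool], vowel: Sequence[bool],
               cfg: GateConfig = GateConfig()) -> List[bool]:
    """Return, per frame, whether the recorder drive is energized."""
    gate: List[bool] = []
    vowel_ms = 0      # time elapsed since the current vowel began
    silent_ms = None  # time since speech last ended; None before any speech
    for s, v in zip(speech, vowel):
        if s:
            # Speech detector: energize the drive only while speech is present.
            silent_ms = 0
            powered = True
        else:
            # Hold the drive on briefly after speech terminates (claim 5).
            if silent_ms is not None:
                silent_ms += cfg.frame_ms
            powered = silent_ms is not None and silent_ms <= cfg.hangover_ms
        if s and v:
            # Vowel detector: after the initial delay, chop the supply into
            # on_ms pulses separated by off_ms idle periods.
            if vowel_ms >= cfg.vowel_delay_ms:
                cycle = cfg.on_ms + cfg.off_ms
                phase = (vowel_ms - cfg.vowel_delay_ms) % cycle
                if phase >= cfg.on_ms:
                    powered = False
            vowel_ms += cfg.frame_ms
        else:
            vowel_ms = 0
        gate.append(powered)
    return gate


def retained_fraction(cfg: GateConfig = GateConfig()) -> float:
    """Fraction of a long sustained vowel recorded once chopping is active."""
    return cfg.on_ms / (cfg.on_ms + cfg.off_ms)
```

With the claim 3 figures, about two thirds (60 of every 90 milliseconds) of a sustained vowel is recorded once chopping is under way, while consonants and the onset of each vowel pass unaltered. The adjustable compression of claim 9 would correspond, in this sketch, to varying on_ms, off_ms or vowel_delay_ms.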
US00214615A 1972-01-03 1972-01-03 Apparatus for speech compression Expired - Lifetime US3723667A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US21461572A 1972-01-03 1972-01-03

Publications (1)

Publication Number Publication Date
US3723667A true US3723667A (en) 1973-03-27

Family

ID=22799773

Family Applications (1)

Application Number Title Priority Date Filing Date
US00214615A Expired - Lifetime US3723667A (en) 1972-01-03 1972-01-03 Apparatus for speech compression

Country Status (3)

Country Link
US (1) US3723667A (en)
JP (1) JPS4878907A (en)
DE (1) DE2259178A1 (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3864519A (en) * 1973-05-11 1975-02-04 Ford Ind Inc Speech-gap-responsive control apparatus
US4087632A (en) * 1976-11-26 1978-05-02 Bell Telephone Laboratories, Incorporated Speech recognition system
US4130739A (en) * 1977-06-09 1978-12-19 International Business Machines Corporation Circuitry for compression of silence in dictation speech recording
DE2939499A1 (en) 1978-09-28 1980-04-10 Olympus Optical Co MAGNETIC TAPE PLAYER
US4207440A (en) * 1976-01-30 1980-06-10 The Vsc Company Dictation recorder with speech-extendable adjustment predetermined playback time
US4247910A (en) * 1978-12-21 1981-01-27 Bell Telephone Laboratories, Incorporated Arrangement for deleting leading message portions
US4272810A (en) * 1978-12-21 1981-06-09 Bell Telephone Laboratories, Incorporated Arrangement for deleting trailing message portions
US4375083A (en) * 1980-01-31 1983-02-22 Bell Telephone Laboratories, Incorporated Signal sequence editing method and apparatus with automatic time fitting of edited segments
US4388495A (en) * 1981-05-01 1983-06-14 Interstate Electronics Corporation Speech recognition microcomputer
US4412098A (en) * 1979-09-10 1983-10-25 Interstate Electronics Corporation Audio signal recognition computer
FR2599915A1 (en) * 1986-04-24 1987-12-11 Inst Radioveschatelnogo Prie METHOD FOR WRITING AND READING SOUND INFORMATION SIGNALS IN DIGITAL FORM AND DEVICE IMPLEMENTING SAID METHOD
US4893197A (en) * 1988-12-29 1990-01-09 Dictaphone Corporation Pause compression and reconstitution for recording/playback apparatus
WO1993009531A1 (en) * 1991-10-30 1993-05-13 Peter John Charles Spurgeon Processing of electrical and audio signals
EP0929068A2 (en) * 1998-01-06 1999-07-14 Pioneer Electronic Corporation Method of and apparatus for reproducing a plurality of information pieces
US6085157A (en) * 1996-01-19 2000-07-04 Matsushita Electric Industrial Co., Ltd. Reproducing velocity converting apparatus with different speech velocity between voiced sound and unvoiced sound
US6246752B1 (en) 1999-06-08 2001-06-12 Valerie Bscheider System and method for data recording
US6249570B1 (en) 1999-06-08 2001-06-19 David A. Glowny System and method for recording and storing telephone call information
US6252947B1 (en) 1999-06-08 2001-06-26 David A. Diamond System and method for data recording and playback
US6252946B1 (en) 1999-06-08 2001-06-26 David A. Glowny System and method for integrating call record information
US20040106017A1 (en) * 2000-10-24 2004-06-03 Harry Buhay Method of making coated articles and coated articles made thereby
US6775372B1 (en) 1999-06-02 2004-08-10 Dictaphone Corporation System and method for multi-stage data logging
US6775648B1 (en) * 1996-03-08 2004-08-10 Koninklijke Philips Electronics N.V. Dictation and transcription apparatus
WO2005099190A1 (en) 2004-04-07 2005-10-20 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for increasing perceived interactivity in communications systems
US20090058611A1 (en) * 2006-02-28 2009-03-05 Takashi Kawamura Wearable device
US20200105281A1 (en) * 2012-03-29 2020-04-02 Smule, Inc. Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11108764B2 (en) 2018-07-02 2021-08-31 Salesforce.Com, Inc. Automating responses to authentication requests using unsupervised computer learning techniques

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2115803A (en) * 1935-10-30 1938-05-03 Bell Telephone Labor Inc Signaling system
US2286072A (en) * 1939-12-22 1942-06-09 Bell Telephone Labor Inc Treatment of speech waves for transmission or recording
US2411501A (en) * 1944-05-16 1946-11-26 Memovox Inc Sound recording system
US3428748A (en) * 1965-12-28 1969-02-18 Bell Telephone Labor Inc Vowel detector
US3471652A (en) * 1965-05-24 1969-10-07 Northrop Corp System for recording operational failure history
US3532821A (en) * 1967-11-29 1970-10-06 Hitachi Ltd Speech synthesizer

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3864519A (en) * 1973-05-11 1975-02-04 Ford Ind Inc Speech-gap-responsive control apparatus
US4207440A (en) * 1976-01-30 1980-06-10 The Vsc Company Dictation recorder with speech-extendable adjustment predetermined playback time
US4087632A (en) * 1976-11-26 1978-05-02 Bell Telephone Laboratories, Incorporated Speech recognition system
US4130739A (en) * 1977-06-09 1978-12-19 International Business Machines Corporation Circuitry for compression of silence in dictation speech recording
FR2394206A1 (en) * 1977-06-09 1979-01-05 Ibm DEVICE FOR COMPRESSING SILENCES IN VOICE FREQUENCY RECORDERS
DE2939499A1 (en) 1978-09-28 1980-04-10 Olympus Optical Co MAGNETIC TAPE PLAYER
US4247910A (en) * 1978-12-21 1981-01-27 Bell Telephone Laboratories, Incorporated Arrangement for deleting leading message portions
US4272810A (en) * 1978-12-21 1981-06-09 Bell Telephone Laboratories, Incorporated Arrangement for deleting trailing message portions
US4412098A (en) * 1979-09-10 1983-10-25 Interstate Electronics Corporation Audio signal recognition computer
US4375083A (en) * 1980-01-31 1983-02-22 Bell Telephone Laboratories, Incorporated Signal sequence editing method and apparatus with automatic time fitting of edited segments
US4388495A (en) * 1981-05-01 1983-06-14 Interstate Electronics Corporation Speech recognition microcomputer
FR2599915A1 (en) * 1986-04-24 1987-12-11 Inst Radioveschatelnogo Prie METHOD FOR WRITING AND READING SOUND INFORMATION SIGNALS IN DIGITAL FORM AND DEVICE IMPLEMENTING SAID METHOD
US4893197A (en) * 1988-12-29 1990-01-09 Dictaphone Corporation Pause compression and reconstitution for recording/playback apparatus
WO1993009531A1 (en) * 1991-10-30 1993-05-13 Peter John Charles Spurgeon Processing of electrical and audio signals
US6085157A (en) * 1996-01-19 2000-07-04 Matsushita Electric Industrial Co., Ltd. Reproducing velocity converting apparatus with different speech velocity between voiced sound and unvoiced sound
US6775648B1 (en) * 1996-03-08 2004-08-10 Koninklijke Philips Electronics N.V. Dictation and transcription apparatus
EP0929068A2 (en) * 1998-01-06 1999-07-14 Pioneer Electronic Corporation Method of and apparatus for reproducing a plurality of information pieces
EP0929068A3 (en) * 1998-01-06 1999-10-06 Pioneer Electronic Corporation Method of and apparatus for reproducing a plurality of information pieces
US6807450B1 (en) 1998-01-06 2004-10-19 Pioneer Electronic Corporation Method of and apparatus for reproducing a plurality of information pieces
US6775372B1 (en) 1999-06-02 2004-08-10 Dictaphone Corporation System and method for multi-stage data logging
US6249570B1 (en) 1999-06-08 2001-06-19 David A. Glowny System and method for recording and storing telephone call information
US6252947B1 (en) 1999-06-08 2001-06-26 David A. Diamond System and method for data recording and playback
US6252946B1 (en) 1999-06-08 2001-06-26 David A. Glowny System and method for integrating call record information
US20010055372A1 (en) * 1999-06-08 2001-12-27 Dictaphone Corporation System and method for integrating call record information
US20020035616A1 (en) * 1999-06-08 2002-03-21 Dictaphone Corporation. System and method for data recording and playback
US6728345B2 (en) * 1999-06-08 2004-04-27 Dictaphone Corporation System and method for recording and storing telephone call information
US20010040942A1 (en) * 1999-06-08 2001-11-15 Dictaphone Corporation System and method for recording and storing telephone call information
US6246752B1 (en) 1999-06-08 2001-06-12 Valerie Bscheider System and method for data recording
US20010043685A1 (en) * 1999-06-08 2001-11-22 Dictaphone Corporation System and method for data recording
US6785369B2 (en) * 1999-06-08 2004-08-31 Dictaphone Corporation System and method for data recording and playback
US6937706B2 (en) * 1999-06-08 2005-08-30 Dictaphone Corporation System and method for data recording
US20040106017A1 (en) * 2000-10-24 2004-06-03 Harry Buhay Method of making coated articles and coated articles made thereby
EP1735968B1 (en) * 2004-04-07 2014-09-10 TELEFONAKTIEBOLAGET LM ERICSSON (publ) Method and apparatus for increasing perceived interactivity in communications systems
WO2005099190A1 (en) 2004-04-07 2005-10-20 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for increasing perceived interactivity in communications systems
US20090058611A1 (en) * 2006-02-28 2009-03-05 Takashi Kawamura Wearable device
US8581700B2 (en) * 2006-02-28 2013-11-12 Panasonic Corporation Wearable device
US20200105281A1 (en) * 2012-03-29 2020-04-02 Smule, Inc. Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm
US11127407B2 (en) * 2012-03-29 2021-09-21 Smule, Inc. Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm
US12033644B2 (en) 2012-03-29 2024-07-09 Smule, Inc. Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm

Also Published As

Publication number Publication date
JPS4878907A (en) 1973-10-23
DE2259178A1 (en) 1973-07-12

Similar Documents

Publication Publication Date Title
US3723667A (en) Apparatus for speech compression
JP3151459B2 (en) Public address clarity enhancement system
Arons Techniques, perception, and applications of time-compressed speech
US5828994A (en) Non-uniform time scale modification of recorded audio
EP0090589A1 (en) Method and apparatus for use in processing signals
US3369077A (en) Pitch modification of audio waveforms
US3588353A (en) Speech synthesizer utilizing timewise truncation of adjacent phonemes to provide smooth formant transition
GB1591996A (en) Apparatus for recognising words from among continuous speech
US4508457A (en) Electronic timepiece with record/playback circuits
US4384170A (en) Method and apparatus for speech synthesizing
Grimm Perception of Segments of English‐Spoken Consonant‐Vowel Syllables
US5293273A (en) Voice actuated recording device having recovery of initial speech data after pause intervals
Estes et al. Speech synthesis from stored data
Ahmend et al. Effect of sample duration on the articulation of sounds in normal and clipped speech
JP2734028B2 (en) Audio recording device
Olson et al. Speech processing techniques and applications
Underwood Time interval statistics in speech synthesis: a critical evaluation
JP2001154684A (en) Speech speed converter
Jensen et al. Pause adjustment mechanism and measurement system (PAMMS)
JPH0368399B2 (en)
JPH06308992A (en) Voice type electronic book
WO1993009531A1 (en) Processing of electrical and audio signals
JP2962777B2 (en) Audio signal time-base expansion / compression device
JPS59148104A (en) Automatic sound recording and reproducing device
SU1078458A1 (en) Device for reproducing magnetic pulse-duration modulated record