CA1216525A

CA1216525A - Method and means for processing speech

Info

Publication number: CA1216525A
Application number: CA000444499A
Authority: CA
Inventors: Larry K. Henrickson; Dorothy A. Huntington
Original assignee: Individual
Current assignee: Individual
Priority date: 1983-01-03
Filing date: 1983-12-30
Publication date: 1987-01-13
Also published as: WO1984002793A1; DK421084D0; EP0131033A1; NO843506L; FI843388A; EP0131033A4; FI843388A0; JPS60500472A; DK421084A

Abstract

ABSTRACT OF THE DISCLOSURE

A method of and apparatus for processing audio signals in which a measure of amplitude of audio signals in a selected time period is obtained. The audio signals for the selected time period are delayed until the measure of amplitude is obtained, and then the delayed audio signals are normalized using the measure of amplitude. High frequency emphasis may be employed prior to obtaining the measure of amplitude. Alternatively, a multi-channel system can be employed for processing audio signals in limited frequency bands.
The method and apparatus are applicable in a variety of applications including hearing aids, audio storage media, broadcast and public address systems, and voice communications such as telephone systems.

Description


METHOD AND MEANS FOR PROCESSING SPEECH

This invention relates generally to speech processing, and more particularly the invention relates to a method and means for amplifying speech, such as for the hard of hearing, without adversely affecting the signal intelligence thereof.

It is well recognized that persons having sensorineural hearing impairment generally have a very limited dynamic range, that is, very little difference between the intensity level of the softest speech of which they are aware (e.g. speech awareness threshold or SAT), the intensity level of speech which is most comfortable for them (most comfortable level or MCL), and the intensity at which speech becomes too loud to be tolerated (uncomfortable loudness level or UCL).
It is generally agreed that it would be highly desirable to reduce the wide range of speech intensity levels to a more restricted range suitable for the sensorineural hearing impairment of each individual listener.

Speech compression systems are known which employ automatic gain control. However, prior art systems employing peak clipping and instantaneous compression produce harmonic distortion which tends to emphasize the stronger, low-frequency components of speech and obscures the higher frequencies. A comprehensive survey is presented by Braida et al in "Hearing Aids - A Review of Past Research on Linear Amplification, Amplitude Compression, and Frequency Lowering", American Speech-Language-Hearing Association, Rockville, Maryland, April 1979. This survey provides an extensive critical review of the compression literature in conjunction with a tutorial on compression concepts.
The survey suggests that the lack of benefits from compression as shown in the survey literature reflects more a failure of researchers to adequately grasp the concepts and complexity of compression, in theory and implementation, rather than the potential benefit of amplitude compression itself.

It is recognized that the acoustical patterns of speech can be systematically analyzed in three primary time-domain components: (1) a fine-temporal pattern reflecting the spectral distribution of each brief acoustic segment, (2) a gross-temporal pattern reflecting the durations of the various acoustic segments based on changes in fine-temporal patterns, and (3) a time-varying amplitude pattern. The fine temporal cues from segments of speech as short as five or ten milliseconds will often provide a listener with sufficient information to identify the place of articulation for consonants. Similarly, the gross temporal pattern will often provide sufficient information regarding the manner of articulation, especially among the classes of fricatives, affricates and stop-plosives.
The time varying amplitude pattern, or "speech envelope", is the natural result of a speech production process but may convey mostly redundant information already conveyed by a gross-temporal pattern. Robinson and Huntington, in a talk before the Acoustical Society of America in April, 1973, recognized that conventional compression amplification introduces undesirable distortion when brief time constants are utilized, and reacts too sluggishly for longer time constants. A method was proposed in which the average power of the speech wave form over intervals of several tens of milliseconds is measured continuously and is used to determine the gain to be applied to the waveform at the center of each interval, with the resulting amplitude compressed signal being delayed by one-half the length of the averaging interval. Preliminary results from a computer simulation suggested that speech intelligibility could be improved by this process.
However, further work was not undertaken by Robinson and Huntington to develop the process.
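The centered-interval method attributed to Robinson and Huntington can be illustrated with a short offline sketch. This is only an illustration under stated assumptions, not their simulation: the function name `centered_power_normalize`, the `window` length in samples, and the `floor` guard against division by zero are all hypothetical. In a real-time system the output would lag the input by half the window, as the text describes; here the whole signal is available at once, so the window is simply centered on each sample.

```python
import math

def centered_power_normalize(samples, window, floor=1e-6):
    """Divide each sample by the RMS of a window centered on that sample.

    window is the averaging interval in samples (an assumed parameter);
    floor keeps the divisor away from zero during silence (also assumed).
    """
    half = window // 2
    out = []
    for i in range(len(samples)):
        # The slice is clipped at the signal edges, so edge windows are shorter.
        segment = samples[max(0, i - half): i + half + 1]
        mean_power = sum(s * s for s in segment) / len(segment)
        rms = math.sqrt(mean_power)
        out.append(samples[i] / max(rms, floor))
    return out
```

Because the gain at each instant is computed from samples on both sides of it, a quiet tone and a loud tone come out at essentially the same level.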

An object of the present invention is an improved method of processing speech to facilitate reception without distorting the intelligible content thereof.

Another object of the invention is apparatus for compressing speech patterns whereby the variations in time varying amplitude pattern or envelope are minimized without adversely affecting the fine-temporal and gross-temporal patterns of the speech.

The present invention is directed to a method and apparatus for processing speech in which a time-varying averaged or root-mean-square (RMS) amplitude pattern is obtained and used to normalize the time varying amplitude pattern of speech and provide a compressed speech pattern positioned between the speech awareness threshold and the uncomfortable loudness level, ideally at the listener's most comfortable level. Spectral shaping is employed to emphasize the high frequency content. The invention can be implemented in a single channel or multi channel system. Suitable microphone means is employed to pick up a speech pattern, and the speech pattern from the microphone is preamplified and then processed by a suitable shaping filter which emphasizes the high frequency content thereof. The root-mean-square of the amplitude of the spectrally shaped signal is then determined over a specific time period, and the inverse of the root-mean-square is then used to modulate the spectrally shaped signal, thus producing a normalized amplitude. Importantly, the shaped signal is delayed for a sufficient time period to compensate for the time delay involved in the root-mean-square determination prior to the amplitude compression. The resulting signal is thus compressed and then adjusted to the desired hearing range with the spectral shaping providing a retention of the fine-temporal pattern and the gross-temporal pattern.
Thus, in accordance with a broad aspect of the invention, there is provided apparatus for enhancing speech comprising microphone means for receiving audio signals and generating electrical signals in response thereto, high frequency emphasis means connected to said microphone means for amplifying said electrical signals, amplitude detection means connected to said high frequency emphasis means for receiving amplified signals and obtaining a root mean square (RMS) amplitude of said amplified signals over a selected period of time, delay means connected to said high-frequency emphasis means for receiving and delaying spectrally shaped electrical signals for a portion of said selected period of time, and signal compression means connected to said delay means and to said amplitude detection means and compressing the delayed amplified electrical signals by a ratio of at least 10:1 in accordance with said root-mean-square average value whereby a constant amplitude output signal without significant signal distortion is obtained.

In accordance with another broad aspect of the invention there is provided, in speech enhancement apparatus, a method of compressing the amplitude of audio signals without speech distortion comprising the steps of obtaining a measure of amplitude of said audio signals over a selected time period, delaying said signals by said selected time period, and compressing said delayed signals corresponding to said selected period of time by a ratio of at least 10:1 in accordance with said measure of amplitude.
In accordance with another broad aspect of the invention there is provided apparatus for processing audio signals comprising a plurality of band pass filters for receiving and filtering audio signals into a plurality of limited frequency bands, a plurality of amplitude compressor means each connected with a band pass filter, each of said amplitude compressor means including means for obtaining a measure of amplitude of audio signals during a selected period of time, means for delaying audio signals for said selected period of time, and means for compressing the delayed audio signals during said period of time based on said measure of amplitude, and summing means connected to said plurality of amplitude compressor means for receiving and summing compression of audio signals.
The invention and objects and features thereof will be more readily apparent from the following detailed description and appended claims when taken with the drawing, in which:
Figure 1 is a functional block diagram of a single channel speech processing apparatus in accordance with one embodiment of the present invention.
Figure 2 is a graph illustrating the compression of speech in accordance with the present invention.
Figure 3 is a functional block diagram of a multichannel embodiment of speech processing apparatus in accordance with the invention.
Figures 4A and 4B are functional block diagrams of a tape recording system in accordance with other embodiments of the invention.
Referring now to the drawings, Figure 1 is a functional block diagram of a single channel speech processor in accordance with one embodiment of the invention which has been built using conventional, commercially available components. In this embodiment a microphone 10 having a broad frequency response (e.g. an electret microphone having a response of 100 Hertz to 10K Hertz such as a Knowles E~ 1934) picks up audio signals and transmits electrical signals to a pre-amplifier 12 having 26 dB of gain between 100 Hertz and 10K Hertz.
The amplified signal is then passed to high frequency emphasis circuitry 14 (e.g. TL064 quad amplifier) which provides 6 dB/octave gain over the range from 100 Hertz to two kiloHertz and a flat response above two kiloHertz. An auxiliary input is provided at 16 whereby signals from a radio receiver, for example, can be applied to the high frequency emphasis circuitry 14.
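The stated emphasis characteristic (6 dB/octave from 100 Hertz to two kiloHertz, flat above) can be written as an idealized gain curve. The sketch below is only a piecewise description of that response, not a model of the actual circuit. The function name `emphasis_gain_db` is hypothetical, gain is expressed relative to the gain at 100 Hertz, and holding the gain constant below 100 Hertz is an assumption, since the text does not describe that region.

```python
import math

def emphasis_gain_db(freq_hz, low_hz=100.0, corner_hz=2000.0,
                     slope_db_per_octave=6.0):
    """Idealized emphasis gain in dB, relative to the gain at low_hz.

    Rises at slope_db_per_octave between low_hz and corner_hz and is flat
    above corner_hz. Frequencies below low_hz are clamped (an assumption).
    """
    clamped = min(max(freq_hz, low_hz), corner_hz)
    octaves_above_low = math.log2(clamped / low_hz)
    return slope_db_per_octave * octaves_above_low
```

Under these assumptions the total rise from 100 Hertz to two kiloHertz is 6 × log2(20), or roughly 26 dB.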

The signal from circuitry 14 is then passed to an RMS detector 16. The signal from the high-frequency emphasis circuitry 14 is also provided to delay circuitry 18 having a delay equal to the time constant of the RMS detector 16. In one embodiment the RMS detector comprised an Analog Devices AD 536A and the delay circuitry 18 comprised a Reticon SAD 4096 bucket brigade device operated from an 80 kilohertz digital clock 20.

The delayed signal from the delay device 18 is then applied as the numerator in a divider circuit 20 (e.g. Analog Devices AD 535 precision divider) and the RMS amplitude of the delayed signal is applied to the divider 20 as a denominator. Accordingly, the output from the divider 20 is a delayed amplitude compressed signal which is applied to the receiver 22 (Knowles ED 1925).
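The detector, delay, and divider chain of Figure 1 can be approximated in discrete time as a minimal sketch. Several things here are assumptions rather than details from the text: the RMS detector is modelled as a one-pole smoother of the squared signal, the 20 millisecond time constant is a placeholder (no value is given), and the `floor` guard and the function name `compress_single_channel` are hypothetical. The delay is set equal to the detector time constant, as the description states.

```python
import math
from collections import deque

def compress_single_channel(samples, sample_rate, time_constant_s=0.02,
                            floor=1e-6):
    """Delay the signal and divide it by a running RMS estimate."""
    # One-pole smoothing coefficient for the assumed detector time constant.
    alpha = 1.0 - math.exp(-1.0 / (time_constant_s * sample_rate))
    # Delay line length equal to the detector time constant, in samples.
    delay_n = int(round(time_constant_s * sample_rate))
    delay_line = deque([0.0] * delay_n)
    mean_square = 0.0
    out = []
    for x in samples:
        mean_square += alpha * (x * x - mean_square)  # RMS detector
        rms = math.sqrt(mean_square)
        delay_line.append(x)
        delayed = delay_line.popleft()                # delay circuitry
        out.append(delayed / max(rms, floor))         # divider
    return out
```

For a steady tone the output settles to roughly unit RMS regardless of the input level, which is the normalizing behaviour the divider is described as providing.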

Figure 2 is a plot of the compressed output level in dB SPL for the signal applied to receiver 22 versus the input level in dB SPL of the signal from the microphone 10. For input levels below about 45 dB, the output level is attenuated. At an input level of 45 dB, the output level is compressed and maintained uniform at approximately 100 dB SPL which is the MCL level. The compression ratio remains at 10:1 or greater for input levels above 45 dB.
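The input/output relation described for Figure 2 can be sketched as a static level curve. The 45 dB knee, the 100 dB SPL output at the knee, and the 10:1 ratio come from the text. The slope below the knee does not: the text says only that the output is attenuated there, so the unity slope used here (`below_slope=1.0`) is a placeholder assumption, as is the function name `output_level_db`.

```python
def output_level_db(input_db, knee_db=45.0, target_db=100.0, ratio=10.0,
                    below_slope=1.0):
    """Approximate static output level in dB SPL for a given input level.

    Above the knee, each `ratio` dB of input adds 1 dB of output.
    Below the knee, the output falls away from target_db at below_slope
    dB per dB (an assumed value; the source does not give this slope).
    """
    if input_db >= knee_db:
        return target_db + (input_db - knee_db) / ratio
    return target_db - below_slope * (knee_db - input_db)
```

With a 10:1 ratio, an input rising from 45 dB to 95 dB moves the output by only 5 dB, which is why the text describes the output as essentially uniform.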

Figure 3 is a multi-channel signal compression system in accordance with another embodiment of the invention in which signals are filtered and compressed in a plurality of frequency bands. In this embodiment signals from the microphone 30 are applied to a low band (100-400 Hz) filter 32, a middle band (400-1,600 Hz) filter 34, and a high band (1,600-6,400 Hz) filter 36. Signals from each of the filters are passed to amplitude compressor circuitry 38, 40, and 42. Each of the compressor circuits includes delay circuitry, RMS detector circuitry, and divider circuitry as illustrated in Figure 1. Because each channel includes a narrow band of frequencies, the high frequency emphasis circuitry of Figure 1 is not required. The compressed signals are then applied to a summing amplifier 44 with the composite summed signal then applied to the receiver 46.
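The split, compress, and sum structure of Figure 3 can be illustrated as follows. This sketch swaps in a deliberately crude band split: differences of one-pole low-pass filters with edges at 400 Hz and 1,600 Hz, rather than the true band pass filters of the figure, and it does not enforce the 100 Hz and 6,400 Hz outer limits. Each band is normalized by a centered-window RMS. All function names, the `window` length, and the `floor` guard are hypothetical.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass filter (a stand-in for the band filters)."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def split_three_bands(samples, sample_rate, edges_hz=(400.0, 1600.0)):
    """Split into low, middle, high bands that sum back to the input."""
    lp_low = one_pole_lowpass(samples, edges_hz[0], sample_rate)
    lp_mid = one_pole_lowpass(samples, edges_hz[1], sample_rate)
    mid = [m - l for m, l in zip(lp_mid, lp_low)]
    high = [x - m for x, m in zip(samples, lp_mid)]
    return lp_low, mid, high

def normalize_band(band, window, floor=1e-6):
    """Divide each sample by the RMS of a window centered on it."""
    half = window // 2
    out = []
    for i in range(len(band)):
        seg = band[max(0, i - half): i + half + 1]
        rms = math.sqrt(sum(s * s for s in seg) / len(seg))
        out.append(band[i] / max(rms, floor))
    return out

def multichannel_compress(samples, sample_rate, window=160):
    """Compress each band independently, then sum (summing amplifier)."""
    bands = split_three_bands(samples, sample_rate)
    compressed = [normalize_band(b, window) for b in bands]
    return [sum(vals) for vals in zip(*compressed)]
```

Compressing each band separately means a strong low-frequency component cannot pull down the gain applied to weaker high-frequency content, which is consistent with the remark that separate high frequency emphasis is not required in this arrangement.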

Figures 4A and 4B are functional block diagrams of other embodiments of the invention useful with tape recorders and in which the compressed signal and the detected RMS value are both recorded in time sequence with a tape recorder. In Figure 4A signals from the microphone 50 or other audio source are applied to amplitude compressor 52 which may be a single channel device as in Figure 1 or a multiple channel device as in Figure 3. The compressed audio signal is then recorded in an analog channel of the tape recorder 54, and the detected RMS value is recorded in an FM channel of the recorder 54. Thereafter, the recorded compressed audio signal and the recorded RMS value can be applied to a multiplier 56 from which the original audio signal and the original dynamic range is produced.
The resulting decompressed signal is applied through a frequency de-emphasis circuit 58 to the receiver 59.

Figure 4B is a sampled digital recording system similar to the analog system of Figure 4A. In this embodiment signals from microphone 60 are applied to the amplitude compressor 62, as in Figure 4A, and then the compressed audio signal and the RMS value are converted to digital form by analog to digital circuits 63 and 65. The digital signals are then stored in digital recorder 64. The recorded signals are converted back to analog signals by digital to analog converter 57 and multiplying DAC 66. The decompressed signals from DAC 66 are frequency de-emphasized at 68 and then applied to the receiver 69.

These embodiments of the invention are particularly advantageous since tape recorders typically have a limited dynamic range. Thus, by recording the compressed audio signal and the RMS on the recorder, the full dynamic range of the recorded signal can be reconstructed in the multiplier 56 and multiplier 66.
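The record-and-reconstruct idea can be sketched in a few lines: the compressor emits both the normalized signal and the RMS track, and multiplying the two recovers the original. This is an idealized illustration that ignores recorder noise, the FM channel, and quantization. The function names, the centered-window RMS estimate, and the `floor` guard are assumptions.

```python
import math

def compress_with_side_channel(samples, window, floor=1e-6):
    """Return (compressed signal, RMS track) for separate recording."""
    half = window // 2
    compressed, rms_track = [], []
    for i in range(len(samples)):
        seg = samples[max(0, i - half): i + half + 1]
        rms = max(math.sqrt(sum(s * s for s in seg) / len(seg)), floor)
        compressed.append(samples[i] / rms)  # goes to the audio channel
        rms_track.append(rms)                # goes to the second channel
    return compressed, rms_track

def reconstruct(compressed, rms_track):
    """Multiplier stage: restore the original dynamic range."""
    return [c * r for c, r in zip(compressed, rms_track)]
```

The compressed track occupies a narrow amplitude range even when the source does not, so it fits within a limited recorder dynamic range, while the slowly varying RMS track carries the level information needed to undo the compression.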

In the preferred embodiments described herein, an RMS detector has been employed. However, other measures of the signal amplitude over a period of time, including an average value and an approximation of the RMS value, can be employed. As used herein, RMS value includes suitable approximations thereof. Further, while a divider has been employed in the preferred embodiments for obtaining the compressed signal, a logarithmic measure of the detected RMS or averaged value can be employed for obtaining the compressed value.
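One way to read the logarithmic alternative is that dividing by the RMS is the same as subtracting the RMS level in decibels. The sketch below computes a gain from a log-domain level. The `ratio` parameter is a generalization introduced here for illustration and is not taken from the description: an infinite ratio reproduces the divider exactly, while a finite ratio such as 10 leaves one tenth of the level variation in the output. The function name, `reference`, and `floor` are likewise hypothetical.

```python
import math

def gain_from_log_level(rms, ratio=float("inf"), reference=1.0, floor=1e-6):
    """Linear gain derived from the RMS level expressed in dB.

    ratio=inf gives gain = reference / rms (equivalent to the divider).
    A finite ratio R removes (1 - 1/R) of the level in dB (an assumption).
    """
    level_db = 20.0 * math.log10(max(rms, floor) / reference)
    gain_db = -(1.0 - 1.0 / ratio) * level_db
    return 10.0 ** (gain_db / 20.0)
```

For example, with `ratio=10` an input at -20 dB relative to the reference is raised to -2 dB, which matches a 10:1 compression of the level.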

The invention has broad applications including, for example, hearing aids and audio storage media (as described herein), sampled digital storage systems, broadcast systems, public address systems, and general voice communication including telephones. The invention is especially useful for communication in a noisy environment and through a noisy communication link such as in field applications.

Thus, while the invention has been described with reference to specific embodiments, the description is illustrative of the invention and is not to be construed as limiting the invention. Various modifications and applications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims.

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. Apparatus for enhancing speech comprising microphone means for receiving audio signals and generating electrical signals in response thereto, high frequency emphasis means connected to said microphone means for amplifying said electrical signals, amplitude detection means connected to said high frequency emphasis means for receiving amplified signals and obtaining a root mean square (RMS) amplitude of said amplified signals over a selected period of time, delay means connected to said high-frequency emphasis means for receiving and delaying spectrally shaped electrical signals for a portion of said selected period of time, and signal compression means connected to said delay means and to said amplitude detection means and compressing the delayed amplified electrical signals by a ratio of at least 10:1 in accordance with said root-mean-square average value whereby a constant amplitude output signal without significant signal distortion is obtained.

2. Apparatus as defined by Claim 1 wherein said high frequency emphasis means amplifies said electrical signals by six dB per octave at least to 2 kiloHertz.

3. Apparatus as defined by Claim 1 wherein said delay means comprises a bucket brigade device.

4. Apparatus as defined by Claim 1 wherein said compression means comprises a divider for dividing said delayed non-linearly amplified signals by said root-mean-square value.

5. Apparatus as defined by Claim 1 and further including pre-amplification means interconnecting said microphone means to said high-frequency emphasis means.

6. In speech enhancement apparatus, a method of compressing the amplitude of audio signals without speech distortion comprising the steps of obtaining a measure of amplitude of said audio signals over a selected time period, delaying said signals by said selected time period, and compressing said delayed signals corresponding to said selected period of time by a ratio of at least 10:1 in accordance with said measure of amplitude.

7. The method as defined by Claim 6 wherein said measure of amplitude is a root-mean-square value.

8. The method as defined by Claim 7 wherein said step of compressing comprises dividing said signals for a selected period of time by the root-mean-square amplitude value thereof.

9. The method as defined by Claim 6 and further including the step of emphasizing high frequency components of said signals, prior to obtaining said root-mean-square value.

10. Apparatus for processing audio signals comprising a plurality of band pass filters for receiving and filtering audio signals into a plurality of limited frequency bands, a plurality of amplitude compressor means each connected with a band pass filter, each of said amplitude compressor means including means for obtaining a measure of amplitude of audio signals during a selected period of time, means for delaying audio signals for said selected period of time, and means for compressing the delayed audio signals during said period of time based on said measure of amplitude, and summing means connected to said plurality of amplitude compressor means for receiving and summing compression of audio signals.

11. Apparatus as defined by Claim 10 wherein said means for obtaining a measure of amplitude comprises a root-mean-square (RMS) detector.