WO1999017278A1 - Method and apparatus for improving speech intelligibility - Google Patents

Publication number
WO1999017278A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
intelligibility
word
threshold
speech
Prior art date
Application number
PCT/GB1998/002890
Other languages
French (fr)
Inventor
Peter William Barnett
Original Assignee
Peter William Barnett
Priority date
1997-09-26
Filing date
1998-09-24
Publication date
1999-04-08
Application filed by Peter William Barnett
Priority to GB0006872A (published as GB2344982A)
Priority to AU91772/98A (published as AU9177298A)
Priority to EP98944106A (published as EP1018108A1)
Publication of WO1999017278A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Abstract

A method and apparatus for improving the intelligibility of the spoken word in an acoustic space comprise generating an electrical signal indicative of a word or words, inputting the signal to a signal processor including a signal compressor, comparing the amplitude of the input signal with a threshold level and compressing any part of the signal in excess of the threshold, expanding both the compressed and uncompressed signal and outputting the expanded signal as an audible signal.

Description

Method and Apparatus for Improving Speech Intelligibility
The present invention relates to a method and apparatus for improving the intelligibility of the spoken word. The intelligibility of the spoken word is determined by a number of different factors. It has been known from an examination of the Temples of Ancient Egypt and Amphitheatres of Greece and Rome that structural changes to buildings and spaces can improve the intelligibility of the spoken word but the impetus towards a more scientific approach to the problem of improving intelligibility came with the advent of the telephone, which resulted in a large volume of scientific work but mainly with the aim of solving problems of distortion, bandwidth and transducer design in telephone systems. However, in 1929 Knudsen published a work entitled "On Hearing in Auditoriums" where he postulated that what he termed percentage articulation was a function of reverberation, noise, room shape and echo.
By today's standards, the postulation by Knudsen is incomplete but two important factors are present even at this early date. Firstly, if we remove the obvious limitations to speech intelligibility i.e. that the speech is not loud enough and there is too much noise present, it is clear that articulation is limited by the acoustics and geometry of the space. Secondly, intelligibility or articulation is presented as a product of a number of reduction factors. Each factor lies in the range 0-1 and hence judicious application of one parameter or influence cannot undo the shortfall imposed by another.
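The product form described above can be written out as a short worked relation. This is only a sketch: the symbols $A$ (articulation) and $k_i$ (the individual reduction factors) are notation assumed here for illustration and do not appear in the original text.

```latex
% Assumed notation: A is the articulation, k_i are the reduction factors.
\[
  A \;=\; \prod_{i=1}^{n} k_i , \qquad 0 \le k_i \le 1
  \quad\Longrightarrow\quad
  A \;\le\; \min_{i} k_i .
\]
% Illustrative numbers only: with k_1 = 0.5 and every other factor equal to 1,
% A = 0.5, so no improvement to the other factors can lift A above 0.5.
```

Because every factor is at most 1, the product can never exceed its smallest factor, which is why improving one influence cannot undo the shortfall imposed by another.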
In 1971, Peutz et al hypothesized that speech intelligibility could be assessed on the basis of the number of lost consonants and a formula was suggested which could give a measure of the likelihood of the percentage of lost consonants based on the distance of the listener from the source, the reverberation time of the space and the volume of the space.
Work in the area of speech intelligibility continued and a speech transmission index (STI) by Houtgast and Steeneken was developed and introduced in 1980. They determined that there was a direct and robust correlation between speech intelligibility as measured by word scores and a modulation transfer function between a source and receive position. The correlation between STI and word scores as shown by the experimental data is shown in Fig. 1. The speech transmission index has been widely adopted and it is now accepted that an STI score of 0.5 is the minimum required for reasonable intelligibility in most circumstances. Recently, an STI of 0.5 has been specified in relation to locations such as underground and rail transportation, stadia, shopping centres, cinemas and all public places.
It can be shown that the factors affecting speech intelligibility can be grouped into four major areas, namely those associated with the talker, the listener, the space and the transmission system. The list of factors can be greatly reduced by assuming a perfect talker, a normal listener, a space free from anomalies and a perfect transmission system. These assumptions reduce the list of influencing factors to those dependent on direct sound pressure level, reverberant sound pressure level, reverberation time and noise. If we understand that the direct sound pressure level is the signal (speech) and wanted component, then this list further reduces to two ratios as follows: the direct-to-reverberant ratio, i.e. the ratio of wanted to unwanted sound, and the signal-to-noise ratio, i.e. the ratio of wanted signal to the noise, together with the reverberation time of the space. In other words, speech intelligibility is a function of the product of all three factors mentioned above. As previously mentioned, the detrimental effects and the limitations imposed by one dependent variable may not be fully compensated by another.
While the intelligibility of speech can be improved by making structural alterations to the space where the listener is present, e.g. by reducing the reverberation time within the space by the introduction of acoustic absorption, in certain instances it is not economic to make such material changes and there is a need for a simpler and more economic way of improving the intelligibility of the spoken word in public areas. Since it is now commonplace for spoken words to be amplified by electrical apparatus, we have looked at ways in which the problem can be solved electronically.
The present invention provides apparatus for broadcasting speech into an acoustic space through one or more loudspeakers which comprises means for compressing the higher amplitude portions of the spoken words and expanding the thus compressed signal, thereby to emphasise the lower amplitude parts of the spoken words. The compression is preferably in the range 1:1 to 10:1 and usually 2:1 to 4:1. In most cases a compression ratio of 3:1 will suffice. The present invention also provides a method for broadcasting public address announcements into closed acoustic spaces which comprises the use of the compression/expansion apparatus. The threshold at which compression commences is preferably selected depending on the speech characteristics of the person enunciating the words but, as an alternative, the speech signals can be normalized prior to transmission to the compressor/expander apparatus, in which case the threshold can be preset to a specific value depending on the output from the normalization circuitry.
Features and advantages of the present invention will become apparent from the following description of an embodiment thereof given by way of example with reference to the accompanying drawings in which:
Fig. 1 shows a graph of STI versus word score;
Fig. 2 shows a graph of input against output for a compressor according to the present invention;
Fig. 3 shows a block diagram of an arrangement according to the present invention; and
Fig. 4 shows the amplitude waveforms of various words before and after processing by apparatus according to the present invention.
Speech has a dynamic range of some 10-20 dB or, in pressure terms, around 100:1, i.e. the quietest parts of our speech (at a normal level) are around one hundredth of the loudest. In fact, vowel sounds (100 Hz - 1 kHz), which are voiced, i.e. formed in the voice box and larynx, are much more powerful than the consonant or unvoiced components, which are formed in the mouth and with the teeth and the expulsion of air. These large vowel sounds tend to mask the weaker consonants, which are vital and play a far more important role in intelligibility. The vowel sounds also enhance the reverberant sound, thereby reducing the direct-to-reverberant ratio.
From consideration of these facts relating to the structure of speech, it is apparent that simply increasing the gain of a public address amplifier is not sufficient to improve intelligibility. In fact, it can have the opposite effect in very reverberant spaces. As a consequence, we have investigated amplitude compression, which reduces the range between peaks and troughs. The benefit of amplitude compression, unlike gain, is that it is dynamic and is applied above a threshold. It is proposed that the threshold should be set at a level where the vowel sounds will be compressed but the weak consonant sounds will not be compressed. This has the advantage that signal processing is applied differentially to the wanted signal and the noise (or reverberation). Fig. 2 shows a typical relationship between input and output levels. It will be seen that up to a threshold the output level is linear. At the threshold, compression is applied, and the drawing shows different compression ratios as compared with the uncompressed signal (1:1). It is thus clear that the effect of applying amplitude compression to speech is to reduce the ratio of the largest to the smallest sounds.
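The input/output relationship described for Fig. 2 can be sketched as a static level curve. This is a minimal illustration, not the circuit of the invention: it assumes a hard knee, works on levels in dB, and the function name `compressor_output_db` and the example threshold of -25 dB are chosen here purely for illustration.

```python
def compressor_output_db(input_db: float, threshold_db: float, ratio: float) -> float:
    """Static input/output curve of a hard-knee compressor (levels in dB).

    Below the threshold the output follows the input (linear, 1:1).
    Above it, each `ratio` dB of input yields 1 dB of output.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1 (1 means no compression)")
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


if __name__ == "__main__":
    threshold_db = -25.0  # assumed example value
    levels = (-30.0, -25.0, -10.0, 0.0)
    for ratio in (1.0, 2.0, 3.0, 10.0):
        curve = [round(compressor_output_db(x, threshold_db, ratio), 1) for x in levels]
        print(f"{ratio:g}:1 -> {curve}")
```

With a ratio of 1:1 the curve is the identity; higher ratios flatten the curve above the threshold only, which is the family of lines the figure is described as showing.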
If one now looks at Fig. 4, Fig. 4a shows a waveform diagram of the word "drop" in its original, uncompressed form. After compression, as shown in Fig. 4b, it will be seen that the difference between the amplitudes of the loudest and quietest parts of the word has been reduced. Thus, on expansion or amplification, the very quiet consonant "p" has been enhanced with respect to the vowel sound "o" and therefore the intelligibility of the word "drop" has been improved.
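The effect on a word such as "drop" can be illustrated with a little arithmetic. The levels below are hypothetical, not measurements from Fig. 4, and "expansion" is interpreted here as a uniform make-up gain applied after compression, which is an assumption about the text rather than a statement of the actual circuit.

```python
def compress_db(level_db: float, threshold_db: float = -25.0, ratio: float = 3.0) -> float:
    """Hard-knee compression of a level in dB (assumed threshold and ratio)."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


# Hypothetical levels for the loud vowel "o" and the quiet consonant "p".
vowel_db = -4.0
consonant_db = -28.0

vowel_compressed = compress_db(vowel_db)          # above threshold: reduced
consonant_compressed = compress_db(consonant_db)  # below threshold: unchanged

gap_before = vowel_db - consonant_db
gap_after = vowel_compressed - consonant_compressed

# Uniform make-up gain that brings the vowel back to its original level.
makeup_gain_db = vowel_db - vowel_compressed
vowel_out = vowel_compressed + makeup_gain_db
consonant_out = consonant_compressed + makeup_gain_db

print(f"gap before: {gap_before} dB, gap after: {gap_after} dB")
print(f"vowel out: {vowel_out} dB, consonant out: {consonant_out} dB")
```

Under these assumed numbers the vowel ends up at its original level while the consonant is raised by the full make-up gain, so the vowel-to-consonant gap shrinks.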
As far as the word "turf" is concerned, which is shown in Fig. 4c, comparison with Fig. 4d shows that there is very little compression applied; hence, on expansion, the whole word has been amplified.
If one looks at the word "nest" in Fig. 4e, it will be seen that the vowel sound "e" has been compressed because, on expansion, the parts of the signal representing the "s" and the "t" have been amplified with respect to the vowel sound.
It will be appreciated that the effects of the compression will be altered depending on the threshold at which compression occurs as well as the compression ratio used.
If one looks at Fig. 3, a diagram of a suitable apparatus is shown where the user speaks into a microphone 10. The output from the microphone is then passed through a normaliser 11 which will process the input signal and provide a normalised sound output. The output from the normaliser 11 is fed to a compression and expansion circuit 12, sometimes known as a compander, which applies amplitude compression to the input signal if the amplitude exceeds a pre-set threshold. The compander 12 is arranged to start compression at a threshold which is set relative to the magnitude of the speech in the signal chain. It has been determined that the threshold should be set at a value less than halfway up the dynamic range so that the majority of the speech signal is subject to compression. It has been found that a threshold at a value 5 - 6 dB above the level of the quietest part of the speech is adequate. Another way of expressing this is to look at the average value of the peak amplitudes of the speech signal, in which case the threshold should be in the range 28 dB to 22 dB below the average of the peak levels of the speech. Typically, the threshold is set at 25 dB below the average of the peaks. The amount of compression is usually in the range 2:1 - 10:1 but might be as high as 20:1. The output from the compander 12 is then fed to an electro-acoustic transducer in the form of a loudspeaker system 13 for broadcast to the listener who is in an acoustic space.
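The signal chain of Fig. 3 (normaliser, then compander, then output) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the apparatus itself: the function names are hypothetical, compression is applied instantaneously to each sample (a practical compressor would normally follow a smoothed envelope with attack and release times), the normalised peak stands in for "the average of the peaks", and expansion is modelled as a uniform make-up gain.

```python
import math


def normalise(samples, target_peak=1.0):
    """Scale the signal so its largest magnitude equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)
    return [s * target_peak / peak for s in samples]


def compand(samples, threshold_db=-25.0, ratio=3.0):
    """Compress magnitudes above the threshold, then apply make-up gain.

    threshold_db is relative to a peak of 1.0 (assumed to approximate the
    average of the speech peaks after normalisation).
    """
    threshold = 10.0 ** (threshold_db / 20.0)
    compressed = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            # Equivalent to dividing the dB excess over the threshold by `ratio`.
            mag = threshold * (mag / threshold) ** (1.0 / ratio)
        compressed.append(math.copysign(mag, s))
    peak_in = max(abs(s) for s in samples)
    peak_out = max(abs(s) for s in compressed)
    gain = peak_in / peak_out if peak_out > 0.0 else 1.0
    return [s * gain for s in compressed]


if __name__ == "__main__":
    # Made-up samples: quiet "consonant" values around loud "vowel" values.
    microphone = [0.01, -0.015, 0.4, -0.5, 0.25, 0.02]
    processed = compand(normalise(microphone))
    print([round(s, 3) for s in processed])
```

In this sketch the quiet samples pass through the compressor unchanged and are then lifted by the make-up gain, while the loud samples return to roughly their original peak, so the quiet parts are emphasised relative to the loud ones.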
The above apparatus can be used with good effect in public address systems for all public spaces including but not limited to stations, theatres and cinemas. It also has application in other areas where ambient noise levels are high and speech intelligibility is important such as in aircraft for in-flight announcement and also for induction loops and hearing aids for persons with impaired hearing since it has been found that those who suffer from impaired hearing due to age can have their understanding of spoken words improved if the aforementioned technique is utilized.
Various tests have been carried out utilizing the equipment and it has been shown that the RASTI score of a space can be improved by approximately 0.1 or 10%.

Claims

CLAIMS:
1. A method of improving the intelligibility of the spoken word in an acoustic space comprising generating an electrical signal indicative of a word or words, inputting the signal to a signal processor including a signal compressor, comparing the amplitude of the input signal with a threshold level and compressing any part of the signal in excess of the threshold, expanding both the compressed and uncompressed signal and outputting the expanded signal as an audible signal.
2. A method according to claim 1, wherein the threshold level is at 5 or 6 dB above normal signal levels.
3. A method according to claim 1 or 2, wherein the step of generating an electrical signal includes generating a normalised electrical signal.
4. A method according to claim 1, 2 or 3, wherein the signal compressor compresses the generated signal by a ratio of between 1:1 and 10:1.
5. Apparatus for improving the intelligibility of the spoken word and comprising means for generating an electrical signal indicative of a word or words, a signal processor including means for comparing the amplitude of the generated electrical signal with a threshold level, means for compressing any part of the signal in excess of the threshold level, and means for expanding both the compressed and uncompressed signal, and an output device for generating an audible signal.
6. Apparatus according to claim 5, wherein the means for compressing the signal is arranged to compress the signal by a ratio between 1:1 and 10:1.
7. Apparatus according to claim 5 or 6, wherein the comparing means is arranged to compare the signal with a level which is 5 or 6 dB above the normal level.
8. Apparatus according to claim 5, 6 or 7, and comprising means for normalising the electrical signal prior to signal processing means.
PCT/GB1998/002890 1997-09-26 1998-09-24 Method and apparatus for improving speech intelligibility WO1999017278A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB0006872A GB2344982A (en) 1997-09-26 1998-09-24 Method and apparatus for improving speech intelligibility
AU91772/98A AU9177298A (en) 1997-09-26 1998-09-24 Method and apparatus for improving speech intelligibility
EP98944106A EP1018108A1 (en) 1997-09-26 1998-09-24 Method and apparatus for improving speech intelligibility

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9720544.7 1997-09-26
GB9720544A GB9720544D0 (en) 1997-09-26 1997-09-26 Method and apparatus for inputting speech intelligibility

Publications (1)

Publication Number Publication Date
WO1999017278A1 (en) 1999-04-08

Family

ID=10819714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1998/002890 WO1999017278A1 (en) 1997-09-26 1998-09-24 Method and apparatus for improving speech intelligibility

Country Status (4)

Country Link
EP (1) EP1018108A1 (en)
AU (1) AU9177298A (en)
GB (1) GB9720544D0 (en)
WO (1) WO1999017278A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0076687A1 (en) * 1981-10-05 1983-04-13 Signatron, Inc. Speech intelligibility enhancement system and method
EP0279451A2 (en) * 1987-02-20 1988-08-24 Fujitsu Limited Speech coding transmission equipment
US5506899A (en) * 1993-08-20 1996-04-09 Sony Corporation Voice suppressor
US5737719A (en) * 1995-12-19 1998-04-07 U S West, Inc. Method and apparatus for enhancement of telephonic speech signals

Also Published As

Publication number Publication date
EP1018108A1 (en) 2000-07-12
AU9177298A (en) 1999-04-23
GB9720544D0 (en) 1997-11-26

Similar Documents

Publication Publication Date Title
Bronkhorst The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions
EP0796489B1 (en) Method for transforming a speech signal using a pitch manipulator
US5737719A (en) Method and apparatus for enhancement of telephonic speech signals
CN109065067A (en) A kind of conference terminal voice de-noising method based on neural network model
Pollack et al. Masking of speech by noise at high sound levels
US20030216907A1 (en) Enhancing the aural perception of speech
JP2017538146A (en) Systems, methods, and devices for intelligent speech recognition and processing
Kennedy et al. Consonant–vowel intensity ratios for maximizing consonant recognition by hearing-impaired listeners
Nejime et al. Evaluation of the effect of speech-rate slowing on speech intelligibility in noise using a simulation of cochlear hearing loss
US20060126859A1 (en) Sound system improving speech intelligibility
Nábělek Performance of hearing‐impaired listeners under various types of amplitude compression
US20060239472A1 (en) Sound quality adjusting apparatus and sound quality adjusting method
JP4876245B2 (en) Consonant processing device, voice information transmission device, and consonant processing method
JP3367592B2 (en) Automatic gain adjustment device
JP2000152394A (en) Hearing aid for moderately hard of hearing, transmission system having provision for the moderately hard of hearing, recording and reproducing device for the moderately hard of hearing and reproducing device having provision for the moderately hard of hearing
WO2009001035A2 (en) Transmission of audio information
Arai et al. Effective speech processing for various impaired listeners
KR20090082605A (en) Creation Method of channel of digital hearing-aid and Multi-channel digital hearing-aid
JPH09311696A (en) Automatic gain control device
US7123732B2 (en) Process to adapt the signal amplification in a hearing device as well as a hearing device
Kusumoto et al. Modulation enhancement of speech as a preprocessing for reverberant chambers with the hearing-impaired
EP1018108A1 (en) Method and apparatus for improving speech intelligibility
Yanick et al. Signal processing to improve intelligibility in the presence of noice for persons with a ski-slope hearing impairment
Vaughan et al. Time-expanded speech and speech recognition in older adults.
RU2589298C1 (en) Method of increasing legible and informative audio signals in the noise situation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref country code: GB

Ref document number: 200006872

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1998944106

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: KR

WWE Wipo information: entry into national phase

Ref document number: 09509431

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1998944106

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1998944106

Country of ref document: EP