WO1999053721A1

WO1999053721A1 - Improved hearing enhancement system and method

Info

Publication number: WO1999053721A1
Application number: PCT/US1998/010694
Authority: WO
Inventors: Ronald D. Blum; William E. Kokonaski; William R. Saunders; Michael A. Vaudrey
Original assignee: Hearing Enhancement Company, L.L.C.
Priority date: 1998-04-14
Filing date: 1998-05-29
Publication date: 1999-10-21
Also published as: AU7798698A

Abstract

The present invention covers the separation of relevant voice information or a preferred audio signal component from background information of the total audio signal in an audio program in such a way that the end user can adjust the volume of the relevant voice information or preferred signal and the remaining audio separately to his hearing needs or taste. The present invention includes a variety of means (1, 2, 3, 5, 6) for maintaining the orthogonality of the important voice information or preferred signal from the background or remaining audio information.

Description

IMPROVED HEARING ENHANCEMENT SYSTEM AND METHOD

The present invention covers the separation of relevant voice information or a preferred audio signal component from background information of the total audio signal in an audio program in such a way that the end user can adjust the volume of the relevant voice information or preferred signal and the remaining audio separately to his hearing needs or taste. In many applications, the preferred audio signal will be a pure voice or mostly pure voice audio signal, while the remaining audio signal will consist of all other components of the total audio signal other than the preferred voice signal. For other applications, the preferred signal may be a specific instrument in a musical recording while the remaining audio will be all other components of the musical recording. The present invention includes a variety of means for maintaining the orthogonality of the important voice information or preferred signal from the background or remaining audio information, such as separate tracks on tape recordings, time encoding, frequency encoding or multiple channels in transmitted audio signals. The applications for this invention would include video recording, television and radio broadcasting, as well as tape, CD and DVD music recording, and specific playback means for adjusting the separate volume levels by the end user. While the present invention is intended for the hearing impaired as legally defined and will revolutionize the understanding of audio signals by them, it can be used by anyone who wishes to more finely tune the audio of a system to his or her particular pleasure. Hearing loss is specific to each individual, and one has to address those specific frequencies that are missing, as the frequencies have different transfer characteristics. In addition, hearing preferences vary considerably with age and even within the same age group. The invention also contemplates the use of a voice recognition system to automatically separate the mostly voice components from the background components.

FIELD OF INVENTION

The present invention pertains to the application of separate audio signals to adjust the background and critical voice component in a program signal to optimize the mixture of these two signals to the listener's personal taste. Specifically, it pertains to the application of separate audio signals to adjust a preferred audio signal component and the remaining audio signal components of a total audio program signal to optimize the mixture of these two signals for the purpose of improving the understanding or intelligibility of the preferred signal or for the listening pleasure of the end-user. The invention can be utilized with television, radio, tape recorded audio programming, CDs, DVDs or any kind of video equipment having an audio component, or just pure audio equipment. Tests were run on two groups of individuals which showed that hearing preferences are not just age dependent but vary among younger segments of society, a finding not expected, as popular wisdom states that hearing abilities diminish over the years. One does not expect to find a wide variance among younger people of the same age, yet the second test disclosed herein shows exactly that.

BACKGROUND OF INVENTION

In U.S. Patent No. 4,024,344, Dolby discussed a method of creating a "center channel" for dialog in cinema sound. This technique involved correlation of the left and right stereophonic channels and adjusting the gain on either the combined and/or the separate left or right channel depending on the degree of correlation between the left and right channels. The assumption of Dolby is that a strong correlation between the left and right channels would indicate the presence of dialog. The center channel, which is a band pass filtered summation of the left and right channels, would be amplified or attenuated depending on the degree of correlation between the left and right channels. The problem with this approach is that it does not discriminate between meaningful dialog and simple correlated sound, nor does it address unwanted voice information within the voice band. Nor does it provide a means for maintaining the separation of the preferred signal (in this application, construed to be a blend of left and right channel) throughout the processing sequence of the audio programming production. A further limitation is that the individual listener cannot adjust the degree to which the center channel is amplified or attenuated.
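The limitation of correlation-based detection can be illustrated with a minimal sketch. The function below is a hypothetical normalized left/right correlation measure written for this illustration; it is not the circuit of the cited Dolby patent. It shows only that any content present in both channels, dialog or not, yields a high correlation value.

```python
import math
from typing import Sequence


def normalized_correlation(left: Sequence[float], right: Sequence[float]) -> float:
    """Return the zero-lag normalized correlation of two channels (-1.0 to 1.0).

    Hypothetical helper for illustration; the cited prior art does not
    specify this exact formula.
    """
    if len(left) != len(right) or not left:
        raise ValueError("channels must be non-empty and of equal length")
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy else 0.0


# Background music panned unevenly to both channels (no dialog at all).
music = [0.0, 1.0, 0.0, -1.0]
left_channel = music
right_channel = [0.5 * sample for sample in music]

# The channels are fully correlated even though no dialog is present,
# so a correlation detector alone cannot tell dialog from shared music.
music_correlation = normalized_correlation(left_channel, right_channel)
```

A detector that boosts the center channel whenever this value is high would therefore also boost the shared music, which is the shortcoming described above.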

The separation of voice from background audio in television signals is discussed by Shiraki in U.S. Patent No. 5,197,100. The technique employed by Shiraki involved the use of band pass filtering in combination with summing and subtracting circuits to form a "voice channel" that would be differentiated from the rest of the audio programming. The limitation of this approach is that the band pass filter only discriminates frequencies within a predetermined range, in that case 200 Hz to 5000 Hz. It cannot discriminate between voice and background audio that may happen to fall within the band pass frequency range. Furthermore, the application of band pass filtering cannot distinguish between relevant and irrelevant speech components within an audio signal. Means of reducing background noise in audio frames have been discussed by Solve et al. in U.S. Patent No. 5,485,522, which shows a speech detector and a noise estimator used to adaptively adjust attenuation to each frame of an audio signal. This and other forms of adaptive noise filtering cannot distinguish between voice and other non-stationary audio in the voice band, such as music or irrelevant voice information.
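The band pass limitation described above can be sketched as follows. This is a simplified illustration under stated assumptions: each sound is reduced to a single representative frequency, the filter is treated as ideal, and the names `Component` and `ideal_band_pass` are hypothetical, not taken from the cited patents.

```python
from typing import List, NamedTuple, Sequence


class Component(NamedTuple):
    """One sound source, reduced to a label and a representative frequency."""
    label: str
    frequency_hz: float


def ideal_band_pass(components: Sequence[Component],
                    low_hz: float = 200.0,
                    high_hz: float = 5000.0) -> List[Component]:
    """Keep every component whose frequency lies inside the pass band.

    The 200 Hz to 5000 Hz defaults are the voice band quoted in the text.
    """
    return [c for c in components if low_hz <= c.frequency_hz <= high_hz]


program_audio = [
    Component("relevant dialog", 1000.0),
    Component("background horn", 1200.0),
    Component("low rumble", 60.0),
]

# The horn survives the filter alongside the dialog because both sit in
# the voice band; only the out-of-band rumble is removed.
passed = ideal_band_pass(program_audio)
```

The filter selects by frequency alone, so in-band background content reaches the "voice channel" together with the dialog.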

U.S. Patent No. 5,434,922, to Miller et al., discloses a method and system for sound optimization which measures both the music and noise in a vehicle. Again, Miller et al. use analog-to-digital conversion and adaptive digital filtering to compensate for the ambient noise background by enhancing the sound signal automatically. This differs from the technique disclosed herein, as it does not allow for separately adjusting the signals and then blending them together.

In general, prior art techniques employing band pass filtering or selective equalization will not remove voice band background or noise within the voice band range from the speech components of the audio program. In the past, there have been numerous advancements that improve the quality of audio broadcasts (stereo, Dolby processing, etc.). The previously cited inventions of Dolby, Shiraki and Miller et al. have all attempted to modify some content of the audio signal through various signal processing hardware or algorithms, but those methods do not exploit the individual needs or preferences of different listeners. Some of the more fundamental advancements in audio technology, like stereo recordings, also utilized two different signals and re-combined the two signals at some stage before the listener heard the total audio content. However, there are innovative differences between those techniques and the claims of this invention. In the instant invention the preferred signal is substantially different in content from either the left stereo or the right stereo signal and is provided to the listener for individual adjustment over the remaining part of the audio signal, in order to provide the listener with an optimal listening solution based on needs associated with hearing loss or simply listening pleasure.

To illustrate by example the innovation of this application, one should consider the adjustment of left stereo and right stereo by an end user. In most stereo recordings, the left signal and right signal are constructed by mixing multiple types of audio signals. Consequently, the balance adjustment of left and right stereo signals does not provide discrimination of specific audio information by the end-user. In the case of stereo broadcasts where there is a voice announcer and some background tracks (for example, a musical background or a laugh track), both left and right stereo signals will contain identical audio information, although the information may be distributed unevenly to the left and right signals. Therefore, a balance adjustment does not provide the end-user with the ability to improve the discrimination of a specific part of the audio signal (for example, voice) over the remaining part of the audio signal. The present invention provides new opportunities for listeners to adjust the volume of a specific component of the audio signal relative to the remaining audio signal, heretofore not possible with any type of audio processing methods or equipment.

Finally, in the case of studio recordings, large differences exist between the present invention and the standard methodology. Currently, vocals are usually recorded separately and are later mixed with the instrumentals and placed on a single track. The end user can only adjust volume, tone and balance (as in the case of stereo), but not the volume of the voice component or the background.

SUMMARY OF THE INVENTION

This invention provides a means for the end listener to adjust the meaningful voice audio and the background audio independently, to compensate for his hearing loss. The end listener in this case is the person watching his television or video tape program, or listening to his music in his home or car. The invention provides a means for the end-listener to any audio program or broadcast, or to the audio content of a video program or broadcast, to personally adjust, using one or more adjustment means, the mix between a meaningful preferred audio signal and the remaining audio signal, to satisfy his aural needs and/or preferences. The end-listener in this case is the person watching his television, DVD or video tape program, or listening to other audio programs/broadcasts, or listening to music in his home or car.

One application of this invention is as an assistive device to the hearing impaired. In this case the preferred audio signal is the relevant voice audio content. Many hearing impaired listeners are characterized by their inability to understand the voice content of the audio signal, particularly in the presence of background audio signals. For the hearing impaired all background audio signals will be contained in the remaining audio signal. There are several interfering audio signals that make understanding relevant program audio difficult for individuals with hearing deficiencies. One interfering signal is the general user audio background, which includes music, traffic noise, wind, running water, etc. These interfering sounds may not reside in the voice frequency band (roughly 200 to 5000 Hz), but if played at sufficiently high volumes they may mask the relevant voice audio. Another interfering signal is non-voice information that does occupy the voice frequency band. Examples of this include string and horn musical instruments, and broadband random noise. This type of interfering signal cannot be band pass filtered, because it resides in the same frequency range as the relevant voice information. Finally, the most difficult interfering signal is unwanted or non-relevant human speech in the background of the audio track. Speech filtering cannot be used in this case, since speech filtering algorithms cannot distinguish between relevant and irrelevant voice information. The voice channel separated out contains mostly voice, as a small amount of ambient noise may remain unfiltered.

Therefore, the present invention covers a variety of methods that provide hearing impaired listeners with independent adjustment of the volume of relevant voice information in relation to the general background information in an audio program. The simplest approach is to record the voice information on a separate track or channel. However, this requires a modification to the playback equipment as well as a new method of producing the audio program. In the case of broadcasting, one approach is to modulate the relevant voice information with a slightly higher or slightly lower frequency on broadcast, but within the general channel range for a given broadcast frequency. On the receive side, the signal is down converted with two local oscillators, one that down converts the general background audio and the other that down converts the relevant voice information. These signals are sent to separate variable gain preamplifiers that the listener adjusts with a dial or knob on the receiver.

The following detailed discussion of the invention covers the method of keeping relevant voice and background audio separate for either recorded programming or for broadcast, using some generic means of signal encoding. Receiving and playback systems are also discussed in terms of a generic method of signal decoding. Various methods of encoding and decoding are well known in the television and telephone art and are available and known to those of ordinary skill in the art. This invention is not limited to the delivery mechanism discussed here but is meant to include any technique known to those of ordinary skill in the art of delivering a signal from the acoustic source to the end-listener.

An example of where the present invention would be useful is sports broadcasting. The commentators' voices are adjusted to the listener's needs relative to the crowd noise. In movies or videos, important speech information, such as a conversation between characters in a scene, is independently adjusted, while background audio including irrelevant speech is reduced to the listener's preferences or needs. The invention is applicable as well to music recordings where the listener adjusts the vocals and the background instruments separately. This invention can be used to enhance the end-listener's aural pleasure, as well as being an assistive listening device.

Accordingly, it is an object of this invention to provide a system which separates the mostly voice components from the background components of an audio signal so that the user can individually adjust each component to suit his or her own hearing needs and requirements.

It is a further object of this invention to provide a system that uses voice recognition to automatically separate mostly voice components from background components in an audio signal and to allow for separate adjustment by a listener.

These and other objects will become clearer when reference is had to the accompanying drawings, in which:

Fig. 1 shows a circuit diagram of the basic system comprising this invention,

Fig. 2 shows a diagram of the circuit for recording and playing back recorded signals,

Fig. 3 shows a circuit diagram of the use of a voice recognition means to automatically separate the mostly voice and background signals, and

Fig. 4 shows a circuit diagram of the digital version of the circuit shown in Fig. 3.

DETAILED DESCRIPTION OF THE INVENTION

The end-user's independent adjustment of the preferred audio signal and the remaining audio signal will be the apparent manifestation of this invention. However, the invention includes and relies upon techniques that will require a change in various standard methods used to create the audio programming portions found today in television, radio, CDs, DVDs and other audio presentations. To illustrate the details of the invention, consider the application where the preferred audio signal is the relevant voice information, as will be the case when this invention is used as an assistive listening device for the hearing impaired.

Figure 1 illustrates a general approach to separating relevant voice information from general background audio in a recorded or broadcast program. There will first need to be a determination made by the programming director as to the definition of relevant voice. An actor, group of actors, or commentator(s) must be identified as the relevant speaker(s). Once the relevant speakers are identified, their voices will be picked up by the voice microphone(s) 1. The voice microphone will need to be either a close talking microphone (in the case of commentators) or a highly directional shotgun microphone of the kind used in sound recording. In addition to being highly directional, these microphones will need to be voice band limited, preferably from 200 to 5000 Hz. The combination of directionality and band pass filtering minimizes the background noise acoustically coupled to the relevant voice information upon recording. In the case of certain types of programming, the need for preventing acoustic coupling can be avoided by recording relevant voice or dialog offline and dubbing it in where appropriate with the video portion of the program. The background microphones 2 should be fairly broadband to provide the full audio quality of background information such as music. A camera 3 will be used to provide the video portion of the program. The audio signals (background and relevant voice) will be encoded with the video signal at the encoder 4. In general the audio signal is usually separated from the video signal by simply modulating it with a different carrier frequency. Since most broadcasts are now in stereo, one way to encode the relevant voice information with the background is to multiplex the relevant voice information on the separate stereo channels, in much the same way left front and right front channels are added to two channel stereo to produce a quadriphonic disc recording. Although this would create the need for additional broadcast bandwidth, for recorded media this would not present a problem, as long as the audio circuitry in the video disc or tape player is designed to demodulate the relevant voice information. Once the signals are encoded, by whatever means deemed appropriate, the encoded signals are sent out for broadcast 5, or recorded onto tape or disc 6. In the case of recorded audio/video information, the background and voice information could simply be placed on separate recording tracks.

Figure 2 illustrates a possible method of receiving and playing back the encoded program signals. A receiver system 7 would demodulate the main carrier frequency from the encoded audio/video signals, in the case of broadcast information. In the case of recorded media, the heads from a VCR or the laser reader from a CD player 8 would produce the encoded audio/video signals. In either case these signals would be sent to a decoding system 9. The decoder would separate the signals into video, voice audio, and background audio using standard decoding techniques such as envelope detection in combination with frequency or time division demodulation. The background audio signal is sent to a separate variable gain amplifier 10 that the listener can adjust to his or her preference. The voice signal is sent to a variable gain amplifier 11 that can be adjusted by the listener to his or her particular needs. The two adjusted signals are summed by a unity gain summing amplifier 12 to produce the final audio output. In this manner the listener can adjust relevant voice to background levels to optimize his listening abilities.
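The playback chain of Figure 2 (decoder 9, variable gain amplifiers 10 and 11, summing amplifier 12) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the signals are short lists of samples, the "encoding" is a simple sample interleave chosen here as one possible time division scheme, and all function names are hypothetical. The text does not prescribe this particular encoding.

```python
from typing import List, Sequence, Tuple


def encode_time_division(voice: Sequence[float],
                         background: Sequence[float]) -> List[float]:
    """Interleave the two signals as v0, b0, v1, b1, ... (assumed scheme)."""
    if len(voice) != len(background):
        raise ValueError("voice and background must have equal length")
    encoded: List[float] = []
    for v, b in zip(voice, background):
        encoded.extend((v, b))
    return encoded


def decode_time_division(encoded: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split the interleaved stream back into (voice, background)."""
    if len(encoded) % 2:
        raise ValueError("encoded stream must hold an even number of samples")
    return list(encoded[0::2]), list(encoded[1::2])


def adjust_and_sum(voice: Sequence[float], background: Sequence[float],
                   voice_gain: float, background_gain: float) -> List[float]:
    """Apply the listener's two gains, then sum with unity gain."""
    return [voice_gain * v + background_gain * b
            for v, b in zip(voice, background)]


voice_samples = [1.0, -1.0]
background_samples = [0.5, 0.5]

stream = encode_time_division(voice_samples, background_samples)
decoded_voice, decoded_background = decode_time_division(stream)

# Listener raises the voice and lowers the background before summing.
final_output = adjust_and_sum(decoded_voice, decoded_background,
                              voice_gain=2.0, background_gain=0.5)
```

Because the two components stay separate until the final sum, the listener's gain settings act on each one independently, which a single overall volume or balance control cannot do.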

Referring to Figure 3, there is shown a diagram of a further refinement of this invention, which is a system 100 utilizing a voice recognition chip 101 that separates the information into a mostly voice channel and a background noise channel after recognizing the speech components of the incoming signal. The mostly voice-like components are separated from the background components and grouped into separate data streams. The signals are then converted back to analog signals to create a mostly voice channel and a background noise channel. Fig. 4 shows a circuit diagram of the digital version of the device 102. The audio signal is received, separated into the digital signals as described, and then converted back to an analog signal.

A further embodiment of the instant invention utilizes a chip which is programmable and which by initialization is tailored to each user's malady. It is understood that the system and method are adaptable to both analog and digital signals. For the present invention, if the application is to increase the listening pleasure of the end-user, it is possible to change the voice microphone 1 to read the preferred signal microphone and the background microphone 2 to read the remaining audio signal microphone. Then, the rest of the means for providing the end-listener independent adjustment of the preferred signal and remaining audio will follow exactly as in the preceding discussion.

To illustrate that people have different hearing needs and preferences, an older group of men was selected and asked to make an adjustment, which a group of students later also made, between a fixed background crowd noise and the voice of an announcer, where only the latter could be varied and the former was set at 6.00. The results with the older group were as follows:

Member    Setting

1         7.50
2         4.50
3         4.00
4         7.50
5         3.00
6         7.00
7         6.50
8         7.75
9         5.50
10        7.00
11        5.00

To further illustrate the fact that people of all ages have different hearing needs and preferences, a group of 21 college students was selected to listen to a mixture of voice and background and to select, by making one adjustment to the voice level, the ratio of the voice to the background. The background noise, in this case crowd noise at a football game, was fixed at a setting of six (6.00), and the students were allowed to adjust the volume of the announcer's play-by-play voice, which had been recorded separately and was pure voice or mostly pure voice, i.e., the students were selected to do the same test the adults did. Students were selected so as to minimize hearing infirmities caused by age. The students were all in their late teens or early twenties. The results were as follows:

Student   Setting of Voice

1         4.75
2         3.75
3         4.25
4         4.50
5         5.20
6         5.75
7         4.25
8         6.70
9         3.25
10        6.00
11        5.00
12        5.25
13        3.00
14        4.25
15        3.25
16        3.00
17        6.00
18        2.00
19        4.00
20        5.50
21        6.00

The ages of the older group ranged from 36 to 59, with the preponderance of the individuals being in the 40 or 50 year old group. As is indicated by the results, the average setting tended to be reasonably high, indicating some loss of hearing across the board. The range again varied from 3.00 to 7.75, a spread of 4.75, which confirmed the findings of the range of variance in people's preferred listening ratio of voice to background, or any preferred signal to remaining audio (PSRA). The overall span for the volume setting for both groups of subjects ranged from 2.0 to 7.75. These levels represent the actual values on the volume adjustment mechanism used to perform this experiment. They provide an indication of the range of signal to noise values (when compared to the "noise" level of 6.0) that may be desirable to different users. To gain a better understanding of how this relates to relative loudness variations chosen by different users, consider that the volume adjustment from 2.0 to 7.75 represented an increase of 20 dB, or ten (10) times. Thus, for even this small sampling of the population and single type of audio programming, it was found that different listeners do prefer quite drastically different levels of "preferred signal" with respect to "remaining audio". This preference cuts across age groups, showing that it is consistent with individual preference and basic hearing abilities, an unexpected result.
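The equivalence between 20 dB and a factor of ten can be checked with the standard decibel conversion for amplitude ratios. This is a minimal sketch that assumes the "ten times" figure refers to signal amplitude; read as a power ratio, 20 dB would instead correspond to a factor of 100. The function names are hypothetical.

```python
import math


def db_to_amplitude_ratio(decibels: float) -> float:
    """Convert a level difference in dB to an amplitude ratio."""
    return 10.0 ** (decibels / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


# Span reported between the lowest (2.0) and highest (7.75) dial settings.
reported_span_db = 20.0
amplitude_factor = db_to_amplitude_ratio(reported_span_db)
```

Note that the dial settings themselves are not decibel values; the 20 dB figure is the measured level difference the text attributes to moving the dial across that span.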

As the test results show, the settings selected by students without hearing infirmities caused by age varied considerably, from a low of 2.00 to a high of 6.70, a spread of 4.70 or almost one half of the total range of 1 to 10. The test is illustrative of how the "one size fits all" mentality of most recorded and broadcast audio signals falls far short of giving the individual listener the ability to adjust the mix to suit his or her own preferences and hearing needs. Again, the students had a wide spread in their settings, as did the older group, demonstrating the individual differences in preferences and hearing needs.
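The ranges and spreads quoted for the two groups can be recomputed from the tabulated settings. The sketch below assumes the table entries are read with their decimal points restored (for example, an entry printed as 750 is taken as 7.50), which is consistent with the 3.00 to 7.75 and 2.00 to 6.70 ranges stated in the text. The `summarize` helper is hypothetical.

```python
from typing import Dict, Sequence

# Settings chosen by the older group (members 1 to 11), decimals restored.
OLDER_GROUP = [7.50, 4.50, 4.00, 7.50, 3.00, 7.00, 6.50, 7.75, 5.50, 7.00, 5.00]

# Voice settings chosen by the students (students 1 to 21), decimals restored.
STUDENTS = [4.75, 3.75, 4.25, 4.50, 5.20, 5.75, 4.25, 6.70, 3.25, 6.00, 5.00,
            5.25, 3.00, 4.25, 3.25, 3.00, 6.00, 2.00, 4.00, 5.50, 6.00]


def summarize(settings: Sequence[float]) -> Dict[str, float]:
    """Return count, lowest, highest and spread of a list of dial settings."""
    low, high = min(settings), max(settings)
    return {
        "count": len(settings),
        "low": low,
        "high": high,
        "spread": round(high - low, 2),
    }


older_summary = summarize(OLDER_GROUP)
student_summary = summarize(STUDENTS)
overall_summary = summarize(OLDER_GROUP + STUDENTS)
```

With the background fixed at 6.00 in both tests, every setting below 6.00 places the voice under the crowd noise on the dial and every setting above it places the voice over it, so the spreads reflect how differently listeners weight the two components.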

Having described the invention in detail, it will be obvious to those of ordinary skill in the art that many changes and modifications can be made without departing from the scope of the appended claims.

Claims

What is claimed:

1. A method for separating a preferred audio signal component in an audio or video program with an audio component from the remaining components of the same program in a manner that allows the listener to adjust and blend the levels of the preferred signal components and the remaining audio to optimize his hearing ability or listening pleasure, said method comprising:

recording the voice component on a first recording means and producing a first signal representative thereof, recording the remaining audio components on a second recording means and producing a second signal representative thereof, encoding said first and second signals and placing them in a distribution medium and distributing them, decoding said first and second signals, separating said signals into remaining audio components and preferred audio signal components, providing listener adjustment capability for said separate remaining audio and preferred signal component, and combining said signals after providing adjustment capability to provide a final audio output optimized for the listener's individual hearing ability and aesthetics.

2. A method as in claim 1 and also including the steps of recording a video component signal, encoding said video component signal and placing it in said distribution medium together with said other component signals and distributing it, decoding said video signal and providing a visual output to said listener.

3. A method as in claim 2 wherein said distribution medium is live broadcast on the airwaves.

4. A method as in claim 2 wherein said distribution medium is recording said component signals on a recording means and distributing them.

5. A method as in claim 4 wherein said recording means is a tape cassette.

6. A method as in claim 4 wherein said recording means is a CD, DVD or video disk.

7. A method as in claim 1 wherein said recording of the voice component is done with a directional voice band limited microphone with band pass filtering to minimize the background noise acoustically coupled to said voice component.

8. A method as in claim 2 wherein said recording of the voice component is done off line to minimize background noise and dubbing it onto said video component.

9. A method as in claim 1 wherein said background audio component is recorded using a broadband microphone to provide full audio quality of background information as in the case of music.

10. A method as in claim 1 wherein the background audio component and voice component are separated by modulating them with different carrier frequencies.

11. A method as in claim 2 wherein the background audio component and voice components are encoded by multiplexing the relevant voice information on separate stereo channels.

12. A method as in claim 1 wherein the background audio component and voice components are recorded by recording on separate recording tracks.

13. A method as in claim 12 wherein said separate recording tracks is a DVD.

14. A method as in claim 12 wherein said separate tracks have adjustable channels.

15. A method as in claim 1 wherein said listener adjustment capability is provided by gain adjustments of said background audio component and said voice component.

16. A method as in claim 15 wherein said components are joined by a unity gain summing amplifier.

17. A method as in claim 2 wherein said encoding of the component signals is done by time division methods.

18. A method as in claim 2 wherein said encoding of the component signals is done by frequency division modulation.

19. A hearing enhancement system for separating the preferred voice signal in an audio or video program from the remaining audio components of the same program in a manner that allows the listener to adjust and blend the levels of the preferred voice signal to remaining audio to optimize his or her hearing ability and to set the ratio for his or her own preference, said system comprising a first recording means for recording said preferred voice signal and turning it into a signal representative thereof, a second recording means for recording the remaining background audio and turning it into a signal representative thereof, encoding means for encoding said signals for distribution to a listener, distribution means for distributing said signals to a listener, decoding means for decoding said signals into voice and background audio components, and component blending means to combine said signals to provide an audio output optimized to the listener's particular hearing ability.

20. A system as in claim 19 and including means for separately adjusting the voice signal and the remaining audio background.

21. A system as in claim 19 and including means to record a video of said preferred signal, said encoding and decoding means operative to act on said video preferred signal and an output means for said signal to provide viewing for the listener.

22. A system as in claim 20 wherein said distribution means includes means for converting said signals into a recording medium.

23. A system as in claim 21 wherein said recording medium is a video tape or disk.

24. A system as in claim 20 wherein said distribution includes broadcasting said component signals and receiving said component signals.

25. A system as in claim 21 wherein said means for converting said signals into a recording medium is a CD.

26. A system as in claim 21 wherein said means for converting said signals into a recording medium is a DVD.

27. A system as in claim 21 wherein said means for converting said signals into a recording medium is an audio tape.

28. A system as in claim 19 wherein said first recording means is a directional voice band limited microphone with band pass filtering to minimize the background noise acoustically coupled to said voice component.

29. A system as in claim 19 wherein said second recording means is a broadband microphone to provide full quality of background information.

30. A system as in claim 19 wherein said remaining background audio and preferred voice signal are separated by modulating them with different carrier frequencies.

31. A system as in claim 20 wherein said remaining background audio and preferred voice signal are encoded by multiplexing the relevant voice information on separate channels.

32. A system as in claim 19 wherein said separate adjustment means are gain adjustment means for said preferred signal and remaining audio.

33. A system as in claim 19 wherein said signal blending means is a unity gain summing amplifier.

34. A system as in claim 19 wherein said encoding means is a time division encoder.

35. A system as in claim 19 wherein said encoding means is a frequency division modulator.

36. A method whereby the listener enhances his hearing abilities for a specific audio or video with an audio component by way of his adjusting and/or blending the levels of the relevant voice components and the background component of said audio or video with an audio component by said listener, said method comprising:

separately adjusting each one of said components to levels acceptable to said listener, and blending said adjusted components together for listening purposes.

37. A method as in claim 36 wherein said listener separately adjusts the levels of the voice components and the background component to levels which tend to enhance his individual hearing.

38. A hearing enhancement feature for equipment used to transmit or play audio sounds to a listener which possesses a tuning, blending or control mechanism which allows the listener the ability to adjust and/or blend the levels of the relevant voice components and the background component for his or her hearing needs and/or preference.

39. A hearing enhancement feature for sound equipment used to transmit or play audio sounds having separate preferred audio signals and remaining audio signals to a listener, said feature comprising:

a first adjustment means adapted to adjust a preferred audio component to a level which enhances the listener's hearing, and

a second adjustment means adapted to adjust the remaining audio signal to a level which does not confound a listener's hearing,

whereby the listener's hearing is improved in its ability to distinguish the preferred audio signal from the remaining audio signal.

40. A hearing feature as in claim 39 wherein said first and second adjustment means are gain adjustment means for each signal.

41. A hearing feature as in claim 39 and including means to blend said signals together again to provide coherent sound to the listener.

42. A hearing feature as in claim 39 wherein said signal blending means is a unity gain summing amplifier.

43. An audio system which uses a voice recognition system algorithm to separate the mostly voice signals from background noise signals to produce separate mostly voice and background audio channels that can be individually adjusted by the listener to his listening preferences.