IMPROVED HEARING ENHANCEMENT SYSTEM AND METHOD
The present invention covers the separation of relevant voice information or a preferred audio signal component from background information or the total audio signal in an audio program in such a way that the end user can adjust the volume of the relevant voice information or preferred signal and the remaining audio separately to his hearing needs or taste. In many applications, the preferred audio signal will be a pure voice or mostly pure voice audio signal, while the remaining audio signal will consist of all components of the total audio signal other than the preferred voice signal. For other applications, the preferred signal may be a specific instrument in a musical recording, while the remaining audio will be all other components of the musical recording. The present invention includes a variety of means for maintaining the orthogonality of the important voice information or preferred signal from the background or remaining audio information, such as separate tracks on tape recordings, time encoding, frequency encoding, or multiple channels in transmitted audio signals. The applications for this invention include video recording, television and radio broadcasting, tape, CD, and DVD music recording, and specific playback means for adjusting the separate volume levels by the end user. While the present invention is intended for the hearing impaired as legally defined and will revolutionize the understanding of audio signals by them, it can be used by anyone who wishes to more finely tune the audio of a system to his or her particular pleasure. Hearing loss is specific to each individual, and one has to address those specific frequencies that are missing, as the frequencies have different transfer characteristics. In addition, hearing preferences vary considerably with age and even within the same age group.
The invention also contemplates the use of a voice recognition system to automatically separate the mostly voice components from the background components.
FIELD OF INVENTION
The present invention pertains to the application of separate audio signals to adjust the background and critical voice component in a program signal to optimize the mixture of these two signals to the listener's personal taste. Specifically, it pertains to the application of separate audio signals to adjust a preferred audio signal component and the remaining audio signal components of a total audio program signal to optimize the mixture of these two signals for the purpose of improving the understanding or intelligibility of the preferred signal or for the listening pleasure of the end-user. The invention can be utilized with television, radio, tape recorded audio programming, CDs, DVDs, or any kind of video equipment having an audio component, or just pure audio equipment. Tests were run on two groups of individuals which showed that hearing preferences are not just age dependent but vary among younger segments of the society, a finding not expected, as popular wisdom states that hearing abilities diminish over the years. One does not expect to find a wide variance among younger people of the same age, which the second test disclosed herein shows.
BACKGROUND OF INVENTION
In U.S. Patent No. 4,024,344, Dolby discussed a method of creating a "center channel" for dialog in cinema sound. This technique involved correlation of the left and right stereophonic channels and adjusting the gain on either the combined and/or the separate left or right channel depending on the degree of correlation between the left and right channels. The assumption of Dolby is that a strong correlation between the left and right channels would indicate the presence of dialog. The center channel, which is a band-pass filtered summation of the left and right channels, would be amplified or attenuated depending on the degree of correlation between the left and right channels. The problem with this approach is that it does not discriminate between meaningful dialog and simple correlated sound, nor does it address unwanted voice information within the voice band. Nor does it provide a means for maintaining the separation of the preferred signal (in this application, construed to be a blend of the left and right channels) throughout the processing sequence of the audio programming production. A further limitation is that the individual listener cannot adjust the degree to which the center channel is amplified or attenuated.
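The limitation noted above, that correlation alone cannot tell dialog from any other correlated sound, can be illustrated with a minimal Python sketch. It is not Dolby's circuit: the function names, the zero-lag correlation measure, and the gain law are assumptions made for illustration, and the band-pass stage is omitted.

```python
import math

def correlation(left, right):
    """Normalized zero-lag correlation of two equal-length sample blocks."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

def center_channel(left, right):
    """Average of left and right, scaled by a correlation-dependent gain.

    The gain law (gain = correlation, floored at zero) is a hypothetical
    stand-in for whatever law a real implementation would use.
    """
    gain = max(0.0, correlation(left, right))
    return [gain * (l + r) / 2 for l, r in zip(left, right)]

# Music mixed identically into both channels is fully correlated, so it
# passes into the "center channel" exactly as dialog would.
music = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
center = center_channel(music, music)
```

In this sketch the center channel for identical music in both channels is the music itself, which is the failure the paragraph describes: the method has no way to know that the correlated content is not dialog.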
The separation of voice from background audio in television signals is discussed by Shiraki in U.S. Patent No. 5,197,100. The technique employed by Shiraki involved the use of band pass filtering in combination with summing and subtracting circuits to form a "voice channel" that would be differentiated from the rest of the audio programming. The limitation of this approach is that the band pass filter only discriminates frequencies within a predetermined range, in that case 200 Hz to 5000 Hz. It cannot discriminate between voice and background audio that may happen to fall within the pass band. Furthermore, the application of band pass filtering cannot distinguish between relevant and irrelevant speech components within an audio signal.
Means of reducing background noise in audio frames have been discussed by Solve et al. in U.S. Patent No. 5,485,522, which shows a speech detector and a noise estimator used to adaptively adjust the attenuation applied to each frame of an audio signal. This and other forms of adaptive noise filtering cannot distinguish between voice and other non-stationary audio in the voice band, such as music or irrelevant voice information.
U.S. Patent No. 5,434,922, to Miller et al., discloses a method and system for sound optimization which measures both the music and the noise in a vehicle. Again, Miller et al. use analog-to-digital conversion and adaptive digital filtering to compensate for the ambient noise background by enhancing the sound signal automatically. This differs from the technique disclosed herein, as it does not allow for separately adjusting the signals and then blending them together.
In general, prior art techniques employing band pass filtering or selective equalization will not remove voice band background or noise within the voice band range from the speech components of the audio program. In the past, there have been numerous advancements that improve the quality of audio broadcasts (stereo, Dolby processing, etc.). The previously cited inventions of Dolby, Shiraki, and Miller et al. have all attempted to modify some content of the audio signal through various signal processing hardware or algorithms, but those methods do not exploit the individual needs or preferences of different listeners. Some of the more fundamental advancements in audio technology, like stereo recordings, also utilized two different signals and re-combined the two signals at some stage before the listener heard the total audio content. However, there are innovative differences between those techniques and the claims of this invention. In the instant invention, the preferred signal is substantially different in content than either the left stereo or the right stereo signal and is provided to the listener for individual adjustment over the remaining part of the audio signal, in order to provide the listener with an optimal listening solution based on needs associated with hearing loss or simply listening pleasure.
To illustrate by example the innovation of this application, one should consider the adjustment of left stereo and right stereo by an end user. In most stereo recordings, the left signal and right signal are constructed by mixing multiple types of audio signals. Consequently, the balance adjustment of left and right stereo signals does not provide discrimination of specific audio information by the end-user. In the case of stereo broadcasts where there is a voice announcer and some background tracks (for example, a musical background or a laugh track), both left and right stereo signals will contain identical audio information, although the information may be distributed unevenly to the left and right signals. Therefore, a balance adjustment does not provide the end-user with the ability to improve the discrimination of a specific part of the audio signal (for example, voice) over the remaining part of the audio signal. The present invention provides new opportunities for listeners to adjust the volume of a specific component of the audio signal relative to the remaining audio signal, heretofore not possible with any type of audio processing methods or equipment.
Finally, in the case of studio recordings, large differences exist between the present invention and the standard methodology. Currently, vocals are usually recorded separately and are later mixed with the instrumentals and placed on a single track. The end user can only adjust volume, tone, and balance (as in the case of stereo), but not the volume of the voice component or the background.
SUMMARY OF THE INVENTION
This invention provides a means for the end listener to adjust the meaningful voice audio and the background audio independently, to compensate for his hearing loss. The end listener in this case is the person watching his television or video tape program, or listening to his music in his home or car. The invention provides a means for the end-listener to any audio program or broadcast, or to the audio content of a video program or broadcast, to personally adjust, using one or more adjustment means, the mix between a meaningful preferred audio signal and the remaining audio signal, to satisfy his aural needs and/or preferences. The end-listener in this case is the person watching his television, DVD, or video tape program, or listening to other audio programs/broadcasts, or listening to music in his home or car.
One application of this invention is as an assistive device for the hearing impaired. In this case the preferred audio signal is the relevant voice audio content. Many hearing impaired listeners are characterized by their inability to understand the voice content of the audio signal, particularly in the presence of background audio signals. For the hearing impaired, all background audio signals will be contained in the remaining audio signal. There are several interfering audio signals that make understanding relevant program audio difficult for individuals with hearing deficiencies. One interfering signal is the general user audio background, which includes music, traffic noise, wind, running water, etc. These interfering sounds may not reside in the voice frequency band (roughly 200 to 5000 Hz), but if played at sufficiently high volumes may mask the relevant voice audio. Another interfering signal is non-voice information that does occupy the voice frequency band. Examples of this include string and horn musical instruments and broadband random noise. This type of interfering signal cannot be removed by band pass filtering, because it resides in the same frequency range as the relevant voice information. Finally, the most difficult interfering signal is unwanted or non-relevant human speech in the background of the audio track. Speech filtering cannot be used in this case, since speech filtering algorithms cannot distinguish between relevant and irrelevant voice information. The voice channel separated out contains mostly voice, as a small amount of ambient noise may remain unfiltered.
Therefore, the present invention covers a variety of methods that provide hearing impaired listeners with independent adjustment of the volume of relevant voice information in relation to the general background information in an audio program. The simplest approach is to record the voice information on a separate track or channel. However, this requires a modification to the playback equipment as well as a new method of producing the audio program. In the case of broadcasting, one approach is to modulate the relevant voice information with a slightly higher or slightly lower frequency on broadcast, but within the general channel range for a given broadcast frequency. On the receive side, the signal is down converted with two local oscillators, one that down converts the general background audio and the other that down converts the relevant voice information. These signals are sent to separate variable gain preamplifiers that the listener adjusts with a dial or knob on the receiver. The following detailed discussion of the invention covers the method of keeping relevant voice and background audio separate for either recorded programming or for broadcast, using some generic means of signal encoding. Receiving and playback systems are also discussed in terms of a generic method of signal decoding. Various methods of encoding and decoding are well known in the television and telephone arts and are available and known to those of ordinary skill in the art.
This invention is not limited to the delivery mechanism discussed here but is meant to include any technique known to those of ordinary skill in the art of delivering a signal from the acoustic source to the end-listener. An example of where the present invention would be useful is sports broadcasting. The commentators' voices are adjusted to the listener's needs relative to the crowd noise. In movies or videos, important speech information, such as a conversation between characters in a scene, is independently adjusted, while background audio including irrelevant speech is reduced to the listener's preferences or needs. The invention is applicable as well to music recordings where the listener adjusts the vocals and the background instruments separately. This invention can be used to enhance the end-listener's aural pleasure, as well as being an assistive listening device.
Accordingly, it is an object of this invention to provide a system which separates the mostly voice components from the background components of an audio signal so that the user can individually adjust each component to suit his or her own hearing needs and requirements.
It is a further object of this invention to provide a system that uses voice recognition to automatically separate mostly voice components from background components in an audio signal and to allow for separate adjustment by a listener.
These and other objects will become clearer when reference is had to the accompanying drawings, in which:
Fig. 1 shows a circuit diagram of the basic system comprising this invention;
Fig. 2 shows a diagram of the circuit for recording and playing back recorded signals;
Fig. 3 shows a circuit diagram of the use of a voice recognition means to automatically separate the mostly voice and background signals; and
Fig. 4 shows a circuit diagram of the digital version of the circuit shown in Fig. 3.
DETAILED DESCRIPTION OF THE INVENTION
The end-user's independent adjustment of the preferred audio signal and the remaining audio signal will be the apparent manifestation of this invention. However, the invention includes and relies upon techniques that will require a change in various standard methods used to create the audio programming portions found today in television, radio, CDs, DVDs, and other audio presentations. To illustrate the details of the invention, consider the application where the preferred audio signal is the relevant voice information, as will be the case when this invention is used as an assistive listening device for the hearing impaired. Figure 1 illustrates a general approach to separating relevant voice information from general background audio in a recorded or broadcast program. There will first need to be a determination made by the programming director as to the definition of relevant voice. An actor, group of actors, or commentator(s) must be identified as the relevant speaker(s). Once the relevant speakers are identified, their voices will be picked up by the voice microphone(s) 1. The voice microphone will need to be either a close-talking microphone (in the case of commentators) or a highly directional shotgun microphone of the kind used in sound recording. In addition to being highly directional, these microphones will need to be voice band limited, preferably from 200 to 5000 Hz. The combination of directionality and band pass filtering minimizes the background noise acoustically coupled to the relevant voice information upon recording. In the case of certain types of programming, the need for preventing acoustic coupling can be avoided by recording relevant voice or dialog offline and dubbing it in where appropriate with the video portion of the program. The background microphones 2 should be fairly broadband to provide the full audio quality of background information such as music. A camera 3 will be used to provide the video portion of the program. The audio signals (background and relevant voice) will be encoded with the video signal at the encoder 4. In general, the audio signal is separated from the video signal by simply modulating it with a different carrier frequency. Since most broadcasts are now in stereo, one way to encode the relevant voice information with the background is to multiplex the relevant voice information on the separate stereo channels in much the same way left front and right front channels are added to two channel stereo to produce a quadriphonic disc recording. Although this would create the need for additional broadcast bandwidth, for recorded media this would not present a problem, as long as the audio circuitry in the video disc or tape player is designed to demodulate the relevant voice information. Once the signals are encoded, by whatever means deemed appropriate, the encoded signals are sent out for broadcast 5 or recorded onto tape or disc 6. In the case of recorded audio/video information, the background and voice information could simply be placed on separate recording tracks.
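As one illustration of keeping the two signals separable through encoding, the following Python sketch uses simple time-division interleaving of samples. This is only one of the encoding families the specification mentions, chosen here for brevity; the function names `encode` and `decode` and the sample-by-sample interleave are assumptions, not the encoder 4 itself.

```python
def encode(voice, background):
    """Interleave two equal-length sample streams into one stream.

    Even positions carry voice samples, odd positions carry background
    samples (a hypothetical time-division layout).
    """
    if len(voice) != len(background):
        raise ValueError("streams must have equal length")
    encoded = []
    for v, b in zip(voice, background):
        encoded.extend((v, b))
    return encoded

def decode(encoded):
    """Split the interleaved stream back into (voice, background)."""
    return encoded[0::2], encoded[1::2]

voice = [1, 2, 3]
background = [10, 20, 30]
recovered_voice, recovered_background = decode(encode(voice, background))
```

Because the two streams occupy disjoint time slots, decoding recovers each one unchanged, which is the property the playback side needs before any independent gain adjustment.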
Figure 2 illustrates a possible method of receiving and playing back the encoded program signals. A receiver system 7 would demodulate the main carrier frequency from the encoded audio/video signals, in the case of broadcast information. In the case of recorded media, the heads from a VCR or the laser reader from a CD player 8 would produce the encoded audio/video signals. In either case these signals would be sent to a decoding system 9. The decoder would separate the signals into video, voice audio, and background audio using standard decoding techniques such as envelope detection in combination with frequency or time division demodulation. The background audio signal is sent to a separate variable gain amplifier 10 that the listener can adjust to his or her preference. The voice signal is sent to a variable gain amplifier 11 that can be adjusted by the listener to his or her particular needs. The two adjusted signals are summed by a unity gain summing amplifier 12 to produce the final audio output. In this manner the listener can adjust relevant voice to background levels to optimize his listening abilities.
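The final stage of Figure 2 can be sketched in a few lines of Python, treating the decoded signals as lists of samples. The function name `mix` and the sample values are illustrative assumptions; the two gain parameters stand in for the listener's settings on amplifiers 11 and 10, and the plain addition stands in for the unity gain summing amplifier 12.

```python
def mix(voice, background, voice_gain, background_gain):
    """Scale each decoded signal by its listener-set gain, then sum.

    voice_gain models variable gain amplifier 11, background_gain models
    variable gain amplifier 10, and the addition models the unity gain
    summing amplifier 12.
    """
    return [voice_gain * v + background_gain * b
            for v, b in zip(voice, background)]

# A listener who boosts the voice and halves the background:
output = mix([1.0, -1.0], [0.5, 0.5], voice_gain=2.0, background_gain=0.5)
```

The two gains are independent inputs, so the listener sets the voice-to-background ratio directly rather than only the overall volume.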
Referring to Figure 3, there is shown a diagram of a further refinement of this invention, which is a system 100 utilizing a voice recognition chip 101 which separates the information into a mostly voice channel and a background noise channel after recognizing the speech components of the incoming signal. The mostly voice-like components are separated from the background components and grouped into separate data streams. The signals are then converted back to analog signals to create a mostly voice channel and a background noise channel. Fig. 4 shows a circuit diagram of the digital version of the device 102. The audio signal is received, separated into the digital signals as described, and then converted back to an analog signal.
A further embodiment of the instant invention utilizes a chip which is programmable and by initialization is tailored to each user's malady. It is understood that the system and method are adaptable to both analog and digital signals. For the present invention, if the application is to increase the listening pleasure of the end-user, it is possible to change the voice microphone 1 to read the preferred signal microphone and the background microphone 2 to read the remaining audio signal microphone. Then, the rest of the means for providing the end-listener independent adjustment of the preferred signal and remaining audio will follow exactly as in the preceding discussion.
To illustrate that people have different hearing needs and preferences, an older group of men was selected and asked to make the same adjustment that a group of students later made, between a fixed background crowd noise and the voice of an announcer, where only the latter could be varied and the former was set at 6.00. The results with the older group were as follows:
Member    Setting
1         7.50
2         4.50
3         4.00
4         7.50
5         3.00
6         7.00
7         6.50
8         7.75
9         5.50
10        7.00
11        5.00
To further illustrate the fact that people of all ages have different hearing needs and preferences, a group of 21 college students was selected to listen to a mixture of voice and background and to select, by making one adjustment to the voice level, the ratio of the voice to the background. The background noise, in this case crowd noise at a football game, was fixed at a setting of six (6.00), and the students were allowed to adjust the volume of the announcer's play-by-play voice, which had been recorded separately and was pure voice or mostly pure voice; i.e., the students were selected to do the same test the adults did. Students were selected so as to minimize hearing infirmities caused by age. The students were all in their late teens or early twenties. The results were as follows:
Student   Setting of Voice
1         4.75
2         3.75
3         4.25
4         4.50
5         5.20
6         5.75
7         4.25
8         6.70
9         3.25
10        6.00
11        5.00
12        5.25
13        3.00
14        4.25
15        3.25
16        3.00
17        6.00
18        2.00
19        4.00
20        5.50
21        6.00
The ages of the older group ranged from 36 to 59, with the preponderance of the individuals being in the 40 or 50 year old group. As is indicated by the results, the average setting tended to be reasonably high, indicating some loss of hearing across the board. The range again varied from 3.00 to 7.75, a spread of 4.75, which confirmed the findings of the range of variance in people's preferred listening ratio of voice to background, or any preferred signal to remaining audio (PSRA). The overall span for the volume setting for both groups of subjects ranged from 2.0 to 7.75. These levels represent the actual values on the volume adjustment mechanism used to perform this experiment. They provide an indication of the range of signal to noise values (when compared to the "noise" level of 6.0) that may be desirable to different users. To gain a better understanding of how this relates to relative loudness variations chosen by different users, consider that the volume adjustment from 2.0 to 7.75 represented an increase of 20 dB, or ten (10) times. Thus, for even this small sampling of the population and a single type of audio programming, it was found that different listeners do prefer quite drastically different levels of "preferred signal" with respect to "remaining audio". This preference cuts across age groups, showing that it is consistent with individual preference and basic hearing abilities, an unexpected result.
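The relation between 20 dB and a factor of ten can be checked with a short Python sketch. A 20 dB change corresponds to a factor of ten in signal amplitude (a factor of one hundred in power); the text does not say which quantity is meant, but the factor of ten matches the amplitude convention. The mapping from dial setting to decibels is an assumption: the specification gives only the two endpoints, so the sketch supposes, hypothetically, that the control is uniform in dB per dial unit.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude (voltage) ratio for a level difference in decibels."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Level difference in decibels for an amplitude (voltage) ratio."""
    return 20 * math.log10(ratio)

# Assumption: the dial is uniform in dB per unit between the two reported
# endpoints (2.0 and 7.75, stated to be 20 dB apart). The specification
# does not describe the control's taper.
DB_PER_UNIT = 20.0 / (7.75 - 2.0)

def voice_to_background_db(voice_setting, background_setting=6.0):
    """Voice level relative to the fixed background, under the assumed taper."""
    return (voice_setting - background_setting) * DB_PER_UNIT

ratio = db_to_amplitude_ratio(20.0)
```

Under that assumed taper, a voice setting equal to the background setting of 6.0 gives 0 dB, settings above 6.0 give positive voice-to-background ratios, and settings below give negative ones.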
As the test results show, the range that students without hearing infirmities caused by age selected varied considerably, from a low setting of 2.00 to a high of 6.70, a spread of 4.70, or almost one half of the total range of 1 to 10. The test is illustrative of how the "one size fits all" mentality of most recorded and broadcast audio signals falls far short of giving the individual listener the ability to adjust the mix to suit his or her own preferences and hearing needs. Again, the students had a wide spread in their settings, as did the older group, demonstrating the individual differences in preferences and hearing needs.
Having described the invention in detail, it will be obvious to those of ordinary skill in the art that many changes and modifications can be made without departing from the scope of the appended claims.
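The spreads quoted for the two groups can be reproduced with a short Python sketch. The settings are transcribed from the two results tables, with the decimal point assumed after the first digit where the source shows none (consistent with the stated ranges of 3.00 to 7.75 and 2.00 to 6.70); the function name `summarize` is illustrative.

```python
# Settings transcribed from the results tables (decimal placement assumed
# where the source omits it).
older = [7.50, 4.50, 4.00, 7.50, 3.00, 7.00, 6.50, 7.75, 5.50, 7.00, 5.00]
students = [4.75, 3.75, 4.25, 4.50, 5.20, 5.75, 4.25, 6.70, 3.25, 6.00, 5.00,
            5.25, 3.00, 4.25, 3.25, 3.00, 6.00, 2.00, 4.00, 5.50, 6.00]

def summarize(settings):
    """Return count, lowest, highest, spread, and mean of a list of settings."""
    low, high = min(settings), max(settings)
    return {
        "n": len(settings),
        "low": low,
        "high": high,
        "spread": round(high - low, 2),
        "mean": round(sum(settings) / len(settings), 2),
    }

older_summary = summarize(older)
student_summary = summarize(students)
overall_span = (min(older + students), max(older + students))
```

The computed spreads agree with the 4.75 and 4.70 figures given in the text, and the overall span agrees with the stated 2.0 to 7.75.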