WO2018069900A1 - Audio-system and method for hearing-impaired - Google Patents

Audio-system and method for hearing-impaired

Info

Publication number
WO2018069900A1
Authority
WO
WIPO (PCT)
Prior art keywords
hearing
masking
audio
listener
psychoacoustic
Application number
PCT/IB2017/056393
Other languages
French (fr)
Inventor
Kei Kobayashi
Original Assignee
Auckland Uniservices Limited
Application filed by Auckland Uniservices Limited filed Critical Auckland Uniservices Limited
Publication of WO2018069900A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/70Adaptation of deaf aid to hearing loss, e.g. initial electronic fitting
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/1752Masking
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43Signal processing in hearing aids to enhance the speech intelligibility

Definitions

  • the invention relates to an audio system and method for the hearing impaired.
  • the audio system may be utilized in hearing aids, portable audio devices, or in any generic audio file processing.
  • the spectral shape of a sound is represented in the excitation pattern evoked by the sound.
  • frequency selectivity in hearing loss is usually reduced, resulting in less detail about the spectrum than would be the case for a normal ear.
  • CB: critical bandwidth
  • ERB: Equivalent Rectangular Bandwidth
  • tone masking noise; noise masking tone
  • measuring auditory filters is a practical difficulty at audiology clinics, as these measurements for individuals are very time consuming. This is because CB or ERB varies by center frequency. To compensate for an individual's CB or ERB, these must be measured at 125, 250, 500, 1000, 2000 and 4000 Hz for both ears, at least. With existing simplified methods it takes on average 96 minutes of total testing time, which is generally unacceptable at clinics. Moreover, psychoacoustic testing (to perceive a tone in noise) is very difficult for patients, so usually more time is needed to complete such a test.
  • the present invention broadly consists in a method of improving the listenability of an audio signal for a hearing impaired listener, the method implemented by a processing device having associated memory, comprising receiving or retrieving an input audio signal; receiving or retrieving listening data indicative of the hearing impaired listener's hearing characteristics; generating or modifying a customised psychoacoustic model based on the listening data; processing the input audio signal to identify and remove or at least partially attenuate inaudible or unhelpful spectral components in the input audio signal based on the customised psychoacoustic model; and generating a modified output audio signal based on the processing that is customised for the hearing impaired listener.
  • the listening data comprises a single hearing characteristic or single configuration parameter for generating or modifying the customised psychoacoustic model.
  • the listening data contains all of the parameters for generating or modifying the customised psychoacoustic model.
  • the single hearing characteristic or single configuration parameter is indicative of the listener's auditory filter bandwidth.
  • the listener's auditory filter bandwidth is a function of the single hearing characteristic or single configuration parameter.
  • the single hearing characteristic or single configuration parameter indexes which auditory filter bandwidth of a selection of different auditory filter bandwidths approximates the listener's auditory filter bandwidth.
  • the single hearing characteristic or single configuration parameter modifies a default auditory filter bandwidth.
  • the single hearing characteristic or single configuration parameter represents the listener's proportional difference between the default auditory filter bandwidth and the listener's auditory filter bandwidth.
  • the default critical band bandwidth is an average person's auditory filter bandwidth.
  • the single hearing characteristic or single configuration parameter is generated as output from an electronic psychoacoustic assessment system.
  • the electronic psychoacoustic assessment system comprises a GUI.
  • the GUI comprises an adjustable graphical user interface element which modifies a control variable.
  • control variable is the single hearing characteristic or single configuration parameter.
  • the GUI comprises a graph display.
  • the graph display comprises the user's listening assessment data.
  • adjusting the control variable adjusts a plot displayed in the graph display.
  • the plot displayed represents a user's average frequency selectivity.
  • the user's single hearing characteristic or single configuration parameter is derived from the user's average frequency selectivity.
  • the plot displayed represents the user's single hearing characteristic or single configuration parameter.
  • the adjustable graphical user interface element is one or more of: toggle switch, drop down menu, check box, radio button, numerical input, slider scale, or dial.
  • the auditory filter bandwidth is a user's critical band bandwidth.
  • the auditory filter bandwidth is a user's equivalent rectangular bandwidth (ERB).
  • receiving listening data further comprises generating or determining additional listening data indicative of additional hearing characteristics of the listener based on the single hearing characteristic or single configuration parameter.
  • the additional listening data is indicative of any one or more of the following hearing characteristics of the listener: the listener's tonal masking index, noise masking index and/or spreading function.
  • processing the input audio signal comprises fitting critical bands of audio of the input audio signal based on the listening data indicative of the listener's critical band bandwidth; determining an individual masking threshold for each critical band of audio; determining global masking thresholds based on the determined individual masking thresholds; and spectrally modifying the input audio signal based on the determined global masking thresholds.
  • determining an individual masking threshold for each critical band comprises determining a sound pressure level of a masking component in the critical band of the input audio signal; determining at least one masking index based on the listening data indicative of the critical band bandwidth of the listener; determining a spreading function based on the listening data indicative of the critical band bandwidth of the listener; determining an individual masking threshold based on the determined sound pressure level of the masking component, the determined at least one masking index, and the determined spreading function.
  • determining the at least one masking index comprises determining the tonal masking index and the non-tonal masking index.
  • spectrally modifying the input audio signal comprises calculating the signal-to-mask ratio in each critical band based on the global masking thresholds; and applying spectral subtraction to the input audio signal based on the global masking threshold.
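  • By way of illustration only, and not as the patented implementation, the threshold computation and spectral modification described in the preceding paragraphs can be sketched in Python; every name below, and the simplified two-slope spreading function, is an assumption:

    import numpy as np

    def spreading(dz):
        # Simplified two-slope spreading function in dB versus masker-to-maskee
        # distance dz in Bark (ISO 11172-3 model 1 uses a four-segment function).
        return np.where(dz < 0, 27.0 * dz, -12.0 * dz)

    def individual_threshold(masker_spl, z_masker, z_bins, masking_index):
        # Sound pressure level of the masking component + masking index +
        # spreading function, evaluated at each bin's critical band rate.
        return masker_spl + masking_index + spreading(z_bins - z_masker)

    def global_threshold(ath_db, maskers, z_bins):
        # Power-sum the threshold in quiet with all individual thresholds.
        total = 10.0 ** (ath_db / 10.0)
        for spl, z, av in maskers:            # (SPL dB, Bark position, index)
            total += 10.0 ** (individual_threshold(spl, z, z_bins, av) / 10.0)
        return 10.0 * np.log10(total)

    def spectrally_modify(spectrum, spl_db, lt_g_db):
        # Signal-to-mask ratio per bin; bins with negative SMR are inaudible
        # to this listener and are attenuated to zero (removed).
        smr = spl_db - lt_g_db
        return np.where(smr < 0.0, 0.0, spectrum)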
  • generating or modifying the psychoacoustic model comprises one of the following alternatives:
  • generating or modifying the customised psychoacoustic model based on the received listening data comprises loading the customised psychoacoustic model into memory from an external source.
  • generating or modifying the customised psychoacoustic model comprises generating the psychoacoustic model in real-time based on the received listening data.
  • the present invention broadly consists in an audio processor that is configured to improve the listenability of an audio signal for a hearing impaired listener, the audio processor comprising a processor and associated memory, and which is configured to carry out the method according to the first aspect.
  • the audio processor is provided in a hearing aid or an audio prosthesis such as a cochlear implant or a middle ear implant or a bone conduction unit.
  • the audio processor is provided as an application program executable on a programmable electronic device.
  • the present invention broadly consists in a computer-readable medium having stored thereon computer executable instructions that, when executed on a processing device or devices, cause the processing device or devices to perform a method of the first aspect.
  • the present invention broadly consists in a hearing aid or an audio prosthesis for use by a hearing impaired user, the hearing aid configured to be mounted on or within a user's ear, the hearing aid comprising the audio processor according to the second aspect or a computer readable medium according to the third aspect.
  • the audio prosthesis may be a cochlear implant or a middle ear implant or a bone conduction device.
  • the terms cochlear implant and middle ear implant incorporate the implanted elements as well as the external elements, e.g. the sound processor unit.
  • the present invention broadly consists in a mobile device or a computing device comprising the computer readable medium according to the third aspect.
  • the present invention broadly consists in a method of fitting a hearing aid or an audio prosthesis, the method of fitting comprises assessing or generating a single control parameter representing a user's hearing profile.
  • the method comprises customising the psychoacoustic model to the user based on the single control parameter.
  • the single control parameter representing a user's hearing profile is determined based on a testing process.
  • the testing process comprises providing a plurality of auditory or audible stimuli and assessing a user's response, the single control parameter being determined based on the response of the user to the stimuli.
  • the single control parameter may be determined by trial and error.
  • the stimuli may be speech stimuli or specific sounds or musical notes or specific audible signals.
  • the stimuli may be delivered with varying amplitudes and/or with varying signal to noise ratios.
  • the noise may be white noise that is mixed with the stimuli signals.
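  • A hedged sketch of such a stimulus (assumed parameter values, not taken from the specification): a pure tone mixed with white noise at a chosen signal-to-noise ratio.

    import numpy as np

    def tone_in_noise(f0=1000.0, snr_db=10.0, dur=0.24, fs=44100):
        # Pure tone probe plus white noise scaled to a target SNR in dB.
        t = np.arange(int(dur * fs)) / fs
        tone = np.sin(2 * np.pi * f0 * t)
        noise = np.random.randn(t.size)
        # Scale the noise so that 10*log10(P_tone / P_noise) == snr_db.
        p_tone = np.mean(tone ** 2)
        noise *= np.sqrt(p_tone / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
        return tone + noise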
  • the single control parameter of a user's hearing profile may be determined based on fitting a user parameter to match the user's listening ability.
  • the parameter may be a critical band width ratio.
  • the single control parameter of a user's hearing profile is determined by fitting a critical band width ratio of a user such that the fit line substantially matches or corresponds to the user's listening ability.
  • the determined single control parameter may be used to determine or generate a customised psychoacoustic model for a user.
  • the present invention broadly consists in an audio processor for executing a method of fitting a hearing aid or an audio prosthesis according to the sixth aspect.
  • the phrase 'computer-readable medium' should be taken to include a single medium or multiple media. Examples of multiple media include a centralised or distributed database and/or associated caches. These multiple media store the one or more sets of computer executable instructions.
  • the phrase 'computer readable medium' should also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor of a computing device and that cause the processor to perform any one or more of the methods described herein.
  • the computer-readable medium is also capable of storing, encoding or carrying data structures used by or associated with these sets of instructions.
  • the phrase 'computer-readable medium' includes solid-state memories, optical media and magnetic media.
  • Figure 1 shows a schematic diagram of the main modules of an audio system having an audio processor with psychoacoustic model as a front end to a hearing aid in an embodiment
  • Figure 2a shows a schematic diagram of an alternative embodiment audio system for processing audio inputs generally using the audio processor of Figure 1
  • Figure 2b is a schematic diagram showing sub-modules of the audio processor of Figures 1 and 2a;
  • Figure 3a shows the primary components of a psychoacoustic processing configuration or algorithm for digital audio processing
  • Figure 3b shows the psychoacoustic processing configuration or algorithm of Figure 3a customised based on an individual's psychoacoustic assessments in an embodiment
  • Figures 4a to 4f show psychoacoustic analysis of a pop music song, specifically:
  • Figure 4a shows normalized power spectral density and tonal/non-tonal masker ID;
  • Figure 4b shows prototype spreading functions;
  • Figure 4c shows individual tonal masker thresholds;
  • Figure 4d shows individual noise masker thresholds;
  • Figure 4e shows global masking thresholds;
  • Figure 5 shows excitation patterns for the same sound calculated using different sizes of equivalent rectangular bandwidth (ERB);
  • Figures 6a to 6d show a process of eliminating inaudible components around two tones in accordance with an embodiment, specifically: Figures 6a-6c show processing for a normal hearing person on the left and processing for a person with hearing loss on the right in accordance with an embodiment; and Figure 6d shows processing for a normal hearing person in the upper graph and processing for a hearing impaired person in the lower graphs in accordance with an embodiment;
  • Figure 7a shows a graphical representation of critical bandwidths (ERB) using some exemplary methods
  • Figure 7b shows a graphical representation of critical band rate (ERB rate) using the exemplary methods described with reference to figure 7a;
  • Figure 8a shows a plot of ERB plotted for a variety of frequencies for various characteristic or single configuration parameters for a user
  • Figure 8b shows a plot of ERB rate for a variety of frequencies for various characteristic or single configuration parameters for a user
  • Figure 8c shows a plot of ERB vs ERB rate for three different characteristic or single configuration parameters for a user
  • Figure 9 illustrates a plot of three different psychoacoustic models
  • Figure 10 shows a graphical representation of various spreading functions for different characteristic or single configuration parameters for a user
  • Figure 11a shows a plot of a model of Rb across frequencies for various characteristic or single configuration parameters for a user;
  • Figure 11b shows a plot of the test results of Rb across frequencies for various characteristic or single configuration parameters for a user
  • Figure 12a and 12b show the modelling of ERB curves plotted over a range of frequencies
  • Figure 13 shows relative responses in decibels for spreading functions for different characteristic or single configuration parameters
  • Figures 14a and 14b are graphical representations of an original signal and a processed signal using a suitable processing method as described herein, for different characteristic parameters of a user;
  • Figures 15a and 15b show the 1/3 octave analysis of an audio frame for different characteristic parameters (i.e. different users);
  • Figure 16 shows various spectrograms illustrating the improvement of a signal e.g. a speech signal
  • Figures 17a and 17b illustrate spectrograms of music used in testing, and the result of improving the listenability of speech with music
  • Figures 18a and 18b show graphical representations of critical bandwidths for different individual's hearing
  • Figure 19 shows masking index for different individual's hearing
  • Figure 20 shows ratio of spreading functions for different individual's hearing
  • Figures 21a and 21b show examples of a psychoacoustic GUI assessment tool used in a first quick fitting method
  • Figure 22 shows an example of a psychoacoustic GUI assessment tool used in a second quick fitting method
  • Figure 23 shows an example of a psychoacoustic GUI assessment tool used in a third quick fitting method
  • Figure 24 shows values of absolute threshold of hearing (ATH) increasing for various frequencies
  • Figures 25a and 25b show plots illustrating a comparison of the frequency selectivity model and test results of the average of Rb from measurement data;
  • Figure 26 shows a plot of speech intelligibility for original signals e.g. original speech, and processed signals e.g. processed speech;
  • Figure 27 shows a total scope of perceptual scales of signals i.e. original signals and processed signals
  • Figure 28a to 28d show various method steps of spectral subtraction to improve the listenability of a signal e.g. a speech signal.
  • Figures 28e and 28f show plots of third octave spectrum of normal hearing and a hearing impaired user.
  • the embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged.
  • a process is terminated when its operations are completed.
  • a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc., in a computer program. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or a main function.
  • mobile device includes, but is not limited to, a wireless device, a mobile phone, a smart phone, a mobile communication device, a user communication device, personal digital assistant, mobile hand-held computer, a laptop computer, wearable electronic devices such as smart watches and head-mounted devices, an electronic book reader and reading devices capable of reading electronic contents and/or other types of mobile devices typically carried by individuals and/or having some form of communication capabilities (e.g., wireless, infrared, short-range radio, cellular etc.).
  • Presence of hearing loss acts as a filter, making harmonics inaudible in the region of hearing loss.
  • Reduced audibility is usually addressed by the fitting of hearing aids which amplify sounds and which are developed for speech intelligibility. Due to the differences in the spectrum and intensity of music to speech, amplification and reproduction of music presents several challenges (e.g. crest factor, dynamic range, peak level).
  • the electronic and electro-acoustic parameters set up in a hearing aid may be optimal for speech, but not for music.
  • Hearing aids and cochlear implants do not compensate for dynamic masking such as simultaneous (or frequency) masking. Such dynamic masking is caused by peaks of the complex sounds in which the intensity and frequency are dynamically changing in time sequence.
  • Hearing aids and cochlear implants are generally designed with fixed frequency channels. People suffering from hearing loss typically experience a worse masking effect because of their wider auditory filters. When listening to music, masking affects the audibility of timbre and of lower-amplitude background instruments, because much louder peaks simultaneously mask softer sounds.
  • Lossy digital audio compression systems include, but are not limited to: ISO/IEC MPEG Standard, ATRAC, DTS-CA, and Dolby (TM) AC-3.
  • the model by which audio masking is calculated and adjusted in such compression technologies is known as a "psychoacoustic model".
  • a method and system for modifying and adapting a psychoacoustic model into audio processing technology to improve listening quality of audio, particularly complex sounds such as music and speech in background noise, for those suffering hearing loss is described.
  • This audio processing method and system can work as an independent processing block 102 at the front end of hearing aids 104 or cochlear implants as shown in Figure 1, or as part of an audio processor 200 that optionally includes a hearing aid, or as an audio codec for the hearing impaired as shown in Figures 2a and 2b. It will be appreciated that the audio processor could be used in an audio or audio and video signal processing chain.
  • the audio processor can be incorporated or implemented in any electronic hardware device that is capable or configured to deliver audio to an end user including, but not limited to, hearing-loss devices such as hearing aids and cochlear implant devices, and general devices configured to deliver audio to a user via speakers or headphones or earphones for example such as portable and non-portable digital audio players and audio systems, consumer electronic devices such as general purpose computing devices, tablets, smart phones, smart televisions, wearable devices.
  • the functionality of the audio processor can be implemented in software or computer- readable instructions executable on such devices or hardware logic or circuits, or a combination of these.
  • the audio processor may be a plug-in or addon component to an audio player software application program.
  • Audio Codecs: Audio coding relies heavily on exploiting properties of psychoacoustics.
  • MPEG: Motion Picture Experts Group
  • MPEG-1 and MPEG-2 have become the most widely used lossy audio/video formats in the world.
  • ATRAC: Adaptive TRansform Acoustic Coding
  • DTS-CA: DTS Coherent Acoustics
  • AC-3: Dolby's Audio Coder-3
  • the MPEG standard includes the MP3 (MPEG-1 or MPEG-2 Audio Layer III) audio format.
  • the MP3 encoder processes the digital audio signal and produces a compressed bitstream for storage.
  • the MP3 encoder algorithm is not standardized, and may use various means for encoding such as estimation of the auditory masking threshold, quantization, and scaling. However, the encoder output must be such that a decoder conforming to the specifications will produce audio suitable for the intended application.
  • different layers of the coding system with increasing encoder complexity and performance can be used.
  • An ISO MPEG Audio Layer N decoder is able to decode bitstream data which has been encoded in Layer N and all layers below N.
  • Layer I contains the basic mapping of the digital audio input into 32 subbands, fixed segmentation to format the data into blocks, a psychoacoustic model to determine the adaptive bit allocation, and quantization using block companding and formatting.
  • Layer II provides additional coding of bit allocation, scalefactors and samples. Different framing is used.
  • the MPEG-Audio algorithm 300 is a psychoacoustic algorithm that receives a digital audio input signal 310, processes it, and outputs an encoded digital audio signal or bitstream 312.
  • the primary parts of the psychoacoustic processing algorithm are shown and described in the following (also see ISO 11172-3):
  • 1) Filter Bank 302: the filterbank does a time to frequency mapping.
  • 2) Psychoacoustic Model: the psychoacoustic model calculates a just noticeable noise-level for each band in the filterbank. This noise level is used in the bit or noise allocation to determine the actual quantizers and quantizer levels. There are two psychoacoustic models.
  • Model 1 has been used for Layers I and II, and Model 2 for Layer III.
  • SMR: signal-to-mask ratio
  • the final output of the model is a signal-to-mask ratio (SMR) for each band (Layers I and II) or group of bands (Layer III).
  • 3) Bit or Noise Allocation 306: the allocator looks at both the output samples from the filter bank and the SMRs from the psychoacoustic model, and adjusts the bit allocation (Layers I and II) or noise allocation (Layer III) in order to simultaneously meet both the bitrate requirements and the masking requirements. At low bitrates, these methods attempt to spend bits in a fashion that is psychoacoustically inoffensive when they cannot meet the psychoacoustic demand at the required bitrate.
  • the bitstream formatter 308 takes the quantized filterbank outputs, the bit allocation (Layers I and II) or noise allocation (Layer III) and other required side information, and formats this information into the coded bitstream.
  • the decoder accepts the compressed audio bitstream and uses the information to produce digital audio output. Bitstream data is fed into the decoder.
  • the bitstream unpacking and decoding block does error detection if error checking is applied in the encoder.
  • the bitstream data are unpacked to recover the various pieces of information.
  • the reconstruction block reconstructs the quantized version of the set of mapped samples. The inverse mapping transforms these mapped samples back into uniform Pulse-code modulation (PCM).
  • PCM: Pulse-code modulation
  • A psychoacoustic model is a mathematical model of the masking behaviour of the human auditory system for averaged normal hearing.
  • MP2: MPEG-Audio Layer II
  • an embodiment is based on the use of MPEG-Audio Layer II (MP2) and psychoacoustic model I, although it will be appreciated that the application of the system and method is not limited to this particular model.
  • the calculation of the global masking threshold may be based on the following steps:
  • Step 1 Calculation of the FFT for time to frequency conversion.
  • Step 2 Determination of the sound pressure level in each subband.
  • Step 3 Determination of the threshold in quiet (absolute threshold).
  • Step 4 Finding of the tonal (more sinusoid-like) and non-tonal (more noise-like) components of the audio signal.
  • Step 5 Decimation of the maskers, to obtain only the relevant maskers.
  • Step 6 Calculation of the individual masking thresholds.
  • Step 7 Determination of the global masking threshold.
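  • Steps 1 to 3 can be sketched as follows; the 96 dB SPL normalisation and the Terhardt approximation of the threshold in quiet are common companions of model 1 and are assumptions here, not quotations from the specification:

    import numpy as np

    def psd_spl(frame, fs, n_fft=512):
        # Steps 1-2: windowed FFT and a normalised sound pressure level
        # estimate, with the spectral maximum pinned to 96 dB SPL.
        win = np.hanning(len(frame))
        spec = np.fft.rfft(frame * win, n_fft)
        psd = 10.0 * np.log10(np.abs(spec) ** 2 + 1e-12)
        return psd + (96.0 - psd.max())

    def threshold_in_quiet(f_hz):
        # Step 3: absolute threshold of hearing (Terhardt), frequency in Hz.
        f = np.maximum(f_hz, 20.0) / 1000.0   # kHz; floor avoids f -> 0 blow-up
        return (3.64 * f ** -0.8
                - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
                + 1e-3 * f ** 4)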
  • LTtm and LTnm are the individual masking thresholds at critical band rate z(i) in Bark of the masking component at the critical band rate z(j) in Bark.
  • Xtm[z(j)] is the sound pressure level of the masking component with the index number j at the corresponding critical band rate z(j).
  • av is called the masking index and vf the masking function of the masking component Xtm[z(j)].
  • the masking function vf is also called the spreading function.
  • the masking index av is different for tonal and non-tonal maskers (avtm and avnm).
  • the global masking threshold LTg(i) at the i'th frequency sample is derived from the upper and lower slopes of the individual masking threshold of each of the j tonal and non-tonal maskers, and in addition from the threshold in quiet LTq(i).
  • the global masking threshold is found by summing the powers corresponding to the individual masking thresholds and the threshold in quiet.
  • the range of j can be reduced to just encompass those masking components that are within -8 to +3 Bark from i. Outside of this range LTtm and LTnm are -∞ dB.
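  • the individual and global masking threshold expressions referenced above are not reproduced in this text; in the standard ISO/IEC 11172-3 model 1 notation (a reconstruction from the definitions above, not a quotation) they read:

    LT_{tm}[z(j), z(i)] = X_{tm}[z(j)] + av_{tm}[z(j)] + vf[z(j), z(i)] \quad \text{(dB)}
    LT_{nm}[z(j), z(i)] = X_{nm}[z(j)] + av_{nm}[z(j)] + vf[z(j), z(i)] \quad \text{(dB)}
    LT_{g}(i) = 10 \log_{10} \left( 10^{LT_{q}(i)/10} + \sum_{j} 10^{LT_{tm}[z(j), z(i)]/10} + \sum_{j} 10^{LT_{nm}[z(j), z(i)]/10} \right)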
  • the critical band rate of individuals can be calculated by approximating a curve formed from some of the critical bandwidths.
  • the ISO/IEC 11172-3 psychoacoustic model 1 determines the maximum allowable quantization noise energy in each critical band such that quantization noise remains inaudible. In one of its modes, the model uses a 512-point DFT for high resolution spectral analysis (86.13 Hz), then estimates for each input frame individual simultaneous masking thresholds due to the presence of tone-like and noise- like maskers in the signal spectrum. A global masking threshold is then estimated for a subset of the original 256 frequency bins by (power) additive combination of the tonal and non-tonal individual masking thresholds.
  • the absolute threshold 402 is indicated with a dashed line (Top graph: linear frequency scale 400, Bottom graph: Bark scale 401).
  • Figure 4 shows a power spectral density (PSD) and sound pressure level (SPL) 404 estimate using a 512-point FFT of incoming audio samples. Tonal and non-tonal masking components are also identified in Figures 4c-d - local maxima in the sample PSD which exceed neighbouring components within a certain Bark distance by at least 7 dB are classified as tonal, while a single noise masker for each critical band is then computed from (remaining) spectral lines not within the certain Bark distance of a tonal masker.
  • PSD: power spectral density
  • SPL: sound pressure level
  • Tonal maskers 408 are denoted by 'x' symbols and noise maskers 406 are denoted by 'o' symbols in Figure 4.
  • the number of maskers is reduced (decimation) by the following steps: any tonal or noise maskers below the absolute threshold are removed, and any pairs which are too close to each other (within 0.5 Bark) are replaced by the stronger one.
  • two tonal maskers 410, 412 appear between 19.5 and 20.5 Barks, as shown in the bottom graph 401 in Figure 4a. It can be seen that the pair is replaced by the stronger of the two 410 during threshold calculations, as shown in Figures 4c-e. Having obtained a decimated set of tonal and noise maskers, individual tone and noise masking thresholds are computed next.
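  • A minimal sketch of this decimation rule (illustrative names; the specification does not prescribe this code):

    def decimate(maskers, ath_at):
        # maskers: list of (spl_db, bark); ath_at(bark) -> threshold in quiet.
        kept = [(spl, z) for spl, z in maskers if spl >= ath_at(z)]
        kept.sort(key=lambda m: m[1])             # order by Bark position
        out = []
        for spl, z in kept:
            if out and z - out[-1][1] < 0.5:      # pair closer than 0.5 Bark
                if spl > out[-1][0]:
                    out[-1] = (spl, z)            # keep the stronger masker
            else:
                out.append((spl, z))
        return out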
  • Tonal masker thresholds, TTM(i, j), and noise masker thresholds, TNM(i, j), are given in terms of the following quantities:
  • PTM(j) (PNM(j)) denotes the SPL of the tonal (noise) masker in frequency bin j
  • z(j) denotes the Bark frequency of bin j
  • Figures 4c and 4d show individual masking thresholds associated with the tone maskers 420 and noise maskers 422.
  • Tq(i) (unit: dB SPL) is the absolute hearing threshold for frequency bin i
  • TTM(i, l) and TNM(i, m) are the individual masking thresholds
  • L and M are the number of tonal and noise maskers.
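  • the expressions themselves are omitted from this text; they are presumed to follow the standard tutorial formulation (cf. Painter and Spanias), consistent with the symbols defined above, where SF(i, j) denotes the spreading function from masker bin j to maskee bin i; this reconstruction is an assumption:

    T_{TM}(i, j) = P_{TM}(j) - 0.275\, z(j) + SF(i, j) - 6.025 \quad \text{(dB SPL)}
    T_{NM}(i, j) = P_{NM}(j) - 0.175\, z(j) + SF(i, j) - 2.025 \quad \text{(dB SPL)}
    T_{g}(i) = 10 \log_{10} \left( 10^{T_{q}(i)/10} + \sum_{l=1}^{L} 10^{T_{TM}(i, l)/10} + \sum_{m=1}^{M} 10^{T_{NM}(i, m)/10} \right)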
  • Figure 4e shows global masking threshold 430 obtained by adding the power of the individual tonal maskers 420 as shown in Figure 4c and noise maskers 422 as shown in Figure 4d to the absolute threshold 402 in quiet.
  • Degradation of absolute threshold can occur due to cochlear damage. Firstly, damage to the Outer Hair Cells (OHCs) impairs the active mechanism, resulting in reduced basilar membrane (BM) vibration for a given low sound level. Hence, the sound level must be larger than normal to give a just-detectable amount of vibration. Secondly, OHC damage can result in reduced efficiency of transduction, so the amount of BM vibration needed to reach threshold is larger than normal.
  • the most prominent change in loudness perception associated with the damage is loudness recruitment.
  • the sound appears to fluctuate more in loudness than it would for a normally hearing person.
  • the loudness differences between consonants and vowels may be greater than normal.
  • the forte passages may be perceived at almost normal loudness, but the piano passages may be inaudible. To compensate for audibility, both the increased absolute hearing threshold and loudness recruitment need to be accounted for.
  • Hearing aids are the primary method for alleviating these problems. However, people find that their hearing aids are sometimes useful in helping them to hear soft sounds, but that the aids do not help very much, if at all, when background noise, including musical instruments, is present, because of the listener's reduced frequency selectivity and increased global masking thresholds, described below.
  • the degree of broadening of the PTCs increases with increasing hearing loss.
  • the PTC shapes for the impaired ears vary greatly in shape across participants, but they are all broader than those for the normal ears, especially their low-frequency side.
  • Detecting fine structure of spectrum in audio may be influenced by recruitment.
  • Figure 5 shows excitation patterns 502, 504 for the same vowel, but calculated for impaired auditory systems.
  • the top graph 500 in Figure 5 shows an excitation pattern 502 for an impaired ear with auditory filters two times broader than normal.
  • the bottom graph 501 in Figure 5 shows an excitation pattern 504 for an impaired ear with auditory filters four times broader than normal. It can be seen that spectral details are less well represented in these excitation patterns.
  • Masking index here is denoted with a "noise-masking-tone" and a "tone-masking-noise".
  • Noise-masking-tone is the masked threshold of a tone by uniform exciting noise, while tone-masking-noise is the masked threshold of burst noise (critical band wide) by a tone.
  • the masking index is used in the spreading function in the psychoacoustic model.
  • the masking index is the difference between the critical band level and the masked threshold in the region of the main excitation.
  • the loudness of a 1 kHz tone below an SPL of about 30 dB is reduced equally by narrow and wide-band noise (normal hearing).
  • the steepness of the masked loudness function depends on the spread of excitation of the noise.
  • the effect of the tone on a critical band of noise is greater than its effect on either an octave-band noise or wide-band noise; whether the masker is a tone or noise, masking ceases when the effective energy of the masked and masking stimuli is the same.
  • the masking index for hearing loss is measured and used as a basis to define the individual spreading function of a customised psychoacoustic model for the end listener.
  • a customised psychoacoustic model is configured or generated for the intended end-listener based on their individual hearing loss profile or characteristics.
  • This customised psychoacoustic model operates in the audio processor to modify the incoming audio signal into a modified output audio signal that is intended to in at least some aspects improve the listening quality or experience for the listener.
  • the audio processing based on the customised psychoacoustic model, in some embodiments, is configured such that sounds masked by tonal and noise peaks (below the global masking threshold LTg(i)) are eliminated for the individual's hearing.
  • the audio processing based on the customised psychoacoustic models improves the listenability of a signal, e.g. a speech signal.
  • psychoacoustic assessment tools are provided to assess individual psychoacoustic characteristics of an end-listener and the system is configured to import those psychoacoustic profiles into an MPEG Audio encoder that uses psychoacoustic model I in MPEG Audio Layer II. Rather than average data for normal hearing, the individual participant data or profile is applied to customise the psychoacoustic model, because of the great variances between individuals with hearing loss as described in the previous section. It will be appreciated that the customised psychoacoustic model may be used in other audio processors, encoders, and codecs in alternative configurations.
  • the customised psychoacoustic model comprises a modified spreading function (also called an excitation pattern), which is derived from the individual's auditory characteristics.
  • An encoder containing the customised psychoacoustic model will eliminate unnecessary sounds masked by tonal and noise peaks. This is broadly shown in Figures 6a-6d. Total band power in a critical band is reduced and therefore simultaneous masking is reduced. Reducing total band power evokes less unnecessary excitation and therefore improves sound quality for hearing loss.
  • critical bands can be approximately represented as equivalent rectangular bandwidths (ERBs).
  • ERBs are a way of mathematically modelling the critical bands as rectangular band-pass filters. In some embodiments ERBs are used, but it will be appreciated that critical bands could be used alternatively.
  • the purpose of the proposed signal processing is to suppress or subtract unnecessary power without degrading sound quality.
  • the frequency bins of the unnecessary power are masked ones (inaudible).
  • otherwise, the total noise power in a critical band is increased and evokes frequency masking that masks close frequency bins of interest (e.g. formants in speech) beyond the critical bandwidth.
  • This speech enhancement processing works so that sounds masked by dominant tonal and noise peaks, below the global masking threshold LTg of the individual (such as one with hearing loss), are eliminated.
  • the critical bandwidth parameter is obtained initially. 'Initially' may be the initial fitting stage or every recalibration of a device, e.g. a hearing aid or an audio prosthesis. Exemplary methods of obtaining the critical bandwidth parameter (h) are explained with reference to figures 21a, 21b and figures 22, 23. The method disclosed with reference to figures 21a and 21b does not require ERB or CB data; instead, the parameter is determined during the fitting process by using the slider or other tool or interface to select or indicate a specific h value. In figures 22 and 23 the critical band ratio (CBR) in the information window indicates the specific h value for a user to use in the fitting critical bandwidth and sub-band stages.
  • Figures 21a-23 describe methods of determining and using the h value to produce a customized model for each user, i.e. a hearing impaired user.
  • 2. Finding of tonal and non-tonal components: tonal and non-tonal components, as well as the total power in each critical bandwidth, are found.
  • Global masking thresholds are found by summing the powers corresponding to individual masking thresholds and threshold in quiet (absolute hearing threshold).
  • Spectral subtraction: frequency bins lower than the global masking threshold at the corresponding sub-band are subtracted.
  • the spectral subtraction can be implemented using any of the spectral subtraction methods described below. For example one exemplary method is described in section 4.3.2.5. Alternatively the spectral subtraction can be implemented using the steps as disclosed in section 5, as described herein.
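  • A minimal sketch of the spectral subtraction step, assuming per-bin levels and a per-bin global masking threshold are already available (names are illustrative, not from the specification):

    import numpy as np

    def spectral_subtract(spec, spl_db, lt_g_db):
        # spec: complex FFT bins of one frame; spl_db and lt_g_db: per-bin
        # sound pressure level and global masking threshold in dB.
        audible = spl_db >= lt_g_db
        return np.where(audible, spec, 0.0)   # masked bins are subtracted

  • In an overlap-add framework the modified frames would then be inverse-transformed and summed, as in the OLA IFFT stage 232 described below.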
  • an audio processing system 200 is shown using a customised critical bandwidth parameter (h).
  • the audio system broadly consists of an audio input block 202, audio output block 204, spectral analysis block 206, audio synthesis block 208, a psychoacoustic processing block 210, and a simplified fitting of critical bandwidth block 212.
  • the audio input block 202 may comprise an input buffer 214 for digital audio data.
  • the audio input block 202 may also comprise a microphone, or other sound recording device (not shown), and an analogue to digital element for converting analogue sound waves to a digital representation for further processing (not shown).
  • the audio output block 204 may similarly comprise an output buffer 216 for digital audio data being outputted by the system 200.
  • the audio output block 204 may also comprise a speaker, or other sound emitting device (not shown), and a digital to analogue converter for converting the digital representation of the audio into analogue sound waves (not shown).
  • the spectral analysis block 206 may comprise an Overlap Add (OLA) Fast Fourier Transform (FFT) step 218 or any other audio spectral analysis techniques known in the art.
  • the psychoacoustic processing block 210 comprises multiple steps. Some of these steps may be performed concurrently, in parallel or sequentially depending on the implementation.
  • the psychoacoustic processing block may receive or retrieve a critical bandwidth parameter (h) at the first Fitting Critical Bandwidth (CB) stage 220.
  • the critical bandwidth parameter is received or retrieved from the Simplified fitting of critical bandwidth block 212.
  • the critical band rate (measured in Bark) is calculated using the critical bandwidth parameter.
  • the critical band rate may be calculated as a function of the critical bandwidth parameter.
  • the function may be in the form of a linear model equation. Alternatively the critical bandwidth parameter may represent the critical band rate exactly.
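  • One plausible reading of such a linear model, sketched under the assumption that h simply scales the Glasberg and Moore (1990) normal-hearing ERB (the specification does not mandate this exact form):

    import numpy as np

    def erb_hz(f_hz, h=1.0):
        # Listener ERB modelled as h times the normal-hearing ERB at f_hz.
        return h * 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

    def erb_rate(f_hz):
        # Normal-hearing ERB-rate (ERB-number) scale of Glasberg and Moore.
        return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)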
  • the Fitting CB stage 220 also calculates the masking function (or spreading function) 221.
  • the sub-band synthesis step 222 re-aligns the new sub-bands to reduce the subsequent computing load, as a trade-off against frequency resolution.
  • the Fitting CB 220 and Sub-band synthesis 222 blocks may be replaced with the re-alignment of the ERBi by frequency selectivity (h) block 220'.
  • This in effect replaces step 1 detailed immediately above: using the frequency selectivity (h), which is described in detail below in section 4.3.1.1, a new form of the ERBi is computed with the ERB-rate (Equation 1).
  • the finding tonal and non-tonal components step 224 comprises finding tonal and non-tonal components as well as the total power in each critical bandwidth.
  • the compute individual masking thresholds step 226 comprises calculating individual masking thresholds by summing the sound pressure level of the masking component, masking index, and masking function (or spreading function) 221 at each corresponding critical band rate. This is calculated for both tonal and non-tonal components.
  • the compute global masking thresholds step 228 comprises calculating the global masking threshold by summing the powers corresponding to the individual masking thresholds and absolute hearing threshold.
  • the final step for the psychoacoustic processing block 210 is the spectral subtraction step 210.
  • the spectral subtraction step 210 takes frequency bins lower than the global masking threshold (calculated in the previous step) at corresponding sub bands and subtracts them.
  • Spectral subtraction is described in further detail in section 4.3.2.5 of this specification with examples.
  • Spectral subtraction is also described in section 5 of this specification in more detail.
  • the spectral subtraction in section 5 is similar to that described in section 4.3.2.5.
  • the audio synthesis stage receives both the output of the spectral analysis block 206 and the output of the psychoacoustic processing block 210 and conducts an Overlap add (OLA) Inverse Fast Fourier Transform (IFFT) 232.
  • OLA: Overlap add
  • IFFT: Inverse Fast Fourier Transform
  • an active noise control (Filtered-X LMS) step 234 may be used to further process the audio.
  • Figures 2a and 2b describe a system using a critical bandwidth parameter (h) to calculate the critical band rate of a user.
  • h: critical bandwidth parameter
  • different and/or more psychoacoustic model parameters could be determined, such as those found using the "Full Psychoacoustic Model" method described below. These psychoacoustic model parameters may replace or supplement the different stages within the psychoacoustic processing stage 210.
  • the purpose of the signal processing of the audio signal based on the customised psychoacoustic model is to remove "irrelevant" signal information that interferes with dominant sound components of interest, without degrading sound quality.
  • customisation of the psychoacoustic model individualizes the auditory filters, which automatically or individually customize the masking index and spreading function to calculate or generate a customised global masking threshold in the model.
  • the audio processing of the system and method disclosed is configured such that sounds masked by tone and noise peaks are eliminated for the individual's particular hearing profile, such as taking into account any hearing loss. Eliminating these inaudible peaks reduces total energy in the critical bandwidth, which benefits the loudness of soft sounds of interest for hearing loss. Speech includes formants. Music includes various harmonic/non-harmonic peaks as well as vocals. The signal processing in the audio processor of this system and method therefore evokes less unnecessary excitation and improves audibility of simultaneous sounds, which is distinguished from intensity and pitch.
  • Figures 6a-6d show the processing effects generated by the audio processor based on the customised psychoacoustic model.
  • Total band power in a critical band is increased in hearing loss because the spreading function of hearing loss is wider. If unnecessary sounds were not eliminated, these sounds would vibrate the basilar membrane on arrival at the peripheral stage and spread unnecessary excitation patterns, leading to greater difficulty in picking out the contrast between peaks and dips (see left graph in Figure 6d).
  • the processing of the customised audio processor of the system and method filters peaks and non-peaks below the global masking threshold (background components) from input sounds so that unnecessary excitation patterns do not arise. Also, the excitation patterns evoked by the filtered sound have a good signal-to-noise ratio.
  • a full and complete psychoacoustic model of a user is taken.
  • an audio processor can be configured or modified based on the specifics of that user's model.
  • the required psychoacoustics assessments and calculations of absolute threshold, equivalent rectangular bandwidth, and masking index need to be conducted. These assessments can be used to define the user's individual psychoacoustic model or alternatively be considered as control parameters or input variables for modifying or adjusting a default psychoacoustic model to be a customised psychoacoustic model.
  • the "PSYCHOACOUSTIC" program may be used for the assessments.
  • PSYCHOACOUSTICS is a MATLAB toolbox implementing three classic adaptive procedures for auditory threshold estimation. The first includes those of the staircase family (method of limits, simple up-down and transformed up- down); the second is the PEST; and the third is the Maximum Likelihood Procedure. It will be appreciated that one or more other software or assessment systems could alternatively be used to assess the parameters of a full psychoacoustics model of the end listener.
  • the Staircase procedure is used in the assessments of the end user. Three procedures can be distinguished within this Staircase procedure category: the method of limits, the simple-up down and the transformed up- down.
  • the detection threshold is the minimum detectable stimulus level in the absence of any other stimuli of the same sort. In other words, the detection threshold marks the beginning of the sensation of a given stimulus.
  • the discrimination threshold is the minimum detectable difference between two stimuli levels. Therefore, for a given sensory continuum, the discrimination threshold cuts the sensory continuum into the steps into which it is divided.
  • the detection threshold can be estimated either via yes/no tasks or via multiple- alternative forced choice tasks (in brief, nAFC, with n being the number of alternatives).
  • the discrimination threshold on the contrary, must be estimated exclusively via multiple nAFC tasks.
  • yes/no tasks the subject is presented with a succession of different stimulus levels (spanning from below to above the subject's detection threshold) and is asked to report whether he or she has detected the stimulus (yes) or not (no).
  • nAFC task the subject is presented with a series of n stimuli differing in level.
  • the tasks are often multiple-interval tasks (i.e., mI-nAFC).
  • one stimulus changes its level across the trials, whereas the level of the others (the standards) is fixed.
  • the difference between standard and variable ranges from below to above the subject's detection (or discrimination) threshold. After each trial, the subject is asked to report which was the variable stimulus.
  • absolute threshold is measured with a pure tone and follows steps as below.
  • auditory filter assessment (ERB, symmetric, level independent) using notched noise method is based on the simplified method, in which the filter shape is assumed to be symmetric.
  • the method includes techniques for the determination of the masker level and for the estimation of the auditory filter shape from one masked threshold.
  • the ascending method with yes/no task is used to detect the signal threshold level.
  • Several center frequencies (250, 500, 1000, 2000, 4000 and 6000 Hz) and two loudness levels ("soft" and 40 dB) are tested.
  • Probe tone is preferably 240 ms in length (40 ms rise/fall).
  • Masker is white noise, ranging from 0 Hz to the Nyquist frequency. A spectral notch is applied to the white noise.
  • the 'Simplified procedure' consists of both a masker level determination technique and an auditory filter shape estimation technique.
  • the masker level determination technique can determine the dynamic range of the auditory filter, and the auditory filter shape estimation technique uses only one measurement point corresponding to one masked threshold.
  • Noise masking tone: masked threshold of a tone by uniform exciting noise.
  • Tone masking noise: masked threshold of burst noise (critical band wide) by a fixed level of a tone at the centre frequency.
  • the MPEG psychoacoustic model was modified to use ERBs instead of critical bands.
  • a conversion engine is configured to convert the measured psychoacoustic model data into ERB related values. It will be appreciated by those skilled in the art that other psychoacoustic model software can be modified similarly or that the MPEG model could remain unmodified and not use ERBs instead of critical bands, but still otherwise be customised based on the individuals measured psychoacoustic data.
  • the conversion engine is a Matlab calculation program of ERB including p and r, ERB-rate and spreading function.
  • the conversion engine may be implemented as a software module in a processor, e.g. a processor of a hearing aid, or the conversion engine may be a hardware module or a firmware module in a hearing aid or other similar device or an audio prosthesis.
  • the conversion engine is configured to do the following:
  • ERB bandwidths
  • the "quick fitting method” is configured to assess or generate a single control parameter representing the user's hearing profile and which can be used to customise the psychoacoustic model to the user.
  • the system is configured to convert or extrapolate the single hearing characteristic or control parameter (referred to as an h value) of a user to generate the user's individual psychoacoustic model.
  • This method is advantageous because it does not require determination of ERB and provides a simpler fitting method that is customised for each user. It will be clear from the following equations and steps that being able to assess a user's particular h value will enable various aspects of a psychoacoustic model to be customised for that particular user.
  • Sections 4.3.1 and 4.3.2 provide detailed examples and explanations for modelling of psychoacoustics fitting. More specifically, sections 4.3.1.1 - 4.3.1.4 are presented below as additional examples and explanations to supplement the quick fitting methods described later within this specification. Further, sections 4.3.2.1 - 4.3.2.5 provide additional details regarding speech enhancement and how it can be achieved following the quick fitting method, with additional examples and explanations that supplement the quick fitting methods described herein.
  • 4.3.1 Modelling of Psychoacoustics fitting
  • 4.3.1.1 Frequency Selectivity
  • CB refers to critical bandwidth and x represents distance on the basilar membrane measured in critical bands as the unit of distance.
  • x is called the critical band rate (Bark); a is often defined as lambda.
  • Figure 7a shows critical bandwidth (ERB) and Figure 7b critical band rate (ERB rate) according to Glasberg and Moore (1990), Greenwood (1961) and Zwicker (1961), which is currently provided in ISO 11172-3.
  • the ERB determined by each of the known approaches is shown as its own distinct line and labelled accordingly in the figures.
  • the Glasberg and Moore ERB is shown as line 702
  • the Greenwood method ERB is shown by line 704
  • the Zwicker method ERB is shown by line 706.
  • ERB are not necessarily required in the fitting method according to the present invention. However it should be understood that the processing techniques described in this section may be used as part of the processing method according to the present invention.
  • the ERB rate line calculated by the Glasberg and Moore approach is represented by line 712.
  • the ERB rate determined by the Greenwood approach is shown by line 714 and the ERB rate as per the Zwicker approach is shown by line 716.
  • Figure 8a shows ERB plotted vs Frequency.
  • Figure 8b shows ERB plotted against ERB-rate (i.e. bark).
  • the darkest line, i.e. the black line 802, on both figures 8a and 8b represents the normal hearing (young) model.
  • Frequency selectivity Rb for individuals can be described as division of the individual ERB (ERB') by average ERB of normal hearing (ERB.) at each frequency:
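  • the equation itself is omitted here; from the definition just given it presumably takes the form below, with ERB'(f) the individual's ERB and \overline{ERB}(f) the normal-hearing average:

    R_b(f) = \frac{ERB'(f)}{\overline{ERB}(f)}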
  • Figure 8c shows a plot of ERB in Hz vs ERB rate for an h parameter between 1.10 and 0.9 with intervals of 0.1.
  • Masking index is denoted with "tone-masking-noise" (Schroeder, 1979) and "noise-masking-tone" (Zwicker and Fastl, 2006). In this test, the inventors fixed the parameter to the normal-hearing value defined in ISO 11172-3, as no hearing-loss data was available.
  • p is a parameter which determines both the bandwidth and the slope of the skirts of the symmetrical auditory filter.
  • the equivalent rectangular bandwidth (ERB) is equal to 4f0/p.
  • Figure 10 shows the ratio of spreading functions.
  • Rb (Nakaichi and Sakamoto, 2007) was suggested as the division of the ERB of an individual participant by the average ERB of normal hearing at each frequency: Rb = ERB' / ERBn.
  • Rb is applied as a reciprocal scaling factor to the masking function vf.
  • Figure 11a shows a model of Rb (i.e. the degradation of frequency selectivity).
  • Figure 11a is a plot of a model of Rb vs frequency for different h values. As can be seen from the model, the degradation value changes as the frequency increases for different h values.
  • Figure 11b shows test results of Rb for various frequencies and for various h values of users; h values of 1.15 and 1.20 have been omitted because only a single sample was obtained.
  • the modification of the psychoacoustic model individualizes the auditory filters, automatically customizing the spreading function to calculate an appropriate global masking threshold for each user.
  • the speech enhancement eliminates frequency bins that are inaudible and also reduces the total energy within each critical bandwidth, which benefits the audibility of hearing-impaired listeners in detecting sounds of interest without masking, such as formants and consonants in speech in noise, and likewise vocals among the various harmonic/non-harmonic peaks of background musical instruments in music. The results indicate this speech enhancement provides benefits on loudness.
  • This speech enhancement is suitable not only for hearing aid DSP but can also operate independently of the hearing aid DSP as a front end (pre-processing), since it increases the SNR around peaks before they are compressed by the recruitment of hearing loss. Furthermore, appropriate volume control (e.g. equalizing RMS, as sketched below) would increase speech intelligibility or audibility of the peaks thanks to this SNR improvement.
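  • As a minimal sketch of such an RMS-equalizing volume control (names illustrative, not the specification's actual implementation): after spectral processing removes masked energy, a single gain can restore the processed frame's RMS to that of the input so the SNR gain around peaks is retained without a drop in overall level:

      #include <math.h>
      #include <stddef.h>

      static double frame_rms(const float *x, size_t n)
      {
          double acc = 0.0;
          for (size_t i = 0; i < n; i++)
              acc += (double)x[i] * x[i];
          return n ? sqrt(acc / (double)n) : 0.0;
      }

      /* Scale proc[] in place so that its RMS matches that of orig[]. */
      static void equalize_rms(const float *orig, float *proc, size_t n)
      {
          double r_in  = frame_rms(orig, n);
          double r_out = frame_rms(proc, n);
          if (r_out > 0.0) {
              float g = (float)(r_in / r_out);
              for (size_t i = 0; i < n; i++)
                  proc[i] *= g;
          }
      }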
  • Figures 12a and 12b show the modelling of ERB curves plotted over a range of frequencies.
  • Figure 12a shows the ERB across frequencies.
  • Figure 12b shows the E(bark) scale across frequencies.
  • the dotted line in figures 12a and 12b shows the ERB curve calculated by the Glasberg and Moore (1990) method.
  • the solid line represents the model of a user according to the present invention.
  • the model is derived for a h value of 1.2.
  • Figure 12b shows the same data as figure 12a plotted on E (bark) scale.
  • the model according to the present invention provides a model of ERB at least as accurate as the Glasberg and Moore model.
  • the present invention is advantageous since ERB does not need to be calculated as part of the fitting method, hence providing a faster fitting method.
  • Psychoacoustic models I and II in ISO 11172-3 are also shown for comparison.
  • the power of the spectral lines is summed to form the sound pressure level of the new non-tonal component corresponding to that critical band.
  • The index number k of the spectral line nearest to the geometric mean of the critical band and the sound pressure level Xnm(k) in dB are listed.
  • LTtm and LTnm are the individual masking thresholds at critical band rate z(i) in Bark of the masking component at the critical band rate z(j) in Bark.
  • Xtm[z(j)] is the sound pressure level of the masking component with the index number j at the corresponding critical band rate z(j).
  • av is called the masking index and vf the masking function of the masking component Xtm[z(j)].
  • the masking function vf is also called the spreading function.
  • the masking index av is different for tonal and non-tonal maskers (avtm and avnm).
  • Global masking threshold is found by summing the powers corresponding to the individual masking thresholds and the threshold in quiet (absolute hearing threshold).
  • the global masking threshold LTg(i) at the i'th frequency sample is derived from the upper and lower slopes of the individual masking threshold of each of the j tonal and non-tonal maskers, and in addition from the threshold in quiet LTq(i); a sketch follows.
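  • The masking arithmetic above can be sketched as below, using the published ISO 11172-3 psychoacoustic model 1 formulas for the masking indices av and the masking (spreading) function vf; under the present invention these would additionally be customised according to the listener's h value (or Rb). This is an illustrative sketch only, not the specification's verbatim code:

      #include <math.h>

      /* Masking indices av (dB) at masker critical band rate z (Bark),
       * per ISO 11172-3 psychoacoustic model 1. */
      static double av_tonal(double z)    { return -1.525 - 0.275 * z - 4.5; }
      static double av_nontonal(double z) { return -1.525 - 0.175 * z - 0.5; }

      /* Masking (spreading) function vf in dB for dz = z(i) - z(j) and masker
       * level x = X[z(j)] in dB; zero power outside the -3..+8 Bark support. */
      static double vf(double dz, double x)
      {
          if (dz >= -3.0 && dz < -1.0) return 17.0 * (dz + 1.0) - (0.4 * x + 6.0);
          if (dz >= -1.0 && dz <  0.0) return (0.4 * x + 6.0) * dz;
          if (dz >=  0.0 && dz <  1.0) return -17.0 * dz;
          if (dz >=  1.0 && dz <  8.0) return -(dz - 1.0) * (17.0 - 0.15 * x) - 17.0;
          return -INFINITY;
      }

      /* Individual masking threshold LT at z_i due to a masker of level x at z_j. */
      static double lt_individual(double x, double z_j, double z_i, int tonal)
      {
          return x + (tonal ? av_tonal(z_j) : av_nontonal(z_j)) + vf(z_i - z_j, x);
      }

      /* Global masking threshold at one frequency sample: power sum of the
       * threshold in quiet ltq and the n individual masking thresholds lt[]. */
      static double lt_global(double ltq, const double *lt, int n)
      {
          double p = pow(10.0, ltq / 10.0);
          for (int k = 0; k < n; k++)
              p += pow(10.0, lt[k] / 10.0);
          return 10.0 * log10(p);
      }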
  • Figures 14a and 14b show a sample processing result with a 16-bit input of a continuous complex sound composed of five tones and white noise.
  • the tone frequencies are 440, 880, 1320, 1760 and 2200 Hz and the sampling frequency is 44,100Hz.
  • Glasberg and Moore 1990 is used for the psychoacoustics model.
  • FFT shift size 512
  • Figures 14a and 14b show the original signal as a dashed line and the processed signal as a solid line.
  • Figure 14a shows a result with an h value of 1.0.
  • Figure 14b shows a result with an h value of 1.2.
  • the original signal is shown by line 1402 (i.e. the dash dot line).
  • Line 1406 denotes the global masking threshold (i.e. the sum of the individual masking thresholds).
  • Figures 15a and 15b show the 1/3 octave analysis of an audio frame.
  • the band between 1st and 2nd tones and other 'valley' bands are suppressed while bands including tones are not changed.
  • the dark bars show the original sound signal and the white bars (i.e. light coloured bars) show the processed audio frame with an h value of 1.2.
  • Figure 16 shows spectrograms of 'ba' (S-67 Table No. 1, crop from CD2 Track 209, 1.15.800-1.16.200) used in this testing.
  • the upper left quadrant 1602 shows original speech.
  • the upper right quadrant 1604 shows processed speech with an h value of 1.2.
  • the lower left quadrant 1606 shows original speech with noise.
  • the lower right quadrant 1608 shows the processed speech with noise, having an h value of 1.2.
  • the spectral subtraction helps to enhance the speech signal over the noise signal.
  • Figures 17a and 17b show spectrograms of music used in this testing (No.47, RWC-MDB-P-2001, RWC Music Database, Goto et al., 2002).
  • Figure 17a shows the original speech with noise as a signal.
  • Figure 17b shows the result of processing using the improved processing method described herein, i.e. processed speech with noise with an h value of 1.2.
  • Noise herein means signals that are not the signal of interest e.g. speech signals.
  • Figures 16, 17a and 17b show an improvement of the listenability of the speech signal.
  • Praat version 6.0.21 (www.praat.org) was used to create Figures 16, 17a and 17b.
  • CB refers to critical bandwidth
  • x represents distance on the basilar membrane measured in critical bands as the unit of distance.
  • the x is also called critical band rate (Bark).
  • if an h value is defined as a proportional parameter which represents the variation of individual hearing, the individual critical bandwidth CB' can be denoted as CB' = h · CB.
  • for h greater than 1, the critical band rate (x) at a given frequency would be smaller, indicating one critical band corresponding to more than 1 mm on the basilar membrane, and the total number of critical bands would be reduced.
  • x can be calculated at a given frequency as x' = x / h.
  • an individual's critical bandwidth, CB', can be calculated by finding an appropriate h value, as sketched below.
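  • A minimal sketch of this calculation follows, with CB' = h·CB and x' = x/h as above; it assumes the Zwicker and Terhardt (1980) approximations of the average critical bandwidth and critical band rate as stand-ins for the fitted curves shown in the figures:

      #include <math.h>
      #include <stdio.h>

      /* Average critical bandwidth in Hz (Zwicker and Terhardt, 1980). */
      static double cb_normal_hz(double f)
      {
          double k = f / 1000.0;
          return 25.0 + 75.0 * pow(1.0 + 1.4 * k * k, 0.69);
      }

      /* Average critical band rate in Bark (Zwicker and Terhardt, 1980). */
      static double bark_normal(double f)
      {
          return 13.0 * atan(0.00076 * f) + 3.5 * atan((f / 7500.0) * (f / 7500.0));
      }

      int main(void)
      {
          const double h = 1.2;                    /* example individual h value */
          for (double f = 250.0; f <= 4000.0; f *= 2.0)
              printf("%6.0f Hz: CB' = %6.1f Hz, x' = %5.2f Bark\n",
                     f, h * cb_normal_hz(f), bark_normal(f) / h);
          return 0;
      }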
  • Figure 18a plots the bandwidths against frequency and Figure 18b plots the bandwidths against bark.
  • Masking index is denoted with “tone-masking-noise” and “noise-masking-tone”.
  • i is the index of the spectral line at which the masking function is calculated and j that of the masker.
  • the masking function, which is the same for tonal and non-tonal maskers, is given (per ISO 11172-3, with dz = z(i) - z(j)) by: vf = 17·(dz + 1) - (0.4·X[z(j)] + 6) dB for -3 ≤ dz < -1; vf = (0.4·X[z(j)] + 6)·dz dB for -1 ≤ dz < 0; vf = -17·dz dB for 0 ≤ dz < 1; and vf = -(dz - 1)·(17 - 0.15·X[z(j)]) - 17 dB for 1 ≤ dz < 8.
  • X[z(j)] is the sound pressure level of the j'th masking component in dB.
  • roex(p) filter shape: the simplest expression of the auditory filter is the so-called roex(p) filter shape, where p is a parameter which determines both the bandwidth and the slope of the skirts of the auditory filter. The higher the value of p, the more sharply tuned the filter.
  • the equivalent rectangular bandwidth (ERB) is equal to 4f0/p (f0: center frequency of the filter).
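  • As an illustrative sketch, the roex(p) filter weight W(g) = (1 + p·g)·e^(-p·g), with g = |f - f0|/f0, can be evaluated from a (possibly h-scaled) ERB via p = 4f0/ERB, so that a wider ERB yields a smaller p and shallower skirts:

      #include <math.h>

      /* roex(p) intensity weighting W(g) = (1 + p*g) * exp(-p*g),
       * g = |f - f0| / f0, with p derived from ERB = 4*f0/p. */
      static double roex_weight(double f, double f0, double erb)
      {
          double p = 4.0 * f0 / erb;
          double g = fabs(f - f0) / f0;
          return (1.0 + p * g) * exp(-p * g);
      }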
  • Various embodiments of the GUI assessment tools can be seen in Figures 21a-23.
  • These example quick fitting embodiments use sets of steps and applications for determining the single hearing characteristic or single configuration parameter (h value) of an individual.
  • the h value may represent a number of different things depending on the example provided.
  • the h value may directly or indirectly represent an individual listener's hearing.
  • the h value may, alternatively or additionally, be used as an input to a function to represent an individual's hearing.
  • the h value may, alternatively or additionally, be used as, or be, an index to a group of pre-calculated/known hearing characteristics appropriate for different hearing impairments.
  • the computer programs and interfaces described below are implemented on a general purpose desktop computer.
  • the steps above may be carried out by a user under the direction of a clinician or by following a manual. Alternatively, the adjustment steps may be automated, requiring only that the user input their responses to each adjusted sound.
  • these programs could be implemented on any computing device comprising an interface, not just the general purpose desktop computer shown in the examples.
  • the GUI assessment tool could be implemented in any form of application program and may be accessible via any suitable platform.
  • the application program could execute on a general purpose computer, or a website application program, or a smart phone application program for example.
  • adjustable user interface elements of the GUI shown in the following examples are illustrative only and may be any other adjustable graphical user interface elements suitable for adjusting parameters including but not limited to: toggle switches, drop down menus, check boxes, radio buttons, numerical inputs, slider scales, dials or the like.
  • 10 consonant-noun-consonant (CNC) stimuli were mixed with white noise at optional SNR values (e.g. -5, 0 and 5 dB SNR), with or without processing using seven h values.
  • a computer program assessment tool comprising a GUI of multiple panes and views 1900 and 1950 is configured to assist in this testing.
  • An example embodiment of the psychoacoustic GUI assessment tool can be seen in Figures 21a and 21b.
  • speech scores across 10 CNC words in white noise for each h value, at an SNR value where the participant achieves a preference score (from 1: poor to 5: excellent), were compared (~2 minutes testing time).
  • the noise component could be enabled/disabled using a toggle switch 1902 on the GUI 1900.
  • the processing can be enabled/disabled using a toggle switch 1904 and the volume could be adjusted using a slider 1906, both present on the GUI 1900.
  • using the GUI 1900, the user or the clinician conducting the test is able to manually adjust the SNR and h value for processing of selected speech-in-noise files, using a wheel 1908 and slider 1910 respectively.
  • the fitting GUI pane 1950 of the computer program was used to approximate individual CB or h.
  • the fitting GUI pane 1950 allows the user to play back randomly selected pre- processed CNC words in white-noise at a selected SNR.
  • a folder 1952 containing 10 CNC words in white noise corresponding to the participant's ~50% score SNR value was selected on the GUI 1950.
  • the 10 stimuli were processed either with the four different h values, or had no processing (50 stimuli in total).
  • the unprocessed stimuli were used to see if h value processing produced significantly different scores to the original sound file. Phonemes and words correct were scored and the h value yielding the highest overall score was noted.
4.3.8 Second Quick Fitting Example Embodiment
  • Figure 22 shows another GUI interface 2000 for a second method of quick fitting.
  • Different listening data may be loaded into this interface by using the "Load Data” button 2002.
  • the location of the current listening data presented in GUI 2000 is shown in text box 2004.
  • Metadata and other information about the data file is presented under the title "Information" 2006. This metadata includes the date the test data was recorded, basic information about the listener, and details on the processing and the audio frequencies being tested.
  • Interface 2000 comprises a graph display 2052 for showing an individual's listening characteristics at particular frequency points 2054.
  • the user's listening capabilities were tested at the following frequencies: 250, 500, 1000, 2000, and 4000 Hz.
  • the listening assessments were performed using standard listening tests at particular frequencies. It will be appreciated that any number of frequencies and other ranges or values of frequencies may be chosen. It will be appreciated also that assessing more frequencies will give more information to assist the clinician in their assessment, but will take longer overall.
  • dashed curve fitting line 2056 is generated using standard curve fitting techniques.
  • the model line 2058 represents a standard user's listening ability using a standard, un-customised psychoacoustic model.
  • the "fit" line 2060 is also shown on the graph. This fit line 2060 is discussed further below.
  • This interface 2000 has two primary purposes: first, to show a user or clinician the hearing characteristics of an individual; and second, to fit an appropriate critical bandwidth ratio such that the fit line 2060 matches the user's listening ability as closely as possible. Using a critical bandwidth ratio of 1.0 will apply no modification to an undamaged user's hearing characteristics, and as such the fit line 2060 will match the model line 2058. For users with damaged hearing, such as the user presented in the graph display 2052 with listening points 2054, a critical bandwidth ratio (or h value) greater than 1.0 must be used to match the fit line 2060 as closely as possible to the curve fitting line 2056. Slider 2008 can be moved to modify the critical bandwidth ratio. Interface 2000 has a critical bandwidth ratio of 1.5 selected, as shown in information box 2006 under the value "CBR".
  • a clinician or a user may export the data using the "Export Results" button 2010.
  • the exported data contains the critical bandwidth ratio, or h value, which is then used to generate or modify a custom psychoacoustic model.
  • FIG. 23 shows a GUI interface 2100.
  • This interface 2100 has similar features to the second quick fitting example embodiment. These similarities include the "Load Data” button 2102, the file location text box 2104, the information section 2106, the slider 2108 and the "Export Results” button 2110. These GUI components all perform similar functions as in the second quick fitting example embodiment.
  • This embodiment uses frequency selectivity (Rb) to determine h.
  • An individual's frequency selectivity is the ratio between the ERB of that individual (called ERBi) and the ERB of a person of normal hearing (called ERBn). It can be calculated using the following equation: Rb(fc, x) = ERBi(fc, x) / ERBn(fc, x), where fc denotes centre frequency and x denotes sensation level.
  • ATH absolute thresholds of hearing
  • ATH and ERB were measured. In the current example, 5 frequencies were assessed: 250, 500, 1000, 2000, and 4000 Hz.
  • the listening assessments were performed using standard listening tests that are used to assess a user's ATH and ERB at particular frequencies. It will be appreciated that any number of frequencies and other ranges or values of frequencies may be chosen. It will be appreciated also that assessing more frequencies will give more information to assist the clinician in their assessment, but will take longer overall.
  • the clinician loads the data obtained from the standard tests using the "Load Data" button 2102 and the file location text box 2104.
  • the graph display 2152 shows the Rb data points 2154 of the user under test, represented as circles. These data points 2154 are calculated using the equation for Rb above, inputting a normal user's ERB for the given frequencies and the ERB results of the user under test.
  • the clinician then adjusts the Critical Bandwidth Ratio slider 2108 so that the critical bandwidth ratio line 2160 is close to the average Rb. It will be appreciated that this step may be automated and/or replaced by a mathematical algorithm to auto-fit a value to a data set, as sketched below.
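  • A minimal sketch of such an auto-fit follows; for a flat critical bandwidth ratio line, the least-squares choice of h over the measured Rb points (Rb = ERBi/ERBn) reduces to their mean. The actual tool may weight frequencies differently; the function name is illustrative:

      /* Least-squares fit of a flat critical bandwidth ratio line to the
       * measured Rb points: the h minimising sum((rb[k] - h)^2) is their mean. */
      static double fit_h_to_rb(const double *rb, int n)
      {
          double sum = 0.0;
          for (int k = 0; k < n; k++)
              sum += rb[k];
          return n > 0 ? sum / (double)n : 1.0;   /* default to normal hearing */
      }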
  • the quick fitting methods described herein can be used by clinicians to fit a hearing aid to a hearing impaired person. Similar fitting methods may also be used for fitting cochlear implants.
  • the fitting methods described herein are advantageous because they provide a faster fitting method and provide a customised profile i.e. customised model for each user.
  • the additional sections 4.3.1.1 - 4.3.1.4 and 4.3.2.1 - 4.3.2.5 provide additional examples and explanation to supplement the herein described fitting methods.
  • a custom audio processor based on the individual's hearing can be generated or modified.
  • the approximation of the individual's hearing may come in the form of an h value or psychoacoustic model.
  • an audio processor capable of receiving an individual's h value, psychoacoustic model, or other data indicative of a user's hearing at run time may also be used.
  • any encoder used could be modified to receive data indicative of an individual's psychoacoustic model at run time as a configuration option or using any other method of receiving or retrieving data when in use.
  • the following examples refer to the psychoacoustic model of ISO 11172, which is used in the MPEG-Audio standard.
  • the TwoLAME implementation of the MPEG-Audio standard is used.
  • Other psychoacoustic models, encoding standards, and implementations thereof are known in the art.
  • a person skilled in the art will appreciate that other psychoacoustic models and encoding standards or implementations may also be modified in a similar way to incorporate a user's individual hearing characteristics.
  • Figure 3b shows a modified MPEG-Audio encoder 350 using a custom psychoacoustic model 354.
  • the customisation is based on psychoacoustic assessments 360.
  • the customised psychoacoustic model 354 is modified to use filtering techniques for the individual parameters.
  • This MPEG-Audio encoder 350 takes digital audio on the input 362 and outputs an encoded bitstream 364.
  • This encoded bitstream could be stored as a lossy digital audio file, such as an MP3 or MP2 file, on a computer readable medium.
  • the encoded bitstream 364 is not limited to just storing the audio and could be used in any audio processing system, including a hearing aid audio processing chain.
  • the audio encoder "TwoLAME" has been modified at compile time to be customised to the individual's hearing.
  • TooLAME is a free software MPEG-1 Layer II (MP2) audio encoder written primarily by Mike Cheng, and is an exemplary audio encoder. While there are innumerable MP2 encoders, TooLAME is well-known and widely used for its particularly high audio quality. It has been unmaintained since 2003, but is directly succeeded by the TwoLAME code fork (the latest version, TwoLAME 0.3.13, was released January 21, 2011).
5.1 Full Psychoacoustic Method
  • the TwoLAME audio encoder is modified to support individual psychoacoustic characteristics.
  • Psychoacoustic model 1 describes "step 3: decimation and reorganization of maskers". The number of maskers is reduced using two criteria (sketched below): 1. Any tonal or noise maskers below the absolute threshold are discarded.
  • 2. A sliding 0.5-Bark-wide window is used to replace any pair of maskers occurring within a distance of 0.5 Bark by the stronger of the two.
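  • The two decimation criteria can be sketched as below; the data structures are illustrative rather than TwoLAME's actual ones, and maskers are assumed sorted by critical band rate:

      #include <stddef.h>

      typedef struct { double z_bark; double spl_db; int alive; } masker_t;

      /* m[] is assumed sorted by ascending critical band rate z_bark. */
      static void decimate_maskers(masker_t *m, size_t n, const double *ath_db)
      {
          /* criterion 1: discard maskers below the absolute threshold */
          for (size_t i = 0; i < n; i++)
              if (m[i].spl_db < ath_db[i])
                  m[i].alive = 0;

          /* criterion 2: keep only the stronger of any pair within 0.5 Bark */
          for (size_t i = 0; i + 1 < n; i++) {
              if (!m[i].alive)
                  continue;
              for (size_t j = i + 1; j < n; j++) {
                  if (!m[j].alive)
                      continue;
                  if (m[j].z_bark - m[i].z_bark >= 0.5)
                      break;                       /* sorted, so no closer pairs */
                  if (m[j].spl_db > m[i].spl_db) { m[i].alive = 0; break; }
                  m[j].alive = 0;
              }
          }
      }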
  • PSYCHOACOUSTIC DATA - Common.h was modified to import individual psychoacoustics data. This header file should be updated before compiling as an individual application.
  • ATH - ath_dB() in ath.c is modified to support individual hearing loss.
  • the ATH is calculated using the coefficients of a polynomial expression estimated over 125-16000 Hz.
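  • As an illustrative stand-in for such a polynomial (the actual TwoLAME coefficients are not reproduced here), Terhardt's well-known ATH approximation gives the general shape that an individual fit would then shift or reshape:

      #include <math.h>

      /* Terhardt's approximation of the absolute threshold of hearing (dB SPL);
       * k is frequency in kHz, valid roughly over 0.02-16 kHz. */
      static double ath_db_approx(double f_hz)
      {
          double k = f_hz / 1000.0;
          return 3.64 * pow(k, -0.8)
               - 6.5 * exp(-0.6 * (k - 3.3) * (k - 3.3))
               + 1e-3 * pow(k, 4.0);
      }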
  • BARK - ath_freq2bark() in ath.c is modified to support individual hearing loss.
  • the BARK is calculated using the ERB-rate fitting parameters of Greenwood, 1961 (A and lambda).
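  • A Greenwood-style frequency-to-place mapping of the kind ath_freq2bark() could use is sketched below; the constants are Greenwood's published human-cochlea values and serve only as stand-ins for the individually fitted A and lambda mentioned above:

      #include <math.h>

      /* Inverts Greenwood's F = A * (10^(a*x) - k) to a place value x in 0..1;
       * A = 165.4, a = 2.1, k = 0.88 are Greenwood's human-cochlea constants. */
      static double greenwood_place(double f_hz)
      {
          const double A = 165.4, a = 2.1, k = 0.88;
          return log10(f_hz / A + k) / a;
      }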
  • SPREADING FUNCTION - psycho_3_threshold() in psycho_3.c is modified to support individual hearing loss.
  • the spreading function is derived by the ERBrate fitting parameters and ERB fitting parameters.
  • MASKING INDEX - Individual masking indices for tone (TMN) and noise (NMT) are newly implemented in the above SPREADING FUNCTION algorithm.
  • a spectral subtraction process can also be implemented to improve the processing of received audio signals and thereby their listenability.
  • the spectral subtraction process also helps to remove signals, i.e. frequency bins, that are below a global masking threshold, thereby improving the listenability of speech or harmonic tones when additional non-tonal signals are present.
  • the spectral subtraction method is customised for each user based on their psychoacoustic model.
  • An exemplary spectral subtraction method is described with reference to figures 28a-28g.
  • the spectral subtraction process described herein is similar to the earlier described spectral subtraction process.
  • the description with respect to figures 28a-28g provides a step by step explanation of the spectral subtraction process.
  • In figure 28a, line 2802 represents the original spectrum of the signal, i.e. the harmonic tones and noise components. Harmonic tones at 440 Hz, 880 Hz, 1320 Hz, 1760 Hz and 2200 Hz were used, with white noise added.
  • the dots 2810, 2812, 2814, 2816 and 2818 are the harmonic tones.
  • the dots 2820 (i.e. the light coloured dots) represent peaks of white noise.
  • Dot 2830 represents an eliminated tonal sound. Dots 2810 - 2818 are tonal sounds and dots 2820 are non-tonal sounds. For clarity, the white noise peaks 2820 are not identified on the rest of the figures.
  • Figure 28b shows individual masking thresholds being determined. Individual masking thresholds are given by summing the sound pressure level of the masking component, the masking index and the masking function (i.e. spreading function) at the corresponding critical band rate. The masking thresholds for each tonal and non-tonal component are calculated as described in section 4.3.2.3 above, using the mathematical formulae given there. The masking thresholds are represented by lines 2830, shown faintly for clarity; only 3 masking thresholds have been labelled.
  • Figure 28c shows a global masking threshold line 2840 that is overlaid onto the original spectrum line. The global masking threshold is preferably determined by summing the powers corresponding to individual masking thresholds i.e. the global masking threshold is determined from the sum of individual masking thresholds. The global masking threshold is represented as a dashed line.
  • Figure 28d shows the processed spectrum.
  • the processed spectrum is represented by line 2850.
  • the processed spectrum 2850 is determined by spectral subtraction.
  • In the illustrated example of figure 28d, there is a 10 dB attenuation under the global masking threshold (for example, Leq: -4.4 dB in figure 28d), as sketched below.
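  • A minimal sketch of this spectral subtraction step (names illustrative): FFT bins whose level falls below the listener's customised global masking threshold are attenuated by a fixed amount, e.g. 10 dB as in the illustrated example:

      #include <stddef.h>

      /* Attenuate, by atten_db, every FFT bin whose level (dB) lies below the
       * listener's global masking threshold at that bin. */
      static void attenuate_masked_bins(double *bin_db, const double *lt_global_db,
                                        size_t nbins, double atten_db)
      {
          for (size_t i = 0; i < nbins; i++)
              if (bin_db[i] < lt_global_db[i])
                  bin_db[i] -= atten_db;           /* e.g. atten_db = 10.0 */
      }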
  • This exemplary processing using spectral subtraction has a large impact for hearing loss due to improved recruitment of frequency bins, i.e. improved recruitment of the cochlea of the user.
  • Figure 28e shows a one third octave spectrum for normal hearing.
  • Figure 28b shows an averaged ERB.
  • Figure 28f shows a one third octave spectrum for a hearing impaired user i.e. a user with hearing loss.
  • the raw signal i.e. prior to filtering
  • the filtered parts, i.e. where the output is reduced, are shown by bars 2862.
  • the Quick Fitting Method produces data indicative of a user's critical band bandwidth.
  • This data indicative of a user's critical band bandwidth is called an h value.
  • Also described above is using the h value to modify the functions which comprise a psychoacoustic model. It will be appreciated by a person skilled in the art that in some embodiments a psychoacoustic model can be modified similarly to the full psychoacoustic method described above.
  • the pre-processed audio may be saved in many different formats, such as, but not limited to: digital audio files (MP3, FLAC, or WAV for example), CDs, DVDs, Blu-rays, vinyl, and cassette tapes. These pre-processed audio formats may then be delivered to an end user as-is. The end user may play the audio without using any specialised hardware or software. Alternatively, the pre-processed audio may also be streamed to a user directly without being saved into an intermediate format. This alternative has the same advantage of not requiring specialised hardware or software at the end user's end.
  • By pre-processing the audio into a standardised format, the end user is not required to carry out the complex task of encoding the audio themselves, which simplifies the process. Pre-processing audio also saves time, as the audio processing does not need to occur whenever a user wants to listen to the audio.
  • An example of pre-processing audio is in the pre-prepared audio samples used above as described in the section titled "Measuring h". In this example, multiple different processes were applied to the same audio samples.
  • the standard hearing aid fitting process may also include a fitting process in accordance with the embodiments present above to generate or modify a custom psychoacoustic model.
  • the quick fitting system can control the sloping of the spreading function (vf) through a control slider controlling the critical bandwidth via the GUI assessment tool described above. Alternatively, the control slider may control the frequency selectivity, from which the critical bandwidth may be calculated. In some embodiments, this processing may improve SNR around peaks before Wide Dynamic Range Compression (WDRC) in the hearing aid. Appropriate volume control of the hearing aid increases speech intelligibility or audibility of the peaks due to the benefits of the SNR improvement.
  • WDRC Wide Dynamic Range Compression
  • the quick fitting system is also configurable or operable to adjust independently between the critical bandwidth, masking index and spreading function.
  • the fitting step will be performed once and saved into a memory location in the Hearing Aid DSP chain system.
  • the user may adjust the control slider during use to suit the situation or if the user's preferences change.
  • Figure 1 shows a suggested use of an encoder 102 with a custom psychoacoustic model 106 in an audio processing chain preceding a hearing aid 104. Also shown is the wide dynamic range compression (WDRC) processing block 108.
  • the WDRC block 108 is an audio processing block commonly used in hearing aids (HA).
  • HA hearing aids
  • the encoder with custom psychoacoustic model can also work as a standalone audio processing block.
  • Data indicative of an individual's psychoacoustic model may also be used to create a standalone audio processor for use in any other real time audio processing applications. This will be useful for a listener with damaged hearing who may not have access to the original recording to modify it, or may not be able to modify the original source.
  • These other real time audio applications will be known to a person skilled in the art. Some examples may be: music and other audio CDs or vinyl, digital music or other audio stored on a user's device, movie sound tracks, radio broadcasts, internet radio, and music streaming services.
  • the psychoacoustic models generated by both the first and second embodiments have demonstrated that they are capable of identifying and removing the inaudible components of the audio provided. Removing these inaudible sounds results in an improvement in the listening experience for a hearing impaired user.
  • the psychoacoustic models generated by both of the embodiments are modifications of the MPEG-Audio psychoacoustic software; however, other encoders or psychoacoustic models may be modified, or entirely new psychoacoustic model software or hardware could be made using the principles of the first and second embodiments.
  • An advantage of these embodiments of the invention is the flexibility and usage in different signal processing chains.
  • Other encoders' psychoacoustic models could also be modified based on the same techniques described in this disclosure.
  • the flexibility also allows the psychoacoustic model to be used in combination with any other audio processing devices, or as a standalone processing block.
  • either embodiment may be used as a front end to a user's hearing aid or as a standalone processing block taking arbitrary audio and outputting audio which has been improved based on an individual's particular hearing characteristics.
  • the second embodiment allows for the process of assessing a user's hearing characteristics to be performed quickly and simply. Any time saved is good for both the user being assessed and the clinician performing the assessment. Using a simple, structured GUI also both speeds up the process by reducing mistakes and provides a more pleasant experience for the user and clinician. Other protocols for measuring critical bandwidth (or ERB), tone masking noise and noise masking tone take an unreasonable amount of time for practical use.
  • the processing effect was investigated by testing speech intelligibility and musical preference (on loudness, fullness, clearness, naturalness and dynamics) with 34 elderly participants.
  • speech intelligibility was tested using Japanese monosyllables at a speech-to-noise ratio of 5 dB.
  • Rating of the pop music consisting of male vocal and musical instruments, revealed significantly higher ratings in the processed music conditions. In particular, loudness preference was significantly improved.
  • ATH was moderately strongly correlated with ERB (.661, p < .01), Rb (.666, p < .01) and h (.631, p < .01).
  • the scatter data of ATH and Rb shown in Figure 24 indicates Rb increasing proportionally with the increase of ATH up to 4.0. Figure 24 shows a number of different values of the auditory filter Rb for elderly participants (i.e. test subjects).
  • the index h was strongly correlated with ERB (.894, p < .01) and Rb (.903, p < .01). h was also moderately strongly correlated with ATH (.631, p < .01) and SRT (.675, p < .01). h was not correlated with SI.
  • Figures 25a and 25b show a comparison of the frequency selectivity model with the average Rb from our measurement data. Values in the legend indicate h. Rb at 1000 Hz and lower frequencies was reasonably matched to the model; however, Rb at 2000 and 4000 Hz was spread out.
  • Figure 25a shows the frequency selectivity model and the measurement data is shown in figure 25b. The legend indicates the various h values used.
  • embodiments may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof.
  • the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium such as a storage medium or other storage(s).
  • a processor may perform the necessary tasks.
  • a code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements.
  • a code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
  • a storage medium may represent one or more devices for storing data, including read-only memory (ROM), random access memory (RAM), magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information.
  • ROM read-only memory
  • RAM random access memory
  • "machine readable medium" and "computer readable medium" include, but are not limited to, portable or fixed storage devices, optical storage devices, and/or various other mediums capable of storing, containing or carrying instruction(s) and/or data.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, circuit, and/or state machine.
  • a processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a number of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • the methods or algorithms described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executable by a processor, or in a combination of both, in the form of processing unit, programming instructions, or other directions, and may be contained in a single device or distributed across multiple devices.
  • a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may be integral to the processor.
  • One or more of the components and functions illustrated in the figures may be rearranged and/or combined into a single component or embodied in several components without departing from the invention. Additional elements or components may also be added without departing from the invention. Additionally, the features described herein may be implemented in software, hardware, as a business method, and/or a combination thereof.
  • the invention can be embodied in a computer-implemented process, a machine (such as an electronic device, or a general purpose computer or other device that provides a platform on which computer programs can be executed), processes performed by these machines, or an article of manufacture.
  • a machine such as an electronic device, or a general purpose computer or other device that provides a platform on which computer programs can be executed
  • Such articles can include a computer program product or digital information product in which a computer readable storage medium contains computer program instructions or computer readable data stored thereon, as well as processes and machines that create and use these articles of manufacture.

Abstract

Disclosed herein is an audio system and method for hearing impaired. The audio system and method may be utilized in hearing aids, portable audio devices, or in any generic audio file processing. The method for hearing impaired helps to improve the listenability of an audio signal for a hearing impaired listener. The method may be implemented by a processing device having associated memory. The method includes the steps of receiving or retrieving input audio signal; receiving or retrieving listening data indicative of the hearing impaired listener's hearing characteristics; generating or modifying a customised psychoacoustic model based on the listening data; processing the input audio signal to identify and remove or at least partially attenuate inaudible or unhelpful spectral components in the input audio signal based on the customised psychoacoustic model; and generating a modified output audio signal based on the processing that is customised for the hearing impaired listener.

Description

AUDIO SYSTEM AND METHOD FOR HEARING IMPAIRED
FIELD OF THE INVENTION
The invention relates to an audio system and method for hearing impaired. In particular, although not exclusively, the audio system may be utilized in hearing aids, portable audio devices, or in any generic audio file processing.
BACKGROUND TO THE INVENTION
In 2005 the World Health Organisation (WHO) estimated that about 278 million people had moderate to profound hearing impairment. These people all struggle to hear the many complex sounds that unimpaired people take for granted. Hearing loss can take on many forms, including reduced audibility (loudness), recruitment (rapid growth of perceived loudness) and reduced frequency selectivity, which exaggerate perceptions of intelligibility and preference of the sounds. In particular, speech in background noise and music are difficult for those with hearing impairment. Speech includes formants that are often masked in background noise. Music includes voice and/or various harmonic musical instruments. These sounds are often loud, dynamic and overlapped.
The spectral shape of a sound is represented in the excitation pattern evoked by the sound. Frequency selectivity in hearing loss is usually reduced, resulting in less detail about the spectrum than would be the case for a normal ear.
To compensate for hearing loss, the hearing impaired use cochlear implants and hearing aids. The exaggerated perceptions in hearing loss are reasonably compensated by many methods, but there are still some users who often complain, especially about intelligibility of speech in noise situations and quality of music, particularly when complex sounds are loud, dynamic, and overlapped.
Measuring psychoacoustic factors, such as critical bandwidth CB (sometimes called equivalent rectangular bandwidth, ERB), tone masking noise, noise masking tone and auditory filters, is a practical difficulty at audiology clinics, as these measurements for individuals are very time consuming. This is because CB or ERB varies by center frequency. To compensate for an individual's CB or ERB, these must be measured at 125, 250, 500, 1000, 2000 and 4000 Hz for both ears, at least. With existing simplified methods it takes on average 96 minutes in total testing time, which is generally unacceptable at clinics. Moreover, psychoacoustic testing (to perceive a tone in noise) is very difficult for patients, thus usually more time is needed to complete such a test.
Therefore, due to these issues psychoacoustic measurement (CB/ERB) and any compensation of CB or ERB have not been widely conducted in fitting hearing aids or the like. In this specification where reference has been made to patent specifications, other external documents, or other sources of information, this is generally for the purpose of providing a context for discussing the features of the invention. Unless specifically stated otherwise, reference to such external documents is not to be construed as an admission that such documents, or such sources of information, in any jurisdiction, are prior art, or form part of the common general knowledge in the art.
SUMMARY OF THE INVENTION
It is an object of the invention to provide a method and/or system for improving the listenability of audio for the hearing impaired, or to at least provide the public with a useful alternative.
In a first aspect, the present invention broadly consists in a method of improving the listenability of an audio signal for a hearing impaired listener, the method implemented by a processing device having associated memory, comprising receiving or retrieving input audio signal; receiving or retrieving listening data indicative of the hearing impaired listener's hearing characteristics; generating or modifying a customised psychoacoustic model based on the listening data; processing the input audio signal to identify and remove or at least partially attenuate inaudible or unhelpful spectral components in the input audio signal based on the customised psychoacoustic model; and generating a modified output audio signal based on the processing that is customised for the hearing impaired listener. In an embodiment, the listening data comprises a single hearing characteristic or single configuration parameter for generating or modifying the customised psychoacoustic model. In an embodiment, the listening data contains all of the parameters for generating or modifying the customised psychoacoustic model.
In an embodiment, the single hearing characteristic or single configuration parameter is indicative of the listener's auditory filter bandwidth.
In an embodiment, the listener's auditory filter bandwidth is a function of the single hearing characteristic or single configuration parameter.
In an embodiment, the single hearing characteristic or single configuration parameter indexes which auditory filter bandwidth of a selection of different auditory filter bandwidths approximates the listener's auditory filter bandwidth.
In an embodiment, the single hearing characteristic or single configuration parameter modifies a default auditory filter bandwidth.
In an embodiment, the single hearing characteristic or single configuration parameter represents the listener's proportional difference between the default auditory filter bandwidth and the listener's auditory filter bandwidth. In an embodiment, the default critical band bandwidth is an average person's auditory filter bandwidth.
In an embodiment, the single hearing characteristic or single configuration parameter is generated as output from an electronic psychoacoustic assessment system.
In an embodiment, the electronic psychoacoustic assessment system comprises a GUI. In an embodiment, the GUI comprises an adjustable graphical user interface element which modifies a control variable.
In an embodiment, the control variable is the single hearing characteristic or single configuration parameter.
In an embodiment, the GUI comprises a graph display.
In an embodiment, the graph display comprises the user's listening assessment data.
In an embodiment, adjusting the control variable adjusts a plot displayed in the graph display.
In an embodiment, the plot displayed represents a user's average frequency selectivity.
In an embodiment, the user's single hearing characteristic or single configuration parameter is derived from the user's average frequency selectivity.
In an embodiment, the plot displayed represent the user's single hearing characteristic or single configuration parameter.
In an embodiment, the adjustable graphical user interface element is one or more of: toggle switch, drop down menu, check box, radio button, numerical input, slider scale, or dial.
In an embodiment, the auditory filter bandwidth is a user's critical band bandwidth.
In an embodiment, the auditory filter bandwidth is a user's equivalent rectangular bandwidth (ERB). In an embodiment, receiving listening data further comprises generating or determining additional listening data indicative of additional hearing characteristics of the listener based on the single hearing characteristic or single configuration parameter. In an embodiment, the additional listening data is indicative of any one or more of the following hearing characteristics of the listener: the listener's tonal masking index, noise masking index and/or spreading function.
In an embodiment, processing the input audio signal comprises fitting critical bands of audio of the input audio signal based on the listening data indicative of the listener's critical band bandwidth; determining an individual masking threshold for each critical band of audio; determining global masking thresholds based on the determined individual masking thresholds; and spectrally modifying the input audio signal based on the determined global masking thresholds.
In an embodiment, determining an individual masking threshold for each critical band comprises determining a sound pressure level of a masking component in the critical band of the input audio signal; determining at least one masking index based on the listening data indicative of the critical band bandwidth of the listener; determining a spreading function based on the listening data indicative of the critical band bandwidth of the listener; determining an individual masking threshold based on the determined sound pressure level of the masking component, the determined at least one masking index, and the determined spreading function. In an embodiment, determining the at least one masking index comprises determining the tonal masking index and the non-tonal masking index.
In an embodiment, spectrally modifying the input audio signal comprises calculating the signal-to-mask ratio in each critical band based on the global masking thresholds; and applying spectral subtraction to the input audio signal based on the global masking threshold. In an embodiment, generating or modifying the psychoacoustic model comprises:
inserting the received listening data into source code representing the psychoacoustic model to be processed by the processing device. In an embodiment, generating or modifying the customised psychoacoustic model based on the received listening data comprises loading the customised psychoacoustic model into memory from an external source.
In an embodiment, generating or modifying the customised psychoacoustic model comprises generating the psychoacoustic model in real-time based on the received listening data.
In a second aspect, the present invention broadly consists in an audio processor that is configured to improve the listenability of an audio signal for a hearing impaired listener, the audio processor comprising a processor and associated memory, and which is configured to carry out the method according to the first aspect.
In an embodiment, the audio processor is provided in a hearing aid or an audio prosthesis such as a cochlear implant or a middle ear implant or a bone conduction unit.
In an embodiment, the audio processor is provided as an application program executable on a programmable electronic device.
In a third aspect, the present invention broadly consists in a computer-readable medium having stored thereon computer executable instructions that, when executed on a processing device or devices, cause the processing device or devices to perform a method of the first aspect.
In a fourth aspect, the present invention broadly consists in a hearing aid or an audio prosthesis for use by a hearing impaired user, the hearing aid configured to be mounted on or within a user's ear, the hearing aid comprising the audio processor according to the second aspect or a computer readable medium according to the third aspect. The audio prosthesis may be a cochlear implant or a middle ear implant or a bone conduction device. The terms cochlear implant and middle ear implant incorporates the implanted elements as well as the external elements e.g. the sound processor unit. In a fifth aspect, the present invention broadly consists in a mobile device or a computing device comprising the computer readable medium according to the third aspect.
In a sixth aspect, the present invention broadly consists in a method of fitting a hearing aid or an audio prosthesis, the method of fitting comprising assessing or generating a single control parameter representing a user's hearing profile.
In an embodiment the method comprises customising the psychoacoustic model to the user based on the single control parameter. In an embodiment the single control parameter representing a user's hearing profile is determined based on a testing process.
In an embodiment the testing process comprises providing a plurality of auditory or audible stimuli and assessing a user's response, the single control parameter being determined based on the response of the user to the stimuli.
In an embodiment the single control parameter may be determined by trial and error.
In an embodiment the stimuli may be speech stimuli or specific sounds or musical notes or specific audible signals.
In an embodiment the stimuli may be delivered with varying amplitudes and/or with varying signal to noise ratios. The noise may be white noise that is mixed with the stimuli signals.
In an embodiment the single control parameter of a user's hearing profile may be determined based on fitting a user parameter to match the user's listening ability. The parameter may be a critical bandwidth ratio. In an embodiment the single control parameter of a user's hearing profile is determined by fitting a critical bandwidth ratio of a user such that the fit line substantially matches or corresponds to the user's listening ability.
In an embodiment the single control parameter may be determined using a GUI (graphical user interface) to visually determine the single control parameter.
In an embodiment the determined single control parameter may be used to determine or generate a customised psychoacoustic model for a user.
In a seventh aspect the present invention broadly consists in an audio processor for executing a method of fitting a hearing aid or an audio prosthesis according to the sixth aspect.
Each of the aspects above may comprise any one or more features mentioned in respect of the other aspects above.
Definitions or terms or phrases
The phrase 'computer-readable medium' should be taken to include a single medium or multiple media. Examples of multiple media include a centralised or distributed database and/or associated caches. These multiple media store the one or more sets of computer executable instructions. The phrase 'computer readable medium' should also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor of a computing device and that cause the processor to perform any one or more of the methods described herein. The computer-readable medium is also capable of storing, encoding or carrying data structures used by or associated with these sets of instructions. The phrase 'computer-readable medium' includes solid-state memories, optical media and magnetic media.
The term "comprising" as used in this specification and claims means "consisting at least in part of. When interpreting each statement in this specification and claims that includes the term "comprising", features other than that or those prefaced by the term may also be present. Related terms such as "comprise" and "comprises" are to be interpreted in the same manner. Number Ranges
It is intended that reference to a range of numbers disclosed herein (for example, 1 to 10) also incorporates reference to all rational numbers within that range (for example, 1, 1.1, 2, 3, 3.9, 4, 5, 6, 6.5, 7, 8, 9 and 10) and also any range of rational numbers within that range (for example, 2 to 8, 1.5 to 5.5 and 3.1 to 4.7) and, therefore, all sub-ranges of all ranges expressly disclosed herein are hereby expressly disclosed. These are only examples of what is specifically intended and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this application in a similar manner. As used herein the term "and/or" means "and" or "or", or both.
As used herein "(s)" following a noun means the plural and/or singular forms of the noun.
The invention consists in the foregoing and also envisages constructions of which the following gives examples only.
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred embodiments of the invention will be described by way of example only and with reference to the drawings, in which:
Figure 1 shows a schematic diagram of the main modules of an audio system having an audio processor with psychoacoustic model as a front end to a hearing aid in an embodiment;
Figure 2a shows a schematic diagram of an alternative embodiment audio system for processing audio inputs generally using the audio processor of Figure 1; Figure 2b is a schematic diagram showing sub-modules of the audio processor of Figures 1 and 2a;
Figure 3a shows the primary components of a psychoacoustic processing configuration or algorithm for digital audio processing;
Figure 3b shows the psychoacoustic processing configuration or algorithm of Figure 3 a customised based on an individual's psychoacoustic assessments in an embodiment; Figures 4a to 4f show psychoacoustic analysis of a pop music song, specifically: Figure 4a shows normalized power spectral density and tonal/non-tonal masker ID; Figure 4b shows prototype spreading functions; Figure 4c shows individual tonal masker thresholds; Figure 4d shows individual noise masker thresholds; and Figure 4e shows global masking thresholds;
Figure 5 shows excitation patterns for the same sound calculated using different sizes of equivalent rectangular bandwidth (ERB);
Figures 6a to 6d show a process of eliminating inaudible components around two tones in accordance with an embodiment, specifically: Figures 6a-6c show processing for a normal hearing person on the left and processing for a person with hearing loss on the right in accordance with an embodiment; and Figure 6d shows processing for a normal hearing person in the upper graph and processing for a hearing impaired person in the lower graphs in accordance with an embodiment;
Figure 7a shows a graphical representation of critical bandwidths (ERB) using some exemplary methods;
Figure 7b shows a graphical representation of critical band rate (ERB rate) using the exemplary methods described with reference to figure 7a;
Figure 8a shows a plot of ERB plotted for a variety of frequencies for various characteristic or single configuration parameters for a user;
Figure 8b shows a plot of ERB rate for a variety of frequencies for various characteristic or single configuration parameters for a user;
Figure 8c shows a plot of ERB vs ERB rate for three different characteristic or single configuration parameters for a user;
Figure 9 illustrates a plot of three different psychoacoustic models;
Figure 10 shows a graphical representation of various spreading functions for different characteristic or single configuration parameters for a user; Figure 11a shows a plot of a model of Rb across frequencies for various characteristic or single configuration parameters for a user;
Figure lib shows a plot of the test results of Rb across frequencies for various characteristic or single configuration parameters for a user;
Figure 12a and 12b show the modelling of ERB curves plotted over a range of frequencies;
Figure 13 shows relative responses in decibels for spreading functions for different characteristic or single configuration parameters;
Figures 14a and 14b are graphical representations of an original signal and a processed signal using a suitable processing method as described herein, for different characteristic parameters of a user;
Figures 15a and 15b show the 1/3 octave analysis of an audio frame for different characteristic parameters (i.e. different users);
Figure 16 shows various spectrograms illustrating the improvement of a signal e.g. a speech signal;
Figures 17a and 17b illustrate spectrograms of music used in testing, and the result of improving the listenability of speech with music;
Figures 18a and 18b show graphical representations of critical bandwidths for different individual's hearing;
Figure 19 shows masking index for different individual's hearing;
Figure 20 shows ratio of spreading functions for different individual's hearing;
Figures 21a and 21b shows examples of a psychoacoustic GUI assessment tool used in a first quick fitting method;
Figure 22 shows an example of a psychoacoustic GUI assessment tool used in a second quick fitting method;
Figure 23 shows an example of a psychoacoustic GUI assessment tool used in a third quick fitting method;
Figure 24 shows values of the absolute threshold of hearing (ATH) increasing for various frequencies;
Figures 25a and 25b show plots illustrating a comparison of the frequency selectivity model and test results of the average of Rb from measurement data;
Figure 26 shows a plot of speech intelligibility for original signals e.g. original speech, and processed signals e.g. processed speech;
Figure 27 shows the total scope of perceptual scales of signals, i.e. original signals and processed signals;
Figures 28a to 28d show various method steps of spectral subtraction to improve the listenability of a signal e.g. a speech signal; and
Figures 28e and 28f show plots of the third octave spectrum of a normal hearing and a hearing impaired user.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
In the following description, specific details are given to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, software modules, functions, circuits, etc., may be shown in block diagrams in order not to obscure the embodiments in unnecessary detail. In other instances, well-known modules, structures and techniques may not be shown in detail in order not to obscure the embodiments.
Also, it is noted that the embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc., in a computer program. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or a main function.
Aspects of the systems and methods described below may be operable on any type of electronic hardware device or system, general purpose computer system or computing device, including, but not limited to, a desktop, laptop, notebook, tablet, smart television, or mobile device. The term "mobile device" includes, but is not limited to, a wireless device, a mobile phone, a smart phone, a mobile communication device, a user communication device, personal digital assistant, mobile hand-held computer, a laptop computer, wearable electronic devices such as smart watches and head-mounted devices, an electronic book reader and reading devices capable of reading electronic contents and/or other types of mobile devices typically carried by individuals and/or having some form of communication capabilities (e.g., wireless, infrared, short-range radio, cellular etc.).
1 Overview
There are many sounds in the world, and among the most important for the life of those with hearing loss are complex sounds, especially speech in background noise and music, which consists of vocals and several musical instruments sounding at once.
The presence of hearing loss (usually high-frequency in mild hearing loss) acts as a filter, making harmonics inaudible in the region of hearing loss. Reduced audibility is usually addressed by the fitting of hearing aids, which amplify sounds and which are developed for speech intelligibility. Due to the differences in spectrum and intensity between music and speech, amplification and reproduction of music presents several challenges (e.g. crest factor, dynamic range, peak level). The electronic and electro-acoustic parameters set up in a hearing aid may be optimal for speech, but not for music.
Hearing aids and cochlear implants do not compensate for dynamic masking such as simultaneous (or frequency) masking. Such dynamic masking is caused by peaks of complex sounds whose intensity and frequency change dynamically over time. Hearing aids and cochlear implants are generally designed with fixed frequency channels. People suffering from hearing loss typically experience stronger masking effects because of their wider auditory filters. When listening to music, masking affects the audibility of timbre and of lower-amplitude background instruments, because much louder peaks simultaneously mask these softer sounds.
Although adjusting audio for masking by tone and noise peaks has not been realised in hearing aid and cochlear implant technology, it has been successfully developed in lossy digital audio compression. Lossy digital audio compression systems include, but are not limited to: the ISO/IEC MPEG Standard, ATRAC, DTS-CA, and Dolby (TM) AC-3. The model by which audio masking is calculated and adjusted in such compression technologies is known as a "psychoacoustic model". In this disclosure, a method and system for modifying and adapting a psychoacoustic model into audio processing technology to improve the listening quality of audio, particularly complex sounds such as music and speech in background noise, for those suffering hearing loss is described. This audio processing method and system can work as an independent processing block 102 at the front end of hearing aids 104 or a cochlear implant as shown in Figure 1, or alternatively as a standalone audio processor 200 (that optionally includes a hearing aid) or audio codec for the hearing impaired as shown in Figures 2a and 2b. It will be appreciated that the audio processor could be used in an audio or audio and video signal processing chain.
The audio processor can be incorporated or implemented in any electronic hardware device that is capable of or configured to deliver audio to an end user including, but not limited to, hearing-loss devices such as hearing aids and cochlear implant devices, and general devices configured to deliver audio to a user via speakers, headphones or earphones, for example portable and non-portable digital audio players and audio systems, and consumer electronic devices such as general purpose computing devices, tablets, smart phones, smart televisions and wearable devices. It will be appreciated that the functionality of the audio processor can be implemented in software or computer-readable instructions executable on such devices, or in hardware logic or circuits, or a combination of these. In some applications, the audio processor may be a plug-in or add-on component to an audio player software application program.
2 Audio Codecs and Psychoacoustic Models
2.1 Audio Codecs
Audio coding relies heavily on exploiting properties of psychoacoustics. The MPEG (Moving Picture Experts Group) audio standards, i.e. MPEG-1 and MPEG-2, have become the most widely used lossy audio/video formats in the world. Furthermore, several successful commercial audio standards have been published, including Sony's Adaptive TRansform Acoustic Coding (ATRAC), DTS Coherent Acoustics (DTS-CA) and Dolby's Audio Coder-3 (AC-3). The advent of ISO/IEC MPEG-4 standardization established new research goals for high-quality coding of general audio signals even at low bit rates. A person skilled in the art will appreciate that more audio encoding systems are known in the art.
The MPEG standard includes the MP3 (MPEG-1 or MPEG-2 Audio Layer III) audio format. The MP3 encoder processes the digital audio signal and produces a compressed bitstream for storage. The MP3 encoder algorithm is not standardized, and may use various means for encoding such as estimation of the auditory masking threshold, quantization, and scaling. However, the encoder output must be such that a decoder conforming to the specifications will produce audio suitable for the intended application. Depending on the application, different layers of the coding system with increasing encoder complexity and performance can be used. An ISO MPEG Audio Layer N decoder is able to decode bitstream data which has been encoded in Layer N and all layers below N.
Layer I:
This layer contains the basic mapping of the digital audio input into 32 subbands, fixed segmentation to format the data into blocks, a psychoacoustic model to determine the adaptive bit allocation, and quantization using block companding and formatting.
Layer II:
This layer provides additional coding of bit allocation, scalefactors and samples. Different framing is used.
Layer III:
This layer introduces increased frequency resolution based on a hybrid filterbank. It adds a different (nonuniform) quantizer, adaptive segmentation and entropy coding of the quantized values.
Referring to Figure 3a, the MPEG-Audio algorithm 300 is a psychoacoustic algorithm that receives a digital audio input signal 310, processes it, and outputs an encoded digital audio signal or bitstream 312. The primary parts of the psychoacoustic processing algorithm are shown and described in the following (see also ISO 11172-3):
1) Filter Bank 302:
The filterbank does a time to frequency mapping. There are two filterbanks used in the MPEG-Audio algorithm, each providing a specific mapping in time and frequency. These filterbanks are critically sampled (i.e. there are as many samples in the analyzed domain as there are in the time domain). These filterbanks provide the primary frequency separation for the encoder, and the reconstruction filters for the decoder. The output samples of the filterbank are quantized.
2) The Psychoacoustic Model 304:
The psychoacoustic model calculates a just-noticeable noise level for each band in the filterbank. This noise level is used in the bit or noise allocation to determine the actual quantizers and quantizer levels. There are two psychoacoustic models.
While they can both be applied to any layer of the MPEG-Audio algorithm, in practice Model 1 has been used for Layers I and II, and Model 2 for Layer III. In both psychoacoustic models, the final output of the model is a signal-to-mask ratio (SMR) for each band (Layers I and II) or group of bands (Layer III).
3) Bit or Noise Allocation 306:
The allocator looks at both the output samples from the filter bank and the SMRs from the psychoacoustic model, and adjusts the bit allocation (Layers I and II) or noise allocation (Layer III) in order to simultaneously meet both the bitrate requirements and the masking requirements. At low bitrates, these methods attempt to spend bits in a fashion that is psychoacoustically inoffensive when they cannot meet the psychoacoustic demand at the required bitrate.
4) The Bitstream Formatter 308:
The bitstream formatter takes the quantized filterbank outputs, the bit allocation (Layers I and II) or noise allocation (Layer III) and other required side information, and encodes and formats that information in an efficient fashion. In the case of Layer III, the Huffman codes are also inserted at this point. The decoder accepts the compressed audio bitstream and uses the information to produce digital audio output. Bitstream data is fed into the decoder. The bitstream unpacking and decoding block performs error detection if an error check is applied in the encoder. The bitstream data are unpacked to recover the various pieces of information. The reconstruction block reconstructs the quantized version of the set of mapped samples. The inverse mapping transforms these mapped samples back into uniform Pulse-code modulation (PCM).
2.2 Psychoacoustic Models
The field of psychoacoustics has made significant progress toward characterizing human auditory perception and particularly the time-frequency analysis capabilities of the inner ear. Current audio coders can incorporate several psychoacoustic principles: absolute hearing thresholds, critical band frequency analysis, simultaneous masking, and the spread of masking along the basilar membrane. The purpose of these psychoacoustic models is to remove unhelpful power components (masked components). Otherwise, the total noise power in a critical band is increased and evokes frequency masking that masks nearby frequency bins of interest (e.g. formants in speech) beyond the critical bandwidth.
2.2.1 Psychoacoustic Model ISO 11172-3
A psychoacoustic model is a mathematical model of the masking behaviour of the human auditory system with averaged normal hearing. There are two psychoacoustic models presented in ISO 11172-3. By way of example only, an embodiment is described based on the use of MPEG-Audio Layer II (MP2) and psychoacoustic model I, although it will be appreciated that the application of the system and method is not limited to this particular model.
In some configurations, the calculation of the global masking threshold may be based on the following steps:
Step 1: Calculation of the FFT for time to frequency conversion.
Step 2: Determination of the sound pressure level in each subband.
Step 3: Determination of the threshold in quiet (absolute threshold).
Step 4: Finding of the tonal (more sinusoid-like) and non-tonal (more noise-like) components of the audio signal.
Step 5: Decimation of the maskers, to obtain only the relevant maskers.
Step 6: Calculation of the individual masking thresholds.
Step 7: Determination of the global masking threshold.
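By way of illustration only, the first three steps may be sketched as follows. This is a minimal sketch and not the ISO reference implementation: the 96 dB full-scale normalisation and the use of Terhardt's well-known approximation for the threshold in quiet are assumptions of this sketch, and the function names are illustrative.

import numpy as np

def power_spectrum_db(frame, fft_size=512):
    """Step 1: Hann-windowed FFT of one audio frame; Step 2: express the
    result as sound pressure levels, normalising the peak to an assumed
    playback level of 96 dB SPL."""
    windowed = frame * np.hanning(len(frame))
    psd = 20.0 * np.log10(np.abs(np.fft.rfft(windowed, n=fft_size)) + 1e-12)
    return psd - psd.max() + 96.0

def threshold_in_quiet(freq_hz):
    """Step 3: threshold in quiet in dB SPL (Terhardt's approximation);
    only valid for frequencies above 0 Hz."""
    f = np.asarray(freq_hz, dtype=float) / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)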
In ISO 11172-3, the psychoacoustic model I denotes that the individual masking thresholds of both tonal and non-tonal components are given by the following expression:
LTtm[z(j), z(i)] = Xtm[z(j)] + avtm[z(j)] + vf[z(j), z(i)] (dB)
LTnm[z(j), z(i)] = Xnm[z(j)] + avnm[z(j)] + vf[z(j), z(i)] (dB)
In this formula LTtm and LTnm are the individual masking thresholds at critical band rate z(i) in Bark of the masking component at the critical band rate z(j) in Bark. The term Xtm[z(j)] is the sound pressure level of the masking component with the index number j at the corresponding critical band rate z(j). The term av is called the masking index and vf the masking function of the masking component Xtm[z(j)]. The masking function vf is also called the spreading function. The masking index av is different for tonal and non-tonal maskers (avtm and avnm).
The global masking threshold LTg(i) at the i'th frequency sample is derived from the upper and lower slopes of the individual masking threshold of each of the j tonal and non- tonal maskers, and in addition from the threshold in quiet LTq(i). The global masking threshold is found by summing the powers corresponding to the individual masking thresholds and the threshold in quiet.
LTg(i) = 10 log10( 10^(LTq(i)/10) + SUM_j 10^(LTtm[z(j), z(i)]/10) + SUM_j 10^(LTnm[z(j), z(i)]/10) )
For a given i, the range of j can be reduced to just encompass those masking components that are within -8 to +3 Bark from i. Outside of this range LTtm and LTnm are taken as minus infinity dB, i.e. they make no contribution. The critical band rate of individuals can be calculated by approximating a curve which forms with some of the critical bandwidths.
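A short sketch of this power-additive combination, assuming the individual thresholds have already been evaluated in dB at every frequency sample (names follow the ISO 11172-3 notation used above):

import numpy as np

def global_masking_threshold(LTq, LTtm, LTnm):
    """LTq: (N,) threshold in quiet in dB; LTtm, LTnm: (num_maskers, N)
    individual tonal / non-tonal thresholds in dB (already limited to the
    -8..+3 Bark range, with no contribution outside it).
    Returns LTg in dB at each of the N frequency samples."""
    total = 10.0 ** (LTq / 10.0)
    total += (10.0 ** (LTtm / 10.0)).sum(axis=0)
    total += (10.0 ** (LTnm / 10.0)).sum(axis=0)
    return 10.0 * np.log10(total)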
2.2.2 Psychoacoustic Model Example
It is useful to consider an example of how the psychoacoustic principles described thus far are applied in actual coding algorithms. The ISO/IEC 11172-3 psychoacoustic model 1 determines the maximum allowable quantization noise energy in each critical band such that quantization noise remains inaudible. In one of its modes, the model uses a 512-point DFT for high resolution spectral analysis (86.13 Hz), then estimates for each input frame individual simultaneous masking thresholds due to the presence of tone-like and noise-like maskers in the signal spectrum. A global masking threshold is then estimated for a subset of the original 256 frequency bins by (power) additive combination of the tonal and non-tonal individual masking thresholds.
Referring to Figure 4, the absolute threshold 402 is indicated with a dashed line (top graph: linear frequency scale 400, bottom graph: Bark scale 401). Figure 4 shows a power spectral density (PSD) and sound pressure level (SPL) 404 estimate using a 512-point FFT of incoming audio samples. Tonal and non-tonal masking components are also identified in Figures 4c-d: local maxima in the sample PSD which exceed neighbouring components within a certain Bark distance by at least 7 dB are classified as tonal, while a single noise masker for each critical band is then computed from the (remaining) spectral lines not within the certain Bark distance of a tonal masker. Tonal maskers 408 are denoted by 'x' symbols and noise maskers 406 are denoted by 'o' symbols in Figure 4. The number of maskers is reduced (decimation) by the following steps: any tonal or noise maskers below the absolute threshold are removed, and any pair too close to each other (within 0.5 Bark) is replaced by the stronger one. In the pop music example, two tonal maskers 410, 412 appear between 19.5 and 20.5 Barks, as shown in the bottom graph 401 in Figure 4a. It can be seen that the pair is replaced by the stronger of the two 410 during threshold calculations, as shown in Figures 4c-e.
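The identification and decimation just described may be sketched as below. The 7 dB local-maximum rule and the 0.5 Bark decimation window come from the text; the fixed two-bin examination neighbourhood is a simplification (in the model the examined Bark distance widens with frequency), and the helper signatures are illustrative.

import numpy as np

def find_tonal_maskers(psd_db, neighbourhood=2):
    """Return bin indices that are local maxima and exceed the components
    2..neighbourhood bins away by at least 7 dB."""
    tonal = []
    for k in range(neighbourhood, len(psd_db) - neighbourhood):
        if not (psd_db[k] > psd_db[k - 1] and psd_db[k] >= psd_db[k + 1]):
            continue
        others = np.r_[psd_db[k - neighbourhood:k - 1],
                       psd_db[k + 2:k + neighbourhood + 1]]
        if np.all(psd_db[k] >= others + 7.0):
            tonal.append(k)
    return tonal

def decimate_maskers(masker_bins, psd_db, bark, ath_db, min_dist_bark=0.5):
    """Drop maskers below the absolute threshold, then keep only the
    stronger of any pair closer than 0.5 Bark."""
    alive = [k for k in masker_bins if psd_db[k] >= ath_db[k]]
    kept = []
    for k in sorted(alive, key=lambda k: psd_db[k], reverse=True):
        if all(abs(bark[k] - bark[j]) >= min_dist_bark for j in kept):
            kept.append(k)
    return sorted(kept)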
Having obtained a decimated set of tonal and noise maskers, individual tone and noise masking thresholds are computed next. Each individual threshold represents a masking contribution at frequency bin i due to the tone or noise masker located at bin j. Tonal masker thresholds, TTM(i, j), and noise masker thresholds, TNM(i, j), are given by

TTM(i, j) = PTM(j) - 0.275 z(j) + SF(i, j) - 6.025 (dB SPL)
TNM(i, j) = PNM(j) - 0.175 z(j) + SF(i, j) - 2.025 (dB SPL)
where PTM(j) (PNM(j)) denotes the SPL of the tonal (noise) masker in frequency bin j, z(j) denotes the Bark frequency of bin j, and SF(i, j), the spread of masking from masker bin j to maskee bin i, approximates the basilar membrane spreading (excitation pattern). Prototype individual masking thresholds, TTM(i, j), are shown as a function of masker level in Figure 4b for an example tonal masker 414 occurring at z = 10 Barks. Figures 4c and 4d show individual masking thresholds associated with the tone maskers 420 and noise maskers 422.
Finally, individual masking thresholds are combined to estimate a global masking threshold. The model assumes the masking effects are additive. Therefore, the global masking threshold is obtained by:
Tg(i) = 10 log10( 10^(Tq(i)/10) + SUM(l = 1..L) 10^(TTM(i, l)/10) + SUM(m = 1..M) 10^(TNM(i, m)/10) )
where Tq(i) (unit: dB SPL) is the absolute hearing threshold for frequency bin i, TTM(i, l) and TNM(i, m) are the individual masking thresholds, and L and M are the numbers of tonal and noise maskers. Figure 4e shows the global masking threshold 430 obtained by adding the power of the individual tonal maskers 420 as shown in Figure 4c and noise maskers 422 as shown in Figure 4d to the absolute threshold 402 in quiet.
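Expressed in code, the two individual-threshold formulas quoted above take the following form. This is a sketch using the model-1 constants as printed above; SF(i, j) is supplied separately since the spreading function is defined on its own.

def tonal_threshold(P_TM_j, z_j, SF_ij):
    """TTM(i, j) in dB SPL for a tonal masker of level P_TM_j (dB SPL)
    located at Bark frequency z_j, spread to bin i via SF_ij."""
    return P_TM_j - 0.275 * z_j + SF_ij - 6.025

def noise_threshold(P_NM_j, z_j, SF_ij):
    """TNM(i, j) in dB SPL for a noise masker of level P_NM_j (dB SPL)."""
    return P_NM_j - 0.175 * z_j + SF_ij - 2.025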
A person skilled in the art will appreciate that other psychoacoustic models exist for different MPEG encoders. Other encoders also employ psychoacoustic models. The end goal of all of these encoders, which reduce audio file size while trying not to reduce audio quality too greatly, is to remove (and therefore not encode at all) or efficiently compress inaudible or unimportant sounds as determined by the psychoacoustic model.
3 Hearing Loss
As described in the previous section, the psychoacoustic model is designed with normal hearing psychoacoustics. In hearing loss, these psychoacoustic factors differ. In this section, several complicating factors in hearing loss are described.
3.1 Absolute Threshold
Degradation of absolute threshold can occur due to cochlear damage. Firstly, damage to the Outer Hair Cells (OHCs) impairs the active mechanism, resulting in reduced basilar membrane (BM) vibration for a given low sound level. Hence, the sound level must be larger than normal to give a just-detectable amount of vibration. Secondly, OHC damage can result in reduced efficiency of transduction, so the amount of BM vibration needed to reach threshold is larger than normal.
The most prominent change in loudness perception associated with the damage is loudness recruitment. Sound appears to fluctuate more in loudness than it would for a normally hearing person. When listening to speech, the loudness differences between consonants and vowels may be greater than normal. When listening to music, the forte passages may be perceived at almost normal loudness, but the piano passages may be inaudible. Compensating for audibility therefore needs to account for both the increased absolute hearing threshold and loudness recruitment.
Hearing aids are the primary method for alleviating these problems. However, people find that their hearing aids are sometimes useful in helping them to hear soft sounds, but that the aids do not help very much, if at all, when background noise, including musical instruments, is present, because of their frequency selectivity and increased global masking thresholds, described below.
3.2 Frequency selectivity (Auditory Filter)
Psychophysical tuning curves (PTCs) are broader than normal amongst hearing impaired participants. However, it is difficult to quantify the differences between impaired and normal PTCs. Most studies have found that the sharpness of tuning of PTCs decreases with increasing absolute threshold, although the correlation between threshold and sharpness of tuning varies markedly across studies.
The degree of broadening of the PTCs increases with increasing hearing loss. The PTC shapes for the impaired ears vary greatly in shape across participants, but they are all broader than those for the normal ears, especially their low-frequency side.
Detecting the fine structure of the spectrum in audio may be influenced by recruitment. Recruitment in a normal ear may enhance the contrast between the peaks and dips in the excitation pattern evoked by a complex sound. This may make it easier to pick out spectral features, such as formants in vowel sounds. The degradation of this mechanism associated with cochlear damage would thus lead to greater difficulty in picking out such features. Figure 5 shows excitation patterns 502, 504 for the same vowel, but calculated for impaired auditory systems. The top graph 500 in Figure 5 shows an excitation pattern 502 for an impaired ear with auditory filters two times broader than normal. The bottom graph 501 in Figure 5 shows an excitation pattern 504 for an impaired ear with auditory filters four times broader than normal. It can be seen that spectral details are less well represented in these excitation patterns.
3.3 Masking index
The masking index here is denoted with a "noise-masking-tone" and a "tone-masking-noise". Noise-masking-tone is the masked threshold of a tone by uniform exciting noise, while tone-masking-noise is the masked threshold of burst noise (one critical band wide) by a tone. The masking index is used in the spreading function in the psychoacoustic model. The masking index is the difference between the critical band level and the masked threshold in the region of the main excitation.
When the spectrum level of the noise is held constant, the loudness of a 1 kHz tone below an SPL of about 30 dB is reduced equally by narrow- and wide-band noise (normal hearing). However, at higher SPLs, the steepness of the masked loudness function depends on the spread of excitation of the noise. The effect of the tone on a critical band of noise is greater than its effect on either an octave-band noise or wide-band noise, and whether the masker is a tone or noise, masking ceases when the effective energy of the masked and masking stimuli is the same.
In the method and system of this disclosure, the masking index for hearing loss is measured and used as a basis to define the individual spreading function of a customised psychoacoustic model for the end listener.
4 Psychoacoustic Model for People with Hearing Loss
4.1 Overview
In the system and method of this disclosure, a customised psychoacoustic model is configured or generated for the intended end-listener based on their individual hearing loss profile or characteristics. This customised psychoacoustic model operates in the audio processor to modify the incoming audio signal into a modified output audio signal that is intended, in at least some aspects, to improve the listening quality or experience for the listener. By way of example, the audio processing based on the customised psychoacoustic model, in some embodiments, is configured such that sounds masked by tone and noise peaks (below the global masking threshold LTg(i)) are eliminated for the individual's hearing. The audio processing based on the customised psychoacoustic model improves the listenability of a signal, e.g. a speech signal, using a characteristic parameter for each user. The end user can further adjust the final loudness of the audio signal without modifying the masking effects. In this embodiment of the system, psychoacoustic assessment tools are provided to assess the individual psychoacoustic characteristics of an end-listener, and the system is configured to import those psychoacoustic profiles into an MPEG Audio encoder that uses psychoacoustic model I in MPEG Audio Layer II. Rather than average data for normal hearing, the individual participant data or profile is applied to customise the psychoacoustic model because of the great variances between individuals with hearing loss, as described in the previous section. It will be appreciated that the customised psychoacoustic model may be used in other audio processors, encoders, and codecs in alternative configurations. The customised psychoacoustic model comprises a modified spreading function (also called an excitation pattern), which is derived from the individual's auditory characteristics. An encoder containing the customised psychoacoustic model will eliminate unnecessary sounds masked by tone and noise peaks. This is broadly shown in Figures 6a-6d. Total band power in a critical band is reduced and therefore simultaneous masking is improved. Reducing total band power evokes less unnecessary excitation and therefore improves sound quality for hearing loss. It is expected that the process will improve the listenability of the audio, especially the audibility of soft loudness sounds in music (timbre, background instruments etc.). Moreover, this customised audio processing is independent of and different from the conventional processing in hearing aids because it is a masking compensation (frequency selectivity on tone and noise peaks).
It will be appreciated that critical bands can be approximately represented as equivalent rectangular bandwidths (ERBs). ERBs are a way of mathematically modelling the critical bands as rectangular band-pass filters. In some embodiments ERBs are used, but it will be appreciated that critical bands could be used alternatively.
The purpose of the proposed signal processing is to suppress or subtract unnecessary power without degrading sound quality. The frequency bins carrying the unnecessary power are masked ones (inaudible). Otherwise, the total noise power in a critical band is increased and evokes frequency masking that masks the close frequency bins of interest (e.g. formants in speech) beyond the critical bandwidth. This speech enhancement processing works so that sounds masked by dominant tone and noise peaks, below the global masking threshold LTg of an individual with hearing loss, are eliminated. This is realized by spectral subtraction to suppress all frequency bins under LTg at each frame, by applying the individual critical band rate z to recalculate the sound pressure levels of the masking components Xtm and Xnm (the so-called 'tone masker' and 'noise masker' respectively), the masking indexes avtm and avnm for tone and noise respectively, the spreading function vf, and finally the LTg.
The steps that may be followed are detailed below:
1. Fitting critical bandwidth and sub-band synthesis
Obtain the critical bandwidth parameter (h) at the initial stage; at every change, a new critical band rate (unit: Bark) is computed by a form of linear model equation. This also resets the masking (spreading) function.
The critical bandwidth parameter is obtained initially. "Initially" may be the initial fitting stage or every recalibration of a device, e.g. a hearing aid or an audio prosthesis. Exemplary methods of obtaining the critical bandwidth parameter (h) are explained with reference to figures 21a, 21b and figures 22, 23. The method disclosed with reference to figures 21a, 21b does not require ERB or CB data; instead, h is determined during the fitting process by using the slider or other tool or interface to select or indicate a specific h value. In figures 22, 23 the critical band ratio (CBR) in the information window indicates the specific h value for a user to use in the fitting critical bandwidth and sub-band stages. Figures 21a-23 describe methods of determining and using the h value to produce a customized model for each user, i.e. a hearing impaired user.
2. Finding of tonal and non-tonal components
Find tonal and non-tonal (total power in critical bandwidth) components.
3. Compute individual masking thresholds
Individual masking thresholds are found by summing the sound pressure level of the masking component, the masking index and the masking function (spreading function) at the corresponding critical band rates.
4. Compute Global masking thresholds
Global masking thresholds are found by summing the powers corresponding to individual masking thresholds and threshold in quiet (absolute hearing threshold).
5. Spectral subtraction
Frequency bins lower than the global masking threshold at the corresponding sub-band are subtracted. The spectral subtraction can be implemented using any of the spectral subtraction methods described below. For example, one exemplary method is described in section 4.3.2.5. Alternatively the spectral subtraction can be implemented using the steps disclosed in section 5, as described herein. A skeleton of these five steps is sketched below.
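The following is only a sketch of the flow under stated assumptions: the global-threshold computation (steps 1 to 4, parameterised by h) is injected as a callable because its pieces are described throughout this specification, and the 10 dB attenuation figure is the one used in the example of section 4.3.2.5.

import numpy as np

def process_frame(frame, global_threshold_fn, fs=44100,
                  fft_size=1024, attenuation_db=10.0):
    """global_threshold_fn(psd_db, freqs) must return the global masking
    threshold LTg in dB for every FFT bin (steps 1-4 for a given h)."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)), n=fft_size)
    psd_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / fs)

    lt_g = global_threshold_fn(psd_db, freqs)

    # Step 5: attenuate every bin that falls below the global threshold.
    masked = psd_db < lt_g
    spectrum[masked] *= 10.0 ** (-attenuation_db / 20.0)
    return np.fft.irfft(spectrum, n=fft_size)[:len(frame)]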
Reverting to Figures 2a and 2b, an audio processing system 200 is shown using a customised critical bandwidth parameter (h). The audio system broadly consists of an audio input block 202, audio output block 204, spectral analysis block 206, audio synthesis block 208, a psychoacoustic processing block 210, and a simplified fitting of critical bandwidth block 212. The audio input block 202 may comprise an input buffer 214 for digital audio data. The audio input block 202 may also comprise a microphone, or other sound recording device (not shown), and an analogue to digital element for converting analogue sound waves to a digital representation for further processing (not shown). The audio output block 204 may similarly comprise an output buffer 216 for digital audio data being outputted by the system 200. The audio output block 204 may also comprise a speaker, or other sound emitting device (not shown), and a digital to analogue converter for converting the digital representation of the audio into analogue sound waves (not shown).
The spectral analysis block 206 may comprise an Overlap Add (OLA) Fast Fourier Transform (FFT) step 218 or any other audio spectral analysis technique known in the art. The psychoacoustic processing block 210 comprises multiple steps. Some of these steps may be performed concurrently, in parallel or sequentially depending on the implementation of the psychoacoustic processing block 210. The psychoacoustic processing block may receive or retrieve a critical bandwidth parameter (h) at the first Fitting Critical Bandwidth (CB) stage 220. The critical bandwidth parameter is received or retrieved from the simplified fitting of critical bandwidth block 212. The critical band rate (measured in Bark) is calculated using the critical bandwidth parameter. The critical band rate may be calculated as a function of the critical bandwidth parameter. The function may be in the form of a linear model equation. Alternatively the critical bandwidth parameter may represent the critical band rate exactly. The Fitting CB stage 220 also calculates the masking function (or spreading function) 221. The sub-band synthesis step 222 re-aligns the new sub-bands to reduce the subsequent computing load as a trade-off against frequency resolution.
Alternatively, as shown in Figure 2c, the Fitting CB 220 and sub-band synthesis 222 blocks may be replaced with a re-alignment of the ERBi by frequency selectivity (h) block 220'. This in effect replaces step 1 detailed immediately above: using the frequency selectivity (h), which is described in detail below in section 4.3.1.1, a new form of ERBi is computed with the ERB-rate (Equation 1). The finding tonal and non-tonal components step 224 comprises finding tonal and non-tonal components as well as the total power in each critical bandwidth. The compute individual masking thresholds step 226 comprises calculating individual masking thresholds by summing the sound pressure level of the masking component, the masking index, and the masking function (or spreading function) 221 at each corresponding critical band rate. This is calculated for both tonal and non-tonal components.
The compute global masking thresholds step 228 comprises calculating the global masking threshold by summing the powers corresponding to the individual masking thresholds and absolute hearing threshold.
The final step of the psychoacoustic processing block 210 is the spectral subtraction step 230. The spectral subtraction step 230 takes frequency bins lower than the global masking threshold (calculated in the previous step) at the corresponding sub-bands and subtracts them. Spectral subtraction is described in further detail, with examples, in section 4.3.2.5 of this specification. Spectral subtraction is also described in more detail in section 5 of this specification; the spectral subtraction in section 5 is similar to that described in section 4.3.2.5.
The audio synthesis stage receives both the output of the spectral analysis block 206 and the output of the psychoacoustic processing block 210 and conducts an Overlap add (OLA) Inverse Fast Fourier Transform (IFFT) 232.
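A minimal overlap-add loop of the kind performed by blocks 206 and 208 is sketched below (FFT size 1024, hop 512, Hann analysis window, which approximately satisfies the constant-overlap-add condition at 50% overlap); the `modify` callable stands in for the psychoacoustic processing block 210, and the structure is illustrative rather than prescriptive.

import numpy as np

def ola_process(x, modify, fft_size=1024, hop=512):
    """Analyse x frame by frame, apply `modify` to each rFFT spectrum,
    and overlap-add the inverse transforms back into a signal."""
    window = np.hanning(fft_size)
    y = np.zeros(len(x))
    for start in range(0, len(x) - fft_size + 1, hop):
        frame = x[start:start + fft_size] * window
        spectrum = modify(np.fft.rfft(frame))
        y[start:start + fft_size] += np.fft.irfft(spectrum, n=fft_size)
    return y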
Optionally, an active noise control (Filtered-X LMS) step 234 may be used to further process the audio. Figures 2a and 2b describe a system using a critical bandwidth parameter (h) to calculate the critical band rate of a user. Alternatively, different and/or more psychoacoustic model parameters could be determined such as those found using the "Full Psychoacoustic Model" method described below. These psychoacoustic model parameters may replace or supplement the different stages within the psychoacoustic processing stage 210.
Unnecessary sounds which are inaudible for hearing loss are not eliminated by the original MPEG-Audio psychoacoustic model, and they vibrate the basilar membrane when the sounds arrive at the peripheral stage. Nor is it a goal of the original MPEG-Audio psychoacoustic model to eliminate sounds that are inaudible for the hearing impaired. These unnecessary sounds create (unnecessary) peaks and spread the excitation patterns, which leads to greater difficulty in picking out the contrast between the peaks and dips. Moreover, the total band power in a critical band in hearing loss is increased rather than eliminated for sound inputs, and therefore the spreading function is wider at high intensity, as discussed above under "Frequency Selectivity (Auditory Filter)". In the method and system disclosed, the purpose of the signal processing of the audio signal based on the customised psychoacoustic model is to remove "irrelevant" signal information that interferes with dominant sound components of interest, without degrading sound quality. In some embodiments, the main modification or customisation of the psychoacoustic model is to individualize the auditory filters, which automatically or individually customizes the masking index and spreading function to calculate or generate a customised global masking threshold in the model. The audio processing of the system and method disclosed is configured such that sounds masked by tone and noise peaks are eliminated for the individual's particular hearing profile, taking into account any hearing loss. Eliminating these inaudible peaks reduces the total energy in the critical bandwidth, which benefits the loudness of soft sounds of interest for hearing loss. Speech includes formants; music includes various harmonic/non-harmonic peaks as well as vocals. The signal processing in the audio processor of this system and method therefore evokes less unnecessary excitation and improves the audibility of simultaneous sounds, which is distinguished from intensity and pitch. By way of example only, Figures 6a-6d show the processing effects generated by the audio processor based on the customised psychoacoustic model. Total band power in a critical band is increased in hearing loss because the spreading function of hearing loss is wider. If unnecessary sounds were not eliminated, these sounds would vibrate the basilar membrane when they arrive at the peripheral stage and spread unnecessary excitation patterns that would lead to greater difficulty in picking out the contrast between the peaks and dips (see left graph in Figure 6d). The processing of the customised audio processor of the system and method filters peaks and non-peaks below the global masking threshold (background components) from input sounds so that unnecessary excitation patterns do not occur. The excitation patterns evoked by the filtered sound also have a good signal-to-noise ratio.
4.2 Full Psychoacoustic Method
In a first embodiment, a full and complete psychoacoustic model of a user is taken. With a full psychoacoustic model taken, an audio processor can be configured or modified based on the specifics of that user's model.
The required psychoacoustics assessments and calculations of absolute threshold, equivalent rectangular bandwidth, and masking index need to be conducted. These assessments can be used to define the user's individual psychoacoustic model or alternatively be considered as control parameters or input variables for modifying or adjusting a default psychoacoustic model to be a customised psychoacoustic model.
By way of example, in this embodiment, the "PSYCHOACOUSTICS" program may be used for the assessments. PSYCHOACOUSTICS is a MATLAB toolbox implementing three classic adaptive procedures for auditory threshold estimation. The first includes those of the staircase family (method of limits, simple up-down and transformed up-down); the second is the PEST; and the third is the Maximum Likelihood Procedure. It will be appreciated that one or more other software or assessment systems could alternatively be used to assess the parameters of a full psychoacoustic model of the end listener. In this embodiment, the Staircase procedure is used in the assessments of the end user. Three procedures can be distinguished within this Staircase procedure category: the method of limits, the simple up-down and the transformed up-down.
For sensory threshold estimation, sensation moves within and across two types of thresholds: detection and discrimination. The detection threshold is the minimum detectable stimulus level in the absence of any other stimuli of the same sort. In other words, the detection threshold marks the beginning of the sensation of a given stimulus. The discrimination threshold is the minimum detectable difference between two stimulus levels. Therefore, for a given sensory continuum, the discrimination threshold defines the steps into which the continuum is divided.
The detection threshold can be estimated either via yes/no tasks or via multiple-alternative forced choice tasks (in brief, nAFC, with n being the number of alternatives). The discrimination threshold, on the contrary, must be estimated exclusively via nAFC tasks. In yes/no tasks, the subject is presented with a succession of different stimulus levels (spanning from below to above the subject's detection threshold) and is asked to report whether he or she has detected the stimulus (yes) or not (no). In an nAFC task, the subject is presented with a series of n stimuli differing in level. In addition, because the various stimuli have to be presented in temporal succession, the tasks are often multiple-interval tasks (i.e., mIO-nAFC). In an nAFC task, one stimulus (the variable) changes its level across the trials, whereas the level of the others (the standards) is fixed. The difference between standard and variable ranges from below to above the subject's detection (or discrimination) threshold. After each trial, the subject is asked to report which was the variable stimulus.
All test frequencies are 250, 500, 1000, 2000, 4000 and 6000 Hz for each ear. Optional 8000 and 12000 Hz tests are available. The default presentation level is 40 dB SL. For severe hearing loss, an arbitrary "soft" level will be presented. All tests are performed as a yes/no task with the simple up-down method.
4.2.1 Absolute threshold assessment
In this embodiment, the absolute threshold is measured with a pure tone and follows the steps below.
1. Set the frequency of the pure tone (250, 500, 1000, 2000, 4000, 6000, 8000 or 12000 Hz).
2. Measure the hearing threshold level (T) by ascending the level of the pure tone (yes/no task, simple up-down, 3 reversals, 5 dB steps; the arithmetic mean of the last 2 reversals provides the result).
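A minimal sketch of this ascending yes/no simple up-down staircase (5 dB steps, 3 reversals, threshold taken as the mean of the last two reversals); `present_tone` is an assumed callback that plays the tone at the given level and returns True if the listener reports hearing it.

def staircase_threshold(present_tone, start_db=0.0, step_db=5.0,
                        n_reversals=3):
    """Simple up-down: step down after a 'yes', up after a 'no'; a
    reversal is recorded whenever the response changes."""
    level = start_db
    reversals = []
    last_heard = None
    while len(reversals) < n_reversals:
        heard = present_tone(level)
        if last_heard is not None and heard != last_heard:
            reversals.append(level)
        last_heard = heard
        level += -step_db if heard else step_db
    return sum(reversals[-2:]) / 2.0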
4.2.2 Auditory filter assessment
In this embodiment, the auditory filter assessment (ERB, symmetric, level independent) using the notched noise method is based on the simplified method, in which the filter shape is assumed to be symmetric. The method provides for the determination of the masker level and for the estimation of the auditory filter shape from one masked threshold. In order to reduce measuring time, the ascending method with a yes/no task is used to detect the signal threshold level. Several centre frequencies (250, 500, 1000, 2000, 4000 and 6000 Hz) and two loudness levels ("soft" and 40 dB) are tested. The probe tone is preferably 240 ms in length (40 ms rise/fall). The masker is a white noise, ranging from 0 Hz to the Nyquist frequency. A spectral notch is applied to the white noise.
The ' Simplified procedure' consists of both a masker level determination technique and an auditory filter shape estimation technique. The masker level determination technique can determine the dynamic range of the auditory filter, and the auditory filter shape estimation technique uses only one measurement point corresponding to one masked threshold.
Masking index assessment
In this embodiment, two assessments are then performed:
• Noise masking tone: masked threshold of a tone by uniform exciting noise.
• Tone masking noise: masked threshold of burst noise (critical band wide) by a fixed level of a tone at the centre frequency.
Note that both assessments need the ERB-rate (ErbFitParam), which can be derived from gx-a and gt at all frequencies.
4.2.3 Calculation of psychoacoustic characteristics
Once all of the different aspects or parameters of an individual user's psychoacoustic model have been measured, they can be inserted into an audio encoder or standalone psychoacoustic model software. In this embodiment, the MPEG psychoacoustic model was modified to use ERBs instead of critical bands. In this embodiment, a conversion engine is configured to convert the measured psychoacoustic model data into ERB-related values. It will be appreciated by those skilled in the art that other psychoacoustic model software can be modified similarly, or that the MPEG model could remain unmodified and not use ERBs instead of critical bands, but still otherwise be customised based on the individual's measured psychoacoustic data. In this embodiment, the conversion engine is a Matlab calculation program of ERB including p and r, ERB-rate and spreading function. The conversion engine may be implemented as a software module in a processor, e.g. a processor of a hearing aid, or the conversion engine may be implemented as a hardware module or as a firmware module in a hearing aid or other similar device or an audio prosthesis. In this embodiment, the conversion engine is configured to do the following:
1. CALCULATE p, which determines both the bandwidth and the slope of the skirts of the auditory filter (the higher the value of p, the more sharply tuned is the filter), r (10 log10 r = -x) and ERB. Assume B = 1.2. The integral coding is based on Oxenham (roex filter fitting) plot roex.m.
2. CALCULATE the ERB-rate (integrating the reciprocal of the ERB which is derived as above). In this embodiment, the fitting technique uses a quadratic polynomial that is fitted to the bandwidths (ERB) at all frequencies by expressing deviations from the fit as a proportion of centre frequency and minimizing the resulting squared deviations.
3. CALCULATE the spreading function of the individual hearing loss. Level dependence is not considered because of the huge testing time it would require.
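A sketch of steps 1-2 under stated assumptions: p has already been fitted per centre frequency, the ERB follows from the roex(p) relation ERB = 4*f0/p, and the ERB-rate is obtained by numerically integrating the reciprocal of a fitted ERB curve (the quadratic-polynomial fitting itself is omitted, and the function names are illustrative, not the Matlab program's).

import numpy as np

def erb_from_p(f0, p):
    """ERB of a roex(p) auditory filter centred at f0."""
    return 4.0 * np.asarray(f0, dtype=float) / np.asarray(p, dtype=float)

def erb_rate(freqs, erb_of, f_max=None, n_grid=4096):
    """Integrate 1/ERB(f) from (near) 0 up to each requested frequency,
    using the trapezoidal rule; erb_of maps frequency -> ERB in Hz."""
    f_max = f_max or float(np.max(freqs))
    grid = np.linspace(1.0, f_max, n_grid)      # start at 1 Hz, not 0
    recip = 1.0 / erb_of(grid)
    steps = 0.5 * (recip[1:] + recip[:-1]) * np.diff(grid)
    cumulative = np.concatenate([[0.0], np.cumsum(steps)])
    return np.interp(freqs, grid, cumulative)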
4.3 Quick fitting method
In a second embodiment, a quicker method is used to obtain an individual's psychoacoustic model. The "quick fitting method" is configured to assess or generate a single control parameter representing the user's hearing profile, which can be used to customise the psychoacoustic model to the user. In this embodiment, the system is configured to convert or extrapolate the single hearing characteristic or control parameter (referred to as an h value) of a user to generate the user's individual psychoacoustic model. This method is advantageous because it does not require determination of ERB and provides a simpler fitting method that is customised for each user. It will be clear from the following equations and steps that being able to assess a user's particular h value enables various aspects of a psychoacoustic model to be customised for that particular user.
Sections 4.3.1 and 4.3.2 provide detailed examples and explanations for the modelling of psychoacoustics fitting. More specifically, sections 4.3.1.1 - 4.3.1.4 are presented below as additional examples and explanations to supplement the quick fitting methods described later within this specification. Further, sections 4.3.2.1 - 4.3.2.5 provide additional details regarding speech enhancement and how speech enhancement can be achieved following the quick fitting method, with additional examples and explanations that supplement the quick fitting methods described herein.
4.3.1 Modelling of Psychoacoustics fitting
4.3.1.1 Frequency Selectivity
D.D. Greenwood (1961) gives functions of critical bandwidth, assuming that critical bands represent equal distances on the basilar membrane and that critical bandwidth increases exponentially with distance from the helicotrema. The function was consistent with Bekesy's observations (1960) and Mayer's psychophysical data (1894). The equation for this straight line (CB refers to critical bandwidth) was

log10(CB) = lambda * x + b (1.1)

or, in exponential form,

CB = 10^(lambda * x + b) (1.2)

where CB refers to critical bandwidth and x represents distance on the basilar membrane measured in critical bands as the unit of distance. The x is called the critical band rate (Bark); the slope a is often defined as lambda.

To get a function of frequency f, by integrating (1.2), Greenwood got

f = A * (10^(lambda * x) - 1) (1.3)

where

A = 10^b / (lambda * ln 10) (1.4)

or

b = log10(A * lambda * ln 10) (1.5)

Therefore,

x = (1/lambda) * log10(f/A + 1) (1.6)
From Bekesy's data (1960), Greenwood (1961) got [A lambda] = [165.4 0.06]. Moore and Glasberg (1987) got [A lambda] = [165.4 0.0546], i.e. a = lambda = 0.0546, from Patterson (1976, 1982); this was revised to [A lambda] = [228.8330 0.0467] (Glasberg and Moore, 1990).
Figure 7a shows the critical bandwidth (ERB) and Figure 7b the critical band rate (ERB rate) of Glasberg and Moore (1990), Greenwood (1961) and Zwicker (1961), the last of which is currently provided in ISO 11172-3 by

z(f) = 13 arctan(0.00076 * f) + 3.5 arctan((f/7500)^2) (Bark)
As shown in figures 7a and 7b, the ERB determined by each of the known approaches is shown as its own distinct line and labelled accordingly in the figures. The Glasberg and Moore ERB is shown as line 702, the Greenwood ERB is shown by line 704 and the Zwicker ERB is shown by line 706. These known methods of calculating ERB are not necessarily required in the fitting method according to the present invention. However it should be understood that the processing techniques described in this section may be used as part of the processing method according to the present invention. Similarly in figure 7b, the ERB rate calculated by the Glasberg and Moore approach is represented by line 712, the ERB rate determined by the Greenwood approach is shown by line 714, and the ERB rate as per the Zwicker approach is shown by line 716.
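The curves of Figures 7a and 7b can be generated from the Greenwood-form functions (1.3) and (1.6) with the [A lambda] pairs quoted above; a short sketch follows, in which the parameter values are those cited in the text and the function names are illustrative.

import numpy as np

PARAMS = {                       # [A, lambda] pairs quoted above
    "greenwood_1961":      (165.4,   0.06),
    "moore_glasberg_1987": (165.4,   0.0546),
    "glasberg_moore_1990": (228.833, 0.0467),
}

def band_rate(f, A, lam):
    """x(f) = (1/lambda) log10(f/A + 1), the inverse of f = A(10^(lambda x) - 1)."""
    return np.log10(np.asarray(f, dtype=float) / A + 1.0) / lam

def bandwidth(f, A, lam):
    """CB(f) = df/dx = A lambda ln(10) (f/A + 1)."""
    return A * lam * np.log(10.0) * (np.asarray(f, dtype=float) / A + 1.0)

# e.g. bandwidth(1000.0, *PARAMS["glasberg_moore_1990"]) is ~132 Hz,
# matching the Glasberg and Moore ERB at 1 kHz.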
Plomp (1964) found that partials of a complex sound can be "heard out" only if their frequency separation exceeds the critical bandwidth (inharmonic). Plomp and Mimpen (1968) and Moore and Ohgushi (1993) are also consistent with this idea (75% accuracy when the partials are separated by about 1.25 ERBs). It suggests that variation of the critical bandwidth would still follow the width in harmony on the basilar membrane. Therefore, defining a proportional parameter (h) which represents the variation of individual hearing, the CB' can be denoted as
log10(CB') = h * lambda * x' + b (2.1)
Or in exponential form,
CB' = 10^(h * lambda * x' + b) (2.2)
We could get the individual critical bandwidth, CB', and critical band rate x' similarly as
f = A' * (10^(h * lambda * x') - 1) (2.3)
x' = (1/(h * lambda)) * log10(f/A' + 1) (2.4)
Where,
A' = 10^b / (h * lambda * ln 10) = A/h (2.5)
CB'(f) = 10^b * (f/A' + 1) (2.6)
As we already know A, lambda and b (Eq. 1.5) of normal hearing, we can get the individual critical bandwidth, CB', by finding an appropriate h. Figures 8a and 8b show a result based on Glasberg and Moore (1990) with h parameters between 0.7 and 1.3 in 0.1 steps (h = 1.0 represents normal hearing). Figure 8a shows ERB plotted vs frequency. Figure 8b shows ERB plotted against ERB-rate (i.e. Bark). The darkest (black) line 802 on both figures 8a and 8b represents the normal hearing (young) model. Assuming that the exaggeration of the frequency selectivity is expressed with the proportional extent of width in harmony on the basilar membrane, the inventors propose a proportional parameter (h) to represent the ratio of the exaggeration. The equivalent rectangular bandwidth (ERB) can be denoted as
ERB' = 10^(h * a * x + b)
where ERB' and ERBN are the ERB of the individual and of normal hearing respectively, x represents distance on the basilar membrane (the ERB-rate is used in the experiment), and a and b are parameters that characterize the straight line of the form of normal hearing (see Figure 1 below). h = 1.0 indicates normal hearing. As a and b are known, ERB' can be calculated by finding an appropriate h.
Frequency selectivity Rb for individuals can be described as the division of the individual ERB (ERB') by the average ERB of normal hearing (ERBN) at each frequency:
Rb(fc, a) = ERB'(fc, a) / ERBN(fc, a)
where fc and a denote the centre frequency and the sensation level of the auditory filter respectively. Figure 8c shows a plot of ERB in Hz vs ERB rate for h parameters between 0.9 and 1.1 in intervals of 0.1.
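A sketch of the individual ERB and Rb in code is given below. It assumes the slope-scaling form reconstructed above (log10(ERB') = h * a * x + b) together with the Glasberg and Moore (1990) constants for normal hearing; both assumptions are illustrative only.

import numpy as np

A, LAM = 228.833, 0.0467     # Glasberg and Moore (1990), normal hearing

def erb_individual(f, h):
    """ERB'(f) for proportional parameter h; h = 1.0 is normal hearing."""
    return A * LAM * np.log(10.0) * (h * np.asarray(f, dtype=float) / A + 1.0)

def rb(f, h):
    """Frequency selectivity Rb = ERB'(f) / ERB_normal(f)."""
    return erb_individual(f, h) / erb_individual(f, 1.0)

# e.g. rb(4000.0, 1.2) is ~1.19: the filters broaden with h, and Rb
# grows with frequency towards h, consistent with Figure 11a.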
4.3.1.2 Masking index and individual masking index
The masking index is denoted with "tone-masking-noise" (Schroeder, 1979) and "noise-masking-tone" (Zwicker and Fastl, 2006). In this test, the inventors fixed this parameter the same as for normal hearing as defined in ISO 11172-3, due to the lack of data on hearing loss.
4.3.1.3 Spreading function
The presence of a strong noise or tone masker creates an excitation of sufficient strength on the basilar membrane at the critical band location to effectively block transmission of a weaker signal. Inter-band masking has also been observed, i.e., a masker centred within one critical band has some predictable effect on detection thresholds in other critical bands. This effect, also known as the spread of masking, or spreading function, is modelled in ISO 11172-3 psychoacoustic models I and II. Figure 9 shows these and one more mathematical model suggested by Moore and Glasberg (1987) using the roex filter (Patterson et al., 1982); each of the models is plotted with a different line. Patterson et al. (1982) suggested auditory filters as a family of the form of an exponential with a rounded top, called 'roex' for brevity. The simplest of these expressions was called the roex(p) filter shape. It is convenient to measure frequency in terms of the absolute value of the deviation from the centre frequency of the filter, f0, and to normalize this frequency variable by dividing by the centre frequency of the filter. The new frequency variable, g, is:
g = |f - f0| / f0
The roex(p) filter shape is then given by:
W(g) = (1 + p*g) * exp(-p*g)
where p is a parameter which determines both the bandwidth and the slope of the skirts of the symmetrical auditory filter. The higher the value of p, the more sharply tuned is the filter. The equivalent rectangular bandwidth (ERB) is equal to 4f0/p:
ERB = 4 * f0 / p
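A small numeric sketch of the roex(p) shape just defined; here p is chosen so that the ERB is roughly 132 Hz at a 1 kHz centre frequency, purely by way of example.

import numpy as np

def roex(f, f0, p):
    """W(g) = (1 + p g) exp(-p g), with g = |f - f0| / f0."""
    g = np.abs(np.asarray(f, dtype=float) - f0) / f0
    return (1.0 + p * g) * np.exp(-p * g)

f0 = 1000.0
p = 4.0 * f0 / 132.0                 # from ERB = 4 f0 / p
response_db = 10.0 * np.log10(roex(np.linspace(500.0, 1500.0, 11), f0, p))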
By our definition shown in Eq. 2.2, the individual ERB can be denoted by

ERB' = 4 * f0 / p'
It is assumed that the ratio of critical bandwidth amongst different hearing would be linearly similar, as would the slope of the roex filter and the spreading function. Thus, the individual spreading function vf' can be suggested as
vf' = β' * vf
where β' = α * β. When h = 1, vf' = vf; therefore, β' = 1. Finally, the spreading function is a function of ERB and it can be expressed as
vf' = vf * (ERB / ERB') (3.1)
In this test, h was used to realize automatic control from the critical bandwidth fitting (h) as well as masking indexes.
Figure 10 shows the ratio of spreading functions vf'/vf for h = 0.7 to 1.3 in 0.1 steps, where the calculation of vf is provided by Moore and Glasberg (1987) and the data for normal hearing are determined by Glasberg and Moore (1990).
4.3.1.4 Frequency selectivity Rb
To represent the degradation of frequency selectivity, Rb (Nakaichi and Sakamoto, 2007) was suggested as the division of the ERB of an individual participant by the average ERB of normal hearing at each frequency. Rb is described as below:
Rb(fc, x) = ERB'(fc, x) / ERBN(fc, x)
(fc and x denote center frequency and sensation level of auditory filter respectively.)
Rb is the reciprocal of vf'/vf. The higher Rb, the broader the ERB, while the higher vf'/vf (the higher the value of p'), the narrower the ERB (the more sharply tuned is the filter). Figure 11a shows a model of Rb (i.e. the degradation of frequency selectivity), plotted against frequency for different h values. As can be seen from the model, the degradation value changes as the frequency increases for different h values. Figure 11b shows test results of Rb for various frequencies and for various h values of users; h values of 1.15 and 1.20 have been omitted due to only a single sample being obtained.
4.3.2 Speech enhancement
A mathematical model of the fitting of individual critical bandwidth, and a speech enhancement by spectral subtraction using the customized global masking threshold, were established.
The modification of the psychoacoustic model (as described above) is to individualize the auditory filters, which automatically customizes the spreading function to calculate an appropriate global masking threshold for each user. The speech enhancement works to eliminate frequency bins that are inaudible; it also reduces the total energy in the critical bandwidth, which benefits the audibility of those with hearing loss in detecting sounds of interest without masking, such as formants and consonants in speech in noise, and likewise vocals amongst the various harmonic/non-harmonic peaks of background musical instruments in music. The results indicate this speech enhancement provides benefits on loudness.
It was hypothesized that this speech enhancement evokes less unnecessary excitation and improves the audibility of simultaneous sounds, which is distinct from intensity and pitch.
This speech enhancement is considered not only for hearing aid DSP; it can also be independent from hearing aid DSP, working as a front-end (pre-processing), as it would increase the SNR around peaks before they are compressed by the recruitment of hearing loss. Furthermore, appropriate volume control (e.g. equalizing RMS) would increase speech intelligibility or audibility of the peaks thanks to the benefit of this SNR improvement.
4.3.2.1 Fitting critical bandwidth and sub-band synthesis
At the first step, it is required to obtain the critical bandwidth over frequencies. The model suggested in the previous section is used, but the method is not limited to it; approximated values from measurements are also acceptable. Given the critical bandwidth parameter (h), with the model expressed with A and lambda, the new critical bandwidths CB' and critical band rates x' can be solved by equations 2.3 - 2.6. New frequency boundaries in the critical bands are then updated to re-align the sub-bands. With the new critical bandwidth and critical band rate, the spreading function is also updated. The calculative spreading function is provided by B.C.J. Moore and B.R. Glasberg (1987). Figures 12a and 12b show the modelling of ERB curves plotted over a range of frequencies. Figure 12a shows the ERB across frequencies; figure 12b shows the same data plotted on the E (Bark) scale. The dotted line in figures 12a and 12b shows the ERB curve calculated by the Glasberg and Moore (1990) method. The solid line represents the model of a user according to the present invention, derived for an h value of 1.2. As can be seen from at least figure 12a, the model according to the present invention provides a model of ERB at least as accurate as the Glasberg and Moore model. The present invention is advantageous since the ERB does not need to be calculated as part of the fitting method, hence providing a faster fitting method.
Figure 13 shows relative responses in decibels of the spreading function for h = 1.0, 1.1 and 1.2; h = 1.0 indicates normal hearing. Psychoacoustic models I and II in ISO 11172-3 are also shown for comparison.
4.3.2.2 Finding of tonal and non-tonal components
Find the tonal and non-tonal (total power in critical bandwidth) components. For calculating the global masking threshold, it is necessary to derive the tonal and the non-tonal components from the FFT spectrum. In ISO 11172-3, a local maximum is judged tonal if X(k) - X(k+j) >= 7 dB. The index number k of the spectral line and the sound pressure level Xtm(k) = 10 * log10(10^(X(k-1)/10) + 10^(X(k)/10) + 10^(X(k+1)/10)), in dB, are listed. The non-tonal (noise) components are calculated from the remaining spectral lines. Within each critical band, the power of the spectral lines is summed to form the sound pressure level of the new non-tonal component corresponding to that critical band. The index number k of the spectral line nearest to the geometric mean of the critical band and the sound pressure level Xnm(k) in dB are listed.
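A simplified Python sketch of this step follows, assuming an FFT level spectrum in dB and critical band boundaries given as FFT-bin indices. The ISO 11172-3 procedure widens the examined neighbourhood with frequency (from +/-2 lines at low frequencies up to +/-6 at high frequencies); it is reduced here to a fixed neighbourhood for brevity.

import numpy as np

def find_tonal(X_db, j_neigh=(2,)):
    """Simplified ISO 11172-3 tonality test: a local maximum X(k) is tonal
    when X(k) - X(k+j) >= 7 dB for every examined offset j; the masker SPL
    is the power sum of lines k-1, k, k+1."""
    tonal = []
    m = max(j_neigh) + 1
    for k in range(m, len(X_db) - m):
        if not (X_db[k] > X_db[k - 1] and X_db[k] >= X_db[k + 1]):
            continue                          # not a local maximum
        if all(X_db[k] - X_db[k + s * j] >= 7.0
               for j in j_neigh for s in (-1, 1)):
            p = sum(10.0 ** (X_db[k + d] / 10.0) for d in (-1, 0, 1))
            tonal.append((k, 10.0 * np.log10(p)))
    return tonal

def find_nontonal(X_db, band_edges, tonal):
    """Sum the remaining spectral-line powers inside each critical band and
    place the non-tonal component at the line nearest the band's geometric mean."""
    used = {k + d for k, _ in tonal for d in (-1, 0, 1)}
    out = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        ks = [k for k in range(lo, hi) if k not in used]
        if not ks:
            continue
        p = sum(10.0 ** (X_db[k] / 10.0) for k in ks)
        gm = np.sqrt(lo * (hi - 1)) if lo > 0 else (hi - 1) / 2.0
        k_gm = min(ks, key=lambda k: abs(k - gm))
        out.append((k_gm, 10.0 * np.log10(p)))
    return out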
4.3.2.3 Compute individual masking thresholds
Individual masking thresholds are given by summing the sound pressure level of the masking component, the masking index and the masking function (spreading function) at the corresponding critical band rate, computed over the surrounding critical bands, and are calculated for both tonal and non-tonal components.
In ISO 11172-3, the psychoacoustic model I denotes that the individual masking thresholds of both tonal and non-tonal components are given by the following expression:
LTtm[z(j), z(i)] = Xtm[z(j)] + avtm[z(j)] + vf[z(j), z(i)] dB

LTnm[z(j), z(i)] = Xnm[z(j)] + avnm[z(j)] + vf[z(j), z(i)] dB
In this formula LTtm and LTnm are the individual masking thresholds at critical band rate z(i) in Bark of the masking component at the critical band rate z(j) in Bark. The term Xtm[z(j)] is the sound pressure level of the masking component with the index number j at the corresponding critical band rate z(j). The term av is called the masking index and vf the masking function of the masking component Xtm[z(j)]. The masking function vf is also called the spreading function. The masking index av is different for tonal and non-tonal maskers (avtm and avnm).
As described in the previous chapter, avtm and avnm were fixed to be the same as for normal hearing as defined in ISO 11172-3, due to there being no data on hearing loss. The spreading function vf is calculated by equation 3.1. Figure 13 shows the spreading function for h = 1.0, 1.1 and 1.2.
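A minimal Python sketch of this computation follows. The ISO 11172-3 model I spreading function is reproduced as standardised (see section 4.3.5), while the beta slope-scaling for an individual listener is an assumption based on the CB/CB' ratio suggested in section 4.3.5, not a formula from the standard.

import numpy as np

def vf_iso(dz, X_j):
    """ISO 11172-3 psychoacoustic model I masking (spreading) function, in dB,
    for a masker of SPL X_j at Bark distance dz = z(i) - z(j)."""
    if -3.0 <= dz < -1.0:
        return 17.0 * (dz + 1.0) - (0.4 * X_j + 6.0)
    if -1.0 <= dz < 0.0:
        return (0.4 * X_j + 6.0) * dz
    if 0.0 <= dz < 1.0:
        return -17.0 * dz
    if 1.0 <= dz < 8.0:
        return -(dz - 1.0) * (17.0 - 0.15 * X_j) - 17.0
    return -np.inf                  # no masking contribution outside [-3, 8)

def individual_threshold(z_i, z_j, X_j, av, beta=1.0):
    """LT[z(j), z(i)] = X_j + av + beta * vf. beta = 1 reproduces normal
    hearing; beta = CB/CB' < 1 (broadened filters) flattens the slopes so
    masking spreads further, as suggested for hearing loss."""
    return X_j + av + beta * vf_iso(z_i - z_j, X_j)

# e.g. a 70 dB tonal masker at 10 Bark, evaluated 2 Bark above:
print(individual_threshold(12.0, 10.0, 70.0, av=-6.0, beta=0.8))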
4.3.2.4 Compute Global masking thresholds
The global masking threshold is found by summing the powers corresponding to the individual masking thresholds and the threshold in quiet (absolute hearing threshold). The global masking threshold LTg(i) at the i'th frequency sample is derived from the upper and lower slopes of the individual masking thresholds of each of the j tonal and non-tonal maskers, and in addition from the threshold in quiet LTq(i).
LTg(i) = 10 * log10( 10^(LTq(i)/10) + SUM_j 10^(LTtm[z(j), z(i)]/10) + SUM_j 10^(LTnm[z(j), z(i)]/10) ) dB
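In code, this power sum may be sketched as follows; the array shapes are assumptions chosen for illustration (one value per frequency sample, one row per masker).

import numpy as np

def global_threshold(lt_q, lt_tm, lt_nm):
    """LTg(i) = 10*log10(10^(LTq/10) + sum_j 10^(LTtm/10) + sum_j 10^(LTnm/10)).
    lt_q: (n_bins,) threshold in quiet, dB.
    lt_tm, lt_nm: (n_maskers, n_bins) individual thresholds, dB."""
    p = 10.0 ** (np.asarray(lt_q, float) / 10.0)
    for lt in (lt_tm, lt_nm):
        lt = np.asarray(lt, float)
        if lt.size:
            p = p + np.sum(10.0 ** (lt / 10.0), axis=0)
    return 10.0 * np.log10(p)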
4.3.2.5 Spectral subtraction
Frequency bins lower than the global masking threshold in the corresponding sub-band are attenuated by a spectral subtraction method. The attenuation was 10 dB. Figures 14a and 14b show a sample processing result with a 16-bit input of a continuous complex sound composed of five tones and white noise. The tone frequencies are 440, 880, 1320, 1760 and 2200 Hz and the sampling frequency is 44,100 Hz. Glasberg and Moore (1990) is used for the psychoacoustic model. The frequency response in FFT analysis (FFT size = 1024, FFT shift size = 512) with h = 1.0 (figure 14a) and h = 1.2 (figure 14b) is indicated. This shows the typical processing effect; for example, noise peaks (indicated by x) between the 1st and 2nd tones are suppressed by the processing with h = 1.2. In figures 14a and 14b the original signal is shown by line 1402 (the dash-dot line), the processed signal is represented by the solid line, and line 1406 denotes the global masking threshold (i.e. the sum of the individual masking thresholds).
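The subtraction itself reduces to a per-bin gain, as in the following sketch; the 10 dB attenuation matches the value above, while the data layout is an assumption for illustration.

import numpy as np

def spectral_subtract(spec, X_db, lt_g_db, atten_db=10.0):
    """Attenuate complex FFT bins whose level X_db (dB) falls below the
    global masking threshold lt_g_db of their sub-band by atten_db."""
    gain = np.where(np.asarray(X_db) < np.asarray(lt_g_db),
                    10.0 ** (-atten_db / 20.0), 1.0)
    return np.asarray(spec) * gain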
Figures 15a and 15b show the 1/3 octave analysis of an audio frame. The band between the 1st and 2nd tones and other 'valley' bands are suppressed, while bands including tones are not changed. The dark bars show the original sound signal and the white bars (i.e. light coloured bars) show the processed audio frame with an h value of 1.2.
Figure 16 shows spectrograms of 'ba' (S-67 Table No. 1, crop from CD2 Track 209, 1.15.800-1.16.200) used in this testing. Referring to figure 16, the upper left quadrant 1602 shows original speech. The upper right quadrant 1604 shows processed speech with an h value of 1.2. The lower left quadrant 1606 shows original speech with noise. The lower right quadrant 1608 shows the processed speech with noise, having an h value of 1.2. As can be seen from at least the lower right quadrant 1608, the spectral subtraction helps to enhance the speech signal over the noise signal.
Figures 17a and 17b show spectrograms of music used in this testing (No.47, RWC-MDB-P-2001, RWC Music Database, Goto et al., 2002). Figure 17a shows the original signal with noise in it. Figure 17b shows the result of processing using the improved processing method described herein; in particular, figure 17b shows the processed signal with noise, with an h value of 1.2. Noise herein means signals that are not the signal of interest (e.g. not the speech signal). Figures 16, 17a and 17b show an improvement in the listenability of the speech signal.
Praat version 6.0.21 (www.praat.org) was used to create Figures 16, 17a and 17b.
The total band power of the processed audio was lower than that of the original (unmodified) audio. If unnecessary (masked) frequency bins were not eliminated, all of the sound would vibrate the basilar membrane when it arrives at the peripheral stage and spread greater excitation patterns, leading to greater difficulty in picking out the contrast between the peaks and dips of the sound. This speech enhancement filters frequency components below the global masking threshold at the input (or front-end) stage, so that greater excitation patterns with a worse signal-to-noise ratio do not occur.
4.3.3 Critical Bandwidth and critical band rate
Aspects of the quick fitting methods will now be described in more detail in at least sections 4.3.3 - 4.3.6. These descriptions are more generalised and explain the various aspects of the quick fitting methods described herein. The sections above provide additional explanations and/or examples that supplement this more generalised disclosure.
Functions of critical bandwidth can be derived based on the assumption that critical bands represent equal distances on the basilar membrane and that critical bandwidth increases exponentially with distance from the helicotrema. The equation for this straight line (CB refers to critical bandwidth) is

log10(CB) = a*x + b (1)

or in exponential form,

CB = 10^(a*x + b)

where CB refers to critical bandwidth and x represents distance on the basilar membrane measured in critical bands as the unit of distance. x is also called the critical band rate (Bark).
In one embodiment, an h value is defined as a proportional parameter which represents the variety of individual hearing, and CB' can be denoted as

log10(CB') = h * (a*x + b)

Therefore,

CB' = 10^(a'*x + b') (2)
where a' = h * a and b' = h * b. The critical band frequency scale was constructed by laying critical bands end to end; thus integrating equation (1) gives

f = A * (10^(a*x) - lambda) (3)

where

A = 10^b / (a * ln(10)), lambda = 1.
Thus, the critical bandwidth, CB, as a function of frequency f, is expressed as

CB = a * ln(10) * (f + A*lambda)
The critical bandwidth of individual hearing, CB', is expressed as

CB' = (a * ln(10) * (f + A*lambda))^h
Since the length of the basilar membrane is 35 mm, and the upper end point of the 35th critical band in normal hearing falls at 20,655 Hz, one critical band is approximately equal to 1 mm.
In hearing loss, according to equation (3), the critical band rate (x) at a given frequency would be smaller, indicating that one critical band is longer than 1 mm and that the total number of critical bands would be reduced. By rearranging equation (3), x can be calculated at a frequency f as

x = (1/a) * log10(f/A + lambda)
a and b have been calculated to be a = 0.06 and b = 1.3592. Thus, an individual's critical bandwidth, CB', can be calculated by finding an appropriate h value. Figures 18a and 18b show critical bandwidths of individual hearing with h=0.7 1602, h=0.8 1604, h=0.9 1606, h=1.0 1608, h=1.1 1610, h=1.2 1612, h=1.3 1614. Figure 18a plots the bandwidths against frequency and Figure 18b plots the bandwidths against Bark.
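These closed forms are straightforward to evaluate; the following Python sketch uses the constants above and the integration constant lambda = 1 assumed in the reconstruction.

import numpy as np

A = 10.0 ** 1.3592 / (0.06 * np.log(10.0))   # ~165.6 Hz, from a and b above
LAM = 1.0

def bark_rate(f_hz, a=0.06):
    """Critical band rate x(f), inverting f = A*(10^(a*x) - lambda)."""
    return np.log10(np.asarray(f_hz, float) / A + LAM) / a

def cb_prime(f_hz, h, a=0.06, b=1.3592):
    """Individual critical bandwidth CB'(f) = 10^(h*(a*x(f) + b))."""
    return 10.0 ** (h * (a * bark_rate(f_hz, a) + b))

print(round(float(bark_rate(20655.0)), 2))   # ~35 critical bands, as stated
print(np.round(cb_prime(np.array([500.0, 1000.0, 2000.0]), h=1.2), 1))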
4.3.4 Masking Index
The masking index av is defined separately for tonal maskers ("tone-masking-noise") and non-tonal maskers ("noise-masking-tone").
In ISO 11172-3, for tonal maskers it is given by

avtm = -1.525 - 0.275 * z(j) - 4.5 dB

and for non-tonal maskers

avnm = -1.525 - 0.175 * z(j) - 0.5 dB
For hearing loss, the masking levels increase. Thus, the individual masking indexes, for the individual critical bandwidth expressed in equation (2), are:

avtm' = (gamma / h) * avtm, avnm' = (gamma / h) * avnm
Defining an independent parameter hav = h/gamma, both masking indexes can be denoted as

avtm' = avtm / hav, avnm' = avnm / hav
In this embodiment, hav = h is used to realize an automatic control from the critical bandwidth fitting (h).
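A sketch under the av' = av/hav reading above follows; the direction of the scaling is inferred from the statement that masking levels increase with hearing loss, and should be treated as an assumption.

def masking_indexes(z_j, h_av=1.0):
    """ISO 11172-3 masking indexes, scaled for an individual listener.
    h_av = 1 gives normal hearing; h_av > 1 moves the (negative) indexes
    toward 0 dB so the masked thresholds rise."""
    av_tm = (-1.525 - 0.275 * z_j - 4.5) / h_av   # tonal masker
    av_nm = (-1.525 - 0.175 * z_j - 0.5) / h_av   # non-tonal masker
    return av_tm, av_nm

print(masking_indexes(10.0, h_av=1.2))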
Figure 19 shows the masking indexes for h = 0.7 to 1.3 in 0.1 steps. The solid lines plot avtm' with h=0.7 1702, h=0.8 1704, h=0.9 1706, h=1.0 1708, h=1.1 1710, h=1.2 1712, h=1.3 1714. The dotted lines plot avnm' with h=0.7 1722, h=0.8 1724, h=0.9 1726, h=1.0 1728, h=1.1 1730, h=1.2 1732, h=1.3 1734.
It is noted that this fitting of the masking indexes is directly expected to help the audibility of peaks of tones and noise at high frequencies so that they would not be masked, such as consonants in monosyllables or harmonic or inharmonic components of musical instruments.

4.3.5 Spreading Function
In ISO 11172-3, the psychoacoustic model I denotes that the masking function (also called the spreading function or excitation pattern) vf of a masker is characterized by different lower and upper slopes, which depend on the distance in Bark, dz = z(i) - z(j), to the masker. In this expression i is the index of the spectral line at which the masking function is calculated and j that of the masker. The masking function, which is the same for tonal and non-tonal maskers, is given by:
vf = 17 * (dz + 1) - (0.4 * X[z(j)] + 6) dB, for -3 <= dz < -1 Bark
vf = (0.4 * X[z(j)] + 6) * dz dB, for -1 <= dz < 0 Bark
vf = -17 * dz dB, for 0 <= dz < 1 Bark
vf = -(dz - 1) * (17 - 0.15 * X[z(j)]) - 17 dB, for 1 <= dz < 8 Bark
In these expressions X[z(j)] is the sound pressure level of the j'th masking component in dB.
The simplest expression of the auditory filter is the so-called roex(p) filter shape, where p is a parameter which determines both the bandwidth and the slope of the skirts of the auditory filter. The higher the value of p, the more sharply tuned the filter. The equivalent rectangular bandwidth (ERB) is equal to 4*f0/p (f0: centre frequency of the filter).
W(g) = (1 + p * g) * exp(-p * g)

where g = |f - f0| / f0 is the normalized deviation of frequency f from the centre frequency f0.
There was a strong negative correlation (-0.75) between the pass-band parameter p and age at all three signal frequencies of 0.5, 2.0 and 4.0 kHz (Patterson et al., 1982).

It is assumed that the ratio of critical bandwidths between individual and normal hearing scales the slopes of the roex filter and of the spreading function linearly. Thus, an individual spreading function vf' based on CB' can be suggested as

vf' = beta' * vf

where beta' = CB / CB'. When h = 1, CB' = CB and vf' = vf; therefore beta' = 1. This equation can be expressed as

vf' = 10^((1 - h) * (a*x + b)) * vf
By replacing h with a parameter hvf, vf' can be controlled independently. In this test, h was used to realize an automatic control from the critical bandwidth fitting (h), as with the masking indexes. Figure 20 shows the ratio CB/CB' with lines showing h=0.7 1802, h=0.8 1804, h=0.9 1806, h=1.0 1808, h=1.1 1810, h=1.2 1812, h=1.3 1814.

4.3.6 Measuring h
Various embodiments of the GUI assessment tools can be seen in Figures 21a-23. These example quick fitting embodiments use sets of steps and applications for determining the single hearing characteristic or single configuration parameter (h value) of an individual. It will be appreciated that the quick fitting example embodiments' steps and GUIs may be used independently or in combination with any number of the other quick fitting embodiments. The h value may represent a number of different things depending on the example provided. The h value may directly or indirectly represent an individual listener's hearing. The h value may, alternatively or additionally, be used as an input to a function to represent an individual's hearing. The h value may, alternatively or additionally, be used as, or be, an index to a group of pre-calculated/known hearing characteristics appropriate for different hearing impairments. The computer programs and interfaces described below are implemented on a general purpose desktop computer. The adjustment steps may be carried out by a user under the direction of a clinician or by following a manual. Alternatively, the adjustment steps may be automated and require that the user simply input their responses to each adjusted sound. It will be appreciated by those skilled in the art that these programs could be implemented on any computing device comprising an interface, not just the general purpose desktop computer shown in the examples. It will be appreciated that the GUI assessment tool could be implemented in any form of application program and may be accessible via any suitable platform. The application program could execute on a general purpose computer, or as a website application program, or a smart phone application program, for example. It will also be appreciated that the adjustable user interface elements of the GUI shown in the following examples are illustrative only and may be any other adjustable graphical user interface elements suitable for adjusting parameters, including but not limited to: toggle switches, drop down menus, check boxes, radio buttons, numerical inputs, slider scales, dials or the like.
4.3.7 First Quick Fitting Example Embodiment
In this example embodiment, 10 consonant-nucleus-consonant (CNC) stimuli were mixed with white noise at optional SNR values (e.g. -5, 0 and 5 dB SNR), with or without processing from seven h values. A computer program assessment tool comprising a GUI of multiple panes and views 1900 and 1950 is configured to assist in this testing. An example embodiment of the psychoacoustic GUI assessment tool can be seen in Figures 21a and 21b. In this embodiment, to estimate the individual CB, and thus what h value to use, speech scores across 10 CNC words in white noise for each h value, at an SNR value where the participant achieves a preference score (from 1: poor to 5: excellent), were compared (~2 minutes testing time). The noise component could be enabled/disabled using a toggle switch 1902 on the GUI 1900. The processing can be enabled/disabled using a toggle switch 1904 and the volume could be adjusted using a slider 1906, both present on the GUI 1900. Using the GUI 1900, the user or a clinician conducting the test with a user is able to manually adjust the SNR and h value for processing of selected speech-in-noise files using a wheel 1908 and slider 1910 respectively.
The fitting GUI pane 1950 of the computer program was used to approximate the individual CB or h. The fitting GUI pane 1950 allows the user to play back randomly selected pre-processed CNC words in white noise at a selected SNR. A folder 1952 containing 10 CNC words in white noise corresponding to the participant's ~50% score SNR value was selected on the GUI 1950. The 10 stimuli were processed either with the four different h values, or had no processing (50 stimuli in total). The unprocessed stimuli were used to see if h value processing produced significantly different scores to the original sound file. Phonemes and words correct were scored and the h value yielding the highest overall score was noted.

4.3.8 Second Quick Fitting Example Embodiment
Figure 22 shows another GUI interface 2000 for a second method of quick fitting.
Different listening data may be loaded into this interface by using the "Load Data" button 2002. The location of the current listening data being presented in GUI 2000 is shown in text box 2004. Metadata and other information about the data file is presented under the title "Information" 2006. This metadata includes the date the test data was recorded, basic information about the listener, and details on the processing and frequencies of the audio being tested.
Interface 2000 comprises a graph display 2052 for showing an individual's listening characteristics at particular frequency points 2054. Prior to a clinician using this tool 2000, the user's listening capabilities were tested at the following frequencies: 250, 500, 1000, 2000, and 4000 Hz. The listening assessments were performed using standard listening tests at particular frequencies. It will be appreciated that any number of frequencies and other ranges or values of frequencies may be chosen. It will also be appreciated that assessing more frequencies will give more information to assist the clinician in their assessment, however it will take longer to assess overall. Also shown on the graph is a dashed curve-fitting line 2056. This line 2056 is generated using standard curve fitting techniques. The model line 2058 represents a standard user's listening ability using a standard, un-customised psychoacoustic model. The "fit" line 2060 is also shown on the graph and is discussed further below.
This interface 2000 has two primary purposes: firstly, to show a user or physician the hearing characteristics of an individual; and secondly, to attempt to fit an appropriate critical bandwidth ratio such that the fit line 2060 matches the user's listening ability as closely as possible. Using a critical bandwidth ratio of 1.0 applies no modification to an undamaged user's hearing characteristics, and as such the fit line 2060 will match the model line 2058. For users with damaged hearing, such as the user presented in the graph display 2052 with listening points 2054, a critical bandwidth ratio (or h value) greater than 1.0 must be used to match the fit line 2060 as closely as possible to the curve-fitting line 2056. Slider 2008 can be moved to modify the critical bandwidth ratio. Interface 2000 has a critical bandwidth ratio of 1.5 selected, as shown in information box 2006 under the value "CBR".
Once the fit line 2060 has been matched as closely as possible to the user's listening data, and therefore an appropriate critical bandwidth ratio has been found, a clinician or a user may export the data using the "Export Results" button 2010. The exported data contains the critical bandwidth ratio, or h value, which is then used to generate or modify a custom psychoacoustic model.
4.3.9 Third Quick Fitting Example Embodiment
Figure 23 shows a GUI interface 2100. This interface 2100 has similar features to the second quick fitting example embodiment. These similarities include the "Load Data" button 2102, the file location text box 2104, the information section 2106, the slider 2108 and the "Export Results" button 2110. These GUI components all perform similar functions as in the second quick fitting example embodiment.
This embodiment uses frequency selectivity (Rb) to determine h. An individual's frequency selectivity is the ratio of the ERB of that individual (called ERBi) to the ERB of a person with normal hearing (called ERBn). It can be calculated using the following equation:
Rb(fc, x) = ERBi(fc, x) / ERBn(fc, x)
where fc denotes centre frequency and x denotes sensation level. Prior to using the tool shown in Figure 23, absolute thresholds of hearing (ATH) and ERB were measured. In the current example, 5 frequencies were assessed: 250, 500, 1000, 2000, and 4000 Hz. The listening assessments were performed using standard listening tests that assess a user's ATH and ERB at particular frequencies. It will be appreciated that any number of frequencies and other ranges or values of frequencies may be chosen. It will also be appreciated that assessing more frequencies will give more information to assist the clinician in their assessment, however it will take longer to assess overall. The clinician loads the data obtained from the standard tests using the "Load Data" button 2102 and the file location text box 2104. The graph display 2152 shows the user under test's Rb data points 2154, represented as circles. These data points 2154 are calculated using the equation for Rb above, inputting a normal user's ERB for the given frequencies and the user under test's ERB results.
The clinician then adjusts the Critical Bandwidth Ratio slider 2108 so that the critical bandwidth ratio line 2160 is close to the average Rb. It will be appreciated that this step may be automated and/or replaced by a mathematical algorithm to auto-fit a value to the data set, as sketched below.
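By way of example, such an auto-fit might be sketched as follows; all numeric values below are illustrative placeholders, not measurement data from this disclosure.

import numpy as np

freqs = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
erb_i = np.array([60.0, 110.0, 210.0, 400.0, 900.0])   # listener under test
erb_n = np.array([52.0,  79.0, 133.0, 241.0, 456.0])   # normal-hearing reference

rb = erb_i / erb_n             # frequency selectivity ratio per frequency
rb_avg = rb[1:4].mean()        # the text averages Rb at 500, 1000 and 2000 Hz

# Fit h from a pre-computed table of model Rb values at 1000 Hz
# (the table itself is illustrative, not taken from the document).
model_h = np.arange(0.7, 1.31, 0.1)
model_rb_1k = np.array([0.55, 0.69, 0.84, 1.0, 1.45, 2.1, 3.0])
h_fit = model_h[np.argmin(np.abs(model_rb_1k - rb_avg))]
print(f"average Rb = {rb_avg:.2f} -> fitted h = {h_fit:.1f}")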
It should be appreciated that the quick fitting methods described herein can be used by clinicians to fit a hearing aid to a hearing impaired person. Similar fitting methods may also be used for fitting cochlear implants. The fitting methods described herein are advantageous because they provide a faster fitting method and provide a customised profile i.e. customised model for each user. The additional sections 4.3.1.1 - 4.3.1.4 and 4.3.2.1 - 4.3.2.5 provide additional examples and explanation to supplement the herein described fitting methods.
5 Realising a Custom Psychoacoustic Model
With an approximation of an individual's hearing impairment a custom audio processor based on the individual's hearing can be generated or modified. The approximation of the individual's hearing may come in the form of an h value or psychoacoustic model.
Alternatively, an audio processor capable of receiving an individual's h value, psychoacoustic model, or other data indicative of a user's hearing at run time may also be used. It will be appreciated by those skilled in the art that any encoder used could be modified to receive data indicative of an individual's psychoacoustic model at run time as a configuration option, or using any other method of receiving or retrieving data when in use. The following examples refer to the psychoacoustic model ISO 11172 which is used in the MPEG-Audio standard. In particular, the TwoLAME implementation of the MPEG-Audio standard is used. Other psychoacoustic models, encoding standards, and implementations thereof are known in the art. A person skilled in the art will appreciate that other psychoacoustic models and encoding standards or implementations may also be modified in a similar way to incorporate a user's individual hearing characteristics.
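For a run-time configurable processor, the listening data could be carried in a small structure such as the following Python sketch; the field names are illustrative and are not part of TwoLAME's actual interface.

from dataclasses import dataclass
from typing import Optional

@dataclass
class HearingProfile:
    """Per-listener data handed to the audio processor at run time
    instead of being compiled in."""
    h: float = 1.0                  # critical bandwidth ratio (1.0 = normal)
    h_av: Optional[float] = None    # independent masking-index control
    h_vf: Optional[float] = None    # independent spreading-function control

    def resolved(self):
        # the embodiments described above default h_av and h_vf to h
        return (self.h,
                self.h_av if self.h_av is not None else self.h,
                self.h_vf if self.h_vf is not None else self.h)

print(HearingProfile(h=1.2).resolved())   # (1.2, 1.2, 1.2)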
Figure 3b shows a modified MPEG-Audio encoder 350 using a custom psychoacoustic model 354. The customisation has been based on psychoacoustic assessments 360. The customised psychoacoustic model 354 is modified to use filtering techniques for the individual parameters. This MPEG-Audio encoder 350 takes digital audio on the input 362 and outputs an encoded bitstream 364. This encoded bitstream could be stored as a lossy digital audio file, such as an MP3 or MP2 file, on a computer readable medium. A person skilled in the art will appreciate that the encoded bitstream 364 is not limited to just storing the audio and could be used in any audio processing system, including a hearing aid audio processing chain.
In the following Full Psychoacoustic Method and Quick Fitting Method example embodiments, the audio encoder "TwoLAME" has been modified at compile time to be customised to the individual's hearing.
Both of these example embodiments are for illustrative purposes only. It will be appreciated by a person skilled in the art that any other encoder containing a psychoacoustic model could be modified, or an entirely new one written. It will also be appreciated that a person skilled in the art could implement a psychoacoustic model audio processing block separate from an audio encoder to form a standalone psychoacoustic audio processor. TooLAME is a free software MPEG-1 Layer II (MP2) audio encoder written primarily by Mike Cheng, and is an exemplary audio encoder. While there are innumerable MP2 encoders, TooLAME is well known and widely used for its particularly high audio quality. It has been unmaintained since 2003, but is directly succeeded by the TwoLAME code fork (the latest version, TwoLAME 0.3.13, was released January 21, 2011).

5.1 Full Psychoacoustic Method
In this "Full Psychoacoustic Method" embodiment, the TwoLAME audio encoder is modified to support individual psychoacoustic characteristics.
The original TwoLAME code was also modified to support a binaural hearing setting. Each of the left and right ear's independent psychoacoustic characteristics is implemented in the ATH, BARK and spreading function.
Decimation was not implemented in the original TwoLAME. ISO 11172-3 psychoacoustic model 1 describes "step 3: decimation and reorganization of maskers". The number of maskers is reduced using two criteria:

1. Any tonal or noise maskers below the absolute threshold are discarded.
2. A sliding 0.5-Bark-wide window is used to replace any pair of maskers occurring within a distance of 0.5 Bark by the stronger of the two, as sketched below.
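A minimal sketch of this decimation step, assuming maskers are represented as (Bark position, SPL in dB) pairs and the threshold in quiet is available as a callable:

def decimate_maskers(maskers, lt_q_at):
    """ISO 11172-3 'step 3': decimation and reorganization of maskers.
    1. discard maskers below the absolute threshold;
    2. within a sliding 0.5-Bark window, keep only the stronger of a pair."""
    kept = sorted((z, spl) for z, spl in maskers if spl >= lt_q_at(z))
    out = []
    for z, spl in kept:
        if out and z - out[-1][0] < 0.5:     # within 0.5 Bark of the previous
            if spl > out[-1][1]:
                out[-1] = (z, spl)           # replaced by the stronger masker
        else:
            out.append((z, spl))
    return out

# e.g. with a flat 20 dB threshold in quiet:
print(decimate_maskers([(5.0, 60.0), (5.3, 64.0), (9.0, 15.0)], lambda z: 20.0))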
Below is a summary of all the modifications made.
1. PSYCHOACOUSTIC DATA - Common.h was modified to import individual psychoacoustics data. This header file should be updated before compiling an individual application.

2. ATH - ath_dB() in ath.c is modified to support individual hearing loss. The ATH is calculated by using coefficients of a polynomial expression which is estimated from 125-16000 Hz.

3. BARK - ath_freq2bark() in ath.c is modified to support individual hearing loss. The BARK is calculated by the ERB-rate fitting parameters of Greenwood, 1961 (A and lambda).

4. SPREADING FUNCTION - psycho_3_threshold() in psycho_3.c is modified to support individual hearing loss. The spreading function is derived from the ERB-rate fitting parameters and the ERB fitting parameters.

5. MASKING INDEX - Individual masking indexes for tone (TMN) and noise (NMT) are newly implemented in the above SPREADING FUNCTION algorithm.

6. DECIMATION - Decimation code was added in psycho_3_decimation() in psycho_3.c.
A spectral subtraction process can also be implemented to improve the processing of received audio signals and thereby their listenability. The spectral subtraction process removes signals, i.e. frequency bins, that are below the global masking threshold, thereby improving the listenability of speech or harmonic tones when additional non-tonal signals are present. The spectral subtraction method is customised for each user based on their psychoacoustic model. An exemplary spectral subtraction method is described with reference to figures 28a-28g. The spectral subtraction process described here is similar to the earlier described spectral subtraction process; the description with respect to figures 28a-28g provides a step-by-step explanation.
Referring to figure 28a, line 2802 represents the original spectrum of the signal, i.e. the harmonic tones and noise components. Harmonic tones at 440 Hz, 880 Hz, 1320 Hz, 1760 Hz and 2200 Hz were used, with white noise added. The dots 2810, 2812, 2814, 2816 and 2818 are the harmonic tones. The dots 2820 (i.e. the light coloured dots) represent peaks of the white noise. Dot 2830 represents an eliminated tonal sound. Dots 2810 - 2818 are tonal sounds and dots 2820 are non-tonal sounds. For clarity, the white noise peaks 2820 are not identified in the remaining figures.
Figure 28b shows the individual masking thresholds being determined. Individual masking thresholds are given by summing the sound pressure level of the masking component, the masking index and the masking function (i.e. spreading function) at the corresponding critical band rate. A masking threshold is calculated for each tonal and non-tonal component, using the mathematical formulae described in section 4.3.2.3 above. The masking thresholds are represented by lines 2830, shown as faint lines for clarity; only 3 masking thresholds have been labelled. Figure 28c shows a global masking threshold line 2840 overlaid onto the original spectrum line. The global masking threshold is preferably determined by summing the powers corresponding to the individual masking thresholds, i.e. the global masking threshold is determined from the sum of the individual masking thresholds. The global masking threshold is represented as a dashed line.
Figure 28d shows the processed spectrum, represented by line 2850, which is determined by spectral subtraction. In the illustrated example of figure 28d, there is a 10 dB attenuation under the global masking threshold (for example, Leq: -4.4 dB in figure 28d). This exemplary processing using spectral subtraction has a large impact for hearing loss due to improved recruitment of frequency bins, i.e. improved recruitment of the cochlea of the user.
Figure 28e shows a one-third octave spectrum for normal hearing, with an averaged ERB. Figure 28f shows a one-third octave spectrum for a hearing impaired user, i.e. a user with hearing loss; as can be seen in figure 28f, a user with hearing loss has a wider ERB. The raw signal (i.e. prior to filtering) is shown by bars 2860. The filtered parts, i.e. where the output is reduced, are shown by bars 2862. As shown in figure 28e, the total power (Leq) of the input is 65.8 dB and that of the output is 63.0 dB. As shown in figure 28f, the total power (Leq) of the input is 65.8 dB and that of the output is 61.7 dB.
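The one-third octave comparison of figures 28e and 28f amounts to a power sum per band; a sketch follows, in which the band edges and floor value are conventions chosen here for illustration.

import numpy as np

def third_octave_levels(X_db, freqs_hz, f_low=100.0, n_bands=24):
    """Sum FFT-bin powers into 1/3-octave bands and return band levels in dB."""
    edges = f_low * 2.0 ** (np.arange(n_bands + 1) / 3.0)
    levels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (freqs_hz >= lo) & (freqs_hz < hi)
        p = np.sum(10.0 ** (X_db[sel] / 10.0)) if sel.any() else 1e-12
        levels.append(10.0 * np.log10(p))
    return np.array(levels)

def leq_db(levels_db):
    """Total power (Leq-style) over a set of band levels, in dB."""
    return 10.0 * np.log10(np.sum(10.0 ** (np.asarray(levels_db) / 10.0)))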
5.2 Quick Fitting Method
As described above, the Quick Fitting Method produces data indicative of a user's critical band bandwidth. This data indicative of a user's critical band bandwidth is called an h value. Also described above is using the h value to modify the functions which comprise a psychoacoustic model. It will be appreciated by a person skilled in the art that in some embodiments a psychoacoustic model can be modified similarly to the full psychoacoustic model above, following the equations outlined in the 'Full Psychoacoustic Method' section discussed immediately above. Using steps from the modified psychoacoustic model as discussed earlier, spreading functions for tone and noise peaks are calculated for an individual's level of hearing. Sound energy determined 'inaudible' is removed from the signal, leaving the new (i.e. processed) frequency response as seen in figures 14a and 14b, where the processed frequency response is denoted by line 1404 (the solid line), the original signal by line 1402 (the dash-dot line), and the global masking threshold (i.e. the sum of the individual masking thresholds) by line 1406.
6 Using a Custom Psychoacoustic Model

Both of the customised encoders from the "Full Psychoacoustic Method" or "Quick Fitting Method" example embodiments can be used in a number of different digital signal processing chains. The audio encoders above are capable of identifying and removing, or at least partially attenuating, inaudible and/or unhelpful spectral components of the audio files provided.

6.1 Pre-Processing Audio or "Offline Mode"
It is possible to use any of the encoders presented in the previous sections to pre-process audio some time before a listener might want to listen to it. The pre-processed audio may be saved in many different formats, such as, but not limited to: digital audio files (MP3, FLAC, or WAV for example), CDs, DVDs, Blu-rays, vinyl, and cassette tapes. These pre-processed audio formats may then be delivered to an end user as-is. The end user may play the audio without using any specialised hardware or software. Alternatively, the pre-processed audio may also be streamed to a user directly without being saved into an intermediate format. This alternative has the same advantage of not requiring specialised hardware or software at the end user's end. By pre-processing the audio into a standardised format, the end user is not required to perform the complex task of encoding the audio themselves, which simplifies the process. Pre-processing audio also saves time, as the audio processing does not need to occur whenever a user wants to listen to the audio. An example of pre-processing audio is in the pre-prepared audio samples used above as described in the section titled "Measuring h". In this example, multiple different processes were applied to the same audio samples.
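A minimal offline-mode sketch follows; it assumes SciPy is available, a mono 16-bit WAV input, and a caller-supplied per-frame gain function (for example, one built from the masking-threshold steps sketched earlier).

import numpy as np
from scipy.io import wavfile

FFT_SIZE, HOP = 1024, 512          # the FFT sizes used in the tests above

def preprocess_offline(in_path, out_path, gain_for_frame):
    """Pre-process a mono 16-bit WAV once and store an ordinary WAV, so
    playback needs no specialised hardware or software. gain_for_frame
    returns a per-bin linear gain for one rfft frame."""
    rate, x = wavfile.read(in_path)
    x = x.astype(np.float64)               # assumes a mono file
    win = np.hanning(FFT_SIZE)             # ~unity overlap-add at 50% hop
    y = np.zeros(len(x))
    for s in range(0, len(x) - FFT_SIZE, HOP):
        spec = np.fft.rfft(x[s:s + FFT_SIZE] * win)
        spec = spec * gain_for_frame(spec, rate)   # psychoacoustic processing
        y[s:s + FFT_SIZE] += np.fft.irfft(spec, n=FFT_SIZE)
    y = np.clip(y, -32768, 32767)
    wavfile.write(out_path, rate, y.astype(np.int16))

# e.g. an identity gain leaves the file (nearly) unchanged:
# preprocess_offline("in.wav", "out.wav", lambda spec, rate: 1.0)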
6.2 Hearing Aid DSP Chain or "Real Time Mode"

A further usage of these encoders is to improve the listenability of an audio signal in a hearing aid audio processing chain or any other "real time" application.
Hearing aids require fitting for their signal processing. The standard hearing aid fitting process may also include a fitting process in accordance with the embodiments presented above to generate or modify a custom psychoacoustic model. In one embodiment, the quick fitting system can control the sloping of the spreading function (vf) by a control slider controlling the critical bandwidth via the GUI assessment tool described above. Alternatively, the control slider may control the frequency selectivity, from which the critical bandwidth may be calculated. In some embodiments, this processing may improve the SNR around peaks, followed by Wide Dynamic Range Compression (WDRC) in the hearing aid. Appropriate volume control of the hearing aid increases speech intelligibility or the audibility of the peaks due to the benefits of the SNR improvement. In some embodiments, the quick fitting system is also configurable or operable to adjust the critical bandwidth, masking index and spreading function independently. In some embodiments, the fitting step will be performed once and saved into a memory location in the hearing aid DSP chain system. In other embodiments, the user may adjust the control slider during use to suit the situation or if the user's preferences change.
Figure 1 shows a suggested use of an encoder 102 with a custom psychoacoustic model 106 in an audio processing chain preceding a hearing aid 104. Also shown is the wide dynamic range compression (WDRC) processing block 108. The WDRC block 108 is an audio processing block commonly used in hearing aids (HA). A person skilled in the art will appreciate that other hearing aid processing blocks are known in the art. It will be appreciated that the order of the different processing blocks may also be changed depending on an individual user's preferences or the type of hearing aid used. As shown in Figure 2a, the encoder with a custom psychoacoustic model can also work as a standalone audio processing block.
Data indicative of an individual's psychoacoustic model may also be used to create a standalone audio processor for use in any other real time audio processing applications. This will be useful for a listener with damaged hearing who may not have access to the original recording to modify it, or may not be able to modify the original source. These other real time audio applications will be known to a person skilled in the art. Some examples may be: music and other audio CDs or vinyl, digital music or other audio stored on a user's device, movie sound tracks, radio broadcasts, internet radio, and music streaming services.
7 Advantages
The psychoacoustic models generated by both the first and second embodiments have demonstrated that they are capable of identifying and removing the inaudible components of the audio provided. Removing these inaudible sounds results in an improvement in the listening experience for a hearing impaired user.
The psychoacoustic models generated by both of the embodiments are modifications of the MPEG-Audio psychoacoustic software; however, other encoders or psychoacoustic models may be modified, or entirely new psychoacoustic model software or hardware could be made using the principles of the first and second embodiments. An advantage of these embodiments of the invention is their flexibility and usage in different signal processing chains. Other encoders' psychoacoustic models could also be modified based on the same techniques described in this disclosure.
The flexibility also allows the psychoacoustic model to be used in combination with any other audio processing devices, or as a standalone processing block. As described above, either embodiment may be used as a front end to a user's hearing aid, or as a standalone processing block taking arbitrary audio and outputting audio which has been improved based on an individual's particular hearing characteristics. The second embodiment allows the process of assessing a user's hearing characteristics to be performed quickly and simply. Any time saved is good for both the user being assessed and the clinician performing the assessment. Using a simple, structured GUI both speeds up the process by reducing mistakes and provides a more pleasant experience for the user and clinician. Other protocols for measuring critical bandwidth (or ERB), tone-masking-noise and noise-masking-tone take an unreasonable amount of time for practical use. These protocols need numerous measurements over a large range of frequencies. They also involve the difficulty or confusion inherent in psychoacoustics of judging a just-noticeable level between tone and tone, or tone and noise, under various conditions such as notch noise bandwidth. It can be considered that these facts are why psychoacoustic assessments are not popular in audiology clinics and thus in hearing aid development.
8 Experimental
8.1 Measurement and experiment
The inventor investigated equivalent rectangular bandwidth over five centre frequencies (0.25, 0.5, 1, 2, 4 kHz), manipulated the psychoacoustic model and developed spectral subtraction that decreased large amounts of unnecessary power, resulting in both effective reductions of excessive loudness and associated frequency masking, and improved spectral contrasts around the formants or timbre. The processing effect was investigated by testing speech intelligibility and musical preference (on loudness, fullness, clearness, naturalness and dynamics) with 34 elderly participants. The speech intelligibility (Japanese monosyllables at a speech-to-noise ratio of 5 dB) was degraded very slightly by the processing; however, there was no significant difference. Rating of the pop music, consisting of male vocal and musical instruments, revealed significantly higher ratings in the processed music conditions. In particular, loudness preference was significantly improved.
Testing was conducted in Japan with Japanese speech and music. The participant's right ear was used in the following tests. Absolute thresholds of hearing (ATH) and ERB at 250, 500, 1000, 2000 and 4000 Hz at the right ear were measured by a simplified measurement method of Nakaichi, Watanuki and Sakamoto, using HD-AF equipment of Rion Ltd, Japan with accompanying audiology headphones, AD-06B (Rion Ltd, Japan). ERB was measured with sound pressure levels of 30 dB above the ATH of the test frequency. Participants then proceeded to standard tests of speech reception threshold (SRT) and speech intelligibility (SI). Digits (2-9) in Japanese were presented in the SRT test. In the SI test, twenty Japanese monosyllables (67-S, Japanese Audiological Society, 1987) were presented with the audiometer dial at 30 dB above the SRT. Rb at each test frequency was calculated by Equation 2. Then the average was calculated as the mean of Rb at 500, 1000 and 2000 Hz. The index h was determined so that the average was fitted close to Rb at 1000 Hz by graphical user interface software developed by the inventor.
The participants then proceeded to the speech intelligibility (SI) tests. These included 50 Japanese monosyllables (57-S, Japanese Audiological Society, 1983) mixed with white noise at a signal-to-noise ratio of 5 dB, presented through the participants' headphones. The order of the original and the processed sound sets was counterbalanced. The dial of the audiometer was set to 30 dB above the participant's averaged ATH (the mean of ATHs at 500, 1000 and 2000 Hz).
Finally, participants proceeded to a music preference test. A Japanese pop song of 2 minutes was played (No.47, RWC-MDB-P-2001) through the participant's headphones. This song consisted of musical instruments (two guitars, a bass, drums and a piano) with a male vocal. The tempo was 94 beats per minute. The original and the processed sound sets were played in a counterbalanced order. The dial of the audiometer was set to 50 dB above each participant's averaged ATH. The participants were asked to rate their preference from 1 (poorest) to 5 (best) for the following perceptual rating scales:
Loudness, Fullness, Clearness, Naturalness and Dynamics. A perfect perceptual reproduction score was 25 points (5 x 5 scales).
The data from the 34 participants was then analysed. Average age was 72.7 (SD=6.2). Average hearing level was 21.5 dBHL (SD=11.0). The correlation between age and average hearing level was .482 (p < .01). Prior to this, 20 young adults with normal hearing (8 males and 2 females, average age 23.4, SD=3.4) were investigated to obtain ERB (a and b in Equation 1) to use as the denominator in Equation 2.
8.2 Results and discussion

8.2.1 Verification of the frequency selectivity model
ATH was correlated moderately strongly with ERB (.661, p < .01), Rb (.666, p < .01) and h (.631, p < .01). The scatter data of ATH and Rb shown in Figure 24 indicates the values increasing proportionally with the increase of ATH up to 4.0. Figure 24 shows a number of different values of Rb of the auditory filter for the elderly participants (i.e. test subjects).
The index h was strongly correlated with ERB (.894, p < .01) and Rb (.903, p < .01). h was also moderately strongly correlated with ATH (.631, p < .01) and SRT (.675, p < .01). h was not correlated with SI.
Figures 25a and 25b show a comparison of the frequency selectivity model and the average Rb from our measurement data. Values in the legend indicate h. Rb at 1000 Hz and lower frequencies was reasonably matched to the model; however, Rb at 2000 and 4000 Hz was spread out. Figure 25a shows the frequency selectivity model and figure 25b shows the measurement data. The legend indicates the various h values used.
8.2.2 Results of the spectral enhancement

Speech intelligibility was very slightly degraded by the processing. A paired-samples t-test was conducted and there was no significant difference (p=.33). The mean percentage speech intelligibility for the original speech was 58.0% (SD=.16) and that of the processed speech was 56.0% (SD=.14). A Shapiro-Wilk test revealed normality, thus multiple regression analysis was conducted and revealed that age, ATH and h (shown in figure 26) significantly predict the speech intelligibility of the processed speech (F(3, 18) = 10.833, p < .001). Moore indicated, using their hearing loss simulator, that speech intelligibility was worse when the ERB was greater than normal, and our results supported this. Further analysis using a confusion matrix (classification of the consonants and vowels by place and manner of articulation) was conducted. Shapiro-Wilk revealed no normality in the speech intelligibility by classification of articulation, thus Wilcoxon's signed rank test was conducted to investigate the processing effect on both consonants and vowels, and there were no significant differences. One potential reason for the absence of a processing effect on the intelligibility is that the dominant non-tonal peaks of the consonants were not detected due to the relatively stronger white noise (S/N = 5 dB) mixed in simultaneously.
The total score of the perceptual scales of music preference was significantly improved by the processing, by 1.8 points out of the total score of 25 (p=.002), as shown in Figure 27. The total score of the original music was 16.3 (SD=2.9) while that of the processed music was 18.1 (SD=3.7). All perceptual scales were also improved by the processing. Wilcoxon analysis was conducted to reveal the processing effect on each perceptual scale. This revealed that loudness preference was significantly improved (z=3.076, p=.002). h did not correlate significantly with any perceptual scale or the total scores.
To conclude, this spectral enhancement by manipulating the psychoacoustic model significantly improved music preferences, especially loudness, amongst elderly listeners.
9 General
Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium such as a storage medium or other storage(s). A processor may perform the necessary tasks. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc. In the foregoing, a storage medium may represent one or more devices for storing data, including read-only memory (ROM), random access memory (RAM), magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information. The terms "machine readable medium" and "computer readable medium" include, but are not limited to portable or fixed storage devices, optical storage devices, and/or various other mediums capable of storing, containing or carrying instruction(s) and/or data.
The various illustrative logical blocks, modules, circuits, elements, and/or components described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, circuit, and/or state machine. A processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a number of
microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The methods or algorithms described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executable by a processor, or in a combination of both, in the form of a processing unit, programming instructions, or other directions, and may be contained in a single device or distributed across multiple devices. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. One or more of the components and functions illustrated in the figures may be rearranged and/or combined into a single component or embodied in several components without departing from the invention. Additional elements or components may also be added without departing from the invention. Additionally, the features described herein may be implemented in software, hardware, as a business method, and/or combination thereof.
In its various aspects, the invention can be embodied in a computer-implemented process, a machine (such as an electronic device, or a general purpose computer or other device that provides a platform on which computer programs can be executed), processes performed by these machines, or an article of manufacture. Such articles can include a computer program product or digital information product in which a computer readable storage medium containing computer program instructions or computer readable data stored thereon, and processes and machines that create and use these articles of manufacture. The foregoing description of the invention includes preferred forms thereof.
Modifications may be made thereto without departing from the scope of the invention as defined in the accompanying claims.

Claims

1. A method of improving the listenability of an audio signal for a hearing impaired listener, the method implemented by a processing device having associated memory, comprising:
receiving or retrieving input audio signal;
receiving or retrieving listening data indicative of the hearing impaired listener's hearing characteristics;
generating or modifying a customised psychoacoustic model based on the listening data;
processing the input audio signal to identify and remove or at least partially attenuate inaudible or unhelpful spectral components in the input audio signal based on the customised psychoacoustic model; and
generating a modified output audio signal based on the processing that is customised for the hearing impaired listener.
2. A method according to claim 1 wherein the listening data contains all of the
parameters for generating or modifying the customised psychoacoustic model.
3. A method according to claim 1 wherein the listening data comprises a single hearing characteristic or single configuration parameter for generating or modifying the customised psychoacoustic model.
4. A method according to claim 3, wherein the single hearing characteristic or single configuration parameter is indicative of the listener's auditory filter bandwidth.
5. A method according to claim 3 or claim 4, wherein the listener's auditory filter bandwidth is a function of the single hearing characteristic or single configuration parameter.
6. A method according to claim 3 or claim 4, wherein the single hearing
characteristic or single configuration parameter indexes which auditory filter bandwidth of a selection of different auditory filter bandwidths approximates the listener's auditory filter bandwidths.
7. A method according to claim 3 or claim 4, wherein the single hearing
characteristic or single configuration parameter modifies a default auditory filter bandwidth.
8. A method according to claim 7, wherein the single hearing characteristic or single configuration parameter represents the listener's proportional difference between the default auditory filter bandwidth and the listener's auditory filter bandwidth.
9. A method according to claim 7 or claim 8, wherein the default auditory filter bandwidth is an average person's auditory filter bandwidth.
10. A method according to any one of claims 3-9, wherein the single hearing
characteristic or single configuration parameter is generated as output from an electronic psychoacoustic assessment system.
11. A method according to claim 10, wherein the electronic psychoacoustic
assessment system comprises a GUI.
12. A method according to claim 11, wherein the GUI comprises an adjustable
graphical user interface element which modifies a control variable.
13. A method according to claim 12, wherein the control variable is the single hearing characteristic or single configuration parameter.
14. A method according to any one of claims 12-13 wherein the GUI comprises a graph display.
15. A method according to claim 14 wherein the graph display comprises the user's listening assessment data.
16. A method according to claim 14 or claim 15 wherein adjusting the control variable adjusts a plot displayed in the graph display.
17. A method according to claim 15 wherein the plot displayed represents average frequency selectivity.
18. A method according to claim 17 wherein the user's single hearing characteristic or single configuration parameter is derived from the user's average frequency selectivity.
19. A method according to claim 15 wherein the plot displayed represents the user's single hearing characteristic or single configuration parameter.
20. A method according to any one of claims 12-19 wherein the adjustable graphical user interface element is one or more of: toggle switch, drop down menu, check box, radio button, numerical input, slider scale, or dial.
21. A method according to any one of claims 4-20, wherein the auditory filter bandwidth is a user's critical band bandwidth.
22. A method according to any one of claims 4-20, wherein the auditory filter bandwidth is a user's effective rectangular bandwidth (ERB).
23. A method according to any one of claims 3-22, wherein receiving listening data further comprises generating or determining additional listening data indicative of additional hearing characteristics of the listener based on the single hearing characteristic or single configuration parameter.
24. A method according to claim 23 wherein the additional listening data is indicative of any one or more of the following hearing characteristics of the listener: the listener's tonal masking index, noise masking index and/or spreading function.
25. A method according to any one of claims 3-24 wherein processing the input audio signal comprises:
fitting critical bands of audio of the input audio signal based on the listening data indicative of the listener's critical band bandwidth;
determining an individual masking threshold for each critical band of audio;
determining global masking thresholds based on the determined individual masking thresholds; and
spectrally modifying the input audio signal based on the determined global masking thresholds.
26. A method according to claim 25 wherein determining an individual masking
threshold for each critical band comprises:
determining a sound pressure level of a masking component in the critical band of the input audio signal;
determining at least one masking index based on the listening data indicative of the critical band bandwidth of the listener;
determining a spreading function based on the listening data indicative of the critical band bandwidth of the listener;
determining an individual masking threshold based on the determined sound pressure level of the masking component, the determined at least one masking index, and the determined spreading function.
27. A method according to claim 26 wherein determining the at least one masking index comprises determining the tonal masking index and the non-tonal masking index.
28. A method according to any one of claims 25-27 wherein spectrally modifying the input audio signal comprises:
calculating the signal-to-mask ratio in each critical band based on the global masking thresholds; and applying spectral subtraction to the input audio signal based on the global masking threshold.
29. A method according to any one of the preceding claims wherein generating or modifying the psychoacoustic model comprises: inserting the received listening data into source code representing the psychoacoustic model to be processed by the processing device.
30. A method according to any one of claims 1-28 wherein generating or modifying the customised psychoacoustic model based on the received listening data comprises loading the customised psychoacoustic model into memory from an external source.
31. A method according to any one of claims 1-28 wherein generating or modifying the customised psychoacoustic model comprises generating the psychoacoustic model in real-time based on the received listening data.
32. An audio processor configured to improve the listenability of an audio signal for a hearing impaired listener, the audio processor comprising a processor and associated memory, and which is configured to carry out the method according to any one of claims 1-31.
33. An audio processor according to claim 32 wherein the audio processor is provided in a hearing aid.
34. An audio processor according to claim 32 wherein the audio processor is provided as an application program executable on a programmable electronic device.
35. A computer-readable medium having stored thereon computer executable
instructions that, when executed on a processing device or devices, cause the processing device or devices to perform a method of any one of claims 1-31.
PCT/IB2017/056393 2016-10-14 2017-10-16 Audio-system and method for hearing-impaired WO2018069900A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NZ725277 2016-10-14
NZ72527716 2016-10-14

Publications (1)

Publication Number Publication Date
WO2018069900A1 (en) 2018-04-19

Family

ID=61905194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2017/056393 WO2018069900A1 (en) 2016-10-14 2017-10-16 Audio-system and method for hearing-impaired

Country Status (1)

Country Link
WO (1) WO2018069900A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090028362A1 (en) * 2007-07-27 2009-01-29 Matthias Frohlich Hearing device with a visualized psychoacoustic variable and corresponding method
US20090103742A1 (en) * 2007-10-23 2009-04-23 Swat/Acr Portfolio Llc Hearing Aid Apparatus
US20130024201A1 (en) * 2007-10-31 2013-01-24 Cambridge Silicon Radio Limited Adaptive tuning of the perceptual model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIWARI, N. ET AL.: "Speech enhancement and multi-band frequency compression for suppression of noise and intraspeech spectral masking in hearing aids", 2013 ANNUAL IEEE INDIA CONFERENCE, 2013, XP055500045, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/document/6726008> [retrieved on 20180105] *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10827265B2 (en) * 2018-01-25 2020-11-03 Cirrus Logic, Inc. Psychoacoustics for improved audio reproduction, power reduction, and speaker protection
US20190230438A1 (en) * 2018-01-25 2019-07-25 Cirrus Logic International Semiconductor Ltd. Psychoacoustics for improved audio reproduction, power reduction, and speaker protection
US10993049B2 (en) 2018-07-20 2021-04-27 Mimi Hearing Technologies GmbH Systems and methods for modifying an audio signal using custom psychoacoustic models
US10966033B2 (en) 2018-07-20 2021-03-30 Mimi Hearing Technologies GmbH Systems and methods for modifying an audio signal using custom psychoacoustic models
US20200027467A1 (en) * 2018-07-20 2020-01-23 Mimi Hearing Technologies GmbH Systems and methods for encoding an audio signal using custom psychoacoustic models
EP3598440A1 (en) * 2018-07-20 2020-01-22 Mimi Hearing Technologies GmbH Systems and methods for encoding an audio signal using custom psychoacoustic models
US10909995B2 (en) * 2018-07-20 2021-02-02 Mimi Hearing Technologies GmbH Systems and methods for encoding an audio signal using custom psychoacoustic models
EP3614379B1 (en) 2018-08-20 2022-04-20 Mimi Hearing Technologies GmbH Systems and methods for adaption of a telephonic audio signal
US10871940B2 (en) 2018-08-22 2020-12-22 Mimi Hearing Technologies GmbH Systems and methods for sound enhancement in audio systems
EP3718476A1 (en) * 2019-04-02 2020-10-07 Mimi Hearing Technologies GmbH Systems and methods for evaluating hearing health
WO2021249611A1 (en) * 2020-06-08 2021-12-16 Huawei Technologies Co., Ltd. A control device for performing an acoustic calibration of an audio device
US11153682B1 (en) 2020-09-18 2021-10-19 Cirrus Logic, Inc. Micro-speaker audio power reproduction system and method with reduced energy use and thermal protection using micro-speaker electro-acoustic response and human hearing thresholds
US11159888B1 (en) 2020-09-18 2021-10-26 Cirrus Logic, Inc. Transducer cooling by introduction of a cooling component in the transducer input signal
WO2023169755A1 (en) * 2022-03-07 2023-09-14 Widex A/S Method for operating a hearing aid
CN117093182A (en) * 2023-10-10 2023-11-21 荣耀终端有限公司 Audio playing method, electronic equipment and computer readable storage medium
CN117093182B (en) * 2023-10-10 2024-04-02 荣耀终端有限公司 Audio playing method, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
WO2018069900A1 (en) Audio-system and method for hearing-impaired
Kates et al. The hearing-aid speech perception index (HASPI)
US8812308B2 (en) Apparatus and method for modifying an input audio signal
US8391212B2 (en) System and method for frequency domain audio post-processing based on perceptual masking
US6934677B2 (en) Quantization matrices based on critical band pattern information for digital audio wherein quantization bands differ from critical bands
US20070239294A1 (en) Hearing instrument having audio feedback capability
JP2009532739A (en) Calculation and adjustment of perceived volume and / or perceived spectral balance of audio signals
EP3457402B1 (en) Noise-adaptive voice signal processing method and terminal device employing said method
CN111161699B (en) Method, device and equipment for masking environmental noise
Edraki et al. Speech intelligibility prediction using spectro-temporal modulation analysis
EP1841284A1 (en) Hearing instrument for storing encoded audio data, method of operating and manufacturing thereof
Kates et al. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality
Huber Objective assessment of audio quality using an auditory processing model
DK2535894T3 (en) Methods and devices in a telecommunications network
Liu et al. STRAIGHT: A new speech synthesizer for vowel formant discrimination
WO2007034375A2 (en) Determination of a distortion measure for audio encoding
US11224360B2 (en) Systems and methods for evaluating hearing health
JP2011141540A (en) Voice signal processing device, television receiver, voice signal processing method, program and recording medium
Arslan Determination of Optimum Parameters for Cochlear Implants Speech Processors by Using Objective Measures
WO2024008928A1 (en) Masking threshold determinator, audio encoder, method and computer program for determining a masking threshold information
Christiansen Digital speech processing in the context of a human auditory model
Houtsma Perceptually Based Audio Coding
WO2017025107A2 (en) Talker language, gender and age specific hearing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17860120

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17860120

Country of ref document: EP

Kind code of ref document: A1