WO2007106384A1 - Music compatible headset amplifier with anti-startle feature - Google Patents

Music compatible headset amplifier with anti-startle feature

Publication number
WO2007106384A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
audio signal
spl
speech
delta
Prior art date
Application number
PCT/US2007/006035
Other languages
French (fr)
Inventor
David Huddart
Original Assignee
Plantronics, Inc.
Priority date
Filing date
Publication date
Application filed by Plantronics, Inc.
Publication of WO2007106384A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G7/00 Volume compression or expansion in amplifiers
    • H03G7/002 Volume compression or expansion in amplifiers in untuned or low-frequency amplifiers, e.g. audio amplifiers

Definitions

  • a telephone headset provides a speaker contained within an earpiece positioned over the user's ear.
  • the sound level of the acoustic signal emitted by the speaker should fall within a specified sound intensity range. Above the specified intensity range, the excessive sound level may cause discomfort for the user and/or damage the user's hearing. Thus, excessively high
  • Excessively high sound levels may be caused by various events. For example, accidental disturbances within a communication connection, such as an amplifier malfunction, intense feedback, incorrect signal source, and/or a phone line shorted to a power line, may cause dramatic increases in the electrical signal level input to a transducer that converts electrical signals to acoustic signals.
  • the transient time for the acoustic signal to reach excessively high levels may be very short, such that a user often does not have sufficient time to move the listening device away from the ear in time to prevent exposure to the high sound levels.
  • a handset user may be able to quickly move the handset speaker away from the ear as the user is typically already holding the handset in the hand, it may take a hands-free headset user longer to bring the hand to the headset in order to move the headset earpiece away from the ear.
  • headsets are particularly suitable for users who are on the telephone for long periods of time, e.g., telemarketers, receptionists, and operators. Thus, because of the extra time required to remove a headset away from the ear and the potentially longer
  • acoustic startle i.e., the involuntary contraction of bodily muscles resulting from unexpected moderate or intense acoustic stimuli with rapid onset.
  • Headsets and other audio output devices often employ audio limiting devices on the receiver input terminals in order to limit the voltage and thus the maximum sound level from the headset receiver.
  • Most conventional audio limiting devices either clip or compress the electrical signal that drives the headset, which prevents the electrical signal from exceeding a specified peak-to-peak or root mean square (rms) voltage.
  • rms root mean square
  • Headset amplifiers may include automatic gain controls. These gain controls
  • Acoustic startle measures in communications systems. Acoustic startle is a well-documented and understood phenomenon and, as its name suggests, it relates to surprise or shock at sudden or unexpected noise. Acoustic startle is a psycho-acoustic effect. It is caused when a sudden increase in sound level is unexpected by the recipient. The levels at which startle can occur are quite low and linked to individual users. Unlike excessively high sound levels, there is currently no established link between acoustic startle and hearing damage. As such it may be classified more as a comfort feature rather
  • an audio output device that limits sounds that exceed a specified sound pressure level threshold and thus prevent discomfort caused by loud sounds. It is also desirable to provide an audio output device that reduces the likelihood and intensity of acoustic startle.
  • headset amplifiers are capable of being connected to either a telecommunications device or an external sound source such as an MP3/CD player or PC, allowing the user to engage in
  • FIG. 1 is a flowchart illustrating the operation of the invention in one example.
  • FIG. 2 illustrates an example of the hardware architecture in one example of the invention.
  • FIG. 3 illustrates a headset amplifier application in one example of the
  • FIG. 4 is a block diagram illustrating an exemplary audio processing system
  • SPL sound pressure level
  • FIG. 5 is a block diagram illustrating an exemplary true-SPL converter employing single band processing.
  • FIG. 6 is a block diagram illustrating an alternative exemplary true-SPL converter employing multi-band processing.
  • FIG. 7 is a block diagram illustrating an exemplary SPL incident detector.
  • FIG. 8 is a flowchart illustrating an exemplary process for limiting the sound
  • FIG. 9 is a graph illustrating an exemplary anti-startle boundary in a SPL increase vs. rise time variable space.
  • FIG. 10 is a graph illustrating the anti-startle boundary in the SPL increase
  • FIG. 11 is a block diagram illustrating an exemplary delta incident detector.
  • FIG. 12 is a graph illustrating a delta detector response measured using the
  • FIG. 13 is a flowchart illustrating an exemplary process for limiting the delta value.
  • FIG. 14 are graphs illustrating an exemplary measured delta limiter response.
  • FIG. 15 are graphs illustrating an exemplary combined SPL and delta limiter response.
  • the present invention provides a solution to the needs described above through an inventive method and apparatus for processing audio signals for music listening and speech communications.
  • the present invention provides a method and apparatus for processing an audio signal.
  • the method and apparatus may be used in systems such as those that play sound via an audio device located close to the listener's ear or via a loudspeaker or other
  • an audio signal is received.
  • the audio signal is classified as a speech signal or a music signal.
  • the audio signal is further processed responsive to whether the audio signal is a speech signal or a music signal. If the audio signal is a speech signal, the processing includes anti-startle processing.
  • the classification and signal processing occurs within a headset amplifier.
  • the headset amplifier and associated headset may be used with any electronic device where speech or music may be output and there has not been a previous classification.
  • the signal processing is performed within a host personal computer, such as in voice over Internet Protocol (VoIP) applications where the headset is directly connected to the personal computer.
  • VoIP voice over Internet Protocol
  • FIG. 1 is a flow chart illustrating the operation of the invention in one
  • an audio signal is received for processing.
  • the audio signal is classified as a speech signal or a music signal.
  • the audio signal is examined to determine whether it is a speech signal rather than a music signal. If yes, at block 16 anti-startle processing is performed on the speech signal. If no, the anti-startle processing at block 16 is bypassed and various signal processing of the audio signal is performed at block 18.
  • signal processing at block 18 may include SPL processing as described below in reference to FIG. 4.
  • the audio signal is a music signal, no further signal processing is performed and the audio signal is output to the user.
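The FIG. 1 flow described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the classification result is passed in as a flag, and it is assumed that speech continues on to block 18 after the anti-startle step.

```python
from typing import Callable, List

Samples = List[float]


def process_audio(
    samples: Samples,
    is_speech: bool,
    anti_startle: Callable[[Samples], Samples],
    further_processing: Callable[[Samples], Samples],
) -> Samples:
    """Route one block of audio according to its speech/music classification."""
    if is_speech:
        # Block 16: anti-startle processing is applied to speech only.
        samples = anti_startle(samples)
    # Block 18: remaining processing (for example SPL limiting).
    # Assumption: speech also passes through this stage after block 16.
    return further_processing(samples)


def halve(block: Samples) -> Samples:
    """Hypothetical stand-in for an anti-startle stage."""
    return [0.5 * x for x in block]


def passthrough(block: Samples) -> Samples:
    """Hypothetical stand-in for block 18."""
    return list(block)
```

For the variant in which music receives no further processing at all, `further_processing` would simply be skipped on the music path.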
  • the received audio signal may be continuously monitored, with the
  • audio signal is a speech signal.
  • Additional signal processing may utilize audio signal enhancing plug-ins such as those available from SRS Labs or FX Sound.
  • the classification of the audio signal as a speech signal or a music signal at block 12 may be performed using a variety of signal processing techniques. In one example, spectral analysis is used. A fast Fourier transform DSP algorithm analyzes the audio signal received by the amplifier in different frequency bands. For example, the signal may be analyzed in half octave frequency bands. From this analysis, the spectral power density of differing bands is compared. The spectral characteristics of a speech signal tend to demonstrate high peaks in single sub-octave bands relative to adjacent bands and most energy is in the frequency range between 300 and 3000 Hz. Conversely, a music signal will tend to have similar energy in adjacent bands (averaged over a short period) and significant energy above 3000 Hz and below 300 Hz. An algorithm based on this technique provides a continuous probability (0 to 100%) of the current signal being
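One of the cues named above, the share of energy between 300 and 3000 Hz, can be illustrated with a short sketch. It is a simplification rather than the classifier described in the text: it uses a naive DFT instead of an FFT, it does not form half-octave bands or compare adjacent bands, and the function names are hypothetical.

```python
import cmath
import math
from typing import List


def power_spectrum(samples: List[float]) -> List[float]:
    """Naive DFT power for bins 0..N/2 (adequate for short frames only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def speech_band_fraction(samples: List[float], fs_hz: float,
                         lo_hz: float = 300.0, hi_hz: float = 3000.0) -> float:
    """Fraction of non-DC energy that falls inside [lo_hz, hi_hz]."""
    power = power_spectrum(samples)
    n = len(samples)
    total = sum(power[1:])  # ignore the DC bin
    if total == 0.0:
        return 0.0
    in_band = sum(p for k, p in enumerate(power)
                  if k >= 1 and lo_hz <= k * fs_hz / n <= hi_hz)
    return in_band / total
```

A value near 1 is consistent with the speech-like energy distribution described above, while a low value points to significant energy outside the 300 to 3000 Hz range. On its own this single cue is not a speech/music probability.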
  • Additional classification techniques include Gaussian mixture model, Gaussian model classification and nearest-neighbor classification. These techniques use statistical analyses of underlying features of the audio signal, either in a long or short period of measurement time, resulting in separate long-term and short-term features.
  • Once the classification is made, the switch from a speech classification to a music classification and vice versa occurs at a predetermined threshold. The assessment of speech versus music is a continuous process. For any particular example
  • a threshold algorithm can be derived.
  • the threshold has a time and hysteresis factor built in that prevents undesirable hunting between the two states.
  • the switching characteristic may have a soft transition so as not to be noticeable to the user, except in that the benefits of this invention result in good music fidelity plus protection from startle during speech communications.
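A threshold with a time and hysteresis factor, as described above, might look like the following sketch. The two thresholds, the hold count, and the class name are hypothetical values chosen for illustration; the text does not specify them, and the soft transition is not modelled.

```python
class SpeechMusicSwitch:
    """Two-state switch with hysteresis and a minimum hold time."""

    def __init__(self, to_speech: float = 0.7, to_music: float = 0.3,
                 hold_frames: int = 5) -> None:
        self.to_speech = to_speech      # probability needed to leave "music"
        self.to_music = to_music        # probability needed to leave "speech"
        self.hold_frames = hold_frames  # consecutive frames required to switch
        self.state = "music"
        self._count = 0

    def update(self, p_speech: float) -> str:
        """Feed one speech-probability estimate (0..1) and return the state."""
        if self.state == "music":
            wants_switch = p_speech >= self.to_speech
        else:
            wants_switch = p_speech <= self.to_music
        # The counter resets whenever the evidence is interrupted.
        self._count = self._count + 1 if wants_switch else 0
        if self._count >= self.hold_frames:
            self.state = "speech" if self.state == "music" else "music"
            self._count = 0
        return self.state
```

Because the two thresholds differ and a run of consistent frames is required, a probability hovering around 0.5 leaves the state unchanged, which is the hunting behaviour the text aims to prevent.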
  • the system 20 typically includes at least one processing unit 22 and memory 32.
  • Processing unit 22 interfaces with memory 32 and
  • Processing unit 22 processes information and instructions used by system 20.
  • Memory 32 is any type of memory that can be used to store code and data for processing unit 22.
  • memory 32 may include volatile memory 28 (such as RAM), non-volatile memory 30 (such as ROM, flash memory, etc.) or some combination of the two.
  • volatile memory 28 such as RAM
  • non-volatile memory 30 such as ROM, flash memory, etc.
  • the communication connection 24 may include wired media such as a direct-wired connection, and wireless media such as RF.
  • the device on which system 20 is implemented may have a variety of features and functionality.
  • the implementation device may utilize several forms of computer storage media.
  • the computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Memory 32 may be incorporated or integrated with the computer storage media of the implementation device.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology.
  • the computer storage media includes CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the implementation device on which system 20 is implemented.
  • system 20 may be implemented on a headset amplifier 34.
  • system 20 is independent of the electronic device to which it is attached and can therefore be used with a variety of electronic devices.
  • the headset amplifier 34 may have multiple inputs
  • a headset 36 is connected to a headset amplifier 34 which, in turn, is connected to an electronic device 38.
  • the electronic device 38 may be a telephone, digital music player, PDA, or an integrated device combining
  • the headset 36 includes at least one speaker and a microphone.
  • the headset amplifier 34 is generally used to amplify signals to or from electronic device 38.
  • the headset amplifier 34 receives the audio signal from electronic device 38, limits the maximum amplitude of the audio signal to improve user safety, and provides a power output to drive the speaker of the headset 36.
  • the headset amplifier 34 may provide power for the headset microphone, receives the audio signal from the microphone, and modifies the gain of the audio signal from the microphone.
  • an electret microphone is used, which requires that headset amplifier 34 supply DC power of a few volts at between 15 and several hundred
  • headset amplifier 34 includes system 20 for performing digital signal processing on the audio signal in addition to amplification.
  • the headset amplifier 34 may provide automatic gain control to protect the user by limiting the maximum volume level output to the user.
  • Headset amplifier 34 may receive power from a variety of sources. For example, it may draw current from electronic device 38. Headset amplifier 34 may also be powered with a battery or from power derived from the USB port of a PC or from an AC wall outlet using a DC power supply.
  • Referring again to FIG. 1, in an example of the invention the anti-startle processing at block 16 comprises systems and methods for a sound pressure level limiter with an anti-startle feature as described and illustrated in reference to FIGS. 4-14. In a
  • block 16 processing includes only anti-startle processing and does not include sound pressure level limiting processing.
  • the anti-startle feature generally involves detecting fast rise time signals that are likely to cause acoustic startle and slowing the rise time of such signals.
  • the anti-startle feature may be implemented with a delta incident detector for detecting delta
  • a delta limiter acoustic incidents that exceed a predetermined acoustic startle boundary
  • the predetermined acoustic startle boundary may be a function of signal rise time and sound pressure level (SPL) increase.
  • the delta incident detector may detect
  • the estimated true SPL may be measured with a microphone located at, or close to, the chosen datum point (e.g. a microphone located in the headset receiver assembly).
  • the true SPL may be estimated based on the electrical signal that drives the headset receiver and the measured receiving frequency response of the transducer.
  • An SPL limiter may also be implemented with or without the anti-startle feature to determine an SPL gain in response to detecting an SPL acoustic incident that exceeds a predetermined SPL threshold. The detection of the SPL acoustic incident may be based on the estimated true SPL.
  • the anti-startle gain can be associated with an anti-startle gain limit and release time.
  • the delta limiter may set the anti-startle gain to the anti-startle gain limit and
  • delta limiter release phase in which the delta limiter increases the anti-startle gain over a period of time associated with the anti-startle release time until the anti-startle gain reaches unity (1).
  • FIG. 4 is a block diagram illustrating an exemplary audio processing system 50 implementing sound pressure level (SPL) limiting and anti-startle features.
  • the systems and methods described herein may be utilized for audio devices located close to the listener's ear such as a headset, handset, mobile phone, headphone, or earphone, as well as audio devices located at a distance from the listener's ear such as loudspeakers or other transducers located distant from the listener.
  • the audio processing system 50 generally includes a true SPL estimator or processor 52, an SPL incident detector 54, an SPL limiter 56, a delta incident
  • ADC analog to digital converter
  • DAC digital to analog converter
  • the audio signal passes only through the delay element 62 and the amplifier 64 and the remaining components, i.e., the true SPL processor 52 and the SPL and delta incident detectors and limiters 54-60, implement signal analysis and gain control functions.
  • the true SPL processor 52 estimates the sound pressure level at the user's ear,
  • the audio processing system 50 uses the estimated true SPL rather than the electrical signal level delivered to the headset receiver as the basis for SPL and delta limiting.
  • Such use of the true SPL (or estimated true SPL) helps to ensure that the delta
  • the SPL incident detector 54 receives the (estimated) true SPL waveform and measures the mean square sound pressure level to detect for an SPL incident.
  • the SPL limiter 56 calculates the SPL gain reduction depending on the results of the SPL incident detection so as to limit the sound pressure level below a predetermined SPL threshold.
  • the SPL gain reduction $\mathrm{Gain}_{SPL}$ is then applied by the VGA 64 or a digital gain-control block (not shown) that performs the same function as the VGA in the digital domain.
  • the delta detector 58 detects acoustic incidents that have a high likelihood of causing acoustic startle in the user, based on the rise time and amount of increase in the sound pressure level. For example, delta incident detector 58 may base its determination on whether the combination of the increase in the sound level, suddenness of the increase in the sound level, and the absolute sound level is likely to cause acoustic startle in the user.
  • the delta limiter 60 then generates a time-varying control signal for the VGA 64 to slow the rise time of the increase in the sound pressure level.
  • the time-varying gain control of the delta limiter 60 may use a
  • the combination of the delta detector 58 and the delta limiter 60 thus facilitates in preventing acoustic startle in the user, i.e., the anti-startle feature.
  • element 62 applies a short look-ahead delay, typically a few milliseconds, to ensure that
  • $\mathrm{Gain}_{SPL}$ and $\mathrm{Gain}_{delta}$ are applied slightly before they are actually needed so as to prevent any loud glitches occurring as the VGA 64 responds to increases in signal level.
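The effect of the look-ahead delay can be illustrated with a minimal sketch in which the gain is computed from the newest sample while the audio path is delayed. The function name and the `gain_for` callback are hypothetical; a real limiter would derive the gain from the SPL and delta limiters rather than from a single sample.

```python
from collections import deque
from typing import Callable, Iterable, List


def apply_gain_with_lookahead(samples: Iterable[float],
                              gain_for: Callable[[float], float],
                              delay_samples: int) -> List[float]:
    """Apply a gain derived from the undelayed signal to a delayed copy."""
    delay_line = deque([0.0] * delay_samples)
    out = []
    for x in samples:
        gain = gain_for(x)              # control path sees the newest sample
        delay_line.append(x)
        delayed = delay_line.popleft()  # audio path lags by delay_samples
        out.append(gain * delayed)
    return out


def simple_gain(x: float) -> float:
    """Hypothetical gain rule: halve the gain when a loud sample is seen."""
    return 0.5 if abs(x) > 0.5 else 1.0
```

With a two-sample delay, the gain is already reduced for the two quiet samples that precede a loud one at the output, so the loud sample never emerges at full gain.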
  • the audio processing system 50 uses the true SPL instead of electrical signal level as the basis for SPL and delta limiting.
  • True SPL processing enables consistent limiting at the same sound level regardless of changes in the signal spectrum or audio transducer. Such consistent limiting at the same sound level is particularly applicable to headsets, handsets, etc. that are used in a fixed position close to
  • true SPL processing may also be used by audio processing systems in applications with
  • True SPL is measured at a chosen datum point such as at an eardrum reference point (DRP), ear reference point (ERP) or equivalent open-field sound pressure level.
  • the true SPL may be directly measured using a microphone located at, or close to, the chosen datum point.
  • a microphone mounted in the headset receiver assembly may directly measure the SPL at ERP.
  • the true SPL may be estimated based on a measurement made at a different point.
  • the SPL at DRP may be estimated from the SPL at ERP by passing the output signal from a probe microphone located at ERP through a filter whose frequency response is equal to the ERP-to-DRP
  • the true SPL may be measured with a probe microphone located at or close to the datum point in some applications, in many cases, such direct measurement of the true SPL may be impractical or difficult.
  • the audio processing system 50 typically employs the true SPL processor 52 to estimate or calculate the true SPL from
  • the headset's receiving frequency response can be measured and combined with an A-weighting response to form a composite true SPL estimation filter.
  • the true SPL estimation filter may also include DAC gain and power amplifier gain as a function of frequency.
  • the true SPL estimation filter can be a combination of the headset's receiving frequency response with the DAC and power amplifier frequency response and the A-weighting response.
  • the true SPL estimation filter models the electroacoustic transmission path between the SPL limiting device and the user's eardrum or other chosen datum point.
  • the true SPL estimation filter processes the digital signal driving the DAC to estimate the A-weighted sound pressure waveform that is present at the user's eardrum, from which the A-weighted SPL may be calculated.
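As a rough illustration of how the composite estimation filter combines its parts, the sketch below adds the responses in decibels at a single frequency. The A-weighting curve is the standard analog definition; the headset and DAC/amplifier figures passed in are hypothetical inputs that would come from measurement, and the function names are not from the source.

```python
import math


def a_weighting_db(f_hz: float) -> float:
    """Standard A-weighting magnitude in dB (about 0 dB at 1 kHz)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00


def composite_response_db(f_hz: float, headset_db: float,
                          dac_amp_db: float = 0.0) -> float:
    """Cascade of responses: magnitudes in dB add at each frequency.

    headset_db and dac_amp_db are assumed to be measured values for the
    headset receiving response and the DAC plus power amplifier gain.
    """
    return headset_db + dac_amp_db + a_weighting_db(f_hz)
```

Evaluating `composite_response_db` over a grid of frequencies gives the target magnitude response that a single-band estimation filter would need to approximate.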
  • FIGS. 5 and 6 are block diagrams illustrating exemplary true SPL estimators 52a, 52b employing single band processing and multi-band processing, respectively.
  • single band true SPL estimator 52a as shown in FIG. 5 implements an electrical or digital
  • the multi-band true SPL estimator 52b as shown in FIG. 6 uses a separate gain
  • the true SPL estimator 52a, 52b may include the true SPL estimator 52a, 52b.
  • true SPL estimation filter 52 processes the digital signal driving the DAC, or the electrical signal driving the headset's receiver, to estimate the A-weighted SPL that is present at the wearer's eardrum or other chosen datum point.
  • the estimator 52 can implement frequency-weighted SPL measurement and limiting but cannot distinguish between narrowband and wideband signals of the same power.
  • the electrical signal is split into multiple frequency bands $f_1, f_2, f_3, \ldots, f_n$ using an analysis filter bank 82 or block transform.
  • frequency-weighted limiting may be implemented by replacing the A-weighting frequency response 78, 88 with the alternative frequency response that is required.
  • Multi-band processing 52b allows independent narrowband and whole band SPL measurements. For example,
  • multi-band processing 52b can be configured to limit high frequency narrowband signals to a lower level than single band processing, if both systems are configured to limit the whole-band SPL to the same level, for example.
  • the accuracy of SPL limiting depends on the accuracy of the SPL measurement. When SPL is estimated from the electrical signal driving the headset or loudspeaker, one factor affecting the accuracy of the SPL measurement is the accuracy with which the receiving frequency response of the transducer is known. Very accurate SPL calculation may be achieved if the receiving frequency response for the specific headset in use has been measured. Less accurate SPL calculation may be achieved if an average frequency response for the headset type or model is used. An even less accurate SPL calculation results if a generic average frequency response for several headset models is used.
  • the true SPL processor 52 outputs an estimated true SPL waveform P to both the SPL incident detector 54 and the delta incident detector 58.
  • waveform P is a waveform whose instantaneous level represents the sound pressure (e.g. Pascals, A-weighted) at the selected acoustic reference point, e.g., at the DRP.
  • the SPL incident detector 54 detects when the SPL exceeds a predetermined SPL threshold
  • FIG. 7 is a block diagram illustrating an exemplary SPL incident detector 54 for an audio processing system that uses single-band true SPL processing.
  • the SPL incident detector 54 includes a squarer $x^2$ 102, a lowpass filter 104 with an associated time
  • the time constant TSPL is approximately 20 ms and
  • the lowpass filter 104 may be a first-order infinite impulse response (IIR) filter implementing: $y_n = A\,y_{n-1} + (1 - A)\,x_{n-1}$, where:
  • $f_s$ is the sampling frequency, which is typically 8 kHz or 16 kHz for a telecommunications device but may be any other suitable frequency.
  • $x_n$ is the filter input (nth sample in the time series), and $y_n$ is the filter output (nth sample in the time series).
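The squarer followed by the first-order recursion above can be sketched as follows. The definition of the coefficient A is not shown in this text, so the sketch assumes the common choice A = exp(-1 / (f_s · τ)) for a time constant τ; that formula and the function names are assumptions, not taken from the source.

```python
import math
from typing import Iterable, List


def smoothing_coefficient(tau_s: float, fs_hz: float) -> float:
    """Assumed mapping from time constant to A: A = exp(-1 / (fs * tau))."""
    return math.exp(-1.0 / (fs_hz * tau_s))


def mean_square_level(pressure: Iterable[float], tau_s: float = 0.020,
                      fs_hz: float = 8000.0) -> List[float]:
    """Square each sample, then smooth with y[n] = A*y[n-1] + (1-A)*x[n-1]."""
    a = smoothing_coefficient(tau_s, fs_hz)
    y = 0.0
    x_prev = 0.0
    out = []
    for p in pressure:
        y = a * y + (1.0 - a) * x_prev  # uses the previous squared sample
        out.append(y)
        x_prev = p * p                  # squarer output feeds the next step
    return out
```

For a constant input the output converges to the square of that input, and under the assumed coefficient it covers about 63% of the distance after one time constant (160 samples at 8 kHz for 20 ms).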
  • An SPL incident detector 54 for an audio processing system using multi-band true SPL processing would employ a narrowband SPL incident detector 54 similar to that shown in FIG. 7 for each frequency band.
  • a whole band SPL incident detector may also be implemented using the mean square sum of the sub-band signal levels.
  • narrowband SPL limiting thresholds are typically lower than the whole band SPL limiting threshold and may vary with frequency.
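The multi-band arrangement described above can be illustrated with a small sketch that checks each sub-band against its own threshold and checks the sum of the sub-band mean squares against a whole band threshold. The function name is hypothetical, all levels are assumed to be mean-square values in the same units, and the bands are assumed not to overlap so that their powers add.

```python
from typing import List, Sequence, Tuple


def detect_spl_incidents(band_mean_squares: Sequence[float],
                         band_thresholds: Sequence[float],
                         whole_band_threshold: float) -> Tuple[List[bool], bool]:
    """Return per-band incident flags and the whole band incident flag."""
    narrowband = [ms > thr
                  for ms, thr in zip(band_mean_squares, band_thresholds)]
    # Whole band level taken as the sum of the sub-band mean squares.
    whole_band_ms = sum(band_mean_squares)
    return narrowband, whole_band_ms > whole_band_threshold
```

A single strong narrowband component can therefore trigger its own band without the whole band threshold being reached, which matches the note that narrowband thresholds are typically lower.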
  • the output of the SPL incident detector 54 drives the SPL limiter 56 which in turn reduces the SPL generated in the headset when SPL incidents are detected by controlling the gain of the VGA.
  • the SPL limiter 56 reduces the SPL generated in the headset when SPL incidents are detected by controlling the gain of the VGA.
  • the SPL limiter 56 may apply a fixed attenuation $A_{SPL}$, e.g., 40 dB, with attack time $t_{SPL\_attack}$ and release time $t_{SPL\_release}$.
  • a fixed attenuation $A_{SPL}$ of approximately 40 dB is generally sufficient to reduce the loudest sounds that can occur on a telephone network to a comfortable level at or below normal speech level while still allowing the user to detect that an acoustic incident has occurred.
  • SPL limiting threshold $SPL_{max}$ of 100 dB(A) reduces such signals to a minimum level of 60 dB(A), which is clearly audible in most situations.
  • the SPL limiter 56 may be implemented in various other suitable ways. Merely as an example, rather than applying a fixed 40 dB attenuation, the SPL limiter 56 may apply an attenuation equal to the amount by which the input signal exceeds the SPL incident threshold. As is evident, various other implementations of the SPL limiter 56 may be employed to reduce the SPL below the SPL incident threshold.
  • Attack and release may have logarithmic rather than linear or exponential characteristics as a human listener tends to perceive logarithmic attacks and releases as
  • attack time $t_{SPL\_attack}$ is approximately 50 ms and the release time $t_{SPL\_release}$ is approximately 250 ms.
  • a non-instantaneous attack time $t_{SPL\_attack}$ ensures that the natural peaks of speech are generally unaffected even when listening to loud speech with an rms signal level close to the
  • FIG. 8 is a flowchart illustrating an exemplary SPL limiting process 108 for limiting the sound pressure level as performed by the SPL limiter 56.
  • process 108 shown is performed by the SPL limiter 56 for each new audio sample.
  • the SPL limiter enters a limiting phase.
  • the SPL limiter determines whether the SPL gain $\mathrm{Gain}_{SPL}$ exceeds a predetermined SPL gain limit $\mathrm{Gain}_{SPL\_limit}$ at decision block 112. If so, then the SPL limiter enters an attack phase at block 114 and sets the
  • $\mathrm{Gain}_{SPL} = \mathrm{Gain}_{SPL} \cdot k_{SPL\_attack}$
  • $k_{SPL\_attack}$ is the SPL attack constant
  • each iteration of the SPL limiting process 108 decreases the SPL gain $\mathrm{Gain}_{SPL}$ until it reaches the predetermined SPL gain limit $\mathrm{Gain}_{SPL\_limit}$.
  • Once the SPL gain $\mathrm{Gain}_{SPL}$ has reached the predetermined SPL gain limit $\mathrm{Gain}_{SPL\_limit}$, i.e., the SPL gain $\mathrm{Gain}_{SPL}$ is equal to or less than the predetermined SPL gain limit $\mathrm{Gain}_{SPL\_limit}$ as determined at decision block 112, the SPL gain $\mathrm{Gain}_{SPL}$ is set equal to the predetermined SPL gain limit $\mathrm{Gain}_{SPL\_limit}$ at block 116, i.e., steady state attenuation by the SPL limiter.
  • SPL limiter determines whether the SPL gain $\mathrm{Gain}_{SPL}$ is less than unity (1) at decision block 118. If so, the SPL limiter is in a release phase and, at block 120, the SPL limiter
  • $\mathrm{Gain}_{SPL} = \mathrm{Gain}_{SPL} \cdot k_{SPL\_release}$, where $k_{SPL\_release}$ is the SPL release constant:
  • each iteration of the SPL limiting process 108 increases the SPL gain $\mathrm{Gain}_{SPL}$ until it reaches unity (1), i.e., the release phase is complete and no attenuation is applied by the SPL limiter.
  • once the SPL gain $\mathrm{Gain}_{SPL}$ has reached or exceeded unity, as determined at decision block 118, the SPL gain $\mathrm{Gain}_{SPL}$ is set equal to unity (1) at block 122, i.e., no attenuation is applied by the SPL limiter.
  • the attack time $t_{SPL\_attack}$ is approximately 50 ms
  • the release time $t_{SPL\_release}$ is approximately 250 ms
  • the SPL gain limit $\mathrm{Gain}_{SPL\_limit}$ is approximately 0.01, i.e. 40 dB attenuation.
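The per-sample limiting process can be sketched as a small state machine. The gain limit of 0.01 corresponds to 40 dB because 20·log10(0.01) = -40 dB. The formulas for the attack and release constants are not shown in this text, so the sketch assumes constants chosen so that repeated multiplication spans the range between unity and the gain limit in the stated attack and release times; that choice, the class name, and the `incident` flag are assumptions.

```python
class SplLimiter:
    """Per-sample SPL gain update following the attack/release description."""

    def __init__(self, fs_hz: float = 8000.0, attack_s: float = 0.050,
                 release_s: float = 0.250, gain_limit: float = 0.01) -> None:
        self.gain_limit = gain_limit
        n_attack = fs_hz * attack_s     # samples in the attack time
        n_release = fs_hz * release_s   # samples in the release time
        # Assumed constants: the gain moves between 1 and gain_limit
        # in exactly n_attack (or n_release) multiplications.
        self.k_attack = gain_limit ** (1.0 / n_attack)
        self.k_release = (1.0 / gain_limit) ** (1.0 / n_release)
        self.gain = 1.0

    def step(self, incident: bool) -> float:
        """Update the gain for one new audio sample and return it."""
        if incident:
            if self.gain > self.gain_limit:
                self.gain *= self.k_attack    # attack phase (block 114)
            else:
                self.gain = self.gain_limit   # steady state (block 116)
        else:
            if self.gain < 1.0:
                self.gain *= self.k_release   # release phase (block 120)
            else:
                self.gain = 1.0               # no attenuation (block 122)
        return self.gain
```

Because the clamp happens on the iteration after the limit is crossed, as in the description, the gain can overshoot by one multiplication step; a practical version might clamp within the same iteration.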
  • the audio processing system 50 also provides an anti-startle feature by implementing the delta detector 58 for detecting changes in the sound level that are deemed to be likely to cause acoustic startle and the delta limiter 60 for limiting such changes in the sound level.
  • acoustic startle is a complex and widely variable phenomenon that depends on a range of environmental and psychological conditions, acoustic startle is generally not amenable to simple characterization.
  • acoustic startle is typically not characterized by defining specific limits for absolute increases in SPL and/or rate of increases in the sound level that cause a startle
  • acoustic startle typically include: faster-rising acoustic stimuli increase the intensity of acoustic startle; larger increases in sound level increase both the likelihood and intensity of acoustic startle; and, under some conditions, sound levels as low as 60 dB SPL are capable of causing acoustic startle.
  • sound level increase and the rise time of that
  • acoustic startle detection algorithm implemented by the delta incident detector 58 whose parameters can be tuned empirically to suit particular operating environments.
  • FIG. 9 is a graph illustrating an exemplary anti-startle boundary defined in an SPL increase versus rise time variable space. The upper left portion above the anti-startle boundary in the variable space, representing large increases in SPL with relatively fast rise times, generally corresponds to a high probability and likely intensity of acoustic startle.
  • the anti-startle boundary is such that above the boundary, the probability and likely intensity of acoustic startle is deemed to be unacceptable.
  • longer rise time signals require a greater total sound level increase to cause acoustic startle than fast rise time signals.
  • the actual gradient of the delta detector boundary may be determined empirically, for example.
  • However, small increases, i.e., delta, in sound level generally do not cause acoustic startle regardless of the rise time.
  • the approximate minimum delta that may cause acoustic startle is approximately 15 dB.
  • FIG. 10 is a graph illustrating the anti-startle boundary in the SPL increase versus rise time variable space of FIG. 9 with the minimum delta requirement introduced. It is noted that various alternative values for the minimum delta may be used and may be fine-tuned by subjective testing.
  • the delta incident detector 58 may detect delta incidents based on the anti-startle boundary as shown in FIG. 10.
  • the delta incident detector 58 may also take into account that the resumption of speech at the previous sound level after a short period of silence is unlikely to cause acoustic startle even if such resumption results in a very large increase in sound level relative to the preceding silence.
  • the additional condition for the delta incident detector 58 to be triggered is that the instantaneous sound level exceeds the previous active speech level by a certain resumption of speech threshold.
  • FIG. 11 is a block diagram illustrating an exemplary delta incident detector 58. As shown, the delta incident detector 58 receives the estimated true SPL waveform P output from the true SPL processor 52. The delta incident detector 58 detects delta
  • the delta incident detector 58 includes a squarer $x^2$ 132, fast, medium and slow lowpass filters 134, 136, 138 with associated
  • Each of the lowpass filters 134, 136, 138 may be a first-order IIR filter similar to that used in the SPL incident detector 54 as described above with
  • the time constants of the fast, medium and slow lowpass filters 134, 136, 138 are approximately 5 ms, 50 ms, and 5 s, respectively.
  • the slow lowpass filter 138 measures the recent average speech level and may be selectively enabled and disabled. Specifically, when either an SPL incident or a delta incident is detected, the slow lowpass filter 138 is disabled such that the slow lowpass filter 138 does not perform filter update calculations and the current filter output state is frozen and used until the slow lowpass filter 138 is re-enabled. Such a configuration helps to prevent abnormal signal levels during acoustic incidents from affecting the average speech level estimation. However, the slow exponential decay with time
  • the first delta detection threshold comparator 140 compares the ratio of the
  • the second delta detection threshold comparator 142 compares the ratio of the mean square sound levels P_f² / P_s² output from the fast and slow lowpass filters 134, 138 with a second
  • the predetermined first (fast/medium) and second (fast/slow) delta detection thresholds Thr_D1 and Thr_D2 are 5.6 (7.5 dB) and 31.6 (15 dB), respectively.
  • the first delta detection threshold comparator 140 implements the anti-startle boundary such as that shown in FIG. 10. Thus, if the first comparator 140 determines that the first delta threshold Thr_D1 is not exceeded, then a delta incident is not detected. On the other hand, if the first delta threshold is exceeded, i.e., the anti-startle boundary is crossed, the second comparator 142 ensures that resumption of speech (or other audio) at or close to the previous sound level after a short pause does not trigger delta (startle) incidents.
  • FIG. 12 is a graph illustrating an exemplary measured response of the delta incident detector 58 for the exemplary time constant and threshold values presented
  • the minimum delta plateau level, the knee-point and the slope are all configurable by changing the filter time constants and/or the first delta detection
  • the delta incident detector detects a delta incident
  • the delta limiter 60 when triggered, applies a fixed attenuation with an instantaneous (or near instantaneous) attack and a slow release.
  • the slow release may be logarithmic to ensure that the release sounds gradual to a human listener.
  • Such delta limit processing slows the rise time of signals with fast rise times, thus reducing the likelihood of acoustic startle.
  • the delta limiter 60 may have an attack time of approximately 1000/f_s ms or less (where f_s is the sampling frequency), a release time t_delta_release of approximately 250 ms, and an initial attenuation of approximately 40 dB, i.e., delta gain limit
  • FIG. 13 is a flowchart illustrating an exemplary process 150 for slowing the rise time as performed by the delta limiter 60.
  • the delta limiting process 150 shown is performed by the delta limiter 60 for each new audio sample.
  • the delta limiter determines if the delta incident detector has detected a delta incident. If so, the delta gain Gain_delta is immediately set to the delta gain limit Gain_delta_limit at block 154 so that the attack time of the attenuation applied by the delta limiter is instantaneous or near instantaneous.
  • any delay in applying the attenuation by the delta limiter is introduced by the short processing delays attributable to the true SPL processor, the delta detector and the fact that the output of a digital audio system is sampled and thus
  • the delta limiter determines if the delta gain Gain_delta is less than unity (1) at decision block 156. If
  • the delta limiter is in a delta release phase and, at block 158, the delta gain Gain_delta is increased to:
  • Gain_delta = Gain_delta × k_delta_release, where k_delta_release is the delta release constant:
  • f_s is the sampling frequency (Hz). While the acoustic processing system remains in the delta release phase, each iteration of
  • the delta limiting process 150 increases the delta gain Gain_delta until it reaches unity (1), i.e., no attenuation. Once the delta gain Gain_delta has reached or exceeded unity (1) as
  • the delta gain Gain_delta is set to unity (1) at block 160, i.e., no attenuation applied by the delta limiter.
  • FIG. 14 are graphs illustrating an exemplary measured delta limiter response and FIG. 15 are graphs illustrating an exemplary combined SPL and delta limiter response. These graphs represent the response from a multi-band test system employing
  • the input level graphs use different vertical scales and the input signal level in FIG. 15 is approximately ten times greater than the input signal level in FIG. 14.
  • the output level graph uses a different vertical scale from that used for the input level graph. If the output and input level graphs used the same vertical scale, the details on the output level graph would not be visible due to the 80 dB (10,000 times) attenuation provided by the combination of the SPL limiter and the delta limiter.
  • the delta incident detector triggers the delta limiter to apply 40 dB of attenuation nearly instantaneously.
  • the delta limiter provides instantaneous or near instantaneous attenuation and then enters its release phase with a slow release (rise) time such that the
  • delta-limited output signal has a slow rise time
  • the SPL incident detector detects the same acoustic incident shortly after the delta incident detector and causes the SPL limiter to apply an additional 40 dB of
  • the SPL incident detector is delayed relative to the delta incident detector due in part to the longer time constant used by the SPL incident detector (20 ms for the SPL incident detector versus 5 ms for the delta incident detector), and also due in part to the SPL detector's internal
  • a delta incident can be triggered by a relatively small increase in SPL, e.g., on the order of 15 dB.
  • the cumulative attenuation peaks at approximately 80 dB with the delta and SPL limiters each contributing approximately 40 dB attenuation.
  • the SPL limiter applies its 40 dB of steady state attenuation
  • the SPL limiter provides (near) instantaneous limiting
  • the SPL limiter can use a relatively slow attack time so as to prevent the SPL limiter from clipping normal peaks of the speech waveform, even at rms speech levels close to the limiting threshold, which may result in short-term peaks in the speech waveform causing the threshold to be exceeded for a few milliseconds.
  • the delta incident detector may be tuned so as to not trigger during continuous speech with short periods of silence. The net effect of the SPL and delta incident detectors and limiters is that loud and/or potentially startling acoustic incidents are avoided but undesirable distortion of speech (or other audio) is reduced or minimized.
  • the delta and SPL limiters complement each other so as to provide better acoustic comfort and less degradation of speech signals.
  • the combination of the true SPL processor 52, detectors 54, 58, and limiters 56, 60 introduces a short delay.
  • the look-ahead delay element 62 is provided in the signal path so that the gain control applied by the variable gain amplifier (VGA) 64 is applied slightly before the acoustic incident that requires attenuation, thus preventing short duration glitches on the system output when acoustic incidents occur on the input.
  • VGA variable gain amplifier
  • the processing performed by the components of the audio processing system 50 is carried out in the digital domain so that the VGA 64 is a digital gain block whose gain Gain_VGA is the product of the delta limiter gain Gain_delta and the SPL limiter gain Gain_SPL.
  • the audio processing system 50 provides several features including improved accuracy of SPL at which limiting occurs with the use of the true SPL processor 52, an anti-startle feature with the use of the delta incident detector and limiter 58, 60 by instantaneously limiting acoustic incidents with fast rise time and high intensity, and reduced distortion of speech (or other audio) whose rms level is close to the limiting
  • the audio processing system 50 thus provides better acoustic comfort and less degradation of speech signals.
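The delta incident detector described in the items above (a squarer, fast/medium/slow lowpass filters, and two threshold comparators) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class names, the one-pole coefficient formula `1 - exp(-1/(tau * fs))`, the choice to compare against the slow filter's state before updating it, and the `speech_level_ms` seed value are all assumptions. Only the time constants (5 ms, 50 ms, 5 s), the thresholds (5.6 and 31.6), and the freezing of the slow filter during incidents come from the text.

```python
import math


class MeanSquareFilter:
    """First-order IIR lowpass applied to squared samples (assumed form)."""

    def __init__(self, time_constant_s, fs, initial=1e-12):
        # Assumption: standard one-pole smoothing coefficient.
        self.alpha = 1.0 - math.exp(-1.0 / (time_constant_s * fs))
        self.state = initial

    def update(self, squared_sample):
        self.state += self.alpha * (squared_sample - self.state)
        return self.state


class DeltaIncidentDetector:
    THR_D1 = 5.6   # fast/medium mean-square ratio, about 7.5 dB
    THR_D2 = 31.6  # fast/slow mean-square ratio, about 15 dB

    def __init__(self, fs, speech_level_ms=1e-12):
        self.fast = MeanSquareFilter(0.005, fs)
        self.medium = MeanSquareFilter(0.050, fs)
        # Slow filter tracks the recent average speech level (hypothetical seed).
        self.slow = MeanSquareFilter(5.0, fs, initial=speech_level_ms)

    def process(self, p, spl_incident=False):
        """Take one estimated true-SPL sample; return True on a delta incident."""
        squared = p * p
        p_f = self.fast.update(squared)
        p_m = self.medium.update(squared)
        p_s = self.slow.state
        # Both comparators must trip: the first implements the anti-startle
        # boundary, the second rejects resumption of speech at the old level.
        incident = p_f > self.THR_D1 * p_m and p_f > self.THR_D2 * p_s
        # The slow filter is frozen while any incident is active.
        if not (incident or spl_incident):
            self.slow.update(squared)
        return incident
```

In this sketch, a sudden 40 dB step above the tracked speech level trips both comparators on the first loud sample, while speech resuming at its previous level after a pause trips at most the first comparator and is therefore ignored.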

Abstract

A method and apparatus for processing an audio signal. In one example of the invention, an audio signal is received. The audio signal is classified as a speech signal or a music signal. The audio signal is further processed responsive to whether the audio signal is a speech signal or a music signal. If the audio signal is a speech signal, the processing includes anti-startle processing.

Description

MUSIC COMPATIBLE HEADSET AMPLIFIER WITH ANTI-STARTLE FEATURE
BACKGROUND
[0001] Proper control of acoustic signal levels in communications and other audio output devices is desirable to ensure high quality audio output and hearing comfort to the users. For example, a telephone headset provides a speaker contained within an earpiece positioned over the user's ear. To ensure acoustic safety and high acoustic quality, the sound level of the acoustic signal emitted by the speaker should fall within a specified sound intensity range. Above the specified intensity range, the excessive sound level may cause discomfort for the user and/or damage the user's hearing. Thus, excessively high sound levels are of particular concern in communication and other audio devices such as telephone handsets and headsets and other listening devices that position a speaker near the user's ear.
[0002] Excessively high sound levels may be caused by various events. For example, accidental disturbances within a communication connection, such as an amplifier malfunction, intense feedback, incorrect signal source, and/or a phone line shorted to a power line, may cause dramatic increases in the electrical signal level input to a transducer that converts electrical signals to acoustic signals. The transient time for the acoustic signal to reach excessively high levels may be very short, such that a user often does not have sufficient time to move the listening device away from the ear in time to prevent exposure to the high sound levels. Although a handset user may be able to quickly move the handset speaker away from the ear as the user is typically already holding the handset in the hand, it may take a hands-free headset user longer to bring the hand to the headset in order to move the headset earpiece away from the ear. Furthermore, headsets are particularly suitable for users who are on the telephone for long periods of time, e.g., telemarketers, receptionists, and operators. Thus, because of the extra time required to remove a headset away from the ear and the potentially longer
periods of headset usage, headset users may be particularly vulnerable to exposure to excessively high sound levels caused by sudden or constant loud audible signals.
[0003] Many countries have legislation limiting the maximum sound pressure level (SPL) that telephone equipment, including headsets, may produce. Noise exposure legislation is intended to prevent noise-induced hearing loss. The legal maximum SPL is generally relatively high, e.g., approximately 118 dB SPL or 118 dB(A) SPL, and is extremely loud when compared with normal telephone speech. Thus, telephone handsets and headsets that comply with the law can nonetheless cause user discomfort due to loud sound levels and may also startle the telephone or headset user due to sudden increases in the sound level from relatively quiet to relatively loud.
[0004] Reducing or removing sounds that are significantly louder than normal speech, even those sounds below the legal limits, may help enhance the comfort of telephone or headset users. User comfort may also be improved by preventing acoustic startle, i.e., the involuntary contraction of bodily muscles resulting from unexpected moderate or intense acoustic stimuli with rapid onset. In a quiet environment, even sound levels as low as 50 dB SPL, similar to or below normal telephone speech levels, can cause acoustic startle.
[0005] Headsets and other audio output devices often employ audio limiting devices on the receiver input terminals in order to limit the voltage and thus the maximum sound level from the headset receiver. Most conventional audio limiting devices either clip or compress the electrical signal that drives the headset, which prevents the electrical signal from exceeding a specified peak-to-peak or root mean square (rms) voltage. However the sound pressure level produced by the headset is determined at least in part by the
receiving sensitivity of the headset, which in turn depends on the headset model and can generally vary significantly with frequency. Thus prior art methods for clipping or compressing the electrical signal require that these worst case tolerances are accounted for, which may sometimes result in lower overall levels than are necessary or desirable.
[0006] Headset amplifiers may include automatic gain controls. These gain controls actively control the signal levels to maintain comfort to the listener as well as assist in meeting TWA (time weighted average) noise exposure limits imposed by regulatory bodies such as OSHA and by EU Directives. More recently, there has been a demand for "anti-startle" measures in communications systems. Acoustic startle is a well documented and understood phenomenon and, as its name suggests, it relates to surprise or shock at sudden or unexpected noise. Acoustic startle is a psycho-acoustic effect. It is caused when a sudden increase in sound level is unexpected by the recipient. The levels at which startle can occur are quite low and linked to individual users. Unlike excessively high sound levels, there is currently no established link between acoustic startle and hearing damage. As such it may be classified more as a comfort feature rather than a health and safety issue. Nevertheless, some users view acoustic startle as safety related.
[0007] Thus, it is desirable to provide an audio output device that limits sounds that exceed a specified sound pressure level threshold and thus prevent discomfort caused by loud sounds. It is also desirable to provide an audio output device that reduces the likelihood and intensity of acoustic startle.
[0008] It is common practice for communications workers in offices, for example, to wish to listen to music whilst not engaged on a telephone call. In the prior art, this has been achieved by using a headset and separate headphones. More recently, headset amplifiers are capable of being connected to either a telecommunications device or an external sound source such as an MP3/CD player or PC, allowing the user to engage in speech communications or listen to music with a single headset and headset amplifier.
However, processing difficulties arise from the fact that speech communications signals and music signals have different characteristics. Acoustic startle is unlikely when listening to music, as music is rhythmic, predictable and pleasurable. For these reasons, it is not necessary to apply anti-startle protection to a music signal. Unfortunately, level limiting anti-startle processing that is effective on a speech communication signal has detrimental effects on the quality of a music signal, to the point that the result would not be pleasurable and would certainly be unacceptable to the user. For example, when applied to a music signal, anti-startle processing will suppress fast attack notes, with resulting unpleasant compression of the sound. This is because the characteristics that anti-startle processing suppresses to reduce "shock and surprise" are the very characteristics of some music that make it pleasurable. Clarity and "presence" of music will be dulled and the whole listening experience would be unacceptable.
[0009] Thus, improved methods and systems capable of processing both speech communication signals and music signals are needed.
DESCRIPTION OF THE DRAWINGS
[0010] The features and advantages of the apparatus and method of the present invention will be apparent from the following description in which:
[0011] FIG. 1 is a flowchart illustrating the operation of the invention in one example.
[0012] FIG. 2 illustrates an example of the hardware architecture in one example of the invention.
[0013] FIG. 3 illustrates a headset amplifier application in one example of the invention.
[0014] FIG. 4 is a block diagram illustrating an exemplary audio processing system implementing sound pressure level (SPL) limiting and anti-startle features.
[0015] FIG. 5 is a block diagram illustrating an exemplary true-SPL converter employing single band processing.
[0016] FIG. 6 is a block diagram illustrating an alternative exemplary true-SPL converter employing multi-band processing.
[0017] FIG. 7 is a block diagram illustrating an exemplary SPL incident detector.
[0018] FIG. 8 is a flowchart illustrating an exemplary process for limiting the sound pressure level.
[0019] FIG. 9 is a graph illustrating an exemplary anti-startle boundary in a SPL increase vs. rise time variable space.
[0020] FIG. 10 is a graph illustrating the anti-startle boundary in the SPL increase versus rise time variable space of FIG. 9 with a minimum delta requirement introduced.
[0021] FIG. 11 is a block diagram illustrating an exemplary delta incident detector.
[0022] FIG. 12 is a graph illustrating a delta detector response measured using the exemplary delta incident detector of FIG. 11.
[0023] FIG. 13 is a flowchart illustrating an exemplary process for limiting the delta value.
[0024] FIG. 14 are graphs illustrating an exemplary measured delta limiter response.
[0025] FIG. 15 are graphs illustrating an exemplary combined SPL and delta limiter response.
DESCRIPTION OF SPECIFIC EMBODIMENTS
[0026] The present invention provides a solution to the needs described above through an inventive method and apparatus for processing audio signals for music listening and speech communications.
[0027] Other embodiments of the present invention will become apparent to those
skilled in the art from the following detailed description, wherein is shown and described only the embodiments of the invention by way of illustration of the best modes contemplated for carrying out the invention. As will be realized, the invention is capable of modification in various obvious aspects, all without departing from the spirit and scope
of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive. The data structures and code described in this detailed description are typically stored on a computer readable storage medium, which may be any device or medium that can store code and/or data for use by a
computer system. Furthermore, although software code or components are described in certain instances, those skilled in the art will recognize that such may be equivalently replaced by firmware and hardware components. For purpose of clarity, details relating to technical material that is known in the technical fields related to the invention have not been described in detail so as not to unnecessarily obscure the present invention.
[0028] The present invention provides a method and apparatus for processing an audio signal. The method and apparatus may be used in systems such as those that play sound via an audio device located close to the listener's ear or via a loudspeaker or other
transducer located distant from the listener. In one example of the invention, an audio signal is received. The audio signal is classified as a speech signal or a music signal. The audio signal is further processed responsive to whether the audio signal is a speech signal or a music signal. If the audio signal is a speech signal, the processing includes anti-startle processing. In one application of the invention, the classification and signal processing occur within a headset amplifier. In this application, the headset amplifier and associated headset may be used with any electronic device where speech or music may be output and there has not been a previous classification. In a further application of the invention, the signal processing is performed within a host personal computer, such as in voice over Internet Protocol (VoIP) applications where the headset is directly connected to the personal computer.
[0029] The present invention permits listening to both music and speech communications while complying with regulatory requirements for acoustic safety and reducing or eliminating startle when appropriate. The signal processing performed on the audio is automatically selected invisibly to the user based on whether the audio signal is
classified as music or speech communications. The present invention removes user temptation to bypass headset acoustic safety devices in order to listen to music.
[0030] FIG. 1 is a flow chart illustrating the operation of the invention in one example of the invention. At block 10, an audio signal is received for processing. At block 12, the audio signal is classified as a speech signal or a music signal. At block 14, the audio signal is examined to determine whether it is a speech signal. If yes, at block 16 anti-startle processing is performed on the speech signal. If no, the anti-startle processing at block 16 is bypassed and various signal processing of the audio signal is performed at block 18. For example, signal processing at block 18 may include SPL processing as described below in reference to FIG. 4. Alternatively, if the audio signal is a music signal, no further signal processing is performed and the audio signal is output to the user. The received audio signal may be continuously monitored, with the
default setting that the audio signal is a speech signal. Additional signal processing may utilize audio signal enhancing plug-ins such as those available from SRS Labs or FX Sound.
[0031] The classification of the audio signal as a speech signal or a music signal at block 12 may be performed using a variety of signal processing techniques. In one example, spectral analysis is used. A fast Fourier transform DSP algorithm analyzes the audio signal received by the amplifier in different frequency bands. For example, the signal may be analyzed in half octave frequency bands. From this analysis, the spectral power density of differing bands is compared. The spectral characteristics of a speech signal tend to demonstrate high peaks in single sub-octave bands relative to adjacent bands and most energy is in the frequency range between 300 and 3000 Hz. Conversely, a music signal will tend to have similar energy in adjacent bands (averaged over a short period) and significant energy above 3000 Hz and below 300 Hz. An algorithm based on this technique provides a continuous probability (0 to 100%) of the current signal being music.
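The spectral approach in paragraph [0031] might be sketched as below. This is only a crude heuristic under stated assumptions, not the classifier the patent relies on: the function names `fft` and `music_score`, the 100 Hz lower band edge, and the scoring rule (averaging the share of energy outside 300 to 3000 Hz with one minus the share held by the strongest half-octave band) are all hypothetical choices made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def music_score(frame, fs, f_low=100.0):
    """Crude 0-100 'music probability' from half-octave band energies."""
    n = len(frame)
    spectrum = fft(frame)
    n_bands = int(2 * math.log2((fs / 2) / f_low)) + 1
    bands = [0.0] * n_bands
    outside = 0.0  # energy below 300 Hz or above 3000 Hz
    total = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        power = abs(spectrum[k]) ** 2
        total += power
        if f < 300.0 or f > 3000.0:
            outside += power
        if f >= f_low:
            # Half-octave band index: two bands per doubling of frequency.
            bands[int(2 * math.log2(f / f_low))] += power
    band_total = sum(bands)
    if total <= 0.0 or band_total <= 0.0:
        return 0.0  # silence: fall back to the speech default
    outside_share = outside / total
    peak_share = max(bands) / band_total
    # Assumed scoring: speech-like signals are in-band and peaky (low score).
    return 100.0 * 0.5 * (outside_share + (1.0 - peak_share))
```

A narrowband in-band signal scores near 0 and a broadband signal scores well above 50 with this rule; a practical classifier would average over many frames and be tuned against real speech and music recordings.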
[0032] Another classification method is described by Saunders in "Real-Time Discrimination of Broadcast Speech/Music", IEEE 0-7803-3192-3/96, which is hereby incorporated by reference. This classification method is based on the analysis of the zero crossings rate of the audio signal. The rate and changes in rate of zero crossings are used to differentiate between speech and music signals. This method uses less processor power and memory than more traditional fast-Fourier transform techniques. Improvements in recognition speed to Saunders are proposed by El-Maleh et al. in "Music Speech Discrimination for Multimedia Applications" in Proceedings of IEEE Conference Acoustics, Speech, Signal Processing (June 2000), which is hereby incorporated by reference.
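The zero-crossing features mentioned in paragraph [0032] can be illustrated with a short sketch. The function names and the use of the standard deviation across frames as the "change in rate" measure are assumptions made here for illustration; the cited papers define their own feature sets and decision rules, which are not reproduced.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def zcr_features(samples, frame_len):
    """Mean and spread of the per-frame zero-crossing rate.

    The spread (population standard deviation) is an assumed stand-in for
    the 'changes in rate' that the text says help separate speech from music.
    """
    rates = [
        zero_crossing_rate(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not rates:
        return 0.0, 0.0
    mean = sum(rates) / len(rates)
    spread = (sum((r - mean) ** 2 for r in rates) / len(rates)) ** 0.5
    return mean, spread
```

Because this needs only sign comparisons and a running count, it illustrates why the text describes the method as cheaper in processor power and memory than FFT-based analysis.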
[0033] Additional classification techniques include Gaussian mixture model classification, Gaussian model classification and nearest-neighbor classification. These techniques use statistical analyses of underlying features of the audio signal, either in a long or short period of measurement time, resulting in separate long-term and short-term features.
[0034] Once the classification is made, the switch from a speech classification to a music classification and vice versa occurs at a predetermined threshold. The assessment of speech versus music is a continuous process. For any particular example implementation, numerous empirical tests using music and speech measuring the "music probability" in the range 0 to 100% may be performed. The distribution of speech and music can then be overlaid and one would expect to see no, or a very small, overlap in the distribution curves. From this data, a threshold algorithm can be derived. The threshold has a time and hysteresis factor built in that prevents undesirable hunting between the two states. The switching characteristic may have a soft transition so as not to be noticeable to the user except in that the benefits of this invention result in good music fidelity plus protection from startle during speech communications. This threshold can be linked to the probability that the signal being processed is speech (the higher the probability it is speech, the lower the delta threshold). This provides a compromise between protection from startle, music fidelity and completely transparent operation (no sudden on/off switching of anti-startle processing).
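A threshold with a time and hysteresis factor, as described in paragraph [0034], could look something like the following sketch. The class name, the 70%/30% thresholds, and the hold count are hypothetical values chosen for illustration; the text says such values would be derived from empirical tests. The soft transition and the probability-linked delta threshold are not modelled here.

```python
class SpeechMusicSwitch:
    """Two-state switch with hysteresis and a hold time (assumed design)."""

    def __init__(self, enter_music=70.0, exit_music=30.0, hold_frames=50):
        self.enter_music = enter_music  # music probability needed to enter music
        self.exit_music = exit_music    # probability below which music is left
        self.hold_frames = hold_frames  # consecutive frames required to switch
        self.is_music = False           # default classification is speech
        self._count = 0

    def update(self, music_probability):
        """Feed one 0-100 music probability; return True while in music state."""
        if self.is_music:
            wants_change = music_probability < self.exit_music
        else:
            wants_change = music_probability > self.enter_music
        # Any frame that does not support the change resets the hold timer,
        # which prevents hunting between the two states.
        self._count = self._count + 1 if wants_change else 0
        if self._count >= self.hold_frames:
            self.is_music = not self.is_music
            self._count = 0
        return self.is_music
```

Probabilities between the two thresholds never cause a switch in either direction, and a single contrary frame restarts the hold period.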
[0035] Referring to FIG. 2, one example system 20 for implementing the processes set forth in FIG. 1 is shown. The system 20 typically includes at least one processing unit 22 and memory 32. Processing unit 22 interfaces with memory 32 and
communication connection 24 to receive and send audio to and from other devices. Processing unit 22 processes information and instructions used by system 20. Memory 32 is any type of memory that can be used to store code and data for processing unit 22.
Depending on the exact configuration and type of device on which system 20 is implemented, memory 32 may include volatile memory 28 (such as RAM), non-volatile memory 30 (such as ROM, flash memory, etc.) or some combination of the two. By way of example, and not limitation, the communication connection 24 may include wired media such as a direct-wired connection, and wireless media such as RF.
[0036] The device on which system 20 is implemented may have a variety of features and functionality. The implementation device may utilize several forms of computer storage media. Depending on the particular device, the computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 32 may be incorporated or integrated with the computer storage media of the implementation device. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology. Where the implementation device is a personal computer, the computer storage media includes CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the implementation device on which system 20 is implemented.
[0037] For example, referring to FIG. 3, system 20 may be implemented on a headset amplifier 34. By implementing system 20 at a headset amplifier 34, system 20 is independent of the electronic device to which it is attached and can therefore be used with a variety of electronic devices. The headset amplifier 34 may have multiple inputs
to accommodate multiple devices simultaneously. Processing power at headset amplifier 34 may advantageously be higher than at other components. System 20 may be more difficult to bypass by users when placed at headset amplifier 34, thus providing for improved user health and safety. In a further example, system 20 may be implemented on a desktop or laptop personal computer, mobile handset, personal digital assistant, headset, or sound card. Although described independently here, processing unit 22 and memory 32 typically already reside on the device to perform other functions associated with the device. Thus, implementation of processing set forth in FIG. 1 does not require additional hardware resources.
[0038] In one application, a headset 36 is connected to a headset amplifier 34 which, in turn, is connected to an electronic device 38. For example, the electronic device 38 may be a telephone, digital music player, PDA, or an integrated device combining functionality of two or more of such devices. The headset 36 includes at least one speaker and a microphone.
[0039] The headset amplifier 34 is generally used to amplify signals to or from electronic device 38. In one application, the headset amplifier 34 receives the audio signal from electronic device 38, limits the maximum amplitude of the audio signal to improve user safety, and provides a power output to drive the speaker of the headset 36.
The headset amplifier 34 may provide power for the headset microphone, receive the audio signal from the microphone, and modify the gain of the audio signal from the microphone. Typically, an electret microphone is used, which requires that headset amplifier 34 supply DC power of a few volts at between 15 and several hundred microamps to the headset 36.
[0040] In the present example, headset amplifier 34 includes system 20 for performing digital signal processing on the audio signal in addition to amplification. The headset amplifier 34 may provide automatic gain control to protect the user by limiting the maximum volume level output to the user.
[0041] Headset amplifier 34 may receive power from a variety of sources. For example, it may draw current from electronic device 38. Headset amplifier 34 may also be powered with a battery or from power derived from the USB port of a PC or from an AC wall outlet using a DC power supply.
[0042] Referring again to FIG. 1, in an example of the invention the anti-startle processing at block 16 comprises systems and methods for a sound pressure level limiter with an anti-startle feature as described and illustrated in reference to FIGS. 4-14. In a further example of the invention, block 16 processing includes only anti-startle processing and does not include sound pressure level limiting processing.
[0043] The anti-startle feature generally involves detecting fast rise time signals that are likely to cause acoustic startle and slowing the rise time of such signals. The anti-startle feature may be implemented with a delta incident detector for detecting delta
acoustic incidents that exceed a predetermined acoustic startle boundary, a delta limiter
for determining an anti-startle gain, and an amplifier to apply the anti-startle gain to the input signal. The predetermined acoustic startle boundary may be a function of signal rise time and sound pressure level (SPL) increase. The delta incident detector may detect
delta incidents based on an estimated true SPL delivered by a transducer to a predetermined datum point. The estimated true SPL may be measured with a microphone located at, or close to, the chosen datum point (e.g., a microphone located in the headset receiver assembly). Alternatively, the true SPL may be estimated based on the electrical signal that drives the headset receiver and the measured receiving frequency response of the transducer. An SPL limiter may also be implemented with or without the anti-startle feature to determine an SPL gain in response to detecting an SPL acoustic incident that exceeds a predetermined SPL threshold; the detection of the SPL acoustic incident may be based on the estimated true SPL.
[0044] The anti-startle gain can be associated with an anti-startle gain limit and release time. Thus, upon detection of a delta acoustic incident by the delta incident detector, the delta limiter may set the anti-startle gain to the anti-startle gain limit and
then enter a delta limiter release phase in which the delta limiter increases the anti-startle gain over a period of time associated with the anti-startle release time until the anti-startle gain reaches unity (1).
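The attack and release behaviour in paragraph [0044] can be sketched per sample as follows. The 40 dB attenuation and 250 ms release time are the exemplary values given elsewhere in this document; the class name and, in particular, the formula for the release constant are assumptions. The constant is chosen here so that repeated multiplication brings the gain from its limit back to unity in the release time, which gives a release that is linear in dB.

```python
class DeltaLimiter:
    """Instant attack to a fixed attenuation, multiplicative release to unity."""

    def __init__(self, fs, attenuation_db=40.0, release_s=0.25):
        # Anti-startle gain limit as a linear factor (40 dB -> 0.01).
        self.gain_limit = 10.0 ** (-attenuation_db / 20.0)
        # Assumed release constant: gain_limit * k ** (release_s * fs) == 1.
        self.k_release = 10.0 ** (attenuation_db / (20.0 * release_s * fs))
        self.gain = 1.0

    def update(self, delta_incident):
        """Call once per audio sample; returns the current anti-startle gain."""
        if delta_incident:
            # Attack: jump straight to the gain limit.
            self.gain = self.gain_limit
        elif self.gain < 1.0:
            # Release phase: raise the gain, clamping at unity.
            self.gain = min(1.0, self.gain * self.k_release)
        return self.gain
```

With a sampling rate of 8000 Hz, for example, the gain sits at 0.01 on the incident sample, passes roughly 0.1 halfway through the release, and returns to exactly 1.0 shortly after 250 ms.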
[0045] FIG. 4 is a block diagram illustrating an exemplary audio processing system 50 implementing sound pressure level (SPL) limiting and anti-startle features. The systems and methods described herein may be utilized for audio devices located close to the listener's ear such as a headset, handset, mobile phone, headphone, or earphone, as well as audio devices located at a distance from the listener's ear such as loudspeakers or other transducers located distant from the listener.
[0046] As shown, the audio processing system 50 generally includes a true SPL estimator or processor 52, an SPL incident detector 54, an SPL limiter 56, a delta incident
detector 58, a delta limiter 60, a look-ahead delay element 62, and a variable gain amplifier (VGA) 64. The processing by the components of the system 50 is typically carried out in the digital domain. Thus an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC) are typically provided at the input and the output but are not shown for purposes of clarity. Within the audio processing system 50, the audio signal passes only through the delay element 62 and the amplifier 64, while the remaining components, i.e., the true SPL processor 52 and the SPL and delta incident detectors and limiters 54-60, implement signal analysis and gain control functions.
[0047] The true SPL processor 52 estimates the sound pressure level at the user's ear,
thus allowing the audio processing system 50 to use the estimated true SPL rather than the electrical signal level delivered to the headset receiver as the basis for SPL and delta limiting. Such use of the true SPL (or estimated true SPL) helps to ensure that the delta
limiting and SPL limiting both occur at precisely defined sound pressure levels rather than at arbitrary electrical signal levels.
[0048] The SPL incident detector 54 receives the (estimated) true SPL waveform and measures the mean square sound pressure level to detect an SPL incident. The SPL limiter 56 calculates the SPL gain reduction depending on the results of the SPL incident detection so as to limit the sound pressure level below a predetermined SPL threshold.
The SPL gain reduction GainSPL is then applied by the VGA 64 or a digital gain-control block (not shown) that performs the same function as the VGA in the digital domain.
[0049] The delta detector 58 detects acoustic incidents that have a high likelihood of causing acoustic startle in the user, based on the rise time and amount of increase in the sound pressure level. For example, delta incident detector 58 may base its determination on whether the combination of the increase in the sound level, suddenness of the increase in the sound level, and the absolute sound level is likely to cause acoustic startle in the user. If acoustic startle is determined to be likely, the delta limiter 60 then generates a time-varying control signal for the VGA 64 to slow the rise time of the increase in the sound pressure level. The time-varying gain control of the delta limiter 60 may use a
feedforward configuration as will be described in more detail below. The combination of the delta detector 58 and the delta limiter 60 thus helps prevent acoustic startle in the user, i.e., provides the anti-startle feature.
[0050] The combination and close integration of the true SPL processing 52, SPL limiting, and anti-startle processing in the audio processing system 50 allows the SPL limiter 56 to use a relatively slow attack time constant so that normal speech peaks remain relatively unaffected while the combination of delta limiter 60 and SPL limiter 56 still provides instantaneous limiting of loud, fast-onset noises. The look-ahead delay
element 62 applies a short look-ahead delay, typically a few milliseconds, to ensure that
gain reductions GainSPL and Gaindelta are applied slightly before they are actually needed so as to prevent any loud glitches occurring as the VGA 64 responds to increases in signal level. Each component of the audio processing system 50 will now be described in more detail below.
[0051] As noted, the audio processing system 50 uses the true SPL instead of electrical signal level as the basis for SPL and delta limiting. True SPL processing enables consistent limiting at the same sound level regardless of changes in the signal spectrum or audio transducer. Such consistent limiting at the same sound level is particularly applicable to headsets, handsets, etc. that are used in a fixed position close to
the ear and thus have relatively consistent receiving characteristics. However, true SPL processing may also be used by audio processing systems in applications with
loudspeaker systems in a controlled acoustic environment, for example. True SPL is measured at a chosen datum point such as at an eardrum reference point (DRP), ear reference point (ERP) or equivalent open-field sound pressure level. In some applications the true SPL may be directly measured using a microphone located at, or close to, the chosen datum point. For example, a microphone mounted in the headset receiver assembly may directly measure the SPL at ERP. If it is difficult or impossible to directly measure the SPL at the chosen datum point, the true SPL may be estimated based on a measurement made at a different point. For example the SPL at DRP may be estimated from the SPL at ERP by passing the output signal from a probe microphone located at ERP through a filter whose frequency response is equal to the ERP-to-DRP
transfer function of a typical human ear.
[0052] Although the true SPL may be measured with a probe microphone located at or close to the datum point in some applications, in many cases, such direct measurement of the true SPL may be impractical or difficult. Thus, the audio processing system 50 typically employs the true SPL processor 52 to estimate or calculate the true SPL from
the electrical signal level. For example, if the chosen datum is A-weighted SPL at the DRP, the headset's receiving frequency response (reference DRP) can be measured and combined with an A-weighting response to form a composite true SPL estimation filter.
In a digital system, the true SPL estimation filter may also include DAC gain and power amplifier gain as a function of frequency. The transfer function for the true SPL
estimation filter can be a combination of the headset's receiving frequency response with the DAC and power amplifier frequency response and the A-weighting response. The true SPL estimation filter models the electroacoustic transmission path between the SPL limiting device and the user's eardrum or other chosen datum point. The true SPL estimation filter processes the digital signal driving the DAC to estimate the A-weighted sound pressure waveform that is present at the user's eardrum, from which the A-weighted SPL may be calculated.
[0053] FIGS. 5 and 6 are block diagrams illustrating exemplary true SPL estimators 52a, 52b employing single band processing and multi-band processing, respectively. The
single band true SPL estimator 52a as shown in FIG. 5 implements an electrical or digital
filter whose transfer function is equivalent to a combination of the DAC and power amplifier gain 74, headset frequency response 76 and A-weighting frequency response 78. The multi-band true SPL estimator 52b as shown in FIG. 6 uses a separate gain
value for each frequency band, which is equivalent to a combination of the average DAC and power amplifier gain 84, headset receiving sensitivity 86 and A-weighting value 88 in that frequency band. As the true SPL estimator or estimation filter 52a, 52b is generally employed in a digital system, the true SPL estimator 52a, 52b may include the
DAC and power amplifier gain 74, 84. Alternatively, if the true SPL estimator 52a, 52b is employed in an analog system, there would be no DAC frequency response term although a power amplifier frequency response may still be included. In other words, the
true SPL estimation filter 52 processes the digital signal driving the DAC, or the electrical signal driving the headset's receiver, to estimate the A-weighted SPL that is present at the wearer's eardrum or other chosen datum point.
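As an illustration of the per-band gain used by the multi-band estimator, the sketch below adds the three contributions in dB for one frequency band. The A-weighting function uses the standard analytic A-weighting curve; the function names and the example DAC/amplifier gain and headset sensitivity values are hypothetical and are not taken from this description.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """A-weighting magnitude in dB at f_hz (standard analytic A-weighting curve)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00

def band_gain_db(dac_pa_gain_db: float, headset_sens_db: float,
                 f_center_hz: float) -> float:
    """Composite gain for one band: DAC/amplifier gain + headset receiving
    sensitivity + A-weighting value, all expressed in dB (hypothetical helper)."""
    return dac_pa_gain_db + headset_sens_db + a_weighting_db(f_center_hz)

def whole_band_level_db(band_levels_db) -> float:
    """Whole-band level as the power (mean-square) sum of the band levels."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in band_levels_db))

# Hypothetical example values for a band centered at 1 kHz.
gain_1k = band_gain_db(dac_pa_gain_db=6.0, headset_sens_db=100.0, f_center_hz=1000.0)
```

A single-band estimator would instead realize the same composite response as one filter applied to the waveform; the dB sum above only describes the magnitude of that response at one frequency.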
[0054] With single band (time domain) processing as shown in FIG. 5, the estimator 52 can implement frequency-weighted SPL measurement and limiting but cannot distinguish between narrowband and wideband signals of the same power. In contrast, with multi-band (frequency domain) processing as shown in FIG. 6, the electrical signal is split into multiple frequency bands f1, f2, f3, ..., fn, using an analysis filter bank 82 or block transform. With both single band and multi-band processing, frequency-weighted limiting may be implemented by replacing the A-weighting frequency response 78, 88 with the alternative frequency response that is required. Multi-band processing 52b allows independent narrowband and whole band SPL measurements. For example,
multi-band processing 52b can be configured to limit high frequency narrowband signals to a lower level than single band processing when both systems are configured to limit the whole-band SPL to the same level.
[0055] It is noted that the accuracy of SPL limiting depends on the accuracy of the
SPL measurement. When SPL is estimated from the electrical signal driving the headset or loudspeaker, one factor affecting the accuracy of the SPL measurement is the accuracy with which the receiving frequency response of the transducer is known. Very accurate SPL calculation may be achieved if the receiving frequency response for the specific headset in use has been measured. Less accurate SPL calculation may be achieved if an average frequency response for the headset type or model is used. An even less accurate SPL calculation results if a generic average frequency response for several headset models is used.
[0056] The true SPL processor 52 outputs an estimated true SPL waveform P to both the SPL incident detector 54 and the delta incident detector 58. The estimated true SPL
waveform P is a waveform whose instantaneous level represents the sound pressure (e.g. Pascals, A-weighted) at the selected acoustic reference point, e.g., at the DRP. The SPL incident detector 54 detects when the SPL exceeds a predetermined SPL threshold
SPLmax. FIG. 7 is a block diagram illustrating an exemplary SPL incident detector 54 for an audio processing system that uses single-band true SPL processing. The SPL incident detector 54 includes a squarer x² 102, a lowpass filter 104 with an associated time
constant τSPL and a comparator 106. In particular, the SPL incident detector 54
determines and compares the mean square sound level P² with the predetermined SPL
threshold SPLmax. In one embodiment, the time constant τSPL is approximately 20 ms and
the predetermined SPL threshold SPLmax is approximately 100 dB (A).
[0057] The lowpass filter 104 may be a first-order infinite impulse response (IIR) filter implementing:
$$y_n = A\,y_{n-1} + (1 - A)\,x_{n-1}$$
where:
[Equation image imgf000022_0001, giving the filter coefficient A, is not reproduced in this text.]
fs is the sampling frequency, which is typically 8 kHz or 16 kHz for a telecommunications device but may be any other suitable frequency,
xn is the filter input (nth sample in the time series), and yn is the filter output (nth sample in the time series).
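The squarer, lowpass filter and comparator chain of FIG. 7 can be sketched as follows. This is a minimal illustration, not the patented implementation. Because the expression for the coefficient A is an equation image that is not reproduced in this text, the sketch assumes the common one-pole choice A = exp(-1/(fs·τSPL)). It also assumes the input waveform is in Pascals and that the threshold is referenced to 20 µPa. The class and parameter names are hypothetical.

```python
import math

class SplIncidentDetector:
    """Squarer -> first-order IIR lowpass -> comparator (single-band sketch)."""

    def __init__(self, fs_hz=8000.0, tau_s=0.020, spl_max_db=100.0, p_ref=20e-6):
        # ASSUMPTION: A = exp(-1 / (fs * tau)); the source gives A only as an image.
        self.a = math.exp(-1.0 / (fs_hz * tau_s))
        # ASSUMPTION: threshold in dB re 20 uPa, converted to mean-square Pa^2.
        self.threshold_ms = (p_ref * 10.0 ** (spl_max_db / 20.0)) ** 2
        self.x_prev = 0.0   # previous squared input, x_(n-1)
        self.y = 0.0        # filter state, y_(n-1)

    def step(self, p: float) -> bool:
        """Process one pressure sample; return True while an SPL incident is flagged."""
        x = p * p                                                  # squarer 102
        self.y = self.a * self.y + (1.0 - self.a) * self.x_prev   # lowpass filter 104
        self.x_prev = x
        return self.y > self.threshold_ms                         # comparator 106
```

With these example values a steady input a few dB above the threshold is flagged only after several milliseconds, which reflects the averaging effect of the 20 ms time constant.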
[0058] An SPL incident detector 54 for an audio processing system using multi-band true SPL processing would employ a narrowband SPL incident detector 54 similar to that shown in FIG. 7 for each frequency band. A whole band SPL incident detector may also be implemented using the mean square sum of the sub-band signal levels. The
narrowband SPL limiting thresholds are typically lower than the whole band SPL limiting threshold and may vary with frequency.
[0059] The output of the SPL incident detector 54 drives the SPL limiter 56 which in turn reduces the SPL generated in the headset when SPL incidents are detected by controlling the gain of the VGA. In one exemplary implementation, the SPL limiter 56
may apply a fixed attenuation ASPL, e.g., 40 dB, with attack time tSPL_attack and release time tSPL_release. A fixed attenuation ASPL of approximately 40 dB is generally sufficient to reduce the loudest sounds that can occur on a telephone network to a comfortable level at or below normal speech level while still allowing the user to detect that an acoustic incident has occurred. In one example, a fixed attenuation of approximately 40 dB with an
SPL limiting threshold SPLmax of 100 dB (A) reduces such signals to a minimum level of 60 dB (A), which is clearly audible in most situations. Although a particular implementation is described, the SPL limiter 56 may be implemented in various other suitable ways. Merely as an example, rather than applying a fixed 40 dB attenuation, the SPL limiter 56 may apply an attenuation equal to the amount by which the input signal exceeds the SPL incident threshold. As is evident, various other implementations of the SPL limiter 56 may be employed to reduce the SPL below the SPL incident threshold.
[0060] Attack and release may have logarithmic rather than linear or exponential characteristics as a human listener tends to perceive logarithmic attacks and releases as
smooth linear changes of loudness. In one embodiment, the attack time tSPL_attack is approximately 50 ms and the release time tSPL_release is approximately 250 ms. A non-instantaneous attack time tSPL_attack ensures that the natural peaks of speech are generally unaffected even when listening to loud speech with an rms signal level close to the
limiting threshold SPLmax such that the SPL incident detector is triggered for a few milliseconds by peaks of the speech waveform. A slow release time tSPL_release helps prevent the resulting 40 dB rise in signal level from causing acoustic startle.
[0061] FIG. 8 is a flowchart illustrating an exemplary SPL limiting process 108 for
limiting the sound pressure level as performed by the SPL limiter 56. The SPL limiting
process 108 shown is performed by the SPL limiter 56 for each new audio sample. In particular, if an SPL incident is detected as determined at decision block 110, the SPL limiter enters a limiting phase. In the limiting phase, the SPL limiter determines whether the SPL gain GainSPL exceeds a predetermined SPL gain limit GainSPL_limit at decision block 112. If so, then the SPL limiter enters an attack phase at block 114 and sets the
SPL gain GainSPL to:
$$\mathrm{Gain}_{SPL} = \mathrm{Gain}_{SPL} \times k_{SPL\_attack}$$
where kSPL_attack is the SPL attack constant:
$$k_{SPL\_attack} = 10^{\left[\frac{\log_{10}\left(\mathrm{Gain}_{SPL\_limit}\right)}{f_s \, t_{SPL\_attack}}\right]}$$
and fs is the sampling frequency (Hz).
While the acoustic processing system remains in the active attack phase, each iteration of the SPL limiting process 108 decreases the SPL gain GainSPL until it reaches the
predetermined SPL gain limit GainSPL_limit. Once the SPL gain GainSPL has reached the predetermined SPL gain limit GainSPL_limit, i.e., the SPL gain GainSPL is equal to or less than the predetermined SPL gain limit GainSPL_limit as determined at decision block 112,
the SPL gain GainSPL is set equal to the predetermined SPL gain limit GainSPL_limit at block 116, i.e., steady state attenuation by the SPL limiter.
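The per-sample process of FIG. 8 can be sketched as below, covering the attack branch (blocks 110-116) and the release branch (blocks 118-122) of the same flowchart. This is a minimal illustration under stated assumptions: the attack and release constants are computed so that the gain travels between unity and the gain limit in fs·t samples with a base-10 logarithm, and the class name, method name and default values (8 kHz sampling, 50 ms attack, 250 ms release, gain limit 0.01) are illustrative choices based on the embodiment values in the text.

```python
import math

class SplLimiter:
    """Per-sample SPL gain update following the FIG. 8 decision blocks (sketch)."""

    def __init__(self, fs_hz=8000.0, t_attack_s=0.050, t_release_s=0.250,
                 gain_limit=0.01):
        self.gain_limit = gain_limit
        # ASSUMPTION: constants chosen so the gain moves between 1 and
        # gain_limit in fs * t samples (logarithmic attack and release).
        self.k_attack = 10.0 ** (math.log10(gain_limit) / (fs_hz * t_attack_s))
        self.k_release = 10.0 ** (-math.log10(gain_limit) / (fs_hz * t_release_s))
        self.gain = 1.0

    def step(self, spl_incident: bool) -> float:
        """Update and return the SPL gain for one new audio sample."""
        if spl_incident:                        # decision block 110
            if self.gain > self.gain_limit:     # decision block 112
                self.gain *= self.k_attack      # block 114: attack phase
            else:
                self.gain = self.gain_limit     # block 116: steady-state attenuation
        else:
            if self.gain < 1.0:                 # decision block 118
                self.gain *= self.k_release     # block 120: release phase
            else:
                self.gain = 1.0                 # block 122: no attenuation
        return self.gain
```

Following the flowchart literally, the gain may pass its target by one multiplication step before being clamped on the next sample. Halfway through the attack time the gain is 0.1 (20 dB of attenuation), which is the sense in which the attack is linear in decibels.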
[0062] If an SPL incident is not detected as determined at decision block 110, the
SPL limiter determines whether the SPL gain GainSPL is less than unity (1) at decision block 118. If so, the SPL limiter is in a release phase and, at block 120, the SPL limiter
increases the SPL gain GainSPL to:
$$\mathrm{Gain}_{SPL} = \mathrm{Gain}_{SPL} \times k_{SPL\_release}$$
where kSPL_release is the SPL release constant:
$$k_{SPL\_release} = 10^{\left[\frac{-\log_{10}\left(\mathrm{Gain}_{SPL\_limit}\right)}{f_s \, t_{SPL\_release}}\right]}$$
While the acoustic processing system remains in the release phase, each iteration of the SPL limiting process 108 increases the SPL gain GainSPL until it reaches unity (1), i.e., the release phase is complete and no attenuation is applied by the SPL limiter. Once the
SPL gain GainSPL has reached or exceeded unity, as determined at decision block 118, the SPL gain GainSPL is set equal to unity (1) at block 122, i.e., no attenuation is applied by the SPL limiter. In one embodiment, the attack time tSPL_attack is approximately 50 ms, the release time tSPL_release is approximately 250 ms, and the SPL gain limit GainSPL_limit is approximately 0.01, i.e. 40 dB attenuation.
[0063] In addition to SPL limiting, the audio processing system 50 also provides an anti-startle feature by implementing the delta detector 58 for detecting changes in the sound level that are deemed to be likely to cause acoustic startle and the delta limiter 60 for limiting such changes in the sound level. As acoustic startle is a complex and widely variable phenomenon that depends on a range of environmental and psychological conditions, acoustic startle is generally not amenable to simple characterization. For example, acoustic startle is typically not characterized by defining specific limits for absolute increases in SPL and/or rate of increases in the sound level that cause a startle
response. However, some general observations about the likelihood and/or intensity of
acoustic startle can be made: faster rising acoustic stimuli increase the intensity of an acoustic startle, larger increases in sound level increase both the likelihood and intensity of acoustic startle, and under some conditions, sound levels as low as 60 dB SPL are capable of causing acoustic startle. Thus sound level increase and the rise time of that
increase may be used to form the basis of an acoustic startle detection algorithm implemented by the delta incident detector 58 whose parameters can be tuned empirically to suit particular operating environments.
[0064] FIG. 9 is a graph illustrating an exemplary anti-startle boundary defined in an SPL increase versus rise time variable space. The upper left portion above the anti-startle boundary in the variable space, representing large increases in SPL with relatively fast
rise times, generally corresponds to high probability and likely intensity of acoustic startle. The lower right portion below the anti-startle boundary in the variable space, representing small increases in SPL with relatively slow rise times, generally corresponds to low probability and likely intensity of acoustic startle. The anti-startle boundary is such that above the boundary, the probability and likely intensity of acoustic startle is deemed to be unacceptable. As shown, longer rise time signals require a greater total sound level increase to cause acoustic startle than fast rise time signals. The actual gradient of the delta detector boundary may be determined empirically, for example.
[0065] However, small increases, i.e., delta, in sound level generally do not cause acoustic startle regardless of the rise time. In one embodiment, the minimum delta that may cause acoustic startle is approximately 15 dB. FIG. 10 is a graph illustrating the anti-startle boundary in the SPL increase versus rise time variable space of FIG. 9 with the minimum delta requirement introduced. It is noted that various alternative values for the minimum delta may be used and may be fine tuned by subjective testing.
[0066] The delta incident detector 58 may detect delta incidents based on the anti-startle boundary as shown in FIG. 10. In addition, the delta incident detector 58 may also take into account that the resumption of speech at the previous sound level after a short period of silence is unlikely to cause acoustic startle even if such resumption results in a very large increase in sound level relative to the preceding silence. Thus an additional condition for the delta incident detector 58 to be triggered is that the instantaneous sound level exceeds the previous active speech level by a certain resumption of speech threshold. During active speech, the resumption of speech threshold may be slightly greater than the speech crest factor while during periods of silence, the resumption of speech threshold may decay exponentially with a time constant. The resumption of speech time constant may be on the order of seconds or tens of seconds, for example.
[0067] FIG. 11 is a block diagram illustrating an exemplary delta incident detector 58. As shown, the delta incident detector 58 receives the estimated true SPL waveform P output from the true SPL processor 52. The delta incident detector 58 detects delta
incidents that are likely to cause acoustic startle. The delta incident detector 58 includes a squarer x² 132, fast, medium and slow lowpass filters 134, 136, 138 with associated
time constants τDelta_fast, τDelta_medium, τDelta_slow, respectively, and delta detection threshold comparators 140, 142. Each of the lowpass filters 134, 136, 138 may be a first-order IIR filter similar to that used in the SPL incident detector 54 as described above with
reference to FIG. 7. In one embodiment, the time constants τfast, τmedium, τslow for the
lowpass filters 134, 136, 138 are approximately 5 ms, 50 ms, and 5 s, respectively.
[0068] The slow lowpass filter 138 measures the recent average speech level and may be selectively enabled and disabled. Specifically, when either an SPL incident or a delta incident is detected, the slow lowpass filter 138 is disabled such that the slow lowpass filter 138 does not perform filter update calculations and the current filter output state is frozen and used until the slow lowpass filter 138 is re-enabled. Such a configuration helps to prevent abnormal signal levels during acoustic incidents from affecting the average speech level estimation. However, the slow exponential decay with time
constant τslow ensures that normal speech (or other audio) starting after a long period of
silence is correctly flagged as a potentially startling incident.
[0069] The first delta detection threshold comparator 140 compares the ratio of the
mean square sound levels Pf² / Pm² output from the fast and medium lowpass filters 134,
136 to a first (fast/medium) predetermined delta detection threshold ThrD1. The second delta detection threshold comparator 142 compares the ratio of the mean square sound levels Pf² / Ps² output from the fast and slow lowpass filters 134, 138 with a second
(fast/slow) predetermined delta detection threshold ThrD2. In one embodiment, the predetermined first (fast/medium) and second (fast/slow) delta detection thresholds ThrD1 and ThrD2 are 5.6 (7.5 dB) and 31.6 (15 dB), respectively.
[0070] The first delta detection threshold comparator 140 implements the anti-startle boundary such as that shown in FIG. 10. Thus if the first comparator 140 determines that the first delta threshold ThrD1 is not exceeded, then a delta incident is not detected. On the other hand, if the first delta threshold is exceeded, i.e., the anti-startle boundary is crossed, the second comparator 142 ensures that resumption of speech (or other audio) at or close to the previous sound level after a short pause does not trigger delta (startle) incidents. FIG. 12 is a graph illustrating an exemplary measured response of the delta incident detector 58 for the exemplary time constant and threshold values presented
herein. The minimum delta plateau level, the knee-point and the slope are all configurable by changing the filter time constants and/or the first delta detection
threshold ThrD1.
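The detector of FIG. 11 can be sketched as three one-pole filters feeding two ratio comparisons. This is a minimal illustration, not the patented implementation. It assumes the filter coefficient A = exp(-1/(fs·τ)), since the source gives that coefficient only as an equation image, and it compares ratios by multiplication with a small floor to avoid division by zero. The class names, the `initial_ms` parameter (used to start the filters at a settled speech level) and the `floor` value are hypothetical.

```python
import math

class OnePole:
    """First-order IIR lowpass: y_n = A*y_(n-1) + (1 - A)*x_(n-1)."""

    def __init__(self, fs_hz, tau_s, initial=0.0):
        # ASSUMPTION: A = exp(-1 / (fs * tau)); not stated in the text.
        self.a = math.exp(-1.0 / (fs_hz * tau_s))
        self.y = initial
        self.x_prev = initial

    def update(self, x):
        self.y = self.a * self.y + (1.0 - self.a) * self.x_prev
        self.x_prev = x
        return self.y

class DeltaIncidentDetector:
    """Fast/medium/slow mean-square tracking with two threshold comparators."""

    def __init__(self, fs_hz=8000.0, thr_d1=5.6, thr_d2=31.6,
                 initial_ms=0.0, floor=1e-12):
        self.fast = OnePole(fs_hz, 0.005, initial_ms)    # filter 134, 5 ms
        self.medium = OnePole(fs_hz, 0.050, initial_ms)  # filter 136, 50 ms
        self.slow = OnePole(fs_hz, 5.0, initial_ms)      # filter 138, 5 s
        self.thr_d1 = thr_d1
        self.thr_d2 = thr_d2
        self.floor = floor   # hypothetical guard against a zero denominator

    def step(self, p: float, spl_incident: bool = False) -> bool:
        """Process one pressure sample; return True if a delta incident is flagged."""
        x = p * p                                        # squarer 132
        pf = self.fast.update(x)
        pm = self.medium.update(x)
        ps = self.slow.y                                 # recent average level
        crosses_boundary = pf > self.thr_d1 * max(pm, self.floor)    # comparator 140
        above_recent_level = pf > self.thr_d2 * max(ps, self.floor)  # comparator 142
        incident = crosses_boundary and above_recent_level
        if not (incident or spl_incident):
            self.slow.update(x)   # slow filter is frozen during any incident
        return incident
```

In this sketch a 40 dB jump above a settled level trips both comparators within a couple of samples, a 10 dB jump never exceeds the fast/medium threshold, and speech resuming at its previous level after half a second of silence passes the first comparator but is rejected by the second.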
[0071] When the delta incident detector detects a delta incident, the delta incident
detector 58 triggers the delta limiter 60. In one exemplary implementation, when triggered, the delta limiter 60 applies a fixed attenuation with an instantaneous (or near instantaneous) attack and a slow release. The slow release may be logarithmic to ensure that the release sounds gradual to a human listener. Such delta limit processing slows the rise time of signals with fast rise times, thus reducing the likelihood of acoustic startle. In one embodiment, the delta limiter 60 may have an attack time of approximately 1000/fs ms or less (where fs is the sampling frequency), a release time tdelta_release of approximately 250 ms, and an initial attenuation of approximately 40 dB, i.e., delta gain limit
Gaindelta_limit of 0.01. Various other suitable implementations of the delta limiter 60 may be similarly employed to slow the rise time of signals with fast rise times.
[0072] FIG. 13 is a flowchart illustrating an exemplary process 150 for slowing the rise time as performed by the delta limiter 60. The delta limiting process 150 shown is performed by the delta limiter 60 for each new audio sample. At decision block 152, the delta limiter determines if the delta incident detector has detected a delta incident. If so, the delta gain Gaindelta is immediately set to the delta gain limit Gaindelta_limit at block 154 so that the attack time of the attenuation applied by the delta limiter is instantaneous or near instantaneous. In general, any delay in applying the attenuation by the delta limiter is introduced by the short processing delays attributable to the true SPL processor, the delta detector and the fact that the output of a digital audio system is sampled and thus
only changes once every (1/fs) seconds. This sampling may delay a change in the system
output, in response to a change in the input signal, by up to (1000/fs) milliseconds. Alternatively, if the delta incident detector has not detected a delta incident, the delta limiter determines if the delta gain Gaindelta is less than unity (1) at decision block 156. If
so, the delta limiter is in a delta release phase and, at block 158, the delta gain Gaindelta is increased to:
$$\mathrm{Gain}_{delta} = \mathrm{Gain}_{delta} \times k_{delta\_release}$$
where kdelta_release is the delta release constant:
$$k_{delta\_release} = 10^{\left[\frac{-\log_{10}\left(\mathrm{Gain}_{delta\_limit}\right)}{f_s \, t_{delta\_release}}\right]}$$
and fs is the sampling frequency (Hz).
While the acoustic processing system remains in the delta release phase, each iteration of
the delta limiting process 150 increases the delta gain Gaindelta until it reaches unity (1), i.e., no attenuation. Once the delta gain Gaindelta has reached or exceeded unity (1) as
determined at decision block 156, the delta gain Gaindelta is set to unity (1) at block 160, i.e., no attenuation applied by the delta limiter.
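The per-sample process of FIG. 13 can be sketched as below. This is a minimal illustration: the release constant is assumed to take the same form as the SPL release constant (the gain climbs from the gain limit to unity in fs·t samples, base-10 logarithm), which is consistent with the statement in this description that both limiters use the same release constant determination. The class name and default values (8 kHz sampling, 250 ms release, gain limit 0.01) are illustrative.

```python
import math

class DeltaLimiter:
    """Instant-attack, slow-release gain following the FIG. 13 blocks (sketch)."""

    def __init__(self, fs_hz=8000.0, t_release_s=0.250, gain_limit=0.01):
        self.gain_limit = gain_limit
        # ASSUMPTION: same release-constant form as the SPL limiter.
        self.k_release = 10.0 ** (-math.log10(gain_limit) / (fs_hz * t_release_s))
        self.gain = 1.0

    def step(self, delta_incident: bool) -> float:
        """Update and return the delta gain for one new audio sample."""
        if delta_incident:                  # decision block 152
            self.gain = self.gain_limit     # block 154: immediate attenuation
        elif self.gain < 1.0:               # decision block 156
            self.gain *= self.k_release     # block 158: release phase
        else:
            self.gain = 1.0                 # block 160: no attenuation
        return self.gain
```

With the example values, the gain drops to 0.01 on the sample where the incident is flagged, is back to 0.1 after 125 ms, and returns to unity about 250 ms after the last flagged sample.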
[0073] Some of the parameters and their respective values or equations used by the exemplary SPL and delta limiters 56, 60 presented herein are summarized in Table 1 below. However, various other suitable parameter values may be similarly employed to achieve different characteristics of SPL limiting and/or delta limiting. It is noted that although the exemplary SPL limiter 56 and the delta limiter 60 apply the same gain limit, the same release time, and use the same release constant determination, the SPL and delta limiters 56, 60 may be configured with parameter values different from each other. In addition, although not shown in Table 1, other suitable parameter values different from the exemplary values presented herein for the true SPL processor 52 and the SPL and delta incident detectors 54, 58 may be employed.
[Table 1 (image imgf000030_0001) is not reproduced in this text.]
[0074] FIG. 14 shows graphs illustrating an exemplary measured delta limiter response and FIG. 15 shows graphs illustrating an exemplary combined SPL and delta limiter response. These graphs represent the response from a multi-band test system employing
a delta incident detector, a whole band SPL incident detector, and 16 narrowband SPL incident detectors, the activities of which are shown in the center graph. The top and bottom graphs show the electrical input and output signals of the acoustic processing system where the input has a fast rise time tone burst at time t = 10 sec. Note that the input level graphs use different vertical scales and the input signal level in FIG. 15 is approximately ten times greater than the input signal level in FIG. 14. Also note that in FIG. 15, the output level graph uses a different vertical scale from that used for the input level graph. If the output and input level graphs used the same vertical scale, the details on the output level graph would not be visible due to the 80 dB (10,000 times) attenuation provided by the combination of the SPL limiter and the delta limiter.
[0075] In FIG. 14, the sudden increase in input level causes a delta incident, but the absolute SPL is too low to cause either whole-band or narrowband SPL incidents. As shown, when the delta incident detector detects the delta incident at t = 10 sec and triggers the delta limiter, the delta limiter (nearly) instantaneously applies the delta gain
limit Gaindelta_limit, e.g., 0.01 or 40 dB attenuation. The delta limiter then enters its release phase with a slow release (rise) time tdelta_release of 250 ms and increases the delta gain Gaindelta until it reaches unity (1), i.e., no attenuation, at time t = 10.25 sec. At time t = 10.25 sec, the delta limiter has completed its release phase and no longer applies any attenuation so that the input and output electrical signal levels are equal.
[0076] In FIG. 15, the combined SPL and delta limiter response is shown for an input
signal, approximately ten times greater than that shown in FIG. 14, which causes the delta incident detector, the whole-band SPL incident detector and the narrowband incident detectors all to be triggered. Note the input and output level graphs have different vertical scales because the output is heavily attenuated by the SPL limiter and the delta limiter for the duration of the acoustic incident.
[0077] As shown, when the delta incident detector detects the acoustic incident at t = 10 sec, the delta incident detector triggers the delta limiter to apply 40 dB of attenuation nearly instantaneously. The delta limiter provides instantaneous or near instantaneous attenuation and then enters its release phase with a slow release (rise) time such that the
delta-limited output signal has a slow rise time.
[0078] The SPL incident detector detects the same acoustic incident shortly after the delta incident detector and causes the SPL limiter to apply an additional 40 dB of
attenuation with a relatively slow attack time tSPL_attack of 50 ms. The SPL incident detector is delayed relative to the delta incident detector due in part to the longer time constant used by the SPL incident detector (20 ms for the SPL incident detector versus 5 ms for the delta incident detector), and also due in part to the SPL detector's internal
signal level having to slew all the way from its initial low value to approximately 100 dB SPL before an SPL incident is flagged. In contrast, a delta incident can be triggered by a relatively small increase in SPL, e.g., on the order of 15 dB. At approximately 50 ms after the acoustic incident (t = 10.05 sec), the cumulative attenuation peaks at approximately 80 dB with the delta and SPL limiters each contributing approximately 40 dB attenuation. At this point, the SPL limiter applies its 40 dB of steady state attenuation
(GainSPL_limit of 0.01) for the entire duration of the SPL incident until both the whole-band and narrowband SPL incident detectors become inactive at t = 10.56 sec. The SPL detectors may remain active slightly longer than the input signal remains above their respective SPL incident thresholds due to the decay characteristics of the lowpass filters used in the SPL detectors. The delta limiter, on the other hand, continues its release phase with a release time tdelta_release of 250 ms until its release phase is complete at t = 10.25 sec. Thus, from the peak cumulative attenuation of approximately 80 dB at t = 10.05 sec, the cumulative attenuation reduces to approximately 40 dB by time t = 10.25 sec or about 250 ms after the start of the acoustic incident when the delta limiter completes its release phase.
[0079] After the SPL incident detectors become inactive at t = 10.56 sec, the SPL limiter then enters its release phase with a slow release (rise) time tSPL_release of 250 ms as it decreases its attenuation from 40 dB to 0 dB.
[0080] Because the delta limiter provides (near) instantaneous limiting, the SPL limiter can use a relatively slow attack time so as to prevent the SPL limiter from clipping normal peaks of the speech waveform, even at rms speech levels close to the limiting threshold, which may result in short-term peaks in the speech waveform causing the threshold to be exceeded for a few milliseconds. The delta incident detector may be tuned so as to not trigger during continuous speech with short periods of silence. The net effect of the SPL and delta incident detectors and limiters is that loud and/or potentially startling acoustic incidents are avoided but undesirable distortion of speech (or other audio) is reduced or minimized. The delta and SPL limiters thus complement each other so as to provide better acoustic comfort and less degradation of speech signals.
[0081] Referring again to FIG. 4, the combination of the true SPL processor 52, detectors 54, 58, and limiters 56, 60 introduces a short delay. Thus the look-ahead delay element 62 is provided in the signal path so that the gain control applied by the variable gain amplifier (VGA) 64 is applied slightly before the acoustic incident that requires attenuation, thus preventing short duration glitches on the system output when acoustic incidents occur on the input. The delay introduced by the delay element 62 may be slightly longer than the delay in the VGA control path to ensure the prevention of such glitches on the system output. Typically, the processing performed by the components of the audio processing system 50 is carried out in the digital domain so that the VGA 64 is a digital gain block whose gain Gain_VGA is the product of the delta limiter gain Gain_delta and the SPL limiter gain Gain_SPL.
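The role of the look-ahead delay can be illustrated with a minimal sketch. It assumes a sample-by-sample digital implementation in which the control path supplies one delta limiter gain and one SPL limiter gain per input sample; the function name `apply_lookahead_gain`, its parameters, and the example values are hypothetical and are not taken from the specification.

```python
from collections import deque


def apply_lookahead_gain(samples, gain_delta, gain_spl, lookahead):
    """Delay the signal path by `lookahead` samples, then apply the VGA gain.

    gain_delta[n] and gain_spl[n] are the limiter gains available from the
    control path at time n (assumed to be computed from the undelayed input).
    """
    delay_line = deque([0.0] * lookahead)  # look-ahead delay element
    output = []
    for x, g_delta, g_spl in zip(samples, gain_delta, gain_spl):
        delay_line.append(x)
        delayed = delay_line.popleft()
        gain_vga = g_delta * g_spl  # VGA gain is the product of the two limiter gains
        output.append(delayed * gain_vga)
    return output


# Hypothetical example: an acoustic incident starts at sample 2, and the
# control path reacts one sample late (gain drops at sample 3).
samples = [0.1, 0.1, 1.0, 1.0, 1.0, 1.0]
gain_delta = [1.0, 1.0, 1.0, 0.01, 0.01, 0.01]
gain_spl = [1.0] * 6

without_delay = apply_lookahead_gain(samples, gain_delta, gain_spl, lookahead=0)
with_delay = apply_lookahead_gain(samples, gain_delta, gain_spl, lookahead=2)
print(max(without_delay), max(with_delay))
```

In this example the signal-path delay (two samples) exceeds the control-path delay (one sample), so the reduced gain is already in place when the delayed incident reaches the gain block; with no look-ahead delay, one full-level sample passes through unattenuated.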
[0082] The audio processing system 50 provides several features including improved accuracy of SPL at which limiting occurs with the use of the true SPL processor 52, an anti-startle feature with the use of the delta incident detector and limiter 58, 60 by instantaneously limiting acoustic incidents with fast risetime and high intensity, and reduced distortion of speech (or other audio) whose rms level is close to the limiting threshold with the combination of the SPL and delta incident detectors and limiters. The audio processing system 50 thus provides better acoustic comfort and less degradation of speech signals.
[0083] While the exemplary embodiments of the present invention are described and illustrated herein, it will be appreciated that they are merely illustrative and that modifications can be made to these embodiments without departing from the spirit and scope of the invention. Thus, the scope of the invention is intended to be defined only in terms of the following claims as may be amended, with each claim being expressly incorporated into this Description of Specific Embodiments as an embodiment of the invention.

Claims

1. A method for processing an audio signal comprising: receiving an audio signal at a headset amplifier;
classifying the audio signal as a speech signal or a music signal at the headset amplifier; and
processing the audio signal responsive to whether the audio signal is a speech signal or music signal, wherein the processing comprises anti-startle processing if the audio signal is a speech signal.
2. The method of claim 1, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing the audio signal in different frequency bands and comparing a spectral power density of different bands.
3. The method of claim 1, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing a zero crossings rate of the audio signal.
4. The method of claim 1, further comprising switching between a speech signal classification and a music signal classification at a predetermined threshold having a built in hysteresis factor.
5. A computer readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for processing an audio signal, comprising: receiving an audio signal;
classifying the audio signal as a speech signal or a music signal; and
processing the audio signal responsive to whether the audio signal is a speech signal or music signal, wherein the processing comprises anti-startle processing if the audio signal is a speech signal.
6. The computer readable storage medium of claim 5, wherein the audio signal is received at a headset amplifier.
7. The computer readable storage medium of claim 5, wherein the audio signal is received at a personal computer.
8. The computer readable storage medium of claim 5, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing the audio signal in different frequency bands and comparing a spectral power density of different bands.
9. The computer readable storage medium of claim 5, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing a zero crossings rate of the audio signal.
10. The computer readable storage medium of claim 5, wherein the method further comprises switching between a speech signal classification and a music signal classification at a predetermined threshold having a built in hysteresis factor.
11. An apparatus for processing an audio signal comprising: a receiving mechanism for receiving an audio signal;
a classifying mechanism for classifying the audio signal as a speech signal or a music signal; and
a processing mechanism for processing the audio signal responsive to whether the audio signal is a speech signal or music signal, wherein the processing comprises anti-startle processing if the audio signal is a speech signal.
12. The apparatus of claim 11, wherein the audio signal is received at a headset amplifier.
13. The apparatus of claim 11, wherein the audio signal is received at a personal computer.
14. The apparatus of claim 11, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing the audio signal in different frequency bands and comparing a spectral power density of different bands.
15. The apparatus of claim 11, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing a zero crossings rate of the audio signal.
16. The apparatus of claim 11, further comprising switching between a speech signal classification and a music signal classification at a predetermined threshold having a built in hysteresis factor.
17. A method for processing an audio signal comprising: receiving an audio signal;
classifying the audio signal as a speech signal or a music signal; and
processing the audio signal responsive to whether the audio signal is a speech signal or music signal, wherein the processing comprises anti-startle processing if the audio signal is a speech signal.
18. The method of claim 17, wherein the audio signal is received at a personal computer.
19. The method of claim 17, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing the audio signal in different frequency bands and comparing a spectral power density of different bands.
20. The method of claim 17, wherein classifying the audio signal as a speech signal or a music signal comprises analyzing a zero crossings rate of the audio signal.
21. The method of claim 17, further comprising switching between a speech signal classification and a music signal classification at a predetermined threshold having a built in hysteresis factor.
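By way of illustration only, the zero crossings rate analysis and the hysteresis-based switching recited in the claims above might be sketched as follows. This is not the claimed implementation: the names `zero_crossing_rate` and `HysteresisSwitch`, the scalar score, the threshold values, and the choice of "speech" as the initial classification are all assumptions, and the mapping from a measured feature such as the zero crossings rate to the score is left unspecified here.

```python
def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


class HysteresisSwitch:
    """Switch between 'speech' and 'music' with a hysteresis band around a threshold.

    `score` is a hypothetical scalar feature; larger values are assumed to
    indicate music. The label changes only when the score leaves the band
    [threshold - hysteresis, threshold + hysteresis].
    """

    def __init__(self, threshold, hysteresis, initial="speech"):
        self.threshold = threshold
        self.hysteresis = hysteresis
        self.label = initial

    def update(self, score):
        if self.label == "speech" and score > self.threshold + self.hysteresis:
            self.label = "music"
        elif self.label == "music" and score < self.threshold - self.hysteresis:
            self.label = "speech"
        return self.label


switch = HysteresisSwitch(threshold=0.5, hysteresis=0.1)
labels = [switch.update(score) for score in [0.55, 0.65, 0.45, 0.35]]
print(labels)
```

The hysteresis band keeps the classification from toggling when the score hovers near the threshold: in the example the label changes to music only once the score exceeds 0.6 and returns to speech only once it falls below 0.4.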
PCT/US2007/006035 2006-03-10 2007-03-08 Music compatible headset amplifier with anti-startle feature WO2007106384A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US37354506A 2006-03-10 2006-03-10
US11/373,545 2006-03-10

Publications (1)

Publication Number Publication Date
WO2007106384A1 true WO2007106384A1 (en) 2007-09-20

Family

ID=38233999

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/006035 WO2007106384A1 (en) 2006-03-10 2007-03-08 Music compatible headset amplifier with anti-startle feature

Country Status (1)

Country Link
WO (1) WO2007106384A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4596902A (en) * 1985-07-16 1986-06-24 Samuel Gilman Processor controlled ear responsive hearing aid and method
US20010046304A1 (en) * 2000-04-24 2001-11-29 Rast Rodger H. System and method for selective control of acoustic isolation in headsets
EP1471767A2 (en) * 2003-03-31 2004-10-27 DSPFactory Ltd. Method and system for acoustic shock protection
US20050195994A1 (en) * 2004-03-03 2005-09-08 Nozomu Saito Apparatus and method for improving voice clarity

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8438021B2 (en) 2009-10-15 2013-05-07 Huawei Technologies Co., Ltd. Signal classifying method and apparatus
US8050916B2 (en) 2009-10-15 2011-11-01 Huawei Technologies Co., Ltd. Signal classifying method and apparatus
WO2011044798A1 (en) * 2009-10-15 2011-04-21 华为技术有限公司 Signal classification method and device
US9998081B2 (en) 2010-05-12 2018-06-12 Nokia Technologies Oy Method and apparatus for processing an audio signal based on an estimated loudness
WO2011141772A1 (en) * 2010-05-12 2011-11-17 Nokia Corporation Method and apparatus for processing an audio signal based on an estimated loudness
US10523168B2 (en) 2010-05-12 2019-12-31 Nokia Technologies Oy Method and apparatus for processing an audio signal based on an estimated loudness
EP2602978A1 (en) * 2011-12-08 2013-06-12 Samsung Electronics Co., Ltd Method and Apparatus for Processing Audio in Mobile Terminal
JP2013121181A (en) * 2011-12-08 2013-06-17 Samsung Electronics Co Ltd Method and apparatus for processing audio in mobile terminal
CN103219026A (en) * 2011-12-08 2013-07-24 三星电子株式会社 Method and apparatus for processing audio in mobile terminal
US9184715B2 (en) 2011-12-08 2015-11-10 Samsung Electronics Co., Ltd. Method and apparatus for processing audio in mobile terminal
CN103219026B (en) * 2011-12-08 2017-04-26 三星电子株式会社 Method and apparatus for processing audio in mobile terminal
KR101873325B1 (en) * 2011-12-08 2018-07-03 삼성전자 주식회사 Method and apparatus for processing audio in mobile terminal
WO2020040676A1 (en) * 2018-08-24 2020-02-27 Dirac Research Ab Controlling a limiter adapted for selectively suppressing an audio signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07752717

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07752717

Country of ref document: EP

Kind code of ref document: A1