WO2009026561A1 - System and method for noise activity detection - Google Patents

System and method for noise activity detection

Info

Publication number
WO2009026561A1
WO2009026561A1 PCT/US2008/074102 US2008074102W WO2009026561A1 WO 2009026561 A1 WO2009026561 A1 WO 2009026561A1 US 2008074102 W US2008074102 W US 2008074102W WO 2009026561 A1 WO2009026561 A1 WO 2009026561A1
Authority
WO
WIPO (PCT)
Prior art keywords
average energy
energy
noise
threshold
signal
Prior art date
Application number
PCT/US2008/074102
Other languages
French (fr)
Inventor
Jon C. Taenzer
Original Assignee
Step Labs, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Step Labs, Inc.
Priority to JP2010522086A priority Critical patent/JP2010537253A/en
Priority to EP08798555A priority patent/EP2191594A4/en
Priority to CN200880111290A priority patent/CN101821971A/en
Priority to BRPI0815721A priority patent/BRPI0815721A2/en
Publication of WO2009026561A1 publication Critical patent/WO2009026561A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • G10L2025/786 Adaptive threshold
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • The present disclosure relates generally to noise activity detectors for use in, for example, noise reduction systems.
  • In many signal processing applications, such as echo cancellation, speech recognition, speech encoding, voice-over-IP, and in particular noise reduction systems, it is important to gather real-time information and statistics about the noise in the signal. This is most often achieved by detecting when there is a useful amount of the desired signal and treating that portion of the signal as "non-noise." At other times, the signal is assumed to be only noise, and the desired information and statistics are gathered during those times.
  • The noise and desired signal are mixed, and the incoming mixed noisy signal is considered to be a linear sum of the desired signal and unwanted noise.
  • The noise information is not updated during the "non-noise" part of the signal. Instead, updating of the noise characteristics at other times allows noise reduction, for example, to be executed with appropriate processing.
  • In voice communication systems, the need for determining the presence of noise-only periods has given rise to the proliferation of numerous voice determination methods, often called voice detection or voice activity detection (VAD) methods, since the voice portion of the mixed signal is the desired portion.
  • Voice activity detection methods, whether implemented in the time domain or in the frequency domain, utilize this fact. Many such systems are based upon means that detect when the total energy of the incoming noisy signal is above a threshold, and indicate the presence of voice when this condition is met. Of course, the threshold must be adjusted to be always above the level of the background noise portion of the signal but below the combined voice-plus-noise level. Many complex methods have been devised to create such real-time dynamic threshold adjustment for this purpose.
  • Classical voice detection methods assume that the background noise is stationary or only slowly varying. In non-stationary noise conditions, classical voice detection schemes are unreliable, since rapid changes in noise level, especially upward jumps in noise, cannot be distinguished from the onset of a voice burst and therefore give false indications of voice presence. Such voice detectors also react to the presence of nearby voices other than that of the user, even though background voices are actually "noise" in systems where the user's own voice is the only desired signal.
  • Voice detection methods rely upon setting or updating one or more thresholds based upon the prior history of the signal, rather than on instantaneous current conditions. By relying upon prior information, such thresholds cannot update quickly, and the voice detection output is slow to react to rapid changes in background noise, creating errors until the system can eventually adjust.
  • Enhancements include means for tracking noise levels in order for the threshold to be updated in real time; the addition of separate wind detector schemes; improved sensitivity methods allowing the threshold to be set with greater precision to operate in lower SNR conditions; hangover methods to prevent the false indication that voicing has ended when, at the end of an utterance, it has simply decayed below the threshold; and lockout periods that wait for a time longer than any expected naturally occurring voicing period, after which the threshold is allowed to adjust more rapidly in order to attempt to accommodate bursts or steps in background noise level.
  • Using such enhancements still produces limited operation and still results in the false detection of noise-only signal conditions.
  • A method for generating an indication of noise activity in a signal includes: a) calculating average energy of the signal in a critical bandwidth; b) determining a frequency-dependent threshold function; c) generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) applying an offset value to at least one of the first and second average energy values; g) comparing, after application of said offset value, the resultant first and second average energy values with one another; and h) indicating the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy
  • A noise activity detector for generating an indication of noise activity in a signal includes: a) a first circuit configured to calculate the average energy in a critical bandwidth; b) a second circuit configured to determine a frequency-dependent threshold function; c) a third circuit configured to generate a dynamic modification of the frequency-dependent threshold function using the average energy; d) a fourth circuit configured to identify frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and to determine a first average energy value representing an average energy of the identified frequency components with energy above the threshold; e) a fifth circuit configured to identify frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and to determine a second average energy value representing an average energy of the identified frequency components with energy below the threshold; f) a sixth circuit configured to apply an offset value to at least one of the first and second average energy values; g) a seventh circuit configured to compare, after application of said offset value, the resultant first and second average energy values with one
  • A noise activity detector for generating an indication of noise activity in a signal includes: a) means for calculating average energy of the signal in a critical bandwidth; b) means for determining a frequency-dependent threshold function; c) means for generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) means for identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) means for identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) means for applying an offset value to at least one of the first and second average energy values; g) means for comparing, after application of said offset value, the resultant first and second average energy values with one another; and h) means for indicating the presence of noise activity if,
  • A program storage device readable by a machine, embodying a program of instructions executable by the machine to perform a method for generating an indication of noise activity in a signal, the method includes: a) calculating average energy of the signal in a critical bandwidth; b) determining a frequency-dependent threshold function; c) generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) applying an offset value to at least one of the first and second average energy values; g) comparing, after application of said offset value, the resultant first and second average energy values with one another; and h
  • FIGS. 1-7 are plots of measured data corresponding to different sound conditions, and each includes a long-dashed line modeling the curve that represents the noise power, and a short-dashed line representing average power.
  • FIG. 8 is a block diagram of a typical communication system front end showing the context within which a noise activity detector (NAD) 20 may be used.
  • FIG. 9 is a flow diagram of various steps or tasks that may be performed by NAD 20.
  • FIG. 10 is a block diagram of circuits that implement the tasks set forth in the flow diagram of FIG. 9.
  • FIG. 11 is a plot illustrating the performance of a device using NAD 20.
  • Example embodiments are described herein in the context of a processor or individual circuits, or a flow diagram of a process that is performed. Those of ordinary skill in the art will realize that the following description is illustrative only and is not intended to be in any way limiting. Other embodiments will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the example embodiments as illustrated in the accompanying drawings. The same reference indicators will be used to the extent possible throughout the drawings and the following description to refer to the same or like items.
  • The components, process steps, and/or data structures described herein may be implemented using various types of operating systems, computing platforms, computer programs, and/or general purpose machines.
  • Devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein.
  • Where a method comprising a series of process steps is implemented by a computer or a machine and those process steps can be stored as a series of instructions readable by the machine, they may be stored on a tangible medium such as a computer memory device (e.g., ROM (Read Only Memory), PROM (Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), FLASH Memory, Jump Drive, and the like), magnetic storage medium (e.g., tape, magnetic disk drive, and the like), optical storage medium (e.g., CD-ROM, DVD-ROM, paper card, paper tape and the like) and other types of program memory.
  • The noise detector, also referred to as a noise activity detector (NAD), as disclosed herein is based upon the unique characteristics of noise as differentiated from the characteristics of other signals, in particular the characteristics of desired signals. Generally, it is applicable to the detection of periods when a signal is only noise, and is therefore especially useful in systems, such as noise reduction systems, where knowledge of noise-only periods is needed for their function.
  • The arrangement disclosed herein is directed at reliable detection of periods with only acoustic noise in a mixed microphone input signal which may contain speech, wind and acoustic background noise.
  • An alternate use is as a voice activity detector. More particularly, it is directed to use in voice grade communication systems and devices such as cellular telephones, Bluetooth® wireless headsets, voice command and control and automatic speech recognition, among others.
  • FIG. 1 is a plot of measured data for ambient background noise generated by a multitude of human voices in a crowded restaurant, plotted as the measured signal power in decibels (dB) versus frequency in Hertz (Hz).
  • The measured noise power decreases with increasing frequency at a rate of approximately 6 dB per octave.
  • The average power level is determined over the frequency range from about 250 Hz to about 2,500 Hz. The average power level of the example measured data of FIG. 1 is about -50 dB, and is represented by the short-dashed line in the drawing.
  • A long-dashed line is constructed to model the curve that represents the actual noise power.
  • The model line is selected to be a straight line with a slope of -6 dB per octave.
  • The model line is not limited to a straight line, and the illustrated slope of -6 dB per octave is not by way of limitation, as other slopes, positive and negative, are also contemplated.
  • The model curve (long-dashed line) crosses the average noise level line (short-dashed line) at what will be termed the effective frequency, here slightly above 700 Hz.
  • This effective frequency is explained in more detail below, as is the manner in which the model curve is selected and constructed. Assuming that the model curve (long-dashed line) has been properly determined so as to correspond relatively accurately to the typical noise power frequency characteristic shape, the average power for the model curve over the selected bandwidth of about 250 Hz to about 2,500 Hz is made equal to the actual average noise power in the measured data by raising or lowering the model curve until the two power averages are the same. This is accomplished mathematically by solving for the magnitude of the model which makes the average model power match the actual average measured power. The effective frequency at which the model and actual average power lines cross (i.e., are equal) can then be determined.
  • The model curve passes through the average power line such that it creates equal areas between it and the average power line, above and below the effective frequency crossing point, when plotted on a magnitude squared vs. frequency plot (not shown). It can be seen that for this data the -6 dB sloped model provides a close approximation of the noise data characteristic when they cross at approximately 700 Hz. Thus, 700 Hz is determined to be the effective frequency for this data.
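As a rough illustration of the level-matching procedure described above, the following Python sketch computes the effective frequency for an idealized model. It assumes that a -6 dB per octave power slope corresponds to a shape function proportional to 1/f^2 and that the band is exactly 250 Hz to 2,500 Hz; the names `shape_minus6db` and `effective_frequency` are hypothetical and do not come from the source. For this idealized continuous model the crossing falls at the geometric mean of the band edges (about 790 Hz); the approximately 700 Hz figure quoted in the text comes from measured data, so the two need not coincide.

```python
import numpy as np


def shape_minus6db(f):
    """Assumed shape function: power falls 6 dB per octave, i.e. S(f) ~ 1/f^2."""
    return 1.0 / np.asarray(f, dtype=float) ** 2


def effective_frequency(shape, f_lo, f_hi, n=200_001):
    """Frequency at which a decreasing model equals its own band average."""
    f = np.linspace(f_lo, f_hi, n)
    s = shape(f)
    # Trapezoidal integral of the model over the band, divided by the bandwidth.
    avg = np.sum((s[1:] + s[:-1]) * np.diff(f)) / 2.0 / (f_hi - f_lo)
    # First grid point where the model has dropped to or below its average.
    idx = int(np.argmax(s <= avg))
    if idx == 0:
        raise ValueError("model does not cross its band average from above")
    # Linear interpolation between the two grid points around the crossing.
    f0, f1, s0, s1 = f[idx - 1], f[idx], s[idx - 1], s[idx]
    return f0 + (s0 - avg) * (f1 - f0) / (s0 - s1)


f_eff = effective_frequency(shape_minus6db, 250.0, 2500.0)
print(f"effective frequency of the idealized model: {f_eff:.1f} Hz")
```

Because the scale factor multiplies both the model and its average, it cancels out of the crossing condition, which is why the sketch can work with the unscaled shape function alone.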
  • FIG. 2 is a plot of traffic noise measured adjacent to a street with heavy traffic. As in FIG. 1, the vertical axis is noise power in dB, the horizontal axis is frequency in Hz, the short-dashed line represents the average noise power over the 250 Hz to 2,500 Hz frequency band, and the long-dashed line represents the model constructed to be a straight line with a slope of -6 dB per octave.
  • The model line (long dash) intersects the average power line (short dash) at very nearly the same effective frequency as for the restaurant noise of FIG. 1.
  • FIG. 3 shows a pair of plots of noise measured inside a car cabin, with the lower plot of measurements taken while the car was moving slowly with closed windows and no other noise source, and the higher plot of measurements taken with the car driven at 70 miles per hour, radio on and A/C fan on.
  • The short-dashed lines and the long-dashed lines again represent, respectively, the average power of the noise data and the -6 dB sloped model line giving the corresponding average model power "curve".
  • FIGS. 4 and 5 are plots of low and high wind velocity “noise,” respectively.
  • Wind "noise" is different from other sounds in that it is the result of air turbulence at the individual microphone port(s), and only exists due to the presence of the microphone. It is noise that is induced by the wind at the microphone port(s) rather than being acoustic noise inherent in the wind and sensed by the microphone. Such wind-induced noise nevertheless results in an electrical microphone output signal commonly referred to as "wind noise."
  • FIG. 4 shows data collected when the wind speed was low and consequently did not saturate the microphones.
  • This noise signal is characterized by relatively sustained noise bursts exhibiting both high stationarity and steeply-sloped power frequency response.
  • FIG. 5 shows data collected in high wind speed conditions, in which the wind saturated the microphones and was extremely "bursty.” In this case, the noise signal is characterized by short, intense non-stationary bursts of signal. In intermediate wind conditions, the signal alternates between these two characteristics. From FIGS. 4 and 5, it can be seen that wind-induced noise has characteristics that differ substantially from most common types of acoustic noise, including spectral differences and dynamic pattern differences. Further, this noise is statistically independent for each sensor signal in multi-sensor array systems.
  • Noise suppression processes must often ignore this wind-induced noise signal, handling it separately, or differently, from the way they respond to noise of an acoustic origin.
  • A short-dashed horizontal line is drawn at the average power level, and the long-dashed noise model line with a slope of -6 dB per octave is shown, where the model average power is matched to the measured signal power at the shown intersection frequency.
  • Acoustic noise signals exhibit little deviation from the model, whereas voice (discussed below) and wind noise both exhibit significant deviation from the model.
  • Three types of sound are identified: acoustic noise, wind noise, and voice.
  • Acoustic noise is generally a catch-all for all non-wind-noise and non-voice sounds. It can be seen from the plots that, while the noise data in FIGS. 1-3 clusters closely around the model (long dash), the plots for the wind noise in FIGS. 4 and 5 do not. This difference can be used to distinguish wind noise from the other noises.
  • FIGS. 4 and 5 represent the sort of variation in wind-induced noise that a specific microphone is likely to produce over a range of wind speeds.
  • FIGS. 6 and 7 are plots of voiced speech in quiet room conditions, and voiced speech in intense noise, respectively.
  • The noise used for the plot in FIG. 7 includes commercially recorded music mixed with voice babble from multiple directions in a diffuse-source simulation, producing approximately 85 dB SPL of noise at the microphone.
  • The SNR of this signal was -3 dB in these conditions.
  • This simulation was intended to approximate various crowd conditions, such as airports, theater intermissions, retail stores, etc.
  • The average signal power levels are represented by short-dashed horizontal lines, and the -6 dB straight line model is shown as the long-dashed lines.
  • The noise activity detector (NAD) disclosed herein uses the characteristics described above to identify a signal and indicate when noise-only periods of the signal are present. There are myriad applications for such an operation; for example, it can be used to provide a control signal that gates other functions such as updating a noise template in a spectral subtraction process, updating an automatic microphone matching table, blocking an automatic gain circuit from raising the gain when only noise is present, and so on.
  • The noise activity detector disclosed herein is described in the context of audio signals in a communication system. However, the process disclosed herein is not limited to single-channel, single-band applications, but is also applicable to multi-channel applications, as well as to multi-band applications.
  • The noise activity detector can also be used with multi-channel applications to provide an indication for each channel when its signal is only noise.
  • Although each input signal may be similar to the signals that the other sensors receive, there are many situations where that is not the case, such as for wind-induced noise, and noise generated mechanically at a port such as by physical contact with the operator's skin or with other objects.
  • FIG. 8 is a block diagram of a typical communication system front end showing the context within which a noise activity detector (NAD) 20 may be used.
  • The noise activity detector operates as a multi-band process, so the time domain signal is broken into multiple frequency bands.
  • The multi-band conversion can be accomplished by use of a bank of bandpass filters (not shown), by the application of Fourier transform processes, or by any other process for such conversion.
  • Conversion to the frequency domain is a well-known process that may use for example Short Time Fourier Transform (STFT) techniques or other well known frequency domain conversion methods. Since the systems in which NAD 20 is used are likely to employ STFT methods for other processes, such as spectral subtraction, microphone sensitivity matching and/or automatic gain control processes, the conversion step is likely to already be available, and NAD 20 would require little additional processing.
  • The example embodiment employs the Fast Fourier Transform, and the process of NAD 20 is carried out in the frequency domain. Therefore, per the example system, the input signal is converted to the frequency domain before the process disclosed herein is applied.
  • The analog input signal, for example from a microphone (not shown), is framed at framing block 10.
  • A windowing block 12 is used to create a window, which is applied by windowing application block 13 to the framed data.
  • The framed, windowed data is converted to the frequency domain by Fourier transform block 14 (for example Fast Fourier Transform (FFT) or other appropriate transform process as explained above), and the frequency domain result can then be divided into one or, optionally, more than one sub-bands by sub-band selection block 15.
  • For example, a communications audio signal with an 8 ksps (kilo-samples per second) sample rate is separated into 512-sample frames, windowed with a Hanning window, converted to the frequency domain using an FFT (Fast Fourier Transform), and a single sub-band consisting of the frequency bins between 250 Hz and 2,500 Hz is selected.
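The front-end chain just described (framing, Hanning window, FFT, sub-band selection) can be sketched in Python with NumPy as follows. This is a minimal sketch, not the patented implementation: the function name `subband_power`, the use of a real-input FFT, and the random test frame are all assumptions made for illustration, and frame overlap is not addressed because the text does not specify it.

```python
import numpy as np

FS = 8000      # sample rate in samples per second (8 ksps, from the text)
FRAME = 512    # frame length in samples (from the text)


def subband_power(frame, f_lo=250.0, f_hi=2500.0, fs=FS):
    """Window one frame, transform it, and return the in-band bin powers."""
    n = len(frame)
    windowed = np.asarray(frame, dtype=float) * np.hanning(n)
    spectrum = np.fft.rfft(windowed)           # bins 0 .. n/2
    freqs = np.arange(n // 2 + 1) * fs / n     # bin centre frequencies in Hz
    keep = (freqs >= f_lo) & (freqs <= f_hi)   # sub-band selection
    return freqs[keep], np.abs(spectrum[keep]) ** 2


# Hypothetical input: one frame of white noise standing in for microphone data.
rng = np.random.default_rng(0)
freqs, power = subband_power(rng.standard_normal(FRAME))
print(len(freqs), freqs[0], freqs[-1])
```

With these parameters the bin spacing is 8000 / 512 = 15.625 Hz, so the inclusive 250 Hz to 2,500 Hz selection spans bins 16 through 160.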
  • Block 16 represents the noise model and effective frequency determination process, which is performed by the practitioner during the design of the system in which the noise detector is to be used, and is a function of the particular application.
  • Typical noise, as sensed by the sensor system of the intended application, is analyzed for a curve fit using well-known curve-fitting methods.
  • The shape of the fitted mathematical curve is the noise model; for example, in FIGS. 1-3, the model is a straight line shown by the declining long-dashed line.
  • An effective frequency, $f_E$, is also determined during the design process by determining the frequency at which the modeled power equals the value of the average power.
  • Block 17 represents the determination of the critical bandwidth.
  • The critical bandwidth is generally a contiguous range of frequencies that includes the range in which the data fits the model.
  • Data for the system that was measured fits the straight line model over a frequency range from about 200 Hz to somewhere between 2,500 Hz and 3,000 Hz.
  • A frequency range of 250 Hz to 2,500 Hz can be selected.
  • A small adjustment to the selected frequency range in order to provide a convenient number of FFT bins will not significantly impact the performance of the noise detector.
  • The bandwidth utilized for the noise activity detector comprised 128 FFT bins, which, as an exact power of two, is a convenient divisor for calculating the average power in the 128 bins.
  • The critical bandwidth, noise power model and effective frequency determination processes of blocks 16 and 17 may use the following steps:
  • FIG. 9 is generally a flow diagram depicting the operation of noise activity detector (NAD) 20.
  • The input signal is sub-band signal 22, which is the output signal provided by sub-band selection process 15 of FIG. 8, and is used in step 30 to calculate the average energy in the critical bandwidth.
  • Noise model determination is performed at step 26, together with a determination of the effective frequency at step 28. Steps 26 and 28 correspond to block 16 of FIG. 8.
  • Noise model determination can be made based on visual observation, or determined more rigorously with known curve-fitting algorithms. As such, it can be determined how well any particular power curve model will represent the measured signal power data. In the case of the data of FIGS.
  • Determination of critical bandwidth (step 17) may use this bandwidth for a single-channel system or may use multiple critical sub-bands for a multiple-channel system.
  • A different microphone design would produce different results, and could require a curved line model instead of a straight line model for the noise signal. Rigorous methods of curve fitting can be used to provide a precise model, but doing so is generally not required to achieve the desired result, and the more complex the model, the more processing power will be required in operating the noise detector.
  • The determination of the effective frequency $f_E$ (step 28) is also accomplished as mentioned above and is described here more fully.
  • The power model is mathematically integrated over the critical bandwidth to determine the average model power level.
  • The frequency at which this level intersects the noise power model curve is the effective frequency $f_E$.
  • The noise power model is written as $P_{NM}(f) = a \, S_N(f)$, where $P_{NM}(f)$ is the noise power model, $S_N(f)$ is the noise power model shape function, $f$ is frequency, and $a$ is a magnitude scale factor to be determined.
  • The shape model is integrated over the critical bandwidth and then divided by the critical bandwidth, $BW_c$, to produce the average noise power model level.
  • This average noise model power level will equal the value of the noise power model at the effective frequency, $f_E$.
  • FIG. 9 shows a flow diagram of various steps or tasks that are performed by noise activity detector (NAD) 20.
  • Processors may be used to perform the tasks, each processor having one or more modules that may be dedicated to one or more tasks.
  • At step 30 in FIG. 9, the average energy in the critical bandwidth is calculated: the power across the entire critical bandwidth for the selected sub-band is summed and divided by the critical bandwidth, $BW_c$, to generate a value for the signal's average power level in the current frame.
  • Circuit 102 of FIG. 10 is provided for this task.
  • This average power level value is used at step 32 of FIG. 9 to define a threshold function, $Th_i(f)$, that is unique to the current frame of data.
  • Circuit 104 of FIG. 10 is provided for this purpose.
  • The Define Threshold Function, $Th_i(f)$, step 32 (and circuit 104) determines a dynamic frequency-dependent threshold using the noise power model, $P_{NM}(f)$, determined in step 26, and the effective frequency, $f_E$, determined in step 28. It does so by calculating the average power in the current frame of data and setting the level, $a$, of the model so that the average power level $\bar{P}_i$ for the current frame is equal to the value of the model at the effective frequency, $f_E$. That is, $a_i \, S_N(f_E) = \bar{P}_i$.
  • The threshold function for the $i$-th frame of data is therefore determined by circuit 104 and in step 32 as $Th_i(f) = a_i \, S_N(f) = \bar{P}_i \, S_N(f) / S_N(f_E)$.
  • This threshold is not a single level and is not dependent upon prior frames of data, both of which are common in other such detectors. Because the threshold is immediate (that is, calculated for and used by only the current frame), NAD 20 is able to follow rapid changes in background noise. Thus a dynamic modification of the frequency-dependent threshold function using the average energy is used.
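The per-frame threshold described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the source's implementation: the shape function is assumed to be proportional to 1/f^2 (a -6 dB per octave power slope), the average is taken as a per-bin mean instead of a sum divided by the bandwidth, and the names `shape_minus6db` and `frame_threshold` are hypothetical.

```python
import numpy as np


def shape_minus6db(f):
    """Assumed noise model shape: power proportional to 1/f^2."""
    return 1.0 / np.asarray(f, dtype=float) ** 2


def frame_threshold(power, freqs, f_eff, shape=shape_minus6db):
    """Scale the model so that it equals the frame's average power at f_eff."""
    p_avg = float(np.mean(power))      # average power of the current frame only
    a_i = p_avg / float(shape(f_eff))  # level that satisfies a_i * S(f_eff) = p_avg
    return a_i * shape(freqs), p_avg   # threshold value for every bin, and p_avg


# Hypothetical four-bin frame used only to show the scaling.
freqs = np.array([250.0, 500.0, 1000.0, 2000.0])
power = np.array([4.0, 3.0, 2.0, 1.0])
threshold, p_avg = frame_threshold(power, freqs, f_eff=500.0)
print(threshold, p_avg)
```

Nothing is carried over from earlier frames, which is the property the text credits for the detector's ability to follow rapid changes in background noise.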
  • The threshold function, $Th_i(f)$, is used to divide the spectral data of the current frame into two groups: those FFT frequency bins whose power data magnitudes are greater than the threshold, and those whose power data magnitudes are less than the threshold.
  • FIGS. 4-7, depicting wind noise and voice characteristics, all represent data that, when applied to the example embodiment of noise activity detector 20, generate threshold functions as shown by the long-dashed lines of each respective plot.
  • Every FFT bin holds a complex value having a magnitude which corresponds to the average magnitude of the signal content in the frequency bandwidth of the FFT bin over the time period of one frame.
  • the magnitude in each FFT bin is squared and the squared values are averaged, thus providing the average energy per bin over the time period of the frame.
  • step 32 uses this value to determine the value of a_i and therefore the threshold function Th_i(f) for the current frame, where i is the frame index.
  • Calculate Average Energy Below Th_i(f) step 34 is performed by circuit 106, which sums the squares of the magnitudes corresponding to bins with magnitudes less than the threshold, and divides that sum by the number of bins with magnitudes less than the threshold, resulting in an average energy per bin for the bins with magnitudes less than the threshold.
  • Step 34 provides the signal E_BELOW while step 36 provides the signal E_ABOVE.
  • Log circuit 110 and filtering circuit 112 of FIG. 10 provide these functions.
  • Although the smoothing is not required for proper operation of the noise detector of this application, such filtering can be used to create longer hangover times, if desired.
  • additional hangover is often superfluous.
  • the smoothing of steps 38 and 40 in an exemplary embodiment is performed with an exponential filter of the following form:
  • E_Y is either E_BELOW or E_ABOVE
  • α_X is a time constant that determines the amount of smoothing, where α_X is between 0 and 1, and where a typical value may be 0.1.
  • the subscript X denotes that α_X may have different values for the ABOVE and BELOW cases.
  • E_AV-Y is the smoothed output signal, where Y can be ABV or BLW, designating which signal is being smoothed.
  • the approach described above provides two signals that are similar in magnitude for a typical noise signal input to noise detector 20, so detection of the noise-only portion of the input signal is simplified if one of these signals is offset from the other.
  • an offset is determined by the practitioner in Determine Offset step 42, where the offset is slightly larger than the random variation in the two logarithmic signals when a noise signal is input to noise detector 20.
  • This amount of offset then prevents false negative triggers of the noise detector, i.e. false indications that other-than-noise is present when indeed the input signal is only noise.
  • Such false triggers do not create errors in the operation of the associated noise reduction or other process with which the noise detector is used, but they do slow the operation of some.
  • the offset therefore is meant to minimize this effect.
  • the offset, which may be a negative number, is added to the output of log & filter step 40 in add offset step 44.
  • the add offset step could be after step 38 and the offset applied to the signal E_AV-LO. In this case, in order to achieve the same result, the offset value determined in step 42 would have the opposite sign.
  • Decision step 46 causes Set Noise Indicator step 48 (circuit 116) to set the NAD output to an "on" state indicating the presence of noise only if the output from step 38, E_AV-LO, is greater than the output from step 44, E_AV-HI. When E_AV-LO is less than E_AV-HI, decision test step 46 causes Reset Noise Indicator step 50 to reset the NAD output to an "off" state indicating the presence of other-than-noise in the input signal.
  • An alternate embodiment uses an offset value dependent upon whether the NAD output is currently on or off, and in this way hysteresis can be incorporated into the NAD switching for applications where it is desirable to have a more stable NAD output.
  • FIG. 11 is a plot of the non-smoothed E_ABOVE + Offset signal, the E_BELOW signal and the noise activity detector output signal.
  • the horizontal scale is shown as time in frames, and the vertical scale is in dB for the E_ABOVE + Offset and E_BELOW signals. In this plot, the NAD output signal is high when noise-only conditions are detected and low when non-noise is found.
  • the scale for the NAD output is arbitrary, since it represents an on/off binary flag.
  • Sections (1) and (5) are periods of time with only silence and when noise detector 20 had no signal input. In this case, whichever state the noise detector indicates is acceptable since the input signal is neither noise nor non-noise, and a noise reduction system would have no input noise to reduce.
  • Section (2) is a period during which the signal input to the noise detector 20 was clean voice in quiet ambient conditions. A short period at the end of this second section has only normal room ambient sound with no voice. The noise detector properly handled this relatively easy condition, indicating the presence of the voice as non-noise and yet detecting the absence of voice during noise only periods.
  • the system used for the plot of FIG. 11 included smoothing filters to provide additional hangover by design so there is a short time after the cessation of voice bursts when the detector's output does not change indication.
  • Section (3) consists of very loud (85 dB SPL) input noise only sound that was a mixture of music, single loud voice and voice babble from multiple directions.
  • the noise detector indicates mostly noise only, but also creates non-noise indications as a result of the single loud background voice even though the SNR for the background voice is less than -10 dB.
  • In section (4), nearby voice speech was added to the noise from Section (3), with the added voice SNR being approximately -3 dB.
  • the NAD output shows that the noise-only periods are correctly indicated while during voicing, the NAD correctly indicates non-noise. Correct operation at such low input SNR levels shows the capability of this new noise/voice detector.
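
The log, smoothing, offset and comparison stages described in the excerpts above (steps 38 through 50) might be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the recurrence `alpha * value + (1 - alpha) * prev` is one common exponential-filter form and is assumed here because the formula itself does not survive in this excerpt, and the function names, the -3 dB offset and the input values are hypothetical.

```python
import math

def smooth(prev, value, alpha=0.1):
    """One step of an exponential filter (assumed form); alpha in (0, 1]."""
    return alpha * value + (1.0 - alpha) * prev

def nad_decisions(e_below, e_above, offset_db=-3.0, alpha=1.0):
    """Return one noise-only flag per frame.

    e_below / e_above: per-frame average energies (linear, > 0) of the
    bins below / above that frame's threshold function.
    offset_db: hypothetical offset added to the ABOVE branch (step 44).
    alpha: smoothing constant; 1.0 disables smoothing.
    """
    flags = []
    av_lo = av_hi = None
    for lo, hi in zip(e_below, e_above):
        log_lo = 10.0 * math.log10(lo)   # log stage, BELOW branch
        log_hi = 10.0 * math.log10(hi)   # log stage, ABOVE branch
        av_lo = log_lo if av_lo is None else smooth(av_lo, log_lo, alpha)
        av_hi = log_hi if av_hi is None else smooth(av_hi, log_hi, alpha)
        # Noise is indicated when the BELOW signal exceeds ABOVE + offset.
        flags.append(av_lo > av_hi + offset_db)
    return flags

# Hypothetical frames: one noise-like frame followed by two voice-like frames.
print(nad_decisions([1.0, 1.0, 1.0], [1.2, 100.0, 100.0]))
print(nad_decisions([1.0, 1.0, 1.0], [1.2, 100.0, 100.0], alpha=0.1))
```

With `alpha` below 1 the flag changes state later than it does without smoothing, which is the extra hangover the excerpts mention. An offset that depends on the current flag, as in the alternate embodiment, would add hysteresis to the same comparison.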

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A noise activity detector includes a circuit for calculating average energy in a critical bandwidth, a circuit for determining a threshold function, a circuit for generating a dynamic modification of the threshold function, a circuit for identifying frequency components of the signal having energy that is above threshold values determined by the threshold function, and to determine a first average energy value representing an average energy of the identified frequency components with energy above the threshold, a circuit for identifying frequency components of the signal having energy that is below threshold values determined by the threshold function, and to determine a second average energy value representing an average energy of the identified frequency components with energy below the threshold, a circuit for offsetting at least one of the first and second average energy values, a circuit for comparing the resultant average energy values with one another, and a circuit for indicating the presence of noise activity if the first average energy value is below the second average energy value.

Description

SYSTEM AND METHOD FOR NOISE ACTIVITY DETECTION
TECHNICAL FIELD
The present disclosure relates generally to noise activity detectors for use in, for example, noise reduction systems.
BACKGROUND
In many signal processing applications, such as echo cancellation, speech recognition, speech encoding, voice-over-IP, and in particular noise reduction systems, it is important to gather real-time information and statistics about the noise in the signal. This is most often achieved by detecting when there is a useful amount of the desired signal and treating that portion of the signal as "non-noise." At other times, the signal is assumed to be only noise and the information and statistics that are desired are gathered during those times.
In single channel systems, the noise and desired signal are mixed, and the incoming mixed noisy signal is considered to be a linear sum of the desired signal and unwanted noise. By detecting when there is the presence of desired signal in the mixed signal, the noise information is not updated during this part of the signal. Instead, updating of the noise characteristics at other times allows noise reduction, for example, to be executed with appropriate processing. In voice communication systems, the need for determining the presence of noise-only periods has given rise to the proliferation of numerous voice determination methods, often called voice detection or voice activity detection (VAD) methods, since the voice portion of the mixed signal is the desired portion.
Such methods usually rely upon the fact that talkers must hear at least a portion of their own voice in order to form their words properly. In order to reliably hear themselves speak, talkers need to keep their own voice about 10 dB above the ambient or background noise level. Thus, in the presence of loud background noise, talkers naturally elevate their voice level to keep it slightly above the competing background noise level.
Voice activity detection methods, whether implemented in the time domain or in the frequency domain, utilize this fact. Many such systems are based upon means that detect when the total energy of the incoming noisy signal is above a threshold, and indicate that there is the presence of voice when this condition is met. Of course, the threshold must be adjusted to be always above the level of the background noise portion of the signal but below the level of the combined voice-plus noise level. Many complex methods have been devised to create such real-time dynamic threshold adjustment for this purpose.
However, such "reverse" methods (that is, detection of the desired signal so that the noise periods can be implied, rather than direct detection of the noise portions themselves) have drawbacks. For example, in noise above approximately 90 dB SPL (Sound Pressure Level) it becomes nearly impossible for humans to further elevate the loudness of their voice, and the SNR (signal-to-noise ratio) of the input signal drops, often to below 0 dB (1:1).
Conventional voice detection systems operate poorly, or not at all, when the SNR becomes low — for example below 10 dB. As long as the voice signal power is significantly above the noise signal power, such systems are able to detect the presence of voice. But in increasingly noisy situations, the voice detection accuracy decreases until such systems fail to operate at all.
Another significant problem is the detection of wind noise, the noise created when air flows over microphones used in voice detection systems. With the proliferation of mobile communication devices, wind noise is becoming of critical importance. Such noise can exhibit highly variable properties, and therefore the noise of wind is often misclassified by such systems. When this happens, the noise reduction of VAD-based noise reduction systems can be compromised because the noise template is incorrectly updated. For wind noise to be correctly classified, additional methods or processes must be implemented to reliably detect it, at the cost of more complexity and expense.
Yet another difficulty with conventional voice detection schemes is that voice signals do not abruptly terminate but slowly decay after each utterance. Voice detection based upon the voice power being above a noise power threshold will falsely indicate the end of voicing when the voice signal's decaying tail drops below the threshold level, even though voice is still present. Therefore these systems often add a so-called "hangover" timer to delay the onset of the noise indication.
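The conventional scheme criticized here, a fixed energy threshold followed by a hangover timer, can be sketched in a few lines. This is a generic Python illustration of that class of detector rather than any particular prior-art system; the function name, the threshold and the hangover length are hypothetical.

```python
def vad_with_hangover(frame_energies_db, threshold_db, hangover_frames=2):
    """Flag voice when frame energy exceeds a fixed threshold.

    After the energy falls back below the threshold, the voice flag is
    held for hangover_frames further frames so that the decaying tail
    of an utterance is not reported as noise.
    """
    flags = []
    remaining = 0
    for energy in frame_energies_db:
        if energy > threshold_db:
            remaining = hangover_frames   # re-arm the hangover timer
            flags.append(True)
        elif remaining > 0:
            remaining -= 1                # still inside the hangover period
            flags.append(True)
        else:
            flags.append(False)
    return flags

# Hypothetical frame energies in dB: quiet, a voice burst, then a decaying tail.
print(vad_with_hangover([-60, -20, -25, -45, -50, -55, -60], threshold_db=-40))
```

The sketch makes the weakness visible: the threshold must already sit between the noise level and the voice-plus-noise level, so any upward jump in noise is flagged as voice.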
Classical voice detection methods assume that the background noise is stationary or only slowly varying. In non-stationary noise conditions, classical voice detection schemes are unreliable, since rapid changes in noise level, especially upward jumps in noise, cannot be distinguished from the onset of a voice burst and therefore give false indications of voice presence. Such voice detectors also react to the presence of nearby voices other than that of the user, even though background voices are actually "noise" in systems where the user's own voice is the only desired signal.
Further, virtually all voice detection methods rely upon setting or updating one or more thresholds based upon the prior history of the signal, rather than on instantaneous current conditions. By relying upon prior information, such thresholds cannot update quickly, and the voice detection output is slow to react to rapid changes in background noise, creating errors until the system can eventually adjust.
The problems with voice detection methods historically have been addressed by adding enhancements to the basic principle of signal power threshold detection. Such enhancements include means for tracking noise levels in order for the threshold to be updated in real time, the addition of separate wind detector schemes, improved sensitivity methods allowing the threshold to be set with greater precision to operate in lower SNR conditions, adding hangover methods to prevent the false indication that voicing has ended when at the end of an utterance it has simply decayed below the threshold, and creating lockout periods that wait for a time longer than any expected naturally occurring voicing period after which the threshold is allowed to adjust more rapidly in order to attempt to accommodate bursts or steps in background noise level. However, using such enhancements still produces limited operation and still results in the false detection of noise-only signal conditions.
Yet other voice detection methods have been created that rely upon the availability of more than one signal, such as from an array of sensors or microphones. However, these systems have the great disadvantage that they only work when multiple signals are available, or where multiple sensors can be accommodated. Also, they increase the complexity, cost, size and power consumption of such systems.
Other solutions that are known rely upon complex signal processing computations such as autocorrelation, cross correlation, variance, Linear Predictive Coding (LPC) coefficients, various statistical noise predictors (e.g. Gaussian, Laplacian and Gamma distributions), stationarity measures, and so on. In general these solutions do not significantly improve performance, and are still aimed at the detection of voicing periods rather than detection of the noise-only periods themselves.
OVERVIEW
As described herein, a method for generating an indication of noise activity in a signal includes: a) calculating average energy of the signal in a critical bandwidth; b) determining a frequency-dependent threshold function; c) generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) applying an offset value to at least one of the first and second average energy values; g) comparing, after application of said offset value, the resultant first and second average energy values with one another; and h) indicating the presence of noise activity if, as a result of said comparison, it is determined that, the resultant first average energy value is below the resultant second average energy value.
Also as described herein, a noise activity detector for generating an indication of noise activity in a signal includes: a) a first circuit configured to calculate the average energy in a critical bandwidth; b) a second circuit configured to determine a frequency-dependent threshold function; c) a third circuit configured to generate a dynamic modification of the frequency-dependent threshold function using the average energy; d) a fourth circuit configured to identify frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and to determine a first average energy value representing an average energy of the identified frequency components with energy above the threshold; e) a fifth circuit configured to identify frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and to determine a second average energy value representing an average energy of the identified frequency components with energy below the threshold; f) a sixth circuit configured to apply an offset value to at least one of the first and second average energy values; g) a seventh circuit configured to compare, after application of said offset value, the resultant first and second average energy values with one another; and h) an eighth circuit configured to indicate the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy value is below the resultant second average energy value.
Also as described herein, a noise activity detector for generating an indication of noise activity in a signal includes: a) means for calculating average energy of the signal in a critical bandwidth; b) means for determining a frequency-dependent threshold function; c) means for generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) means for identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) means for identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) means for applying an offset value to at least one of the first and second average energy values; g) means for comparing, after application of said offset value, the resultant first and second average energy values with one another; and h) means for indicating the presence of noise activity if, as a result of said comparison, it is determined that, the resultant first average energy value is below the resultant second average energy value. 
Also as described herein, a program storage device readable by a machine, embodying a program of instructions executable by the machine to perform a method for generating an indication of noise activity in a signal, the method includes: a) calculating average energy of the signal in a critical bandwidth; b) determining a frequency-dependent threshold function; c) generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) applying an offset value to at least one of the first and second average energy values; g) comparing, after application of said offset value, the resultant first and second average energy values with one another; and h) indicating the presence of noise activity if, as a result of said comparison, it is determined that, the resultant first average energy value is below the resultant second average energy value.
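Steps a) through h) recited above can be sketched for a single frame as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the threshold shape is taken to be a -6 dB per octave line (1/f² in power), the level is set by matching the mean of the shape to the frame's average energy, the -3 dB offset is arbitrary, and the function name and the handling of a frame with an empty group are hypothetical. The FFT stage is omitted; the inputs are the per-bin powers already restricted to the critical bandwidth.

```python
import numpy as np

def noise_flag(freqs, power, offset_db=-3.0):
    """Return True for a noise-only frame, False for other-than-noise.

    freqs, power: centre frequency (Hz) and power of each FFT bin inside
    the critical bandwidth for one frame.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)

    # a) average energy of the frame in the critical bandwidth
    avg = power.mean()

    # b) frequency-dependent threshold shape (assumed: 1/f^2 in power)
    shape = 1.0 / freqs ** 2

    # c) dynamic modification: scale the shape to this frame's average
    threshold = avg * shape / shape.mean()

    # d), e) split the bins by the threshold and average each group
    above = power > threshold
    if above.all() or not above.any():
        return True  # assumption: a frame with an empty group counts as noise
    e_above = power[above].mean()
    e_below = power[~above].mean()

    # f) apply the offset, in dB, to the ABOVE branch
    hi = 10.0 * np.log10(e_above) + offset_db
    lo = 10.0 * np.log10(e_below)

    # g), h) noise is indicated when the offset ABOVE value is below BELOW
    return bool(hi < lo)

# Hypothetical data: a spectrum hugging the model, then the same with two peaks.
freqs = np.linspace(250.0, 2500.0, 73)
noise_like = np.where(np.arange(73) % 2 == 0, 1.1, 0.9) / freqs ** 2
voice_like = noise_like.copy()
voice_like[[8, 40]] *= 1000.0   # crude stand-ins for formant peaks
print(noise_flag(freqs, noise_like), noise_flag(freqs, voice_like))
```

Because the threshold is rebuilt from the current frame's own average, nothing in the decision depends on earlier frames.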
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more examples of embodiments and, together with the description of example embodiments, serve to explain the principles and implementations of the embodiments. In the drawings:
FIGS. 1-7 are plots of measured data corresponding to different sound conditions, and each includes a long-dashed line modeling the curve that represents the noise power, and a short-dashed line representing average power.
FIG. 8 is a block diagram of a typical communication system front end showing the context within which a noise activity detector (NAD) 20 may be used.
FIG. 9 is a flow diagram of various steps or tasks that may be performed by NAD 20.
FIG. 10 is a block diagram of circuits that implement the tasks set forth in the flow diagram of FIG. 9.
FIG. 11 is a plot illustrating the performance of a device using NAD 20.
DESCRIPTION OF EXAMPLE EMBODIMENTS
Example embodiments are described herein in the context of a processor or individual circuits, or a flow diagram of a process that is performed. Those of ordinary skill in the art will realize that the following description is illustrative only and is not intended to be in any way limiting. Other embodiments will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the example embodiments as illustrated in the accompanying drawings. The same reference indicators will be used to the extent possible throughout the drawings and the following description to refer to the same or like items.
In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. It will, of course, be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure.
In accordance with this disclosure, the components, process steps, and/or data structures described herein may be implemented using various types of operating systems, computing platforms, computer programs, and/or general purpose machines. In addition, those of ordinary skill in the art will recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein. Where a method comprising a series of process steps is implemented by a computer or a machine and those process steps can be stored as a series of instructions readable by the machine, they may be stored on a tangible medium such as a computer memory device (e.g., ROM (Read Only Memory), PROM (Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), FLASH Memory, Jump Drive, and the like), magnetic storage medium (e.g., tape, magnetic disk drive, and the like), optical storage medium (e.g., CD-ROM, DVD-ROM, paper card, paper tape and the like) and other types of program memory.
The noise detector, also referred to as a noise activity detector (NAD), as disclosed herein is based upon the unique characteristics of noise as differentiated from the characteristics of other signals, in particular the characteristics of desired signals. Generally, it is applicable to the detection of periods when a signal is only noise, and is especially useful therefore in systems, such as noise reduction systems, where knowledge of noise-only periods is needed for their function. In particular, the arrangement disclosed herein is directed at reliable detection of periods with only acoustic noise in a mixed microphone input signal which may contain speech, wind and acoustic background noise. An alternate use is as a voice activity detector. More particularly, it is directed to use in voice grade communication systems and devices such as cellular telephones, Bluetooth® wireless headsets, voice command and control and automatic speech recognition, among others.
For purposes of this discussion, three types of sound are identified: acoustic noise, wind noise, and voice.
FIG. 1 is a plot of measured data for ambient background noise generated by a multitude of human voices in a crowded restaurant, plotted as the measured signal power in decibels (dB) versus frequency in Hertz (Hz). Considering the frequency band of interest corresponding to the human voice communications band of about 300 Hz to about 3,000 Hz, the measured noise power decreases with increasing frequency at a rate of approximately 6 dB per octave. For reasons of convenience detailed further below, the average power level is determined over the frequency range from about 250 Hz to about 2,500 Hz. The average power level of the example measured data of FIG. 1 is about -50 dB, and is represented by the short-dashed line in the drawing. In addition, a long-dashed line is constructed to model the curve that represents the actual noise power.
For this data and this particular example, the model line is selected to be a straight line with a slope of -6 dB per octave. It will be appreciated that the term "line" is not limited to a straight line, and the illustrated slope of -6 dB per octave is not by way of limitation as other slopes, positive and negative, are also contemplated. It is instructive to note that the model curve (long-dashed line) crosses the average noise level line (short-dashed line) at what will be termed the effective frequency of slightly above 700 Hz. The significance of this effective frequency is explained in more detail below, as will be the manner in which the model curve is selected and constructed. Assuming that the model curve (long-dashed line) has been properly determined so as to relatively accurately correspond to the typical noise power frequency characteristic shape, the average power for the model curve over the selected bandwidth of about 250 Hz to about 2,500 Hz is made to be equal to the actual average noise power in the measured data by raising or lowering the model curve until the two power averages are the same. This is accomplished mathematically by solving for the magnitude of the model which makes the average model power match the actual average measured power. The effective frequency at which the model and actual average power lines cross (i.e. are equal) then can be determined. In effect, the model curve passes through the average power line such that it creates equal areas between it and the average power line, above and below the effective frequency crossing point when plotted on a magnitude squared vs. frequency plot (not shown). It can be seen that for this data the -6 dB sloped model provides a close approximation of the noise data characteristic when they cross at approximately 700 Hz. Thus, 700 Hz is determined to be the effective frequency for this data.
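The level-matching and crossing-point procedure just described can be worked through numerically. The following Python sketch assumes an idealized model whose power is a / f² (one reading of a -6 dB per octave slope in power) sampled on a uniform frequency grid; the function names and the synthetic spectrum are hypothetical and no measured data from the figures is reproduced.

```python
import numpy as np

def fit_model_level(freqs, power):
    """Level a such that the mean of a / f**2 equals the measured mean power."""
    shape = 1.0 / freqs ** 2
    return power.mean() / shape.mean()

def effective_frequency(freqs, power):
    """Frequency at which the fitted model a / f**2 equals the average power."""
    a = fit_model_level(freqs, power)
    return float(np.sqrt(a / power.mean()))

# Synthetic spectrum over the 250 Hz to 2,500 Hz band used in the text.
freqs = np.linspace(250.0, 2500.0, 10001)
power = 1e-5 * (750.0 / freqs) ** 2
print(effective_frequency(freqs, power))
```

Under these assumptions the crossing depends only on the band and the model shape, not on the signal level: for a dense grid it approaches the geometric mean of the band edges, sqrt(250 × 2,500), or roughly 790 Hz. That lies in the same general region as the crossing the text reads off the measured plots, and it suggests why the effective frequency can be determined once and then reused for every frame.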
It should be recognized that the shape of the measured data is dependent upon the characteristics of the specific signal pickup system. With other systems, a curved (non-straight) line may be a more appropriate model for the noise response of the system. For the data depicted in FIG. 1, the measurement system was calibrated for measurement of signals in a 200 Hz to 3,400 Hz range, and outside of that range the plot should not be considered to necessarily be an accurate representation of actual ambient noise.
FIG. 2 is a plot of traffic noise measured adjacent to a street with heavy traffic. As in FIG. 1 above, the vertical axis is noise power in dB, the horizontal axis is frequency in Hz, the short-dashed line represents the average noise power over the 250 Hz to 2,500 Hz frequency band, and the long-dashed line represents the model constructed to be a straight line with a slope of -6 dB per octave. The model line (long dash) intersects the average power line (short dash) at very nearly the same effective frequency as for the restaurant noise of FIG. 1.
Importantly, it will be noted that traffic noise, while very different in origin, character and sound, has a spectral pattern quite similar to the restaurant noise, with the noise power decreasing with increasing frequency at approximately 6 dB per octave.
FIG. 3 shows a pair of plots of noise measured inside a car cabin, with the lower plot of measurements taken while the car was moving slowly with closed windows and no other noise source, and the higher plot of measurements taken with the car driven at 70 miles per hour, radio on and A/C fan on. The short-dashed lines and the long-dashed lines again represent the average power of the noise data, and the -6 dB sloped model line giving the corresponding average model power "curve". Note that these model lines were made to intersect the average signal power level at the same effective frequency as determined in the FIG. 1 case. It can be seen from FIG. 3 that the spectral pattern of the noise in the car can still be described by the same -6 dB per octave model although not quite as closely as for the previous noise cases. The lines are nevertheless still quite reasonable models of the car-cabin noise.
FIGS. 4 and 5 are plots of low and high wind velocity "noise," respectively. Wind "noise" is different from other sounds in that it is the result of air turbulence at the individual microphone port(s), and only exists due to the presence of the microphone. It is noise that is induced by the wind at the microphone port(s) rather than being acoustic noise inherent in the wind and sensed by the microphone. Such wind-induced noise nevertheless results in an electrical microphone output signal commonly referred to as "wind noise."
FIG. 4 shows data collected when the wind speed was low and subsequently did not saturate the microphones. This noise signal is characterized by relatively sustained noise bursts exhibiting both high stationarity and steeply-sloped power frequency response. FIG. 5 shows data collected in high wind speed conditions, in which the wind saturated the microphones and was extremely "bursty." In this case, the noise signal is characterized by short, intense non-stationary bursts of signal. In intermediate wind conditions, the signal alternates between these two characteristics. From FIGS. 4 and 5, it can be seen that wind-induced noise has characteristics that differ substantially from most common types of acoustic noise, including spectral differences and dynamic pattern differences. Further, this noise is statistically independent for each sensor signal in multi-sensor array systems. Noise suppression processes must often ignore this wind-induced noise signal, handling it separately, or differently, from the way they respond to noise of an acoustic origin. Again in FIGS. 4 and 5, a short-dashed horizontal line is drawn at the average power level, and the long-dashed noise model line with a slope of -6 dB is shown, where the model average power is matched to the measured signal power at the shown intersection frequency. By analyzing numerous noise signals measured with the system in which the system disclosed herein may be used, it was determined that when the model curve (a straight -6 dB/oct. line in this case) was set to equal the average measured noise signal power at 750 Hz, the model did indeed create a good approximation to all acoustic noise signals. However, whereas acoustic noise signals exhibit little deviation from the model, voice (discussed below) and wind noise both exhibit significant deviation from the model. As explained above, for purposes of this discussion, three types of sound are identified: acoustic noise, wind noise, and voice. 
Acoustic noise is generally a catch-all for all sounds that are neither wind noise nor voice. It can be seen from the plots that while the noise data in FIGS. 1-3 clusters closely around the model (long dash), the plots for the wind noise in FIGS. 4 and 5 do not. This difference can be used to distinguish wind noise from the other noises.
The distinction between low and high wind-induced noise is a relative concept; it can be seen that the plots are significantly different. Since wind "noise" is generated at the port(s) of the microphone, the transition wind speed between the results of FIGS. 4 and 5 will be somewhat dependent upon the physical characteristics of the microphone. However, the general relationship is applicable; that is, low wind speeds produce a steep spectral curve, whereas high wind speeds (relative to the physical configuration) produce significantly more high frequency signal, and produce a generally flat spectral response. It may be observed that the plots of FIGS. 4 and 5 are reasonably close at 200 Hz, but FIG. 5 indicates progressively more power with increasing frequency for the high wind speeds, showing substantially more power at 2,000 Hz. These curves might correspond to wind speeds of 2½ mph and 5 mph for one microphone's physical configuration, and correspond to wind speeds of 5 mph and 10 mph respectively for a microphone system with a different port design and/or built-in wind screening. However, FIGS. 4 and 5 represent the sort of variation in wind-induced noise that a specific microphone is likely to produce over a range of wind speeds.
FIGS. 6 and 7 are plots of voiced speech in quiet room conditions, and voiced speech in intense noise, respectively. The noise used for the plot in FIG. 7 includes commercially recorded music mixed with voice babble from multiple directions in a diffuse-source simulation, producing approximately 85 dB SPL of noise at the microphone. The SNR of this signal was -3 dB in these conditions. This simulation was intended to approximate various crowd conditions, such as airports, theater intermissions, retail stores, etc. As is the case in the preceding drawings, the average signal power levels (including all voice and/or noise) are represented by short-dashed horizontal lines, and the -6 dB per octave straight line model is shown as the long-dashed lines. The graphs of FIGS. 6 and 7 show that the characteristic spectral pattern of voice, even voice with large amounts of noise included, produces substantial voice formant spectral power peaks, and therefore much larger variation in power with frequency than any of the noise conditions. This difference in spectral pattern readily distinguishes voice from noise, even in sub-zero SNR mixed input signals.
The noise activity detector (NAD) disclosed herein uses the characteristics described above to identify a signal and indicate when noise-only periods of the signal are present. There are myriad applications for such an operation — for example, it can be used to provide a control signal that gates other functions such as updating a noise template in a spectral subtraction process, updating an automatic microphone matching table, blocking an automatic gain circuit from raising the gain when only noise is present, and so on. The noise activity detector disclosed herein is described in the context of audio signals in a communication system. However, the process disclosed herein is not limited to single-channel, single-band applications, but is also applicable to multi-channel applications, as well as to multi-band applications. Since the process is performed in the frequency domain, selection of the frequency range over which it operates is simple, and additional implementations of the noise detector can be used for other frequency ranges. An example of such an application would be a multi-band spectral subtraction process in which it may be necessary to independently update the noise template for each band when there is only noise in the respective band, even though there may be voice- and/or wind-induced signal in other bands. The noise activity detector can also be used with multi-channel applications to provide an indication for each channel when its signal was only noise. Although for many multi-channel systems each input signal may be similar to the signals that the other sensors receive, there are many situations where that is not the case, such as for wind-induced noise, and noise generated mechanically at a port such as by physical contact with the operator's skin or with other objects.
As examples of possible applications, a control signal from a noise detector applied to each channel of a multi-channel system could be used for channel-specific spectral subtraction processes, and/or the signals from the noise detectors on the different channels could be combined to enable an automatic microphone matching process to compensate for variations in the sensitivities of multiple microphones. In the latter application, the channel-specific noise detectors will ensure that the microphone matching does not match to noise present on a single channel.
FIG. 8 is a block diagram of a typical communication system front end showing the context within which a noise activity detector (NAD) 20 may be used. The noise activity detector operates as a multi-band process so that the time domain signal is broken into multiple frequency bands. The multi-band conversion can be accomplished by use of a bank of bandpass filters (not shown) or by the application of Fourier transform processes or by any other process for such conversion. Conversion to the frequency domain is a well-known process that may use, for example, Short Time Fourier Transform (STFT) techniques or other well-known frequency domain conversion methods. Since the systems in which NAD 20 is used are likely to employ STFT methods for other processes, such as spectral subtraction, microphone sensitivity matching and/or automatic gain control processes, the conversion step is likely to already be available, and NAD 20 would require little additional processing. The example embodiment employs the Fast Fourier Transform, and the process of NAD 20 is carried out in the frequency domain. Therefore, per the example system, the input signal can be converted to the frequency domain before the process disclosed herein is applied. With reference to FIG. 8, the analog input signal, for example from a microphone (not shown), is framed at framing block 10.
A windowing block 12 is used to create a window, which is applied by windowing application block 13 to the framed data. The framed, windowed data is converted to the frequency domain by Fourier transform block 14 (for example Fast Fourier Transform (FFT) or other appropriate transform process as explained above), and the frequency domain result can then be divided into one or, optionally, more than one sub-bands by sub-band selection block 15.
In an example embodiment, a communications audio signal with an 8 ksps (kilosamples per second) sample rate is separated into 512-sample frames, windowed with a Hanning window, converted to the frequency domain using an FFT (Fast Fourier Transform), and a single sub-band consisting of the frequency bins between 250 Hz and 2,500 Hz is selected.
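The front end just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function names are hypothetical, a direct DFT over only the selected bins stands in for the FFT, and the periodic form of the Hanning window is assumed. The sample rate, frame length, and band edges are the values given in the text.

```python
import cmath
import math

SAMPLE_RATE = 8000             # 8 ksps, from the example embodiment
FRAME_LEN = 512                # samples per frame, from the example embodiment
F_LOW, F_HIGH = 250.0, 2500.0  # sub-band edges in Hz, from the example embodiment


def hann_window(n):
    """Periodic Hann (Hanning) window of length n (window form is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def subband_bins(sample_rate=SAMPLE_RATE, frame_len=FRAME_LEN,
                 f_low=F_LOW, f_high=F_HIGH):
    """Indices of the DFT bins whose center frequencies lie in [f_low, f_high]."""
    bin_hz = sample_rate / frame_len          # 15.625 Hz per bin here
    first = math.ceil(f_low / bin_hz)
    last = math.floor(f_high / bin_hz)
    return list(range(first, last + 1))


def subband_power(frame, bins):
    """Squared magnitude of the windowed DFT, evaluated only at the given bins."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    powers = []
    for k in bins:
        acc = 0j
        for t, v in enumerate(windowed):
            acc += v * cmath.exp(-2j * math.pi * k * t / n)
        powers.append(abs(acc) ** 2)
    return powers
```

With these values each bin is 15.625 Hz wide, so the 250 Hz to 2,500 Hz range spans bins 16 through 160. A production system would use an FFT routine and reuse the spectrum already computed for other processes.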
The resulting sub-band bin values are provided as input to NAD 20, the output of which is provided for subsequent control of a desired process associated with the particular communications application. Block 16 represents the determination of the noise model and effective frequency, a process performed by the practitioner during the design of the system in which the noise detector is to be used, and is a function of the particular application. Typical noise, as sensed by the sensor system of the intended application, is analyzed for a curve fit using well-known curve-fitting methods. The shape of the fitted mathematical curve is the noise model, and for example, in FIGS. 1-3, the model is a straight line shown by the declined long-dashed line. An effective frequency, $f_E$, is also determined during the design process by determining the frequency at which the modeled power equals the value of the average power. Block 17 represents the determination of the critical bandwidth. The critical bandwidth is generally a contiguous range of frequencies that includes the range in which the data fits the model. In the signals of FIGS. 1-3, it can be seen that data for the system that was measured fits the straight line model over a frequency range from about 200 Hz to somewhere between 2,500 Hz and 3,000 Hz. As an example, a frequency range of 250 Hz to 2,500 Hz can be selected. A small adjustment to the selected frequency range in order to provide a convenient number of FFT bins will not significantly impact the performance of the noise detector. In the exemplary embodiment the bandwidth utilized for the noise activity detector comprised 128 FFT bins, which, as an even power of two, is a convenient divisor for calculating the average power in the 128 bins. The critical bandwidth, noise power model and effective frequency determination processes of blocks 16 and 17 may use the following steps:
• Examine the power spectrum of the input signal under typical input noise conditions. Select the sub-band (block 15) to be used such that it includes only valid information for the task. For example, in a single-channel voice grade communication system, a sub-band extending from 250 Hz to 3,000 Hz is applicable. Sub-band bandwidths and the number of sub-bands to be used for other systems can be readily determined.
• Select a model and model complexity (block 16) for each sub-band (they need not be the same for each sub-band). Polynomial curve fitting can be used for this step, or any other common curve fitting method is applicable. A monotonic function is preferred. For the example embodiment described above, the model uses a first-order curve (straight line) with two parameters: slope and intercept.
• Determine the parameter values from the typical noise-only data. In the example implementation, the slope is determined from the frequency response data, and the intercept is determined by the average energy.
• Calculate the effective frequency — that is, the frequency at which the value of the model power curve equals the average signal power contained in the sub-band portion of the actual measured noise signal. As shown in FIGS. 1-3, this is the frequency at which the short dashed line crosses the long dashed model line on the graphs — that is, 746 Hz. Of course this 746 Hz value is specific only to the example described herein, and other applications will have a different effective frequency.
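As an illustration of the curve-fitting step in the list above, the following sketch fits a straight line to noise-only power data in (log2 frequency, dB) coordinates, so that the slope reads directly in dB per octave. The function name and the choice of ordinary least squares are assumptions made for illustration; the text only requires some common curve-fitting method, and in the example implementation the intercept is set from the average energy rather than from the fit.

```python
import math


def fit_noise_model(freqs_hz, powers):
    """Least-squares line through (log2(f), 10*log10(P)).

    Returns (slope_db_per_octave, intercept_db). The intercept returned here
    is the plain least-squares value; it is not the per-frame level, which the
    detector derives from the average energy of each frame.
    """
    xs = [math.log2(f) for f in freqs_hz]
    ys = [10.0 * math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For synthetic data that follows an inverse-square power law, the fitted slope comes out at about -6.02 dB per octave, which is the straight-line model used in the example embodiment.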
The process of block 16 is described in more detail with reference to FIG. 9, which is generally a flow diagram depicting the operation of noise activity detector (NAD) 20. The input signal is sub-band signal 22, which is the output signal provided by sub-band selection process 15 of FIG. 8, and is used in step 30 to calculate the average energy in the critical bandwidth. Noise model determination is performed at step 26, together with a determination of the effective frequency at step 28. Steps 26 and 28 correspond to block 16 of FIG. 8. As previously mentioned, noise model determination can be made based on visual observation, or determined more rigorously with known curve fitting algorithms. As such, it can be determined how well any particular power curve model will represent the measured signal power data. In the case of the data of FIGS. 1-3, it can be seen that a straight line with a slope of approximately -6 dB per octave will model, reasonably well, the sensor system response for all the noise source data depicted in these plots, and the noise power measured through the microphone system substantially fits a straight line model over a frequency range from about 200 Hz to 2,500 Hz. Thus, determination of critical bandwidth (block 17) may use this bandwidth for a single channel system or may use multiple critical sub-bands for a multiple channel system. A different microphone design would produce different results, and could require a curved line model instead of a straight line model for the noise signal. Rigorous methods of curve fitting can be used to provide a precise model, but doing so is generally not required to achieve the desired result, and the more complex the model, the more processing power will be required in operating the noise detector.
The determination of effective frequency $f_E$ (step 28) is also accomplished as mentioned above and described here more fully. After the shape of the noise power model 26 and the critical bandwidth 17 have been determined, the power model is mathematically integrated over the critical bandwidth to determine the average model power level. The frequency at which this level intersects the noise power model curve is the effective frequency $f_E$. Let the noise power model be defined as
$$P_{NM}(f) = a \cdot S_N(f) \tag{1}$$
where $P_{NM}(f)$ is the noise power model, $S_N(f)$ is the noise power model shape function, $f$ is frequency, and $a$ is a magnitude scale factor to be determined. The shape model is integrated over the critical bandwidth and then divided by the critical bandwidth, $BW_c$, to produce the average noise power model level.
Let the critical bandwidth of the sub-band be defined by its lower frequency boundary, $f_{low}$, and its upper frequency boundary, $f_{hi}$. In the exemplary case being discussed here, $f_{low} = 200$ Hz and $f_{hi} = 2500$ Hz. Therefore
$$BW_c = f_{hi} - f_{low} = 2300\ \text{Hz} \tag{2}$$
and the average noise power model level is
$$P_{NM,avg} = \frac{1}{BW_c} \int_{f_{low}}^{f_{hi}} a \cdot S_N(f)\, df \tag{3}$$
This average noise model power level will equal the value of the noise power model at the effective frequency $f_E$. That is,
$$P_{NM,avg} = P_{NM}(f_E) \tag{4}$$
Therefore, $f_E$ can be found by solving equation 4. As can be readily seen, model curves that are monotonic are preferred, since they give a single solution.
For the example case,
$$P_{NM}(f) = a \cdot f^{-2} \tag{5}$$
Substituting equation 5 into equations 3 and 4 gives $a \cdot f_E^{-2} = a / (f_{low} \cdot f_{hi})$, and
$$f_E = \sqrt{f_{low} \cdot f_{hi}} = \sqrt{200 \cdot 2500} \approx 707\ \text{Hz} \tag{6}$$
which is effectively about 700 Hz.
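For model shapes without a closed-form solution, the effective frequency can be found numerically at design time. The sketch below is one way to do so, assuming a monotonic shape function; the function names, the midpoint integration rule, and the bisection solver are illustrative choices and are not taken from the disclosure. The scale factor of the model cancels on both sides, so only the shape function is needed.

```python
import math


def average_model_level(shape, f_low, f_high, steps=20000):
    """Mean value of shape(f) over [f_low, f_high], by the midpoint rule."""
    df = (f_high - f_low) / steps
    total = sum(shape(f_low + (k + 0.5) * df) for k in range(steps))
    return total * df / (f_high - f_low)


def effective_frequency(shape, f_low, f_high, tol=1e-6):
    """Frequency at which a monotonic shape(f) equals its band average (bisection)."""
    target = average_model_level(shape, f_low, f_high)
    lo, hi = f_low, f_high
    decreasing = shape(lo) > shape(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        above_target = shape(mid) > target
        # For a falling curve, a value above the target means the crossing
        # lies at a higher frequency; for a rising curve it is the opposite.
        if above_target == decreasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def inverse_square(f):
    """Shape function of the example case: power proportional to f**-2."""
    return f ** -2.0
```

Calling `effective_frequency(inverse_square, 200.0, 2500.0)` gives approximately 707 Hz, in agreement with the value of about 700 Hz stated for the 200 Hz to 2,500 Hz band.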
The above parameters of critical bandwidth, noise power model and effective frequency can all be predetermined during the design of the noise detector, and need not be calculated in real-time, thereby reducing the calculation power required for the operating system. The real time operation of the noise activity detector (NAD) 20 is now described with reference to FIG. 9, which shows a flow diagram of various steps or tasks that are performed.
It will be appreciated that these tasks can each be performed by a dedicated circuit, as shown in FIG. 10, or one or more circuits can be used to perform any one or more of the tasks.
Additionally, it may be possible to use a single processor, or several processors, to perform the tasks, each processor having one or more modules that may be dedicated to one or more tasks.
At step 30 in FIG. 9, the average energy in $BW_c$ is calculated: the power across the entire critical bandwidth for the selected sub-band is summed and divided by the critical bandwidth, $BW_c$, to generate a value for the signal's average power level of the current frame.
Circuit 102 of FIG. 10 is provided for this task. This average power level value is used at step 32 of FIG. 9 to define a threshold function $Th_i(f)$ that is unique to the current frame of data. Circuit 104 of FIG. 10 is provided for this purpose.
The Define Threshold Function $Th_i(f)$ step 32 (and circuit 104) determines a dynamic frequency-dependent threshold using the noise power model, $P_{NM}(f)$, determined in step 26 and the effective frequency, $f_E$, determined in step 28, by calculating the average power in the current frame of data and setting the level, $a$, of the model so that the average power level for the current frame is equal to the value of the model at the effective frequency, $f_E$. That is,
$$a_i = \frac{N_{avg}}{S_N(f_E)} \tag{7}$$
where $N_{avg}$ is the current average power level. Thus, the threshold function for the $i$-th frame of data is determined by circuit 104 and in step 32 as
$$Th_i(f) = a_i \cdot S_N(f) = P_{NM,i}(f) \tag{8}$$
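The per-frame threshold computation can be sketched as follows: the frame's average power fixes the model level so that the model passes through that average at the effective frequency, and the scaled model is then evaluated at each bin frequency. The function and parameter names are hypothetical, and `powers` is assumed to hold the squared bin magnitudes of the selected sub-band.

```python
def frame_threshold(powers, freqs_hz, shape, f_e):
    """Frequency-dependent threshold for one frame.

    powers   -- squared magnitudes of the sub-band bins for the current frame
    freqs_hz -- center frequency of each of those bins
    shape    -- noise power model shape function (fixed at design time)
    f_e      -- effective frequency (fixed at design time)
    """
    n_avg = sum(powers) / len(powers)     # average power of the current frame
    a_i = n_avg / shape(f_e)              # level that makes the model equal n_avg at f_e
    return [a_i * shape(f) for f in freqs_hz]
```

Because only the current frame's average enters the calculation, the threshold carries no state from earlier frames.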
Note that this threshold is not a single level and is not dependent upon prior frames of data, both of which are common in other such detectors. Because the threshold is immediate (that is, calculated for and used by only the current frame), NAD 20 is able to follow rapid changes in background noise. Thus a dynamic modification of the frequency-dependent threshold function using the average energy is used. The threshold function, $Th_i(f)$, is used to divide the spectral data of the current frame into two groups: those FFT frequency bins whose power data magnitudes are greater than the threshold, and those whose power data magnitudes are less than the threshold. FIGS. 4-7, depicting wind noise and voice characteristics, all represent data that, when applied to the example embodiment of noise activity detector 20, generate threshold functions as shown by the long-dashed lines of each respective plot.
Every FFT bin holds a complex value having a magnitude which corresponds to the average magnitude of the signal content in the frequency bandwidth of the FFT bin over the time period of one frame. In the Calculate Average Energy In $BW_c$ step 30, the magnitude in each FFT bin is squared and the squared values are averaged, thus providing the average energy per bin over the time period of the frame. As described above, step 32 (circuit 104) uses this value to determine the value of $a_i$ and therefore the threshold function $Th_i(f)$ for the current frame, where $i$ is the frame index.
A Calculate Average Energy Below $Th_i(f)$ step 34 is performed by circuit 106, which sums the squares of the magnitudes corresponding to bins with magnitudes less than the threshold, and divides that sum by the number of bins with magnitudes less than the threshold, resulting in an average energy per bin for the bins with magnitudes less than the threshold. In addition, a Calculate Average Energy Above $Th_i(f)$ step 36 is performed by circuit 108, which sums the squares of the magnitudes corresponding to bins with magnitudes greater than the threshold, and divides that sum by the number of bins with magnitudes greater than the threshold, resulting in an average energy per bin for the bins with magnitudes greater than the threshold.
Step 34 provides the signal $E_{BELOW}$ while step 36 provides the signal $E_{ABOVE}$.
The logarithms of the energy averages $E_{BELOW}$ and $E_{ABOVE}$ are each calculated in steps 38 and 40, and the resulting values are optionally provided to filters that create a smoothing function across time by acting on the values from sequential frames. Log circuit 110 and filtering circuit 112 of FIG. 10 provide these functions. Although the smoothing is not required for proper operation of the noise detector of this application, such filtering can be used to create longer hangover times, if desired. However, because the detector is able to correctly determine the presence of voice even when the voice power is well below the noise power in the incoming signal, additional hangover is often superfluous.
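The split into below-threshold and above-threshold bins, followed by the logarithm, might look like the sketch below. Several details are assumptions added for illustration and are not specified in the text: the function name, the use of 10·log10 so that results read in dB, assigning bins exactly equal to the threshold to the "above" group, and the small floor that keeps the logarithm defined when a group is empty.

```python
import math

ENERGY_FLOOR = 1e-12   # assumed guard value so that log10 is defined for an empty group


def split_log_energies(powers, threshold):
    """Return (log_e_below, log_e_above) in dB for one frame.

    powers    -- squared magnitudes of the sub-band bins
    threshold -- per-bin threshold values for the same bins
    """
    below = [p for p, t in zip(powers, threshold) if p < t]
    above = [p for p, t in zip(powers, threshold) if p >= t]  # ties go "above" (assumption)
    e_below = sum(below) / len(below) if below else ENERGY_FLOOR
    e_above = sum(above) / len(above) if above else ENERGY_FLOOR
    return (10.0 * math.log10(max(e_below, ENERGY_FLOOR)),
            10.0 * math.log10(max(e_above, ENERGY_FLOOR)))
```

For noise that follows the model closely, the two averages stay near each other; pronounced spectral peaks, as in voice, push the above-threshold average well beyond the below-threshold one.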
When desired, the filtering of steps 38 and 40 in an exemplary embodiment is performed with an exponential filter of the following form:
$$E_{Y,i} = \alpha_X \cdot \left( \log(E_{X,i}) - E_{Y,i-1} \right) + E_{Y,i-1} \tag{9}$$
where $E_X$ is either $E_{BELOW}$ or $E_{ABOVE}$, $i$ is the frame index, and $\alpha_X$ is a time constant that determines the amount of smoothing, where $\alpha_X$ is between 0 and 1 and a typical value may be 0.1. The subscript $X$ denotes that $\alpha_X$ may have different values for the ABOVE and BELOW cases. $E_Y$ is the smoothed output signal, where $Y$ can be ABV or BLW, designating which signal is being smoothed.
There is no limitation on the type and complexity of the smoothing filter(s), and many are known in the art. More complex smoothing filters can be used which can provide asymmetrical rise (attack) and fall (decay) time constants. Hangover is created when the ABOVE smoothed signal is able to move up faster than down, and the BELOW smoothed signal is able to move down faster than up.
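A one-pole exponential smoother of the form y_i = α·(x_i − y_{i−1}) + y_{i−1}, extended with separate rise and fall coefficients, is sketched below to show how hangover can arise. The function names and the fast coefficient of 0.5 are hypothetical; 0.1 is the typical value mentioned in the text.

```python
def smooth_log_energy(prev, log_energy, alpha_rise, alpha_fall):
    """One update of y = alpha * (x - y_prev) + y_prev, with alpha chosen by direction."""
    alpha = alpha_rise if log_energy > prev else alpha_fall
    return alpha * (log_energy - prev) + prev


def smooth_pair(prev_above, prev_below, log_above, log_below, fast=0.5, slow=0.1):
    """Smooth both log-energy signals so that they separate quickly and rejoin slowly.

    The ABOVE signal rises fast and falls slowly; the BELOW signal falls fast
    and rises slowly. This is the asymmetry that creates hangover.
    """
    above = smooth_log_energy(prev_above, log_above, alpha_rise=fast, alpha_fall=slow)
    below = smooth_log_energy(prev_below, log_below, alpha_rise=slow, alpha_fall=fast)
    return above, below
```

Setting both coefficients to the same value reduces this to a symmetric exponential filter.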
The approach described above provides two signals that are similar in magnitude for a typical noise signal input to noise detector 20, so detection of the noise-only portion of the input signal is simplified if one of these signals is offset from the other. During system design, an offset is determined by the practitioner in Determine Offset step 42, where the offset is slightly larger than the random variation in the two logarithmic signals when a noise signal is input to noise detector 20. This amount of offset then prevents false negative triggers of the noise detector, i.e. false indications that other-than-noise is present when indeed the input signal is only noise. Such false triggers do not create errors in the operation of the associated noise reduction or other process with which the noise detector is used, but they do slow the operation of some such processes. The offset, therefore, is meant to minimize this effect. The offset, which may be a negative number, is added to the output of log & filter step 40 in Add Offset step 44. Equally, the Add Offset step could follow step 38, with the offset applied to the signal $E_{AV\text{-}LO}$. In this case, in order to achieve the same result, the offset value determined in step 42 would have the opposite sign.
After offsetting one of the two signals, the resulting values are compared in the decision step 46 (circuit 114). Decision step 46 causes Set Noise Indicator step 48 (circuit 116) to set the NAD output to an "on" state, indicating the presence of noise only, if the output from step 38, $E_{AV\text{-}LO}$, is greater than the output from step 44, $E_{AV\text{-}HI}$. When $E_{AV\text{-}LO}$ is less than $E_{AV\text{-}HI}$, decision step 46 causes Reset Noise Indicator step 50 to reset the NAD output to an "off" state, indicating the presence of other-than-noise in the input signal. An alternate embodiment uses an offset value dependent upon whether the NAD output is currently on or off, and in this way hysteresis can be incorporated into the NAD switching for applications where it is desirable to have a more stable NAD output.
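The decision step, including the state-dependent offset of the alternate embodiment, can be sketched as follows. The function name and the two offset values are hypothetical; actual offsets would be chosen at design time from the observed variation of the two log signals under noise-only input. The offsets are negative here because the above-threshold average is never smaller than the below-threshold average.

```python
def update_noise_indicator(log_below, log_above, currently_on,
                           offset_when_on=-8.0, offset_when_off=-5.0):
    """Return True (noise only) or False (other-than-noise) for the current frame.

    log_below, log_above -- (optionally smoothed) log energies in dB
    currently_on         -- previous state of the NAD output
    The offset is added to the ABOVE signal before the comparison. Using a
    more negative offset while the output is on makes it harder to switch
    off, which provides hysteresis.
    """
    offset = offset_when_on if currently_on else offset_when_off
    return log_below > log_above + offset
```

Passing the same value for both offsets gives the basic detector without hysteresis.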
To illustrate the performance of this noise detector, FIG. 11 is a plot of the non-smoothed $E_{ABOVE}$ + Offset signal, the $E_{BELOW}$ signal and the noise activity detector output signal. The horizontal scale is shown as time in frames, and the vertical scale is in dB for the $E_{ABOVE}$ + Offset and $E_{BELOW}$ signals. In this plot, the NAD output signal is high when noise-only conditions are detected and low when non-noise is found. The scale for the NAD output is arbitrary, since it represents an on/off binary flag.
Across the top are numbered sections indicating the input signal characteristics at different times. Sections (1) and (5) are periods of time with only silence and when noise detector 20 had no signal input. In this case, whichever state the noise detector indicates is acceptable since the input signal is neither noise nor non-noise, and a noise reduction system would have no input noise to reduce.
Section (2) is a period during which the signal input to the noise detector 20 was clean voice in quiet ambient conditions. A short period at the end of this second section has only normal room ambient sound with no voice. The noise detector properly handled this relatively easy condition, indicating the presence of the voice as non-noise and yet detecting the absence of voice during noise only periods. The system used for the plot of FIG. 11 included smoothing filters to provide additional hangover by design so there is a short time after the cessation of voice bursts when the detector's output does not change indication.
Section (3) consists of very loud (85 dB SPL) noise-only input sound that was a mixture of music, a single loud voice and voice babble from multiple directions. Here it can be seen that the noise detector indicates mostly noise only, but also creates non-noise indications as a result of the single loud background voice even though the SNR for the background voice is less than -10 dB.
In section (4) nearby voice speech was added to the noise from Section (3), with the added voice SNR being approximately -3 dB. As designed, the NAD output shows that the noise-only periods are correctly indicated while during voicing, the NAD correctly indicates non-noise. Correct operation at such low input SNR levels shows the capability of this new noise/voice detector.
While embodiments and applications have been shown and described, it would be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts disclosed herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.

Claims

What is claimed is:
1. A method for generating an indication of noise activity in a signal, comprising: a) calculating average energy of the signal in a critical bandwidth; b) determining a frequency-dependent threshold function; c) generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) applying an offset value to at least one of the first and second average energy values; g) comparing, after application of said offset value, the resultant first and second average energy values with one another, and h) indicating the presence of noise activity if, as a result of said comparison, it is determined that, the resultant first average energy value is below the resultant second average energy value.
2. The method of claim 1, wherein procedures a) - h) are carried out over individual frames of a multi-frame process.
3. The method of claim 1, further comprising filtering prior to the comparison at g).
4. The method of claim 3, wherein the filtering is conducted using an exponential filter.
5. The method of claim 3, wherein filtering with asymmetrical rise and fall time constants is applied to the signals representing average energy of the identified frequency components with energy above the threshold and the average energy of the identified frequency components with energy below the threshold.
6. A noise activity detector for generating an indication of noise activity in a signal comprising: a) a first circuit configured to calculate the average energy in a critical bandwidth; b) a second circuit configured to determine a frequency-dependent threshold function; c) a third circuit configured to generate a dynamic modification of the frequency-dependent threshold function using the average energy; d) a fourth circuit configured to identify frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and to determine a first average energy value representing an average energy of the identified frequency components with energy above the threshold; e) a fifth circuit configured to identify frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and to determine a second average energy value representing an average energy of the identified frequency components with energy below the threshold; f) a sixth circuit configured to apply an offset value to at least one of the first and second average energy values; g) a seventh circuit configured to compare, after application of said offset value, the resultant first and second average energy values with one another; and h) an eighth circuit configured to indicate the presence of noise activity if, as a result of said comparison, it is determined that the resultant first average energy value is below the resultant second average energy value.
7. The detector of claim 6, wherein the circuits carry out their function over individual frames of a multi-frame process.
8. The detector of claim 6, further comprising a filter for filtering prior to comparison.
9. The detector of claim 8, wherein the filter is an exponential filter.
10. The detector of claim 8, wherein the filter includes at least one filter incorporating asymmetric rise and fall time constants.
11. A noise activity detector for generating an indication of noise activity in a signal, comprising: a) means for calculating average energy of the signal in a critical bandwidth; b) means for determining a frequency-dependent threshold function; c) means for generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) means for identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) means for identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) means for applying an offset value to at least one of the first and second average energy values; g) means for comparing, after application of said offset value, the resultant first and second average energy values with one another, and h) means for indicating the presence of noise activity if, as a result of said comparison, it is determined that, the resultant first average energy value is below the resultant second average energy value.
12. The noise activity detector of claim 11, wherein procedures a) - h) are carried out over individual frames of a multi-frame process.
13. The noise activity detector of claim 11, further comprising filtering prior to the comparison at g).
14. The noise activity detector of claim 13, wherein the filtering is conducted using an exponential filter.
15. The noise activity detector of claim 13, wherein filtering with asymmetrical rise and fall time constants is applied to the signals representing average energy of the identified frequency components with energy above the threshold and the average energy of the identified frequency components with energy below the threshold.
16. A program storage device readable by a machine, embodying a program of instructions executable by the machine to perform a method for generating an indication of noise activity in a signal, the method comprising: a) calculating average energy of the signal in a critical bandwidth; b) determining a frequency-dependent threshold function; c) generating a dynamic modification of the frequency-dependent threshold function using the average energy; d) identifying frequency components of the signal having energy that is above threshold values determined by the threshold function at corresponding frequencies, and determining a first average energy value representing an average energy of the identified frequency components with energy above the threshold values; e) identifying frequency components of the signal having energy that is below threshold values determined by the threshold function at corresponding frequencies, and determining a second average energy value representing an average energy of the identified frequency components with energy below the threshold values; f) applying an offset value to at least one of the first and second average energy values; g) comparing, after application of said offset value, the resultant first and second average energy values with one another, and h) indicating the presence of noise activity if, as a result of said comparison, it is determined that, the resultant first average energy value is below the resultant second average energy value.
17. The device of claim 16, wherein procedures a) - h) are carried out over individual frames of a multi-frame process.
18. The device of claim 16, further comprising filtering prior to the comparison at g).
19. The device of claim 18, wherein the filtering is conducted using an exponential filter.
20. The device of claim 18, wherein filtering with asymmetrical rise and fall time constants is applied to the signals representing average energy of the identified frequency components with energy above the threshold and the average energy of the identified frequency components with energy below the threshold.
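The steps recited in claims 16 to 20 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names (`group_averages_db`, `asym_smooth`, `is_noise`, `detect`), the dB representation, the 6 dB offset, the filter coefficients, the scaling of the threshold shape by the band average, and the handling of an empty group are all assumptions made for illustration. The input is assumed to be a list of per-bin energies for each frame, so no transform is shown.

```python
import math

EPS = 1e-12  # floor that keeps log10 defined for silent bins (assumption)


def to_db(energy):
    """Convert a linear energy value to decibels."""
    return 10.0 * math.log10(max(energy, EPS))


def group_averages_db(energies, threshold_shape, band):
    """Steps a) to e): return (first_db, second_db) for one frame.

    energies        -- per-bin energy of the frame (linear units)
    threshold_shape -- frequency-dependent threshold, one relative value per bin
    band            -- (lo, hi) bin range standing in for the critical bandwidth
    """
    lo, hi = band
    avg_energy = sum(energies[lo:hi]) / (hi - lo)                # a)
    # b) and c): scale the fixed shape by the band average. This is an
    # assumed form of the "dynamic modification" in the claim.
    thresholds = [avg_energy * t for t in threshold_shape]
    above = [e for e, t in zip(energies, thresholds) if e > t]   # d)
    below = [e for e, t in zip(energies, thresholds) if e <= t]  # e)
    # Assumption: an empty group falls back to the band average.
    first = sum(above) / len(above) if above else avg_energy
    second = sum(below) / len(below) if below else avg_energy
    return to_db(first), to_db(second)


def asym_smooth(prev, new, rise, fall):
    """One exponential-filter step with separate rise and fall coefficients."""
    alpha = rise if new > prev else fall
    return prev + alpha * (new - prev)


def is_noise(first_db, second_db, offset_db):
    """Steps f) to h): subtract the offset from the first average, then compare."""
    return first_db - offset_db < second_db


def detect(frames, threshold_shape, band, offset_db=6.0, rise=0.5, fall=0.05):
    """Return one noise flag per frame, smoothing both averages before comparing."""
    flags, state = [], None
    for energies in frames:
        first_db, second_db = group_averages_db(energies, threshold_shape, band)
        if state is None:
            state = (first_db, second_db)  # seed the filters with the first frame
        else:
            state = (asym_smooth(state[0], first_db, rise, fall),
                     asym_smooth(state[1], second_db, rise, fall))
        flags.append(is_noise(state[0], state[1], offset_db))
    return flags
```

With a nearly flat spectrum the two group averages lie close together, so subtracting the offset pushes the first average below the second and the frame is flagged as noise. With a few strong peaks the first average stays well above the second even after the offset, and no noise activity is indicated.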
PCT/US2008/074102 2007-08-22 2008-08-22 System and method for noise activity detection WO2009026561A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2010522086A JP2010537253A (en) 2007-08-22 2008-08-22 System and method for noise activity detection
EP08798555A EP2191594A4 (en) 2007-08-22 2008-08-22 System and method for noise activity detection
CN200880111290A CN101821971A (en) 2007-08-22 2008-08-22 System and method for noise activity detection
BRPI0815721A BRPI0815721A2 (en) 2007-08-22 2008-08-22 system and method for detecting noise activity

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US96585407P 2007-08-22 2007-08-22
US60/965,854 2007-08-22

Publications (1)

Publication Number Publication Date
WO2009026561A1 (en) 2009-02-26

Family

ID=40378704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/074102 WO2009026561A1 (en) 2007-08-22 2008-08-22 System and method for noise activity detection

Country Status (7)

Country Link
US (1) US20090154726A1 (en)
EP (1) EP2191594A4 (en)
JP (1) JP2010537253A (en)
KR (1) KR20100051727A (en)
CN (1) CN101821971A (en)
BR (1) BRPI0815721A2 (en)
WO (1) WO2009026561A1 (en)

Families Citing this family (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8019091B2 (en) 2000-07-19 2011-09-13 Aliphcom, Inc. Voice activity detector (VAD) -based multiple-microphone acoustic noise suppression
US8280072B2 (en) 2003-03-27 2012-10-02 Aliphcom, Inc. Microphone array with rear venting
US8452023B2 (en) 2007-05-25 2013-05-28 Aliphcom Wind suppression/replacement component for use with electronic systems
US9066186B2 (en) 2003-01-30 2015-06-23 Aliphcom Light-based detection for acoustic applications
US9099094B2 (en) 2003-03-27 2015-08-04 Aliphcom Microphone array with rear venting
TWI475847B (en) * 2008-04-16 2015-03-01 Koninkl Philips Electronics Nv Passive radar for presence and motion detection
WO2011009946A1 (en) * 2009-07-24 2011-01-27 Johannes Kepler Universität Linz A method and an apparatus for deriving information from an audio track and determining similarity between audio tracks
AU2011248297A1 (en) * 2010-05-03 2012-11-29 Aliphcom, Inc. Wind suppression/replacement component for use with electronic systems
US9357307B2 (en) 2011-02-10 2016-05-31 Dolby Laboratories Licensing Corporation Multi-channel wind noise suppression system and method
JP5649488B2 (en) * 2011-03-11 2015-01-07 株式会社東芝 Voice discrimination device, voice discrimination method, and voice discrimination program
US8935132B2 (en) * 2012-02-08 2015-01-13 Sae Magnetics (H.K.) Ltd. Spectral simulation method during noise testing for a magnetic head, and noise-testing method for a magnetic head by using the same
CN103323619A (en) * 2012-03-20 2013-09-25 富泰华工业(深圳)有限公司 Wind direction detecting system, wind direction detecting method and electronic equipment using wind direction detecting system
CN103458201B (en) * 2012-06-05 2017-02-22 晨星软件研发(深圳)有限公司 Signal processing device and signal processing method
CN102750956B (en) * 2012-06-18 2014-07-16 歌尔声学股份有限公司 Method and device for removing reverberation of single channel voice
US9685921B2 (en) * 2012-07-12 2017-06-20 Dts, Inc. Loudness control with noise detection and loudness drop detection
WO2014043024A1 (en) 2012-09-17 2014-03-20 Dolby Laboratories Licensing Corporation Long term monitoring of transmission and voice activity patterns for regulating gain control
US9312826B2 (en) 2013-03-13 2016-04-12 Kopin Corporation Apparatuses and methods for acoustic channel auto-balancing during multi-channel signal extraction
US10306389B2 (en) 2013-03-13 2019-05-28 Kopin Corporation Head wearable acoustic system with noise canceling microphone geometry apparatuses and methods
CN103198835B (en) * 2013-04-03 2015-04-01 工业和信息化部电信传输研究所 Noise suppression algorithm reconvergence time measurement method based on mobile terminal
US9263061B2 (en) * 2013-05-21 2016-02-16 Google Inc. Detection of chopped speech
PL3011557T3 (en) 2013-06-21 2017-10-31 Fraunhofer Ges Forschung Apparatus and method for improved signal fade out for switched audio coding systems during error concealment
DE102013111784B4 (en) * 2013-10-25 2019-11-14 Intel IP Corporation AUDIOVERING DEVICES AND AUDIO PROCESSING METHODS
AU2014371411A1 (en) * 2013-12-27 2016-06-23 Sony Corporation Decoding device, method, and program
WO2015142486A1 (en) * 2014-03-17 2015-09-24 Robert Bosch Gmbh System and method for all electrical noise testing of mems microphones in production
CN107211214B (en) * 2015-01-28 2020-12-01 哈曼国际工业有限公司 Vehicle speaker arrangement
WO2016179238A1 (en) 2015-05-04 2016-11-10 Harman International Industries, Inc. Venting system for vehicle speaker assembly
US10027302B2 (en) * 2015-05-07 2018-07-17 Shari Eskenas Audio interrupter alertness device for headphones
CN106297819B (en) * 2015-05-25 2019-09-06 国家计算机网络与信息安全管理中心 A kind of noise cancellation method applied to Speaker Identification
JP6501259B2 (en) * 2015-08-04 2019-04-17 本田技研工業株式会社 Speech processing apparatus and speech processing method
US10242677B2 (en) * 2015-08-25 2019-03-26 Malaspina Labs (Barbados), Inc. Speaker dependent voiced sound pattern detection thresholds
JP6604113B2 (en) * 2015-09-24 2019-11-13 富士通株式会社 Eating and drinking behavior detection device, eating and drinking behavior detection method, and eating and drinking behavior detection computer program
US11631421B2 (en) * 2015-10-18 2023-04-18 Solos Technology Limited Apparatuses and methods for enhanced speech recognition in variable environments
US9925867B2 (en) * 2016-01-11 2018-03-27 Ford Global Technologies, Llc Fuel control regulator system with acoustic pliability
US10904656B2 (en) 2016-05-10 2021-01-26 Harman International Industries, Incorporated Vehicle speaker arragement
US9978392B2 (en) * 2016-09-09 2018-05-22 Tata Consultancy Services Limited Noisy signal identification from non-stationary audio signals
US10475471B2 (en) * 2016-10-11 2019-11-12 Cirrus Logic, Inc. Detection of acoustic impulse events in voice applications using a neural network
US10242696B2 (en) * 2016-10-11 2019-03-26 Cirrus Logic, Inc. Detection of acoustic impulse events in voice applications
CN108154880A (en) * 2016-12-05 2018-06-12 广东大仓机器人科技有限公司 The robot that environmental noise carries out speech recognition can be differentiated in real time
US10374564B2 (en) * 2017-04-20 2019-08-06 Dts, Inc. Loudness control with noise detection and loudness drop detection
US10697739B1 (en) * 2017-12-06 2020-06-30 David Skipper Electronic flashbang
US10928502B2 (en) * 2018-05-30 2021-02-23 Richwave Technology Corp. Methods and apparatus for detecting presence of an object in an environment
US10948581B2 (en) 2018-05-30 2021-03-16 Richwave Technology Corp. Methods and apparatus for detecting presence of an object in an environment
CN109275084B (en) * 2018-09-12 2021-01-01 北京小米智能科技有限公司 Method, device, system, equipment and storage medium for testing microphone array
JP7194832B2 (en) * 2018-12-12 2022-12-22 シグニファイ ホールディング ビー ヴィ Motion Detectors, Lighting Fixtures and How to Respond
CN110620600B (en) * 2019-09-11 2021-10-26 华为技术有限公司 Vehicle-mounted radio and control method
GB2595463B (en) 2020-05-26 2023-05-31 Dyson Technology Ltd Headgear having an air purifier
CN113990074B (en) * 2021-12-28 2022-03-04 江苏华设远州交通科技有限公司 Traffic control system and method based on noise data acquisition and processing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030223597A1 (en) * 2002-05-29 2003-12-04 Sunil Puria Adapative noise compensation for dynamic signal enhancement
US20050108004A1 (en) * 2003-03-11 2005-05-19 Takeshi Otani Voice activity detector based on spectral flatness of input signal
US20060041426A1 (en) * 2004-08-23 2006-02-23 Nokia Corporation Noise detection for audio encoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2191594A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120329466A1 (en) * 2010-03-15 2012-12-27 Zte Corporation Method and System for Measuring Background Noise of Machine
US8958508B2 (en) * 2010-03-15 2015-02-17 Zte Corporation Method and system for measuring background noise of machine
EP2501171A4 (en) * 2010-03-15 2015-07-01 Zte Corp Method and system for measuring background noise of machine
EP3432598A4 (en) * 2016-03-17 2019-10-16 Audio-Technica Corporation Noise detection device and audio signal output device
CN110324917A (en) * 2019-07-02 2019-10-11 北京分音塔科技有限公司 Mobile hotspot device with pickup function

Also Published As

Publication number Publication date
BRPI0815721A2 (en) 2017-06-13
CN101821971A (en) 2010-09-01
KR20100051727A (en) 2010-05-17
EP2191594A4 (en) 2011-06-08
EP2191594A1 (en) 2010-06-02
JP2010537253A (en) 2010-12-02
US20090154726A1 (en) 2009-06-18

Similar Documents

Publication Publication Date Title
US20090154726A1 (en) System and Method for Noise Activity Detection
US11587579B2 (en) Vowel sensing voice activity detector
US8600073B2 (en) Wind noise suppression
CN108538310B (en) Voice endpoint detection method based on long-time signal power spectrum change
US9959886B2 (en) Spectral comb voice activity detection
US8428945B2 (en) Acoustic signal classification system
US6289309B1 (en) Noise spectrum tracking for speech enhancement
EP2047457B1 (en) Systems, methods, and apparatus for signal change detection
KR100944252B1 (en) Detection of voice activity in an audio signal
KR101034831B1 (en) System for suppressing wind noise
US6993481B2 (en) Detection of speech activity using feature model adaptation
US20160343385A1 (en) Method and apparatus for suppressing wind noise
EP2083417B1 (en) Sound processing device and program
US9454976B2 (en) Efficient discrimination of voiced and unvoiced sounds
US9384759B2 (en) Voice activity detection and pitch estimation
Khoa Noise robust voice activity detection
US9437213B2 (en) Voice signal enhancement
Wu et al. A pitch-based method for the estimation of short reverberation time
KR19990001828A (en) Apparatus and method for extracting speech features by dynamic region normalization of spectrum
JPH0449952B2 (en)
Dai et al. An improved model of masking effects for robust speech recognition system
US20230317100A1 (en) Method of Detecting Speech Using an in Ear Audio Sensor
JPH0398098A (en) Voice recognition device
JP5169297B2 (en) Sound processing apparatus and program
WO1989003519A1 (en) Speech processing apparatus and methods for processing burst-friction sounds

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 200880111290.X; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08798555; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2010522086; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2008798555; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 20107006039; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: PI0815721; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20100222)