EP2710590A1 - Method for super-wideband noise suppression - Google Patents

Method for super-wideband noise suppression

Info

Publication number
EP2710590A1
Authority
EP
European Patent Office
Prior art keywords
signal stream
frequency
gain
signal
noise suppression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP11722677.9A
Other languages
German (de)
French (fr)
Other versions
EP2710590B1 (en)
Inventor
Marco Paniconi
Jan Skoglund
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Publication of EP2710590A1
Application granted
Publication of EP2710590B1
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 - Processing in the time domain
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 - Processing in the frequency domain
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals

Definitions

  • the present disclosure generally relates to systems and methods for transmission of audio signals such as voice communications. More specifically, aspects of the present disclosure relate to performing noise suppression on a signal in the time-domain using noise suppression data generated for a corresponding signal in the frequency-domain.
  • One embodiment of the present disclosure relates to a method for noise suppression comprising receiving an input signal at a first filter bank of a noise suppression module; splitting, by the first filter bank, the received signal into a first signal stream and a second signal stream, the first signal stream including a first range of frequencies and the second signal stream including a second range of frequencies higher than the first range of frequencies; deriving a gain for the second signal stream based on data collected from the first signal stream; applying the derived gain to the second signal stream in time-domain; and synthesizing at a second filter bank of the noise suppression module the first signal stream and the second signal stream into an output signal.
  • the method for noise suppression may further comprise computing a frequency average of a frequency-based speech probability of the first signal stream; deriving a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream; using the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream; computing a frequency average of a frequency-based gain filter of the first signal stream; using the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and deriving a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
  • the method of noise suppression may further comprise computing a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
  • a noise suppression system comprising a first filter bank configured to split a received input signal into a first signal stream and a second signal stream, where the first signal stream includes a first range of frequencies and the second signal stream includes a second range of frequencies higher than the first range of frequencies; a noise suppression module configured to receive, from the first filter bank, the first and second signal streams, derive a gain for the second signal stream based on data collected from the first signal stream, and apply the derived gain to the second signal stream in time-domain; and a second filter bank configured to, in response to the noise suppression module applying the derived gain to the second signal stream, synthesize the first signal stream and the second signal stream into an output signal.
  • the noise suppression module is further configured to process the first signal stream in frequency-domain to generate a frequency-based speech probability and a frequency-based gain filter for the first signal stream.
  • the noise suppression module is further configured to compute a frequency average of a frequency-based speech probability of the first signal stream; derive a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream; use the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream; compute a frequency average of a frequency-based gain filter of the first signal stream; use the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and derive a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
  • the noise suppression module is further configured to compute a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
  • the methods and systems described herein may optionally include one or more of the following additional features: the derived gain applied to the second signal stream is a single upper band gain, the data collected from the first signal stream is a speech probability and gain filter, the speech probability and gain filter collected from the first signal stream includes at least a frequency-based speech probability and a frequency-based gain filter for the first signal stream, processing the first signal stream in frequency-domain generates the frequency-based speech probability and the frequency-based gain filter for the first signal stream, the weighted average includes a weighting value for the first and second gains of the second signal stream, where the weighting value is based on the frequency-based speech probability of the first signal stream, the first range of frequencies includes frequencies between 0kHz and 8kHz, and the second range of frequencies includes frequencies between 8kHz and 16kHz, each of the first and second filter banks is an infinite impulse response filter bank or a finite impulse response filter bank, and/or each of the first and second filter banks includes a plurality of all-pass filters.
  • Figure 1 is a block diagram of a representative embodiment in which one or more aspects described herein may be implemented.
  • Figure 2 is a communication diagram illustrating example data flows in connection with noise suppression processing according to one or more embodiments described herein.
  • Figure 3 is a flowchart illustrating an example method for super-wideband noise suppression according to one or more embodiments described herein.
  • Figure 4 is a block diagram illustrating an example computing device arranged for multipath routing and processing of audio input signals according to one or more embodiments described herein.
  • noise suppression occurs in the frequency domain, where both noise estimation and noise filtering processes are performed.
  • the noise suppression processing performed in the frequency domain involves a low-frequency band portion of a received signal, with a high-frequency band portion of the signal remaining in the time-domain.
  • a time-domain filter bank (e.g., a chain of all-pass filters such as polyphase IIR filters, a finite impulse response (FIR) filter bank, etc.) splits a super-wideband input signal (e.g., an input signal with a sampling rate of 32kHz, 48kHz, etc.) into two signal streams: a low-frequency band (L-band) stream and a high-frequency band (H-band) stream.
  • the L-band signal stream contains portions of the received signal that include lower-frequency components (e.g., components with frequencies in the range of 0-8kHz, or 0-12kHz, etc.), and the H-band signal stream contains portions of the signal that include higher-frequency components (e.g., components with frequencies in the range of 8-16kHz, or 12-24kHz, etc.).
  • Noise suppression processing performed on the L-band stream in the frequency-domain generates noise suppression data (e.g., speech/noise probability and gain filter measurements for the L-band) that is used for noise suppression processing of the H-band stream, which remains in the time-domain.
  • the L-band and H-band signal streams may include components in frequency ranges other than the exemplary frequency ranges used herein.
  • the frequency ranges of 0-8kHz and 8-16kHz are used for the L-band and H-band signal streams, respectively.
  • a frequency range of 0-12kHz may be used for the L-band signal stream and a frequency range of 12-24kHz used for the H-band signal stream.
  • frequency ranges of 0-7kHz and 7-20kHz may be used for the L-band and H-band signal streams, respectively.
  • the terms "narrowband," "wideband," and "super-wideband" are sometimes used herein to refer to audio signals with sampling rates at or above certain threshold sampling rates, or with sampling rates within certain ranges. These terms may also be used relative to one another in describing audio signals with particular sampling rates.
  • "super-wideband" is sometimes used herein to refer to audio signals with a sampling rate above the wideband sampling rate of, e.g., 16kHz. As such, super-wideband is used to refer to audio signals sampled at a higher rate of, e.g., 32kHz or 48kHz. It should be understood that such use of the terms "narrowband," "wideband," and/or "super-wideband" is not in any way intended to limit the scope of the disclosure.
  • the noise suppression module may be part of a larger audio communications system and may include one or more submodules, units, and/or components that perform some of the processes described herein.
  • the noise suppression module is located at the near-end (e.g., render stream) environment of a signal transmission path.
  • H-band noise suppression processing includes applying a frame-based gain to the time-domain H-band signal stream, where the frame-based gain is derived from L-band noise suppression data generated in the frequency-domain.
  • frame-based gain indicates that the H-band gain varies with the frame index of the input signal and does not have a frequency dependency.
  • the frame-based gain is also referred to herein as a "single upper band gain" (e.g., where a single gain (e.g., factor) is applied to the whole upper frequency band (e.g., H-band) of the signal).
  • a received audio signal is determined to only contain narrowband and/or wideband frequency components (e.g., the signal is sampled at a rate of 8kHz or 16kHz), then the entire signal may be taken as the L-band signal stream, with noise suppression processing only acting on the L-band.
  • a filter bank is not used to split the received signal into H-band and L-band streams, and although a pointer for the H-band may be passed to the noise suppression module, no H-band processing is actually performed.
  • FIG. 1 and the following discussion provide a brief, general description of a representative embodiment in which various aspects of the present disclosure may be implemented.
  • a noise suppression module 140 may be located at the near-end environment of a signal transmission path, along with a capture device 105 also at the near-end and a render device 130 located at the far-end environment.
  • noise suppression module 140 may be one component in a larger system for audio (e.g., voice) communications.
  • the noise suppression module 140 may be an independent component in such a larger system or may be a subcomponent within an independent component (not shown) of the system.
  • in the example embodiment illustrated in FIG. 1, noise suppression module 140 is arranged to receive and process input from capture device 105 and generate output to, e.g., one or more other audio processing components (not shown).
  • these other audio processing components may be acoustic echo control (AEC), automatic gain control (AGC), and/or other audio quality improvement components.
  • these other processing components may receive input from capture device 105 prior to noise suppression module 140 receiving such input.
  • Capture device 105 may be any of a variety of audio input devices, such as one or more microphones configured to capture sound and generate input signals.
  • Render device 130 may be any of a variety of audio output devices, including a loudspeaker or group of loudspeakers configured to output sound of one or more channels.
  • capture device 105 and render device 130 may be hardware devices internal to a computer system, or external peripheral devices connected to a computer system via wired and/or wireless connections.
  • capture device 105 and render device 130 may be components of a single device, such as a speakerphone, telephone handset, etc.
  • one or both of capture device 105 and render device 130 may include analog-to-digital and/or digital-to-analog transformation functionalities.
  • noise suppression module 140 includes a controller 150 for coordinating various processes performed therein, and monitoring and/or adapting timing considerations for such processes.
  • Noise suppression module 140 may also include a sampling unit 115, an analysis filter bank 120 and a synthesis filter bank 135, and a noise suppression unit 125. Each of these units may be in communication with controller 150 such that controller 150 facilitates some of the processes performed by and between the units. Details of the sampling unit 115, analysis filter bank 120, synthesis filter bank 135, and noise suppression unit 125 will be further described below.
  • Noise suppression unit 125 may include one or more smaller units or subunits, such as H-band noise suppression (NS) unit 160 and L-band NS unit 165.
  • the sampling unit 115 and the analysis filter bank 120 may be in communication with L-band NS unit 165 in a manner that is separate from any such communications between H-band NS unit 160 and the sampling unit 115 or the analysis filter bank 120.
  • signals, signal information, signal data, etc. may be passed to L-band NS unit 165 from either or both of sampling unit 115 and analysis filter bank 120 without such signals also being passed to H-band NS unit 160.
  • L-band NS unit 165 and H-band NS unit 160 may have separate lines of communication with synthesis filter bank 135.
  • L-band NS unit 165 and H-band NS unit 160 may each pass separate signal streams of different frequency ranges to synthesis filter bank 135 so the streams may be recombined into a full signal.
  • L-band NS unit 165 and H-band NS unit 160 may be in communication with each other within noise suppression unit 125.
  • noise suppression processing performed by L-band NS unit 165 in the frequency-domain may generate noise suppression data that is passed to H-band NS unit 160 for noise suppression processing in the time-domain.
  • one or more other components, modules, units, etc. may be included as part of noise suppression module 140, in addition to or instead of those illustrated in FIG. 1.
  • the names used to identify the units and components included as part of noise suppression module 140 (e.g., "sampling unit," "analysis filter bank," "L-band NS unit," etc.) are exemplary in nature, and are not in any way intended to limit the scope of the disclosure.
  • FIG. 2 is a communication diagram illustrating example data flows for noise suppression processing according to at least some embodiments of the disclosure.
  • a capture device 205 may pass an input signal 210 to a sampling unit 215.
  • sampling unit 215 may be a part or component of a noise suppression module (e.g., noise suppression module 140 shown in FIG. 1).
  • Sampling unit 215 may sample input signal 210 at a rate of, for example, 8kHz, 16kHz, 32kHz, etc. for some time interval (e.g., ten milliseconds (ms)), depending on whether input signal 210 is narrowband, wideband, super-wideband, etc. in nature.
  • Example time intervals that may be used by sampling unit 215 in one or more arrangements include 10 ms, 20 ms, 30 ms, etc.
  • when input signal 210 is a low sample rate signal 219 (e.g., input signal 210 sampled at a rate of 8kHz (e.g., narrowband), 16kHz (e.g., wideband), etc., by sampling unit 215), the entire input signal 210 is passed to a noise suppression unit 225 as L-band signal stream 222, bypassing an analysis filter bank 220.
  • a high sample rate signal 217 (e.g., super-wideband input, input with a 32kHz sampling rate (320 samples) and including a frequency spectrum of 0-16kHz, and the like) is passed from sampling unit 215 to analysis filter bank 220.
  • although FIG. 2 shows low sample rate signal 219 and high sample rate signal 217 being output from sampling unit 215, it is important to note that each of these signals is input signal 210, simply identified under a different name for purposes of explanation.
  • sampling unit 215 determines, based on a sampling rate of input signal 210 (e.g., 8kHz, 16kHz, etc., which in some arrangements may be identified as "low sample rate signal 219," or 32kHz, 48kHz, etc., which in some arrangements may be identified as "high sample rate signal 217"), whether input signal 210 is passed directly to noise suppression unit 225 as L-band signal stream 222, or whether input signal 210 is instead passed to analysis filter bank 220.
  • sampling unit 215 samples input signal 210 at a high sample rate (e.g., 32kHz, 48kHz, etc.)
  • sampling unit 215 passes input signal 210 to analysis filter bank 220.
  • Analysis filter bank 220 splits input signal 210 into two signal streams, L-band signal stream 222 and H-band signal stream 224.
  • L-band signal stream 222 contains lower-frequency components (e.g., components with frequencies in the range of 0-8kHz, or 0-12kHz, etc.) of input signal 210
  • H-band signal stream 224 contains higher-frequency components (e.g., components with frequencies in the range of 8-16kHz, or 12-24kHz, etc.) of input signal 210.
  • L-band signal stream 222 and H-band signal stream 224 may also be used in situations where input signal 210 bypasses analysis filter bank 220 and goes directly to noise suppression unit 225, such as when input signal 210 is output from sampling unit 215 as low sample rate signal 219.
  • L-band signal stream 222 and H-band signal stream 224 may each contain components of various other frequency ranges different from the example ranges described above.
  • noise suppression unit 225 may include an L-band NS unit 265 and an H-band NS unit 260.
  • L-band NS unit 265 may receive L-band signal stream 222 and perform noise-suppression processing as if L-band signal stream 222 is a wideband audio signal (e.g., 0-8kHz frequency spectrum).
  • noise suppression processing performed by L-band NS unit 265 on L-band signal stream 222 generates L-band NS data 280.
  • L-band NS data 280 includes frequency-based speech/noise probability and gain filter data for L-band signal stream 222 that is passed to H-band NS unit 260 for noise suppression processing of H-band signal stream 224 in the time-domain.
  • H-band NS unit 260 is active when input signal 210 includes super-wideband input (e.g., input signal 210 passes to analysis filter bank 220 as a high sample rate signal 217) and executes noise suppression processing on H-band signal stream 224 after receiving L-band NS data 280 from L-band NS unit 265.
  • Noise suppression unit 225 outputs noise-suppressed L-band stream 245 and noise-suppressed H-band stream 255 as a result of, for example, processing performed by L-band NS unit 265 and H-band NS unit 260.
  • noise-suppressed L-band stream 245 may similarly bypass a synthesis filter bank 235 and pass directly as a noise-suppressed output signal 270.
  • noise-suppressed L-band stream 245 is taken as the entire noise-suppressed output signal 270.
  • noise-suppressed L-band stream 245 and noise-suppressed H-band stream 255 pass to synthesis filter bank 235 where the streams are recombined or synthesized into a full band signal (e.g., 0-16kHz frequency spectrum) that forms the noise-suppressed output signal 270.
  • noise suppression is performed on the L-band signal stream (e.g., by L-band NS unit 265 on L-band signal stream 222 shown in FIG. 2) in the frequency domain
  • noise suppression is performed on the H-band signal stream (e.g., by H-band NS unit 260 on H-band signal stream 224) in the time-domain.
  • certain pre-processing is performed including buffering, windowing, and Fourier transformation to map the L-band signal stream to the frequency domain.
  • while buffering similar to that applied to the L-band signal stream may also be applied to the H-band signal stream during noise suppression processing on the H-band signal stream, no such windowing or Fourier transformation occurs on the H-band signal stream since it remains in the time-domain.
  • Performing noise suppression on the H-band signal stream in the time-domain, rather than processing the H-band signal stream in the frequency domain, lowers related complexity costs and allows for more stable noise suppression to be achieved. This is partly because relatively little energy is present in the H-band signal stream. Accordingly, a more robust approach is to derive a frame-based gain from the L-band signal stream, which is then frequency averaged over a portion of the L-band, as will be further described below.
  • analysis filter bank 220 and/or synthesis filter bank 235 each consist of a chain of three all-pass filters, e.g., a class of polyphase infinite impulse response (IIR) filters.
  • any of a variety of two-channel filter banks may be used alone or in combination to form analysis filter bank 220 and/or synthesis filter bank 235.
  • FIG. 3 is a flowchart illustrating a method for noise suppression processing of super-wideband audio input according to one or more embodiments of the disclosure.
  • an input signal is sampled (e.g., input signal 210 sampled by sampling unit 215 shown in FIG. 2).
  • a determination is made as to whether the input signal is super-wideband in quality based on a sampling rate of the input signal from step 300.
  • the input signal may be sampled at a rate of 8kHz, 16kHz, or 32kHz, depending on whether the signal is narrowband, wideband, or super-wideband, respectively.
  • an input signal may be considered a super-wideband signal if it is sampled at a rate of 32kHz in step 300. In further examples, an input signal may be considered a super-wideband signal if it is sampled at a higher rate, such as 48kHz. If it is determined in step 305 that the input signal is a super-wideband signal, then in step 335 the input signal is split into low-frequency band (L-band) and high-frequency band (H-band) signal streams (e.g., L-band signal stream 222 and H-band signal stream 224 shown in FIG. 2).
  • the input signal is split into L-band and H-band signal streams by a filter bank (e.g., analysis filter bank 220 shown in FIG. 2), with the L-band signal stream containing lower-frequency (e.g., 0-8kHz) components of the input signal and the H-band signal stream containing higher-frequency (e.g., 8-16kHz) components of the input signal.
  • the process continues from either of steps 310 or 335 to step 315, where the L-band signal stream is transformed to the frequency-domain as part of noise suppression processing of the input signal.
  • the L-band and H-band signal streams may pass to a noise suppression unit (e.g., L-band NS unit 265 and H-band NS unit 260 as parts of noise suppression unit 225 shown in FIG. 2) where the L-band signal stream undergoes noise suppression processing by first being transformed (e.g., using the discrete Fourier Transform (DFT)) to the frequency-domain, as indicated by step 315. While the L-band signal stream is transformed to the frequency-domain in step 315, the H-band signal stream remains in the time-domain.
  • step 320 noise estimation and filtering are performed on the L-band signal stream in the frequency-domain.
  • step 325 it is determined whether an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335). If an H-band signal stream is found to be present in step 325, then the process moves to step 340 where a gain is generated for the H-band signal stream using noise suppression data from the L-band signal stream, as described in greater detail below. If no H-band signal stream is found to be present in step 325, then the process continues to step 330 where the noise-suppressed L-band signal stream is transformed back to time-domain from the frequency-domain (e.g., using inverse-DFT).
  • steps 315, 320, and 330 shown in FIG. 3 may collectively constitute noise suppression processing performed on the L-band signal stream.
  • steps 315, 320, and 330 may be performed by an L-band noise suppression unit (e.g., L-band NS unit 265 shown in FIG. 2).
  • steps 345, 350, 355 and 360 shown in FIG. 3 may collectively constitute noise suppression processing performed on the H-band signal stream, and may be performed by an H-band noise suppression unit (e.g., H-band NS unit 260 shown in FIG. 2).
  • the L-band signal stream may be processed in steps 315, 320, and 330 on a frame-by-frame basis as a wideband audio signal (e.g., frequency spectrum of 0-8kHz).
  • the L-band signal stream may undergo certain pre-processing steps as part of the transformation step 315, including buffering, windowing, and Fourier transformation (not shown in FIG. 3).
  • noise estimation and filtering of the L-band signal stream in step 320 may include one or more substeps or sub-processes performed on each frame of the signal.
  • an initial noise estimation may be obtained, followed by a speech/noise likelihood determination, an update to the initial noise estimate, and then application of a Wiener filter to reduce or suppress noise in the frame.
  • step 330 of FIG. 3 may include certain processes necessary to convert each frame of the L-band signal stream back to the time-domain, such as inverse FFT, scaling, and window synthesis steps.
  • step 325 If it is determined in step 325 that an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335), then in step 340, frequency-based speech/noise probability and frequency-based gain filter measurements are obtained from data generated by the noise estimation and filtering of the L-band signal stream in step 320.
  • the frequency-based speech/noise probability obtained in step 340 may be written as P(H1 | Y_k(m), {F}), where Y_k(m) represents the observed noisy input spectral coefficient, {F} denotes feature data of the L-band signal (which may include, for example, a spectral flatness measure, harmonic peak pitch, template matching, etc.), H1 represents the state of speech being present in the frequency bin k and frame index m, and q represents a prior speech/noise probability of the L-band signal based on the features {F} of the L-band signal.
  • the frequency-based speech/noise probability, obtained in step 340 and expressed above, is also referred to as the "frequency-based speech probability."
  • the frequency-based gain filter H_wd(k, m) obtained in step 340 may be expressed as a Wiener filter of the form H_wd(k, m) = ρ(k, m) / (1 + ρ(k, m)), where ρ(k, m) denotes the estimated prior signal-to-noise ratio in frequency bin k at frame index m.
  • the frequency-based speech/noise probability measurement (which, as described above, is also referred to herein as the frequency-based speech probability) obtained in step 340 is used to determine a frame-based speech/noise probability for the H-band signal stream.
  • an average measure of the frame-based speech/noise probability for the H-band signal stream is determined by computing a frequency average of the frequency-based speech/noise probability of the L-band signal stream: P_avg(m) = (1/δ) Σ_{k=N−δ}^{N−1} P(H1 | Y_k(m), {F}), where N refers to the highest frequency of the L-band signal stream (e.g., ~8kHz), and the quantity δ determines the lower bound of the frequency average for the L-band.
  • the upper portion of the L-band signal frequency spectrum is used to determine the speech/noise probability for the H-band signal stream, the basis for doing so being that the upper portion of the L-band signal spectrum (rather than the full L-band spectrum) is more representative of the signal spectrum in the H-band.
  • the frame-based speech/noise probability derived for the H-band signal stream using the frequency-based speech probability in step 345 is also referred to as the "frame-based speech probability" for the H-band signal stream.
  • the H-band signal speech/noise probability (which, as described above, is also referred to herein as the H-band signal speech probability) determined in step 345 is used to extract a gain for the H-band signal frame (the first gain, G1).
  • the average of the L-band gain (the second gain, G2) is computed as: G2(m) = (1/δ) Σ_{k=N−δ}^{N−1} H_wd(k, m), where the quantity δ determines the lower bound of the frequency average for the L-band.
  • in step 355, the final frame-based (or single upper band) gain G_final for the H-band signal stream is computed as a weighted average of the two gains G1 and G2 obtained in step 350: G_final = w·G1 + (1 − w)·G2, where w is the weight parameter for the two gain terms.
  • the weight parameter is selected to be 0.5.
  • following step 355, the process moves to step 360 where the final upper band gain is applied to the H-band signal stream in the time-domain.
  • step 365 the noise-suppressed H-band and L-band signal streams are synthesized (e.g., noise-suppressed H-band signal stream 255 and L-band signal stream 245 synthesized by synthesis filter bank 235 shown in FIG. 2) into the full-band signal and then output as a noise-suppressed (e.g., speech-enhanced) signal in step 370.
  • the noise-suppressed H-band and L-band signal streams may be synthesized in step 365 by passing both streams through a filter bank (e.g., synthesis filter bank 235 shown in FIG. 2) similar to the analysis filter bank (e.g., analysis filter bank 220 shown in FIG. 2) used to split up the input signal in step 335.
  • FIG. 4 is a block diagram illustrating an example computing device 400 that is arranged for multipath routing in accordance with one or more embodiments of the present disclosure.
  • computing device 400 typically includes one or more processors 410 and system memory 420.
  • a memory bus 430 may be used for communicating between the processor 410 and the system memory 420.
  • processor 410 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof.
  • Processor 410 may include one or more levels of caching, such as a level one cache 411 and a level two cache 412, a processor core 413, and registers 414.
  • the processor core 413 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
  • a memory controller 415 can also be used with the processor 410, or in some embodiments the memory controller 415 can be an internal part of the processor 410.
  • system memory 420 can be of any type including but not limited to volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory, etc.) or any combination thereof.
  • System memory 420 typically includes an operating system 421, one or more applications 422, and program data 424.
  • application 422 includes a multipath processing algorithm 423 that is configured to pass a noisy input signal to a noise suppression component.
  • the multipath processing algorithm is further arranged to pass a noise-suppressed output from the noise suppression component to other components in the signal processing pathway.
  • Program Data 424 may include multipath routing data 425 that is useful for passing a noisy input signal along multiple signal pathways to, for example, a noise suppression component such that the component receives the noisy signal before the signal has been manipulated or altered by other audio processing.
  • Computing device 400 can have additional features and/or functionality, and additional interfaces to facilitate communications between the basic configuration 401 and any required devices and interfaces.
  • a bus/interface controller 440 can be used to facilitate communications between the basic configuration 401 and one or more data storage devices 450 via a storage interface bus 441.
  • the data storage devices 450 can be removable storage devices 451, non-removable storage devices 452, or any combination thereof.
  • removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), tape drives and the like.
  • Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, and/or other data.
  • System memory 420, removable storage 451 and non-removable storage 452 are all examples of computer storage media.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 400. Any such computer storage media can be part of computing device 400.
  • Computing device 400 can also include an interface bus 442 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, communication interfaces, etc.) to the basic configuration 401 via the bus/interface controller 440.
  • Example output devices 460 include a graphics processing unit 461 and an audio processing unit 462, either or both of which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 463.
  • Example peripheral interfaces 470 include a serial interface controller 471 or a parallel interface controller 472, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 473.
  • An example communication device 480 includes a network controller 481, which can be arranged to facilitate communications with one or more other computing devices 490 over a network communication (not shown) via one or more communication ports 482.
  • the communication connection is one example of a communication media.
  • Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • a "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media.
  • RF radio frequency
  • IR infrared
  • the term computer readable media as used herein can include both storage media and communication media.
  • Computing device 400 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions.
  • Computing device 400 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
  • various embodiments may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats; however, some aspects of the embodiments described herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof.
  • designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of the present disclosure.
  • Examples of a signal-bearing medium include, but are not limited to, the following: a recordable-type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission-type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
  • a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)

Abstract

A time-domain filter bank splits a super-wideband input signal into two signal streams, a low-frequency band (L-band) stream and a high-frequency band (H-band) stream. The L-band stream contains the lower frequency components of the received signal, and the H-band stream contains the higher frequency components of the signal. Noise suppression processing performed on the L-band stream in the frequency-domain generates speech/noise probability and gain filter data that is used for noise suppression processing of the H-band stream, which remains in the time-domain.

Description

METHOD FOR SUPER-WIDEBAND NOISE SUPPRESSION
FIELD OF THE INVENTION
[0001] The present disclosure generally relates to systems and methods for transmission of audio signals such as voice communications. More specifically, aspects of the present disclosure relate to performing noise suppression on a signal in the time-domain using noise suppression data generated for a corresponding signal in the frequency-domain.
BACKGROUND
[0002] There is currently no robust, low-complexity method for providing effective noise suppression for super-wideband input speech signals, where super-wideband refers to signals with a sampling rate above the wideband sampling rate, for example, 32kHz (as compared to 8kHz and 16kHz for narrowband and wideband, respectively). The difficulty lies in determining how to suppress only the relevant noise in the high-frequency band (H-band) of an input signal without also removing or suppressing aspects of the underlying speech that give the richer, fuller speech sound typical of super-wideband clean input. Because the low-frequency band (L-band) of the input signal contains most of the speech energy, it is generally very difficult to distinguish noise from actual speech in the H-band. This difficulty leads to over-suppressing or under-suppressing noise in the H-band, resulting in low-quality noise suppression for the overall signal.
SUMMARY
[0003] This Summary introduces a selection of concepts in a simplified form in order to provide a basic understanding of some aspects of the present disclosure. This Summary is not an extensive overview of the disclosure, and is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. This Summary merely presents some of the concepts of the disclosure as a prelude to the Detailed Description provided below.
[0004] One embodiment of the present disclosure relates to a method for noise suppression comprising receiving an input signal at a first filter bank of a noise suppression module; splitting, by the first filter bank, the received signal into a first signal stream and a second signal stream, the first signal stream including a first range of frequencies and the second signal stream including a second range of frequencies higher than the first range of frequencies; deriving a gain for the second signal stream based on data collected from the first signal stream; applying the derived gain to the second signal stream in time-domain; and synthesizing at a second filter bank of the noise suppression module the first signal stream and the second signal stream into an output signal.
[0005] In another embodiment of the disclosure, the method for noise suppression may further comprise computing a frequency average of a frequency-based speech probability of the first signal stream; deriving a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream; using the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream; computing a frequency average of a frequency-based gain filter of the first signal stream; using the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and deriving a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
[0006] In another embodiment of the disclosure, the method of noise suppression may further comprise computing a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
[0007] Another embodiment of the disclosure relates to a noise suppression system comprising a first filter bank configured to split a received input signal into a first signal stream and a second signal stream, where the first signal stream includes a first range of frequencies and the second signal stream includes a second range of frequencies higher than the first range of frequencies; a noise suppression module configured to receive, from the first filter bank, the first and second signal streams, derive a gain for the second signal stream based on data collected from the first signal stream, and apply the derived gain to the second signal stream in time-domain; and a second filter bank configured to, in response to the noise suppression module applying the derived gain to the second signal stream, synthesize the first signal stream and the second signal stream into an output signal.
[0008] In another embodiment of the disclosure, the noise suppression module is further configured to process the first signal stream in frequency-domain to generate a frequency-based speech probability and a frequency-based gain filter for the first signal stream.
[0009] In another embodiment of the disclosure, the noise suppression module is further configured to compute a frequency average of a frequency-based speech probability of the first signal stream; derive a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream; use the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream; compute a frequency average of a frequency-based gain filter of the first signal stream; use the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and derive a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
[0010] In still another embodiment of the disclosure, the noise suppression module is further configured to compute a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
[0011] In other embodiments of the disclosure, the methods and systems described herein may optionally include one or more of the following additional features: the derived gain applied to the second signal stream is a single upper band gain, the data collected from the first signal stream is a speech probability and gain filter, the speech probability and gain filter collected from the first signal stream includes at least a frequency-based speech probability and a frequency-based gain filter for the first signal stream, processing the first signal stream in frequency-domain generates the frequency-based speech probability and the frequency-based gain filter for the first signal stream, the weighted average includes a weighting value for the first and second gains of the second signal stream, where the weighting value is based on the frequency-based speech probability of the first signal stream, the first range of frequencies includes frequencies between 0kHz and 8kHz, and the second range of frequencies includes frequencies between 8kHz and 16kHz, each of the first and second filter banks is an infinite impulse response filter bank or a finite impulse response filter bank, and/or each of the first and second filter banks includes a plurality of all-pass filters.
[0012] Further scope of applicability of the present invention will become apparent from the Detailed Description given below. However, it should be understood that the Detailed Description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this Detailed Description.
BRIEF DESCRIPTION OF DRAWINGS
[0013] These and other objects, features and characteristics of the present disclosure will become more apparent to those skilled in the art from a study of the following Detailed Description in conjunction with the appended claims and drawings, all of which form a part of this specification. In the drawings:
[0014] Figure 1 is a block diagram of a representative embodiment in which one or more aspects described herein may be implemented.
[0015] Figure 2 is a communication diagram illustrating example data flows in connection with noise suppression processing according to one or more embodiments described herein.
[0016] Figure 3 is a flowchart illustrating an example method for super-wideband noise suppression according to one or more embodiments described herein.
[0017] Figure 4 is a block diagram illustrating an example computing device arranged for multipath routing and processing of audio input signals according to one or more embodiments described herein.
[0018] The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.
[0019] In the drawings, the same reference numerals and any acronyms identify elements or acts with the same or similar structure or functionality for ease of understanding and convenience. The drawings will be described in detail in the course of the following Detailed Description.
DETAILED DESCRIPTION
[0020] Various examples of the invention will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that the invention may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that the invention can include many other obvious features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.
[0021] The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Indeed, certain terms may even be emphasized below. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents thereof.
[0022] In at least some embodiments of the present disclosure, noise suppression occurs in the frequency domain, where both noise estimation and noise filtering processes are performed. As will be described in greater detail herein, the noise suppression processing performed in the frequency domain involves a low-frequency band portion of a received signal, with a high-frequency band portion of the signal remaining in the time-domain.
[0023] In one or more embodiments described herein, a time-domain filter bank (e.g., a chain of all-pass filters such as polyphase IIR filters, a finite impulse response (FIR) filter bank, etc.) splits a super-wideband input signal (e.g., an input signal with a sampling rate of 32kHz, 48kHz, etc.) into two signal streams, a low-frequency band (L-band) stream and a high-frequency band (H-band) stream. In one example, the L-band signal stream contains portions of the received signal that include lower-frequency components (e.g., components with frequencies in the range of 0-8kHz, or 0-12kHz, etc.), and the H-band signal stream contains portions of the signal that include higher-frequency components (e.g., components with frequencies in the range of 8-16kHz, or 12-24kHz, etc.). Noise suppression processing performed on the L-band stream in the frequency-domain generates noise suppression data (e.g., speech/noise probability and gain filter measurements for the L-band) that is used for noise suppression processing of the H-band stream, which remains in the time-domain.
[0024] In some embodiments of the disclosure, the L-band and H-band signal streams may include components in frequency ranges other than the exemplary frequency ranges used herein. In various examples described below, the frequency ranges of 0-8kHz and 8-16kHz are used for the L-band and H-band signal streams, respectively. These are exemplary frequency ranges used for purposes of describing various features of the disclosure. These exemplary frequency ranges are not intended to limit the scope of the disclosure in any way. Instead, numerous other frequency ranges may be used for the L-band and/or H-band signal streams in addition to or instead of those used in the various examples described herein. For example, in a scenario where audio is sampled at 48kHz, a frequency range of 0-12kHz may be used for the L-band signal stream and a frequency range of 12-24kHz used for the H-band signal stream. In a different scenario, frequency ranges of 0-7kHz and 7-20kHz may be used for the L-band and H-band signal streams, respectively.
[0025] Additionally, the terms "narrowband," "wideband," and "super-wideband" are sometimes used herein to refer to audio signals with sampling rates at or above certain threshold sampling rates, or with sampling rates within certain ranges. These terms may also be used relative to one another in describing audio signals with particular sampling rates. For example, "super-wideband" is sometimes used herein to refer to audio signals with a sampling rate above a wideband sampling rate of, e.g., 16kHz. As such, in describing various aspects of the disclosure, super-wideband is used to refer to audio signals sampled at a higher rate of, e.g., 32kHz or 48kHz. It should be understood that such use of the terms "narrowband," "wideband," and/or "super-wideband" is not in any way intended to limit the scope of the disclosure.
[0026] As will be described in greater detail herein, various aspects of the disclosure may be implemented in a noise suppression module. The noise suppression module may be part of a larger audio communications system and may include one or more submodules, units, and/or components that perform some of the processes described herein. In at least one example, the noise suppression module is located at the near-end (e.g., render stream) environment of a signal transmission path.
[0027] In at least some embodiments, H-band noise suppression processing includes applying a frame-based gain to the time-domain H-band signal stream, where the frame-based gain is derived from L-band noise suppression data generated in the frequency-domain. As used herein, "frame-based gain" indicates that the H-band gain varies with the frame index of the input signal and does not have a frequency dependency. Additionally, the frame-based gain is also referred to herein as a "single upper band gain" (e.g., where a single gain factor is applied to the whole upper frequency band (e.g., H-band) of the signal). In some arrangements, if a received audio signal is determined to only contain narrowband and/or wideband frequency components (e.g., the signal is sampled at a rate of 8kHz or 16kHz), then the entire signal may be taken as the L-band signal stream, with noise suppression processing only acting on the L-band. In such situations, a filter bank is not used to split the received signal into H-band and L-band streams, and although a pointer for the H-band may be passed to the noise suppression module, no H-band processing is actually performed.
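To make "frame-based" concrete, here is a minimal sketch (function name and framing constants assumed for illustration) of applying one scalar gain per frame to the time-domain H-band stream:

```python
import numpy as np

def suppress_h_band(h_band, gains, frame_len=160):
    """Apply a single upper band gain per frame, in the time domain.

    gains[i] is the frame-based gain for frame i, derived from L-band
    noise suppression data; it has no frequency dependency, so the
    whole H-band frame is simply scaled by one factor.
    """
    out = np.copy(h_band)
    for i, g in enumerate(gains):
        out[i * frame_len:(i + 1) * frame_len] *= g
    return out
```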
[0028] FIG. 1 and the following discussion provide a brief, general description of a representative embodiment in which various aspects of the present disclosure may be implemented. As shown in FIG. 1, a noise suppression module 140 may be located at the near-end environment of a signal transmission path, along with a capture device 105 also at the near-end and a render device 130 located at the far-end environment. In some arrangements, noise suppression module 140 may be one component in a larger system for audio (e.g., voice) communications. The noise suppression module 140 may be an independent component in such a larger system or may be a subcomponent within an independent component (not shown) of the system. In the example embodiment illustrated in FIG. 1, noise suppression module 140 is arranged to receive and process input from capture device 105 and generate output to, e.g., one or more other audio processing components (not shown). These other audio processing components may be acoustic echo control (AEC), automatic gain control (AGC), and/or other audio quality improvement components. In some embodiments, these other processing components may receive input from capture device 105 prior to noise suppression module 140 receiving such input.
[0029] Capture device 105 may be any of a variety of audio input devices, such as one or more microphones configured to capture sound and generate input signals. Render device 130 may be any of a variety of audio output devices, including a loudspeaker or group of loudspeakers configured to output sound of one or more channels. For example, capture device 105 and render device 130 may be hardware devices internal to a computer system, or external peripheral devices connected to a computer system via wired and/or wireless connections. In some arrangements, capture device 105 and render device 130 may be components of a single device, such as a speakerphone, telephone handset, etc. Additionally, one or both of capture device 105 and render device 130 may include analog-to-digital and/or digital-to-analog transformation functionalities.
[0030] In at least the embodiment shown in FIG. 1, noise suppression module 140 includes a controller 150 for coordinating various processes performed therein, and monitoring and/or adapting timing considerations for such processes. Noise suppression module 140 may also include a sampling unit 115, an analysis filter bank 120, a synthesis filter bank 135, and a noise suppression unit 125. Each of these units may be in communication with controller 150 such that controller 150 facilitates some of the processes performed by and between the units. Details of the sampling unit 115, analysis filter bank 120, synthesis filter bank 135, and noise suppression unit 125 will be further described below.
[0031] Noise suppression unit 125 may include one or more smaller units or subunits, such as H-band noise suppression (NS) unit 160 and L-band NS unit 165. As shown, the sampling unit 115 and the analysis filter bank 120 may be in communication with L-band NS unit 165 in a manner that is separate from any such communications between H-band NS unit 160 and the sampling unit 115 or the analysis filter bank 120. For example, signals, signal information, signal data, etc., may be passed to L-band NS unit 165 from either or both of sampling unit 115 and analysis filter bank 120 without such signals also being passed to H-band NS unit 160. Furthermore, L-band NS unit 165 and H-band NS unit 160 may have separate lines of communication with synthesis filter bank 135. For example, as will be described in greater detail below, L-band NS unit 165 and H-band NS unit 160 may each pass separate signal streams of different frequency ranges to synthesis filter bank 135 so the streams may be recombined into a full signal.

[0032] Additionally, L-band NS unit 165 and H-band NS unit 160 may be in communication with each other within noise suppression unit 125. For example, noise suppression processing performed by L-band NS unit 165 in the frequency-domain may generate noise suppression data that is passed to H-band NS unit 160 for noise suppression processing in the time-domain.
[0033] In one or more other embodiments of the present disclosure, one or more other components, modules, units, etc., may be included as part of noise suppression module 140, in addition to or instead of those illustrated in FIG. 1. Furthermore, the names used to identify the units and components included as part of noise suppression module 140 (e.g., "sampling unit," "analysis filter bank," "L-band NS unit," etc.) are exemplary in nature, and are not in any way intended to limit the scope of the disclosure.
[0034] FIG. 2 is a communication diagram illustrating example data flows for noise suppression processing according to at least some embodiments of the disclosure. As shown, a capture device 205 may pass an input signal 210 to a sampling unit 215. In some arrangements, sampling unit 215 may be a part or component of a noise suppression module (e.g., noise suppression module 140 shown in FIG. 1). Sampling unit 215 may sample input signal 210 at a rate of, for example, 8kHz, 16kHz, 32kHz, etc., for some time interval (e.g., ten milliseconds (ms)), depending on whether input signal 210 is narrowband, wideband, super-wideband, etc., in nature. Example time intervals that may be used by sampling unit 215 in one or more arrangements include 10 ms, 20 ms, 30 ms, etc. For a low sample rate signal 219 (e.g., input signal 210 sampled at a rate of 8kHz (e.g., narrowband), 16kHz (e.g., wideband), etc., by sampling unit 215), the entire input signal 210 is passed to a noise suppression unit 225 as L-band signal stream 222, bypassing an analysis filter bank 220. However, a high sample rate signal 217 (e.g., super-wideband input, input with a 32kHz sampling rate (320 samples) and including a frequency spectrum of 0-16kHz, and the like) is passed from sampling unit 215 to analysis filter bank 220.

[0035] Although FIG. 2 shows low sample rate signal 219 and high sample rate signal 217 being output from sampling unit 215, it is important to note that each of these signals is input signal 210, simply identified under a different name for purposes of explanation. In other words, sampling unit 215 determines, based on a sampling rate of input signal 210 (e.g., 8kHz, 16kHz, etc., which in some arrangements may be identified as "low sample rate signal 219," or 32kHz, 48kHz, etc., which in some arrangements may be identified as "high sample rate signal 217"), whether input signal 210 is passed directly to noise suppression unit 225 as L-band signal stream 222, or whether input signal 210 is instead passed to analysis filter bank 220.
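This routing decision can be summarized in a short, assumed sketch; the two callables stand in for analysis filter bank 220 and noise suppression unit 225 of FIG. 2:

```python
def route_input(signal, sample_rate_hz, analysis_filter_bank, noise_suppressor):
    """Dispatch input signal 210 based on its sampling rate.

    Narrowband/wideband input (e.g., 8 kHz or 16 kHz) bypasses the
    analysis filter bank and the whole signal becomes the L-band
    stream; super-wideband input (e.g., 32 kHz or 48 kHz) is split
    into L-band and H-band streams first.
    """
    if sample_rate_hz <= 16000:
        l_band, h_band = signal, None  # low sample rate signal 219
    else:
        l_band, h_band = analysis_filter_bank(signal)  # high sample rate signal 217
    return noise_suppressor(l_band, h_band)
```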
[0036] Where sampling unit 215 samples input signal 210 at a high sample rate (e.g., 32kHz, 48kHz, etc.), sampling unit 215 passes input signal 210 to analysis filter bank 220. Analysis filter bank 220 splits input signal 210 into two signal streams, L-band signal stream 222 and H-band signal stream 224. In one example, L-band signal stream 222 contains lower-frequency components (e.g., components with frequencies in the range of 0-8kHz, or 0-12kHz, etc.) of input signal 210, and H-band signal stream 224 contains higher-frequency components (e.g., components with frequencies in the range of 8-16kHz, or 12-24kHz, etc.) of input signal 210. These same example frequency ranges for L-band signal stream 222 and H-band signal stream 224 may also be used in situations where input signal 210 bypasses analysis filter bank 220 and goes directly to noise suppression unit 225, such as when input signal 210 is output from sampling unit 215 as low sample rate signal 219. In other examples, L-band signal stream 222 and H-band signal stream 224 may each contain components of various other frequency ranges different from the example ranges described above.
[0037] Both L-band signal stream 222 and H-band signal stream 224 are passed to noise suppression unit 225 to undergo noise suppression processing as will be described below. In at least some arrangements, noise suppression unit 225 may include an L-band NS unit 265 and an H-band NS unit 260. L-band NS unit 265 may receive L-band signal stream 222 and perform noise-suppression processing as if L-band signal stream 222 were a wideband audio signal (e.g., 0-8kHz frequency spectrum). As will be described in greater detail below, noise suppression processing performed by L-band NS unit 265 on L-band signal stream 222 generates L-band NS data 280. In at least one example, L-band NS data 280 includes frequency-based speech/noise probability and gain filter data for L-band signal stream 222 that is passed to H-band NS unit 260 for noise suppression processing of H-band signal stream 224 in the time-domain. As such, H-band NS unit 260 is active when input signal 210 includes super-wideband input (e.g., input signal 210 passes to analysis filter bank 220 as a high sample rate signal 217) and executes noise suppression processing on H-band signal stream 224 after receiving L-band NS data 280 from L-band NS unit 265.
[0038] Noise suppression unit 225 outputs noise-suppressed L-band stream 245 and noise-suppressed H-band stream 255 as a result of, for example, processing performed by L- band NS unit 265 and H-band NS unit 260. In a scenario where input signal 210 bypasses analysis filter bank 220 and instead passes from sampling unit 215 to noise suppression unit 225 as a low sample rate signal 219, noise-suppressed L-band stream 245 may similarly bypass a synthesis filter bank 235 and pass directly as a noise-suppressed output signal 270. In such a scenario, where the entire input signal 210 is taken as L-band signal stream 222, noise-suppressed L-band stream 245 is taken as the entire noise-suppressed output signal 270. In other scenarios, noise-suppressed L-band stream 245 and noise-suppressed H-band stream 255 pass to synthesis filter bank 235 where the streams are recombined or synthesized into a full band signal (e.g., 0-16kHz frequency spectrum) that forms the noise-suppressed output signal 270.
[0039] As will be described in greater detail herein, while noise suppression is performed on the L-band signal stream (e.g., by L-band NS unit 265 on L-band signal stream 222 shown in FIG. 2) in the frequency domain, noise suppression is performed on the H-band signal stream (e.g., by H-band NS unit 260 on H-band signal stream 224) in the time-domain. For example, during noise suppression on the L-band signal stream, certain pre-processing is performed, including buffering, windowing, and Fourier transformation to map the L-band signal stream to the frequency domain. Although buffering similar to that applied to the L-band signal stream may also be applied to the H-band signal stream during noise suppression processing on the H-band signal stream, no such windowing or Fourier transformation occurs on the H-band signal stream since it remains in the time-domain. Performing noise suppression on the H-band signal stream in the time-domain, rather than processing the H-band signal stream in the frequency domain, lowers related complexity costs and allows for more stable noise suppression to be achieved. This is partly because relatively little energy is present in the H-band signal stream. Accordingly, a more robust approach is to derive a frame-based gain for the H-band from L-band noise suppression data that is frequency averaged over a portion of the L-band, as will be further described below.
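A minimal sketch of this L-band pre-processing follows, assuming a 256-sample analysis buffer at a 16kHz L-band rate and an illustrative Hanning window; the H-band stream would bypass all of this:

```python
import numpy as np

def lband_to_frequency_domain(l_band_buffer):
    """Window and transform one buffered L-band block to the frequency domain.

    Buffering, windowing, and the DFT apply only to the L-band stream;
    the H-band stream is never windowed or transformed.
    """
    window = np.hanning(len(l_band_buffer))
    return np.fft.rfft(window * l_band_buffer)

# Example: current 10 ms frame (160 samples at 16 kHz) plus 96 buffered samples
spectrum = lband_to_frequency_domain(np.random.randn(256))
```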
[0040] In at least one arrangement, each of analysis filter bank 220 and synthesis filter bank 235 comprises a chain of three all-pass filters, e.g., a class of polyphase infinite impulse response (IIR) filters. In one or more other arrangements, any of a variety of two-channel filter banks may be used alone or in combination to form analysis filter bank 220 and/or synthesis filter bank 235.
[0041] FIG. 3 is a flowchart illustrating a method for noise suppression processing of super-wideband audio input according to one or more embodiments of the disclosure. In step 300, an input signal is sampled (e.g., input signal 210 sampled by sampling unit 215 shown in FIG. 2). In step 305, a determination is made as to whether the input signal is super-wideband in quality based on a sampling rate of the input signal from step 300. In at least one example, the input signal may be sampled at a rate of 8kHz, 16kHz, or 32kHz, depending on whether the signal is narrowband, wideband, or super-wideband, respectively. For purposes of this and other examples, an input signal may be considered a super-wideband signal if it is sampled at a rate of 32kHz in step 300. In further examples, an input signal may be considered a super-wideband signal if it is sampled at a higher rate, such as 48kHz.

[0042] If it is determined in step 305 that the input signal is a super-wideband signal, then in step 335 the input signal is split into low-frequency band (L-band) and high-frequency band (H-band) signal streams (e.g., L-band signal stream 222 and H-band signal stream 224 shown in FIG. 2). In at least one example, the input signal is split into L-band and H-band signal streams by a filter bank (e.g., analysis filter bank 220 shown in FIG. 2), with the L-band signal stream containing lower-frequency (e.g., 0-8kHz) components of the input signal and the H-band signal stream containing higher-frequency (e.g., 8-16kHz) components of the input signal. If instead it is determined in step 305 that the input signal is not a super-wideband signal, then the process moves to step 310 where the entire input signal is taken as the L-band signal stream. In this scenario, the input signal is not split into separate signal streams, but instead bypasses any filter bank (e.g., analysis filter bank 220 shown in FIG. 2) as the L-band signal stream.
[0043] The process continues from either of steps 310 or 335 to step 315, where the L-band signal stream is transformed to the frequency-domain as part of noise suppression processing of the input signal. In some embodiments, the L-band and H-band signal streams may pass to a noise suppression unit (e.g., L-band NS unit 265 and H-band NS unit 260 as parts of noise suppression unit 225 shown in FIG. 2) where the L-band signal stream undergoes noise suppression processing by first being transformed (e.g., using the discrete Fourier Transform (DFT)) to the frequency-domain, as indicated by step 315. While the L-band signal stream is transformed to the frequency-domain in step 315, the H-band signal stream remains in the time-domain. In step 320, noise estimation and filtering are performed on the L-band signal stream in the frequency-domain. In step 325, it is determined whether an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335). If an H-band signal stream is found to be present in step 325, then the process moves to step 340 where a gain is generated for the H-band signal stream using noise suppression data from the L-band signal stream, as described in greater detail below. If no H-band signal stream is found to be present in step 325, then the process continues to step 330 where the noise-suppressed L-band signal stream is transformed back to the time-domain from the frequency-domain (e.g., using inverse-DFT).
[0044] In one or more embodiments, steps 315, 320, and 330 shown in FIG. 3 may collectively constitute noise suppression processing performed on the L-band signal stream. For example, steps 315, 320, and 330 may be performed by an L-band noise suppression unit (e.g., L-band NS unit 265 shown in FIG. 2). Similarly, steps 345, 350, 355 and 360 shown in FIG. 3, which will be described in greater detail below, may collectively constitute noise suppression processing performed on the H-band signal stream, and may be performed by an H-band noise suppression unit (e.g., H-band NS unit 260 shown in FIG. 2).
[0045] In at least some arrangements, the L-band signal stream may be processed in steps 315, 320, and 330 on a frame-by-frame basis as a wideband audio signal (e.g., frequency spectrum of 0-8kHz). For example, for noise suppression processing to be performed on the L-band signal stream in the frequency domain, the L-band signal stream may undergo certain pre-processing steps as part of the transformation step 315, including buffering, windowing, and Fourier transformation (not shown in FIG. 3).
[0046] Additionally, noise estimation and filtering of the L-band signal stream in step 320 may include one or more substeps or sub-processes performed on each frame of the signal. In at least one example, for each frame of the L-band signal stream, an initial noise estimation may be obtained, followed by a speech/noise likelihood determination, an update to the initial noise estimate, and then application of a Wiener filter to reduce or suppress noise in the frame. Furthermore, step 330 of FIG. 3 may include certain processes necessary to convert each frame of the L-band signal stream back to the time-domain, such as inverse FFT, scaling, and window synthesis steps.
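The per-frame sequence above might be sketched as follows; the recursive noise update, smoothing constant, and gain floor are simplifying assumptions standing in for the full estimation procedure described in this paragraph:

```python
import numpy as np

def wiener_suppress_frame(spectrum, noise_psd, alpha=0.9, floor=0.05):
    """Simplified noise estimation and Wiener filtering for one L-band frame.

    A real implementation would gate the noise update with the
    speech/noise likelihood rather than always averaging.
    """
    power = np.abs(spectrum) ** 2
    noise_psd = alpha * noise_psd + (1.0 - alpha) * power   # noise estimate update
    prior_snr = np.maximum(power / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = np.maximum(prior_snr / (1.0 + prior_snr), floor)  # Wiener gain with floor
    return gain * spectrum, noise_psd

# Each filtered frame would then return to the time domain, e.g.
# frame_out = np.fft.irfft(filtered_spectrum)  # plus scaling and window synthesis
```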
[0047] If it is determined in step 325 that an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335), then in step 340, frequency-based speech/noise probability and frequency-based gain filter measurements are obtained from data generated by the noise estimation and filtering of the L-band signal stream in step 320. In at least one example, the frequency-based speech/noise probability $P^L(H_1 \mid Y_k(m), \{F\})$ may be expressed as follows, where superscript "L" is included to indicate that the computation is derived from L-band noise suppression data:

$$P^L(H_1 \mid Y_k(m), \{F\}) = \frac{q\,\Delta_k}{q\,\Delta_k + 1 - q}$$

where $Y_k(m)$ represents the observed noisy input spectral coefficient, $\{F\}$ denotes feature data of the L-band signal, which may include, for example, a spectral flatness measure, harmonic peak pitch, template matching, etc., $H_1$ represents the state of speech being present in the frequency bin $k$ and frame index $m$, $\Delta_k$ denotes the likelihood ratio for bin $k$, and $q$ represents a prior speech/noise probability of the L-band signal based on the features $\{F\}$ of the L-band signal. In various embodiments described herein, the frequency-based speech/noise probability, obtained in step 340 and expressed above, is also referred to as the "frequency-based speech probability."
[0048] Additionally, the frequency-based gain filter $H^L_{wd}(k, m)$ obtained in step 340 may be expressed as the following:

$$H^L_{wd}(k, m) = \frac{\rho_k}{1 + \rho_k}$$

where $\rho_k$ is a prior speech-to-noise ratio estimated for the L-band signal stream.
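Taken together, the two expressions above reduce to a few lines of code; here the per-bin likelihood ratio Delta_k and the prior q are assumed to come from the L-band noise estimation and feature computation:

```python
import numpy as np

def speech_probability(likelihood_ratio, q):
    """Frequency-based speech probability P^L(H1 | Y_k(m), {F})."""
    return q * likelihood_ratio / (q * likelihood_ratio + 1.0 - q)

def gain_filter(prior_snr):
    """Frequency-based Wiener gain filter H_wd(k, m) = rho_k / (1 + rho_k)."""
    return prior_snr / (1.0 + prior_snr)
```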
[0049] In step 345, the frequency-based speech/noise probability measurement (which, as described above, is also referred to herein as the frequency-based speech probability) obtained in step 340 is used to determine a frame-based speech/noise probability for the H-band signal stream. In at least one arrangement, an average measure of the frame-based speech/noise probability for the H-band signal stream is determined by computing a frequency average of the frequency-based speech/noise probability of the L-band signal stream:

$$\langle P \rangle(m) = \frac{1}{\delta} \sum_{k=N-\delta}^{N} P^L(H_1 \mid Y_k(m), \{F\})$$

where $N$ refers to the highest frequency of the L-band signal stream (e.g., ~8kHz), and the quantity $\delta$ determines the lower bound of the frequency average for the L-band. An example implementation may set $\delta \approx N/2$ = 4kHz (e.g., where the L-band signal stream contains components in the frequency range of 0-8kHz). Expressed this way, the upper portion of the L-band signal frequency spectrum is used to determine the speech/noise probability for the H-band signal stream, the basis for doing so being that the upper portion of the L-band signal spectrum (rather than the full L-band spectrum) is more representative of the signal spectrum in the H-band. In some embodiments described herein, the frame-based speech/noise probability derived for the H-band signal stream using the frequency-based speech probability in step 345 is also referred to as the "frame-based speech probability" for the H-band signal stream.
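As a sketch, with p_speech holding the per-bin L-band speech probability and delta_bins playing the role of δ:

```python
import numpy as np

def frame_speech_probability(p_speech, delta_bins):
    """Average the L-band speech probability over its top delta_bins bins.

    Only the upper portion of the L-band spectrum (e.g., 4-8 kHz when
    the L-band covers 0-8 kHz) is used, since it is more representative
    of the H-band spectrum than the full L-band.
    """
    return float(np.mean(p_speech[-delta_bins:]))
```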
[0050] The process continues to step 350, where the H-band signal speech/noise probability (which, as described above, is also referred to herein as the H-band signal speech probability) determined in step 345 is used to extract a gain for the H-band signal frame. In at least one example, the gain for the H-band frame may be obtained by a mapping of the form

$$G_1 = 0.5\,\left[1 + \tanh\!\big(W(2x - 1)\big)\right]$$

with $x = \langle P \rangle(m)$ and $W$ being a width parameter of the mapping. Furthermore, to provide for continuity of the gain from the L-band into the H-band, the average of the L-band gain is computed as:

$$G_2 = \frac{1}{\delta} \sum_{k=N-\delta}^{N} H^L_{wd}(k, m)$$

where the quantity $\delta$ determines the lower bound of the frequency average for the L-band. As described above, in a scenario where the L-band signal stream contains components in the frequency range of 0-8kHz, one example implementation may set $\delta \approx N/2$ = 4kHz.

[0051] In step 355, the final frame-based (or single upper band) gain $G_{final}$ for the H-band signal stream is computed as a weighted average of the two gains obtained in step 350:
$$G_{final} = w\,G_1 + (1 - w)\,G_2$$

where $w$ is the weight parameter for the two gain terms. In at least one arrangement, the weight parameter is selected to be 0.5. In one or more other arrangements, the weight parameter is conditioned on the frequency average of the speech probability, for example $w = \langle P \rangle(m)$.
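Putting steps 345 through 355 together, a hedged sketch of the single upper band gain computation (the width and weight values are illustrative defaults, not specified constants):

```python
import numpy as np

def upper_band_gain(p_frame, gain_filter_bins, delta_bins, width=4.0, weight=0.5):
    """Compute G_final for one H-band frame from L-band NS data.

    G1 maps the frame-based speech probability through a tanh-shaped
    curve; G2 averages the L-band Wiener gain over its upper portion
    for continuity across the band edge. weight may alternatively be
    conditioned on p_frame itself.
    """
    g1 = 0.5 * (1.0 + np.tanh(width * (2.0 * p_frame - 1.0)))
    g2 = float(np.mean(gain_filter_bins[-delta_bins:]))
    return weight * g1 + (1.0 - weight) * g2
```

The resulting scalar is then applied to the whole time-domain H-band frame in step 360, as described next.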
[0052] Once the final upper band gain $G_{final}$ is determined in step 355, the process moves to step 360 where the final upper band gain is applied to the H-band signal stream in the time-domain. In step 365, the noise-suppressed H-band and L-band signal streams are synthesized (e.g., noise-suppressed H-band signal stream 255 and L-band signal stream 245 synthesized by synthesis filter bank 235 shown in FIG. 2) into the full-band signal and then output as a noise-suppressed (e.g., speech-enhanced) signal in step 370. In at least some arrangements, the noise-suppressed H-band and L-band signal streams may be synthesized in step 365 by passing both streams through a filter bank (e.g., synthesis filter bank 235 shown in FIG. 2) similar to the analysis filter bank (e.g., analysis filter bank 220 shown in FIG. 2) used to split up the input signal in step 335.
[0053] FIG. 4 is a block diagram illustrating an example computing device 400 that is arranged for multipath routing in accordance with one or more embodiments of the present disclosure. In a very basic configuration 401, computing device 400 typically includes one or more processors 410 and system memory 420. A memory bus 430 may be used for communicating between the processor 410 and the system memory 420.
[0054] Depending on the desired configuration, processor 410 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 410 may include one or more levels of caching, such as a level one cache 411 and a level two cache 412, a processor core 413, and registers 414. The processor core 413 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. A memory controller 415 can also be used with the processor 410, or in some embodiments the memory controller 415 can be an internal part of the processor 410.
[0055] Depending on the desired configuration, the system memory 420 can be of any type including but not limited to volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory, etc.) or any combination thereof. System memory 420 typically includes an operating system 421, one or more applications 422, and program data 424. In at least some embodiments, application 422 includes a multipath processing algorithm 423 that is configured to pass a noisy input signal to a noise suppression component. The multipath processing algorithm is further arranged to pass a noise-suppressed output from the noise suppression component to other components in the signal processing pathway. Program Data 424 may include multipath routing data 425 that is useful for passing a noisy input signal along multiple signal pathways to, for example, a noise suppression component such that the component receives the noisy signal before the signal has been manipulated or altered by other audio processing.
[0056] Computing device 400 can have additional features and/or functionality, and additional interfaces to facilitate communications between the basic configuration 401 and any required devices and interfaces. For example, a bus/interface controller 440 can be used to facilitate communications between the basic configuration 401 and one or more data storage devices 450 via a storage interface bus 441. The data storage devices 450 can be removable storage devices 451, non-removable storage devices 452, or any combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), tape drives and the like. Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, and/or other data.
[0057] System memory 420, removable storage 451 and non-removable storage 452 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 400. Any such computer storage media can be part of computing device 400.
[0058] Computing device 400 can also include an interface bus 442 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, communication interfaces, etc.) to the basic configuration 401 via the bus/interface controller 440. Example output devices 460 include a graphics processing unit 461 and an audio processing unit 462, either or both of which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 463. Example peripheral interfaces 470 include a serial interface controller 471 or a parallel interface controller 472, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 473. An example communication device 480 includes a network controller 481, which can be arranged to facilitate communications with one or more other computing devices 490 over a network communication (not shown) via one or more communication ports 482. The communication connection is one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. A "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media. The term computer readable media as used herein can include both storage media and communication media.
[0059] Computing device 400 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 400 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
[0060] There is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing cost versus efficiency tradeoffs. There are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation. In one or more other scenarios, the implementer may opt for some combination of hardware, software, and/or firmware.
[0061] The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof.
[0062] In one or more embodiments, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments described herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof. Those skilled in the art will further recognize that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of the present disclosure.
[0063] Additionally, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal-bearing medium used to actually carry out the distribution. Examples of a signal-bearing medium include, but are not limited to, the following: a recordable-type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission-type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
[0064] Those skilled in the art will also recognize that it is common within the art to describe devices and/or processes in the fashion set forth herein, and thereafter use engineering practices to integrate such described devices and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein can be integrated into a data processing system via a reasonable amount of experimentation. Those having skill in the art will recognize that a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
[0065] With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
[0066] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims

We claim:
1. A method for noise suppression comprising: receiving an input signal at a first filter bank of a noise suppression module; splitting, by the first filter bank, the received signal into a first signal stream and a second signal stream, the first signal stream including a first range of frequencies and the second signal stream including a second range of frequencies higher than the first range of frequencies; deriving a gain for the second signal stream based on data collected from the first signal stream; applying the derived gain to the second signal stream in time-domain; and in response to applying the derived gain to the second signal stream, synthesizing at a second filter bank of the noise suppression module the first signal stream and the second signal stream into an output signal.
2. The method of claim 1, wherein the derived gain applied to the second signal stream is a single upper band gain.
3. The method of claim 1, wherein the data collected from the first signal stream is a speech probability and gain filter.
4. The method of claim 3, wherein the speech probability and gain filter collected from the first signal stream includes at least a frequency-based speech probability and a frequency-based gain filter for the first signal stream.
5. The method of claim 4, wherein processing the first signal stream in frequency-domain generates the frequency-based speech probability and the frequency-based gain filter for the first signal stream.
6. The method of claim 4, further comprising: computing a frequency average of the frequency-based speech probability of the first signal stream; deriving a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream; using the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream; computing a frequency average of the frequency-based gain filter of the first signal stream; using the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and deriving a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
7. The method of claim 6, wherein the weighted average includes a weighting value for the first and second gains of the second signal stream, the weighting value being based on the frequency-based speech probability of the first signal stream.
8. The method of claim 6, further comprising: computing a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
9. The method of claim 1, wherein the first range of frequencies includes frequencies between 0kHz and 8kHz, and wherein the second range of frequencies includes frequencies between 8kHz and 16kHz.
10. The method of claim 1, wherein each of the first and second filter banks is one of an infinite impulse response filter bank or a finite impulse response filter bank.
11. The method of claim 1, wherein each of the first and second filter banks includes a plurality of all-pass filters.
12. A noise suppression system comprising: a first filter bank configured to split a received input signal into a first signal stream and a second signal stream, wherein the first signal stream includes a first range of frequencies and the second signal stream includes a second range of frequencies higher than the first range of frequencies; a noise suppression module configured to: receive, from the first filter bank, the first and second signal streams; derive a gain for the second signal stream based on data collected from the first signal stream; and apply the derived gain to the second signal stream in time-domain; and a second filter bank configured to, in response to the noise suppression module applying the derived gain to the second signal stream, synthesize the first signal stream and the second signal stream into an output signal.
13. The noise suppression system of claim 12, wherein the gain derived for the second signal stream is a single upper band gain.
14. The noise suppression system of claim 12, wherein the data collected from the first signal stream is a speech probability and gain filter.
15. The noise suppression system of claim 14, wherein the speech probability and gain filter collected from the first signal stream includes at least a frequency-based speech probability and a frequency-based gain filter for the first signal stream.
16. The noise suppression system of claim 15, wherein the noise suppression module is further configured to: process the first signal stream in frequency-domain to generate the frequency-based speech probability and the frequency-based gain filter for the first signal stream.
17. The noise suppression system of claim 15, wherein the noise suppression module is further configured to: compute a frequency average of the frequency-based speech probability of the first signal stream; derive a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream; use the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream; compute a frequency average of the frequency-based gain filter of the first signal stream; use the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and derive a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
18. The noise suppression system of claim 17, wherein the weighted average includes a weighting value for the first and second gains of the second signal stream, the weighting value being based on the frequency-based speech probability of the first signal stream.
19. The noise suppression system of claim 17, wherein the noise suppression module is further configured to: compute a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
20. The noise suppression system of claim 12, wherein the first range of frequencies includes frequencies between 0kHz and 8kHz, and wherein the second range of frequencies includes frequencies between 8kHz and 16kHz.
21. The noise suppression system of claim 12, wherein each of the first and second filter banks is one of an infinite impulse response filter bank or a finite impulse response filter bank.
22. The noise suppression system of claim 12, wherein each of the first and second filter banks includes a plurality of all-pass filters.
EP11722677.9A 2011-05-16 2011-05-16 Super-wideband noise supression Active EP2710590B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/036645 WO2012158157A1 (en) 2011-05-16 2011-05-16 Method for super-wideband noise supression

Publications (2)

Publication Number Publication Date
EP2710590A1 true EP2710590A1 (en) 2014-03-26
EP2710590B1 EP2710590B1 (en) 2015-10-07

Family

ID=44121273

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11722677.9A Active EP2710590B1 (en) 2011-05-16 2011-05-16 Super-wideband noise supression

Country Status (2)

Country Link
EP (1) EP2710590B1 (en)
WO (1) WO2012158157A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9349383B2 (en) * 2013-01-29 2016-05-24 2236008 Ontario Inc. Audio bandwidth dependent noise suppression
EP2760022B1 (en) * 2013-01-29 2017-11-01 2236008 Ontario Inc. Audio bandwidth dependent noise suppression
US9741350B2 (en) 2013-02-08 2017-08-22 Qualcomm Incorporated Systems and methods of performing gain control
US9711156B2 (en) 2013-02-08 2017-07-18 Qualcomm Incorporated Systems and methods of performing filtering for gain determination
US9620134B2 (en) 2013-10-10 2017-04-11 Qualcomm Incorporated Gain shape estimation for improved tracking of high-band temporal characteristics
US10614816B2 (en) 2013-10-11 2020-04-07 Qualcomm Incorporated Systems and methods of communicating redundant frame information
US10083708B2 (en) 2013-10-11 2018-09-25 Qualcomm Incorporated Estimation of mixing factors to generate high-band excitation signal
US9384746B2 (en) 2013-10-14 2016-07-05 Qualcomm Incorporated Systems and methods of energy-scaled signal processing
US10163447B2 (en) 2013-12-16 2018-12-25 Qualcomm Incorporated High-band signal modeling
GB2597519B (en) * 2020-07-24 2022-12-07 Tgr1 618 Ltd Method and device for processing and providing audio information using bi-phasic separation and re-integration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243496A1 (en) * 2005-01-21 2008-10-02 Matsushita Electric Industrial Co., Ltd. Band Division Noise Suppressor and Band Division Noise Suppressing Method
WO2011029484A1 (en) * 2009-09-14 2011-03-17 Nokia Corporation Signal enhancement processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2012158157A1 *

Also Published As

Publication number Publication date
WO2012158157A1 (en) 2012-11-22
EP2710590B1 (en) 2015-10-07

Similar Documents

Publication Publication Date Title
EP2710590B1 (en) Super-wideband noise supression
CN104424956B (en) Activate sound detection method and device
US7313518B2 (en) Noise reduction method and device using two pass filtering
CN101916567B (en) Speech enhancement method applied to dual-microphone system
CN102074245B (en) Dual-microphone-based speech enhancement device and speech enhancement method
CN103650040B (en) Use the noise suppressing method and device of multiple features modeling analysis speech/noise possibility
US20100198588A1 (en) Signal bandwidth extending apparatus
US20110191101A1 (en) Apparatus and Method for Processing an Audio Signal for Speech Enhancement Using a Feature Extraction
CN112992188B (en) Method and device for adjusting signal-to-noise ratio threshold in activated voice detection VAD judgment
US20170040027A1 (en) Frequency domain noise attenuation utilizing two transducers
CN102347028A (en) Double-microphone speech enhancer and speech enhancement method thereof
WO2020168981A1 (en) Wind noise suppression method and apparatus
CN102074246A (en) Dual-microphone based speech enhancement device and method
EP3757993B1 (en) Pre-processing for automatic speech recognition
EP2710591B1 (en) Reducing noise pumping due to noise suppression and echo control interaction
US7890319B2 (en) Signal processing apparatus and method thereof
EP2716023A1 (en) Control of adaptation step size and suppression gain in acoustic echo control
US9349383B2 (en) Audio bandwidth dependent noise suppression
US8736359B2 (en) Signal processing method, information processing apparatus, and storage medium for storing a signal processing program
EP2645738A1 (en) Signal processing device, signal processing method, and signal processing program
Roy Single channel speech enhancement using Kalman filter
Petrick et al. Robust front end processing for speech recognition in reverberant environments: Utilization of speech characteristics
EP2760022B1 (en) Audio bandwidth dependent noise suppression
Krishnamoorthy et al. Processing noisy speech for enhancement
EP2760021B1 (en) Sound field spatial stabilizer

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20131203

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 21/0224 20130101ALN20140825BHEP

Ipc: G10L 25/78 20130101ALN20140825BHEP

Ipc: G10L 21/0208 20130101AFI20140825BHEP

Ipc: G10L 21/0232 20130101ALN20140825BHEP

17Q First examination report despatched

Effective date: 20140904

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602011020318

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G10L0021020000

Ipc: G10L0021020800

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 21/0232 20130101ALN20150316BHEP

Ipc: G10L 21/0224 20130101ALN20150316BHEP

Ipc: G10L 25/78 20130101ALN20150316BHEP

Ipc: G10L 21/0208 20130101AFI20150316BHEP

INTG Intention to grant announced

Effective date: 20150415

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 754170

Country of ref document: AT

Kind code of ref document: T

Effective date: 20151015

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602011020318

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20151007

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 754170

Country of ref document: AT

Kind code of ref document: T

Effective date: 20151007

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160107

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160207

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160108

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160208

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602011020318

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160531

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

26N No opposition filed

Effective date: 20160708

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160516

Ref country code: BE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20160516

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160531

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160531

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20170131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160516

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160516

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 602011020318

Country of ref document: DE

Representative's name: WUESTHOFF & WUESTHOFF, PATENTANWAELTE PARTG MB, DE

Ref country code: DE

Ref legal event code: R081

Ref document number: 602011020318

Country of ref document: DE

Owner name: GOOGLE LLC (N.D.GES.D. STAATES DELAWARE), MOUNTAIN VIEW, US

Free format text: FORMER OWNER: GOOGLE INC., MOUNTAIN VIEW, CALIF., US

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20110516

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

Ref country code: MT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160531

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20151007

P01 Opt-out of the competence of the unified patent court (UPC) registered

Effective date: 20230505

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240530

Year of fee payment: 14