EP2710590B1 - Super-wideband noise suppression - Google Patents

Super-wideband noise suppression

Info

Publication number
EP2710590B1
Authority
EP
European Patent Office
Prior art keywords
signal stream
frequency
gain
noise suppression
signal
Prior art date
Legal status
Active
Application number
EP11722677.9A
Other languages
German (de)
French (fr)
Other versions
EP2710590A1 (en)
Inventor
Marco Paniconi
Jan Skoglund
Current Assignee
Google LLC
Original Assignee
Google LLC
Priority date
Filing date
Publication date
Application filed by Google LLC
Publication of EP2710590A1
Application granted
Publication of EP2710590B1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208: Noise filtering
    • G10L 21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L 21/0224: Processing in the time domain
    • G10L 21/0232: Processing in the frequency domain
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/78: Detection of presence or absence of voice signals

Definitions

  • In the method illustrated in FIG. 3 (a flowchart of noise suppression processing for super-wideband audio input, described further in the Detailed Description below), step 335 splits the input signal into low-frequency band (L-band) and high-frequency band (H-band) signal streams (e.g., L-band signal stream 222 and H-band signal stream 224 shown in FIG. 2 ). In at least one arrangement, the split is performed by a filter bank (e.g., analysis filter bank 220 shown in FIG. 2 ), with the L-band signal stream containing lower-frequency (e.g., 0-8kHz) components of the input signal and the H-band signal stream containing higher-frequency (e.g., 8-16kHz) components of the input signal.
  • If instead it is determined in step 305 (where the input signal is checked for super-wideband content based on its sampling rate) that the input signal is not a super-wideband signal, then the process moves to step 310, where the entire input signal is taken as the L-band signal stream. In this case, the input signal is not split into separate signal streams, but instead bypasses any filter bank (e.g., analysis filter bank 220 shown in FIG. 2 ) as the L-band signal stream.
  • The process continues from either of steps 310 or 335 to step 315, where the L-band signal stream is transformed to the frequency-domain as part of noise suppression processing of the input signal.
  • In one or more embodiments, the L-band and H-band signal streams may pass to a noise suppression unit (e.g., L-band NS unit 265 and H-band NS unit 260 as parts of noise suppression unit 225 shown in FIG. 2 ), where the L-band signal stream undergoes noise suppression processing by first being transformed (e.g., using the discrete Fourier transform (DFT)) to the frequency-domain, as indicated by step 315. While the L-band signal stream is transformed to the frequency-domain in step 315, the H-band signal stream remains in the time-domain.
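  • A minimal sketch of this transform stage might look as follows, where the frame length, hop size, and window choice are assumptions made for illustration (the disclosure does not fix these values):

```python
import numpy as np

FRAME_LEN = 128                    # assumed analysis frame length (not specified in the text)
HOP = 64                           # assumed 50% frame overlap
WINDOW = np.hanning(FRAME_LEN)

def lband_to_frequency_domain(new_samples, history):
    """Step 315 sketch: buffer incoming L-band samples against the tail of
    the previous frame, apply an analysis window, and map the frame to the
    frequency-domain with an FFT-based DFT."""
    frame = np.concatenate([history, new_samples])   # buffering
    spectrum = np.fft.rfft(WINDOW * frame)           # windowing + Fourier transform
    return spectrum, frame[HOP:]                     # tail carried into the next frame

# Example: 64 new samples arrive while 64 samples of history are held.
history = np.zeros(FRAME_LEN - HOP)
spectrum, history = lband_to_frequency_domain(np.random.randn(HOP), history)
```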
  • In step 320, noise estimation and filtering are performed on the L-band signal stream in the frequency-domain.
  • In step 325, it is determined whether an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335). If an H-band signal stream is found to be present in step 325, then the process moves to step 340, where a gain is generated for the H-band signal stream using noise suppression data from the L-band signal stream, as described in greater detail below. If no H-band signal stream is found to be present in step 325, then the process continues to step 330, where the noise-suppressed L-band signal stream is transformed back to the time-domain from the frequency-domain (e.g., using the inverse DFT).
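  • For orientation, the branching logic of FIG. 3 can be summarized in a short skeleton such as the following; the five callables are hypothetical stand-ins for the components described herein rather than an API defined by the patent:

```python
SUPER_WIDEBAND_RATES = (32000, 48000)      # example super-wideband rates from the text

def noise_suppress(signal, sample_rate, analysis_bank, ns_lband, ns_hband, synthesis_bank):
    """Skeleton of the FIG. 3 flow (steps 300-370)."""
    if sample_rate in SUPER_WIDEBAND_RATES:            # step 305
        l_band, h_band = analysis_bank(signal)         # step 335: split into two streams
    else:
        l_band, h_band = signal, None                  # step 310: whole signal is the L-band

    # Steps 315-330: frequency-domain noise estimation and filtering on the
    # L-band; ns_lband returns the suppressed stream (back in the time-domain)
    # plus the speech-probability and gain-filter data it generated.
    l_clean, ns_data = ns_lband(l_band)

    if h_band is None:                                 # step 325: no H-band present
        return l_clean

    h_clean = ns_hband(h_band, ns_data)                # steps 340-360: frame-based H-band gain
    return synthesis_bank(l_clean, h_clean)            # steps 365-370: recombine and output
```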
  • Steps 315, 320, and 330 shown in FIG. 3 may collectively constitute noise suppression processing performed on the L-band signal stream, and may be performed by an L-band noise suppression unit (e.g., L-band NS unit 265 shown in FIG. 2 ). Similarly, steps 345, 350, 355, and 360 shown in FIG. 3 may collectively constitute noise suppression processing performed on the H-band signal stream, and may be performed by an H-band noise suppression unit (e.g., H-band NS unit 260 shown in FIG. 2 ).
  • The L-band signal stream may be processed in steps 315, 320, and 330 on a frame-by-frame basis as a wideband audio signal (e.g., frequency spectrum of 0-8kHz). The L-band signal stream may undergo certain pre-processing steps as part of the transformation step 315, including buffering, windowing, and Fourier transformation (not shown in FIG. 3 ).
  • Noise estimation and filtering of the L-band signal stream in step 320 may include one or more substeps or sub-processes performed on each frame of the signal. For example, an initial noise estimation may be obtained, followed by a speech/noise likelihood determination, an update to the initial noise estimate, and then application of a Wiener filter to reduce or suppress noise in the frame.
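  • As an illustration, the sketch below applies these substeps to a single frame; the probability-weighted noise update and the decision-directed Wiener form are standard choices assumed for the example, not rules quoted from the disclosure:

```python
import numpy as np

def process_lband_frame(spectrum, noise_psd, speech_prob, alpha=0.9):
    """One-frame sketch of step 320: update the noise estimate, then apply
    a Wiener gain filter to the L-band spectrum."""
    power = np.abs(spectrum) ** 2

    # Update the noise estimate: bins judged likely to be noise-only
    # (low speech probability) contribute more strongly.
    noise_psd = speech_prob * noise_psd + (1.0 - speech_prob) * (
        alpha * noise_psd + (1.0 - alpha) * power)

    # Wiener gain filter H_k from the estimated a-priori SNR.
    snr_prior = np.maximum(power / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain_filter = snr_prior / (snr_prior + 1.0)

    return gain_filter * spectrum, noise_psd, gain_filter
```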
  • Step 330 of FIG. 3 may include certain processes necessary to convert each frame of the L-band signal stream back to the time-domain, such as inverse FFT, scaling, and window synthesis steps.
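  • A matching sketch of this inverse stage, reusing the assumed frame parameters from the analysis sketch above:

```python
import numpy as np

FRAME_LEN, HOP = 128, 64
WINDOW = np.hanning(FRAME_LEN)

def lband_to_time_domain(spectrum, overlap_tail):
    """Step 330 sketch: inverse FFT, scaling, and window synthesis via
    overlap-add with the tail of the previous frame."""
    frame = np.fft.irfft(spectrum, n=FRAME_LEN) * WINDOW   # inverse FFT (1/N scaling) + window
    frame[:HOP] += overlap_tail                            # overlap-add with the previous frame
    return frame[:HOP], frame[HOP:].copy()                 # output samples, tail for next frame
```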
  • If it is determined in step 325 that an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335), then in step 340, frequency-based speech/noise probability and frequency-based gain filter measurements are obtained from data generated by the noise estimation and filtering of the L-band signal stream in step 320.
  • In at least one embodiment, the frequency-based speech probability obtained in step 340, P( H 1 | Y k ( m ), { F }), may be expressed as follows, where superscript " L " is included to indicate that the computation is derived from L-band noise suppression data:

    $$ P^{L}\bigl(H_1 \mid Y_k(m), \{F\}\bigr) = \frac{q\,\Delta_k}{q\,\Delta_k + 1 - q} $$

  • In the expression above, Y k ( m ) represents the observed noisy input spectral coefficient for frequency bin k and frame index m ; { F } denotes feature data of the L-band signal, which may include, for example, a spectral flatness measure, harmonic peak pitch, template matching, etc.; H 1 represents the state of speech being present in frequency bin k and frame index m ; q represents a prior speech/noise probability of the L-band signal based on the features { F }; and Δ k denotes the likelihood ratio of the observed coefficient under the speech-present and speech-absent hypotheses.
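  • For illustration, the probability can be evaluated per frequency bin as in the sketch below; the Gaussian-model likelihood ratio used for Δ k is one standard choice, assumed here rather than taken from the disclosure:

```python
import numpy as np

def speech_probability(q, snr_prior, snr_post):
    """Per-bin sketch of P^L(H1 | Y_k(m), {F}) = q*Delta_k / (q*Delta_k + 1 - q),
    with Delta_k computed under an assumed Gaussian signal/noise model."""
    delta = np.exp(snr_post * snr_prior / (1.0 + snr_prior)) / (1.0 + snr_prior)
    return q * delta / (q * delta + 1.0 - q)

# Example: prior q = 0.5 with per-bin prior and posterior SNR estimates.
p = speech_probability(0.5, np.array([0.1, 1.0, 3.0]), np.array([0.5, 2.0, 4.0]))
```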
  • In step 345, the frequency-based speech/noise probability measurement (which, as described above, is also referred to herein as the frequency-based speech probability) obtained in step 340 is used to determine a frame-based speech/noise probability for the H-band signal stream. In at least one embodiment, the upper portion of the L-band signal frequency spectrum is used to determine the speech/noise probability for the H-band signal stream, on the basis that the upper portion of the L-band spectrum (rather than the full L-band spectrum) is more representative of the signal spectrum in the H-band. The frame-based speech/noise probability derived for the H-band signal stream using the frequency-based speech probability in step 345 is also referred to as the "frame-based speech probability" for the H-band signal stream.
  • In step 350, the H-band signal speech/noise probability (which, as described above, is also referred to herein as the H-band signal speech probability) determined in step 345 is used to extract a gain for the H-band signal frame. In one example, one gain term, G 1 , is obtained by frequency-averaging the L-band gain filter over the upper portion of the L-band spectrum (e.g., over frequency bins above N /2, where bin N /2 corresponds to 4kHz for a 0-8kHz L-band), and a second gain term, G 2 , is based on the frame-based speech probability.
  • In step 355, the final frame-based (or single upper band) gain G final for the H-band signal stream is computed as a weighted average of the two gains obtained in step 350:

    $$ G_{\text{final}} = w\,G_1 + (1 - w)\,G_2 $$
  • In the expression above, w is the weight parameter for the two gain terms. In at least one embodiment, the weight parameter is selected to be 0.5.
  • After step 355, the process moves to step 360, where the final upper band gain is applied to the H-band signal stream in the time-domain.
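  • The following sketch ties steps 345 through 360 together; restricting the averages to the upper half of the L-band spectrum follows the text above, while using the frame speech probability directly as the gain term G 2 is an illustrative assumption:

```python
import numpy as np

def hband_frame_gain(lband_speech_prob, lband_gain_filter, w=0.5):
    """Sketch of steps 345-355: average the frequency-based speech
    probability and gain filter over the upper half of the L-band spectrum
    (bins above N/2, ~4kHz for a 0-8kHz L-band), then blend the two gains."""
    upper = slice(len(lband_gain_filter) // 2, None)

    frame_speech_prob = np.mean(lband_speech_prob[upper])   # step 345
    g1 = np.mean(lband_gain_filter[upper])                  # step 350: averaged gain filter
    g2 = frame_speech_prob                                  # step 350: probability-based term

    return w * g1 + (1.0 - w) * g2                          # step 355: G_final

def apply_hband_gain(hband_frame, g_final):
    """Step 360: a single, frequency-independent gain scales the whole
    time-domain H-band frame."""
    return g_final * hband_frame
```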
  • In step 365, the noise-suppressed H-band and L-band signal streams are synthesized (e.g., noise-suppressed H-band signal stream 255 and L-band signal stream 245 synthesized by synthesis filter bank 235 shown in FIG. 2 ) into the full-band signal, which is then output as a noise-suppressed (e.g., speech-enhanced) signal in step 370.
  • In one or more arrangements, the noise-suppressed H-band and L-band signal streams may be synthesized in step 365 by passing both streams through a filter bank (e.g., synthesis filter bank 235 shown in FIG. 2 ) similar to the analysis filter bank (e.g., analysis filter bank 220 shown in FIG. 2 ) used to split up the input signal in step 335.
  • FIG. 4 is a block diagram illustrating an example computing device 400 that is arranged for multipath routing in accordance with one or more embodiments of the present disclosure.
  • computing device 400 typically includes one or more processors 410 and system memory 420.
  • a memory bus 430 may be used for communicating between the processor 410 and the system memory 420.
  • Processor 410 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof.
  • Processor 410 may include one or more levels of caching, such as a level one cache 411 and a level two cache 412, a processor core 413, and registers 414.
  • the processor core 413 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
  • a memory controller 415 can also be used with the processor 410, or in some embodiments the memory controller 415 can be an internal part of the processor 410.
  • system memory 420 can be of any type including but not limited to volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory, etc.) or any combination thereof.
  • System memory 420 typically includes an operating system 421, one or more applications 422, and program data 424.
  • application 422 includes a multipath processing algorithm 423 that is configured to pass a noisy input signal to a noise suppression component.
  • the multipath processing algorithm is further arranged to pass a noise-suppressed output from the noise suppression component to other components in the signal processing pathway.
  • Program Data 424 may include multipath routing data 425 that is useful for passing a noisy input signal along multiple signal pathways to, for example, a noise suppression component such that the component receives the noisy signal before the signal has been manipulated or altered by other audio processing.
  • Computing device 400 can have additional features and/or functionality, and additional interfaces to facilitate communications between the basic configuration 401 and any required devices and interfaces.
  • a bus/interface controller 440 can be used to facilitate communications between the basic configuration 401 and one or more data storage devices 450 via a storage interface bus 441.
  • the data storage devices 450 can be removable storage devices 451, non-removable storage devices 452, or any combination thereof.
  • removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), tape drives and the like.
  • Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, and/or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 400. Any such computer storage media can be part of computing device 400.
  • Computing device 400 can also include an interface bus 442 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, communication interfaces, etc.) to the basic configuration 401 via the bus/interface controller 440.
  • Example output devices 460 include a graphics processing unit 461 and an audio processing unit 462, either or both of which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 463.
  • Example peripheral interfaces 470 include a serial interface controller 471 or a parallel interface controller 472, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 473.
  • An example communication device 480 includes a network controller 481, which can be arranged to facilitate communications with one or more other computing devices 490 over a network communication (not shown) via one or more communication ports 482.
  • The communication connection is one example of communication media.
  • Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • a "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media.
  • the term computer readable media as used herein can include both storage media and communication media.
  • Computing device 400 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions.
  • Computing device 400 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
  • Some aspects of the embodiments described herein, in whole or in part, can be equivalently implemented in integrated circuits such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or digital signal processors (DSPs), as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof.
  • Designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of the present disclosure.
  • Examples of a signal-bearing medium include, but are not limited to, the following: a recordable-type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission-type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
  • a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.

Description

    FIELD OF THE INVENTION
  • The present disclosure generally relates to systems and methods for transmission of audio signals such as voice communications. More specifically, aspects of the present disclosure relate to performing noise suppression on a signal in the time-domain using noise suppression data generated for a corresponding signal in the frequency-domain.
  • BACKGROUND
  • There is currently no robust, low-complexity method for providing effective noise suppression for super-wideband input speech signals, where super-wideband refers to signals with a sampling rate above the wideband sampling rate, for example, 32kHz (as compared to 8kHz and 16kHz for narrowband and wideband, respectively). The difficulty lies in determining how to suppress only the relevant noise in the high-frequency band (H-band) of an input signal without also removing or suppressing aspects of the underlying speech that give the richer, fuller speech sound typical of clean super-wideband input. Because the low-frequency band (L-band) of the input signal contains most of the speech energy, it is generally very difficult to distinguish noise from actual speech in the H-band. This difficulty leads to over-suppressing or under-suppressing noise in the H-band, resulting in low-quality noise suppression for the overall signal.
  • Document WO 2001/029484 A1 describes state-of-the-art audio signal enhancement processing.
  • Document US 2008/243496 A1 relates to a band division noise suppressor. In the band division noise suppressor, a band dividing section divides an input voice signal into a low band voice signal and a high band voice signal. The low band voice signal is subjected to decimation at a decimation section, subjected to noise suppression at a low band noise suppressing section, and then interpolated at an interpolation section. On the other hand, the high band voice signal is subjected to noise suppression at a high band noise suppressing section. A band combination section combines the bands of the low-band and high-band voice signals subjected to noise suppression and outputs a voice signal subjected to noise suppression over the entire band.
  • SUMMARY
  • This Summary introduces a selection of concepts in a simplified form in order to provide a basic understanding of some aspects of the present disclosure. This Summary is not an extensive overview of the disclosure, and is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. This Summary merely presents some of the concepts of the disclosure as a prelude to the Detailed Description provided below.
  • One embodiment of the present disclosure relates to a method for noise suppression according to independent claim 1.
  • Another embodiment of the disclosure relates to a noise suppression system according to independent claim 11.
  • Further scope of applicability of the present invention will become apparent from the Detailed Description given below. However, it should be understood that the Detailed Description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from this Detailed Description.
  • BRIEF DESCRIPTION OF DRAWINGS
  • These and other objects, features and characteristics of the present disclosure will become more apparent to those skilled in the art from a study of the following Detailed Description in conjunction with the appended claims and drawings, all of which form a part of this specification. In the drawings:
    • Figure 1 is a block diagram of a representative embodiment in which one or more aspects described herein may be implemented.
    • Figure 2 is a communication diagram illustrating example data flows in connection with noise suppression processing according to one or more embodiments described herein.
    • Figure 3 is a flowchart illustrating an example method for super-wideband noise suppression according to one or more embodiments described herein.
    • Figure 4 is a block diagram illustrating an example computing device arranged for multipath routing and processing of audio input signals according to one or more embodiments described herein.
  • The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.
  • In the drawings, the same reference numerals and any acronyms identify elements or acts with the same or similar structure or functionality for ease of understanding and convenience. The drawings will be described in detail in the course of the following Detailed Description.
  • DETAILED DESCRIPTION
  • Various examples of the invention will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that the invention may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that the invention can include many other obvious features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.
  • The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Indeed, certain terms may even be emphasized below. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents thereof.
  • In at least some embodiments of the present disclosure, noise suppression occurs in the frequency domain, where both noise estimation and noise filtering processes are performed. As will be described in greater detail herein, the noise suppression processing performed in the frequency domain involves a low-frequency band portion of a received signal, with a high-frequency band portion of the signal remaining in the time-domain.
  • In one or more embodiments described herein, a time-domain filter bank (e.g., a chain of all-pass filters such as polyphase IIR filters, a finite impulse response (FIR) filter bank, etc.) splits a super-wideband input signal (e.g., an input signal with a sampling rate of 32kHz, 48kHz, etc.) into two signal streams, a low-frequency band (L-band) stream and a high-frequency band (H-band) stream. In one example, the L-band signal stream contains portions of the received signal that include lower-frequency components (e.g., components with frequencies in the range of 0-8kHz, or 0-12kHz, etc.), and the H-band signal stream contains portions of the signal that include higher-frequency components (e.g., components with frequencies in the range of 8-16kHz, or 12-24kHz, etc.). Noise suppression processing performed on the L-band stream in the frequency-domain generates noise suppression data (e.g., speech/noise probability and gain filter measurements for the L-band) that is used for noise suppression processing of the H-band stream, which remains in the time-domain.
  • In some embodiments of the disclosure, the L-band and H-band signal streams may include components in frequency ranges other than the exemplary frequency ranges used herein. In various examples described below, the frequency ranges of 0-8kHz and 8-16kHz are used for the L-band and H-band signal streams, respectively. These are exemplary frequency ranges used for purposes of describing various features of the disclosure. These exemplary frequency ranges are not intended to limit the scope of the disclosure in any way. Instead, numerous other frequency ranges may be used for the L-band and/or H-band signal streams in addition to or instead of those used in the various examples described herein. For example, in a scenario where audio is sampled at 48kHz, a frequency range of 0-12kHz may be used for the L-band signal stream and a frequency range of 12-24kHz used for the H-band signal stream. In a different scenario, frequency ranges of 0-7kHz and 7-20kHz may be used for the L-band and H-band signal streams, respectively.
  • Additionally, the terms "narrowband," "wideband," and "super-wideband" are sometimes used herein to refer to audio signals with sampling rates at or above certain threshold sampling rates, or with sampling rates within certain ranges. These terms may also be used relative to one another in describing audio signals with particular sampling rates. For example, "super-wideband" is sometimes used herein to refer to audio signals with a sampling rate above the wideband sampling rate of, e.g., 16kHz. As such, in describing various aspects of the disclosure, super-wideband is used to refer to audio signals sampled at a higher rate of, e.g., 32kHz or 48kHz. It should be understood that such use of the terms "narrowband," "wideband," and/or "super-wideband" is not in any way intended to limit the scope of the disclosure.
  • As will be described in greater detail herein, various aspects of the disclosure may be implemented in a noise suppression module. The noise suppression module may be part of a larger audio communications system and may include one or more submodules, units, and/or components that perform some of the processes described herein. In at least one example, the noise suppression module is located at the near-end (e.g., capture stream) environment of a signal transmission path.
  • In at least some embodiments, H-band noise suppression processing includes applying a frame-based gain to the time-domain H-band signal stream, where the frame-based gain is derived from L-band noise suppression data generated in the frequency-domain. As used herein, "frame-based gain" indicates that the H-band gain varies with the frame index of the input signal and does not have a frequency dependency. Additionally, the frame-based gain is also referred to herein as a "single upper band gain" (e.g., where a single gain (e.g., factor) is applied to the whole upper frequency band (e.g., H-band) of the signal). In some arrangements, if a received audio signal is determined to only contain narrowband and/or wideband frequency components (e.g., the signal is sampled at a rate of 8kHz or 16kHz), then the entire signal may be taken as the L-band signal stream, with noise suppression processing only acting on the L-band. In such situations, a filter bank is not used to split the received signal into H-band and L-band streams, and although a pointer for the H-band may be passed to the noise suppression module, no H-band processing is actually performed.
  • FIG. 1 and the following discussion provide a brief, general description of a representative embodiment in which various aspects of the present disclosure may be implemented. As shown in FIG. 1, a noise suppression module 140 may be located at the near-end environment of a signal transmission path, along with a capture device 105 also at the near-end and a render device 130 located at the far-end environment. In some arrangements, noise suppression module 140 may be one component in a larger system for audio (e.g., voice) communications. The noise suppression module 140 may be an independent component in such a larger system or may be a subcomponent within an independent component (not shown) of the system. In the example embodiment illustrated in FIG. 1, noise suppression module 140 is arranged to receive and process input from capture device 105 and generate output to, e.g., one or more other audio processing components (not shown). These other audio processing components may be acoustic echo control (AEC), automatic gain control (AGC), and/or other audio quality improvement components. In some embodiments, these other processing components may receive input from capture device 105 prior to noise suppression module 140 receiving such input.
  • Capture device 105 may be any of a variety of audio input devices, such as one or more microphones configured to capture sound and generate input signals. Render device 130 may be any of a variety of audio output devices, including a loudspeaker or group of loudspeakers configured to output sound of one or more channels. For example, capture device 105 and render device 130 may be hardware devices internal to a computer system, or external peripheral devices connected to a computer system via wired and/or wireless connections. In some arrangements, capture device 105 and render device 130 may be components of a single device, such as a speakerphone, telephone handset, etc. Additionally, one or both of capture device 105 and render device 130 may include analog-to-digital and/or digital-to-analog transformation functionalities.
  • In at least the embodiment shown in FIG. 1, noise suppression module 140 includes a controller 150 for coordinating various processes performed therein, and monitoring and/or adapting timing considerations for such processes. Noise suppression module 140 may also include a sampling unit 115, an analysis filter bank 120 and a synthesis filter bank 135, and a noise suppression unit 125. Each of these units may be in communication with controller 150 such that controller 150 facilitates some of the processes performed by and between the units. Details of the sampling unit 115, analysis filter bank 120, synthesis filter bank 135, and noise suppression unit 125 will be further described below.
  • Noise suppression unit 125 may include one or more smaller units or subunits, such as H-band noise suppression (NS) unit 160 and L-band NS unit 165. As shown, the sampling unit 115 and the analysis filter bank 120 may be in communication with L-band NS unit 165 in a manner that is separate from any such communications between H-band NS unit 160 and the sampling unit 115 or the analysis filter bank 120. For example, signals, signal information, signal data, etc., may be passed to L-band NS unit 165 from either or both of sampling unit 115 and analysis filter bank 120 without such signals also being passed to H-band NS unit 160. Furthermore, L-band NS unit 165 and H-band NS unit 160 may have separate lines of communication with synthesis filter bank 135. For example, as will be described in greater detail below, L-band NS unit 165 and H-band NS unit 160 may each pass separate signal streams of different frequency ranges to synthesis filter bank 135 so the streams may be recombined into a full signal.
  • Additionally, L-band NS unit 165 and H-band NS unit 160 may be in communication with each other within noise suppression unit 125. For example, noise suppression processing performed by L-band NS unit 165 in the frequency-domain may generate noise suppression data that is passed to H-band NS unit 160 for noise suppression processing in the time-domain.
  • In one or more other embodiments of the present disclosure, one or more other components, modules, units, etc., may be included as part of noise suppression module 140, in addition to or instead of those illustrated in FIG. 1. Furthermore, the names used to identify the units and components included as part of noise suppression module 140 (e.g., "sampling unit," "analysis filter bank," "L-band NS unit," etc.) are exemplary in nature, and are not in any way intended to limit the scope of the disclosure.
  • FIG. 2 is a communication diagram illustrating example data flows for noise suppression processing according to at least some embodiments of the disclosure. As shown, a capture device 205 may pass an input signal 210 to a sampling unit 215. In some arrangements, sampling unit 215 may be a part or component of a noise suppression module (e.g., noise suppression module 140 shown in FIG. 1). Sampling unit 215 may sample input signal 210 at a rate of, for example, 8kHz, 16kHz, 32kHz, etc. for some time interval (e.g., ten milliseconds (ms)), depending on whether input signal 210 is narrowband, wideband, super-wideband, etc. in nature. Example time intervals that may be used by sampling unit 215 in one or more arrangements include 10 ms, 20 ms, 30 ms, etc. For a low sample rate signal 219 (e.g., input signal 210 sampled at a rate of 8kHz (e.g., narrowband), 16kHz (e.g., wideband), etc., by sampling unit 215), the entire input signal 210 is passed to a noise suppression unit 225 as L-band signal stream 222, bypassing an analysis filter bank 220. However, a high sample rate signal 217 (e.g., super-wideband input, input with a 32kHz sampling rate (320 samples) and including a frequency spectrum of 0-16kHz, and the like) is passed from sampling unit 215 to analysis filter bank 220.
  • Although FIG. 2 shows low sample rate signal 219 and high sample rate signal 217 being output from sampling unit 215, it is important to note that each of these signals is input signal 210, simply identified under a different name for purposes of explanation. In other words, sampling unit 215 determines, based on the sampling rate of input signal 210 (e.g., 8kHz, 16kHz, etc., which in some arrangements may be identified as "low sample rate signal 219," or 32kHz, 48kHz, etc., which in some arrangements may be identified as "high sample rate signal 217"), whether input signal 210 is passed directly to noise suppression unit 225 as L-band signal stream 222, or whether input signal 210 is instead passed to analysis filter bank 220.
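  • As a quick check of the frame sizes implied by these example rates and intervals:

```python
def samples_per_frame(sample_rate_hz, interval_ms=10):
    """Frame size for the example rates and intervals in the text; a 32kHz
    super-wideband input framed at 10 ms yields the 320 samples noted above."""
    return sample_rate_hz * interval_ms // 1000

assert samples_per_frame(8000) == 80     # narrowband
assert samples_per_frame(16000) == 160   # wideband
assert samples_per_frame(32000) == 320   # super-wideband
```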
  • Where sampling unit 215 samples input signal 210 at a high sample rate (e.g., 32kHz, 48kHz, etc.), sampling unit 215 passes input signal 210 to analysis filter bank 220. Analysis filter bank 220 splits input signal 210 into two signal streams, L-band signal stream 222 and H-band signal stream 224. In one example, L-band signal stream 222 contains lower-frequency components (e.g., components with frequencies in the range of 0-8kHz, or 0-12kHz, etc.) of input signal 210, and H-band signal stream 224 contains higher-frequency components (e.g., components with frequencies in the range of 8-16kHz, or 12-24kHz, etc.) of input signal 210. These same example frequency ranges for L-band signal stream 222 and H-band signal stream 224 may also be used in situations where input signal 210 bypasses analysis filter bank 220 and goes directly to noise suppression unit 225, such as when input signal 210 is output from sampling unit 215 as low sample rate signal 219. In other examples, L-band signal stream 222 and H-band signal stream 224 may each contain components of various other frequency ranges different from the example ranges described above.
  • Both L-band signal stream 222 and H-band signal stream 224 are passed to noise suppression unit 225 to undergo noise suppression processing as will be described below. In at least some arrangements, noise suppression unit 225 may include an L-band NS unit 265 and an H-band NS unit 260. L-band NS unit 265 may receive L-band signal stream 222 and perform noise suppression processing as if L-band signal stream 222 were a wideband audio signal (e.g., 0-8kHz frequency spectrum). As will be described in greater detail below, noise suppression processing performed by L-band NS unit 265 on L-band signal stream 222 generates L-band NS data 280. In at least one example, L-band NS data 280 includes frequency-based speech/noise probability and gain filter data for L-band signal stream 222 that is passed to H-band NS unit 260 for noise suppression processing of H-band signal stream 224 in the time-domain. As such, H-band NS unit 260 is active when input signal 210 includes super-wideband input (e.g., input signal 210 passes to analysis filter bank 220 as a high sample rate signal 217) and executes noise suppression processing on H-band signal stream 224 after receiving L-band NS data 280 from L-band NS unit 265.
  • Noise suppression unit 225 outputs noise-suppressed L-band stream 245 and noise-suppressed H-band stream 255 as a result of, for example, processing performed by L-band NS unit 265 and H-band NS unit 260. In a scenario where input signal 210 bypasses analysis filter bank 220 and instead passes from sampling unit 215 to noise suppression unit 225 as a low sample rate signal 219, noise-suppressed L-band stream 245 may similarly bypass a synthesis filter bank 235 and pass directly as a noise-suppressed output signal 270. In such a scenario, where the entire input signal 210 is taken as L-band signal stream 222, noise-suppressed L-band stream 245 is taken as the entire noise-suppressed output signal 270. In other scenarios, noise-suppressed L-band stream 245 and noise-suppressed H-band stream 255 pass to synthesis filter bank 235 where the streams are recombined or synthesized into a full band signal (e.g., 0-16kHz frequency spectrum) that forms the noise-suppressed output signal 270.
  • As will be described in greater detail herein, noise suppression is performed on the L-band signal stream (e.g., by L-band NS unit 265 on L-band signal stream 222 shown in FIG. 2) in the frequency-domain, while noise suppression is performed on the H-band signal stream (e.g., by H-band NS unit 260 on H-band signal stream 224) in the time-domain. For example, during noise suppression on the L-band signal stream, certain pre-processing is performed, including buffering, windowing, and Fourier transformation to map the L-band signal stream to the frequency domain. Although buffering similar to that applied to the L-band signal stream may also be applied to the H-band signal stream, no windowing or Fourier transformation is applied to the H-band signal stream, since it remains in the time-domain. Performing noise suppression on the H-band signal stream in the time-domain, rather than in the frequency domain, lowers computational complexity and yields more stable noise suppression, in part because relatively little energy is present in the H-band signal stream. Accordingly, a more robust approach is to derive a frame-based gain from the L-band signal stream, frequency averaged over a portion of the L-band, as will be further described below.
  • In at least one arrangement, either or both of analysis filter bank 220 and/or synthesis filter bank 235 are each comprised of a chain of three all-pass filters, e.g., a class of polyphase infinite impulse response (IIR) filters. In one or more other arrangements, any of a variety of two-channel filter banks may be used alone or in combination to form analysis filter bank 220 and/or synthesis filter bank 235.
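  • As a hedged illustration of such a structure, a two-channel polyphase all-pass analysis/synthesis pair may be sketched as follows (Python/NumPy; the all-pass coefficients are placeholders, and the branch assignments follow one common quadrature-mirror-filter convention rather than the particular filters of any embodiment):

```python
import numpy as np

APF_COEFFS_0 = [0.0501, 0.3730, 0.7557]   # placeholder all-pass chain, branch 0
APF_COEFFS_1 = [0.1861, 0.5718, 0.9194]   # placeholder all-pass chain, branch 1

def allpass_chain(x, coeffs, state):
    """Cascade of first-order all-pass sections A(z) = (c + z^-1)/(1 + c*z^-1),
    realized in direct form II; `state` holds one delay element per section
    and is updated in place so it can be carried across frames."""
    y = np.asarray(x, dtype=float).copy()
    for i, c in enumerate(coeffs):
        w_prev = state[i]
        for n in range(len(y)):
            w = y[n] - c * w_prev      # internal delay-line update
            y[n] = c * w + w_prev      # all-pass output
            w_prev = w
        state[i] = w_prev
    return y

def analysis_filter_bank(frame, state0, state1):
    """Split one even-length frame into half-rate L-band and H-band streams."""
    b0 = allpass_chain(frame[0::2], APF_COEFFS_0, state0)  # even polyphase branch
    b1 = allpass_chain(frame[1::2], APF_COEFFS_1, state1)  # odd polyphase branch
    return 0.5 * (b0 + b1), 0.5 * (b0 - b1)                # (L-band, H-band)

def synthesis_filter_bank(low, high, state0, state1):
    """Recombine half-rate bands into one full-rate frame (mirror structure)."""
    b0 = allpass_chain(low + high, APF_COEFFS_1, state0)
    b1 = allpass_chain(low - high, APF_COEFFS_0, state1)
    out = np.empty(2 * len(low))
    out[0::2], out[1::2] = b1, b0      # interleave back to the full rate
    return out

# Example use, with per-chain state carried across successive frames:
# s0 = [0.0] * 3; s1 = [0.0] * 3
# low, high = analysis_filter_bank(frame, s0, s1)
```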
  • FIG. 3 is a flowchart illustrating a method for noise suppression processing of super-wideband audio input according to one or more embodiments of the disclosure. In step 300, an input signal is sampled (e.g., input signal 210 sampled by sampling unit 215 shown in FIG. 2). In step 305, a determination is made as to whether the input signal is super-wideband in quality based on a sampling rate of the input signal from step 300. In at least one example, the input signal may be sampled at a rate of 8kHz, 16kHz, or 32kHz, depending on whether the signal is narrowband, wideband, or super-wideband, respectively. For purposes of this and other examples, an input signal may be considered a super-wideband signal if it is sampled at a rate of 32kHz in step 300. In further examples, an input signal may be considered a super-wideband signal if it is sampled at a higher rate, such as 48kHz.
  • If it is determined in step 305 that the input signal is a super-wideband signal, then in step 335 the input signal is split into low-frequency band (L-band) and high-frequency band (H-band) signal streams (e.g., L-band signal stream 222 and H-band signal stream 224 shown in FIG. 2). In at least one example, the input signal is split into L-band and H-band signal streams by a filter bank (e.g., analysis filter bank 220 shown in FIG. 2), with the L-band signal stream containing lower-frequency (e.g., 0-8kHz) components of the input signal and the H-band signal stream containing higher-frequency (e.g., 8-16kHz) components of the input signal. If instead it is determined in step 305 that the input signal is not a super-wideband signal, then the process moves to step 310 where the entire input signal is taken as the L-band signal stream. In this scenario, the input signal is not split into separate signal streams, but instead bypasses any filter bank (e.g., analysis filter bank 220 shown in FIG. 2) as the L-band signal stream.
  • The process continues from either of steps 310 or 335 to step 315, where the L-band signal stream is transformed to the frequency-domain as part of noise suppression processing of the input signal. In some embodiments, the L-band and H-band signal streams may pass to a noise suppression unit (e.g., L-band NS unit 265 and H-band NS unit 260 as parts of noise suppression unit 225 shown in FIG. 2) where the L-band signal stream undergoes noise suppression processing by first being transformed (e.g., using the discrete Fourier Transform (DFT)) to the frequency-domain, as indicated by step 315. While the L-band signal stream is transformed to the frequency-domain in step 315, the H-band signal stream remains in the time-domain. In step 320, noise estimation and filtering are performed on the L-band signal stream in the frequency-domain. In step 325, it is determined whether an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335). If an H-band signal stream is found to be present in step 325, then the process moves to step 340 where a gain is generated for the H-band signal stream using noise suppression data from the L-band signal stream, as described in greater detail below. If no H-band signal stream is found to be present in step 325, then the process continues to step 330 where the noise-suppressed L-band signal stream is transformed back to time-domain from the frequency-domain (e.g., using inverse-DFT).
  • In one or more embodiments, steps 315, 320, and 330 shown in FIG. 3 may collectively constitute noise suppression processing performed on the L-band signal stream. For example, steps 315, 320, and 330 may be performed by an L-band noise suppression unit (e.g., L-band NS unit 265 shown in FIG. 2). Similarly, steps 345, 350, 355 and 360 shown in FIG. 3, which will be described in greater detail below, may collectively constitute noise suppression processing performed on the H-band signal stream, and may be performed by an H-band noise suppression unit (e.g., H-band NS unit 260 shown in FIG. 2).
  • In at least some arrangements, the L-band signal stream may be processed in steps 315, 320, and 330 on a frame-by-frame basis as a wideband audio signal (e.g., frequency spectrum of 0-8kHz). For example, for noise suppression processing to be performed on the L-band signal stream in the frequency domain, the L-band signal stream may undergo certain pre-processing steps as part of the transformation step 315, including buffering, windowing, and Fourier transformation (not shown in FIG. 3).
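  • A minimal sketch of this pre-processing, under assumed parameters (10 ms frames of 160 new samples at 16kHz, a 256-sample analysis block, and a square-root Hann window):

```python
import numpy as np

FRAME = 160                           # assumed: 10 ms of new samples at 16 kHz
BLOCK = 256                           # assumed analysis block (FFT size)
window = np.sqrt(np.hanning(BLOCK))   # assumed analysis window

def to_frequency_domain(new_frame, analysis_buf):
    """Buffer, window, and transform one L-band frame.

    `analysis_buf` holds the last BLOCK samples; initialize it to
    np.zeros(BLOCK) and pass the returned buffer back in on the next call.
    """
    analysis_buf = np.concatenate((analysis_buf[FRAME:], new_frame))
    spectrum = np.fft.rfft(window * analysis_buf)   # map to the frequency domain
    return spectrum, analysis_buf
```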
  • Additionally, noise estimation and filtering of the L-band signal stream in step 320 may include one or more substeps or sub-processes performed on each frame of the signal. In at least one example, for each frame of the L-band signal stream, an initial noise estimation may be obtained, followed by a speech/noise likelihood determination, an update to the initial noise estimate, and then application of a Wiener filter to reduce or suppress noise in the frame. Furthermore, step 330 of FIG. 3 may include certain processes necessary to convert each frame of the L-band signal stream back to the time-domain, such as inverse FFT, scaling, and window synthesis steps.
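  • The substeps above might be sketched per frame as follows (illustrative only: the smoothing constants, the clamped noise update, and the decision-directed prior-SNR estimator are assumptions, and the speech/noise likelihood update is omitted for brevity):

```python
import numpy as np

def suppress_frame(spectrum, noise_psd, prev_clean_psd,
                   alpha=0.98, beta=0.85):
    """One per-frame pass: update the noise estimate, form a prior SNR, and
    apply a Wiener gain. `noise_psd` may be initialized from the first few
    frames, assumed to be noise-only."""
    power = np.abs(spectrum) ** 2
    # crude noise tracking: smooth toward the frame power, but never let a
    # loud speech frame inflate the estimate by more than a fixed factor
    noise_psd = beta * noise_psd \
        + (1.0 - beta) * np.minimum(power, 4.0 * noise_psd + 1e-12)
    snr_post = power / np.maximum(noise_psd, 1e-12)
    # decision-directed prior SNR (Ephraim-Malah style smoothing)
    rho = alpha * prev_clean_psd / np.maximum(noise_psd, 1e-12) \
        + (1.0 - alpha) * np.maximum(snr_post - 1.0, 0.0)
    gain = rho / (1.0 + rho)          # Wiener filter H = rho / (1 + rho)
    clean = gain * spectrum
    return clean, noise_psd, np.abs(clean) ** 2
```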
  • If it is determined in step 325 that an H-band signal stream is present (e.g., a super-wideband signal was found to be present in step 305 and the input signal was split in step 335), then in step 340, frequency-based speech/noise probability and frequency-based gain filter measurements are obtained from data generated by the noise estimation and filtering of the L-band signal stream in step 320. In at least one example, the frequency-based speech/noise probability $P^{L}(H_1 \mid Y_k(m), \{F\})$ may be expressed as follows, where superscript "L" is included to indicate that the computation is derived from L-band noise suppression data:

    $$P^{L}(H_1 \mid Y_k(m), \{F\}) = \frac{q\,\Delta_k}{q\,\Delta_k + (1 - q)}$$

    where $Y_k(m)$ represents the observed noisy input spectral coefficient, $\{F\}$ denotes feature data of the L-band signal, which may include, for example, a spectral flatness measure, harmonic peak pitch, template matching, etc., $H_1$ represents the state of speech being present in the frequency bin k and frame index m, $\Delta_k$ denotes the likelihood ratio for frequency bin k, and q represents a prior speech/noise probability of the L-band signal based on the features $\{F\}$ of the L-band signal. In various embodiments described herein, the frequency-based speech/noise probability, $P^{L}(H_1 \mid Y_k(m), \{F\})$, obtained in step 340 and expressed above, is also referred to as the "frequency-based speech probability."
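  • As a quick numeric check of the expression above (values assumed for illustration): with a prior q = 0.5 and a per-bin likelihood ratio of 3, the posterior speech probability is 0.75:

```python
q, delta_k = 0.5, 3.0            # assumed prior and per-bin likelihood ratio
p_speech = (q * delta_k) / (q * delta_k + (1.0 - q))
print(p_speech)                  # 0.75: speech deemed likely in this bin
```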
  • Additionally, the frequency-based gain filter $H^{L}_{w,dd}(k, m)$ obtained in step 340 may be expressed as the following:

    $$H^{L}_{w,dd}(k, m) = \frac{\rho_k(m)}{1 + \rho_k(m)}$$

    where $\rho_k(m)$ is a prior speech-to-noise ratio estimated for the L-band signal stream.
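  • The behaviour of this gain filter can be illustrated with a few assumed prior speech-to-noise ratios: low-SNR bins are attenuated heavily while high-SNR bins pass nearly unchanged:

```python
import numpy as np

for rho in (0.1, 1.0, 10.0):         # assumed prior speech-to-noise ratios
    h = rho / (1.0 + rho)            # H = 0.091, 0.500, 0.909 respectively
    print(f"rho={rho:5.1f}  H={h:.3f}  ({20 * np.log10(h):6.1f} dB)")
```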
  • In step 345, the frequency-based speech/noise probability measurement (which, as described above, is also referred to herein as the frequency-based speech probability) obtained in step 340 is used to determine a frame-based speech/noise probability for the H-band signal stream. In at least one arrangement, an average measure of the frame-based speech/noise probability for the H-band signal stream is determined by computing a frequency average of the frequency-based speech/noise probability of the L-band signal stream:

    $$\langle P^{H} \rangle = \frac{1}{\delta} \sum_{k=N-\delta}^{N} P^{L}(H_1 \mid Y_k(m), \{F\})$$

    where N refers to the highest frequency of the L-band signal stream (e.g., ~8kHz), and the quantity δ determines the lower bound of the frequency average for the L-band. An example implementation may set δ ≈ (N / 2) = 4kHz (e.g., where the L-band signal stream contains components in the frequency range of 0-8kHz). Expressed this way, the upper portion of the L-band frequency spectrum is used to determine the speech/noise probability for the H-band signal stream, the rationale being that the upper portion of the L-band spectrum (rather than the full L-band spectrum) is more representative of the signal spectrum in the H-band. In some embodiments described herein, the frame-based speech/noise probability derived for the H-band signal stream using the frequency-based speech probability in step 345 is also referred to as the "frame-based speech probability" for the H-band signal stream.
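  • A minimal sketch of this frequency average (assuming the probability array spans the L-band bins up to N and uses δ = N/2, per the example above):

```python
import numpy as np

def hband_speech_probability(p_speech_lband: np.ndarray) -> float:
    """Average the L-band frequency-based speech probability over its upper
    half, the portion most representative of the H-band spectrum."""
    n = len(p_speech_lband)
    delta = n // 2                   # delta ~ N/2, e.g. the 4-8 kHz portion
    return float(np.mean(p_speech_lband[n - delta:]))
```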
  • The process continues to step 350, where the H-band signal speech/noise probability (which, as described above, is also referred to herein as the H-band signal speech probability) determined in step 345 is used to extract a gain for the H-band signal frame. In at least one example, the gain for the H-band frame may be obtained by a mapping of the form $G_1 = 0.5\,[1 + \tanh(w(2x - 1))]$, with $x = \langle P^{H} \rangle$ and w being a width parameter of the mapping. Furthermore, to provide for continuity of the gain from the L-band into the H-band, the average of the L-band gain is computed as:

    $$G_2 = \frac{1}{\delta} \sum_{k=N-\delta}^{N} H^{L}_{w,dd}(k)$$

    where the quantity δ determines the lower bound of the frequency average for the L-band. As described above, in a scenario where the L-band signal stream contains components in the frequency range of 0-8kHz, one example implementation may set δ ≈ (N / 2) = 4kHz.
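  • The two gain candidates may be sketched as follows (the width parameter value is an assumption, and it is named w_map here to avoid colliding with the weight w used in step 355):

```python
import numpy as np

def gain_from_probability(p_h: float, w_map: float = 4.0) -> float:
    """G1: tanh mapping of the frame-based H-band speech probability."""
    return 0.5 * (1.0 + np.tanh(w_map * (2.0 * p_h - 1.0)))

def gain_from_lband_filter(h_wiener_lband: np.ndarray) -> float:
    """G2: average of the L-band gain filter over its upper half, providing
    continuity of the gain from the L-band into the H-band."""
    n = len(h_wiener_lband)
    delta = n // 2
    return float(np.mean(h_wiener_lband[n - delta:]))
```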
  • In step 355, the final frame-based (or single upper band) gain $G_{final}$ for the H-band signal stream is computed as a weighted average of the two gains obtained in step 350:

    $$G_{final} = w\,G_1 + (1 - w)\,G_2$$

    where w is the weight parameter for the two gain terms. In at least one arrangement, the weight parameter is selected to be 0.5. In one or more other arrangements, the weight parameter is conditioned on the frequency average of the speech probability, for example w = 0.25 if $\langle P^{H} \rangle > 0.5$ and w = 0.5 if $\langle P^{H} \rangle \le 0.5$.
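  • Combining the two candidates, the final upper-band gain may be sketched as follows (the conditional weights mirror the example values above):

```python
def final_hband_gain(g1: float, g2: float, p_h: float) -> float:
    """Weighted average of the two gain candidates; the weight is conditioned
    on the averaged speech probability, per the example above."""
    w = 0.25 if p_h > 0.5 else 0.5   # lean toward G2 when speech is likely
    return w * g1 + (1.0 - w) * g2
```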
  • Once the final upper band gain Gfinal is determined in step 355, the process moves to step 360 where the final upper band gain is applied to the H-band signal stream in the time-domain. In step 365, the noise-suppressed H-band and L-band signal streams are synthesized (e.g., noise-suppressed H-band signal stream 255 and L-band signal stream 245 synthesized by synthesis filter bank 235 shown in FIG. 2) into the full-band signal and then output as a noise-suppressed (e.g., speech-enhanced) signal in step 370. In at least some arrangements, the noise-suppressed H-band and L-band signal streams may be synthesized in step 365 by passing both streams through a filter bank (e.g., synthesis filter bank 235 shown in FIG. 2) similar to the analysis filter bank (e.g., analysis filter bank 220 shown in FIG. 2) used to split up the input signal in step 335.
  • FIG. 4 is a block diagram illustrating an example computing device 400 that is arranged for multipath routing in accordance with one or more embodiments of the present disclosure. In a very basic configuration 401, computing device 400 typically includes one or more processors 410 and system memory 420. A memory bus 430 may be used for communicating between the processor 410 and the system memory 420.
  • Depending on the desired configuration, processor 410 can be of any type including but not limited to a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof. Processor 410 may include one or more levels of caching, such as a level one cache 411 and a level two cache 412, a processor core 413, and registers 414. The processor core 413 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. A memory controller 415 can also be used with the processor 410, or in some embodiments the memory controller 415 can be an internal part of the processor 410.
  • Depending on the desired configuration, the system memory 420 can be of any type including but not limited to volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory, etc.) or any combination thereof. System memory 420 typically includes an operating system 421, one or more applications 422, and program data 424. In at least some embodiments, application 422 includes a multipath processing algorithm 423 that is configured to pass a noisy input signal to a noise suppression component. The multipath processing algorithm is further arranged to pass a noise-suppressed output from the noise suppression component to other components in the signal processing pathway. Program Data 424 may include multipath routing data 425 that is useful for passing a noisy input signal along multiple signal pathways to, for example, a noise suppression component such that the component receives the noisy signal before the signal has been manipulated or altered by other audio processing.
  • Computing device 400 can have additional features and/or functionality, and additional interfaces to facilitate communications between the basic configuration 401 and any required devices and interfaces. For example, a bus/interface controller 440 can be used to facilitate communications between the basic configuration 401 and one or more data storage devices 450 via a storage interface bus 441. The data storage devices 450 can be removable storage devices 451, non-removable storage devices 452, or any combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), tape drives and the like. Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, and/or other data.
  • System memory 420, removable storage 451 and non-removable storage 452 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 400. Any such computer storage media can be part of computing device 400.
  • Computing device 400 can also include an interface bus 442 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, communication interfaces, etc.) to the basic configuration 401 via the bus/interface controller 440. Example output devices 460 include a graphics processing unit 461 and an audio processing unit 462, either or both of which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 463. Example peripheral interfaces 470 include a serial interface controller 471 or a parallel interface controller 472, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 473. An example communication device 480 includes a network controller 481, which can be arranged to facilitate communications with one or more other computing devices 490 over a network communication (not shown) via one or more communication ports 482. The communication connection is one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. A "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media. The term computer readable media as used herein can include both storage media and communication media.
  • Computing device 400 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that include any of the above functions. Computing device 400 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
  • There is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing cost versus efficiency tradeoffs. There are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation. In one or more other scenarios, the implementer may opt for some combination of hardware, software, and/or firmware.
  • The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof.
  • In one or more embodiments, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments described herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof. Those skilled in the art will further recognize that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of the present disclosure.
  • Additionally, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal-bearing medium used to actually carry out the distribution. Examples of a signal-bearing medium include, but are not limited to, the following: a recordable-type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission-type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • Those skilled in the art will also recognize that it is common within the art to describe devices and/or processes in the fashion set forth herein, and thereafter use engineering practices to integrate such described devices and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein can be integrated into a data processing system via a reasonable amount of experimentation. Those having skill in the art will recognize that a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
  • With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
  • While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.

Claims (20)

  1. A method for noise suppression, the method characterized by:
    receiving an input audio signal at a first filter bank (120) of a noise suppression module (140);
    splitting, by the first filter bank, the received signal into a first signal stream (222) and a second signal stream (224), the first signal stream including a first range of frequencies and the second signal stream including a second range of frequencies higher than the first range of frequencies;
    deriving a gain for the second signal stream based on speech probability and gain filter data collected from the first signal stream;
    applying the derived gain to the second signal stream in time-domain; and
    in response to applying the derived gain to the second signal stream, synthesizing at a second filter bank (135) of the noise suppression module the first signal stream and the second signal stream into an output signal.
  2. The method of claim 1, characterized in that the derived gain applied to the second signal stream is a single upper band gain.
  3. The method of claim 1, characterized in that the speech probability and gain filter data collected from the first signal stream includes at least a frequency-based speech probability and a frequency-based gain filter for the first signal stream.
  4. The method of claim 3, characterized in that processing the first signal stream in frequency-domain generates the frequency-based speech probability and the frequency-based gain filter for the first signal stream.
  5. The method of claim 3, further characterized by:
    computing a frequency average of the frequency-based speech probability of the first signal stream;
    deriving a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream;
    using the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream;
    computing a frequency average of the frequency-based gain filter of the first signal stream;
    using the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and
    deriving a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
  6. The method of claim 5, characterized in that the weighted average includes a weighting value for the first and second gains of the second signal stream, the weighting value being based on the frequency-based speech probability of the first signal stream.
  7. The method of claim 5, further characterized by:
    computing a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
  8. The method of claim 1, characterized in that the first range of frequencies includes frequencies between 0kHz and 8kHz, and wherein the second range of frequencies includes frequencies between 8kHz and 16kHz.
  9. The method of claim 1, characterized in that each of the first and second filter banks is one of an infinite impulse response filter bank or a finite impulse response filter bank.
  10. The method of claim 1, characterized in that each of the first and second filter banks includes a plurality of all-pass filters.
  11. A noise suppression system, the noise suppression system characterized by:
    a first filter bank (120) configured to split a received audio signal into a first signal stream (222) and a second signal stream (224), wherein the first signal stream includes a first range of frequencies and the second signal stream includes a second range of frequencies higher than the first range of frequencies;
    a noise suppression module (140) configured to:
    receive, from the first filter bank, the first and second signal streams;
    derive a gain for the second signal stream based on speech probability and gain filter data collected from the first signal stream; and
    apply the derived gain to the second signal stream in time-domain; and
    a second filter bank (135) configured to, in response to the noise suppression module applying the derived gain to the second signal stream, synthesize the first signal stream and the second signal stream into an output signal.
  12. The noise suppression system of claim 11, characterized in that the gain derived for the second signal stream is a single upper band gain.
  13. The noise suppression system of claim 11, characterized in that the speech probability and gain filter data collected from the first signal stream includes at least a frequency-based speech probability and a frequency-based gain filter for the first signal stream.
  14. The noise suppression system of claim 13, characterized in that the noise suppression module is further configured to:
    process the first signal stream in frequency-domain to generate the frequency-based speech probability and the frequency-based gain filter for the first signal stream.
  15. The noise suppression system of claim 13, characterized in that the noise suppression module is further configured to:
    compute a frequency average of the frequency-based speech probability of the first signal stream;
    derive a frame-based speech probability for the second signal stream based on the frequency average of the frequency-based speech probability of the first signal stream;
    use the frame-based speech probability of the second signal stream to compute a first gain for the second signal stream;
    compute a frequency average of the frequency-based gain filter of the first signal stream;
    use the frequency average of the frequency-based gain filter of the first signal stream to compute a second gain for the second signal stream; and
    derive a single upper band gain to be applied to the second signal stream based on a weighted average of the first gain of the second signal stream and the second gain of the second signal stream.
  16. The noise suppression system of claim 15, characterized in that the weighted average includes a weighting value for the first and second gains of the second signal stream, the weighting value being based on the frequency-based speech probability of the first signal stream.
  17. The noise suppression system of claim 15, characterized in that the noise suppression module is further configured to:
    compute a frequency average for the first and second gains of the second signal stream by averaging the first and second gains of the second signal stream over an upper portion of a frequency range of the first signal stream.
  18. The noise suppression system of claim 11, characterized in that the first range of frequencies includes frequencies between 0kHz and 8kHz, and wherein the second range of frequencies includes frequencies between 8kHz and 16kHz.
  19. The noise suppression system of claim 11, characterized in that each of the first and second filter banks is one of an infinite impulse response filter bank or a finite impulse response filter bank.
  20. The noise suppression system of claim 11, characterized in that each of the first and second filter banks includes a plurality of all-pass filters.
EP11722677.9A 2011-05-16 2011-05-16 Super-wideband noise supression Active EP2710590B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/036645 WO2012158157A1 (en) 2011-05-16 2011-05-16 Method for super-wideband noise supression

Publications (2)

Publication Number Publication Date
EP2710590A1 EP2710590A1 (en) 2014-03-26
EP2710590B1 true EP2710590B1 (en) 2015-10-07

Family

ID=44121273

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11722677.9A Active EP2710590B1 (en) 2011-05-16 2011-05-16 Super-wideband noise supression

Country Status (2)

Country Link
EP (1) EP2710590B1 (en)
WO (1) WO2012158157A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2760022B1 (en) * 2013-01-29 2017-11-01 2236008 Ontario Inc. Audio bandwidth dependent noise suppression
US9349383B2 (en) * 2013-01-29 2016-05-24 2236008 Ontario Inc. Audio bandwidth dependent noise suppression
US9711156B2 (en) * 2013-02-08 2017-07-18 Qualcomm Incorporated Systems and methods of performing filtering for gain determination
US9741350B2 (en) 2013-02-08 2017-08-22 Qualcomm Incorporated Systems and methods of performing gain control
US9620134B2 (en) 2013-10-10 2017-04-11 Qualcomm Incorporated Gain shape estimation for improved tracking of high-band temporal characteristics
US10614816B2 (en) 2013-10-11 2020-04-07 Qualcomm Incorporated Systems and methods of communicating redundant frame information
US10083708B2 (en) 2013-10-11 2018-09-25 Qualcomm Incorporated Estimation of mixing factors to generate high-band excitation signal
US9384746B2 (en) 2013-10-14 2016-07-05 Qualcomm Incorporated Systems and methods of energy-scaled signal processing
US10163447B2 (en) 2013-12-16 2018-12-25 Qualcomm Incorporated High-band signal modeling
GB2597519B (en) * 2020-07-24 2022-12-07 Tgr1 618 Ltd Method and device for processing and providing audio information using bi-phasic separation and re-integration

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011029484A1 (en) * 2009-09-14 2011-03-17 Nokia Corporation Signal enhancement processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243496A1 (en) * 2005-01-21 2008-10-02 Matsushita Electric Industrial Co., Ltd. Band Division Noise Suppressor and Band Division Noise Suppressing Method


Also Published As

Publication number Publication date
WO2012158157A1 (en) 2012-11-22
EP2710590A1 (en) 2014-03-26

Similar Documents

Publication Publication Date Title
EP2710590B1 (en) Super-wideband noise supression
US11694711B2 (en) Post-processing gains for signal enhancement
US7313518B2 (en) Noise reduction method and device using two pass filtering
EP2151822B1 (en) Apparatus and method for processing and audio signal for speech enhancement using a feature extraction
EP3040991B1 (en) Voice activation detection method and device
US9294060B2 (en) Bandwidth extender
EP2416315B1 (en) Noise suppression device
US8930184B2 (en) Signal bandwidth extending apparatus
WO2012158156A1 (en) Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
KR20120063514A (en) A method and an apparatus for processing an audio signal
CN103354937A (en) Post-processing including median filtering of noise suppression gains
CN102074245A (en) Dual-microphone-based speech enhancement device and speech enhancement method
CN103903634B (en) The detection of activation sound and the method and apparatus for activating sound detection
CN106463106A (en) Wind noise reduction for audio reception
US20150255083A1 (en) Speech enhancement
EP3757993B1 (en) Pre-processing for automatic speech recognition
EP2716023B1 (en) Control of adaptation step size and suppression gain in acoustic echo control
US20070250312A1 (en) Signal processing apparatus and method thereof
EP2498252B1 (en) Information processing device, auxiliary device therefor, information processing system, control method therefor, and control program
US8736359B2 (en) Signal processing method, information processing apparatus, and storage medium for storing a signal processing program
EP2498251B1 (en) Signal processing method, information processor, and signal processing program
EP2645738A1 (en) Signal processing device, signal processing method, and signal processing program
Roy Single channel speech enhancement using Kalman filter
Krishnamoorthy et al. Processing noisy speech for enhancement
EP2760022A1 (en) Audio bandwidth dependent noise suppression

Legal Events

Code | Title | Description
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | ORIGINAL CODE: 0009012
17P | Request for examination filed | Effective date: 20131203
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
DAX | Request for extension of the European patent | (deleted)
RIC1 | Information provided on IPC code assigned before grant | Ipc: G10L 21/0224 20130101ALN20140825BHEP; G10L 25/78 20130101ALN20140825BHEP; G10L 21/0208 20130101AFI20140825BHEP; G10L 21/0232 20130101ALN20140825BHEP
17Q | First examination report despatched | Effective date: 20140904
REG | Reference to a national code | DE, R079, ref document 602011020318; PREVIOUS MAIN CLASS: G10L0021020000; Ipc: G10L0021020800
GRAP | Despatch of communication of intention to grant a patent | ORIGINAL CODE: EPIDOSNIGR1
RIC1 | Information provided on IPC code assigned before grant | Ipc: G10L 21/0232 20130101ALN20150316BHEP; G10L 21/0224 20130101ALN20150316BHEP; G10L 25/78 20130101ALN20150316BHEP; G10L 21/0208 20130101AFI20150316BHEP
INTG | Intention to grant announced | Effective date: 20150415
GRAS | Grant fee paid | ORIGINAL CODE: EPIDOSNIGR3
GRAA | (expected) grant | ORIGINAL CODE: 0009210
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
REG | Reference to a national code | GB, FG4D
REG | Reference to a national code | AT, REF, ref document 754170 (kind code T), effective 20151015; CH, EP
REG | Reference to a national code | IE, FG4D
REG | Reference to a national code | DE, R096, ref document 602011020318
REG | Reference to a national code | NL, MP, effective 20151007
REG | Reference to a national code | AT, MK05, ref document 754170 (kind code T), effective 20151007
REG | Reference to a national code | LT, MG4D
PG25 | Lapsed in a contracting state (failure to submit a translation of the description or to pay the fee within the prescribed time-limit) | ES 20151007; NO 20160107; IT 20151007; NL 20151007; IS 20160207; HR 20151007; LT 20151007
PG25 | Lapsed in a contracting state (translation/fee ground) | AT 20151007; FI 20151007; SE 20151007; GR 20160108; PT 20160208; RS 20151007; LV 20151007; PL 20151007
REG | Reference to a national code | DE, R097, ref document 602011020318
PG25 | Lapsed in a contracting state (translation/fee ground) | CZ 20151007
PLBE | No opposition filed within time limit | ORIGINAL CODE: 0009261
STAA | Information on the status of an EP patent application or granted EP patent | STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT
PG25 | Lapsed in a contracting state | DK 20151007, SM 20151007, EE 20151007, RO 20151007, SK 20151007 (translation/fee ground); BE 20160531 (non-payment of due fees)
26N | No opposition filed | Effective date: 20160708
PG25 | Lapsed in a contracting state (translation/fee ground) | SI 20151007
PG25 | Lapsed in a contracting state (translation/fee ground) | LU 20160516; BE 20151007
REG | Reference to a national code | CH, PL
GBPC | GB: European patent ceased through non-payment of renewal fee | Effective date: 20160516
PG25 | Lapsed in a contracting state (non-payment of due fees) | LI 20160531; CH 20160531
REG | Reference to a national code | IE, MM4A
REG | Reference to a national code | FR, ST, effective 20170131
PG25 | Lapsed in a contracting state (non-payment of due fees) | FR 20160531
PG25 | Lapsed in a contracting state (non-payment of due fees) | IE 20160516; GB 20160516
REG | Reference to a national code | DE, R082, ref document 602011020318, Representative: WUESTHOFF & WUESTHOFF, PATENTANWAELTE PARTG MB, DE; DE, R081, ref document 602011020318, Owner: GOOGLE LLC (N.D.GES.D. STAATES DELAWARE), MOUN, US (former owner: GOOGLE INC., MOUNTAIN VIEW, CALIF., US)
PG25 | Lapsed in a contracting state (translation/fee ground) | HU, invalid ab initio, effective 20110516; CY 20151007
PG25 | Lapsed in a contracting state | TR 20151007, MK 20151007, MC 20151007 (translation/fee ground); MT 20160531 (non-payment of due fees)
PG25 | Lapsed in a contracting state (translation/fee ground) | BG 20151007
PG25 | Lapsed in a contracting state (translation/fee ground) | AL 20151007
P01 | Opt-out of the competence of the unified patent court (UPC) registered | Effective date: 20230505
PGFP | Annual fee paid to national office | DE, payment date 20230530, year of fee payment: 13