WO2011029484A1 - Signal enhancement processing - Google Patents

Signal enhancement processing

Info

Publication number
WO2011029484A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
signal component
parameter
value
enhancement processing
Prior art date
Application number
PCT/EP2009/061884
Other languages
French (fr)
Inventor
Thomas Esch
Peter Vary
Original Assignee
Nokia Corporation
Application filed by Nokia Corporation
Priority to PCT/EP2009/061884
Publication of WO2011029484A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Definitions

  • This invention relates to the field of signal enhancement processing of audio signals.
  • the quality of today's telephone speech was designed to achieve a sufficient intelligibility.
  • the acoustic bandwidth in telephony systems is typically limited to the frequency range between 300 Hz and 3.4 kHz.
  • this typical "telephone sound" cannot satisfy the increased demands as the perceived speech quality is considerably reduced compared to the full audio bandwidth.
  • various wideband (50 Hz - 7 kHz) speech codecs have been developed in the past (e.g., the Adaptive Multi-Rate (AMR) Wideband Codec) and are about to be introduced in current mobile networks.
  • Most of these codecs are mainly designed for nearly noise-free input speech signals and may not perform well when the input signal is degraded by acoustic background noise.
  • noise suppression techniques may be required for wideband communication systems.
  • noise reduction algorithms have become part of many digital speech coding and speech processing systems. They are used for example in mobile communications, in hearing aids and in hands-free devices.
  • One of the popular methods for enhancing degraded speech is based on modeling the noisy input coefficients in the short-time Fourier transform (STFT) domain and applying individual adaptive gains for each frequency bin.
  • STFT short-time Fourier transform
  • the processing applied to implement such techniques has been derived for narrowband signals under certain assumptions about the statistics of the speech and noise signals.
  • a method comprises estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal, wherein the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band, and performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
  • a first apparatus which comprises means for estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal, wherein the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band, and wherein the first apparatus comprises means for performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
  • the means of this apparatus can be implemented in hardware and/or software. They may comprise for instance a processor for executing computer program code for realizing the required functions, a memory storing the program code, or both. Alternatively, they could comprise for instance a circuit that is designed to realize the required functions, for instance implemented in a chipset or a chip, like an integrated circuit.
  • a second apparatus which comprises at least one processor and at least one memory including computer program code, the at least one memory and the computer program code, with the at least one processor, configured to cause the apparatus at least to perform the actions of the presented method.
  • a computer readable storage medium in which computer program code is stored.
  • the computer program code causes an apparatus to realize the actions of the presented method when executed by a processor.
  • the computer readable storage medium could be for example a disk or a memory or the like.
  • the memory may represent a memory card such as SD and micro SD cards or any other well-suited memory cards or memory sticks.
  • the computer program code could be stored in the computer readable storage medium in the form of instructions encoding the
  • the computer readable storage medium may be intended for taking part in the operation of a device, like an internal or external hard disk of a computer, or be intended for distribution of the program code, like an optical disc.
  • the signal enhancement processing may represent noise reduction processing, or a voice enhancement processing or any other well-suited signal enhancement processing.
  • the first frequency band differs from the second frequency band.
  • the first frequency band may span a frequency range from $f_1$ to $f_2$
  • the second frequency band may span a frequency range from $f_3$ to $f_4$, wherein $f_3 > f_1$ and $f_4 > f_2$ hold.
  • the first and second frequency band may have an overlapping frequency range, i.e. $f_3 < f_2$ holds, but as another example, the first and second frequency band may not overlap, i.e. $f_3 > f_2$ may hold.
  • the first frequency band may span a frequency range from 50 Hz to 4 kHz
  • the second frequency band may span a frequency range from 4 to 7 kHz. But of course, any other well suited frequency assignments for the first and second frequency range may be applied.
  • the estimated at least one parameter may represent at least one parameter which may be beneficial to increase the quality of the signal enhancement processing.
  • the type of the at least one parameter may depend on the kind of signal enhancement processing.
  • the first signal component may be extracted from the audio signal by means of filtering, e.g. by a lowpass filter or a bandpass filter
  • the second signal component may be extracted from the audio signal by means of filtering, e.g. by a highpass filter or a bandpass filter.
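The band assignment described above can be illustrated with a minimal sketch. The text describes extraction by lowpass, highpass or bandpass filtering; the sketch below swaps in a simpler stand-in that selects one-sided DFT bins by their centre frequency. The function name `band_bin_indices`, the 16 kHz sampling rate and the 161-bin spectrum are assumptions made for illustration only.

```python
def band_bin_indices(num_bins, sample_rate_hz, f_low_hz, f_high_hz):
    """Return the indices of one-sided DFT bins whose centre frequency
    lies in the half-open range [f_low_hz, f_high_hz)."""
    # num_bins one-sided bins cover 0 Hz .. sample_rate_hz / 2 inclusive.
    bin_width_hz = (sample_rate_hz / 2) / (num_bins - 1)
    return [k for k in range(num_bins)
            if f_low_hz <= k * bin_width_hz < f_high_hz]


SAMPLE_RATE_HZ = 16000  # assumed wideband sampling rate
NUM_BINS = 161          # assumed one-sided spectrum, i.e. 50 Hz per bin

# Example band edges taken from the text: 50 Hz - 4 kHz and 4 kHz - 7 kHz.
first_band = band_bin_indices(NUM_BINS, SAMPLE_RATE_HZ, 50.0, 4000.0)
second_band = band_bin_indices(NUM_BINS, SAMPLE_RATE_HZ, 4000.0, 7000.0)
```

With half-open ranges the two bands do not overlap; an overlapping assignment would simply use band edges that satisfy the overlap condition stated above.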
  • At least one parameter of the at least one parameter may represent estimations of signal properties of the second signal component.
  • these signal properties may comprise statistical information of the spectral characteristics of the second signal component.
  • at least one parameter of the at least one parameter may represent a set of spectral envelope representatives of the second signal component.
  • this set of spectral envelope representatives may comprise estimations of representations of the spectral envelope of the second signal component.
  • this set of spectral envelope representatives may comprise at least one energy value, wherein each of the at least one energy value is associated with one frequency sub-band of the second signal component.
  • at least one parameter of the at least one parameter may represent further information with respect to the second signal component.
  • the estimation of the at least one parameter is not based on the second signal component.
  • a signal enhancement processing is performed to the second signal component.
  • spectral dependencies may be exploited between the first signal component and the second signal component in order to perform the signal enhancement on the second signal component. Accordingly, spectral dependencies between the first frequency band and the second frequency band may be used to improve the signal enhancement processing.
  • the signal enhancement processing on the second signal may be performed by means of at least one filter, e.g. in the time-domain or in the frequency domain.
  • one filter of this at least one filter may be configured to be adapted to at least one value, i.e., this at least one value may be configured to be used to perform a signal enhancement processing to the second signal component.
  • this at least one value may represent at least one filter coefficient.
  • the signal enhancement processing to the second signal component may be performed based on the at least one parameter by means of the at least one filter, wherein the at least one value may be configured to be used to perform a signal enhancement processing at least based on the at least one parameter.
  • this at least one value may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component in the time-domain.
  • this at least one value may represent at least one filter coefficient of a filter configured to filter the second signal component in the time-domain.
  • this filter may represent a Finite Impulse Response (FIR) filter or an Infinite Impulse Response (IIR) filter.
  • this at least one value may be determined based on a Minimum Mean Squared Error (MMSE) criterion or a Zero Forcing (ZF) criterion or any other well-suited criterion.
  • MMSE Minimum Mean Squared Error
  • ZF Zero Forcing
  • a signal enhancement processing to the second signal component may be performed by filtering the second signal component by means of the respective filter in the time-domain.
  • the at least one value may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component in the frequency-domain.
  • the second signal component in the frequency domain may be represented by $M_F$ spectral coefficients at frequency bin $k$ and frame $\lambda$ given by:
  • $Y_2(\lambda,k) = S_2(\lambda,k) + N_2(\lambda,k)$, (1)
  • $S_2(\lambda,k)$ and $N_2(\lambda,k)$ may represent the spectral coefficients of the audio and the noise signal of the second signal component, respectively.
  • the first signal component in the frequency domain may be represented by $M_F$ spectral coefficients at frequency bin $k$ and frame $\lambda$ given by
  • $Y_1(\lambda,k) = S_1(\lambda,k) + N_1(\lambda,k)$, (2) where $S_1(\lambda,k)$ and $N_1(\lambda,k)$ may represent the spectral
  • the frame index $\lambda$ is omitted in the following.
  • a converter may be configured to output $M_F$ subcomponents representing the first signal component and a converter may be configured to output $M_F$ subcomponents representing the second signal component in the
  • each subcomponent of the $M_F$ subcomponents is associated with one of $M_F$ sub-bands in the first and second frequency band.
  • this converter may represent a Fourier Transformation, e.g. a FFT, a DFT, a STFT or any other well-suited transformation.
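As an illustration of the frequency-domain signal model, the following sketch transforms one frame with a naive DFT and checks that the noisy coefficients are the sum of the audio and noise coefficients. The function `dft` and the toy frames are assumptions for illustration; a practical converter would use an FFT or STFT with windowing.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), for clarity)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


# Toy frame: a cosine with a period of 4 samples plus alternating "noise".
speech = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noisy = [s + n for s, n in zip(speech, noise)]

Y = dft(noisy)   # noisy spectral coefficients
S = dft(speech)  # audio spectral coefficients
N = dft(noise)   # noise spectral coefficients
# Because the transform is linear, Y[k] equals S[k] + N[k] for every bin k
# (up to floating-point rounding).
```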
  • the at least one filter may be adapted to the at least one value and may be configured to perform a signal enhancement processing to the second signal component in the frequency domain. Accordingly, the at least one filter may be configured to output an enhanced second signal component in the frequency domain, wherein the at least one filter is configured to perform this signal enhancement based on the at least one parameter.
  • the at least one value may comprise at least two gain factors, wherein each of the at least two gain factors is associated with a separate sub-band of the second frequency band.
  • these at least two gain factors may be determined based on a MMSE criterion or a ZF criterion or any other well-suited criterion.
  • a signal enhancement processing to the second signal component may be performed based on weighting a frequency component of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors.
  • each of the at least two gain factors is associated with one sub-band of the second frequency band.
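The spectral weighting with one gain factor per sub-band can be sketched as follows. The function name `apply_subband_gains` and the representation of sub-bands as contiguous, non-overlapping bin ranges are assumptions for illustration; the text also allows overlapping sub-bands.

```python
def apply_subband_gains(spectrum, subband_edges, gains):
    """Weight every bin of sub-band j with gains[j].

    subband_edges holds len(gains) + 1 bin indices; sub-band j covers the
    bins subband_edges[j] .. subband_edges[j + 1] - 1.
    """
    if len(subband_edges) != len(gains) + 1:
        raise ValueError("need exactly one more edge than gain factors")
    enhanced = list(spectrum)
    for j, gain in enumerate(gains):
        for k in range(subband_edges[j], subband_edges[j + 1]):
            enhanced[k] = gain * spectrum[k]
    return enhanced
```

For example, `apply_subband_gains([1 + 1j, 2, 3, 4], [0, 2, 4], [0.5, 0.25])` attenuates the first two bins by 0.5 and the last two by 0.25.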
  • a signal enhancement processing is performed on the first signal component in order to obtain an enhanced first signal component.
  • the signal enhancement processing performed on the first signal component may represent a conventional signal enhancement processing.
  • This signal enhancement processing may be of the same type as the signal enhancement processing performed on the second signal component.
  • an enhanced second signal component obtained by signal enhancement processing performed on the second signal component may be combined with the enhanced first signal component to an enhanced audio signal.
  • the enhanced audio signal may be fed to a further signal processing, e.g., to a speech encoder or any other well-suited processing.
  • said performing a signal enhancement processing on the second signal may be based on a combined signal enhancement processing.
  • this combined signal enhancement processing may be based on a first signal enhancement processing and a second signal enhancement processing, both of the first and second signal enhancement processing may be configured to be applied to the second signal component, wherein the first signal enhancement processing is based on the at least one parameter and the second signal enhancement processing is not based on the at least one parameter.
  • the second signal enhancement processing may represent a conventional signal enhancement processing.
  • the first signal enhancement processing may be applied to the second signal component in order to obtain a first enhanced second signal component and the second signal enhancement processing may be applied to the second signal component in order to obtain a second enhanced second signal component, and the first enhanced second signal component and the second enhanced second signal component may be combined to the enhanced first signal component.
  • the first signal enhancement processing may be combined with the second signal enhancement processing in order to be applied as combined signal enhancement processing on the second signal component.
  • This example will be explained in more detail in the sequel of this description.
  • a signal enhancement processing is performed on the first signal component in order to obtain an enhanced first signal component, wherein said estimating at least one parameter is performed on basis of the enhanced first signal component.
  • the enhanced first signal component may be used to estimate the at least one parameter, which may lead to increased quality of the estimated at least one parameter.
  • any determining or estimation or extraction based on the first signal component may be based on the enhanced first signal component.
  • At least one parameter of said at least one parameter represents a set of spectral envelope representatives of the second signal component.
  • this set of spectral envelope representatives may be estimations of representations of the spectral envelope of the second signal component.
  • this set of spectral envelope representatives may comprise at least one energy value, wherein each of the at least one energy value is associated with one frequency sub-band of the second signal component.
  • at least one parameter of the at least one parameter may represent further information with respect to the second signal component.
  • At least one feature is determined based on the first signal component, the at least one feature representing at least one signal parameter associated with the first signal component, wherein said estimating the at least one parameter is performed on basis of the at least one feature.
  • determining the at least one feature may also be performed based on the enhanced first signal component, as mentioned above.
  • At least one feature of the at least one feature may represent a set of spectral envelope representatives of the first signal component.
  • the at least one feature of the at least one feature may be at least one energy representative, wherein each of the at least one energy representative is associated with the energy of the first signal component.
  • at least one feature of the at least one feature may be $N_c$ mel-frequency cepstral coefficients (MFCCs).
  • MFCCs mel-frequency cepstral coefficients
  • associated with the first frequency band may be equally spaced on the mel scale.
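A minimal sketch of band edges that are equally spaced on the mel scale is given below. The text does not state which mel mapping is used; the sketch assumes the widely used formula mel = 2595 · log10(1 + f / 700). The function names and the choice of 8 bands between 50 Hz and 4 kHz are likewise assumptions for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Assumed mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_edges(f_low_hz, f_high_hz, num_bands):
    """Return num_bands + 1 edges (in Hz) that are equidistant in mel."""
    mel_low = hz_to_mel(f_low_hz)
    mel_high = hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / num_bands
    return [mel_to_hz(mel_low + i * step) for i in range(num_bands + 1)]


edges_hz = mel_spaced_edges(50.0, 4000.0, 8)
```

Bands that are equally wide in mel become progressively wider in Hz towards higher frequencies.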
  • At least one feature of the at least one feature may be associated with further information of the first signal component.
  • at least one feature of the at least one feature may be associated with the zero-crossing rate (ZCR) of the first signal component.
  • ZCR zero-crossing rate
  • the zero-crossing rate may be the rate of sign-changes along a signal, i.e., the rate at which the signal changes from positive to negative or back.
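The zero-crossing rate can be computed as in the following sketch. Treating a sample value of exactly zero as non-negative and normalising by the number of adjacent sample pairs are assumptions made for illustration.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for prev, cur in zip(samples, samples[1:])
                    if (prev >= 0) != (cur >= 0))
    return crossings / (len(samples) - 1)
```

A signal that alternates in sign on every sample yields a rate of 1.0, while a signal that never changes sign yields 0.0.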
  • the at least one feature may be denoted as vector $x(\lambda)$, where $\lambda$ denotes the time index.
  • this at least one feature may be any well-suited features of the first signal component which may be used to exploit a correlation and/or spectral dependencies to a signal parameter of the second signal component in the second frequency band.
  • an estimator may be configured to estimate the at least one parameter at least based on the at least one feature.
  • the estimator may apply a MMSE criterion, a Maximum a Posteriori (MAP) criterion, a Maximum Likelihood (ML) criterion, a ZF criterion or any other well-suited criterion for estimating the at least one parameter at least based on the at least one feature.
  • the estimator may not apply any information extracted from or based on the second signal component in order to estimate the at least one parameter.
  • the at least one parameter is estimated based on features that are determined only from the first signal component.
  • the estimator may be configured to perform the estimation based on a Hidden Markov Model (HMM).
  • HMM Hidden Markov Model
  • At least one feature of said at least one feature is a set of spectral envelope
  • the sub-bands of this set of $M_F'$ frequency sub-bands of the first frequency band may overlap in the frequency range.
  • said estimating at least one parameter is further based on at least one previous feature which has been previously determined based on the first signal component.
  • $x(\lambda)$ may represent the current at least one feature
  • $x(\lambda-m), \dots, x(\lambda-1)$ may represent the $m$ previous at least one features.
  • $m \geq 1$ holds, $m$ representing an integer.
  • the estimated at least one parameter exemplarily denoted as vector y
  • the estimated at least one parameter may be estimated by means of an MMSE criterion for estimation of the parameter vector y:
  • At least one parameter of the at least one parameter is a set of spectral envelope representatives of the second signal component.
  • the estimated at least one parameter may be represented by a vector whose $j$th entry, with $j \in \{0, \dots, M_F'-1\}$, may represent the estimated energy of the second signal in the $j$th sub-band of a set of $M_F'$ frequency sub-bands of the second frequency band.
  • the sub-bands of this set of $M_F'$ frequency sub-bands of the second frequency band may overlap in the frequency range.
  • At least one parameter of said at least one parameter is a set of spectral envelope representatives of the second signal component, and said estimating at least one parameter is further based on a codebook, the codebook comprising a plurality of sets of spectral envelope representatives associated with the second signal component.
  • $M_C$ entries may be obtained with the LBG algorithm presented by Y. Linde, A. Buzo, R.M. Gray in "An algorithm for vector quantizer design," IEEE
  • the MMSE estimate may be expressed as a weighted sum over the $M_C$ centroids of the codebook C.
  • ) represent a posteriori probabilities based on the determined sequence X.
  • an a posteriori probability based on the determined sequence X is determined for each of the plurality of sets of spectral envelope representatives of the codebook, and a weighted sum over the plurality of sets of spectral envelope representatives is determined, wherein each set of spectral envelope representatives is weighted by the respective a posteriori probability.
  • said estimating comprises: Determining for each of the plurality of sets of spectral envelope representatives of the codebook an a posteriori probability at least based on the at least one feature, and determining a weighted sum over the plurality of sets of spectral envelope representatives, wherein each set of spectral envelope representatives of the plurality of sets of spectral envelope representatives is weighted by the respective a posteriori probability.
  • Equation (4) shows an exemplary determining of the weighted sum over the plurality of sets of spectral envelope representatives.
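The weighted sum over the codebook entries can be sketched as follows, assuming each codebook entry is a list of sub-band energies and the a posteriori probabilities have already been determined. The function name `mmse_estimate` and the tolerance check on the probabilities are assumptions for illustration; the sketch is not a reproduction of Equation (4).

```python
def mmse_estimate(codebook, posteriors):
    """Weighted sum of codebook entries, each weighted by its posterior.

    codebook:   list of entries, each a list of spectral envelope values
    posteriors: one a posteriori probability per codebook entry
    """
    if len(codebook) != len(posteriors):
        raise ValueError("need one posterior per codebook entry")
    if abs(sum(posteriors) - 1.0) > 1e-6:
        raise ValueError("posteriors should sum to one")
    dim = len(codebook[0])
    return [sum(p * entry[d] for entry, p in zip(codebook, posteriors))
            for d in range(dim)]
```

For a two-entry codebook `[[1.0, 2.0], [3.0, 6.0]]` with posteriors `[0.25, 0.75]`, the estimate lies three quarters of the way from the first entry to the second.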
  • determining the at least one a posteriori probability is based on a Hidden Markov Model (HMM).
  • HMM Hidden Markov Model
  • This HMM may be trained during an offline training phase.
  • the HMM techniques described by B. Geiser, H. Taddei and P. Vary in "Artificial Bandwidth Extension without Side Information for ITU-T G.729.1," in Proceedings of European Conference on Speech Communication and Technology (INTERSPEECH), Antwerp, Belgium, August 2007 may be applied.
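One generic way to obtain a posteriori probabilities from an HMM is the normalised forward recursion, sketched below. This is a textbook recursion and not necessarily the procedure of the cited reference. It assumes that each HMM state corresponds to one codebook entry and that the observation likelihoods of the feature vectors have already been evaluated; all names are hypothetical.

```python
def forward_posteriors(initial, transition, likelihoods):
    """Return P(current state = i | all observations so far).

    initial[i]        prior probability of state i at the first frame
    transition[i][j]  probability of moving from state i to state j
    likelihoods[t][i] likelihood of the observation at frame t in state i
    """
    num_states = len(initial)
    alpha = [initial[i] * likelihoods[0][i] for i in range(num_states)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for frame_likelihoods in likelihoods[1:]:
        # Predict the state distribution, then weight by the new observation.
        predicted = [sum(alpha[i] * transition[i][j]
                         for i in range(num_states))
                     for j in range(num_states)]
        alpha = [predicted[j] * frame_likelihoods[j]
                 for j in range(num_states)]
        total = sum(alpha)  # normalising each frame avoids numeric underflow
        alpha = [a / total for a in alpha]
    return alpha
```

The returned probabilities could then serve as the weights of a weighted sum over the codebook entries.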
  • said performing a signal enhancement processing on the second signal is based on a combined signal enhancement processing comprising:
  • the combined set of at least one value may be used as the at least one value explained with respect to the at least one filter.
  • the combined set of at least one value may be used to perform a signal enhancement processing on the second signal component, e.g. in time-domain or in the frequency domain.
  • the first signal enhancement method and the second signal enhancement method may be combined in order to provide an increased signal enhancement of the second signal component.
  • the first set of at least one value may comprise $M_F'$ values and the second set of at least one value may comprise $M_F'$ values
  • a combiner may be configured to combine one value of the first set of at least one value with one value of the second set of at least one value to one combined value of a set of at least one combined value, wherein the set of at least one combined value comprises $M_F'$ combined values.
  • the at least one value of the first set of at least one value may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component, as explained above. Determining this first set of at least one value may depend on the applied signal enhancement processing.
  • the at least one value of the second set of at least one value may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component, as explained above. Determining this second set of at least one value may depend on the applied signal enhancement processing.
  • the combining may be performed adaptively.
  • this adaptively combining may be performed on the basis of determined signal and/or noise parameters.
  • the first set of at least one value represents a first set of at least two gain factors and said second set of at least one value represents a second set of at least two gain factors, wherein each of the at least two gain factors of the first and second set is associated with a sub-band frequency component of the second signal component.
  • the first set of at least two gain factors may be configured to be used to perform a spectral weighting of the second signal based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors of the first set.
  • the second set of at least two gain factors may be configured to be used to perform a spectral weighting of the second signal based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors of the second set.
  • an SNR estimator may be configured to estimate at least one SNR representative based on the at least one parameter and the second signal component.
  • This at least one SNR representative is associated with the second signal component.
  • this at least one SNR representative may comprise a plurality of SNR representatives, wherein each of the plurality of SNR representatives is associated with one sub-band of a plurality of frequency sub-bands of the second frequency band.
  • the plurality of frequency sub-bands may comprise $M_F'$ frequency sub-bands of the second frequency band, and, for instance, the sub-bands of this set of $M_F'$ frequency sub-bands of the second frequency band may overlap in the frequency range.
  • an entity may be configured to estimate the noise powers $N_2(j)$ with $j \in \{0, \dots, M_F'-1\}$.
  • the estimation of the jth noise power may be written as wherein represents the energy of the audio signal in the jth sub-band of the second frequency band.
  • the first set of at least two gain factors is determined at least based on the plurality of SNR representatives.
  • the at least two gain factors of the first set may be represented by $G_{bwe}(j)$ with
  • $G_{bwe}(j)$ represents the gain factor associated with the $j$th sub-band of the $M_F'$ frequency sub-bands of the second frequency band.
  • $G_{bwe}(j)$ may be calculated as
  • $G_{bwe}(j)$ may be calculated based on $\xi_2(j)$ and $\gamma_2(j)$, e.g. as described by Y. Ephraim and D. Malah in "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, December 1984.
  • the entity may be configured to determine the first set of at least two gain factors at least based on the plurality of SNR representatives based on any other well-suited method or as described by J. S. Lim and A. V. Oppenheim in "Enhancement and Bandwidth Compression of Noisy Speech", Proceedings of the IEEE, vol. 67, no. 12, pp. 1585-1604, December 1979.
  • the first set of at least one value may be associated with a first signal enhancement processing which is based on the at least one parameter determined on basis of the first signal component.
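The chain from estimated sub-band energies to gain factors can be illustrated with a minimal sketch. The exact noise power and gain formulas are not recoverable from the text above, so the sketch rests on two labelled assumptions: the noise power is taken as the observed sub-band energy minus the estimated audio energy (floored to stay positive), and a simple Wiener-type rule SNR / (1 + SNR) is swapped in for the MMSE short-time spectral amplitude estimator cited above. All names are hypothetical.

```python
def noise_power_by_subtraction(observed_energy, estimated_signal_energy,
                               floor=1e-12):
    """Assumed noise estimate: observed minus estimated audio energy."""
    return [max(y - s, floor)
            for y, s in zip(observed_energy, estimated_signal_energy)]


def wiener_gains(estimated_signal_energy, noise_power):
    """Wiener-type gain per sub-band: SNR / (1 + SNR)."""
    gains = []
    for signal, noise in zip(estimated_signal_energy, noise_power):
        snr = signal / noise  # SNR representative of this sub-band
        gains.append(snr / (1.0 + snr))
    return gains
```

With observed energies `[4.0, 2.0]` and estimated audio energies `[3.0, 1.0]`, the assumed noise powers are `[1.0, 1.0]` and the resulting gains are `[0.75, 0.5]`.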
  • the second set of at least two gain factors may be configured to be used to perform a spectral weighting of the second signal based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors of the second set.
  • the at least two gain factors of the second set may be represented by $G_{com}(j)$ with $j \in \{0, \dots, M_F'-1\}$, wherein $G_{com}(j)$ represents the gain factor associated with the $j$th sub-band of the $M_F'$ frequency sub-bands of the second frequency band.
  • the second set of at least one value may be associated with a second signal enhancement processing which is not based on the at least one parameter determined on basis of the first signal component.
  • this second signal enhancement processing may represent a conventional signal enhancement processing.
  • an entity may be configured to perform a noise power estimation, e.g. as explained by R. Martin in "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 501-512, 2001, and to perform SNR estimation, e.g. as mentioned above, and to calculate the second set of at least one value, e.g., as mentioned above with respect to calculating the first set of at least one value.
  • a noise power estimation e.g. as explained by R. Martin in "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 501-512, 2001
  • SNR estimation e.g. as mentioned above
  • the second set of at least one value e.g., as mentioned above with respect to calculating the first set of at least one value.
  • the sub-bands associated with the gain factors are associated with a first frequency resolution, wherein the combined first set of at least two gain factors and second set of at least two gain factors are expanded to a second frequency resolution, the second frequency resolution being higher than the first frequency resolution.
  • the sub-bands associated with the gain factors, which are associated with a first frequency resolution, may overlap in the frequency band.
  • the second signal component in the frequency domain may be represented by $M_F$ spectral coefficients.
  • $M_F > M_F'$ may hold, i.e., the signal processing explained with respect to the $M_F'$ sub-bands throughout this specification may represent a signal processing performed with the first frequency resolution representing a lower frequency resolution compared to the frequency resolution of the spectral components of first and second signal components (assuming a frequency domain signal processing).
  • the reduction of frequency resolution may allow for an increased suppression of musical tones and may correspond to the properties of the human auditory system where the frequency selectivity decreases with higher frequencies.
  • $M_F$ gain factors of the second set may be determined being associated with $M_F$ sub-bands of the second frequency band, and an entity may be configured to decrease the frequency resolution from $M_F$ to $M_F'$, i.e. this entity may be configured to output $M_F'$ gain factors.
  • This decrease of frequency resolution may be performed based on combining frequency-bins using overlapping Hann windows.
  • the variance of the gain factors over time may be reduced.
  • an entity may be configured to expand the frequency resolution of the combined set of gain factors from $M_F'$ to $M_F$. For instance, this may be performed based on using overlap-add of scaled Hann windows, e.g., the same Hann windows as mentioned above. Based on the frequency expanded combined set of gain factors the combined signal enhancement processing may be performed, e.g. by means of spectral weighting.
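Reducing and expanding the frequency resolution with overlapping Hann windows can be sketched as follows. The window placement (evenly spaced centres with half-overlapping windows, so that the windows sum to one at every bin) and the normalised weighted average used for the reduction are assumptions for illustration; the function names are hypothetical, and at least two sub-bands are required.

```python
import math


def hann_weights(num_bins, num_subbands):
    """One Hann window per sub-band, evaluated at every frequency bin."""
    hop = (num_bins - 1) / (num_subbands - 1)  # distance between centres
    weights = []
    for j in range(num_subbands):
        centre = j * hop
        row = []
        for k in range(num_bins):
            dist = abs(k - centre)
            row.append(0.5 * (1.0 + math.cos(math.pi * dist / hop))
                       if dist < hop else 0.0)
        weights.append(row)
    return weights


def reduce_resolution(gains, weights):
    """Combine bin-wise gains into one gain per sub-band (weighted mean)."""
    return [sum(w * g for w, g in zip(row, gains)) / sum(row)
            for row in weights]


def expand_resolution(subband_gains, weights):
    """Overlap-add of Hann windows, each scaled by its sub-band gain."""
    num_bins = len(weights[0])
    return [sum(weights[j][k] * subband_gains[j]
                for j in range(len(subband_gains)))
            for k in range(num_bins)]
```

Because the assumed windows sum to one at every bin, a constant gain is preserved by a reduce-then-expand round trip, while rapidly fluctuating gains are smoothed.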
  • said combining is performed based on at least one cross-fading factor, wherein one of the at least one cross-fading factor combines one value of the first set of at least one value with one value of the second set of at least one value.
  • the first set of at least one value may be represented by $G_1(j)$ with $j \in \{0, \ldots, M-1\}$ and the second set of at least one value may be represented by $G_2(j)$ with $j \in \{0, \ldots, M-1\}$, wherein $M \geq 1$ holds.
  • $G_1(j)$ may correspond to $G_{bwe}(j)$
  • $G_2(j)$ may correspond to $G_{conv}(j)$
  • $M$ may correspond to $M_F'$.
  • a combiner may be configured to combine the first set of at least one value and the second set of at least one value by means of a set of at least one cross-fading factor, wherein this set of at least one cross-fading factor may be represented by $\alpha(j)$ with $j \in \{0, \ldots, M-1\}$.
  • the jth combined value c(j) may be a function of the jth value $G_1(j)$ of the first set of at least one value, of the jth value $G_2(j)$ of the second set of at least one value, and at least of the jth cross-fading factor $\alpha(j)$.
  • Any other well-suited combining based on the set of at least one cross-fading factor may also be applied.
  • a weighting between the influence of the first signal enhancement processing and the second signal processing may be performed.
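One possible combining rule is a linear cross-fade per sub-band. The sketch below is an assumption made for illustration only: the specification leaves the combining function open, and both the function name `cross_fade` and the convention that the cross-fading factor weights the first set are hypothetical.

```python
def cross_fade(first_set, second_set, cross_fading):
    """Linear cross-fade per sub-band: alpha * G1 + (1 - alpha) * G2.

    Assumption: alpha = 1 selects the first signal enhancement processing,
    alpha = 0 selects the second one.
    """
    if not (len(first_set) == len(second_set) == len(cross_fading)):
        raise ValueError("all three sets need one entry per sub-band")
    return [
        alpha * g1 + (1.0 - alpha) * g2
        for g1, g2, alpha in zip(first_set, second_set, cross_fading)
    ]
```

With such a rule, each cross-fading factor directly sets the weighting between the influence of the first and the second signal enhancement processing in its sub-band.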
  • each of the at least one cross-fading factor is calculated on the basis of one value of the first set of at least one value, of one value of the second set and of one reference value of a set of at least one reference value, the set of at least one reference value representing at least one reference value configured to be used to perform a reference enhanced signal processing on the second signal component.
  • the first set of at least one value may be represented by $G_1(j)$ with $j \in \{0, \ldots, M-1\}$
  • the second set of at least one value may be represented by $G_2(j)$
  • the set of at least one reference value may be represented by $G_r(j)$.
  • the at least one reference value is configured to be used to perform a reference enhanced signal processing on the second signal component.
  • the at least one reference value may represent reference filter values or weighting factors configured to perform a reference filtering in order to perform the enhanced signal processing on the second signal component.
  • each of the at least one reference value may be determined during a training process for each frame. The method of determining each of the at least one reference value may depend on the signal enhancement processing. For instance, during this training process reference signal parameters may be determined in order to obtain the at least one reference value.
  • a cross-fading factor $\alpha(j)$ of the set of at least one cross-fading factor may be determined based on the respective value $G_1(j)$ of the first set of at least one value, based on the respective value $G_2(j)$ of the second set of at least one value and based on the respective reference value $G_r(j)$ of the set of at least one reference value. For instance, this may be performed as follows: As an example, assuming that the signal enhancement processing represents a noise reduction on the audio signal as mentioned above, the at least one reference value $G_r(j)$ may represent at least one reference weighting factor which may be determined from a clean audio and noise signal. For instance, this determining may be performed on a reference
  • SNR ⁇ may represent the noise
  • said at least one cross-fading factor is determined based on a plurality of reference cross-fading factors and estimations of signal parameters associated with the first and second frequency band, wherein the plurality of reference cross-fading factors has been determined during a training process, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with at least one signal parameter of the first frequency band and at least one signal parameter of the second frequency band.
  • a plurality of reference cross-fading factors may be determined, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with a respective at least one signal parameter of the first frequency band and a respective at least one signal parameter of the second frequency band.
  • the at least one signal parameter of the first frequency band and the at least one signal parameter of the second frequency band may represent any well-suited signal parameter associated with the first signal component and the second signal component.
  • a look-up table for the estimation of cross-fading values $\alpha(j)$ may be generated for each of the j sub-bands.
  • $p_1$ may represent the signal parameter of the first frequency band and $p_2$ may represent the signal parameter of the second frequency band, wherein $p_1$ and $p_2$ are determined for each cross-fading value $\alpha(j)$ obtained during a training process based on training data; $p_1$ and $p_2$ may be quantized, and the final look-up table may provide one reference cross-fading value $\alpha(j)$ for each quantized combination of $p_1$ and $p_2$.
  • $p_1$ may represent a first SNR associated with the first frequency band and $p_2$ may represent a second SNR associated with the second frequency band.
  • the at least one signal parameter of the first frequency band may represent the averaged SNR
  • the at least one signal parameter of the second frequency band may represent the reference a priori
  • a first reference cross-fading value $\alpha_r(j)$ is obtained in a training process, e.g. for every frame $\lambda$ and/or for every sub-band j, e.g. as mentioned above.
  • the reference a priori SNR of the respective jth sub-band and the averaged SNR of the first frequency band may be determined.
  • the plurality of reference cross-fading values $\alpha_r(j)$ may be determined, being associated with the respective reference a priori SNR of the sub-band and the respective averaged SNR of the first frequency band.
  • a look-up table for the estimation of cross-fading values $\alpha(j)$ may be generated for each of the j sub-bands. Therefore, the reference a priori SNR and the averaged SNR may be quantized (e.g., 1 dB step size or another well-suited step size) and the associated reference cross-fading values $\alpha_r(j)$ of the j sub-bands define the at least one reference cross-fading value.
  • the final look-up table may provide one reference cross-fading value $\alpha(j)$ for each quantized combination of the two SNR values.
  • the first and second signal enhancement processing can be combined in an adaptive way based on the at least one reference cross-fading value.
  • a cross-fading value $\alpha(j)$ may be determined by means of interpolation between at least two reference cross-fading values.
  • the respective SNR estimates of the SNR associated with the first frequency band and the SNR associated with the second frequency band may be determined based on the above mentioned conventional techniques.
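The look-up table for one sub-band may be sketched as follows. This is a hypothetical illustration: the averaging of all training values that fall into the same quantization cell, the fallback value for unseen cells, and the names `quantize`, `build_table` and `look_up` are assumptions, and the interpolation between reference values mentioned above is not implemented here.

```python
from collections import defaultdict

def quantize(snr_db, step_db=1.0):
    """Map an SNR in dB to the index of its quantization cell."""
    return int(round(snr_db / step_db))

def build_table(training, step_db=1.0):
    """Build the table for one sub-band from training triples.

    training: iterable of (p1_db, p2_db, alpha_ref), where p1_db is the SNR
    of the first frequency band, p2_db the SNR of the sub-band in the second
    frequency band, and alpha_ref the reference cross-fading value.
    Assumption: values falling into the same cell are averaged.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for p1_db, p2_db, alpha_ref in training:
        cell = cells[(quantize(p1_db, step_db), quantize(p2_db, step_db))]
        cell[0] += alpha_ref
        cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}

def look_up(table, p1_db, p2_db, step_db=1.0, default=0.5):
    """Return the stored cross-fading value, or a default for unseen cells."""
    key = (quantize(p1_db, step_db), quantize(p2_db, step_db))
    return table.get(key, default)
```

At run time, the SNR estimates of both frequency bands would be passed to `look_up` for each sub-band, using the table trained for that sub-band.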
  • said signal enhancement processing represents noise reduction processing.
  • Fig. 1a is a schematic block diagram which illustrates a first exemplary apparatus
  • Fig. 1b is a schematic block diagram which illustrates a second exemplary apparatus
  • Fig. 2a is a flow chart illustrating a first exemplary method
  • Fig. 2b is a flow chart illustrating a second exemplary method
  • Fig. 3a is a schematic block diagram which illustrates a first exemplary filtering
  • Fig. 3b is a schematic block diagram which illustrates a second exemplary filtering
  • Fig. 4a is a schematic block diagram which illustrates an exemplary entity configured to estimate the at least one parameter
  • Fig. 4b is a schematic block diagram which illustrates a first exemplary determining of a first set of at least one value configured to be used to perform a signal enhancement processing on the second signal component;
  • Fig. 4c is a schematic block diagram which illustrates a first exemplary entity configured to determine a second set of at least one value configured to be used to perform a signal enhancement processing on the second signal component;
  • Fig. 4d is a schematic block diagram which illustrates a second exemplary entity configured to determine a second set of at least one value configured to be used to perform a signal enhancement processing on the second signal component;
  • Fig. 5a is a schematic block diagram which illustrates an exemplary combiner
  • Fig. 5b is a schematic block diagram which illustrates a second exemplary combiner.
  • Figure 1a is a schematic block diagram which illustrates a first exemplary apparatus. This first exemplary apparatus will be described in conjunction with the flow chart of a first exemplary method depicted in figure 2a.
  • the method comprises estimating 210 at least one parameter 150 based on a first signal component 130 of an audio signal 101, the at least one parameter 150 being associated with a second signal component 140 of the audio signal 101, and the method comprises performing 220 a signal enhancement processing on the second signal component 140 at least based on the at least one parameter 150.
  • the signal enhancement processing may represent noise reduction processing, or a voice enhancement processing or any other well-suited signal enhancement processing .
  • the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band.
  • the first frequency band differs from the second frequency band.
  • the first frequency band may span a frequency range from f1 to f2, and the second frequency band may span a frequency range from f3 to f4, wherein f3>f1 and f4>f2 holds.
  • the first and second frequency band may have an overlapping frequency range, i.e. f3≤f2 holds, but as another example, the first and second frequency band may not overlap, i.e. f3>f2 holds.
  • the first frequency band may span a frequency range from 50 Hz to 4 kHz
  • the second frequency band may span a frequency range from 4 to 7 kHz. But of course, any other well suited frequency assignments for the first and second frequency range may be applied .
  • the estimated at least one parameter may represent at least one parameter which may be beneficial to increase the quality of the signal enhancement processing.
  • the type of the at least one parameter may depend on the kind of signal enhancement processing .
  • the first signal component 130 may be extracted from the audio signal 101 by means of filtering, e.g. by a lowpass filter or a bandpass filter
  • the second signal component 140 may be extracted from the audio signal 101 by means of filtering, e.g. by a highpass filter or a bandpass filter.
  • Such a filtering is depicted in the exemplary systems depicted in figures 1c and 1d, respectively, wherein the audio signal 101 is filtered by first filter 103 in order to output the first signal component 130 and filtered by second filter 104 in order to output the second signal component 140.
  • the first and second filters 103, 104 may further be configured to perform a downsampling .
  • the first filter 103 and the second filter 104 may represent a 2-channel Finite Impulse Response (FIR) Quadrature Mirror Filter (QMF) bank.
  • Entity 110 is configured to estimate the at least one parameter 150 at least based on the first signal component 130. For instance, this estimation is not based on the second signal component 140.
  • At least one parameter of the at least one parameter 150 may be estimations of signal properties of the second signal component 140.
  • these signal properties may comprise statistical information of the spectral characteristics of the second signal component.
  • at least one parameter of the at least one parameter may represent a set of spectral envelope representatives of the second signal component.
  • this set of spectral envelope representatives may be estimations of
  • this set of spectral envelope representatives may comprise at least one energy value, wherein each of the at least one energy value is associated with one frequency sub-band of the second signal component.
  • at least one parameter of the at least one parameter may represent further information with respect to the second signal component.
  • a signal enhancement processing is performed to the second signal component 140, as indicated by entity 120 in figure 1a, wherein entity 120 is configured to output an enhanced second signal component 142.
  • the signal enhancement processing represents a noise reduction processing
  • the enhanced second signal component 142 may represent a noise reduced second signal component 142 or the weighting gains to perform the noise reduction in the frequency domain.
  • spectral dependencies may be exploited between the first signal component 130 and the second signal component 140 in order to perform the signal enhancement processing on the second signal component 140. Accordingly, spectral dependencies between the first frequency band and the second frequency band may be used to improve the signal enhancement processing .
  • the dashed arrow depicted in figure 1a illustrates that there may be a further signal processing with respect to the at least one parameter 150 before signal enhancement processing to the second signal component 140 is performed.
  • entity 120 may comprise at least one filter 145 configured to perform a signal enhancement processing to the second signal component 140, as depicted in figure 1b.
  • one filter 145 of this at least one filter may be configured to be adapted to at least one value 150' , i.e. this at least one value 150' may be configured to be used to perform a signal enhancement processing to the second signal component 140.
  • this at least one value 150' may represent at least one filter coefficient.
  • the signal enhancement processing to the second signal component 140 may be performed based on the at least one parameter 150 by means of filter 145, wherein the at least one value 150' may be determined at least based on the at least one parameter 150.
  • this at least one value 150' may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component 140 in the time-domain.
  • this at least one value may represent at least one filter coefficient of a filter configured to filter the second signal component 140 in the time-domain.
  • this filter may represent a Finite Impulse Response (FIR) filter or an Infinite Impulse Response (IIR) filter.
  • this at least one value may be determined based on a Minimum Mean Squared Error (MMSE) criterion or a Zero Forcing (ZF) criterion or any other well-suited criterion.
  • a signal enhancement processing to the second signal component is performed by filtering the second signal component by means of the respective filter in the time-domain.
  • Figure 1c depicts such a filtering in the time-domain, wherein filter 141 is adapted to the at least one value 150' and configured to perform a signal enhancement processing to the second signal component 140. Accordingly, filter 141 is configured to output an enhanced second signal component 142, wherein filter 141 is configured to perform this signal enhancement based on the at least one parameter 150. Similarly, filter 131 is configured to perform a signal enhancement processing to the first signal component 130 and to output an enhanced first signal component 132. This filter 131 may be adapted to at least one value 116 configured to be used to perform a signal enhancement processing to the first signal component 130. For instance, this at least one value 116 may be determined based on the first signal component 130.
  • the system depicted in figure 1c comprises a third and fourth filter 193, 194 and a combiner 195 configured to combine the enhanced first signal component 132 and the enhanced second signal component 142 to a combined enhanced output audio signal 199 having the same bandwidth as the input audio signal 101.
  • the third filter and fourth filter 193, 194 may be configured to perform an upsampling to the respective input signal.
  • the third and fourth filter may represent a FIR QMF bank configured to combine the enhanced first and second signal components 132, 142 to the wideband output signal 199.
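The split into two downsampled signal components and the later recombination can be illustrated with the shortest possible 2-channel FIR QMF bank, the two-tap Haar pair. This is a sketch under stated assumptions: a practical system would use longer filters, and the function names `qmf_analysis` and `qmf_synthesis` are hypothetical.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)

def qmf_analysis(signal):
    """Split a signal into a low band and a high band, each downsampled by 2.

    Uses the two-tap Haar filters h0 = [1, 1]/sqrt(2), h1 = [1, -1]/sqrt(2).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = range(len(signal) // 2)
    low = [(signal[2 * n] + signal[2 * n + 1]) * _SCALE for n in pairs]
    high = [(signal[2 * n] - signal[2 * n + 1]) * _SCALE for n in pairs]
    return low, high

def qmf_synthesis(low, high):
    """Upsample both bands and combine them into the full-band signal."""
    output = []
    for l, h in zip(low, high):
        output.append((l + h) * _SCALE)
        output.append((l - h) * _SCALE)
    return output
```

In this sketch, `qmf_analysis` plays the role of the first and second filter including downsampling, and `qmf_synthesis` plays the role of the third and fourth filter together with the combiner. Without any processing between the two stages, the input signal is reconstructed.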
  • the at least one value 150' may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component 140 in the frequency-domain.
  • Figure 1d depicts such a filtering in the frequency-domain, wherein converters 170 are configured to perform a conversion from time-domain to frequency-domain.
  • converters 170 may convert the time domain first and second signal components 130, 140 into frequency domain first and second signal components 130' , 140', respectively.
  • the converter 170 may be configured to output M F subcomponents representing the respective first and second signal component 130' , 140' in the frequency-domain, wherein each subcomponent of the M F subcomponents is associated with one of M F sub-bands in the first and second frequency band.
  • converter 170 may represent a Fourier
  • Filter 141' is adapted to the at least one value 150' and configured to perform a signal enhancement processing to the second signal component 140' in the frequency domain. Accordingly, filter 141' is configured to output an enhanced second signal component 142' in the frequency domain, wherein filter 141' is configured to perform this signal enhancement based on the at least one parameter 150.
  • filter 131' is configured to perform a signal enhancement processing to the first signal component 130' in the frequency domain and to output an enhanced first signal component 132' in the frequency domain.
  • This filter 131' may be adapted to at least one value 116 configured to be used to perform a signal enhancement processing to the first signal component 130. For instance, this at least one value 116 may be determined based on the first signal component 130.
  • the system depicted in figure 1d further comprises converters 175 configured to perform a conversion from frequency-domain to time-domain.
  • converters 175 may convert the enhanced frequency-domain first and second signal components 132', 142' to enhanced time-domain first and second signal components 132, 142, respectively.
  • the converters 170 and 175 may be re-arranged.
  • time-domain to frequency-domain conversion may be applied to the audio signal 101 before being filtered by the first and second filter 103, 104, and/or frequency-domain to time-domain conversion may be applied to output signal 199 after being combined by combiner 195.
  • the first, second, third and fourth filter 103, 104, 193, 194 and combiner 195 are configured to operate in the frequency domain .
  • the at least one value 150' may comprise at least two gain factors, wherein each of the at least two gain factors is associated with a separate sub-band of the second frequency band.
  • these at least two gain factors may be determined based on a MMSE criterion or a ZF criterion or any other well-suited criterion.
  • a signal enhancement processing to the second signal component 140' may be performed based on weighting a frequency component of the second signal 140' associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors.
  • each of the at least two gain factors is associated with one sub-band of the second frequency band.
  • Figure 3a is a schematic block diagram which illustrates a first exemplary filtering in the frequency domain based on the at least one value 150' configured to be used to perform a signal enhancement processing to the second signal component 140' in the frequency-domain.
  • the second signal component 140' comprises M F sub-components 301, 302 ... 303 representing the second signal component 140' in the frequency-domain, wherein each subcomponent 301, 302 ...303 of the M F subcomponents is associated with one of M F sub-bands in the second frequency band.
  • the at least two gain factors 150' are represented by $M_F$ gain factors 321, 322 ...
  • a first multiplier 331 multiplies the first subcomponent 301 with the first gain factor 321
  • a second multiplier 332 multiplies the second subcomponent 302 with the second gain factor 322, and so on until the $M_F$th multiplier 333 multiplies the $M_F$th subcomponent 303 with the $M_F$th gain factor 323.
  • the filter 141' is configured to output M F enhanced
  • the filter 131' depicted in figure 1d may be realized in the same way as filter 141' depicted in figure 3a.
  • any other weighting implementation may be applied in order to perform the signal enhancement processing to the second (or first) signal component.
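The multiplier structure described above amounts to an element-wise product of the spectral subcomponents and the gain factors. A minimal sketch follows; the function name `apply_gains` is hypothetical and complex spectral coefficients are assumed.

```python
def apply_gains(subcomponents, gain_factors):
    """Weight each spectral subcomponent with the gain factor of its sub-band."""
    if len(subcomponents) != len(gain_factors):
        raise ValueError("one gain factor is needed per subcomponent")
    return [gain * value for gain, value in zip(gain_factors, subcomponents)]
```

A gain factor of 1 leaves a subcomponent unchanged, and a gain factor of 0 suppresses it completely.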
  • a signal enhancement processing on the second signal component 140 may be performed based on the at least one parameter determined on the basis of the first signal component 130.
  • Figure 1a depicts that the at least one parameter 150 is estimated based on the first signal component 130.
  • the at least one parameter may be estimated based on the first signal component 130 in the time-domain, or based on the first signal component 130' in the frequency-domain, or based on the enhanced signal component 132 in the time-domain, or based on the enhanced signal component 132' in the frequency domain.
  • Fig. 2b depicts a flow chart illustrating a second exemplary method based on the first exemplary method.
  • This second exemplary method comprises performing 205 a signal enhancement processing to the first signal component 130, 130' in order to obtain an enhanced first signal component 132, 132'.
  • this signal enhancement processing to the first signal component 130, 130' may be performed as explained with respect to figures 1b and 1c, but any other well-suited signal enhancement may also be applied.
  • estimating 210 the at least one parameter is performed based on the enhanced first signal component 132, 132'. Accordingly, the input of entity 110 depicted in figure 1a may be replaced with an enhanced first signal component 132, 132'.
  • the term "based on the first signal component" may include "based on the enhanced first signal component".
  • Fig. 4a depicts a schematic block diagram which illustrates an exemplary entity 110' configured to estimate the at least one parameter 150 based on the first signal component 130.
  • this exemplary entity 110' may represent the entity 110 depicted in figure 1a.
  • the input 430 depicted in figure 4a may represent the first signal component 130, 130' or the enhanced first signal component 132, 132'.
  • in the following, reference is made to the first signal component 130, 130', but the enhanced first signal component 132, 132' may also be used as input.
  • Entity 110' comprises a unit 410 and an estimator 420.
  • the unit 410 is configured to determine at least one feature 440 based on the first signal component 130, 130' representing at least one signal parameter being associated with the first signal component 130, 130' .
  • At least one feature of the at least one feature 440 may represent a set of spectral envelope representatives of the first signal component 130, 130'.
  • the at least one feature of the at least one feature may be at least one energy representative, wherein each of the at least one energy representative is associated with the energy of the first signal component or the enhanced first signal component in a frequency sub-band of the first frequency band.
  • at least one feature of the at least one feature 440 may be $N_c$ mel-frequency cepstral coefficients (MFCCs).
  • At least one feature of the at least one feature may comprise further information of the first signal component 130, 130'.
  • at least one feature of the at least one feature may be associated with the zero-crossing rate (ZCR) of the first signal component 130, 130' .
  • the zero-crossing rate may be the rate of sign-changes along a signal, i.e., the rate at which the signal changes from positive to negative or back.
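The zero-crossing rate of one frame can be computed as sketched below. The function name `zero_crossing_rate`, the normalization by the number of adjacent sample pairs, and the treatment of a zero sample as non-negative are assumptions made for this illustration.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for previous, current in zip(frame, frame[1:])
        if (previous >= 0) != (current >= 0)
    )
    return crossings / (len(frame) - 1)
```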
  • the at least one feature 440 may be denoted as vector x(i), where i denotes the time index.
  • the estimator 420 is configured to estimate the at least one parameter 150 at least based on the at least one feature 440.
  • the estimator 420 may be configured to perform the estimation based on a Hidden Markov Model (HMM) .
  • the estimator may apply a MMSE criterion, a Maximum a Posteriori (MAP) criterion, a Maximum Likelihood (ML) criterion, a Zero Forcing (ZF) criterion or any other well-suited criterion for estimating the at least one parameter at least based on the at least one feature 440.
  • the estimator 420 may not apply any information extracted from or based on the second signal component 140, 140' in order to estimate the at least one parameter 150.
  • the at least one parameter is estimated based on features that are determined only from the first signal component 130, 130'.
  • the estimation may be based on at least one feature which has been previously determined by unit 410.
  • $m \geq 1$ holds, m representing an integer.
  • the at least one estimated parameter 150 exemplarily denoted as vector y
  • the at least one estimated parameter 150 may be estimated by means of MMSE criterion for estimation of parameter vector y:
  • At least one parameter of the at least one parameter represents a set of spectral envelope representatives of the second signal component 140, 140' .
  • the estimated at least one parameter may be represented by vector $y = (S_2(0), \ldots, S_2(M_F'-1))$, wherein $S_2(j)$ with $j \in \{0, \ldots, M_F'-1\}$ may represent the estimated energy
  • the sub-bands of this set of M F ' -1 frequency sub-bands of the second frequency band may overlap in the frequency range.
  • the estimation may further be based on a codebook, the codebook comprising a plurality of sets of spectral envelope representatives associated with the second signal component 140.
  • this codebook may be obtained with the LBG algorithm presented by Y. Linde, A. Buzo, R.M. Gray in "An algorithm for vector quantizer design", IEEE Transactions on Communications, vol. 28, no. 1, pp. 84-95, Jan. 1980. Of course, any other well-suited codebook may also be used.
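  • the MMSE estimate may be expressed as: $y_{\mathrm{MMSE}} = \sum_i y_i \, P(y_i \mid X)$, (10) wherein $y_i$ denotes the ith set of spectral envelope representatives of the codebook.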
  • the weights represent a posteriori probabilities based on the determined sequence X.
  • an a posteriori probability based on the determined sequence X is determined for each of the plurality of sets of spectral envelope representatives of the codebook, and a weighted sum over the plurality of sets of spectral envelope representatives is determined, wherein each set of spectral envelope representatives is weighted by the respective a posteriori probability
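The weighted sum over the codebook can be sketched as follows. This is an illustration only: the a posteriori probabilities are taken as given inputs here (the specification derives them from the observed feature sequence, e.g. by means of an HMM), the division by their sum is an added safeguard against rounding, and the function name `mmse_estimate` is hypothetical.

```python
def mmse_estimate(codebook, posteriors):
    """Weighted sum of codebook entries, weighted by a posteriori probabilities.

    codebook:   list of vectors (sets of spectral envelope representatives)
    posteriors: one a posteriori probability per codebook entry
    """
    if len(codebook) != len(posteriors):
        raise ValueError("one probability is needed per codebook entry")
    total = sum(posteriors)
    dimension = len(codebook[0])
    return [
        sum(p * entry[d] for p, entry in zip(posteriors, codebook)) / total
        for d in range(dimension)
    ]
```

If one codebook entry has probability 1, the estimate equals that entry; otherwise the estimate lies between the entries, which is the typical behaviour of an MMSE estimator.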
  • determining the at least one a posteriori probability may be based on a Hidden Markov Model (HMM) .
  • This HMM may be trained during an offline training phase.
  • Thus, the entity 110' depicted in Fig. 4a is configured to estimate at least one parameter 150 based on the first signal component 130, 130'.
  • Figure 4b is a schematic block diagram which illustrates a first exemplary determining of a first set of at least one value 450 configured to be used to perform a signal enhancement processing on the second signal component 140, 140' .
  • this at least one value 450 may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component 140, 140', as explained above.
  • the at least one value 450 may represent the first set of at least two gain factors configured to be used to perform a spectral weighting of the second signal 140' based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors, as exemplarily depicted in figure 3a or 3b.
  • entity 460 may represent an SNR estimator 460 which is configured to estimate at least one SNR representative 465 based on the at least one parameter 150 and the second signal component 140, 140' .
  • This at least one SNR representative is associated with the second signal component 140, 140'.
  • this at least one SNR representative may comprise a plurality of SNR representatives, wherein each of the plurality of SNR representatives is associated with one sub-band of a plurality of frequency sub-bands of the second frequency band.
  • the plurality of frequency sub-bands may comprise $M_F'-1$ frequency sub-bands of the second frequency band, and, for instance, the sub-bands of this set of $M_F'-1$ frequency sub-bands of the second frequency band may overlap in the frequency range.
  • the plurality of SNR representatives may comprise a set of $M_F'-1$ a priori SNR values wherein $N_2(j)$ may
  • the entity 460 may be configured to estimate the j noise powers $N_2(j)$ with $j \in \{0, \ldots, M_F'-1\}$.
  • the estimation of the jth noise power may be written as wherein
  • entity 470 may be configured to determine the first set of at least two gain factors at least based on the plurality of SNR representatives.
  • the at least two gain factors of the first set may be represented by $G_{bwe}(j)$ with $j \in \{0, \ldots, M_F'-1\}$, wherein $G_{bwe}(j)$ represents the gain factor associated with the jth sub-band of the $M_F'-1$ frequency sub-bands of the second frequency band.
  • $G_{bwe}(j)$ may be calculated as
  • $G_{bwe}(j)$ may be calculated based on the a priori SNR and the a posteriori SNR of the jth sub-band, e.g. as described by Y. Ephraim and D. Malah in "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, December 1984.
  • entity 470 may be configured to determine the first set of at least two gain factors at least based on the plurality of SNR representatives based on any other well-suited method or as described by J. S. Lim and A. V. Oppenheim in "Enhancement and Bandwidth Compression of Noisy Speech", Proceedings of the IEEE, vol. 67, no. 12, pp. 1585-1604, December 1979.
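How gain factors may follow from SNR representatives can be sketched with a simple Wiener rule. This sketch deliberately swaps in the Wiener gain for the estimator of Ephraim and Malah cited above, because it needs only the a priori SNR. The noise power estimate (noisy sub-band energy minus the estimated energy of the second signal component) is an assumption, as are the function name `wiener_gains` and the `floor` parameter.

```python
def wiener_gains(noisy_energy, estimated_signal_energy, floor=1e-12):
    """Per-sub-band Wiener gains G = xi / (1 + xi) with a priori SNR xi.

    noisy_energy:            energy of the noisy second signal component
    estimated_signal_energy: energy estimated from the first signal component
    Assumption: the noise power is the (floored) difference of both energies.
    """
    gains = []
    for noisy, signal in zip(noisy_energy, estimated_signal_energy):
        noise_power = max(noisy - signal, floor)
        a_priori_snr = signal / noise_power
        gains.append(a_priori_snr / (1.0 + a_priori_snr))
    return gains
```

Sub-bands in which the estimated signal energy dominates receive a gain close to 1, whereas sub-bands dominated by noise are attenuated.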
  • the at least one value 465 may comprise a plurality of SNR representatives or other statistical representatives of the second signal component which may be used by entity 470 to determine the at least one filter coefficient .
  • the first set of at least one value 450 may be associated with a first signal enhancement processing which is based on the at least one parameter 150 determined on basis of the first signal component 130, 130' .
  • Figure 4c is a schematic block diagram which illustrates a first exemplary entity 480 configured to determine a second set of at least one value 490 configured to be used to perform a signal enhancement processing on the second signal component 140, 140' .
  • this at least one value 490 may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component 140, 140', as explained above.
  • the at least one value 490 may represent the second set of at least two gain factors configured to be used to perform a spectral weighting of the second signal 140' based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors, as exemplarily depicted in figure 3a or 3b.
  • the at least two gain factors of the second set may be represented by $G_{conv}(j)$ with $j \in \{0, \ldots, M_F'-1\}$, wherein $G_{conv}(j)$ represents the gain factor associated with the jth sub-band of the $M_F'-1$ frequency sub-bands of the second frequency band.
  • the second set of at least one value 490 may be associated with a second signal enhancement processing which is not based on the at least one parameter 150 determined on basis of the first signal component 130, 130' .
  • this second signal enhancement processing may represent a conventional signal enhancement processing.
  • entity 480 may be configured to perform a noise power estimation, e.g. as explained by R. Martin in "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 501-512, 2001, and to perform SNR estimation, e.g. as mentioned above with respect to figure 4b, and to calculate the set of at least one value 490, e.g. as mentioned above with respect to figure 4b.
  • Figure 4d is a schematic block diagram which illustrates a second exemplary entity 480' configured to determine a second set of at least one value 490' configured to be used to perform a signal enhancement processing on the second signal component 140' in the frequency domain.
  • the second signal component 140' in the frequency domain, e.g. determined as explained with respect to figure 1d, may be represented by $M_F$ spectral coefficients at frequency bin $k$ and frame $\lambda$ given by:
  • $Y_2(\lambda,k) = S_2(\lambda,k) + N_2(\lambda,k)$, (13) where $S_2(\lambda,k)$ and $N_2(\lambda,k)$ represent the spectral coefficients of the audio and the noise signal.
  • the frame index $\lambda$ is omitted in the following.
  • $M_F > M_F'$ may hold, i.e., the signal processing explained with respect to the $M_F'$ sub-bands throughout this specification may represent a signal processing performed with a lower frequency resolution compared to the frequency resolution of the first and second signal components 130', 140' outputted by converters 170 depicted in figure 1d.
  • the reduction of frequency resolution may allow for an increased suppression of musical tones and may correspond to the properties of the human auditory system where the frequency selectivity decreases with higher frequencies.
  • entity 480' may be configured to output M F gain factors being associated with M F sub-bands of the second frequency band
  • an entity 495 is configured to decrease the frequency resolution from M F to M F ' , i.e. the entity 495 may be configured to output M F ' gain factors.
  • This decrease of frequency resolution may be performed based on combining frequency-bins using overlapping Hann windows.
  • the variance of the gain factors may be reduced.
  • the second exemplary filtering depicted in figure 3b may be used for performing the spectral weighting based on the at least one value 150', wherein the at least one value 150' represents at least two gain factors associated with frequency resolution $M_F'$.
  • Entity 350 may be configured to expand the frequency resolution from M F ' to M F . For instance, this may be performed based on using overlap-add of scaled Hann windows, e.g. the same Hann windows as mentioned above.
  • entity 350 is configured to output $M_F$ gain factors 150'' which are representative of the inputted $M_F'$ gain factors 150'.
  • the remaining signal processing of the second exemplary filtering corresponds to the first exemplary filtering .
  • Figure 5a is a schematic block diagram which illustrates an exemplary combiner 510 configured to combine the first set of at least one value 450 and the second set of at least one value 490 to a combined set of at least one value 550.
  • the combined set of at least one value 550 may be used as the at least one value 150' explained with respect to one of figures 1b, 1c, 1d, 3a, 3b and 4b.
  • the combined set of at least one value 550 may be used to perform a signal enhancement processing on the second signal component 140, 140', e.g. in time-domain or in the frequency domain.
  • the first signal enhancement method and the second signal enhancement method may be combined in order to provide an increased signal enhancement of the second signal component 140, 140'.
  • the first set of at least one value 450 may comprise M F ' values and the second set of at least one value
  • the combiner 510 may be configured to combine one value of the first set of at least one value with one value of the second set of at least one value to one combined value of the set of at least one combined value 550, wherein the set of at least one combined value comprises $M_F'$ combined values.
  • This combining may be performed adaptively. As an example, this adaptive combining may be performed on the basis of determined signal and/or noise parameters.
  • Figure 5b is a schematic block diagram which illustrates a second exemplary combiner 510' configured to combine the first set of at least one value 450 and the second set of at least one value 490 to a combined set of at least one value 550, wherein the second exemplary combiner 510' is based on the first exemplary combiner 510.
  • the first set of at least one value 450 may be represented by $G_1(j)$ with $j \in \{0, \dots, M-1\}$ and the second set of at least one value 490 may be represented by $G_2(j)$ with $j \in \{0, \dots, M-1\}$, wherein $M \geq 1$ holds.
  • $G_1(j)$ may correspond to $G_{\mathrm{bwe}}(j)$
  • G 2 (j) may correspond to G conv (j)
  • M may correspond to M F ' .
  • the combiner 510' may be configured to combine the first set of at least one value 450 and the second set of at least one value 490 by means of a set of at least one cross-fading factor, wherein this set of at least one cross-fading factor may be represented by $\alpha(j)$ with $j \in \{0, \dots, M-1\}$.
  • the jth combined value c(y) may be a function of the j th value
  • said combining may be performed based on at least one cross-fading factor, wherein one of the at least one cross-fading factor combines one value of the at least one value of the first set of at least one value 450 with one value of the at least second set of at least one value 490.
  • reference sign 545 may be associated with the set of cross-fading factors $\alpha(j)$
  • the multipliers 535 and 540 and the adder 560 may perform the mathematical operations and may be implemented M times.
  • the second set of cross-fading factors 535 may be independent from the set of cross-fading factors 545, or there may exist another correlation between the set of cross-fading factors 545 and the second set of cross-fading factors 535.
  • each of the at least one cross-fading factor is calculated on the basis of one value of the first set of at least one value 450, of one value of the second set of at least one value 490 and of one reference value of a set of at least one reference value, the set of at least one reference value representing at least one reference value configured to be used to perform a reference enhanced signal processing on the second signal component.
  • the first set of at least one value 450 may be represented by $G_1(j)$ with $j \in \{0, \dots, M-1\}$
  • the second set of at least one value 490 may be represented by G 2 (j)
  • the set of at least one reference value may be represented by G r (j) .
  • the at least one reference value is configured to be used to perform a reference enhanced signal processing on the second signal component.
  • each of the at least one reference value may be determined during a training process for each frame.
  • the method of determining each of the at least one reference value may depend on the signal enhancement processing. For instance, during this training process reference signal parameters may be determined in order to obtain the at least one reference value.
  • a cross-fading factor $\alpha(j)$ of the set of at least one cross-fading factor may be determined based on the respective value $G_1(j)$ of the first set of at least one value 450,
  • the respective value $G_2(j)$ of the second set of at least one value 490 and the respective reference value $G_r(j)$ of the set of at least one reference value, e.g. as follows:
  • the at least one reference value $G_r(j)$ may represent at least one reference weighting factor which may be determined from a clean audio and noise signal. For instance, this determining may be performed based on a reference a posteriori SNR and/or on a reference a priori SNR
  • a plurality of reference cross-fading factors may be determined, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with a first SNR associated with the first frequency band and with a second SNR associated with the second frequency sub-band.
  • a first reference cross-fading value $\alpha_r(j)$ is obtained in a training process, e.g. for every frame $\lambda$ and/or for every sub-band j, e.g. as mentioned above.
  • a look-up table for the estimation of cross-fading values $\alpha(j)$ may be generated for each of the j sub-bands. To this end, the second SNR and the first SNR may be quantized (e.g., 1 dB step size or another well-suited step size) and the associated reference cross-fading values $\alpha_r(j)$ of the j sub-bands define the at least one reference cross-fading value.
  • the final look-up table may provide one reference cross-fading value $\alpha(j)$ for each quantized combination of the second SNR and the first SNR.
  • the SNR associated with the second frequency band may represent a SNR associated with a sub-band of the second frequency band.
  • the first and second signal enhancement processing can be combined in an adaptive way based on the at least one reference cross-fading value.
  • a cross-fading value $\alpha(j)$ may be determined by means of interpolation between at least two reference cross-fading values.
  • the respective SNR estimates of the SNR associated with the first frequency band and the SNR associated with the second frequency band may be determined based on the above mentioned conventional techniques.
  • the logical blocks in the schematic block diagrams as well as the flowchart and algorithm steps presented in the above description may at least partially be implemented in electronic hardware and/or computer software, wherein it may depend on the functionality of the logical block, flowchart step and algorithm step and on design constraints imposed on the respective devices to which degree a logical block, a flowchart step or algorithm step is implemented in hardware or software.
  • the presented logical blocks, flowchart steps and algorithm steps may for instance be implemented in one or more digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable devices.
  • the computer software may be stored in a variety of computer-readable storage media of electric, magnetic, electro-magnetic or optic type and may be read and executed by a processor, such as for instance a microprocessor.
  • the processor and the storage medium may be coupled to interchange information, or the storage medium may be included in the processor.
  • connection in the described embodiments is to be understood in a way that the involved components are operationally coupled.
  • connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
  • Any of the processors mentioned in this text could be a processor of any suitable type.
  • Any processor may comprise but is not limited to one or more microprocessors, one or more processor(s) with accompanying digital signal processor(s), one or more processor(s) without accompanying digital signal processor(s), one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more controllers, one or more application-specific integrated circuits (ASICs), or one or more computer(s).
  • the relevant structure/hardware has been programmed in such a way to carry out the described function.
  • any of the memories mentioned in this text could be implemented as a single memory or as a combination of a plurality of distinct memories, and may comprise for example a read-only memory, a random access memory, a flash memory or a hard disc drive memory etc.
  • any of the actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor.
  • References to 'computer-readable storage medium' should be understood to encompass specialized circuits such as FPGAs, ASICs, signal processing devices, and other devices.
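
The adaptive combination of the two gain sets listed above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the complementary weight `1 - alpha`, the rule used in `reference_alpha`, the table layout and all function names are assumptions made for the example. The text only states that the cross-fading factor depends on the two gain values and a reference value, and that the SNRs are quantized for a look-up table.

```python
import math

def cross_fade(g1, g2, alpha):
    """Combine two gain sets per sub-band as alpha*g1 + (1 - alpha)*g2 (assumed form)."""
    if not (len(g1) == len(g2) == len(alpha)):
        raise ValueError("all inputs need one entry per sub-band")
    return [a * x + (1.0 - a) * y for x, y, a in zip(g1, g2, alpha)]

def reference_alpha(g1, g2, g_ref):
    """Training-time factor that moves the combined gain onto g_ref (assumed rule)."""
    if g1 == g2:
        return 0.5  # both methods agree, so any factor yields the same gain
    return min(1.0, max(0.0, (g_ref - g2) / (g1 - g2)))

def quantize_db(snr_db, step_db=1.0):
    """Index of the quantization cell for an SNR given in dB."""
    return math.floor(snr_db / step_db + 0.5)

def lookup_alpha(table, snr1_db, snr2_db, default=0.5):
    """Read the cross-fading factor stored for a quantized SNR pair."""
    return table.get((quantize_db(snr1_db), quantize_db(snr2_db)), default)

# Hypothetical table for one sub-band, keyed by (first-band SNR, sub-band SNR) in dB.
table = {(10, 3): reference_alpha(0.8, 0.4, 0.6)}
alpha_j = lookup_alpha(table, 10.2, 2.8)
combined = cross_fade([0.8], [0.4], [alpha_j])
```

In this sketch an SNR pair that was never seen in training falls back to a fixed default; the interpolation between neighbouring table entries mentioned above would be an alternative.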

Abstract

It is disclosed to estimate at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal; wherein the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band; and to perform a signal enhancement processing on the second signal component at least based on the at least one parameter.

Description

Signal Enhancement Processing
FIELD OF THE DISCLOSURE
This invention relates to the field of signal enhancement processing of audio signals.
BACKGROUND
The quality of today's telephone speech was designed to achieve a sufficient intelligibility. The acoustic bandwidth in telephony systems is typically limited to the frequency range between 300 Hz and 3.4 kHz. However, this typical "telephone sound" cannot satisfy the increased demands as the perceived speech quality is considerably reduced compared to the full audio bandwidth. As a reasonable compromise, various wideband (50 Hz - 7 kHz) speech codecs have been developed in the past (e.g., the Adaptive Multi-Rate (AMR) Wideband Codec) and are about to be introduced in current mobile networks. Most of these codecs are mainly designed for nearly noise-free input speech signals and may not perform well when the input signal is degraded by acoustic background noise. In order to improve the listening comfort and to keep the high quality capability also in noisy environments, noise suppression techniques may be required for wideband communication systems.
When a speech communication device is used in environments with high levels of ambient noise, the noise picked up by the microphone may significantly impair the quality and the intelligibility of the transmitted speech signal. In order to get a reliable separation from the noise signal (e.g., engine noise, street noise), noise reduction algorithms have become part of many digital speech coding and speech processing systems. They are used for example in mobile communications, in hearing aids and in hands-free devices.
One of the popular methods for enhancing degraded speech is based on modeling the noisy input coefficients in the short-time Fourier transform (STFT) domain and applying individual adaptive gains for each frequency bin. On many occasions the processing applied to implement such techniques has been derived for narrowband signals under certain assumptions about the statistics of the speech and noise signals.
SUMMARY OF SOME EMBODIMENTS OF THE INVENTION
A method is described, which comprises estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal, wherein the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band, and performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
Moreover, a first apparatus is described, which comprises means for estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal, wherein the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band, and wherein the first apparatus comprises means for performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
The means of this apparatus can be implemented in hardware and/or software. They may comprise for instance a processor for executing computer program code for realizing the required functions, a memory storing the program code, or both. Alternatively, they could comprise for instance a circuit that is designed to realize the required functions, for instance implemented in a chipset or a chip, like an integrated circuit.
Moreover, a second apparatus is described, which comprises at least one processor and at least one memory including computer program code, the at least one memory and the computer program code, with the at least one processor, configured to cause the apparatus at least to perform the actions of the presented method.
Moreover, a computer readable storage medium is described, in which computer program code is stored. The computer program code causes an apparatus to realize the actions of the presented method when executed by a processor.
The computer readable storage medium could be for example a disk or a memory or the like. As an example, the memory may represent a memory card such as SD and micro SD cards or any other well-suited memory cards or memory sticks. The computer program code could be stored in the computer readable storage medium in the form of instructions encoding the computer-readable storage medium. The computer readable storage medium may be intended for taking part in the operation of a device, like an internal or external hard disk of a computer, or be intended for distribution of the program code, like an optical disc.
As an example, the signal enhancement processing may represent noise reduction processing, or a voice enhancement processing or any other well-suited signal enhancement processing.
The first frequency band differs from the second frequency band. For instance, the first frequency band may span a frequency range from f1 to f2, and the second frequency band may span a frequency range from f3 to f4, wherein f3>f1 and f4>f2 holds. As an example, the first and second frequency band may have an overlapping frequency range, i.e. f3<f2 holds, but as another example, the first and second frequency band may not overlap, i.e. f3>f2 may hold. For instance, the first frequency band may span a frequency range from 50 Hz to 4 kHz, and the second frequency band may span a frequency range from 4 kHz to 7 kHz. But of course, any other well-suited frequency assignments for the first and second frequency range may be applied.
The estimated at least one parameter may represent at least one parameter which may be beneficial to increase the quality of the signal enhancement processing. The type of the at least one parameter may depend on the kind of signal enhancement processing. For instance, the first signal component may be extracted from the audio signal by means of filtering, e.g. by a lowpass filter or a bandpass filter, and the second signal component may be extracted from the audio signal by means of filtering, e.g. by a highpass filter or a bandpass filter.
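The extraction of the two signal components can be illustrated with a small sketch. Note that it swaps in an idealized DFT-domain mask for the lowpass and highpass filters named above, purely to keep the example short. The 16 kHz sampling rate, the 4 kHz split frequency, the test tones and all function names are assumptions.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def split_bands(x, fs, f_split):
    """Split x into a component below f_split and a component at or above it."""
    n = len(x)
    spec = dft(x)
    low, high = [0j] * n, [0j] * n
    for k in range(n):
        # Frequency of bin k; the upper half of the DFT mirrors the lower half.
        f = min(k, n - k) * fs / n
        if f < f_split:
            low[k] = spec[k]
        else:
            high[k] = spec[k]
    return idft(low), idft(high)

fs, n = 16000, 64  # assumed sampling rate and frame length
x = [math.sin(2 * math.pi * 1000 * t / fs) + 0.5 * math.sin(2 * math.pi * 5000 * t / fs)
     for t in range(n)]
low, high = split_bands(x, fs, 4000.0)  # 1 kHz tone lands in low, 5 kHz tone in high
```

Because the two masks partition the spectrum, the two components add up to the original frame again; practical filters would have finite transition bands instead.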
As an example, at least one parameter of the at least one parameter may represent estimations of signal properties of the second signal component. For instance, these signal properties may comprise statistical information of the spectral characteristics of the second signal component. E.g., at least one parameter of the at least one parameter may represent a set of spectral envelope representatives of the second signal component. Thus, this set of spectral envelope representatives may comprise estimations of representations of the spectral envelope of the second signal component. For instance, this set of spectral envelope representatives may comprise at least one energy value, wherein each of the at least one energy value is associated with one frequency sub-band of the second signal component. Furthermore, at least one parameter of the at least one parameter may represent further information with respect to the second signal component.
For instance, the estimation of the at least one parameter is not based on the second signal component.
Based on this at least one parameter, which has been estimated based on the first signal component, a signal enhancement processing is performed to the second signal component. Thus, spectral dependencies may be exploited between the first signal component and the second signal component in order to perform the signal enhancement on the second signal component. Accordingly, spectral dependencies between the first frequency band and the second frequency band may be used to improve the signal enhancement processing.
For instance, there may be a further signal processing with respect to the at least one parameter before signal enhancement processing is performed on the second signal component.
As an example, the signal enhancement processing on the second signal may be performed by means of at least one filter, e.g. in the time-domain or in the frequency domain. As an example, one filter of this at least one filter may be configured to be adapted to at least one value, i.e., this at least one value may be configured to be used to perform a signal enhancement processing to the second signal component. For instance, this at least one value may represent at least one filter coefficient. Accordingly, as an example, the signal enhancement processing to the second signal component may be performed based on the at least one parameter by means of the at least one filter, wherein the at least one value may be configured to be used to perform a signal enhancement processing at least based on the at least one parameter.
For instance, this at least one value may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component in the time-domain. As an example, this at least one value may represent at least one filter coefficient of a filter configured to filter the second signal component in the time-domain. For instance, this filter may represent a Finite Impulse Response (FIR) filter or an Infinite Impulse Response (IIR) filter. As an example, this at least one value may be determined based on a Minimum Mean Squared Error (MMSE) criterion or a Zero Forcing (ZF) criterion or any other well-suited criterion. In this exemplary case, a signal enhancement processing to the second signal component may be performed by filtering the second signal component by means of the respective filter in the time-domain.
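As a minimal sketch of the time-domain case, the following shows an FIR filter whose coefficients play the role of the "at least one value". The coefficient values are hypothetical placeholders; in the described scheme they would be derived from the estimated parameter, e.g. under an MMSE criterion, which is not shown here.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum_i h[i] * x[n - i], with x = 0 before n = 0."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]

# Hypothetical two-tap averaging filter applied to a short second signal component.
smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```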
Furthermore, as another example, the at least one value may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component in the frequency-domain.
For instance, the second signal component in the frequency domain may be represented by MF spectral coefficients at frequency bin k and frame λ given by:
$Y_2(\lambda,k) = S_2(\lambda,k) + N_2(\lambda,k)$, (1)
where $S_2(\lambda,k)$ and $N_2(\lambda,k)$ may represent the spectral coefficients of the audio and the noise signal of the second signal component, respectively. Correspondingly, the first signal component in the frequency domain may be represented by MF spectral coefficients at frequency bin k and frame λ given by
$Y_1(\lambda,k) = S_1(\lambda,k) + N_1(\lambda,k)$, (2)
where $S_1(\lambda,k)$ and $N_1(\lambda,k)$ may represent the spectral coefficients of the audio and the noise signal of the first signal component, respectively. For the sake of brevity, the frame index λ is omitted in the following.
As an example, a converter may be configured to output MF subcomponents representing the first signal component and a converter may be configured to output MF subcomponents representing the second signal component in the frequency-domain, wherein each subcomponent of the MF subcomponents is associated with one of MF sub-bands in the first and second frequency band.
For instance, this converter may represent a Fourier Transformation, e.g. an FFT, a DFT, an STFT or any other well-suited transformation. The at least one filter may be adapted to the at least one value and may be configured to perform a signal enhancement processing to the second signal component in the frequency domain. Accordingly, the at least one filter may be configured to output an enhanced second signal component in the frequency domain, wherein the at least one filter is configured to perform this signal enhancement based on the at least one parameter.
For instance, the at least one value may comprise at least two gain factors, wherein each of the at least two gain factors is associated with a separate sub-band of the second frequency band. As an example, these at least two gain factors may be determined based on a MMSE criterion or a ZF criterion or any other well-suited criterion. In this exemplary case, a signal enhancement processing to the second signal component may be performed based on weighting a frequency component of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors. Thus, for instance, each of the at least two gain factors is associated with one sub-band of the second frequency band.
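The spectral weighting described above can be sketched as follows. The Wiener rule G = SNR / (1 + SNR) is used here only as one well-known MMSE-type gain; the text leaves the gain rule open. The equal-width mapping of bins to sub-bands, the example values and the function names are assumptions.

```python
def wiener_gain(snr):
    """Wiener gain for a linear (not dB) SNR value."""
    return snr / (1.0 + snr)

def apply_subband_gains(spectrum, gains):
    """Weight each spectral coefficient with the gain of the sub-band it falls in.

    Assumes equal-width, non-overlapping sub-bands that cover all bins.
    """
    n_bins, n_bands = len(spectrum), len(gains)
    if n_bands == 0 or n_bins % n_bands != 0:
        raise ValueError("number of bins must be a multiple of the number of sub-bands")
    width = n_bins // n_bands
    return [gains[k // width] * spectrum[k] for k in range(n_bins)]

snr_per_band = [3.0, 1.0]                       # hypothetical SNR per sub-band
gains = [wiener_gain(s) for s in snr_per_band]  # one gain factor per sub-band
y2 = [1 + 1j, 2 + 0j, 4 + 0j, 0 - 2j]           # noisy coefficients Y2(k)
s2_hat = apply_subband_gains(y2, gains)         # enhanced coefficients
```

The gains are real-valued, so only the magnitude of each coefficient changes while the noisy phase is kept.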
As an example, a signal enhancement processing is performed on the first signal component in order to obtain an enhanced first signal component.
For instance, the signal enhancement processing performed on the first signal component may represent a conventional signal enhancement processing. This signal enhancement processing may be of the same type as the signal enhancement processing performed on the second signal component.
As an example, an enhanced second signal component obtained by signal enhancement processing performed on the second signal component may be combined with the enhanced first signal component to an enhanced audio signal. As an example, the enhanced audio signal may be fed to a further signal processing, e.g., to a speech encoder or any other well-suited processing.
Furthermore, as another example, said performing a signal enhancement processing on the second signal may be based on a combined signal enhancement processing.
For instance, this combined signal enhancement processing may be based on a first signal enhancement processing and a second signal enhancement processing, both of the first and second signal enhancement processing may be configured to be applied to the second signal component, wherein the first signal enhancement processing is based on the at least one parameter and the second signal enhancement processing is not based on the at least one parameter. For instance, the second signal enhancement processing may represent a conventional signal enhancement processing.
As an example, the first signal enhancement processing may be applied to the second signal component in order to obtain a first enhanced second signal component and the second signal enhancement processing may be applied to the second signal component in order to obtain a second enhanced second signal component, and the first enhanced second signal component and the second enhanced second signal component may be combined to the enhanced second signal component.
Or, as another example, the first signal enhancement processing may be combined with the second signal enhancement processing in order to be applied as combined signal enhancement processing on the second signal component. This example will be explained in more detail in the sequel of this description.
According to one embodiment, a signal enhancement processing is performed on the first signal component in order to obtain an enhanced first signal component, wherein said estimating at least one parameter is performed on basis of the enhanced first signal component.
Thus, the enhanced first signal component may be used to estimate the at least one parameter, which may lead to increased quality of the estimated at least one parameter. In the sequel, it has to be understood, that any determining or estimation or extraction based on the first signal component may be based on the enhanced first signal component.
According to one embodiment, at least one parameter of said at least one parameter represents a set of spectral envelope representatives of the second signal component.
For instance, this set of spectral envelope representatives may be estimations of representations of the spectral envelope of the second signal component. For instance, this set of spectral envelope representatives may comprise at least one energy value, wherein each of the at least one energy value is associated with one frequency sub-band of the second signal component. Furthermore, at least one parameter of the at least one parameter may represent further information with respect to the second signal component.
According to one embodiment, at least one feature is determined based on the first signal component, the at least one feature representing at least one signal parameter being associated with the first signal component, wherein said estimating the at least one parameter is performed on basis of the at least one feature.
It has to be understood, that determining the at least one feature may also be performed based on the enhanced first signal component, as mentioned above.
For instance, at least one feature of the at least one feature may represent a set of spectral envelope representatives of the first signal component. For instance, the at least one feature of the at least one feature may be at least one energy representative, wherein each of the at least one energy representative is associated with the energy of the first signal component. As an example, at least one feature of the at least one feature may be Nc mel-frequency cepstral coefficients (MFCCs). Thus the frequency sub-bands associated with the first frequency band may be equally spaced on the mel scale.
Furthermore, at least one feature of the at least one feature may be associated with further information of the first signal component. For instance, at least one feature of the at least one feature may be associated with the zero-crossing rate (ZCR) of the first signal component. The zero-crossing rate may be the rate of sign-changes along a signal, i.e., the rate at which the signal changes from positive to negative or back.
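The zero-crossing rate feature can be computed per frame as sketched below. Treating a sample value of exactly zero as positive and normalizing by the number of adjacent sample pairs are conventions chosen for this example, not details taken from the text.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    if len(frame) < 2:
        return 0.0
    signs = [1 if s >= 0 else -1 for s in frame]
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return changes / (len(frame) - 1)

zcr = zero_crossing_rate([1.0, -1.0, -1.0, 1.0, 1.0])  # two sign changes over four pairs
```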
As an example, the at least one feature may be denoted as vector x(λ), where λ denotes the time index. For instance, an extractor configured to output the at least one feature may be configured to operate on a frame-by-frame basis, i.e., for instance, X = {x(1), ..., x(λ)} may represent a sequence of feature vectors from the first signal component of frames 1 to λ. For instance, λ designates the current frame.
Thus, for instance, this at least one feature may be any well-suited feature of the first signal component which may be used to exploit a correlation and/or spectral dependencies with a signal parameter of the second signal component in the second frequency band. For instance, an estimator may be configured to estimate the at least one parameter at least based on the at least one feature.
As an example, the estimator may apply an MMSE criterion, a Maximum a Posteriori (MAP) criterion, a Maximum Likelihood (ML) criterion, a ZF criterion or any other well-suited criterion for estimating the at least one parameter at least based on the at least one feature. As an example, the estimator may not apply any information extracted from or based on the second signal component in order to estimate the at least one parameter. Thus, for instance, the at least one parameter is estimated based on features that are determined only from the first signal component.
As an example, the estimator may be configured to perform the estimation based on a Hidden Markov Model (HMM).
According to one embodiment, at least one feature of said at least one feature is a set of spectral envelope representatives of one out of: the first signal component, and the enhanced first signal component.
For instance, the at least one feature of said at least one feature may be represented by vector $x = \{x(0), \ldots, x(M_F' - 1)\}$, wherein $x(j)$ with $j \in \{0, \ldots, M_F' - 1\}$ may be a representative of the energy of the first signal/enhanced first signal in the jth sub-band of a set of $M_F' - 1$ frequency sub-bands of the first frequency band. As an example, the sub-bands of this set of $M_F' - 1$ frequency sub-bands of the first frequency band may overlap in the frequency range.
According to one embodiment, said estimating at least one parameter is further based on at least one previous feature which has been previously determined based on the first signal component.
As an example, the estimating may be based on sequence $X = \{x(\lambda - m), \ldots, x(\lambda - 1), x(\lambda)\}$, wherein $x(\lambda)$ may represent the current at least one feature and $x(\lambda - m), \ldots, x(\lambda - 1)$ may represent the $m$ previous at least one features. Thus, in this case $m \geq 1$ holds, $m$ representing an integer.
As an example, under the assumption of applying an MMSE estimation, the estimated at least one parameter, exemplarily denoted as vector $\hat{y}$, may be estimated by means of the MMSE criterion for estimation of the parameter vector $y$:
Figure imgf000015_0001
Equation (3) may be solved by the conditional expectation $\hat{y}_{\mathrm{MMSE}} = E\{y \mid X\}$.
Of course, any other well-suited criterion may be used for estimating the at least one parameter based on sequence $X = \{x(\lambda - m), \ldots, x(\lambda - 1), x(\lambda)\}$.
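As a worked sketch of why an MMSE criterion leads to a conditional expectation, the usual form of such a criterion may be assumed. The exact notation of equation (3) is not reproduced here, so the first line below is an assumption; the auxiliary variable $\tilde{y}$ is introduced only for this illustration.

```latex
\begin{align*}
\hat{y}_{\mathrm{MMSE}}
  &= \arg\min_{\tilde{y}} \; E\{\, \lVert y - \tilde{y} \rVert^{2} \mid X \,\} \\
\frac{\partial}{\partial \tilde{y}} \,
  E\{\, \lVert y - \tilde{y} \rVert^{2} \mid X \,\}
  &= -2 \,\bigl( E\{ y \mid X \} - \tilde{y} \bigr) = 0 \\
\Longrightarrow \quad \hat{y}_{\mathrm{MMSE}} &= E\{ y \mid X \}
\end{align*}
```

Setting the gradient of the conditional mean square error to zero thus yields the conditional mean of the parameter vector given the observed feature sequence.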
It is now assumed, as an example, that at least one parameter of the at least one parameter is a set of spectral envelope representatives of the second signal component. For instance, the estimated at least one parameter may be represented by vector $\hat{y} = \{\hat{y}(0), \ldots, \hat{y}(M_F' - 1)\}$, wherein $\hat{y}(j)$ with $j \in \{0, \ldots, M_F' - 1\}$ may represent the estimated energy of the second signal in the jth sub-band of a set of $M_F' - 1$ frequency sub-bands of the second frequency band. As an example, the sub-bands of this set of $M_F' - 1$ frequency sub-bands of the second frequency band may overlap in the frequency range.
According to one embodiment, at least one parameter of said at least one parameter is a set of spectral envelope representatives of the second signal component, and said estimating at least one parameter is further based on a codebook, the codebook comprising a plurality of sets of spectral envelope representatives associated with the second signal component.
For instance, this codebook may represent a precomputed codebook $C = \{\hat{y}_1, \ldots, \hat{y}_{M_C}\}$ for the parameter vectors $y$ comprising $M_C$ entries. For instance, this codebook may be obtained with the LBG algorithm presented by Y. Linde, A. Buzo, R.M. Gray in "An algorithm for vector quantizer design", IEEE Transactions on Communications, vol. 28, no. 1, pp. 84-95, Jan. 1980. Of course, any other well-suited codebook may also be used, e.g. depending on the type of signal enhancement processing.
As an example, using the codebook $C$, the MMSE estimate may be expressed as:
$$\hat{y}_{\mathrm{MMSE}} = \sum_{i=1}^{M_C} \hat{y}_i \cdot P(\hat{y}_i \mid X) \qquad (4)$$
which may represent a weighted sum over the $M_C$ centroids of the codebook $C$. The weights $P(\hat{y}_i \mid X)$ represent a posteriori probabilities based on the determined sequence $X$. Thus, according to equation (4), an a posteriori probability based on the determined sequence $X$ is determined for each of the plurality of sets of spectral envelope representatives of the codebook, and a weighted sum over the plurality of sets of spectral envelope representatives is determined, wherein each set of spectral envelope representatives $\hat{y}_i$ is weighted by the respective a posteriori probability $P(\hat{y}_i \mid X)$.
According to one embodiment, said estimating comprises: Determining for each of the plurality of sets of spectral envelope representatives of the codebook an a posteriori probability at least based on the at least one feature, and determining a weighted sum over the plurality of sets of spectral envelope representatives, wherein each set of spectral envelope representatives of the plurality of sets of spectral envelope representatives is weighted by the respective a posteriori probability.
For instance, the respective a posteriori probability may be determined based on an offline training phase. Equation (4) shows an exemplary determining of the weighted sum over the plurality of sets of spectral envelope representatives.
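The weighted sum over the codebook entries may be sketched as follows. The sketch assumes that the a posteriori probabilities have already been obtained (for instance from a trained model, which is not shown); the function name `mmse_estimate` and the renormalisation of the weights are assumptions made for this illustration.

```python
def mmse_estimate(codebook, posteriors):
    """Weighted sum of codebook centroids.

    codebook:   list of centroid vectors (lists of equal length)
    posteriors: one a posteriori probability per centroid
    """
    if len(codebook) != len(posteriors):
        raise ValueError("one posterior per codebook entry is required")
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    dim = len(codebook[0])
    estimate = [0.0] * dim
    for centroid, p in zip(codebook, posteriors):
        weight = p / total  # assumption: renormalise so the weights sum to one
        for j in range(dim):
            estimate[j] += weight * centroid[j]
    return estimate
```

With two centroids `[1.0, 2.0]` and `[3.0, 6.0]` and posteriors `0.25` and `0.75`, the estimate lies three quarters of the way from the first centroid to the second.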
According to one embodiment, determining the at least one a posteriori probability is based on a Hidden Markov Model (HMM). This HMM may be trained during an offline training phase. As an example, the HMM techniques described by B. Geiser, H. Taddei and P. Vary in "Artificial Bandwidth Extension without Side Information for ITU-T G.729.1," in Proceedings of European Conference on Speech Communication and Technology (INTERSPEECH), Antwerp, Belgium, August 2007 may be applied.
According to one embodiment, said performing a signal enhancement processing on the second signal is based on a combined signal enhancement processing comprising:
Determining a first set of at least one value configured to be used to perform a signal enhancement processing on the second signal component at least based on the at least one parameter, determining a second set of at least one value configured to be used to perform a signal enhancement processing on the second signal component not based on the at least one parameter, and combining the first set of at least one value and the second set of at least one value.
As an example, the combined set of at least one value may be used as the at least one value explained with respect to the at least one filter. Thus, the combined set of at least one value may be used to perform a signal enhancement processing on the second signal component, e.g. in time-domain or in the frequency domain. For instance, the first signal enhancement method and the second signal enhancement method may be combined in order to provide an increased signal enhancement of the second signal component.
For instance, the first set of at least one value may comprise $M_F'$ values and the second set of at least one value may comprise $M_F'$ values, and a combiner may be configured to combine one value of the first set of at least one value with one value of the second set of at least one value to one combined value of a set of at least one combined value, wherein the set of at least one combined value comprises $M_F'$ combined values.
For instance, the at least one value of the first set of at least one value may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component, as explained above. Determining this first set of at least one value may depend on the applied signal enhancement processing.
For instance, the at least one value of the second set of at least one value may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component, as explained above. Determining this second set of at least one value may depend on the applied signal enhancement processing.
For instance, the combining may be performed adaptively. As an example, this adaptively combining may be performed on the basis of determined signal and/or noise parameters.
According to one embodiment, the first set of at least one value represents a first set of at least two gain factors and said second set of at least one value represents a second set of at least two gain factors, wherein each of the at least two gain factors of the first and second set is associated with a sub-band frequency component of the second signal component.
As an example, the first set of at least two gain factors may be configured to be used to perform a spectral weighting of the second signal based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors of the first set. Further, as an example, the second set of at least two gain factors may be configured to be used to perform a spectral weighting of the second signal based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors of the second set.
As an example, an SNR estimator may be configured to estimate at least one SNR representative based on the at least one parameter and the second signal component. This at least one SNR representative is associated with the second signal component. For instance, this at least one SNR representative may comprise a plurality of SNR representatives, wherein each of the plurality of SNR representatives is associated with one sub-band of a plurality of frequency sub-bands of the second frequency band. As an example, the plurality of frequency sub-bands may comprise $M_F' - 1$ frequency sub-bands of the second frequency band, and, for instance, the sub-bands of this set of $M_F' - 1$ frequency sub-bands of the second frequency band may overlap in the frequency range. As an example, the plurality of SNR representatives may comprise a set of $M_F' - 1$ a priori SNR values $\xi_2(j) = S_2(j) / N_2(j)$, wherein $S_2(j)$ represents one of the at least one parameter.
For instance, an entity may be configured to estimate the noise powers $N_2(j)$ with $j \in \{0, \ldots, M_F' - 1\}$. As an example, the estimation of the jth noise power may be written as
Figure imgf000021_0002
wherein $|Y_2(j)|^2$ represents the energy of the audio signal in the jth sub-band of the second frequency band.
Furthermore, as an example, the plurality of SNR representatives may further comprise a set of $M_F' - 1$ a posteriori SNR values $\gamma_2(j) = |Y_2(j)|^2 / N_2(j)$.
In this exemplary case, the first set of at least two gain factors is determined at least based on the plurality of SNR representatives. For instance, the at least two gain factors of the first set may be represented by $G_{bwe}(j)$ with $j \in \{0, \ldots, M_F' - 1\}$, wherein $G_{bwe}(j)$ represents the gain factor associated with the jth sub-band of the $M_F' - 1$ frequency sub-bands of the second frequency band. As an example, under the exemplary assumption that the signal enhancement processing is a noise reduction processing, $G_{bwe}(j)$ may be calculated as
Figure imgf000022_0001
or $G_{bwe}(j)$ may be calculated based on $\gamma_2(j)$ and $\xi_2(j)$, e.g. as described by Y. Ephraim and D. Malah in "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, December 1984. For instance, the entity may be configured to determine the first set of at least two gain factors at least based on the plurality of SNR representatives based on any other well-suited method or as described by J. S. Lim and A. V. Oppenheim in "Enhancement and Bandwidth Compression of Noisy Speech", Proceedings of the IEEE, vol. 67, no. 12, pp. 1585-1604, December 1979.
Of course, depending on the type of signal enhancement processing, another calculation of gain factors $G_{bwe}(j)$ may be applied.
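The chain from sub-band energies to gain factors may be sketched as follows. The sketch takes the noise powers as given and, as an assumption, uses the well-known Wiener rule (gain equal to the a priori SNR divided by one plus the a priori SNR) as one possible noise reduction gain; the gain rule of the specification itself is not reproduced here, and the function names are hypothetical.

```python
def a_priori_snr(estimated_energy, noise_power):
    """Per sub-band ratio of estimated signal energy to noise power."""
    return [s / n for s, n in zip(estimated_energy, noise_power)]


def a_posteriori_snr(noisy_energy, noise_power):
    """Per sub-band ratio of observed (noisy) energy to noise power."""
    return [y / n for y, n in zip(noisy_energy, noise_power)]


def wiener_gains(xi):
    """Assumed gain rule: xi / (1 + xi) for each a priori SNR value xi."""
    return [x / (1.0 + x) for x in xi]
```

A sub-band with a high a priori SNR receives a gain close to one and is left almost unchanged, while a sub-band dominated by noise is attenuated.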
The first set of at least one value may be associated with a first signal enhancement processing which is based on the at least one parameter determined on the basis of the first signal component.
Furthermore, for instance, the second set of at least two gain factors may be configured to be used to perform a spectral weighting of the second signal based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors of the second set. For instance, the at least two gain factors of the second set may be represented by $G_{com}(j)$ with $j \in \{0, \ldots, M_F' - 1\}$, wherein $G_{com}(j)$ represents the gain factor associated with the jth sub-band of the $M_F' - 1$ frequency sub-bands of the second frequency band.
As an example, the second set of at least one value may be associated with a second signal enhancement processing which is not based on the at least one parameter determined on basis of the first signal component. For instance, this second signal enhancement processing may represent a conventional signal enhancement processing.
As an example, under the assumption that the signal enhancement processing represents a noise reduction processing, an entity may be configured to perform a noise power estimation, e.g. as explained by R. Martin in "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 501-512, 2001, and to perform SNR estimation, e.g. as mentioned above, and to calculate the second set of at least one value, e.g., as mentioned above with respect to calculating the first set of at least one value.
According to one embodiment, the sub-bands associated with the gain factors are associated with a first frequency resolution, wherein the combined first set of at least two gain factors and second set of at least two gain factors are expanded to a second frequency resolution, the second frequency resolution being higher than the first frequency resolution.
As an example, the sub-bands associated with the gain factors, which are associated with a first frequency resolution, may overlap in the frequency range.
For instance, as explained with respect to equation (1), the second signal component in the frequency domain may be represented by $M_F$ spectral coefficients. Furthermore, with respect to the signal processing explained with respect to the $M_F'$ sub-bands throughout this specification, $M_F > M_F'$ may hold, i.e., the signal processing explained with respect to the $M_F'$ sub-bands throughout this specification may represent a signal processing performed with the first frequency resolution, which is a lower frequency resolution than the frequency resolution of the spectral components of the first and second signal components (assuming a frequency domain signal processing). The reduction of frequency resolution may allow for an increased suppression of musical tones and may correspond to the properties of the human auditory system, where the frequency selectivity decreases with higher frequencies. For instance, $M_F$ gain factors of the second set may be determined, being associated with $M_F$ sub-bands of the second frequency band, and an entity may be configured to decrease the frequency resolution from $M_F$ to $M_F'$, i.e. this entity may be configured to output $M_F'$ gain factors. This decrease of frequency resolution may be performed based on combining frequency bins using overlapping Hann windows. Thus, the variance of the gain factors over time may be reduced.
For instance, an entity may be configured to expand the frequency resolution of the combined set of gain factors from $M_F'$ to $M_F$. For instance, this may be performed based on using overlap-add of scaled Hann windows, e.g., the same Hann windows as mentioned above. Based on the frequency expanded combined set of gain factors the combined signal enhancement processing may be performed, e.g. by means of spectral weighting.
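Reducing and expanding the frequency resolution with overlapping Hann windows may be sketched as follows. The window placement is an assumption made for this illustration: the window centres are spread evenly over the bins and adjacent windows overlap by half, so that the windows sum to one at every bin. The function names `hann_weights`, `reduce_resolution` and `expand_resolution` are hypothetical.

```python
import math


def hann_weights(num_bins, num_bands):
    """Return num_bands rows of Hann weights over num_bins bins."""
    if num_bands < 2 or num_bins < num_bands:
        raise ValueError("need 2 <= num_bands <= num_bins")
    half_width = (num_bins - 1) / (num_bands - 1)
    rows = []
    for k in range(num_bands):
        centre = k * half_width
        row = []
        for i in range(num_bins):
            d = abs(i - centre)
            if d < half_width:
                row.append(0.5 * (1.0 + math.cos(math.pi * d / half_width)))
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def reduce_resolution(gains, num_bands):
    """Combine fine-resolution gains into num_bands window-weighted averages."""
    weights = hann_weights(len(gains), num_bands)
    return [
        sum(w * g for w, g in zip(row, gains)) / sum(row) for row in weights
    ]


def expand_resolution(band_gains, num_bins):
    """Overlap-add Hann windows scaled by the band gains."""
    weights = hann_weights(num_bins, len(band_gains))
    return [
        sum(weights[k][i] * band_gains[k] for k in range(len(band_gains)))
        for i in range(num_bins)
    ]
```

Because each reduced gain is an average over several neighbouring bins, isolated outliers in single bins are smoothed, which is in line with the reduced variance mentioned above.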
According to one embodiment, said combining is performed based on at least one cross-fading factor, wherein one of the at least one cross-fading factor combines one value of the first set of at least one value with one value of the second set of at least one value.
As an example, the first set of at least one value may be represented by $G_1(j)$ with $j \in \{0, \ldots, M - 1\}$ and the second set of at least one value may be represented by $G_2(j)$ with $j \in \{0, \ldots, M - 1\}$, wherein $M \geq 1$ holds. For instance, $G_1(j)$ may correspond to $G_{bwe}(j)$, $G_2(j)$ may correspond to $G_{com}(j)$ and $M$ may correspond to $M_F'$. A combiner may be configured to combine the first set of at least one value and the second set of at least one value by means of a set of at least one cross-fading factor, wherein this set of at least one cross-fading factor may be represented by $\alpha(j)$ with $j \in \{0, \ldots, M - 1\}$. For instance, the jth combined value $c(j)$ may be a function of the jth value $G_1(j)$ of the first set of at least one value and the jth value $G_2(j)$ of the second set of at least one value and at least of the jth cross-fading factor $\alpha(j)$.
As an example, the jth combined value $c(j)$ of the set of combined values may be determined as follows:
$$c(j) = \alpha(j) \cdot G_1(j) + (1 - \alpha(j)) \cdot G_2(j) \qquad (7)$$
Any other well-suited combining based on the set of at least one cross-fading factor may also be applied.
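The cross-fading of equation (7) may be sketched as follows for all sub-bands at once. The function name `cross_fade` and the use of plain lists are assumptions made for this illustration.

```python
def cross_fade(g1, g2, alpha):
    """Combine two gain sets per sub-band: alpha * g1 + (1 - alpha) * g2."""
    if not (len(g1) == len(g2) == len(alpha)):
        raise ValueError("all three sequences must have the same length")
    return [a * x + (1.0 - a) * y for x, y, a in zip(g1, g2, alpha)]
```

A cross-fading factor of one selects the value of the first set, a factor of zero selects the value of the second set, and values in between blend the two.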
Thus, for instance, a weighting between the influence of the first signal enhancement processing and the second signal enhancement processing may be performed.
According to one embodiment, each of the at least one cross-fading factor is calculated on the basis of one value of the first set of at least one value, of one value of the second set of at least one value and of one reference value of a set of at least one reference value, the set of at least one reference value representing at least one reference value configured to be used to perform a reference enhanced signal processing on the second signal component.
As an example, the first set of at least one value may be represented by $G_1(j)$ with $j \in \{0, \ldots, M - 1\}$, the second set of at least one value may be represented by $G_2(j)$, and the set of at least one reference value may be represented by $G_r(j)$. The at least one reference value is configured to be used to perform a reference enhanced signal processing on the second signal component. For instance, the at least one reference value may represent reference filter values or weighting factors configured to perform a reference filtering in order to perform the enhanced signal processing on the second signal component. E.g., each of the at least one reference value may be determined during a training process for each frame. The method of determining each of the at least one reference value may depend on the signal enhancement processing. For instance, during this training process reference signal parameters may be determined in order to obtain the at least one reference value.
For instance, a cross-fading factor $\alpha(j)$ of the set of at least one cross-fading factor may be determined based on the respective value $G_1(j)$ of the first set of at least one value, based on the respective value $G_2(j)$ of the second set of at least one value and based on the respective reference value $G_r(j)$ of the set of at least one reference value. For instance, this may be performed as follows:
Figure imgf000027_0001
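As a worked sketch of how a cross-fading factor can follow from the three values, one may assume that the cross-faded combination of equation (7) is required to reproduce the reference value exactly. This requirement is an assumption made for the illustration; the equation of the specification is not reproduced here and may differ, e.g. by limiting the result to a given range.

```latex
\begin{align*}
G_r(j) &= \alpha(j)\, G_1(j) + \bigl(1 - \alpha(j)\bigr)\, G_2(j) \\
G_r(j) - G_2(j) &= \alpha(j)\, \bigl( G_1(j) - G_2(j) \bigr) \\
\alpha(j) &= \frac{G_r(j) - G_2(j)}{G_1(j) - G_2(j)},
  \qquad G_1(j) \neq G_2(j)
\end{align*}
```

Under this assumption, the factor equals one when the reference value coincides with the value of the first set and zero when it coincides with the value of the second set.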
As an example, assuming that the signal enhancement processing represents a noise reduction on the audio signal as mentioned above, the at least one reference value $G_r(j)$ may represent at least one reference weighting factor which may be determined from a clean audio and noise signal. For instance, this determining may be performed on a reference a posteriori SNR $\gamma_r(j) = |Y_2(j)|^2 / |N_2(j)|^2$ and/or on a reference a priori SNR $\xi_r(j) = |S_2(j)|^2 / |N_2(j)|^2$, wherein $|N_2(j)|^2$ may represent the noise power of the jth sub-band of the second signal component and wherein $|S_2(j)|^2$ represents the power of the jth sub-component of the second signal, associated with the jth sub-band.
According to one embodiment, said at least one cross-fading factor is determined based on a plurality of reference cross-fading factors and estimations of signal parameters associated with the first and second frequency band, wherein the plurality of reference cross-fading factors has been determined during a training process, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with at least one signal parameter of the first frequency band and at least one signal parameter of the second frequency band.
For instance, during a training process, which may be performed for every frame $\lambda$ and/or for every sub-band of the j sub-bands of the second frequency band, a plurality of reference cross-fading factors may be determined, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with a respective at least one signal parameter of the first frequency band and a respective at least one signal parameter of the second frequency band.
The at least one signal parameter of the first frequency band and the at least one signal parameter of the second frequency band may represent any well-suited signal parameter associated with the first signal component and the second signal component.
As an example, based on training data, a look-up table for the estimation of cross-fading values $\alpha(j)$ may be generated for each of the j sub-bands. For instance, $p_1$ may represent the signal parameter of the first frequency band and $p_2$ may represent the signal parameter of the second frequency band, wherein $p_1$ and $p_2$ are determined for each cross-fading value $\alpha(j)$ obtained during a training process based on training data. $p_1$ and $p_2$ may be quantized, and the final look-up table may provide one reference cross-fading value $\alpha_r(j)$ for each quantized combination of $p_1$ and $p_2$.
For instance, $p_1$ may represent a first SNR associated with the first frequency band and $p_2$ may represent a second SNR associated with the second frequency band.
As an example, the at least one signal parameter of the first frequency band may represent the averaged SNR $\xi_1$ of the first signal component in the first frequency band, and the at least one signal parameter of the second frequency band may represent the reference a priori SNR $\xi_r(j)$ of the respective jth sub-band.
Figure imgf000030_0001
For instance, as a non-limiting example, a first reference cross-fading value $\alpha_r(j)$ is obtained in a training process, e.g. for every frame $\lambda$ and/or for every sub-band j, e.g. as mentioned above. In addition, the reference a priori SNR $\xi_r(j)$ of the respective jth sub-band and the averaged SNR $\xi_1$ of the first signal component in the first frequency band may be determined. On this basis, the plurality of reference cross-fading values $\alpha_r(j)$ may be determined, being associated with the respective reference a priori SNR $\xi_r(j)$ of the respective sub-band and the respective averaged SNR $\xi_1$ of the first signal component in the first frequency band.
As an example, based on training data, a look-up table for the estimation of cross-fading values $\alpha(j)$ may be generated for each of the j sub-bands. Therefore, $\xi_r(j)$ and $\xi_1$ may be quantized (e.g., 1 dB step size or another well-suited step size) and the associated reference cross-fading values $\alpha_r(j)$ of the j sub-bands define the at least one reference cross-fading value. Thus, the final look-up table may provide one reference cross-fading value $\alpha_r(j)$ for each quantized combination of $\xi_r(j)$ and $\xi_1$.
Accordingly, the first and second signal enhancement processing can be combined in an adaptive way based on the at least one reference cross-fading value.
For instance, a cross-fading value $\alpha(j)$ may be determined by means of interpolation between at least two reference cross-fading values.
Furthermore, as an example, the respective SNR estimates of the SNR associated with the first frequency band and the SNR associated with the second frequency band may be determined based on the above mentioned conventional techniques.
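A look-up table of reference cross-fading values indexed by two quantized SNRs may be sketched as follows for a single sub-band. Several choices here are assumptions made for the illustration and are not taken from the specification: the SNRs are given as linear power ratios and quantized on a dB grid, training samples that fall into the same cell are averaged, and the hypothetical parameter `default` is returned for cells that were never observed in training.

```python
import math
from collections import defaultdict


def quantize_db(snr_linear, step_db=1.0):
    """Map a linear SNR to an integer index on a dB grid with the given step."""
    return round(10.0 * math.log10(snr_linear) / step_db)


def build_table(samples, step_db=1.0):
    """Build the table from (snr_band1, snr_band2, alpha_ref) training samples.

    Assumption: samples landing in the same cell are averaged.
    """
    cells = defaultdict(list)
    for p1, p2, alpha in samples:
        key = (quantize_db(p1, step_db), quantize_db(p2, step_db))
        cells[key].append(alpha)
    return {key: sum(values) / len(values) for key, values in cells.items()}


def lookup(table, p1, p2, step_db=1.0, default=0.5):
    """Return the stored value for the quantized SNR pair, or default."""
    key = (quantize_db(p1, step_db), quantize_db(p2, step_db))
    return table.get(key, default)
```

In use, one such table would be built per sub-band, and `lookup` would be called in every frame with the current SNR estimates of the two frequency bands.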
According to one embodiment, said signal enhancement processing represents noise reduction processing.
The features of the present invention and of its exemplary embodiments as presented above shall also be understood to be disclosed in all possible combinations with each other.
It is to be noted that the above description of embodiments of the present invention is to be understood to be merely exemplary and non-limiting.
Further aspects of the invention will be apparent from and elucidated with reference to the detailed description presented hereinafter.
BRIEF DESCRIPTION OF THE FIGURES
The figures show:
Fig. 1a is a schematic block diagram which illustrates a first exemplary apparatus;
Fig. 1b is a schematic block diagram which illustrates a second exemplary apparatus;
Fig. 2a is a flow chart illustrating a first exemplary method;
Fig. 2b is a flow chart illustrating a second exemplary method;
Fig. 3a is a schematic block diagram which illustrates a first exemplary filtering;
Fig. 3b is a schematic block diagram which illustrates a second exemplary filtering;
Fig. 4a is a schematic block diagram which illustrates an exemplary entity configured to estimate the at least one parameter;
Fig. 4b is a schematic block diagram which illustrates a first exemplary determining of a first set of at least one value configured to be used to perform a signal enhancement processing on the second signal component;
Fig. 4c is a schematic block diagram which illustrates a first exemplary entity configured to determine a second set of at least one value configured to be used to perform a signal enhancement processing on the second signal component;
Fig. 4d is a schematic block diagram which illustrates a second exemplary entity configured to determine a second set of at least one value configured to be used to perform a signal enhancement processing on the second signal component;
Fig. 5a is a schematic block diagram which illustrates an exemplary combiner; and
Fig. 5b is a schematic block diagram which illustrates a second exemplary combiner.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
In the following detailed description, exemplary embodiments of the present invention will be described in the context of exemplary methods and apparatuses.
Figure 1a is a schematic block diagram which illustrates a first exemplary apparatus. This first exemplary apparatus will be described in conjunction with the flow chart of a first exemplary method depicted in figure 2a.
The method comprises estimating 210 at least one parameter 150 based on a first signal component 130 of an audio signal 101, the at least one parameter 150 being associated with a second signal component 140 of the audio signal 101, and the method comprises performing 220 a signal enhancement processing on the second signal component 140 at least based on the at least one parameter.
As an example, the signal enhancement processing may represent noise reduction processing, or a voice enhancement processing or any other well-suited signal enhancement processing.
The first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band. The first frequency band differs from the second frequency band.
For instance, the first frequency band may span a frequency range from f1 to f2, and the second frequency band may span a frequency range from f3 to f4, wherein f3 > f1 and f4 > f2 hold. As an example, the first and second frequency band may have an overlapping frequency range, i.e. f3 < f2 holds, but as another example, the first and second frequency band may not overlap, i.e. f3 > f2 holds. For instance, the first frequency band may span a frequency range from 50 Hz to 4 kHz, and the second frequency band may span a frequency range from 4 to 7 kHz. But of course, any other well-suited frequency assignments for the first and second frequency range may be applied.
The estimated at least one parameter may represent at least one parameter which may be beneficial to increase the quality of the signal enhancement processing. The type of the at least one parameter may depend on the kind of signal enhancement processing.
For instance, the first signal component 130 may be extracted from the audio signal 101 by means of filtering, e.g. by a lowpass filter or a bandpass filter, and the second signal component 140 may be extracted from the audio signal 101 by means of filtering, e.g. by a highpass filter or a bandpass filter. Such a filtering is depicted in the exemplary systems depicted in figures 1c and 1d, respectively, wherein the audio signal 101 is filtered by first filter 103 in order to output the first signal component 130 and filtered by second filter 104 in order to output the second signal component 140. For instance, the first and second filters 103, 104 may further be configured to perform a downsampling. As an example, the first filter 103 and the second filter 104 may represent a 2-channel Finite Impulse Response (FIR) Quadrature Mirror Filter (QMF) bank.
Entity 110 is configured to estimate the at least one parameter 150 at least based on the first signal component 130. For instance, this estimation is not based on the second signal component 140.
As an example, at least one parameter of the at least one parameter 150 may be estimations of signal properties of the second signal component 140. For instance, these signal properties may comprise statistical information of the spectral characteristics of the second signal component. E.g., at least one parameter of the at least one parameter may represent a set of spectral envelope representatives of the second signal component. Thus, this set of spectral envelope representatives may be estimations of representations of the spectral envelope of the second signal component 140. For instance, this set of spectral envelope representatives may comprise at least one energy value, wherein each of the at least one energy value is associated with one frequency sub-band of the second signal component. Furthermore, at least one parameter of the at least one parameter may represent further information with respect to the second signal component.
Based on this at least one parameter 150, which has been estimated based on the first signal component 130, a signal enhancement processing is performed on the second signal component 140, as indicated by entity 120 in figure 1a, wherein entity 120 is configured to output an enhanced second signal component 142. For instance, in case the signal enhancement processing represents a noise reduction processing, the enhanced second signal component 142 may represent a noise reduced second signal component 142 or the weighting gains to perform the noise reduction in the frequency domain.
Thus, spectral dependencies may be exploited between the first signal component 130 and the second signal component 140 in order to perform the signal enhancement processing on the second signal component 140. Accordingly, spectral dependencies between the first frequency band and the second frequency band may be used to improve the signal enhancement processing.
The dashed arrow depicted in figure 1a illustrates that there may be a further signal processing with respect to the at least one parameter 150 before signal enhancement processing to the second signal component 140 is performed.
For instance, entity 120 may comprise at least one filter 145 configured to perform a signal enhancement processing to the second signal component 140, as depicted in figure 1b. As an example, one filter 145 of this at least one filter may be configured to be adapted to at least one value 150', i.e. this at least one value 150' may be configured to be used to perform a signal enhancement processing to the second signal component 140. For instance, this at least one value 150' may represent at least one filter coefficient. Accordingly, as an example, the signal enhancement processing to the second signal component 140 may be performed based on the at least one parameter 150 by means of filter 145, wherein the at least one value 150' may be determined at least based on the at least one parameter 150.
For instance, this at least one value 150' may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component 140 in the time-domain. As an example, this at least one value may represent at least one filter coefficient of a filter configured to filter the second signal component 140 in the time-domain. For instance, this filter may represent a Finite Impulse Response (FIR) filter or an Infinite Impulse Response (IIR) filter. As an example, this at least one value may be determined based on a Minimum Mean Squared Error (MMSE) criterion or a Zero Forcing (ZF) criterion or any other well-suited criterion. In this exemplary case, a signal enhancement processing to the second signal component is performed by filtering the second signal component by means of the respective filter in the time-domain.
Figure 1c depicts such a filtering in the time-domain, wherein filter 141 is adapted to the at least one value 150' and configured to perform a signal enhancement processing to the second signal component 140. Accordingly, filter 141 is configured to output an enhanced second signal component 142, wherein filter 141 is configured to perform this signal enhancement based on the at least one parameter 150. Similarly, filter 131 is configured to perform a signal enhancement processing to the first signal component 130 and to output an enhanced first signal component 132. This filter 131 may be adapted to at least one value 116 configured to be used to perform a signal enhancement processing to the first signal component 130. For instance, this at least one value 116 may be determined based on the first signal component 130. The system depicted in figure 1c comprises a third and fourth filter 193, 194 and a combiner 195 configured to combine the enhanced first signal component 132 and the enhanced second signal component 142 to a combined enhanced output audio signal 199 having the same bandwidth as the input audio signal 101. For instance, the third filter and fourth filter 193, 194 may be configured to perform an upsampling of the respective input signal. Furthermore, as an example, the third and fourth filter may represent a FIR QMF bank configured to combine the enhanced first and second signal components 132, 142 to the wideband output signal 199.
Furthermore, as another example, the at least one value 150' may comprise at least one value configured to be used to perform a signal enhancement processing to the second signal component 140 in the frequency-domain. Figure 1d depicts such a filtering in the frequency-domain, wherein converters 170 are configured to perform a conversion from time-domain to frequency-domain. Thus, converters 170 may convert the time domain first and second signal components 130, 140 into frequency domain first and second signal components 130', 140', respectively. As an example, the converter 170 may be configured to output M_F subcomponents representing the respective first and second signal component 130', 140' in the frequency-domain, wherein each subcomponent of the M_F subcomponents is associated with one of M_F sub-bands in the first and second frequency band.
For instance, converter 170 may represent a Fourier Transformation, e.g. an FFT, a DFT, an STFT or any other well-suited transformation. Filter 141' is adapted to the at least one value 150' and configured to perform a signal enhancement processing to the second signal component 140' in the frequency domain. Accordingly, filter 141' is configured to output an enhanced second signal component 142' in the frequency domain, wherein filter 141' is configured to perform this signal enhancement based on the at least one parameter 150.
Similarly, filter 131' is configured to perform a signal enhancement processing to the first signal component 130' in the frequency domain and to output an enhanced first signal component 132' in the frequency domain. This filter 131' may be adapted to at least one value 116 configured to be used to perform a signal enhancement processing to the first signal component 130. For instance, this at least one value 116 may be determined based on the first signal component 130. The system depicted in figure 1d further comprises converters 175 configured to perform a conversion from frequency-domain to time-domain. Thus, converters 175 may convert the enhanced frequency-domain first and second signal components 132', 142' to enhanced time-domain first and second signal components 132, 142, respectively. Of course, the converters 170 and 175 may be re-arranged. For instance, time-domain to frequency-domain conversion may be applied to the audio signal 101 before being filtered by the first and second filter 103, 104, and/or frequency-domain to time-domain conversion may be applied to output signal 199 after being combined by combiner 195. In this exemplary case, the first, second, third and fourth filter 103, 104, 193, 194 and combiner 195 are configured to operate in the frequency domain.
For instance, the at least one value 150' may comprise at least two gain factors, wherein each of the at least two gain factors is associated with a separate sub-band of the second frequency band. As an example, these at least two gain factors may be determined based on a MMSE criterion or a ZF criterion or any other well-suited criterion. In this exemplary case, a signal enhancement processing to the second signal component 140' may be performed based on weighting a frequency component of the second signal 140' associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors. Thus, for instance, each of the at least two gain factors is associated with one sub-band of the second frequency band.
Figure 3a is a schematic block diagram which illustrates a first exemplary filtering in the frequency domain based on the at least one value 150' configured to be used to perform a signal enhancement processing to the second signal component 140' in the frequency-domain. As an example, the second signal component 140' comprises M_F sub-components 301, 302 ... 303 representing the second signal component 140' in the frequency-domain, wherein each subcomponent 301, 302 ... 303 of the M_F subcomponents is associated with one of M_F sub-bands in the second frequency band. The at least two gain factors 150' are represented by M_F gain factors 321, 322 ... 323 which are fed to filter 141'. For instance, a first multiplier 331 multiplies the first subcomponent 301 with the first gain factor 321, a second multiplier 332 multiplies the second subcomponent 302 with the second gain factor 322, and so on until the M_F-th multiplier 333 multiplies the M_F-th subcomponent 303 with the M_F-th gain factor 323. Thus, the filter 141' is configured to output M_F enhanced sub-components 311, 312 ... 313 which have been weighted by the respective gain factors of the M_F gain factors, wherein these M_F enhanced sub-components 311, 312 ... 313 may represent the enhanced second signal component 142'. As an example, the filter 131' depicted in figure 1d may be realized in the same way as filter 141' depicted in figure 3a. Of course, any other weighting implementation may be applied in order to perform the signal enhancement processing to the second (or first) signal component.
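The sub-band weighting described for figure 3a can be sketched in a few lines of Python. This is only an illustration: the function name `apply_subband_gains` and the example values are assumptions and do not appear in the specification.

```python
def apply_subband_gains(subcomponents, gains):
    """Weight each frequency-domain sub-component with its own gain factor.

    subcomponents: list of (complex) spectral values, one per sub-band
    gains:         list of real gain factors, one per sub-band
    """
    if len(subcomponents) != len(gains):
        raise ValueError("one gain factor is needed per sub-component")
    # One multiplication per sub-band, mirroring the row of multipliers.
    return [g * y for y, g in zip(subcomponents, gains)]


# Hypothetical example with three sub-bands
noisy = [1.0 + 1.0j, 0.5 - 0.5j, -2.0 + 0.0j]
gains = [1.0, 0.5, 0.25]
enhanced = apply_subband_gains(noisy, gains)
```

Each output value depends only on the sub-component and gain factor of the same sub-band, so the operation can be carried out independently per sub-band.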
Thus, in the time-domain as well as in the frequency domain a signal enhancement processing on the second signal component 140 may be performed based on the at least one parameter determined on the basis of the first signal component 130.
Figure 1a depicts that the at least one parameter 150 is estimated based on the first signal component 130. This is to be understood, for instance, such that the at least one parameter may be estimated based on the first signal component 130 in the time-domain, or based on the first signal component 130' in the frequency-domain, or based on the enhanced signal component 132 in the time-domain, or based on the enhanced signal component 132' in the frequency domain.
Fig. 2b depicts a flow chart illustrating a second exemplary method based on the first exemplary method.
This second exemplary method comprises performing 205 a signal enhancement processing to the first signal component 130, 130' in order to obtain an enhanced first signal component 132, 132'. For instance, this signal enhancement processing to the first signal component 130, 130' may be performed as explained with respect to figures 1b and 1c, but any other well-suited signal enhancement may also be applied. Then, estimating 210 the at least one parameter is performed based on the enhanced first signal component 132, 132'. Accordingly, the input of entity 110 depicted in figure 1a may be replaced with an enhanced first signal component 132, 132'. Thus, regarding the estimation of the at least one parameter 150, it has to be understood that the term "based on the first signal component" may include "based on the enhanced first signal component".
Fig. 4a depicts a schematic block diagram which illustrates an exemplary entity 110' configured to estimate the at least one parameter 150 based on the first signal component 130. For instance, this exemplary entity 110' may represent the entity 110 depicted in figure 1a.
For instance, the input 430 depicted in figure 4a may represent the first signal component 130, 130' or the enhanced first signal component 132, 132'. In the following, reference is made to the first signal component 130, 130', but the enhanced first signal component 132, 132' may also be used as input.
Entity 110' comprises a unit 410 and an estimator 420. The unit 410 is configured to determine at least one feature 440 based on the first signal component 130, 130' representing at least one signal parameter being associated with the first signal component 130, 130'. At least one feature of the at least one feature 440 may represent a set of spectral envelope representatives of the first signal component 130, 130'. For instance, the at least one feature of the at least one feature may be at least one energy representative, wherein each of the at least one energy representative is associated with the energy of the first signal component or the enhanced first signal component in a frequency sub-band of the first frequency band. As an example, at least one feature of the at least one feature 440 may be N_c mel-frequency cepstral coefficients (MFCCs). In this case, the frequency sub-bands associated with the first frequency band are equally spaced on the mel scale.
Furthermore, at least one feature of the at least one feature may comprise further information of the first signal component 130, 130'. For instance, at least one feature of the at least one feature may be associated with the zero-crossing rate (ZCR) of the first signal component 130, 130' . The zero-crossing rate may be the rate of sign-changes along a signal, i.e., the rate at which the signal changes from positive to negative or back.
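As an illustration of the zero-crossing rate, the following Python sketch counts sign changes between adjacent samples of one frame and normalizes by the number of adjacent sample pairs. The function name and the convention of treating a sample value of exactly zero as non-negative are assumptions made for this sketch.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Hypothetical frame with three sign changes among five adjacent pairs
frame = [0.3, -0.2, -0.1, 0.4, 0.5, -0.6]
zcr = zero_crossing_rate(frame)
```

For the example frame the result is 3/5, since the sign changes between samples 1 and 2, 3 and 4, and 5 and 6.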
As an example, the at least one feature 440 may be denoted as vector x(i), where i denotes the time index. For instance, unit 410 may operate on a frame-by-frame basis, i.e., for instance, X = {x(1), ..., x(λ)} may represent a sequence of feature vectors from the first signal component 130, 130' of frames 1 to λ. For instance, λ designates the current frame.
The estimator 420 is configured to estimate the at least one parameter 150 at least based on the at least one feature 440. For instance, the estimator 420 may be configured to perform the estimation based on a Hidden Markov Model (HMM) .
For instance, the estimator may apply a MMSE criterion, a Maximum a Posteriori (MAP) criterion, a Maximum Likelihood (ML) criterion, a Zero Forcing (ZF) criterion or any other well-suited criterion for estimating the at least one parameter at least based on the at least one feature 440. As an example, the estimator 420 may not apply any information extracted from or based on the second signal component 140, 140' in order to estimate the at least one parameter 150. Thus, for instance, the at least one parameter is estimated based on features that are determined only from the first signal component 130, 130'.
The estimation may be based on at least one feature which has been previously determined by unit 410. As an example, the estimating may be based on the sequence X = {x(λ-m), ..., x(λ-1), x(λ)}, wherein x(λ) may represent the current at least one feature and x(λ-m) ... x(λ-1) may represent the m previous at least one features. Thus, in this case m ≥ 1 holds, m representing an integer.
As an example, under the assumption of applying an MMSE estimation, the at least one estimated parameter 150, exemplarily denoted as vector y, may be estimated by means of the MMSE criterion for estimation of the parameter vector y:
$$\hat{y}_{\mathrm{MMSE}} = \arg\min_{\hat{y}} \; E\{\, \|y - \hat{y}\|^2 \mid X \,\} \qquad (9)$$
For instance, equation (9) may be solved by the conditional expectation $\hat{y}_{\mathrm{MMSE}} = E\{\, y \mid X \,\}$.
It is now assumed, as an example, that at least one parameter of the at least one parameter represents a set of spectral envelope representatives of the second signal component 140, 140'. For instance, the estimated at least one parameter may be represented by vector $\hat{y} = \left( |\hat{S}_2(0)|^2, \ldots, |\hat{S}_2(M_F'-1)|^2 \right)^T$, wherein $|\hat{S}_2(j)|^2$ with $j \in \{0, \ldots, M_F'-1\}$ may represent the estimated energy of the second signal in the jth sub-band of a set of M_F'-1 frequency sub-bands of the second frequency band. As an example, the sub-bands of this set of M_F'-1 frequency sub-bands of the second frequency band may overlap in the frequency range.
Furthermore, the estimation may further be based on a codebook, the codebook comprising a plurality of sets of spectral envelope representatives associated with the second signal component 140. This codebook may represent a precomputed codebook $C = \{\hat{y}_1, \ldots, \hat{y}_{M_C}\}$ for the parameter vectors y comprising M_C entries. For instance, this codebook may be obtained with the LBG algorithm presented by Y. Linde, A. Buzo, R.M. Gray in "An algorithm for vector quantizer design", IEEE Transactions on Communications, vol. 28, no. 1, pp. 84-95, Jan. 1980. Of course, any other well-suited codebook may also be used.
Using the codebook C, the MMSE estimate may be expressed as:
$$\hat{y}_{\mathrm{MMSE}} = \sum_{i=1}^{M_C} \hat{y}_i \cdot P(\hat{y}_i \mid X), \qquad (10)$$
which may represent a weighted sum over the M_C centroids of the codebook C. The weights represent a posteriori probabilities based on the determined sequence X. Thus, according to equation (10), an a posteriori probability based on the determined sequence X is determined for each of the plurality of sets of spectral envelope representatives of the codebook, and a weighted sum over the plurality of sets of spectral envelope representatives is determined, wherein each set of spectral envelope representatives is weighted by the respective a posteriori probability $P(\hat{y}_i \mid X)$.
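The weighted sum over codebook centroids can be sketched as follows. The sketch assumes that the a posteriori probabilities have already been computed elsewhere (for example by an HMM); the function name and the renormalization step are assumptions of this illustration, not part of the specification.

```python
def mmse_codebook_estimate(codebook, posteriors):
    """Posterior-weighted sum of codebook centroids.

    codebook:   list of centroids, each a list of sub-band energies
    posteriors: one a posteriori probability per centroid
    """
    if len(codebook) != len(posteriors):
        raise ValueError("one probability is needed per codebook entry")
    total = sum(posteriors)
    if total <= 0.0:
        raise ValueError("probabilities must have a positive sum")
    # Assumption: renormalize so that the weights sum to one.
    weights = [p / total for p in posteriors]
    dim = len(codebook[0])
    return [
        sum(w * centroid[d] for w, centroid in zip(weights, codebook))
        for d in range(dim)
    ]


# Hypothetical codebook with two centroids and two sub-bands
codebook = [[1.0, 4.0], [3.0, 0.0]]
posteriors = [0.75, 0.25]
y_hat = mmse_codebook_estimate(codebook, posteriors)
```

With these example values the estimate lies between the two centroids, closer to the first one because it carries the larger probability.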
For instance, determining the at least one a posteriori probability may be based on a Hidden Markov Model (HMM). This HMM may be trained during an offline training phase. As an example, the HMM techniques described by B. Geiser, H. Taddei and P. Vary in "Artificial Bandwidth Extension without Side Information for ITU-T G.729.1," in Proceedings of European Conference on Speech Communication and Technology (INTERSPEECH), Antwerp, Belgium, August 2007 may be applied.
Thus, the entity 110' depicted in figure 4a is configured to estimate at least one parameter 150 based on the first signal component 130, 130'. Figure 4b is a schematic block diagram which illustrates a first exemplary determining of a first set of at least one value 450 configured to be used to perform a signal enhancement processing on the second signal component 140, 140'. For instance, this at least one value 450 may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component 140, 140', as explained above. As an example, the at least one value 450 may represent the first set of at least two gain factors configured to be used to perform a spectral weighting of the second signal 140' based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors, as exemplarily depicted in figure 3a or 3b.
As an example, under the assumption that the at least one value 450 represents the first set of at least two gain factors, entity 460 may represent an SNR estimator 460 which is configured to estimate at least one SNR representative 465 based on the at least one parameter 150 and the second signal component 140, 140'. This at least one SNR representative is associated with the second signal component 140, 140'. For instance, this at least one SNR representative may comprise a plurality of SNR representatives, wherein each of the plurality of SNR representatives is associated with one sub-band of a plurality of frequency sub-bands of the second frequency band. As an example, the plurality of frequency sub-bands may comprise M_F'-1 frequency sub-bands of the second frequency band, and, for instance, the sub-bands of this set of M_F'-1 frequency sub-bands of the second frequency band may overlap in the frequency range. As an example, the plurality of SNR representatives may comprise a set of M_F'-1 a priori SNR values $\xi_2(j) = |\hat{S}_2(j)|^2 / \hat{N}_2(j)$ with $j \in \{0, \ldots, M_F'-1\}$, wherein $\hat{N}_2(j)$ may represent the estimated noise power of the jth sub-band of the second signal component and wherein $|\hat{S}_2(j)|^2$ represents one of the at least one parameter 150. For instance, the entity 460 may be configured to estimate the noise powers $\hat{N}_2(j)$ with $j \in \{0, \ldots, M_F'-1\}$. For instance, the estimation of the jth noise power may be written as
[Equation (11): image not reproduced in the source text]
wherein $|Y_2(j)|^2$ represents the energy of the audio signal in the jth sub-band of the second frequency band.
Furthermore, as an example, the plurality of SNR representatives may further comprise a set of M_F'-1 a posteriori SNR values $\gamma_2(j) = |Y_2(j)|^2 / \hat{N}_2(j)$.
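The two kinds of SNR representatives can be illustrated with the customary definitions: the a priori SNR as the ratio of (estimated) clean-signal energy to noise power, and the a posteriori SNR as the ratio of noisy-signal energy to noise power. The following Python sketch assumes that the per-sub-band noise power is already available; the function names and the small floor that avoids division by zero are assumptions of this illustration.

```python
def a_priori_snr(speech_energy_est, noise_power, floor=1e-12):
    """xi(j): estimated clean-signal energy over estimated noise power."""
    return [s / max(n, floor) for s, n in zip(speech_energy_est, noise_power)]


def a_posteriori_snr(noisy_energy, noise_power, floor=1e-12):
    """gamma(j): noisy-signal energy over estimated noise power."""
    return [y / max(n, floor) for y, n in zip(noisy_energy, noise_power)]


# Hypothetical values for two sub-bands of the second frequency band
speech_energy_est = [4.0, 1.0]   # e.g. estimated from the first frequency band
noise_power = [1.0, 2.0]
noisy_energy = [5.0, 3.0]

xi = a_priori_snr(speech_energy_est, noise_power)
gamma = a_posteriori_snr(noisy_energy, noise_power)
```

In this sketch the clean-signal energy would come from the at least one parameter estimated on the basis of the first signal component, while the noisy energy is measured directly on the second signal component.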
In this exemplary case, i.e., under the assumption that the at least one value 450 represents the first set of at least two gain factors, entity 470 may be configured to determine the first set of at least two gain factors at least based on the plurality of SNR representatives. For instance, the at least two gain factors of the first set may be represented by G_hwe(j) with $j \in \{0, \ldots, M_F'-1\}$, wherein G_hwe(j) represents the gain factor associated with the jth sub-band of the M_F'-1 frequency sub-bands of the second frequency band. As an example, under the exemplary assumption that the signal enhancement processing is a noise reduction processing, G_hwe(j) may be calculated as
[Equation (12a): not recoverable from the source text]
or
[Equation (12b): not recoverable from the source text]
or G_hwe(j) may be calculated based on γ_2(j) and ξ_2(j), e.g. as described by Y. Ephraim and D. Malah in "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, December 1984. For instance, entity 470 may be configured to determine the first set of at least two gain factors at least based on the plurality of SNR representatives based on any other well-suited method or as described by J. S. Lim and A. V. Oppenheim in "Enhancement and Bandwidth Compression of Noisy Speech", Proceedings of the IEEE, vol. 67, no. 12, pp. 1585-1604, December 1979.
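As a hedged illustration of how a gain factor could be derived from an a priori SNR value, the following Python sketch uses the Wiener rule G = ξ / (1 + ξ). This rule is chosen here only because it is a common noise reduction weighting; it is not asserted to be the rule given in equation (12a) or (12b). The function name `wiener_gain` and the optional lower bound `g_min` are assumptions.

```python
def wiener_gain(xi, g_min=0.0):
    """Wiener-type gain from an a priori SNR value (linear scale).

    Assumption: g_min is an optional lower bound on the gain, often
    used in practice to limit the attenuation.
    """
    if xi < 0.0:
        raise ValueError("the a priori SNR must be non-negative")
    return max(xi / (1.0 + xi), g_min)


# Hypothetical a priori SNR values for three sub-bands
gains = [wiener_gain(xi) for xi in (4.0, 0.5, 0.0)]
```

The gain approaches one for sub-bands dominated by the wanted signal and approaches zero (or the lower bound) for sub-bands dominated by noise.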
As another example, under the assumption that the at least one value represents at least one filter coefficient of a time-domain filter, the at least one value 465 may comprise a plurality of SNR representatives or other statistical representatives of the second signal component which may be used by entity 470 to determine the at least one filter coefficient.
The first set of at least one value 450 may be associated with a first signal enhancement processing which is based on the at least one parameter 150 determined on basis of the first signal component 130, 130' .
Figure 4c is a schematic block diagram which illustrates a first exemplary entity 480 configured to determine a second set of at least one value 490 configured to be used to perform a signal enhancement processing on the second signal component 140, 140'. For instance, this at least one value 490 may represent at least one filter coefficient, wherein a filter adapted to this at least one filter coefficient may be used to perform the signal enhancement processing on the second signal component 140, 140', as explained above. As an example, the at least one value 490 may represent a second set of at least two gain factors configured to be used to perform a spectral weighting of the second signal 140' based on weighting a sub-band frequency of the second signal associated with a sub-band of the second frequency band with the respective gain factor of the at least two gain factors, as exemplarily depicted in figure 3a or 3b. For instance, the at least two gain factors of the second set may be represented by G_conv(j) with $j \in \{0, \ldots, M_F'-1\}$, wherein G_conv(j) represents the gain factor associated with the jth sub-band of the M_F'-1 frequency sub-bands of the second frequency band.
The second set of at least one value 490 may be associated with a second signal enhancement processing which is not based on the at least one parameter 150 determined on basis of the first signal component 130, 130' . For instance, this second signal enhancement processing may represent a conventional signal enhancement processing.
As an example, under the assumption that the signal enhancement processing represents a noise reduction processing, entity 480 may be configured to perform a noise power estimation, e.g. as explained by R. Martin in "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 501-512, 2001, and to perform SNR estimation, e.g. as mentioned above with respect to figure 4b, and to calculate the set of at least one value 490, e.g. as mentioned above with respect to figure 4b.
Figure 4d is a schematic block diagram which illustrates a second exemplary entity 480' configured to determine a second set of at least one value 490' configured to be used to perform a signal enhancement processing on the second signal component 140' in the frequency domain. For instance, the second signal component 140' in the frequency domain, e.g. determined as explained with respect to figure 1d, may be represented by M_F spectral coefficients at frequency bin k and frame λ given by:
$$Y_2(\lambda,k) = S_2(\lambda,k) + N_2(\lambda,k), \qquad (13)$$
where $S_2(\lambda,k)$ and $N_2(\lambda,k)$ represent the spectral coefficients of the audio and the noise signal. For the sake of brevity, the frame index λ is omitted in the following. Furthermore, M_F > M_F' may hold, i.e., the signal processing explained with respect to the M_F' sub-bands throughout this specification may represent a signal processing performed with a lower frequency resolution compared to the frequency resolution of the first and second signal components 130', 140' outputted by converters 170 depicted in figure 1d. In case the signal enhancement processing comprises noise reduction, the reduction of frequency resolution may allow for an increased suppression of musical tones and may correspond to the properties of the human auditory system where the frequency selectivity decreases with higher frequencies. For instance, entity 480' may be configured to output M_F gain factors being associated with M_F sub-bands of the second frequency band, and an entity 495 is configured to decrease the frequency resolution from M_F to M_F', i.e. the entity 495 may be configured to output M_F' gain factors. This decrease of frequency resolution may be performed based on combining frequency bins using overlapping Hann windows. Thus, the variance of the gain factors may be reduced.
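One way to combine frequency bins with overlapping Hann windows is sketched below in Python. The placement of the window centres (equally spaced), the window width (two centre spacings, giving 50 % overlap) and the normalization of each window to unit sum are assumptions of this sketch; the specification does not fix them. The function names are hypothetical.

```python
import math


def hann_band_weights(num_bins, num_bands):
    """One row of normalized Hann weights per band, defined over all bins."""
    if not 1 <= num_bands <= num_bins:
        raise ValueError("need 1 <= num_bands <= num_bins")
    hop = num_bins / num_bands            # spacing of band centres, in bins
    rows = []
    for b in range(num_bands):
        centre = (b + 0.5) * hop - 0.5    # band centre as a bin position
        row = []
        for k in range(num_bins):
            d = abs(k - centre) / hop     # distance in units of the spacing
            row.append(0.5 * (1.0 + math.cos(math.pi * d)) if d < 1.0 else 0.0)
        total = sum(row)
        rows.append([w / total for w in row])   # unit sum per band
    return rows


def reduce_resolution(bin_gains, num_bands):
    """Combine per-bin gain factors into num_bands band gain factors."""
    weights = hann_band_weights(len(bin_gains), num_bands)
    return [sum(w * g for w, g in zip(row, bin_gains)) for row in weights]


# Hypothetical example: 8 strongly fluctuating bin gains reduced to 4 bands
alternating = [0.0, 1.0] * 4
smoothed = reduce_resolution(alternating, 4)
```

Because each band gain is a weighted average over several neighbouring bins, the fluctuation of the band gains is smaller than that of the bin gains, which illustrates the variance reduction mentioned above.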
For instance, under the assumption that M_F > M_F' holds, the second exemplary filtering depicted in figure 3b may be used for performing the spectral weighting based on the at least one value 150', wherein the at least one value 150' represents at least two gain factors associated with frequency resolution M_F'. Entity 350 may be configured to expand the frequency resolution from M_F' to M_F. For instance, this may be performed based on using overlap-add of scaled Hann windows, e.g. the same Hann windows as mentioned above. Thus, as an example, entity 350 is configured to output M_F gain factors 150'' which are representatives of the inputted M_F' gain factors 150'. The remaining signal processing of the second exemplary filtering corresponds to the first exemplary filtering.
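The expansion from the lower to the higher frequency resolution by overlap-add of scaled Hann windows can be sketched as follows. Each band gain scales one Hann window, the scaled windows are added per bin, and the sum is divided by the sum of the unscaled windows. The window placement and this per-bin normalization (which only matters at the band edges) are assumptions of the sketch, as is the function name.

```python
import math


def expand_resolution(band_gains, num_bins):
    """Spread band gain factors over num_bins bins by overlap-add of
    Hann windows, each scaled by its band gain factor."""
    num_bands = len(band_gains)
    if not 1 <= num_bands <= num_bins:
        raise ValueError("need 1 <= number of bands <= num_bins")
    hop = num_bins / num_bands
    bin_gains = []
    for k in range(num_bins):
        acc = 0.0    # sum of scaled windows at bin k
        norm = 0.0   # sum of unscaled windows at bin k
        for b, gain in enumerate(band_gains):
            centre = (b + 0.5) * hop - 0.5
            d = abs(k - centre) / hop
            if d < 1.0:
                w = 0.5 * (1.0 + math.cos(math.pi * d))
                acc += gain * w
                norm += w
        bin_gains.append(acc / norm)
    return bin_gains


# Hypothetical example: two band gains expanded to four bins
expanded = expand_resolution([0.2, 0.8], 4)
```

In the example the outer bins take the value of the nearest band gain, while the inner bins are interpolated between the two band gains.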
Figure 5a is a schematic block diagram which illustrates an exemplary combiner 510 configured to combine the first set of at least one value 450 and the second set of at least one value 490 to a combined set of at least one value 550. As an example, the combined set of at least one value 550 may be used as the at least one value 150' explained with respect to one of figures 1b, 1c, 1d, 3a, 3b and 4b. Thus, the combined set of at least one value 550 may be used to perform a signal enhancement processing on the second signal component 140, 140', e.g. in time-domain or in the frequency domain. For instance, the first signal enhancement method and the second signal enhancement method may be combined in order to provide an increased signal enhancement of the second signal component 140, 140'.
For instance, the first set of at least one value 450 may comprise M_F' values and the second set of at least one value 490 may comprise M_F' values, and the combiner 510 may be configured to combine one value of the first set of at least one value with one value of the second set of at least one value to one combined value of the set of at least one combined value 550, wherein the set of at least one combined value comprises M_F' combined values. This combining may be performed adaptively. As an example, this adaptive combining may be performed on the basis of determined signal and/or noise parameters.
Figure 5b is a schematic block diagram which illustrates a second exemplary combiner 510' configured to combine the first set of at least one value 450 and the second set of at least one value 490 to a combined set of at least one value 550, wherein the second exemplary combiner 510' is based on the first exemplary combiner 510.
As an example, the first set of at least one value 450 may be represented by G_1(j) with j ∈ {0, ..., M-1} and the second set of at least one value 490 may be represented by G_2(j) with j ∈ {0, ..., M-1}, wherein M > 1 holds. For instance, G_1(j) may correspond to G_hwe(j), G_2(j) may correspond to G_conv(j) and M may correspond to M_F'.
The combiner 510' may be configured to combine the first set of at least one value 450 and the second set of at least one value 490 by means of a set of at least one cross-fading factor, wherein this set of at least one cross-fading factor may be represented by α(j) with j ∈ {0, ..., M-1}. For instance, the jth combined value c(j) may be a function of the jth value G_1(j) of the first set of at least one value 450 and the jth value G_2(j) of the second set of at least one value 490 and at least of the jth cross-fading factor α(j).
Thus, for instance, said combining may be performed based on at least one cross-fading factor, wherein one of the at least one cross-fading factor combines one value of the first set of at least one value 450 with one value of the second set of at least one value 490.
As an example, as exemplarily depicted in figure 5b, the jth combined value c(j) 550' of the set of combined values 550 may be determined as follows:
$$c(j) = \alpha(j) \cdot G_1(j) + (1 - \alpha(j)) \cdot G_2(j) \qquad (14)$$
For example, reference sign 545 may be associated with the set of cross-fading factors α(j), and reference sign 535 may be associated with a second set of cross-fading factors α_2(j), wherein in this exemplary case α_2(j) = 1 - α(j) holds. The multipliers 535 and 540 and the adder 560 may perform the mathematical operations and may be implemented M times.
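The cross-fading of equation (14) amounts to a per-sub-band convex combination of the two sets of values. A minimal Python sketch follows; the function name `cross_fade` and the example values are assumptions.

```python
def cross_fade(g1, g2, alpha):
    """Combine two sets of values per sub-band:
    c(j) = alpha(j) * g1(j) + (1 - alpha(j)) * g2(j)."""
    if not (len(g1) == len(g2) == len(alpha)):
        raise ValueError("all inputs need one entry per sub-band")
    return [a * x + (1.0 - a) * y for x, y, a in zip(g1, g2, alpha)]


# Hypothetical example with two sub-bands
g1 = [0.8, 0.2]        # first set, e.g. based on the estimated parameter
g2 = [0.4, 0.6]        # second set, e.g. from a conventional method
alpha = [1.0, 0.25]    # cross-fading factors
combined = cross_fade(g1, g2, alpha)
```

A cross-fading factor of one selects the value of the first set, a factor of zero selects the value of the second set, and intermediate factors blend the two.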
Of course, the second set of cross-fading factors 535 may be independent from the set of cross-fading factors 545, or there may exist another correlation between the set of cross-fading factors 545 and the second set of cross-fading factors 535.
For instance, each of the at least one cross-fading factor is calculated on the basis of one value of the first set of at least one value 450, of one value of the second set of at least one value 490 and of one reference value of a set of at least one reference value, the set of at least one reference value representing at least one reference value configured to be used to perform a reference enhanced signal processing on the second signal component. As an example, the first set of at least one value 450 may be represented by G_1(j) with j ∈ {0, ..., M-1}, the second set of at least one value 490 may be represented by G_2(j), and the set of at least one reference value may be represented by G_r(j). E.g., each of the at least one reference value may be determined during a training process for each frame. The method of determining each of the at least one reference value may depend on the signal enhancement processing. For instance, during this training process reference signal parameters may be determined in order to obtain the at least one reference value.
For instance, a cross-fading factor α(j) of the set of at least one cross-fading factor may be determined based on the respective value G_1(j) of the first set of at least one value 450, the respective value G_2(j) of the second set of at least one value 490 and the respective reference value G_r(j) of the set of at least one reference value, e.g. as follows:
[Equation image not reproduced in the source text]
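As a hedged illustration of how such a factor could follow from the two values and the reference value, one natural choice is the factor for which the combined value of equation (14) equals the reference value. This is an assumption made for illustration and is not asserted to be the expression used in the specification. Here G_1(j) and G_2(j) denote the values of the first and second set, G_r(j) the reference value and α_r(j) the resulting reference cross-fading factor.

```latex
% Assumption: the combined value is required to equal the reference value.
\alpha_r(j)\,G_1(j) + \bigl(1-\alpha_r(j)\bigr)\,G_2(j) = G_r(j)
\quad\Longrightarrow\quad
\alpha_r(j) = \frac{G_r(j) - G_2(j)}{G_1(j) - G_2(j)},
\qquad G_1(j) \neq G_2(j)

% Worked example with hypothetical values G_1 = 0.8, G_2 = 0.4, G_r = 0.5:
\alpha_r = \frac{0.5 - 0.4}{0.8 - 0.4} = 0.25,
\qquad 0.25 \cdot 0.8 + 0.75 \cdot 0.4 = 0.5
```

Under this assumption the factor lies between zero and one whenever the reference value lies between the two values; otherwise it would have to be limited to that range.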
As an example, assuming that the signal enhancement processing represents a noise reduction on the audio signal as mentioned above, the at least one reference value G_r(j) may represent at least one reference weighting factor which may be determined from a clean audio and noise signal. For instance, this determining may be performed based on a reference a posteriori SNR and/or on a reference a priori SNR $\xi_r(j) = |S_2(j)|^2 / N_2(j)$, wherein $N_2(j)$ may represent the noise power of the jth sub-band of the second signal component and wherein $|S_2(j)|^2$ represents the power of the jth sub-component of the second signal, associated with the jth sub-band.
For instance, during a training process, which may be performed for every frame λ and/or for every sub-band j of the sub-bands of the second frequency band, a plurality of reference cross-fading factors may be determined, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with a first SNR associated with the first frequency band and with a second SNR associated with the second frequency sub-band.
As an example, a first reference cross-fading value α_r(j) is obtained in a training process, e.g. for every frame λ and/or for every sub-band j, e.g. as mentioned above. In addition, the reference a priori SNR ξ_r(j) of the respective jth sub-band and the averaged SNR $\bar{\xi}_1$ of the first signal component in the first frequency band may be determined. Based on training data, a look-up table for the estimation of cross-fading values α(j) may be generated for each of the sub-bands. To this end, ξ_r(j) and $\bar{\xi}_1$ may be quantized (e.g., 1 dB step size or another well-suited step size) and the associated reference cross-fading values α_r(j) of the sub-bands define the at least one reference cross-fading value. Thus, the final look-up table may provide one reference cross-fading value α(j) for each quantized combination of ξ_r(j) and $\bar{\xi}_1$, ξ_r(j) representing the SNR associated with the second frequency band and $\bar{\xi}_1$ representing the SNR associated with the first frequency band. Thus, the SNR associated with the second frequency band may represent an SNR associated with a sub-band of the second frequency band.
Accordingly, the first and second signal enhancement processing can be combined in an adaptive way based on the at least one reference cross-fading value.
A cross-fading value $\alpha(j)$ may be determined by means of interpolation between at least two reference cross-fading values.
Furthermore, the respective SNR estimates of the SNR associated with the first frequency band and the SNR associated with the second frequency band may be determined based on the above mentioned conventional techniques.
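The interpolation between reference cross-fading values and the subsequent combination of the two enhancement results might look as follows. This is a hedged sketch: bilinear interpolation over the two SNR axes and a linear cross-fade of two gain sets are assumed forms, and the names `interpolate_alpha`, `crossfade_gains`, `gain_first` and `gain_second` are hypothetical.

```python
import numpy as np

def interpolate_alpha(lut, snr2_db, snr1_db, lo_db=0.0, step_db=1.0):
    """Bilinear interpolation between the four nearest table entries.

    lut[i2, i1] holds the reference cross-fading value for the grid point
    (lo_db + i2 * step_db, lo_db + i1 * step_db); lut must be at least 2 x 2.
    """
    n2, n1 = lut.shape
    # Fractional grid positions, clipped to the table range
    p2 = float(np.clip((snr2_db - lo_db) / step_db, 0, n2 - 1))
    p1 = float(np.clip((snr1_db - lo_db) / step_db, 0, n1 - 1))
    i2 = min(int(np.floor(p2)), n2 - 2)
    i1 = min(int(np.floor(p1)), n1 - 2)
    w2, w1 = p2 - i2, p1 - i1
    lower = (1 - w1) * lut[i2, i1] + w1 * lut[i2, i1 + 1]
    upper = (1 - w1) * lut[i2 + 1, i1] + w1 * lut[i2 + 1, i1 + 1]
    return float((1 - w2) * lower + w2 * upper)

def crossfade_gains(alpha, gain_first, gain_second):
    """Assumed linear cross-fade between two sets of gain factors."""
    gain_first = np.asarray(gain_first, dtype=float)
    gain_second = np.asarray(gain_second, dtype=float)
    return alpha * gain_first + (1.0 - alpha) * gain_second

# Toy 2 x 2 table covering 0 dB and 1 dB on both SNR axes
lut = np.array([[0.0, 0.2],
                [0.4, 1.0]])
alpha = interpolate_alpha(lut, snr2_db=0.5, snr1_db=0.5)
combined = crossfade_gains(alpha, [0.8, 0.6], [0.4, 0.2])
```

With this form, a cross-fading value of 1 selects the first set of gains alone and a value of 0 selects the second set alone; intermediate values blend the two.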
Furthermore, it is readily clear for a person skilled in the art that the logical blocks in the schematic block diagrams as well as the flowchart and algorithm steps presented in the above description may at least partially be implemented in electronic hardware and/or computer software, wherein it may depend on the functionality of the logical block, flowchart step and algorithm step and on design constraints imposed on the respective devices to which degree a logical block, a flowchart step or algorithm step is implemented in hardware or software. The presented logical blocks, flowchart steps and algorithm steps may for instance be implemented in one or more digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable devices. The computer software may be stored in a variety of computer-readable storage media of electric, magnetic, electro-magnetic or optic type and may be read and executed by a processor, such as for instance a microprocessor. To this end, the processor and the storage medium may be coupled to interchange information, or the storage medium may be included in the processor.
Any presented connection in the described embodiments is to be understood in a way that the involved components are operationally coupled. Thus, the connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
Any of the processors mentioned in this text could be a processor of any suitable type. Any processor may comprise but is not limited to one or more microprocessors, one or more processor(s) with accompanying digital signal processor(s), one or more processor(s) without accompanying digital signal processor(s), one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more controllers, one or more application-specific integrated circuits (ASICs), or one or more computer(s). The relevant structure/hardware has been programmed in such a way to carry out the described function.
Any of the memories mentioned in this text could be implemented as a single memory or as a combination of a plurality of distinct memories, and may comprise for example a read-only memory, a random access memory, a flash memory or a hard disc drive memory etc.
Moreover, any of the actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor. References to 'computer-readable storage medium' should be understood to encompass specialized circuits such as FPGAs, ASICs, signal processing devices, and other devices.
It will be understood that all presented embodiments are only exemplary, that features of these embodiments may be omitted or replaced and that other features may be added. Any mentioned element and any mentioned method step can be used in any combination with all other mentioned elements and all other mentioned method steps, respectively. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto.

Claims

1. A method comprising:
estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal; wherein
the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band; and
performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
2. The method according to claim 1, comprising performing a signal enhancement processing on the first signal component in order to obtain an enhanced first signal component, wherein said estimating at least one parameter is performed on basis of the enhanced first signal component.
3. The method of one of the preceding claims, wherein at least one parameter of said at least one parameter represents a set of spectral envelope representatives of the second signal component.
4. The method according to one of the preceding claims, comprising determining at least one feature based on the first signal component representing at least one signal parameter being associated with the first signal component, wherein said estimating the at least one parameter is performed on basis of the at least one feature.
5. The method according to claim 4, wherein at least one feature of said at least one feature is a set of spectral envelope representatives of one out of:
the first signal component; and
the enhanced first signal component.
6. The method according to one of claims 4 and 5, wherein said estimating at least one parameter is further based on at least one previous feature which has been previously determined based on the first signal component.
7. The method of one of claims 4 to 6, wherein at least one parameter of said at least one parameter represents a set of spectral envelope representatives of the second signal component, and wherein said estimating at least one parameter is further based on a codebook, the codebook comprising a plurality of sets of spectral envelope representatives associated with the second signal component.
8. The method of claim 7, wherein said estimating comprises: determining for each of the plurality of sets of spectral envelope representatives of the codebook an a posteriori probability at least based on the at least one feature; and
determining a weighted sum over the plurality of sets of spectral envelope representatives, wherein each set of spectral envelope representatives of the plurality of sets of spectral envelope representatives is weighted by the respective a posteriori probability.
9. The method of claim 8, wherein determining the at least one a posteriori probability is based on a Hidden Markov Model.
10. The method according to one of claims 1 to 9, wherein said performing a signal enhancement processing on the second signal is based on a combined signal enhancement processing comprising:
determining a first set of at least one value configured to be used to perform a signal enhancement processing on the second signal component at least based on the at least one parameter,
determining a second set of at least one value configured to be used to perform a signal enhancement processing on the second signal component not based on the at least one parameter, and
combining the first set of at least one value and the second set of at least one value.
11. The method according to claim 10, wherein said first set of at least one value represents a first set of at least two gain factors and said second set of at least one value represents a second set of at least two gain factors, wherein each of the at least two gain factors of the first and second set is associated with a sub-band frequency component of the second signal component.
12. The method according to claim 11, wherein the sub-bands are associated with a first frequency resolution, the method comprising expanding the combined first set of at least two gain factors and second set of at least two gain factors to a second frequency resolution, the second frequency resolution being higher than the first frequency resolution.
13. The method according to one of claims 10 to 12, wherein said combining is performed based on at least one cross-fading factor, wherein one of the at least one cross-fading factor combines one value of the at least one value of the first set of at least one value with one value of the second set of at least one value.
14. The method according to claim 13, wherein each of the at least one cross-fading factor is calculated on the basis of one value of the first set of at least one value, of one value of the second set and of at least one reference value of a set of at least one reference value, the set of at least one reference value representing at least one reference value configured to be used to perform a reference enhanced signal processing on the second signal component.
15. The method according to one of claims 13 and 14, comprising determining said at least one cross-fading factor based on a plurality of reference cross-fading factors and estimations of signal parameters associated with the first and second frequency band, wherein the plurality of reference cross-fading factors has been determined during a training process, wherein each reference cross-fading factor of the plurality of reference cross-fading factors is associated with at least one signal parameter of the first frequency band and at least one signal parameter of the second frequency band.
16. The method according to one of the preceding claims, wherein said signal enhancement processing represents noise reduction processing.
17. An apparatus comprising:
means for estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal; wherein the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band; and
means for performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
18. The apparatus according to claim 17, comprising means for performing a signal enhancement processing on the first signal component in order to obtain an enhanced first signal component, wherein said estimating at least one parameter is performed on basis of the enhanced first signal component.
19. The apparatus according to one of claims 17 to 18, wherein the apparatus is one of:
a chip;
an integrated circuit; and
an audio device.
20. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code, with the at least one processor, configured to cause the apparatus at least to perform:
estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal; wherein
the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band; and
performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
21. The apparatus according to claim 20, wherein the at least one memory and the computer program code, with the at least one processor, are configured to cause the apparatus to perform a signal enhancement processing on the first signal component in order to obtain an enhanced first signal component, wherein said estimating at least one parameter is performed on basis of the enhanced first signal component.
22. The apparatus according to one of claims 20 to 21, wherein the apparatus is one of:
a chip;
an integrated circuit; and
an audio device.
23. A computer program code causing an apparatus to perform the following when executed on a processor:
estimating at least one parameter based on a first signal component of an audio signal, the at least one parameter being associated with a second signal component of the audio signal; wherein
the first signal component is associated with a first frequency band and the second signal component is associated with a second frequency band; and
performing a signal enhancement processing on the second signal component at least based on the at least one parameter.
24. A computer readable storage medium in which the computer program code according to claim 23 is stored.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2009/061884 WO2011029484A1 (en) 2009-09-14 2009-09-14 Signal enhancement processing

Publications (1)

Publication Number Publication Date
WO2011029484A1 true WO2011029484A1 (en) 2011-03-17

Family

ID=41820732

Country Status (1)

Country Link
WO (1) WO2011029484A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040024596A1 (en) * 2002-07-31 2004-02-05 Carney Laurel H. Noise reduction system
US20080243496A1 (en) * 2005-01-21 2008-10-02 Matsushita Electric Industrial Co., Ltd. Band Division Noise Suppressor and Band Division Noise Suppressing Method
US20070088541A1 (en) * 2005-04-01 2007-04-19 Vos Koen B Systems, methods, and apparatus for highband burst suppression

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PETER VARY: "Speech Enhancement by Conditional Estimation - Noise Reduction, Error Concealment & Bandwidth Extension, what makes the difference?", PROCEEDINGS OF INTERNATIONAL WORKSHOP ON ACOUSTIC ECHO AND NOISE CONTROL (IWAENC), 30 September 2008 (2008-09-30), Seattle, WA, USA, XP002576239, Retrieved from the Internet <URL:http://www.ind.rwth-aachen.de/de/veroeffentlichungen/download/publabel/vary08a/> [retrieved on 20100331] *
THOMAS ESCH, FLORIAN HEESE, BERND GEISER UND PETER VARY: "Wideband Noise Suppression Supported by Artificial Bandwidth Extension Techniques", PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 15 March 2010 (2010-03-15), Dallas, TX, USA, pages 4790 - 4793, XP002576240, Retrieved from the Internet <URL:http://www.ind.rwth-aachen.de/de/veroeffentlichungen/download/publabel/esch10/> [retrieved on 20100331] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2710590B1 (en) * 2011-05-16 2015-10-07 Google, Inc. Super-wideband noise supression
CN113409802A (en) * 2020-10-29 2021-09-17 腾讯科技(深圳)有限公司 Voice signal enhancement processing method, device, equipment and storage medium
CN113409802B (en) * 2020-10-29 2023-09-15 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for enhancing voice signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09782979

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09782979

Country of ref document: EP

Kind code of ref document: A1