WO2021043412A1 - Noise reduction in a headset by employing a voice accelerometer signal - Google Patents

Noise reduction in a headset by employing a voice accelerometer signal

Info

Publication number
WO2021043412A1
WO2021043412A1 PCT/EP2019/073760 EP2019073760W WO2021043412A1 WO 2021043412 A1 WO2021043412 A1 WO 2021043412A1 EP 2019073760 W EP2019073760 W EP 2019073760W WO 2021043412 A1 WO2021043412 A1 WO 2021043412A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
microphone signal
filter
mode
voice
Prior art date
Application number
PCT/EP2019/073760
Other languages
French (fr)
Inventor
Riitta Niemisto
Ville MYLLYLÄ
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201980099938.4A (CN114341978A)
Priority to PCT/EP2019/073760 (WO2021043412A1)
Publication of WO2021043412A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Definitions

  • the present invention relates to the field of speech/voice and audio signal processing, specifically to noise reduction in headsets.
  • the invention proposes a device and a method, respectively, for noise reduction in a headset.
  • the device and the method address the problem, that when headsets are used, e.g. for making phone calls in a noisy environment, the speech/voice signal obtained by a microphone is corrupted with noise.
  • a conventional approach to the noise problem is to use the same methods that are used in telephony for noise reduction, for instance spectral subtraction (which dates back to the late 1970s, see e.g. S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. Acoust. Signal Proc., vol. ASSP-27, Apr. 1979). Further, Voice Activity Detection (VAD) and beamformers have also been studied over the years.
  • VAD: Voice Activity Detection
  • a newer approach is to utilize a voice accelerometer to generate a voice accelerometer signal.
  • a voice accelerometer signal has the advantage that its Signal-to-Noise Ratio (SNR) is inherently better for user speech/voice, but only in a narrow frequency range, usually below 1 kHz.
  • SNR: Signal-to-Noise Ratio
  • FIG. 8 compares a spectrum of an output signal of the “Air Pods” (a) with a spectrum of an output signal produced by a device (b) based on the conventional approach described above.
  • US20180367882 A1 (see FIG. 9) describes that a voice accelerometer signal is used for estimating speech, in order to have a more precise speech estimate in noise suppression (and other speech enhancement tasks).
  • US20170337933 A1 (see FIG. 10) describes that noise suppression is carried out in two stages: firstly in a “First Equalizer”, and secondly using gains after a filter bank (FB). The gains are adjusted according to the Voice Signature (VS), which is derived from a voice accelerometer signal.
  • the filter banks are used for providing frequency division.
  • an objective is to provide a device and a method, respectively, for noise reduction in a headset, which are able to reduce noise in a microphone signal efficiently without losing information of voice/speech components in the microphone signal.
  • restoring the user’s voice/speech in the microphone signal i.e. as picked up by the microphone
  • the presence of noise i.e. even if the SNR of the microphone signal is low.
  • embodiments of the invention are based on the realization that, if the SNR of a microphone signal is too low, there is no way to restore the user’s voice/speech components in the microphone signal by using only the microphone signal itself. Accordingly, embodiments of the invention propose additionally using a voice accelerometer signal to improve the SNR of the microphone signal. Then, noise reduction can be continued using traditional methods that offer sufficient noise reduction at moderate noise levels. The SNR of the microphone signal can particularly be improved in the low frequencies by utilizing the voice accelerometer signal.
  • a first aspect of the invention provides a device for noise reduction in a headset, wherein the device is configured to: obtain a microphone signal, obtain a voice accelerometer signal, and adjust the voice accelerometer signal based on the microphone signal, wherein in a first mode of operation, the device is further configured to: modify the microphone signal by attenuating low frequencies, combine the modified microphone signal with the adjusted voice accelerometer signal, and output the combined signal.
  • Adjusting the voice accelerometer signal based on the microphone signal may particularly mean modifying the voice accelerometer signal such that its amplitude and phase are the same as, or at least similar to, the amplitude and phase of the microphone signal. This may be done by adaptive filtering using the microphone signal and the voice accelerometer signal as inputs.
  • the “low frequencies” of the microphone signal, which are attenuated in the first mode may include frequencies below 1 kHz, or may include frequencies below 2 kHz, or may include frequencies below 3 kHz.
  • the frequency range that is attenuated may be selected based on the noise level in the environment of the device.
  • the device of the first aspect is able to use the voice accelerometer signal to improve the SNR of the microphone signal, in particular in the low frequencies.
  • the noise in the microphone signal can be reduced without losing information, in particular regarding voice/speech components in the microphone signal.
  • Traditional noise reduction methods can then be applied, for instance by the first device or another device, to the combined signal.
  • the device comprises a filter bank comprising a plurality of filters, wherein the filter bank is configured to combine the modified microphone signal with the adjusted voice accelerometer signal in the first mode.
  • Different filters of a filter bank or different filter banks can be employed, depending, for instance, on the SNR of the microphone signal, particularly in the low frequencies. These can be easily implemented, thus making the device of the first aspect very flexible to different environmental (noise) conditions.
  • the filter bank comprises a first filter and a second filter, the first filter being a high pass filter and the second filter being a low pass filter, and wherein in the first mode, the device is further configured to: filter the microphone signal with the first filter to obtain the modified microphone signal, and filter the adjusted voice accelerometer signal with the second filter before combining it with the modified microphone signal.
  • the high pass (first) filter is configured to attenuate the low frequencies of the microphone signal.
  • the low pass (second) filter is configured to attenuate high frequencies of the voice accelerometer signal.
  • Typical voice accelerometer signals have content only in the lower frequencies, e.g. below 3 kHz, or even only below 1 kHz.
  • the low pass filter may accordingly be configured to pass frequencies of the voice accelerometer signal below 3 kHz, or even only below 1 kHz.
  • adjusting the voice accelerometer signal based on the microphone signal is performed in a sampling rate lower than a sampling rate of the microphone signal, and the second filter is configured to interpolate the adjusted voice accelerometer signal to the sampling rate of the microphone signal.
  • the device in a second mode of operation, is further configured to delay the microphone signal, and output the delayed microphone signal.
  • the device is accordingly configured to produce a delay of the microphone signal.
  • the delay may be caused by filtering the microphone signal.
  • the microphone signal is not changed when producing the delay, i.e. the content of the microphone signal remains the same, but is shifted in time.
  • the delaying of the microphone signal allows smooth switching between the first mode and second mode of operation and vice versa.
  • the second mode of operation is preferably used, if no noise reduction is necessary.
  • the voice accelerometer signal may not be combined with the microphone signal, and the low frequencies of the microphone signal may not be attenuated.
  • the different operation modes make the device very flexible to and efficient in different noise conditions.
  • the device is further configured to apply noise suppression on the combined signal in the first mode and/or apply noise suppression on the delayed microphone signal in the second mode.
  • the signal output by the device can be subjected to further noise suppression or reduction, in particular based on traditional methods.
  • the filter bank is configured to delay the microphone signal in the second mode.
  • the device in the second mode, is further configured to filter the microphone signal with the first filter and with the second filter to obtain the delayed microphone signal.
  • the first and second filter may be configured such that the content of the microphone signal is not changed, when it is filtered using both filters.
  • the microphone signal is only delayed by the filtering procedure.
  • the device is configured to adjust the filters of the filter bank, when changing between the first mode and the second mode.
  • different filters can be used for the filter bank in the first mode and in the second mode, respectively.
  • a high pass filter configured to suppress the low frequencies, the suppression depending on the noise level, is desired.
  • filters to delay, but not change, the microphone signal are desired.
  • the filters of the filter bank each comprise less than 10 taps, in particular each comprise 7 taps.
  • the filters of the filter bank are short, and thus are particularly efficient to implement.
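  • The two modes can be sketched with a complementary pair of 7-tap FIR filters. The coefficients below are an assumption chosen for illustration only (a binomial low pass and its delay-complementary high pass); the source does not disclose filter coefficients. Because the two filters sum to a pure 3-sample delay, filtering the microphone signal with both leaves its content unchanged, as described for the second mode.

```python
# Hypothetical 7-tap complementary filter pair (coefficients are NOT from the source).
LOW_PASS = [c / 64 for c in (1, 6, 15, 20, 15, 6, 1)]  # binomial low pass, unity DC gain
DELAY = len(LOW_PASS) // 2                              # delay of the pair: 3 samples
# High pass = delayed unit impulse minus low pass, so that HIGH_PASS + LOW_PASS = pure delay.
HIGH_PASS = [(1.0 if k == DELAY else 0.0) - c for k, c in enumerate(LOW_PASS)]


def fir(signal, taps):
    """Causal FIR filtering; the output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def first_mode(mic, adjusted_acc):
    """High-pass the microphone, low-pass the adjusted accelerometer, then sum."""
    modified_mic = fir(mic, HIGH_PASS)
    low_band = fir(adjusted_acc, LOW_PASS)
    return [m + a for m, a in zip(modified_mic, low_band)]


def second_mode(mic):
    """Filter the microphone with both filters; the sum is the delayed microphone."""
    high = fir(mic, HIGH_PASS)
    low = fir(mic, LOW_PASS)
    return [h + l for h, l in zip(high, low)]
```

  • With this choice, second_mode(mic) returns the microphone samples shifted by three samples, so both modes share the same latency and switching between them does not introduce a timing jump.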
  • the device is configured to determine whether the environment is quiet or noisy, and to select the first mode if the environment is noisy and select the second mode if the environment is quiet.
  • the device could also select a third mode if the environment is quiet, in which the microphone signal is output without delaying it.
  • “Quiet” or “noisy” environments may be distinguished based on the noise level or some more sophisticated method. That means, if the noise level is below a threshold value, the environment may be regarded as “quiet”, and if the noise level is above the threshold value, the environment may be regarded as “noisy”.
  • different threshold values may be used to distinguish more precisely, e.g., between “noisy”, “very noisy” etc., or between “ordinary noise” (e.g. cafeteria, car, street, station, school yard etc.) and “wind noise”.
  • the noise level may be considered (only) in a certain frequency range.
  • An exemplary threshold value for distinguishing between a “quiet” and “noisy” environment may be a noise sound pressure level of 30 dB, or of 50 dB, or of 60 dB.
  • the device is further configured to, if determining that the environment is noisy, determine whether the environment is windy, and enhance the attenuation of the low frequencies of the microphone signal in the first mode if the environment is windy.
  • a “windy” environment may be determined based on wind noise that is present in the microphone signal. Wind noise can be detected, and can particularly be distinguished from ordinary noise. For instance, wind noise may be highly non-stationary, and may vary, e.g. include wind gusts. Ordinary noise may be more stationary and even.
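  • The mode selection described above can be sketched as follows. The 50 dB default is one of the exemplary thresholds mentioned in the text; the mode labels, the function names and the wind indicator (spread of recent frame levels as a crude measure of non-stationarity, with a 10 dB limit) are assumptions made for illustration and are not specified by the source.

```python
def is_windy(frame_levels_db, spread_db=10.0):
    """Crude non-stationarity check (assumption): wind gusts cause large level swings."""
    return max(frame_levels_db) - min(frame_levels_db) > spread_db


def select_mode(noise_level_db, windy=False, threshold_db=50.0):
    """Return a hypothetical mode label for the given noise condition.

    Below the threshold the environment counts as quiet (second mode).
    At or above it the environment counts as noisy (first mode); the source
    leaves the exact-threshold case open, so treating it as noisy is a choice.
    """
    if noise_level_db < threshold_db:
        return "second"
    if windy:
        return "first_enhanced"  # first mode with stronger low-frequency attenuation
    return "first"
```

  • For example, a steady 65 dB cafeteria noise would select the first mode, whereas the same level with strongly fluctuating frame levels would select the first mode with enhanced low-frequency attenuation.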
  • the device is configured to change gradually between the first mode and the second mode.
  • the filters of the filter bank may be adjusted such that they are between the filters used in the first mode and the filters used in the second mode. For instance, an attenuation of the low frequencies may gradually increase when changing gradually from the second mode to the first mode.
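  • One way to obtain filters that lie between the two modes is to interpolate the filter coefficients with a blend factor. The sketch below assumes a hypothetical 7-tap binomial low pass and its delay-complementary high pass; neither the coefficients nor the linear blending rule are given in the source.

```python
# Hypothetical 7-tap filters (coefficients are NOT from the source).
LOW_PASS = [c / 64 for c in (1, 6, 15, 20, 15, 6, 1)]
CENTER = len(LOW_PASS) // 2
DELAY_ONLY = [1.0 if k == CENTER else 0.0 for k in range(len(LOW_PASS))]
HIGH_PASS = [d - c for d, c in zip(DELAY_ONLY, LOW_PASS)]


def blended_filters(alpha):
    """Return (microphone taps, accelerometer taps) for a blend factor alpha.

    alpha = 0.0 corresponds to the second mode (microphone only delayed,
    accelerometer path muted); alpha = 1.0 corresponds to the first mode.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mic_taps = [(1.0 - alpha) * d + alpha * h for d, h in zip(DELAY_ONLY, HIGH_PASS)]
    acc_taps = [alpha * c for c in LOW_PASS]
    return mic_taps, acc_taps
```

  • In this sketch the DC gain of the microphone path equals 1 - alpha, so the low-frequency attenuation grows smoothly as alpha is ramped from 0 to 1 over successive frames.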
  • the device comprises an adaptive filter configured to adjust the voice accelerometer signal based on the microphone signal, in particular to adjust the amplitude and phase of the voice accelerometer signal to the amplitude and phase of the microphone signal.
  • the amplitude and phase are adjusted to be the same, or at least similar, regarding speech/voice components. That is, the speech/voice components within the adjusted voice accelerometer signal may match the speech/voice components within the microphone signal.
  • the device in particular being a headset, comprises a microphone to generate the microphone signal, and a voice accelerometer to generate the voice accelerometer signal.
  • the device may further include the adaptive filter and/or the filter bank and/or processing circuitry.
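  • The source only states that an adaptive filter adjusts the voice accelerometer signal to the microphone signal; it does not name an algorithm. As a minimal sketch, a normalized LMS (NLMS) filter is assumed here, with the accelerometer signal as filter input and the microphone signal as reference. The function name, tap count and step size are hypothetical, and the optional speech_active flags stand in for adapting only while the user is speaking.

```python
def nlms_adjust(acc, mic, num_taps=4, step=0.5, eps=1e-8, speech_active=None):
    """Filter `acc` so that it tracks `mic`; returns (adjusted signal, final weights).

    NLMS is an assumed algorithm choice, not the one disclosed in the source.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent accelerometer samples, newest first
    adjusted = []
    for n, (x, d) in enumerate(zip(acc, mic)):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # adjusted accelerometer sample
        adjusted.append(y)
        if speech_active is not None and not speech_active[n]:
            continue  # freeze the weights while no speech is detected
        error = d - y
        norm = eps + sum(h * h for h in history)
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
    return adjusted, weights
```

  • If, for instance, the microphone signal were a scaled copy of the accelerometer signal delayed by one sample, the weights would converge toward that scale factor at the second tap, and the adjusted signal would then match the microphone signal in amplitude and phase.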
  • a second aspect of the invention provides a method for noise reduction in a headset, wherein the method comprises: obtaining a microphone signal, obtaining a voice accelerometer signal, and adjusting the voice accelerometer signal based on the microphone signal, wherein in a first mode of operation, the method further comprises: modifying the microphone signal by attenuating low frequencies, combining the modified microphone signal with the adjusted voice accelerometer signal, and outputting the combined signal.
  • the method comprises using a filter bank comprising a plurality of filters to combine the modified microphone signal with the adjusted voice accelerometer signal in the first mode.
  • the filter bank comprises a first filter and a second filter, the first filter being a high pass filter and the second filter being a low pass filter, and wherein, in the first mode, the method further comprises filtering the microphone signal with the first filter to obtain the modified microphone signal, and filtering the adjusted voice accelerometer signal with the second filter before combining it with the modified microphone signal.
  • adjusting the voice accelerometer signal based on the microphone signal is performed in a sampling rate lower than a sampling rate of the microphone signal, and the second filter interpolates the adjusted voice accelerometer signal to the sampling rate of the microphone signal.
  • the method further comprises delaying the microphone signal, and outputting the delayed microphone signal.
  • the method further comprises applying noise suppression on the combined signal in the first mode and/or applying noise suppression on the delayed microphone signal in the second mode.
  • the filter bank delays the microphone signal in the second mode.
  • the method further comprises filtering the microphone signal with the first filter and with the second filter to obtain the delayed microphone signal.
  • the method comprises adjusting the filters of the filter bank, when changing between the first mode and the second mode.
  • the filters of the filter bank each comprise less than 10 taps, in particular each comprise 7 taps.
  • the method comprises determining whether the environment is quiet or noisy, and selecting the first mode if the environment is noisy and selecting the second mode if the environment is quiet.
  • the method further comprises, if determining that the environment is noisy, determining whether the environment is windy, and enhancing the attenuation of the low frequencies of the microphone signal in the first mode if the environment is windy.
  • the method comprises changing gradually between the first mode and the second mode.
  • the method comprises using an adaptive filter for adjusting the voice accelerometer signal based on the microphone signal, in particular for adjusting the amplitude and phase of the voice accelerometer signal to the amplitude and phase of the microphone signal.
  • the method in particular being performed in a headset, comprises using a microphone to generate the microphone signal, and using a voice accelerometer to generate the voice accelerometer signal.
  • the method of the second aspect and its implementation forms achieve all the advantages and effects described above for the device of the first aspect and its respective implementation forms. Definitions and explanations given above regarding the device of the first aspect are also valid for the method of the second aspect.
  • FIG. 1 shows a device according to an embodiment of the invention.
  • FIG. 2 shows a device according to an embodiment of the invention, particularly in the first mode of operation.
  • FIG. 3 shows a device according to an embodiment of the invention, particularly in the second mode of operation.
  • FIG. 4 shows examples of filter banks (a), (b), and (c) for a device according to an embodiment of the invention.
  • FIG. 5 shows schematically a mode selection procedure performed by a device according to an embodiment of the invention.
  • FIG. 6 compares a spectrum (a) of an output signal of a device according to an embodiment of the invention with a spectrum (b) of an output signal obtained with a conventional approach.
  • FIG. 7 shows a method according to an embodiment of the invention.
  • FIG. 8 compares a spectrum (a) of an output signal of Apple’s “Air Pods” with a spectrum (b) of an output signal obtained with another conventional approach.
  • FIG. 9 shows schematically another conventional noise reduction method.
  • FIG. 10 shows schematically another conventional noise reduction method.

DETAILED DESCRIPTION OF EMBODIMENTS

  • FIG. 1 shows a device 100 according to an embodiment of the invention.
  • the device 100 is suitable for noise reduction in a headset, or at least suitable for supporting noise reduction in a headset.
  • the device 100 may be included in the headset.
  • the device 100 may also be the headset, for example, may be a headset suitable for a mobile phone or other mobile device.
  • the device 100 while being suitable for use in a headset, may also be suitable for use in any other microphone-based device or application.
  • the device 100 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the device 100 described herein.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors.
  • the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device 100 to perform, conduct or initiate the operations or methods described herein.
  • the device 100 is configured to obtain a microphone signal 101, particularly provided by one or more microphones or by a microphone array. That means the microphone signal 101 can be provided by one microphone, or can be a signal generated by combining multiple microphones.
  • the device 100 may include the one or more microphones, in order to generate the microphone signal 101 itself. However, the device 100 can also receive the microphone signal 101 from the one or more (external) microphones, or from another device.
  • the device 100 is further configured to obtain a voice accelerometer signal 102, particularly provided by one or more voice accelerometers or a voice accelerometer array. That means the voice accelerometer signal 102 can be provided by one voice accelerometer, or can be a signal generated by combining multiple voice accelerometers.
  • the device 100 may include the one or more voice accelerometers, in order to generate the voice accelerometer signal 102 itself. However, the device 100 can also receive the voice accelerometer signal 102 from one or more voice accelerometers, or from another device.
  • a voice accelerometer may be configured to measure vibrations (e.g. in x- y- and z-direction(s)), for instance vibrations caused at/in the head of a user.
  • the vibrations may particularly be caused by speech/voice of the user, i.e. when the user speaks into the one or more microphones. Accordingly, the voice accelerometer signal 102 can reflect the speech/voice of the user, particularly at low frequencies.
  • a voice accelerometer may also be realized by a microphone, i.e. the voice accelerometer signal 102 may be a microphone signal, wherein that microphone may be placed inside the user’s ear.
  • the device 100 is further configured to adjust the voice accelerometer signal 102 based on the microphone signal 101 to obtain an adjusted voice accelerometer signal 103, e.g. by taking the microphone signal 101 as input to the adjustment procedure.
  • the adjustment can be carried out by adaptive filtering of the voice accelerometer signal 102 and the microphone signal 101.
  • the adjustment may align an amplitude and/or a phase of the voice accelerometer signal 102 with the amplitude and/or phase of the microphone signal 101, in particular regarding speech/voice signal components (but not noise components caused by noise leaking into the voice accelerometer signal 102).
  • the device 100 can be operated at least in a first mode of operation (shown in FIG. 1), and particularly also in a second mode of operation (not shown in FIG. 1), and more particularly even in an intermediate mode.
  • the device 100 may accordingly be configured to change gradually between the first mode and the second mode.
  • the device 100 is further configured to modify the microphone signal 101 by attenuating low frequencies (of the microphone signal 101), in order to obtain a modified microphone signal 104.
  • noise in the microphone signal 101 reflected mostly in the low frequencies, can be reduced.
  • the device 100 is configured to combine the modified microphone signal 104 with the adjusted voice accelerometer signal 103, and to then output the combined signal 105.
  • the (adjusted) voice accelerometer signal 102 can thus be used to improve the SNR of the (modified) microphone signal 101.
  • This first mode of operation is most suitable for a noisy environment.
  • the device 100 is further configured to delay the microphone signal 101 to obtain a delayed microphone signal 204, and to output the delayed microphone signal 204 (not shown in FIG. 1, but see e.g. FIG. 3).
  • the microphone signal 101 may not be modified in the second mode apart from the delay.
  • the microphone signal 101 is not combined with the (adjusted) voice accelerometer signal 102. This second mode of operation is most suitable for a quiet environment.
  • the combined signal 105 is generated by the device 100
  • the delayed microphone signal 204 is generated by the device 100.
  • FIG. 2 shows a device 100 according to an embodiment of the invention.
  • the embodiment of the device 100 shown in FIG. 2 is based on the embodiment of the device 100 shown in FIG. 1.
  • the device 100 is shown with further, optional features in FIG. 2.
  • the device 100 of FIG. 2 includes all features of the device 100 of FIG. 1. Same features are thereby labelled with the same reference signs and function likewise.
  • FIG. 2 illustrates in particular the first mode of operation of the device 100.
  • FIG. 3 shows the same device 100 according to an embodiment of the invention as shown in FIG. 2, but illustrates in particular the second mode of operation of the device 100.
  • the device 100 may include an adaptive filter 203, which may be configured to adjust the voice accelerometer signal 102 based on the microphone signal 101.
  • the adaptive filter 203 may be used to adjust the amplitude and/or phase of the voice accelerometer signal 102 to the amplitude and/or phase of the microphone signal 101, in particular regarding voice/speech components in the signals.
  • the adaptive filter 203 may to this end receive the voice accelerometer signal 102 as an input, and may also receive the adjusted voice accelerometer signal 103 and the microphone signal 101 as a further input (particularly, this input may be a sum of the two signals 103 and 101).
  • the device 100 may include a filter bank 200, which comprises a plurality of filters 201, 202, in particular comprises a first filter 201 and a second filter 202.
  • the filter bank 200 can be configured to perform the combination of the modified microphone signal 104 with the adjusted voice accelerometer signal 103.
  • the filter bank 200 can be configured to cause the delay of the microphone signal 101.
  • the filters 201, 202 of the filter bank 200 may be adjustable, i.e. can be changed. For instance, their filtering behaviour with respect to pass amplitude and/or frequency can be changed.
  • the first filter 201 may be a high pass filter
  • the second filter 202 may be a low pass filter.
  • the microphone signal 101 may be filtered with the first filter 201 to obtain the modified microphone signal 104
  • the adjusted voice accelerometer signal 103 may be filtered with the second filter 202, before combining it with the modified microphone signal 104.
  • the microphone signal 101 may be filtered with both the first filter 201 and the second filter 202, in order to obtain the delayed microphone signal 204.
  • putting any kind of signal through an (analysis-synthesis) filter bank produces a delayed version of that signal.
  • the SNR of the microphone signal 101 can be improved, particularly in the low band, in two steps. Firstly, the voice accelerometer signal 102 is modified to have a similar amplitude and/or phase as the microphone signal 101 for user speech. This can be achieved with the adaptive filter 203 (wherein adaptation may be performed only when the user is speaking, i.e., when pitch is detected). Secondly, the signals 101 and 102 are combined using the filter bank 200, after which noise reduction may be continued as in a traditional method (based on the combined signal 105).
  • the filter bank 200 that is used in the first mode or second mode can depend on the SNR of the microphone signal 101, particularly in the low frequencies.
  • in a quiet environment, the SNR of the microphone signal 101 is typically better than that of the voice accelerometer signal 102, because movement causes low level noise in the voice accelerometer signal 102.
  • the voice accelerometer signal 102 is preferably not used, but the microphone signal 101 as such and the filter bank 200 are used (only) for producing the delay (i.e. generating the delayed microphone signal 204).
  • a noisy environment e.g.
  • the attenuation of the low frequencies of the microphone signal 101 can be implemented using a high pass filter as the first filter 201.
  • a voice accelerometer signal 102 has content only in its lower frequencies.
  • the adaptive filtering using the adaptive filter 203 may be conducted at a lower sampling rate, and the filter bank 200 may involve an interpolation of the adaptively filtered voice accelerometer signal 103, e.g. to 16 kHz (i.e., after inserting enough zeros between the samples, the attenuation of the second filter 202 of the filter bank 200 has to be deeper).
  • adjusting the voice accelerometer signal 102 based on the microphone signal 101 may be performed in a sampling rate lower than a sampling rate of the microphone signal 101.
  • the second filter 202 may then be configured to interpolate the adjusted voice accelerometer signal 103 to the sampling rate of the microphone signal 101.
  • the filters 201, 202 of the filter bank 200 can be selected to be very short, in which case they are particularly efficient to implement.
  • the filters 201, 202 may have less than 10 taps each, e.g., may have only 7 taps each. Due to the interpolation of the adjusted voice accelerometer signal 103, the length of the combined second filter 202 may be longer, e.g. 32 taps.
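  • The interpolation step (inserting zeros between the samples and then low-pass filtering) can be sketched as below. The upsampling factor of 4 (for example 4 kHz to 16 kHz) and the triangular 7-tap kernel, which amounts to linear interpolation, are assumptions made for brevity; the source mentions a longer combined filter and gives no coefficients.

```python
def interpolate(low_rate, factor=4):
    """Upsample by `factor`: zero stuffing followed by a causal triangular FIR.

    The triangular kernel has 2 * factor - 1 taps (7 taps for factor 4) and
    delays the interpolated signal by factor - 1 samples.
    """
    # Insert factor - 1 zeros after every low-rate sample.
    stuffed = []
    for sample in low_rate:
        stuffed.append(sample)
        stuffed.extend([0.0] * (factor - 1))
    # Triangular low pass; for factor 4: 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25.
    taps = [(factor - abs(k)) / factor for k in range(-(factor - 1), factor)]
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * stuffed[n - k]
        out.append(acc)
    return out
```

  • A steeper low pass than this triangular kernel would suppress the spectral images created by the zero stuffing more thoroughly, which is consistent with the remark that the attenuation of the second filter 202 has to be deeper after interpolation.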
  • FIG. 4 shows examples (a), (b), and (c) of filters 201, 202 of the filter bank 200 for a device 100 according to an embodiment of the invention, in particular for the device 100 of FIG.
  • in example (a), the filters 201, 202 are most suitable for ordinary noise (not wind).
  • the first filter 201 is a high pass filter that is used to filter the microphone signal 101.
  • the second filter 202 is a low pass filter that is used to filter the voice accelerometer signal 102.
  • the first filter 201 attenuates the microphone signal 101 below 3 kHz, by up to 20 dB at very low frequencies approaching 0 kHz.
  • the second filter 202 passes the voice accelerometer signal 102 approximately up to 3-4 kHz.
  • in example (b), the filters 201, 202 are most suitable for wind noise.
  • the first filter 201 is a high pass filter that is used to filter the microphone signal 101.
  • the second filter 202 is a low pass filter that is used to filter the voice accelerometer signal 102.
  • the first filter 201 attenuates the microphone signal 101 below 3 kHz, by up to 80 dB at very low frequencies approaching 0 kHz (and even infinity at approximately 0 kHz). That is, compared to (a), the attenuation of the low frequencies of the microphone signal 101 is significantly enhanced.
  • the second filter 202 passes the voice accelerometer signal 102 approximately up to 3-4 kHz.
  • in example (c), the filters 201, 202 are most suitable for ordinary noise, like in (a), but without any interpolation.
  • the first filter 201 is a high pass filter that is used to filter the microphone signal 101.
  • the second filter 202 is a low pass filter that is used to filter the voice accelerometer signal 102.
  • the first filter 201 attenuates the microphone signal 101 below
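  • To relate tap values to attenuation figures such as those quoted for examples (a) and (b), the gain of a short FIR filter can be evaluated at a given frequency. The sketch below is illustrative only: the filter design (a delayed unit impulse minus a scaled 7-tap binomial low pass), the depth parameter and the 16 kHz sampling rate are assumptions and do not reproduce the filters of FIG. 4.

```python
import cmath
import math


def gain_db(taps, freq_hz, sample_rate_hz=16000.0):
    """Magnitude response of an FIR filter at one frequency, in dB."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    response = sum(h * cmath.exp(-1j * omega * k) for k, h in enumerate(taps))
    magnitude = abs(response)
    return -math.inf if magnitude == 0.0 else 20.0 * math.log10(magnitude)


def mic_filter(depth):
    """Hypothetical high pass: delayed impulse minus depth * binomial low pass.

    depth = 0.9 leaves a DC gain of 0.1 (about 20 dB attenuation);
    depth = 1.0 removes DC completely (unbounded attenuation at 0 Hz).
    """
    low_pass = [c / 64 for c in (1, 6, 15, 20, 15, 6, 1)]
    center = len(low_pass) // 2
    return [(1.0 if k == center else 0.0) - depth * c for k, c in enumerate(low_pass)]
```

  • Raising the depth toward 1.0 deepens the low-frequency notch while the gain near the Nyquist frequency stays close to 0 dB, which mirrors the qualitative difference between a filter for ordinary noise and one for wind noise.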
  • FIG. 5 shows schematically a mode selection procedure performed by a device 100 according to an embodiment of the invention, in particular by the device 100 of FIG. 2 or FIG. 3.
  • the device 100 may determine whether the environment (its surroundings) is quiet or noisy, and can select the first mode if the environment is noisy and can select the second mode if the environment is quiet. Further, if the device 100 determines that the environment is noisy, the device 100 may further determine whether the environment is windy, and may enhance the attenuation of the low frequencies of the microphone signal 101 in the first mode if the environment is windy.
  • the mode(s) may be changed gradually to avoid noise pumping effects and other discontinuities. In very strong winds, additional attenuation of also higher frequencies (upper band) of the microphone signal 101 can be applied, e.g. by 18 dB.
  • the device 100 may comprise additionally an analysis filter bank 500, a high pass filter 501, a first mixer 502, and a second mixer 503.
  • the first mixer 502 may receive the voice accelerometer signal 102 and the microphone signal 101, as output from the analysis filter bank 500, and can determine whether the environment is quiet.
  • the second mixer 503 may receive the microphone signal 101, as output from the analysis filter bank 500, and the same microphone signal 101 after having passed the high pass filter 501, and can determine whether the environment is windy.
  • the filter bank 200 can be respectively selected, i.e. its filters 201, 202 can be selected according to the determined noise condition, and can operate on the microphone signal 101 or on the microphone signal 101 and the voice accelerometer signal 102, respectively.
  • FIG. 6 compares a spectrum (a) of an output signal of a device 100 according to an embodiment of the invention with a spectrum (b) of an output signal of a device performing a conventional approach (e.g. spectral subtraction as described in the background part).
  • the spectrum (a) is in particular the result of using the device 100 in the first mode of operation, i.e. generating the combined signal 105, and then applying a conventional noise suppression approach to the combined signal 105.
  • the spectrum (b) is in particular the result of applying the same conventional noise suppression approach directly to the microphone signal (no voice accelerometer signal is used).
  • the spectrum (a) of the output signal of the device 100 has considerably less noise than the conventional device. Notably, in strong windy conditions, the conventional approach may be used by the device 100.
  • the method 700 is suitable for reducing noise in a headset.
  • the method 700 may be performed by the device 100, particularly by the device 100 as shown in FIG. 1, 2 or 3. Details of the method 700 can be implemented as described above for the device 100.
  • the method 700 includes: a step 701 of obtaining a microphone signal 101; a step 702 of obtaining a voice accelerometer signal 102; and a step 703 of adjusting the voice accelerometer signal 102 based on the microphone signal 101.
  • the method 700 further comprises a step 704 of modifying the microphone signal 101 by attenuating its low frequencies; a step 705 of combining the modified microphone signal 104 with the adjusted voice accelerometer signal 103; and a step 706 of outputting the combined signal 105.
  • the method 700 may further comprise a step of delaying the microphone signal 101, and a step of outputting the delayed microphone signal 204.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The present invention relates to the field of speech and audio signal processing, namely to noise reduction in headsets. In particular, the invention proposes a device and a method for noise reduction in a headset. The device is configured to obtain a microphone signal; obtain a voice accelerometer signal; and adjust the voice accelerometer signal based on the microphone signal. In a first mode of operation, the device is further configured to: modify the microphone signal by attenuating low frequencies; combine the modified microphone signal with the adjusted voice accelerometer signal; and output the combined signal.

Description

NOISE REDUCTION IN A HEADSET BY EMPLOYING A VOICE ACCELEROMETER SIGNAL
TECHNICAL FIELD
The present invention relates to the field of speech/voice and audio signal processing, specifically to noise reduction in headsets. The invention proposes a device and a method, respectively, for noise reduction in a headset. The device and the method address the problem, that when headsets are used, e.g. for making phone calls in a noisy environment, the speech/voice signal obtained by a microphone is corrupted with noise.
BACKGROUND
A conventional approach to the noise problem is to use the same methods that are used in telephony for noise reduction, for instance spectral subtraction, which dates back to the late 1970s (see e.g. S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. Acoust. Signal Proc., vol. ASSP-27, Apr. 1979). Further, Voice Activity Detection (VAD) and beamformers have also been studied over the years.
A newer approach is to utilize a voice accelerometer to generate a voice accelerometer signal. Such a voice accelerometer signal has the advantage that its Signal-to-Noise Ratio (SNR) is inherently better for user speech/voice, but only in a narrow frequency range, usually below 1 kHz.
US 20140093093 A1, which is related to Apple’s “Air Pods”, describes that VAD is computed from a voice accelerometer and is used for steering a beamformer and updating a noise estimate in noise suppression. FIG. 8 compares a spectrum of an output signal of the “Air Pods” (a) with a spectrum of an output signal produced by a device (b) based on the conventional approach described above.
US20180367882 A1 (see FIG. 9) describes that a voice accelerometer signal is used for estimating speech, in order to have a more precise speech estimate in noise suppression (and other speech enhancement tasks). US20170337933 A1 (see FIG. 10) describes that noise suppression is carried out in two stages: firstly, in a “First Equalizer”, and secondly, using gains after a filter bank (FB). The gains are adjusted according to the Voice Signature (VS), which is derived from a voice accelerometer signal. The filter banks are used for providing frequency division.
With the above-described methods, if the SNR of the microphone signal is too low, i.e. in the presence of noise, the user’s voice/speech in the microphone signal cannot be restored.
SUMMARY
In view of the above-mentioned problem and disadvantages, embodiments of the present invention aim to improve the conventional methods. An objective is to provide a device and a method, respectively, for noise reduction in a headset, which are able to reduce noise in a microphone signal efficiently without losing information of voice/speech components in the microphone signal. In particular, restoring the user’s voice/speech in the microphone signal (i.e. as picked up by the microphone) in the presence of noise, i.e. even if the SNR of the microphone signal is low, should be possible. The objective is achieved by embodiments of the invention as described in the enclosed independent claims. Advantageous implementations of the embodiments of the invention are further defined in the dependent claims.
In particular, embodiments of the invention are based on the realization that, if the SNR of a microphone signal is too low, there is no way to restore the user’s voice/speech components in the microphone signal by using only the microphone signal itself. Accordingly, embodiments of the invention propose additionally using a voice accelerometer signal to improve the SNR of the microphone signal. Then, noise reduction can be continued using traditional methods that offer sufficient noise reduction at moderate noise levels. The SNR of the microphone signal can particularly be improved in the low frequencies by utilizing the voice accelerometer signal.
A first aspect of the invention provides a device for noise reduction in a headset, wherein the device is configured to: obtain a microphone signal, obtain a voice accelerometer signal, and adjust the voice accelerometer signal based on the microphone signal, wherein in a first mode of operation, the device is further configured to: modify the microphone signal by attenuating low frequencies, combine the modified microphone signal with the adjusted voice accelerometer signal, and output the combined signal.
Adjusting the voice accelerometer signal based on the microphone signal may particularly mean modifying the voice accelerometer signal such that its amplitude and phase are the same as, or at least similar to, the amplitude and phase of the microphone signal. This may be done by adaptive filtering using the microphone signal and the voice accelerometer signal as inputs.
The “low frequencies” of the microphone signal, which are attenuated in the first mode, may include frequencies below 1 kHz, or may include frequencies below 2 kHz, or may include frequencies below 3 kHz. The frequency range that is attenuated may be selected based on the noise level in the environment of the device.
In the first mode, the device of the first aspect is able to use the voice accelerometer signal to improve the SNR of the microphone signal, in particular in the low frequencies. Thus, the noise in the microphone signal can be reduced without losing information, in particular regarding voice/speech components in the microphone signal. Traditional noise reduction methods can then be applied to the combined signal, for instance by the device itself or by another device.
In an implementation form of the first aspect, the device comprises a filter bank comprising a plurality of filters, wherein the filter bank is configured to combine the modified microphone signal with the adjusted voice accelerometer signal in the first mode.
Different filters of a filter bank or different filter banks can be employed, depending, for instance, on the SNR of the microphone signal, particularly in the low frequencies. These can be easily implemented, thus making the device of the first aspect very flexible to different environmental (noise) conditions.
In an implementation form of the first aspect, the filter bank comprises a first filter and a second filter, the first filter being a high pass filter and the second filter being a low pass filter, and wherein in the first mode, the device is further configured to: filter the microphone signal with the first filter to obtain the modified microphone signal, and filter the adjusted voice accelerometer signal with the second filter before combining it with the modified microphone signal.
In other words, the high pass (first) filter is configured to attenuate the low frequencies of the microphone signal. The low pass (second) filter is configured to attenuate high frequencies of the voice accelerometer signal. Typical voice accelerometer signals have content only in the lower frequencies, e.g. below 3 kHz, or even only below 1 kHz. The low pass filter may accordingly be configured to pass frequencies of the voice accelerometer signal below 3 kHz, or even only below 1 kHz.
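As a rough illustration of the first mode, the sketch below filters a microphone signal with a high pass filter and an adjusted voice accelerometer signal with a low pass filter, and then sums the two branches. It is a minimal sketch under stated assumptions, not the filter design of the embodiments: the 7-tap binomial low pass filter, the complementary high pass filter, the constant offset standing in for low-frequency noise, and all function names are hypothetical.

```python
from typing import List

def fir(x: List[float], h: List[float]) -> List[float]:
    """Causal FIR filtering; the output has the same length as the input."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

# Assumed 7-tap low pass filter (binomial coefficients, unity gain at 0 Hz).
LOW_PASS = [c / 64.0 for c in (1, 6, 15, 20, 15, 6, 1)]
DELAY = 3  # group delay of the symmetric 7-tap filter, in samples
# Assumed complementary high pass filter: delayed unit impulse minus low pass.
HIGH_PASS = [(1.0 if k == DELAY else 0.0) - c for k, c in enumerate(LOW_PASS)]

def first_mode(mic: List[float], adjusted_accel: List[float]) -> List[float]:
    """High pass the microphone, low pass the accelerometer, then combine."""
    modified_mic = fir(mic, HIGH_PASS)
    low_band = fir(adjusted_accel, LOW_PASS)
    return [a + b for a, b in zip(modified_mic, low_band)]

# Toy data: the "voice" is shared by both sensors; the microphone also
# carries a constant offset as a stand-in for strong low-frequency noise.
voice = [0.0, 1.0, -0.5, 0.25, 2.0, -1.0, 0.5, 0.0, 1.5, -0.25, 0.75, 1.0]
mic = [v + 4.0 for v in voice]
combined = first_mode(mic, voice)
```

Once the start-up transient of the filters has passed, the offset no longer appears in `combined`, while the voice samples reappear delayed by `DELAY` samples. This idealized outcome relies on the toy assumption that the adjusted accelerometer signal matches the voice in the microphone signal exactly.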
In an implementation form of the first aspect, adjusting the voice accelerometer signal based on the microphone signal is performed in a sampling rate lower than a sampling rate of the microphone signal, and the second filter is configured to interpolate the adjusted voice accelerometer signal to the sampling rate of the microphone signal.
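The interpolation step can be pictured with a small sketch. The rate ratio of 2 and the 3-tap linear-interpolation kernel below are assumptions made only for illustration; the source does not specify the sampling rates or the interpolation filter. The adjusted voice accelerometer signal is zero-stuffed to the higher rate and then low pass filtered.

```python
from typing import List

def interpolate_by_two(low_rate: List[float]) -> List[float]:
    """Upsample by 2: insert zeros, then low pass filter (full convolution)."""
    stuffed: List[float] = []
    for sample in low_rate:
        stuffed.extend([sample, 0.0])
    kernel = [0.5, 1.0, 0.5]  # assumed linear-interpolation low pass kernel
    out = [0.0] * (len(stuffed) + len(kernel) - 1)
    for n, s in enumerate(stuffed):
        for k, c in enumerate(kernel):
            out[n + k] += s * c
    return out

adjusted_low_rate = [0.0, 2.0, 4.0]
adjusted_full_rate = interpolate_by_two(adjusted_low_rate)
# adjusted_full_rate == [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 0.0]
```

The original samples appear delayed by one sample, with the midpoints 1.0 and 3.0 filled in between them; the trailing 2.0 is the filter tail after the last input sample.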
In an implementation form of the first aspect, in a second mode of operation, the device is further configured to delay the microphone signal, and output the delayed microphone signal.
The device is accordingly configured to produce a delay of the microphone signal. The delay may be caused by filtering the microphone signal. In particular, the microphone signal is not changed when producing the delay, i.e. the content of the microphone signal remains the same, but is shifted in time. The delaying of the microphone signal allows smooth switching between the first mode and second mode of operation and vice versa.
The second mode of operation is preferably used, if no noise reduction is necessary. In this case, the voice accelerometer signal may be not combined with the microphone signal, and the low frequencies of the microphone signal may be not attenuated. The different operation modes make the device very flexible to and efficient in different noise conditions.
In an implementation form of the first aspect, the device is further configured to apply noise suppression on the combined signal in the first mode and/or apply noise suppression on the delayed microphone signal in the second mode.
That is, the signal output by the device can be subjected to further noise suppression or reduction, in particular based on traditional methods.
In an implementation form of the first aspect, the filter bank is configured to delay the microphone signal in the second mode.
In an implementation form of the first aspect, in the second mode, the device is further configured to filter the microphone signal with the first filter and with the second filter to obtain the delayed microphone signal.
The first and second filter may be configured such that the content of the microphone signal is not changed, when it is filtered using both filters. The microphone signal is only delayed by the filtering procedure.
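One way to obtain this delay-only behaviour is a complementary filter pair. The following is a minimal sketch, assuming a hypothetical 7-tap low pass filter and a high pass filter defined as a delayed unit impulse minus that low pass filter; these coefficients are illustrative and are not taken from the source.

```python
from typing import List

# Assumed 7-tap low pass filter and its complementary high pass filter.
LOW_PASS = [c / 64.0 for c in (1, 6, 15, 20, 15, 6, 1)]
DELAY = 3  # samples
HIGH_PASS = [(1.0 if k == DELAY else 0.0) - c for k, c in enumerate(LOW_PASS)]

def fir(x: List[float], h: List[float]) -> List[float]:
    """Causal FIR filtering; the output has the same length as the input."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def second_mode(mic: List[float]) -> List[float]:
    """Filter the microphone signal with both filters and sum the branches."""
    return [a + b for a, b in zip(fir(mic, HIGH_PASS), fir(mic, LOW_PASS))]

mic = [0.5, -1.0, 2.0, 0.25, -0.75, 1.0, 0.0, 0.0, 0.0]
delayed_mic = second_mode(mic)
```

Because the two impulse responses sum to a unit impulse delayed by `DELAY` samples, `delayed_mic` is the input shifted by three samples and otherwise unchanged.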
In an implementation form of the first aspect, the device is configured to adjust the filters of the filter bank, when changing between the first mode and the second mode.
That means, different filters can be used for the filter bank in the first mode and in the second mode, respectively. For instance, in the first mode, a high pass filter configured to suppress the low frequencies, the suppression depending on the noise level, is desired. In the second mode, filters to delay, but not change, the microphone signal are desired. It is also possible to adjust the filters of the filter bank, i.e. to use different filters, for different noise conditions in the first mode. For instance, in ordinary noise conditions, a moderate attenuation of the low frequencies of the microphone signal may be sufficient. In strong/windy noise conditions, a stronger attenuation of the low frequencies may be desired. In strong windy noise conditions, additional attenuation of also higher frequencies of the microphone signal can be applied, e.g. above 1 kHz, or even above 3 kHz.
Thus, a different first filter may be used, respectively, depending on the noise.
In an implementation form of the first aspect, the filters of the filter bank each comprise less than 10 taps, in particular each comprise 7 taps.
Accordingly, the filters of the filter bank are short, and thus are particularly efficient to implement.
In an implementation form of the first aspect, the device is configured to determine whether the environment is quiet or noisy, and to select the first mode if the environment is noisy and select the second mode if the environment is quiet.
The device could also select a third mode if the environment is quiet, in which the microphone signal is output without delaying it.
“Quiet” or “noisy” environments (of the device, i.e. surroundings of the device) may be distinguished based on the noise level or some more sophisticated method. That means, if the noise level is below a threshold value, the environment may be regarded as “quiet”, and if the noise level is above the threshold value, the environment may be regarded as “noisy”. Of course, different threshold values may be used to distinguish more precisely, e.g., between “noisy”, “very noisy” etc., or between “ordinary noise” (e.g. cafeteria, car, street, station, school yard etc.) and “wind noise”. The noise level may be considered (only) in a certain frequency range. An exemplary threshold value for distinguishing between a “quiet” and “noisy” environment may be a noise sound pressure level of 30 dB, or of 50 dB, or of 60 dB.
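The threshold-based distinction can be sketched as follows. The function name and the default of 50 dB (one of the exemplary values mentioned above) are illustrative assumptions, as is the choice to treat a level exactly at the threshold as quiet; how the noise level is measured is outside the scope of the sketch.

```python
def select_mode(noise_level_db: float, threshold_db: float = 50.0) -> str:
    """Return "first" for a noisy environment and "second" for a quiet one."""
    # Assumption: a level exactly at the threshold counts as quiet.
    return "first" if noise_level_db > threshold_db else "second"

quiet_choice = select_mode(35.0)                      # below the threshold
noisy_choice = select_mode(70.0)                      # above the threshold
strict_choice = select_mode(40.0, threshold_db=30.0)  # lower threshold
```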
In an implementation form of the first aspect, the device is further configured to, if determining that the environment is noisy, determine whether the environment is windy, and enhance the attenuation of the low frequencies of the microphone signal in the first mode if the environment is windy.
A “windy” environment may be determined based on wind noise that is present in the microphone signal. Wind noise can be detected, and can particularly be distinguished from ordinary noise. For instance, wind noise may be highly non-stationary, and may vary, e.g. include wind gusts. Ordinary noise may be more stationary and even.
In an implementation form of the first aspect, the device is configured to change gradually between the first mode and the second mode.
This is beneficial to avoid noise pumping effects and other discontinuities. For instance, the filters of the filter bank may be adjusted such that they are between the filters used in the first mode and the filters used in the second mode. For instance, an attenuation of the low frequencies may gradually increase when changing gradually from the second mode to the first mode.
In an implementation form of the first aspect, the device comprises an adaptive filter configured to adjust the voice accelerometer signal based on the microphone signal, in particular to adjust the amplitude and phase of the voice accelerometer signal to the amplitude and phase of the microphone signal.
In particular, the amplitude and phase is adjusted to be the same, or at least similar, regarding speech/voice components. That is, the speech/voice components within the adjusted voice accelerometer signal may match the speech/voice components within the microphone signal.
In an implementation form of the first aspect, the device, in particular being a headset, comprises a microphone to generate the microphone signal, and a voice accelerometer to generate the voice accelerometer signal.
The device may further include the adaptive filter and/or the filter bank and/or processing circuitry.
A second aspect of the invention provides a method for noise reduction in a headset, wherein the method comprises: obtaining a microphone signal, obtaining a voice accelerometer signal, and adjusting the voice accelerometer signal based on the microphone signal, wherein in a first mode of operation, the method further comprises: modifying the microphone signal by attenuating low frequencies, combining the modified microphone signal with the adjusted voice accelerometer signal, and outputting the combined signal.
In an implementation form of the second aspect, the method comprises using a filter bank comprising a plurality of filters to combine the modified microphone signal with the adjusted voice accelerometer signal in the first mode.
In an implementation form of the second aspect, the filter bank comprises a first filter and a second filter, the first filter being a high pass filter and the second filter being a low pass filter, and wherein, in the first mode, the method further comprises filtering the microphone signal with the first filter to obtain the modified microphone signal, and filtering the adjusted voice accelerometer signal with the second filter before combining it with the modified microphone signal.
In an implementation form of the second aspect, adjusting the voice accelerometer signal based on the microphone signal is performed in a sampling rate lower than a sampling rate of the microphone signal, and the second filter interpolates the adjusted voice accelerometer signal to the sampling rate of the microphone signal.
In an implementation form of the second aspect, in a second mode of operation, the method further comprises delaying the microphone signal, and outputting the delayed microphone signal.
In an implementation form of the second aspect, the method further comprises applying noise suppression on the combined signal in the first mode and/or applying noise suppression on the delayed microphone signal in the second mode.
In an implementation form of the second aspect, the filter bank delays the microphone signal in the second mode.
In an implementation form of the second aspect, in the second mode, the method further comprises filtering the microphone signal with the first filter and with the second filter to obtain the delayed microphone signal.
In an implementation form of the second aspect, the method comprises adjusting the filters of the filter bank, when changing between the first mode and the second mode.
In an implementation form of the second aspect, the filters of the filter bank each comprise less than 10 taps, in particular each comprise 7 taps.
In an implementation form of the second aspect, the method comprises determining whether the environment is quiet or noisy, and selecting the first mode if the environment is noisy and selecting the second mode if the environment is quiet.
In an implementation form of the second aspect, the method further comprises, if determining that the environment is noisy, determining whether the environment is windy, and enhancing the attenuation of the low frequencies of the microphone signal in the first mode if the environment is windy.
In an implementation form of the second aspect, the method comprises changing gradually between the first mode and the second mode.
In an implementation form of the second aspect, the method comprises using an adaptive filter for adjusting the voice accelerometer signal based on the microphone signal, in particular for adjusting the amplitude and phase of the voice accelerometer signal to the amplitude and phase of the microphone signal.
In an implementation form of the second aspect, the method, in particular being performed in a headset, comprises using a microphone to generate the microphone signal, and using a voice accelerometer to generate the voice accelerometer signal.
The method of the second aspect and its implementation forms achieve all the advantages and effects described above for the device of the first aspect and its respective implementation forms. Definitions and explanations given above regarding the device of the first aspect are also valid for the method of the second aspect.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities.
Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which
FIG. 1 shows a device according to an embodiment of the invention.
FIG. 2 shows a device according to an embodiment of the invention, particularly in the first mode of operation.
FIG. 3 shows a device according to an embodiment of the invention, particularly in the second mode of operation.
FIG. 4 shows examples of filter banks (a), (b), and (c) for a device according to an embodiment of the invention.
FIG. 5 shows schematically a mode selection procedure performed by a device according to an embodiment of the invention.
FIG. 6 compares a spectrum (a) of an output signal of a device according to an embodiment of the invention with a spectrum (b) of an output signal obtained with a conventional approach.
FIG. 7 shows a method according to an embodiment of the invention.
FIG. 8 compares a spectrum (a) of an output signal of Apple’s “Air Pods” with a spectrum (b) of an output signal obtained with another conventional approach.
FIG. 9 shows schematically another conventional noise reduction method.
FIG. 10 shows schematically another conventional noise reduction method.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 shows a device 100 according to an embodiment of the invention. The device 100 is suitable for noise reduction in a headset, or at least suitable for supporting noise reduction in a headset. The device 100 may be included in the headset. The device 100 may also be the headset, for example, may be a headset suitable for a mobile phone or other mobile device. However, the device 100, while being suitable for use in a headset, may also be suitable for use in any other microphone-based device or application.
The device 100 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the device 100 described herein. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device 100 to perform, conduct or initiate the operations or methods described herein.
The device 100 is configured to obtain a microphone signal 101, particularly provided by one or more microphones or by a microphone array. That means the microphone signal 101 can be provided by one microphone, or can be a signal generated by combining the signals of multiple microphones. The device 100 may include the one or more microphones, in order to generate the microphone signal 101 itself. However, the device 100 can also receive the microphone signal 101 from the one or more (external) microphones, or from another device.
The device 100 is further configured to obtain a voice accelerometer signal 102, particularly provided by one or more voice accelerometers or by a voice accelerometer array. That means the voice accelerometer signal 102 can be provided by one voice accelerometer, or can be a signal generated by combining the signals of multiple voice accelerometers. The device 100 may include the one or more voice accelerometers, in order to generate the voice accelerometer signal 102 itself. However, the device 100 can also receive the voice accelerometer signal 102 from one or more voice accelerometers, or from another device. Notably, a voice accelerometer may be configured to measure vibrations (e.g. in x-, y- and z-direction(s)), for instance vibrations caused at/in the head of a user. The vibrations may particularly be caused by speech/voice of the user, i.e. when the user speaks into the one or more microphones. Accordingly, the voice accelerometer signal 102 can reflect the speech/voice of the user, particularly at low frequencies. A voice accelerometer may also be realized by a microphone, i.e. the voice accelerometer signal 102 may be a microphone signal, wherein that microphone may be placed inside the user’s ear.
The device 100 is further configured to adjust the voice accelerometer signal 102 based on the microphone signal 101 to obtain an adjusted voice accelerometer signal 103, e.g. by taking the microphone signal 101 as input to the adjustment procedure. For instance, the adjustment can be carried out by adaptive filtering of the voice accelerometer signal 102 and the microphone signal 101. The adjustment may align an amplitude and/or a phase of the voice accelerometer signal 102 with the amplitude and/or phase of the microphone signal 101, in particular regarding speech/voice signal components (but not noise components caused by noise leaking into the accelerometer signal 102).
The device 100 can be operated at least in a first mode of operation (shown in FIG. 1), and particularly also in a second mode of operation (not shown in FIG. 1), and more particularly even in an intermediate mode. The device 100 may accordingly be configured to change gradually between the first mode and the second mode. In the first mode of operation, the device 100 is further configured to modify the microphone signal 101 by attenuating low frequencies (of the microphone signal 101), in order to obtain a modified microphone signal 104. Thus, noise in the microphone signal 101, reflected mostly in the low frequencies, can be reduced. Further, the device 100 is configured to combine the modified microphone signal 104 with the adjusted voice accelerometer signal 103, and to then output the combined signal 105. The (adjusted) voice accelerometer signal 102 can thus be used to improve the SNR of the (modified) microphone signal 101. This first mode of operation is most suitable for a noisy environment.
In the second mode of operation, the device 100 is further configured to delay the microphone signal 101 to obtain a delayed microphone signal 204, and to output the delayed microphone signal 204 (not shown in FIG. 1, but see e.g. FIG. 3). In particular, the microphone signal 101 may be not modified in the second mode apart from the delay. For example, the microphone signal 101 is not combined with the (adjusted) voice accelerometer signal 102. This second mode of operation is most suitable for a quiet environment.
In the first mode, the combined signal 105 is generated by the device 100, and in the second mode the delayed microphone signal 204 is generated by the device 100. In an intermediate mode, the generated signal may be defined by
generated signal = f * combined signal + (1 - f) * delayed microphone signal,
wherein f goes gradually from zero (0) to one (1), or vice versa, if the first and second modes are gradually changed.
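The intermediate mode can be sketched directly from the mixing rule above. The function names and the linear ramp of f are assumptions for illustration; the source only states that f changes gradually between 0 and 1.

```python
from typing import List

def intermediate_output(combined: List[float], delayed_mic: List[float],
                        f: float) -> List[float]:
    """Mix per sample: f * combined + (1 - f) * delayed microphone signal."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    return [f * c + (1.0 - f) * d for c, d in zip(combined, delayed_mic)]

def linear_ramp(steps: int) -> List[float]:
    """Assumed schedule: f rises from 0 to 1 in equal increments."""
    return [i / (steps - 1) for i in range(steps)]

combined = [1.0, 2.0, 3.0]
delayed_mic = [3.0, 2.0, 1.0]
# One output block per value of f, moving from the second to the first mode.
blocks = [intermediate_output(combined, delayed_mic, f) for f in linear_ramp(5)]
```

With f = 0 the output equals the delayed microphone signal, with f = 1 it equals the combined signal, and the values in between blend the two.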
FIG. 2 shows a device 100 according to an embodiment of the invention. The embodiment of the device 100 shown in FIG. 2 is based on the embodiment of the device 100 shown in FIG. 1. In particular, the device 100 is shown with further, optional features in FIG. 2. Accordingly, the device 100 of FIG. 2 includes all features of the device 100 of FIG. 1. Same features are thereby labelled with the same reference signs and function likewise. FIG. 2 illustrates in particular the first mode of operation of the device 100. FIG. 3 shows the same device 100 according to an embodiment of the invention as shown in FIG. 2, but illustrates in particular the second mode of operation of the device 100.
As shown in FIG. 2 and FIG. 3, respectively, the device 100 may include an adaptive filter 203, which may be configured to adjust the voice accelerometer signal 102 based on the microphone signal 101. In particular, the adaptive filter 203 may be used to adjust the amplitude and/or phase of the voice accelerometer signal 102 to the amplitude and/or phase of the microphone signal 101, in particular regarding voice/speech components in the signals. The adaptive filter 203 may to this end receive the voice accelerometer signal 102 as an input, and may also receive the adjusted voice accelerometer signal 103 and the microphone signal 101 as a further input (particularly, this input may be a sum of the two signals 103 and 101).
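The source does not name the adaptation algorithm of the adaptive filter 203. As one possibility, the sketch below uses a normalized least-mean-squares (NLMS) update, which is an assumption, as are the tap count, step size, the use of the difference between the microphone sample and the adjusted sample as the error, and the synthetic test signals. It adapts a short FIR filter so that the filtered accelerometer signal tracks the microphone signal in amplitude and delay.

```python
import random
from typing import List, Tuple

def nlms_adjust(accel: List[float], mic: List[float], taps: int = 4,
                step: float = 0.5, eps: float = 1e-8
                ) -> Tuple[List[float], List[float]]:
    """Return (adjusted accelerometer signal, final filter coefficients)."""
    w = [0.0] * taps
    history = [0.0] * taps  # most recent accelerometer samples, newest first
    adjusted: List[float] = []
    for a, m in zip(accel, mic):
        history = [a] + history[:-1]
        y = sum(wk * xk for wk, xk in zip(w, history))  # adjusted sample
        e = m - y                                       # mismatch to microphone
        norm = sum(xk * xk for xk in history) + eps
        w = [wk + step * e * xk / norm for wk, xk in zip(w, history)]
        adjusted.append(y)
    return adjusted, w

# Synthetic example: the microphone sees the accelerometer signal scaled
# by 2 and delayed by one sample (no noise, for clarity).
rng = random.Random(0)
accel = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = [0.0] + [2.0 * a for a in accel[:-1]]
adjusted, w = nlms_adjust(accel, mic)
```

In this noise-free toy case the coefficients settle close to a gain of 2 at a delay of one sample. A practical system would additionally have to decide when to adapt, for instance only while the user is speaking.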
As further shown in FIG. 2 and FIG. 3, respectively, the device 100 may include a filter bank 200, which comprises a plurality of filters 201, 202, in particular comprises a first filter 201 and a second filter 202. In the first mode (FIG. 2), the filter bank 200 can be configured to perform the combination of the modified microphone signal 104 with the adjusted voice accelerometer signal 103. In the second mode (FIG. 3), the filter bank 200 can be configured to cause the delay of the microphone signal 101. The filters 201, 202 of the filter bank 200 may be adjustable, i.e. can be changed. For instance, their filtering behaviour with respect to pass amplitude and/or frequency can be changed.
The first filter 201 may be a high pass filter, and the second filter 202 may be a low pass filter. In the first mode (FIG. 2), the microphone signal 101 may be filtered with the first filter 201 to obtain the modified microphone signal 104, and the adjusted voice accelerometer signal 103 may be filtered with the second filter 202, before combining it with the modified microphone signal 104. In the second mode (FIG. 3), the microphone signal 101 may be filtered with both the first filter 201 and the second filter 202, in order to obtain the delayed microphone signal 204. Notably, putting any kind of signal through an (analysis-synthesis) filter bank produces a delayed version of that signal. In the device 100 of FIG. 2, i.e. in the first mode of operation of the device 100, the SNR of the microphone signal 101 can be improved, particularly in the low band, in two steps. Firstly, the voice accelerometer signal 102 is modified to have a similar amplitude and/or phase as the microphone signal 101 for user speech. This can be achieved with the adaptive filter 203 (wherein adaptation may be performed only when the user is speaking, i.e., when pitch is detected). Secondly, the signals 101 and 102 are combined using the filter bank 200, after which noise reduction can potentially be continued as in a traditional method (based on the combined signal 105).
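The two modes can be sketched with a complementary filter pair. The coefficients below are not from the disclosure: a 7-tap binomial low pass is assumed for the second filter 202, and the first filter 201 is assumed to be its complement (a delayed unit impulse minus the low pass). With that assumption the two filters sum to a pure delay, which is why feeding the microphone signal through both yields a delayed copy of it. The names `fir`, `first_mode` and `second_mode` are hypothetical.

```python
def fir(x, h):
    """Causal FIR filtering of x with taps h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


# Assumed 7-tap low pass (second filter 202) and complementary
# high pass (first filter 201): HP = delayed unit impulse - LP.
LP = [c / 64 for c in (1, 6, 15, 20, 15, 6, 1)]
DELAY = len(LP) // 2
HP = [(1.0 if k == DELAY else 0.0) - c for k, c in enumerate(LP)]


def first_mode(mic, accel_adjusted):
    """High-passed microphone plus low-passed adjusted accelerometer."""
    return [a + b for a, b in zip(fir(mic, HP), fir(accel_adjusted, LP))]


def second_mode(mic):
    """Microphone through both filters: the sum is the delayed microphone."""
    return [a + b for a, b in zip(fir(mic, HP), fir(mic, LP))]


mic = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0, 0.0, 1.5, -0.5]
print(second_mode(mic))  # mic delayed by DELAY samples
```

Because both modes share the same delay in this sketch, switching between them does not shift the output in time, which is one plausible reason for routing the quiet-mode signal through the filter bank rather than bypassing it.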
The filter bank 200 that is used in the first mode or second mode can depend on the SNR of the microphone signal 101, particularly in the low frequencies. In a quiet environment, the SNR of the microphone signal 101 is typically better than that of the voice accelerometer signal 102, because movement causes low level noise in the voice accelerometer signal 102. Then, the voice accelerometer signal 102 is preferably not used, but the microphone signal 101 as such and the filter bank 200 are used (only) for producing the delay (i.e. generating the delayed microphone signal 204). This is the second mode as shown in FIG. 3, which is accordingly most suitable for a quiet environment. In a noisy environment (e.g. in a cafeteria, car, street, station, school yard, etc.) or in mild wind, it may be sufficient to reduce the noise in the low frequencies by e.g. 10-20 dB (through attenuation of the microphone signal 101 in the low frequencies). However, wind noise can also be very strong in the low frequencies, and extra attenuation may be needed in this case. This is the first mode as shown in FIG. 2, which is accordingly most suitable for a noisy environment, even for a windy environment. The attenuation of the low frequencies of the microphone signal 101 can be implemented using a high pass filter as the first filter 201.
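How a limited (e.g. 20 dB) versus a much deeper low-frequency attenuation could be obtained from one filter structure is sketched below. This is an illustrative construction, not the filter design of the disclosure: the high pass is assumed to be a delayed unit impulse minus a scaled 7-tap binomial low pass, so that the scale factor sets the attenuation at 0 Hz. The 16 kHz sampling rate is taken from the description; the function names and the coefficients are hypothetical, and the corner frequency of this toy filter is not tuned to the 3 kHz mentioned for FIG. 4.

```python
import cmath
import math

# Assumed 7-tap binomial low pass with unit gain at 0 Hz.
LP = [c / 64 for c in (1, 6, 15, 20, 15, 6, 1)]


def high_pass_with_floor(max_atten_db):
    """High pass whose attenuation at 0 Hz equals max_atten_db.

    Built as (delayed unit impulse) - g * LP, so the gain at 0 Hz is 1 - g.
    """
    g = 1.0 - 10.0 ** (-max_atten_db / 20.0)
    mid = len(LP) // 2
    return [(1.0 if k == mid else 0.0) - g * c for k, c in enumerate(LP)]


def gain_db(h, freq_hz, fs_hz=16000.0):
    """Magnitude response of FIR taps h at freq_hz, in dB."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h))
    return 20.0 * math.log10(abs(resp) + 1e-12)


ordinary = high_pass_with_floor(20.0)   # ordinary noise: moderate attenuation
windy = high_pass_with_floor(80.0)      # wind noise: much deeper attenuation
for name, h in (("ordinary", ordinary), ("windy", windy)):
    print(name, [round(gain_db(h, f), 1) for f in (0.0, 1000.0, 8000.0)])
```

In this construction only the scale factor changes between the two cases, so an implementation could move between them gradually by interpolating that factor.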
Typically, a voice accelerometer signal 102 has content only in its lower frequencies. Hence, the adaptive filtering, using the adaptive filter 203, may be conducted at a lower sampling rate, and the filter bank 200 may involve an interpolation of the adaptively filtered voice accelerometer signal 103, e.g. to 16 kHz (i.e., after inserting enough zeros between the samples, the attenuation of the second filter 202 of the filter bank 200 has to be deeper). Thus, adjusting the voice accelerometer signal 102 based on the microphone signal 101 may be performed at a sampling rate lower than a sampling rate of the microphone signal 101. The second filter 202 may then be configured to interpolate the adjusted voice accelerometer signal 103 to the sampling rate of the microphone signal 101. The filters 201, 202 of the filter bank 200 can be selected to be very short, in which case they are particularly efficient to implement. For instance, in an exemplary filter bank 200, the filters 201, 202 may have fewer than 10 taps each, e.g., may have only 7 taps each. Due to the interpolation of the adjusted voice accelerometer signal 103, the length of the combined second filter 202 may be longer, e.g. 32.
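The interpolation by zero insertion followed by low-pass filtering can be sketched as follows. The factor of 2 (for example 8 kHz to 16 kHz) and the 3-tap linear-interpolation kernel are assumptions chosen to keep the example small; the description only states the 16 kHz target rate and that the combined second filter may be longer. All function names are hypothetical.

```python
def upsample_zero_insert(x, factor):
    """Insert factor - 1 zeros after every sample of x."""
    out = []
    for s in x:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out


def convolve(x, h):
    """Full linear convolution of x with h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def interpolate(x, factor, kernel):
    """Zero-insert, then low-pass filter to fill in the inserted samples."""
    return convolve(upsample_zero_insert(x, factor), kernel)


# Assumed kernel: linear interpolation for a factor of 2.
LINEAR_X2 = [0.5, 1.0, 0.5]

low_rate = [0.0, 2.0, 4.0, 2.0]
high_rate = interpolate(low_rate, 2, LINEAR_X2)
print(high_rate)  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
```

The original samples reappear at every second output position (delayed by one sample), with the inserted positions filled by the average of their neighbours. Cascading such an interpolation kernel with a short band-splitting low pass gives a combined filter of length `len(a) + len(b) - 1`, which is consistent with the combined second filter being longer than the individual short filters.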
FIG. 4 shows examples (a), (b), and (c) of filters 201, 202 of the filter bank 200 for a device 100 according to an embodiment of the invention, in particular for the device 100 of FIG. 2 or FIG. 3.
In (a) the filters 201, 202 are most suitable for ordinary noise (not wind). The first filter 201 is a high pass filter that is used to filter the microphone signal 101, and the second filter 202 is a low pass filter that is used to filter the voice accelerometer signal 102. The first filter 201 attenuates the microphone signal 101 below 3 kHz, by up to 20 dB at very low frequencies approaching 0 kHz. The second filter 202 passes the voice accelerometer signal 102 approximately up to 3-4 kHz.
In (b) the filters 201, 202 are most suitable for wind noise. The first filter 201 is a high pass filter that is used to filter the microphone signal 101, and the second filter 202 is a low pass filter that is used to filter the voice accelerometer signal 102. The first filter 201 attenuates the microphone signal 101 below 3 kHz, by up to 80 dB at very low frequencies approaching 0 kHz (and even infinity at approximately 0 kHz). That is, compared to (a), the attenuation of the low frequencies of the microphone signal 101 is significantly enhanced. The second filter 202 passes the voice accelerometer signal 102 approximately up to 3-4 kHz.
In (c), the filters 201, 202 are most suitable for ordinary noise, like in (a), but without any interpolation. The first filter 201 is a high pass filter that is used to filter the microphone signal 101, and the second filter 202 is a low pass filter that is used to filter the voice accelerometer signal 102. The first filter 201 attenuates the microphone signal 101 below 3 kHz, by up to 20 dB at very low frequencies approaching 0 kHz as in (a). The second filter 202 can be short, e.g. can have 7 taps.
FIG. 5 shows schematically a mode selection procedure performed by a device 100 according to an embodiment of the invention, in particular by the device 100 of FIG. 2 or FIG. 3. Generally, the device 100 may determine whether the environment (its surroundings) is quiet or noisy, and can select the first mode if the environment is noisy and can select the second mode if the environment is quiet. Further, if the device 100 determines that the environment is noisy, the device 100 may further determine whether the environment is windy, and may enhance the attenuation of the low frequencies of the microphone signal 101 in the first mode if the environment is windy. The mode(s) may be changed gradually to avoid noise pumping effects and other discontinuities. In very strong winds, additional attenuation of the higher frequencies (upper band) of the microphone signal 101 can also be applied, e.g. by 18 dB.
In particular, as shown in FIG. 5, the device 100 may additionally comprise an analysis filter bank 500, a high pass filter 501, a first mixer 502, and a second mixer 503. The first mixer 502 may receive the voice accelerometer signal 102 and the microphone signal 101, as output from the analysis filter bank 500, and can determine whether the environment is quiet. The second mixer 503 may receive the microphone signal 101, as output from the analysis filter bank 500, and the same microphone signal 101 after having passed the high pass filter 501, and can determine whether the environment is windy. The filter bank 200 can be respectively selected, i.e. its filters 201, 202 can be selected according to the determined noise condition, and can operate on the microphone signal 101 or on the microphone signal 101 and the voice accelerometer signal 102, respectively.
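The mode selection can be sketched as a simple decision over signal levels. The description does not state how the mixers 502, 503 reach their decisions, so everything below is an assumption made for illustration: the level comparisons, the thresholds `noisy_margin_db` and `wind_margin_db`, and all function names are hypothetical. The sketch also assumes that the accelerometer level has already been matched to the microphone level for speech.

```python
import math


def band_level_db(x):
    """Mean power of a signal block in dB (small offset avoids log of 0)."""
    power = sum(s * s for s in x) / max(len(x), 1)
    return 10.0 * math.log10(power + 1e-12)


def select_mode(mic_low_band, accel_low_band, mic_full, mic_high_passed,
                noisy_margin_db=6.0, wind_margin_db=12.0):
    """Return 'quiet', 'noisy' or 'windy' (assumed decision rule).

    quiet -> second mode (delayed microphone only)
    noisy -> first mode with moderate low-frequency attenuation
    windy -> first mode with enhanced low-frequency attenuation
    """
    # Microphone low band well above the accelerometer low band
    # suggests acoustic noise that the accelerometer does not pick up.
    excess = band_level_db(mic_low_band) - band_level_db(accel_low_band)
    if excess < noisy_margin_db:
        return "quiet"
    # A large level drop after high-pass filtering means the energy is
    # concentrated at low frequencies, as is typical for wind.
    low_dominance = band_level_db(mic_full) - band_level_db(mic_high_passed)
    return "windy" if low_dominance >= wind_margin_db else "noisy"


def smooth_weight(current, target, step=0.1):
    """Move a mixing weight towards its target by at most `step` per call."""
    return current + max(-step, min(step, target - current))


speech = [1.0, -1.0] * 50
print(select_mode(speech, speech, speech, speech))
```

The `smooth_weight` helper reflects the statement that modes may be changed gradually: rather than switching filters abruptly, an implementation could cross-fade between filter sets over several frames.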
FIG. 6 compares a spectrum (a) of an output signal of a device 100 according to an embodiment of the invention with a spectrum (b) of an output signal of a device performing a conventional approach (e.g. spectral subtraction as described in the background part). The spectrum (a) is in particular the result of using the device 100 in the first mode of operation, i.e. generating the combined signal 105, and then applying a conventional noise suppression approach to the combined signal 105. The spectrum (b) is in particular the result of applying the same conventional noise suppression approach directly to the microphone signal (no voice accelerometer signal is used). The spectrum (a) of the output signal of the device 100 has considerably less noise than the spectrum (b) of the conventional device. Notably, in strong wind conditions, the conventional approach may be used by the device 100.
FIG. 7 shows a method 700 according to an embodiment of the invention. The method 700 is suitable for reducing noise in a headset. The method 700 may be performed by the device 100, particularly by the device 100 as shown in FIG. 1, 2 or 3. Details of the method 700 can be implemented as described above for the device 100.
The method 700 includes: a step 701 of obtaining a microphone signal 101; a step 702 of obtaining a voice accelerometer signal 102; and a step 703 of adjusting the voice accelerometer signal 102 based on the microphone signal 101. In the first mode of operation, the method 700 further comprises a step 704 of modifying the microphone signal 101 by attenuating low frequencies (of the microphone signal 101); a step 705 of combining the modified microphone signal 104 with the adjusted voice accelerometer signal 103; and a step 706 of outputting the combined signal 105. In the second mode of operation (not shown in FIG. 7), the method 700 may further comprise a step of delaying the microphone signal 101, and a step of outputting the delayed microphone signal 204.
The present invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from a study of the drawings, this disclosure and the independent claims. In the claims as well as in the description, the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims

1. A device (100) for noise reduction in a headset, wherein the device (100) is configured to: obtain a microphone signal (101), obtain a voice accelerometer signal (102), and adjust the voice accelerometer signal (102) based on the microphone signal (101), wherein in a first mode of operation, the device (100) is further configured to: modify the microphone signal (101) by attenuating low frequencies, combine the modified microphone signal (104) with the adjusted voice accelerometer signal (103), and output the combined signal (105).
2. The device (100) according to claim 1, comprising: a filter bank (200) comprising a plurality of filters (201, 202), wherein the filter bank (200) is configured to combine the modified microphone signal (104) with the adjusted voice accelerometer signal (103) in the first mode.
3. The device (100) according to claim 2, wherein: the filter bank (200) comprises a first filter (201) and a second filter (202), the first filter (201) being a high pass filter and the second filter (202) being a low pass filter, and wherein in the first mode, the device (100) is further configured to: filter the microphone signal (101) with the first filter (201) to obtain the modified microphone signal (104), and filter the adjusted voice accelerometer signal (103) with the second filter (202) before combining it with the modified microphone signal (104).
4. The device (100) according to claim 3, wherein: adjusting the voice accelerometer signal (102) based on the microphone signal (101) is performed in a sampling rate lower than a sampling rate of the microphone signal (101), and the second filter (202) is configured to interpolate the adjusted voice accelerometer signal (103) to the sampling rate of the microphone signal (101).
5. The device (100) according to one of the claims 1 to 4, wherein in a second mode of operation, the device (100) is further configured to: delay the microphone signal (101), and output the delayed microphone signal (204).
6. The device (100) according to claim 5, further configured to: apply noise suppression on the combined signal (105) in the first mode and/or apply noise suppression on the delayed microphone signal (204) in the second mode.
7. The device (100) according to claim 5 or 6 and according to one of the claims 2 to 4, wherein: the filter bank (200) is configured to delay the microphone signal (101) in the second mode.
8. The device (100) according to claim 7, wherein in the second mode, the device (100) is further configured to: filter the microphone signal (101) with the first filter (201) and with the second filter (202) to obtain the delayed microphone signal (204).
9. The device (100) according to claim 7 or 8, configured to: adjust the filters (201, 202) of the filter bank (200), when changing between the first mode and the second mode.
10. The device (100) according to claim 2 or 3 or according to one of the claims 7 to 9, wherein: the filters (201, 202) of the filter bank (200) each comprise less than 10 taps, in particular each comprise 7 taps.
11. The device (100) according to one of the claims 5 to 10, configured to: determine whether the environment is quiet or noisy, and select the first mode if the environment is noisy and select the second mode if the environment is quiet.
12. The device (100) according to claim 11, further configured to, if determining that the environment is noisy: determine whether the environment is windy, and enhance the attenuation of the low frequencies of the microphone signal (101) in the first mode if the environment is windy.
13. The device (100) according to claim 9, 11 or 12, configured to: change gradually between the first mode and the second mode.
14. The device (100) according to one of the claims 1 to 13, comprising: an adaptive filter (203) configured to adjust the voice accelerometer signal (102) based on the microphone signal (101), in particular to adjust the amplitude and phase of the voice accelerometer signal (102) to the amplitude and phase of the microphone signal (101).
15. The device (100) according to one of the claims 1 to 14, in particular being a headset, the device (100) comprising: a microphone to generate the microphone signal (101), and a voice accelerometer to generate the voice accelerometer signal (102).
16. A method (700) for noise reduction in a headset, wherein the method (700) comprises: obtaining (701) a microphone signal (101), obtaining (702) a voice accelerometer signal (102), and adjusting (703) the voice accelerometer signal (102) based on the microphone signal (101), wherein in a first mode of operation, the method (700) further comprises: modifying (704) the microphone signal (101) by attenuating low frequencies, combining (705) the modified microphone signal (104) with the adjusted voice accelerometer signal (103), and outputting (706) the combined signal (105).
PCT/EP2019/073760 2019-09-05 2019-09-05 Noise reduction in a headset by employing a voice accelerometer signal WO2021043412A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980099938.4A CN114341978A (en) 2019-09-05 2019-09-05 Noise reduction in headset using voice accelerometer signals
PCT/EP2019/073760 WO2021043412A1 (en) 2019-09-05 2019-09-05 Noise reduction in a headset by employing a voice accelerometer signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/073760 WO2021043412A1 (en) 2019-09-05 2019-09-05 Noise reduction in a headset by employing a voice accelerometer signal

Publications (1)

Publication Number Publication Date
WO2021043412A1 true WO2021043412A1 (en) 2021-03-11

Family

ID=67875468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/073760 WO2021043412A1 (en) 2019-09-05 2019-09-05 Noise reduction in a headset by employing a voice accelerometer signal

Country Status (2)

Country Link
CN (1) CN114341978A (en)
WO (1) WO2021043412A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5590241A (en) * 1993-04-30 1996-12-31 Motorola Inc. Speech processing system and method for enhancing a speech signal in a noisy environment
US20140093093A1 (en) 2012-09-28 2014-04-03 Apple Inc. System and method of detecting a user's voice activity using an accelerometer
US20140270231A1 (en) * 2013-03-15 2014-09-18 Apple Inc. System and method of mixing accelerometer and microphone signals to improve voice quality in a mobile device
US20150179189A1 (en) * 2013-12-24 2015-06-25 Saurabh Dadu Performing automated voice operations based on sensor data reflecting sound vibration conditions and motion conditions
US20170111734A1 (en) * 2015-10-16 2017-04-20 Nxp B.V. Controller for a haptic feedback element
JP2017098963A (en) * 2016-12-02 2017-06-01 京セラ株式会社 measuring device
US20170337933A1 (en) 2016-05-21 2017-11-23 Stephen P. Forte Noise reduction methodology for wearable devices employing multitude of sensors
US20180367882A1 (en) 2017-06-16 2018-12-20 Cirrus Logic International Semiconductor Ltd. Earbud speech estimation
CN109767783A (en) * 2019-02-15 2019-05-17 深圳市汇顶科技股份有限公司 Sound enhancement method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S.F. BOLL: "Suppression of acoustic noise in speech using spectral subtraction", IEEE TRANS. ACOUST. SIGNAL PROC., vol. ASSP-27, April 1979 (1979-04-01), XP002072967, doi:10.1109/TASSP.1979.1163209
SHAHIDUR RAHMAN M ET AL: "Low-frequency band noise suppression using bone conducted speech", COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011 IEEE PACIFIC RIM CONFERENCE ON, IEEE, 23 August 2011 (2011-08-23), pages 520 - 525, XP031971208, ISBN: 978-1-4577-0252-5, DOI: 10.1109/PACRIM.2011.6032948 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022204697A1 (en) * 2021-03-24 2022-09-29 Bose Corporation Audio processing for wind noise reduction on wearable devices
US11521633B2 (en) 2021-03-24 2022-12-06 Bose Corporation Audio processing for wind noise reduction on wearable devices

Also Published As

Publication number Publication date
CN114341978A (en) 2022-04-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19765466

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19765466

Country of ref document: EP

Kind code of ref document: A1