EP0948237B1 - Method for noise suppression in a microphone signal - Google Patents

Method for noise suppression in a microphone signal

Info

Publication number
EP0948237B1
EP0948237B1 EP19990106123 EP99106123A EP0948237B1 EP 0948237 B1 EP0948237 B1 EP 0948237B1 EP 19990106123 EP19990106123 EP 19990106123 EP 99106123 A EP99106123 A EP 99106123A EP 0948237 B1 EP0948237 B1 EP 0948237B1
Authority
EP
European Patent Office
Prior art keywords
signal
method
time
filtering function
frequency domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP19990106123
Other languages
German (de)
French (fr)
Other versions
EP0948237A2 (en)
EP0948237A3 (en)
Inventor
Hans-Jörg Thomas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harman Becker Automotive Systems GmbH
Original Assignee
Harman Becker Automotive Systems GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to DE19814971 (DE19814971A1)
Application filed by Harman Becker Automotive Systems GmbH
Publication of EP0948237A2
Publication of EP0948237A3
Application granted
Publication of EP0948237B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/007 Protection circuits for transducers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02168 Noise filtering characterised by the method used for estimating noise, the estimation exclusively taking place during speech pauses

Description

  • The invention relates to a method for suppressing interference in a microphone signal.
  • Such methods are becoming increasingly important, in particular for voice input of commands and/or for hands-free telephony, with use in a vehicle being a particularly important application.
  • In vehicles, a playback device such as a radio, cassette player or CD player often generates a sound environment via a loudspeaker, which is superimposed as an interference signal on a speech signal picked up by a microphone, for example for speech recognition or telephone transmission. To recognize voice input in a speech recognition device, or for intelligible voice transmission over the telephone, the microphone signal must be freed of interference signal components as far as possible.
  • The interference signal emanating from an interference source, in particular a loudspeaker, reaches the microphone not only on the shortest direct path but also via numerous reflections, so that it appears in the microphone signal as a superposition of many echoes with different transit times. The overall effect of the interference source on the microphone signal can be described by an a priori unknown transfer function of the room, for example the passenger compartment of a motor vehicle. This transfer function changes with the occupancy of the vehicle and the positions of the individual occupants. By replicating this transfer function and filtering a reference signal from the interference source with the replica, a compensation signal can be generated which, when subtracted from the microphone signal, yields a signal freed of the interference, for example a pure speech signal. In practice, the replica is only a more or less good approximation of the unknown transfer function, so the interference cannot be eliminated completely.
  • EP 0 250 048 A1 discloses a digital block adaptable filter which is adapted in the frequency domain.
  • The object of the present invention is to provide a method for suppressing interference in a microphone signal that achieves good suppression with reasonable signal processing effort.
  • The invention is described in claim 1. The subclaims contain advantageous embodiments and further developments of the invention.
  • It is essential to the method according to the invention that the interference signal component in the microphone signal is compensated in the frequency domain, by means of a compensation signal generated from the reference signal via the replica of the transfer function, so that the microphone signal, compensation signal and output signal are all present in the frequency domain, i.e. in the form of spectra. Signal processing in the frequency domain in this step does require a spectral transformation of the microphone signal. However, replicating the transfer function is more advantageous in the frequency domain, and the output signal is already available in a form particularly suited to a subsequent additional noise reduction, which is typically also performed in the frequency domain.
  • By a simple approximation, a processing step involving a time window can be replaced by a convolution in the frequency domain, which significantly reduces the processing effort.
  • For long impulse responses of the transfer function or its replica, an advantageous development of the invention divides the replica filter into a plurality of sub-filters operating on time-shifted segments of the segmented reference signal. Their coefficient updates can be staggered over time, which minimizes the signal processing effort.
  • It proves particularly advantageous to suppress the interference in a speech signal using a setting of the replica filter that was obtained and stored during a preceding speech pause.
  • The division of the replica filter into a plurality of sub-filters, and interference suppression based on a filter setting obtained during a speech pause, can each be implemented advantageously on their own, independently of interference compensation in the frequency domain.
  • The invention is explained in more detail below on the basis of preferred embodiments with reference to the figures, which show:
  • Fig. 1
    the principle of compensation of a radio signal
    Fig. 2a
    a block diagram for Fig. 1
    Fig. 2b
    a block diagram of the filter replica
    Fig. 3
    a detailed example for Fig. 2b
    Fig. 4
    an extension to several sub-filters
    Fig. 5
    a transition to compensation in the frequency domain
    Fig. 6
    a detailed example for Fig. 5b
    Fig. 7
    an embodiment with several sub-filters
    Fig. 8
    an embodiment with storage of the filter settings
    Fig. 9
    signals of a synthetic example scene
    Fig. 10
    impulse response and transfer function for Fig. 9
    Fig. 11
    signals of a first measurement scene
    Fig. 12
    impulse response and transfer function for Fig. 11
    Fig. 13
    the example of Fig. 11 with storage of the filter settings
    Fig. 14
    a speech pause detection for Fig. 13
    Fig. 15
    impulse responses and transfer functions for Fig. 11 and Fig. 13
    Fig. 16
    transition from a time window to a convolution in the frequency domain
    Fig. 17
    a rectangular time window with line spectrum
    Fig. 18
    a Hamming time window with line spectrum
    Fig. 19
    staggering of signal blocks during filter calculation
    Fig. 20
    signals of a second measurement scene
    Fig. 21
    a speech pause detection for Fig. 20
    Fig. 22
    impulse responses and transfer functions for Fig. 20 and Fig. 21
    Fig. 23
    signals of a third measurement scene
    Fig. 24
    a speech pause detection for Fig. 23
    Fig. 25
    impulse responses and transfer functions for Fig. 23 and Fig. 24
    Fig. 26
    signals of a fourth measurement scene
    Fig. 27
    a speech pause detection for Fig. 26
    Fig. 28
    impulse responses and transfer functions for Fig. 26 and Fig. 27
  • Fig. 1 represents the principle of a device for (single-channel) radio signal compensation. The acoustic signal emitted by the loudspeaker reaches the microphone of the speech input system directly, but also via numerous reflections in the vehicle interior. Assuming that the transmission path G thus represents a transversal filter with a weighted sum of time-delayed echoes, a filter simulation H can be found which, in the ideal case H = G, enables a complete compensation of the radio signal.
  • The loudspeaker signal x is filtered by the a priori unknown transfer function G of the vehicle interior. The result is the interference component r, which adds to the speech signal s to form the microphone signal y. To compensate for the interference component r, an estimate $\hat{r}$ is generated from the loudspeaker signal x by means of the filter replica H. The output of the circuit provides the estimate of the speech signal:
    $$\hat{s} = s + r - \hat{r} = s + e$$
  • At the output of the circuit, the speech signal s is therefore still superimposed by the error signal
    $$e = r - \hat{r}$$
    which should be kept as small as possible in practice. The speech signal may also contain disturbances such as engine noise or external noise, which are not explicitly dealt with here.
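The signal model above can be illustrated with a toy example. This is only a sketch under assumptions: the signal values, the three-tap echo path `G` and the helper `fir` are invented for illustration and do not come from the patent.

```python
# Toy illustration of y = s + r and s_hat = y - r_hat (all values are made up).

def fir(coeffs, x):
    """Transversal (FIR) filter: weighted sum of delayed input samples."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]

x = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0]   # loudspeaker (reference) signal
s = [0.0, 0.1, 0.0, -0.2, 0.0, 0.3]      # speech signal
G = [0.0, 0.6, 0.3]                      # assumed room echoes (unknown in practice)

r = fir(G, x)                            # interference component at the microphone
y = [si + ri for si, ri in zip(s, r)]    # microphone signal

def compensate(H):
    """Subtract the replica-filtered reference from the microphone signal."""
    r_hat = fir(H, x)
    return [yi - rhi for yi, rhi in zip(y, r_hat)]

s_hat_ideal = compensate(G)                # ideal case H = G
s_hat_rough = compensate([0.0, 0.5, 0.3])  # imperfect replica
e = [a - b for a, b in zip(s_hat_rough, s)]  # residual error e = r - r_hat
```

With `H = G` the speech signal is recovered exactly; with the imperfect replica a residual `e` proportional to the coefficient mismatch remains.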
  • H is an adaptive filter that works according to a standard method known from the literature, the LMS (least mean squares) algorithm. In addition to the input signal x, the error signal e is required to adapt the coefficients of the filter H. For this purpose, the output signal $\hat{s}$, which contains e, is fed to the block that determines the filter coefficients.
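For orientation, a plain time-domain LMS loop might look as follows. This is a minimal sketch, not the frequency-domain variant the patent goes on to describe; the function name `lms_identify`, the step size, the four-tap test path `g` and the uniform test noise are all assumptions.

```python
import random

def lms_identify(x, y, num_taps, mu):
    """Adapt FIR coefficients h so that h applied to x approximates the echo in y."""
    h = [0.0] * num_taps
    out = []
    for n in range(len(x)):
        # Most recent num_taps reference samples, newest first.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        r_hat = sum(hk * xk for hk, xk in zip(h, window))
        e = y[n] - r_hat                      # circuit output, fed back for adaptation
        h = [hk + 2.0 * mu * e * xk for hk, xk in zip(h, window)]
        out.append(e)
    return h, out

rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]   # reference signal (test noise)
g = [0.0, 0.5, -0.3, 0.1]                           # assumed "room" impulse response
y = [sum(g[k] * x[n - k] for k in range(len(g)) if n - k >= 0)
     for n in range(len(x))]                        # microphone signal without speech
h, s_hat = lms_identify(x, y, num_taps=4, mu=0.05)
```

Without speech in `y`, the coefficients `h` settle on `g` and the output decays towards zero; added speech would act as a disturbance on the adaptation.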
  • Fig. 2a shows the arrangement of Fig. 1 again, in a different representation, as a radio signal compensation. The adaptive system H can be realized in the time domain, for example as an FIR (finite impulse response) filter. For the long impulse responses that often occur in practice, however, this requires a very high computational effort. Realizing the LMS algorithm in the frequency domain (FLMS) offers various advantages over a time-domain solution. Because the data are processed blockwise in the spectral transformations, which are realized as discrete Fourier transforms, and because filtering in the frequency domain reduces to a multiplication, this method is particularly efficient computationally.
  • Fig. 2b shows a block diagram of the FLMS algorithm. The associated theory is known per se and is therefore not discussed in detail here. F denotes a spectral transformation (FFT) of a time signal into the frequency domain and $F^{-1}$ its inverse (IFFT). The processing steps referred to as projections P1, P2 and P3 ensure correct segmentation of the data for blockwise use with the FFT or IFFT and are explained in more detail later. The filter operates by multiplying the reference spectrum X by the filter coefficient vector H. The spectrum $\hat{R}$ of the filter output is transformed back into the time domain via $F^{-1}$. After the projection P2 has been applied to the real part of the compensation signal thus obtained, the signal $\hat{r}$ is available. The difference of the signals
    $$\hat{s} = y - \hat{r} = s + r - \hat{r} = s + e$$
    is the actual output, an estimate of the speech input.
  • An essential part of the adaptive filter is the coefficient adaptation in block K, which is described in Fig. 2b by the renewal equation
    $$H' = H' + \Delta H'$$
    The projection P1, which is particularly costly here with its two spectral transformations, calculates from H' the coefficient vector H required for filtering. To calculate the correction vector ΔH', the spectrum $\hat{S}$ of the output signal
    $$\hat{s} = s + r - \hat{r}$$
    weighted with P3 is needed in addition to the reference spectrum X.
  • Fig. 3 shows a detailed block diagram of the FLMS algorithm of Fig. 2b. Both the sampled values of a signal and the points of the FFT are referred to as samples below. All spectral transforms and their inverses are implemented as 256-point FFTs on segments that overlap by 128 samples. Note that the output signal $\hat{s}$ in the time domain consists of 128-sample blocks. It is the difference between the second block halves (samples 129 to 256 in each case) of the microphone signal and of the filtered compensation signal $\hat{r}$. The projection P1, which requires 2 FFTs and converts the vector H' into the vector H, is costly. Here the first half (samples 1 to 128) is cut out of the complex 256-point result vector of the inverse transformation from the frequency domain to the time domain (IFFT), and the second half (samples 129 to 256) is set to zero. After this rectangular window has been applied in the time domain, the result is transformed back into the frequency domain by an FFT. The projection P2 is simple. It consists of cutting out the last 128 samples as described above, so that overlapping 256-sample blocks again yield non-overlapping 128-sample blocks. Finally, the projection P3 is also very simple: it turns non-overlapping 128-sample blocks of the output signal back into 256-sample blocks by prepending 128 zeros. The adaptation of the filter coefficients $H'_{L+1}$ for a cycle L+1 consists of adding a renewal vector $\Delta H'_L$ to the old coefficient vector $H'_L$. This renewal vector is the product of the spectrum $\hat{S}_L$ of the output signal and the complex-conjugate spectrum $X_L^{*}$ of the reference signal, weighted with a spectral power normalization $2\mu_L$:
    $$\Delta H'_L = 2\mu_L \, X_L^{*} \, \hat{S}_L$$
    For this power normalization, the inverse of the smoothed reference power spectrum $S_{xx,L}$ is multiplied by a constant 2α:
    $$2\mu_L = \frac{2\alpha}{S_{xx,L}}$$
    The smoothing uses a first-order recursive filter with a constant β:
    $$S_{xx,L} = \beta \, |X_L|^2 + (1 - \beta) \, S_{xx,L-1}$$
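The three projections can be sketched in a few lines. The sketch below is an illustration only: it uses a naive DFT and a block length of 8 instead of 256 to stay short, and the function names `dft`, `project_p1`, `project_p2` and `project_p3` are assumptions, not names from the patent.

```python
import cmath

def dft(v, inverse=False):
    """Naive discrete Fourier transform (fine for short illustrative blocks)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [c / n for c in out] if inverse else out

def project_p1(H_prime):
    """P1: IFFT, keep the first half, zero the second half, FFT (two transforms)."""
    h = dft(H_prime, inverse=True)
    half = len(h) // 2
    return dft(h[:half] + [0.0] * half)

def project_p2(block):
    """P2: keep only the last half of a time-domain block."""
    return block[len(block) // 2:]

def project_p3(half_block):
    """P3: prepend as many zeros as there are samples to rebuild a full block."""
    return [0.0] * len(half_block) + list(half_block)

# Example with block length 8 (the text uses 256 with halves of 128 samples).
H_prime = dft([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
h_constrained = dft(project_p1(H_prime), inverse=True)
```

After P1, the impulse response `h_constrained` keeps its first four samples and is zero in the second half, which is what restricts the replica to half an FFT length.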
  • The operation of the LMS algorithm is significantly influenced by the adaptation constant α and the smoothing constant β. Latches in recursion loops are labeled Sp.
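The adaptation equations can be written per frequency bin as in the following sketch. It covers only the power smoothing, the normalization and the coefficient renewal, and leaves out the projections; the function name `flms_update` and the small guard value `eps` against division by zero are assumptions, while the defaults α = 0.05 and β = 0.5 are the values quoted later in the text.

```python
def flms_update(H_prime, X, S_hat, S_xx_prev, alpha=0.05, beta=0.5, eps=1e-12):
    """One adaptation cycle L -> L+1, applied independently to every bin."""
    # Smoothed reference power: S_xx,L = beta*|X_L|^2 + (1 - beta)*S_xx,L-1
    S_xx = [beta * abs(x) ** 2 + (1.0 - beta) * p for x, p in zip(X, S_xx_prev)]
    # Power normalization: 2*mu_L = 2*alpha / S_xx,L (eps is an added guard)
    two_mu = [2.0 * alpha / (p + eps) for p in S_xx]
    # Renewal vector: dH'_L = 2*mu_L * conj(X_L) * S_hat_L
    delta = [m * x.conjugate() * s for m, x, s in zip(two_mu, X, S_hat)]
    # Coefficient renewal: H'_{L+1} = H'_L + dH'_L
    H_next = [h + d for h, d in zip(H_prime, delta)]
    return H_next, S_xx, delta

# Two-bin example with made-up spectra.
H_next, S_xx, delta = flms_update(
    H_prime=[0j, 0j],
    X=[2 + 0j, 1j],
    S_hat=[1 + 0j, 0.5 + 0j],
    S_xx_prev=[4.0, 1.0],
)
```

A larger α speeds up adaptation at the cost of more coefficient noise, and a larger β makes the power estimate follow the current block more closely.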
  • The arrangement of the FLMS algorithm described so far allows filter replicas with a maximum impulse response length of half an FFT length, 128 samples in the example. If longer impulse responses are to be compensated, the known FLMS algorithm for one sub-filter (Fig. 4a) must be extended to n sub-filters. A 3-part filter with an impulse response length of 3 × 128 = 384 samples has proven itself for radio signal suppression in passenger cars with a voice input system (Fig. 4b). Block B in Fig. 4a, with the input spectra X and $\hat{S}$ and the compensation spectrum $\hat{R}$ as output, is to be replaced by the extension shown in Fig. 4b. The spectrum X of the reference signal is delayed by 1 or 2 block lengths in buffers D, and the current spectrum X1 and the two delayed spectra X2, X3 are multiplied separately by the coefficient vectors H1, H2, H3, which are determined separately in an extended projection P1. The coefficient vectors are formed as in the case of a single sub-filter, with the associated reference spectrum being combined with the spectrum $\hat{S}$ of the output signal in K1, K2 and K3 respectively. The effort increases considerably, mainly because the projection P1 is needed three times. Additional memory is required to hold the spectra of the reference signal X that are older by 1 or 2 block lengths.
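A partitioned filter of this kind can be sketched as a short delay line of reference spectra. The sketch assumes that the three products are summed to form the compensation spectrum, which the text implies but does not spell out; the class name `PartitionedFilter` and its methods are hypothetical.

```python
from collections import deque

class PartitionedFilter:
    """Three sub-filters applied to the current and two delayed reference spectra."""

    def __init__(self, num_bins, num_parts=3):
        self.H = [[0j] * num_bins for _ in range(num_parts)]      # H1, H2, H3
        self.X_hist = deque([[0j] * num_bins for _ in range(num_parts)],
                            maxlen=num_parts)                     # X1, X2, X3

    def push(self, X):
        """Store the newest reference spectrum; the oldest one drops out."""
        self.X_hist.appendleft(list(X))

    def output(self):
        """Compensation spectrum: sum over parts of H_p[k] * X_p[k] for each bin k."""
        num_bins = len(self.H[0])
        return [sum(Hp[k] * Xp[k] for Hp, Xp in zip(self.H, self.X_hist))
                for k in range(num_bins)]

f = PartitionedFilter(num_bins=2)
f.H = [[1 + 0j, 1 + 0j], [2 + 0j, 2 + 0j], [3 + 0j, 3 + 0j]]
for spectrum in ([1 + 0j, 0j], [0j, 1 + 0j], [1 + 0j, 1 + 0j]):
    f.push(spectrum)
R_hat = f.output()
```

The delay line is what costs the additional memory mentioned above: two extra spectra have to be kept per block.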
  • In the task considered here, suppressing the radio signal during voice input in a car, it is advantageous to output the data in the frequency domain rather than in the time domain, since this allows better matching to a downstream noise suppression stage. The FLMS algorithm with one sub-filter presented so far requires, according to Fig. 5a, a total of 5 FFTs for an output signal in the time domain. If an FFT is added after the output, the effort for a frequency-domain output signal rises to 6 FFTs. The equivalent solution according to Fig. 5b initially requires the same number of FFTs. However, this variant has the following advantages:
    • With simultaneous spectral analysis of the signals x and y, only a single 256-point FFT is needed, plus a small additional effort for spectral separation. This saves 1 FFT.
    • The newly defined projection P4 is formally identical to the projection P1 except for the time window used. As will be shown later, P4 can be replaced by a relatively simple convolution operation in the frequency domain without any appreciable loss of quality. This saves 2 FFTs.
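The first saving relies on a standard property of the Fourier transform of real signals: two real blocks can be packed into one complex block, transformed once, and separated afterwards. The sketch below shows one common way to do this; it is not taken from the patent, uses a naive DFT on a block of 8 samples for brevity, and the function names are assumptions.

```python
import cmath

def dft(v):
    """Naive DFT, adequate for a short demonstration block."""
    n = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def two_real_spectra(x, y):
    """Spectra of two real signals from a single complex transform of x + j*y."""
    n = len(x)
    Z = dft([complex(a, b) for a, b in zip(x, y)])
    X, Y = [], []
    for k in range(n):
        Zc = Z[(-k) % n].conjugate()       # conj(Z[N-k]) = X[k] - j*Y[k] for real x, y
        X.append((Z[k] + Zc) / 2)          # spectral separation: reference channel
        Y.append((Z[k] - Zc) / 2j)         # spectral separation: microphone channel
    return X, Y

x = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.3, 0.2]
y = [0.2, -0.1, 0.4, 0.9, -0.6, 0.0, 0.1, -0.3]
X, Y = two_real_spectra(x, y)
```

The separation costs only a few additions per bin, which is the "little additional effort" referred to above.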
  • Fig. 6 shows a more detailed block diagram of the FLMS algorithm with frequency-domain output and again allows a comparison with Fig. 3 (time-domain output). The filter adaptation, consisting of smoothing of the spectral power, power normalization and coefficient renewal, is unchanged. New are the FFT in the microphone channel, the formation of the difference $Y - \hat{R}$ in the frequency domain rather than in the time domain to form the output, and finally the newly defined projection P4, which differs from the projection P1 only by its complementary time-domain window.
  • As a preliminary stage of a preferred embodiment described below, consider Fig. 7. It shows the FLMS algorithm with 3 sub-filters (384-sample impulse response), which provides sufficient suppression of the radio signal in the microphone channel of the voice input system. The projections P1 and P4 are shown in simplified form. The additional effort already known from Fig. 4b, in the form of the buffers D and the threefold projection P1, is visible. In contrast to the 1-part filter of Fig. 6, the sum W of the current and the two preceding reference power spectra is fed to the input of the recursive filter. The fact that the filter output is now practically 3 times the smoothed spectral power is taken into account, after inversion, by multiplying by the constant 6α. After the spectral power normalization of the output spectrum $\hat{S}$ modified in P4, the filter adaptation is carried out separately for the 3 coefficient vectors of the 3 sub-filters.
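The effect of the constant 6α can be checked with a small calculation. The sketch assumes the same first-order smoothing as in the one-part case, applied to the summed power W; the function name `three_part_step` and the guard value `eps` are assumptions.

```python
def three_part_step(X1, X2, X3, S_ww_prev, alpha=0.05, beta=0.5, eps=1e-12):
    """Per-bin step size for 3 sub-filters, normalized by the smoothed summed power."""
    # W: current plus the two preceding reference power spectra
    W = [abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 for a, b, c in zip(X1, X2, X3)]
    S_ww = [beta * w + (1.0 - beta) * p for w, p in zip(W, S_ww_prev)]
    # 6*alpha compensates for the roughly threefold power at the smoother output
    step = [6.0 * alpha / (p + eps) for p in S_ww]
    return step, S_ww

# Stationary example: each spectrum has power 4 in the single bin shown.
step, S_ww = three_part_step([2 + 0j], [2j], [-2 + 0j], S_ww_prev=[12.0])
single_part_step = 2.0 * 0.05 / 4.0   # 2*alpha / S_xx for the one-part filter
```

For a stationary reference power the two step sizes coincide, which is the point of replacing 2α by 6α.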
  • Fig. 9 shows an example Z0 of the operation of the invention according to Fig. 7. The input data were generated synthetically. The reference signal x consists of 100,000 samples of white Gaussian noise at a sampling rate of fs = 12 kHz. The microphone signal y was obtained by convolving this noise signal with a likewise constructed 384-sample impulse response and adding an extremely weak speech signal. When listening to this signal y, shown at the top of Fig. 9, the 10 spoken digits are barely perceptible in the colored (because filtered) noise. After a settling time of about 1 second (12,000 samples), the output signal of the estimator, transformed back into the time domain, very effectively frees the speech input from noise and provides an undistorted but slightly reverberant speech signal $\hat{s}$ (Fig. 9, bottom). The two parameters used were α = 0.05 and β = 0.5, values that also proved suitable in the examples presented later.
  • From the partial coefficient vectors H1, H2, H3 of the 3 sub-filters according to Fig. 7, each with 129 samples, the resulting impulse response of 3 × 128 samples or the associated filter transfer function can be calculated at any time. Fig. 10 (top) shows the 384-sample impulse response as it appears at the very end of the scene, that is, after the digit "0" was spoken. It is a very accurate image of the impulse response that was convolved with white Gaussian noise to synthetically generate the microphone signal. The associated magnitude transfer function (Fig. 10, bottom), in the range between the frequencies 0 and fs/2 = 6 kHz, shows a low-pass frequency response with numerous narrowband resonance peaks.
  • White noise as the reference input signal and filtered, "colored" noise as the microphone input signal are the simplest case for the task of finding a replica of this filter. Since the reference signal by definition contains all frequency components, the filter adaptation succeeds fastest here. The additional speech input in the microphone signal, that is, the actual useful signal of the voice input system, is a disturbance for the (F)LMS algorithm and hinders the correct adaptation of the filter coefficients. In other words, the system can correctly replicate the room acoustics of the vehicle interior (the path between radio loudspeaker and microphone), and thereby compensate the radio playback, only during speech pauses. In the example of Fig. 9 above this works very well, since the microphone signal consists essentially of noise and only to a very small extent of speech input.
  • By contrast, the reference signal "radio", tapped at the radio loudspeaker terminals, and the signal "micro", recorded by the microphone of the voice input system, of scene Z1 come from real measurements in the vehicle. The microphone signal is shown at the top of Fig. 11. It consists of 100,000 samples and thus, at a sampling rate of 12 kHz, lasts about 8.3 seconds. It contains fluent and relatively fast speech from a vehicle occupant seated in the rear right of the car, while music plays at normal volume from the car radio loudspeaker. Applying the interference suppression according to Fig. 7 and converting to the time domain yields the output signal shown at the bottom of Fig. 11. The listening test shows that the speech component is clearly brought out and that the music is suppressed, which is noticeable especially in the short speech pauses. It is conspicuous and disadvantageous, however, that the desired radio signal suppression depends strongly on whether or not someone is speaking. The 384-sample impulse response, again determined at the end of the scene, and the associated transfer function can be seen in Fig. 12. A correct impulse response can be recognized by the typical zero samples (dead time) at the beginning, which result from the transit time of the direct sound from the radio loudspeaker to the microphone. The strong disturbances at the beginning and at the end of the impulse response therefore indicate that the filter adaptation at this point is extremely inadequate because of the speech input.
  • The embodiment described below with reference to Fig. 8 is based on the following idea: a suitable feature is used together with a threshold as an indicator of voice input. If the feature falls below the threshold, this indicates that there is no speech input. In this case, as stated above, a largely undisturbed filter adaptation can take place. During speech input, the filter coefficient set that was stored immediately before the threshold was exceeded, i.e. at the end of the preceding speech pause, is used. As a rule, these stored coefficients H10, H20, H30 provide significantly better radio signal compensation than the current coefficients H1, H2, H3, which change constantly under the disturbing influence of the voice input.
  • Fig. 8 shows an embodiment with further improved FLMS processing with 3 sub-filters. In addition to the current filter coefficient vectors H1, H2, H3 already present in Fig. 7, which are needed to form the continuously adapted output signal (Y-Ra), there is now an additional output signal (Y-Ro) formed using the stored coefficients H10, H20, H30. The current coefficient sets H1, H2, H3 represent a usable compensation filter in the frequency domain only in the steady state without speech input. During voice input they provide inadequate filter characteristics, because the adaptation process in the control loop is constantly disturbed. In the absence of voice input, i.e. with high filter quality, the three switches are closed and the current coefficient sets are written into the coefficient memories M1, M2, M3: H10 = H1, H20 = H2, H30 = H3. The outputs (Y-Ro) and (Y-Ra) are then identical. The onset of voice input opens the 3 switches, so that the coefficients H10, H20, H30 last written to the memories M1, M2, M3 are no longer overwritten and remain unchanged. This state, in which the outputs (Y-Ro) and (Y-Ra) differ, is maintained until a speech pause is detected again and the switches are closed.
  • The smoothed sum of all absolute values of the coefficient correction vectors ΔH1', ΔH2', ΔH3' has proved successful as the speech pause feature fea (Fig. 8a). This quantity is zero or small when there is little or no need to change the coefficients. This is the case in speech pauses, when the control loop has practically settled. Disturbances caused by voice input, but also by movements of the vehicle occupants, result in an increased need for readjustment, which shows up as correspondingly large values in ΔH1', ΔH2', ΔH3' and thus in the feature fea. A smoothing filter, for example a first-order recursive low-pass filter with the input feat, provides at its output the smoothed speech pause feature fea which, after comparison with a threshold value th, controls the coefficient transfer switches.
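The pause detection and coefficient storage can be sketched as a small gate around the coefficient memories. This is a hedged illustration: the class name `PauseGatedStore`, the smoothing constant, the threshold value and the rule of storing the very first coefficient set unconditionally are assumptions that the text does not specify.

```python
class PauseGatedStore:
    """Copy current coefficients to memory only while the pause feature is low."""

    def __init__(self, threshold, smoothing=0.5):
        self.threshold = threshold    # th in the text
        self.smoothing = smoothing    # constant of the first-order smoother (assumed)
        self.fea = 0.0                # smoothed speech pause feature
        self.stored = None            # H10, H20, H30 (memories M1, M2, M3)

    def step(self, current_coeffs, corrections):
        # Raw feature: sum of absolute values of all coefficient corrections.
        raw = sum(abs(d) for vec in corrections for d in vec)
        self.fea = self.smoothing * raw + (1.0 - self.smoothing) * self.fea
        pause = self.fea < self.threshold
        # Switches closed in a pause: overwrite the memories with H1, H2, H3.
        # The very first set is stored unconditionally (an added assumption).
        if pause or self.stored is None:
            self.stored = [list(vec) for vec in current_coeffs]
        return pause, self.stored

gate = PauseGatedStore(threshold=1.0)
log = [
    gate.step([[1 + 0j]], [[0.1 + 0j]]),   # quiet: pause, memory follows
    gate.step([[5 + 0j]], [[10 + 0j]]),    # speech onset: memory frozen
    gate.step([[7 + 0j]], [[0j]]),         # feature still decaying
    gate.step([[9 + 0j]], [[0j]]),
    gate.step([[9 + 0j]], [[0j]]),         # below threshold again: memory updated
]
```

Because of the smoothing, the switches close again only a few blocks after the corrections have become small, which avoids storing coefficients that are still disturbed.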
  • Fig. 13 demonstrates the operation of the improved FLMS algorithm according to Fig. 8. The recorded signal y of scene Z1 is shown at the top (cf. Fig. 11, top) and the resulting output signal at the bottom. Even a visual comparison of the output signals of Fig. 13 and Fig. 11 shows that the speech passages are brought out better. The comparative listening test confirms this: the music suppression is much better even during voice input. Fig. 14 (top) shows the speech pause feature and the constant threshold over time (scaled here in FFT blocks). In the speech pauses detected when the feature falls below the threshold (Fig. 14, bottom), the coefficients are continuously transferred to the memories as described, so that they are available there as stored coefficients during speech input. The 384-sample impulse response measured at the end of the scene and the associated magnitude transfer function, already shown in Fig. 12, are reproduced in Fig. 15 as the current impulse response (a) and current transfer function (b). In contrast to these results from the current coefficients H1, H2, H3, which are heavily disturbed by speech input, a high-quality impulse response (c) and transfer function (d) can be calculated from the stored coefficients H10, H20, H30. The impulse response from the stored coefficients has the typical zero samples at the beginning, which are caused by the transit time of the direct sound from the radio loudspeaker to the voice input microphone. From the dead time of about 40 samples that can be read off in the example, the distance between loudspeaker and microphone can be determined.
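The distance estimate follows from the dead time and the sampling rate. The text does not give the result; the calculation below assumes a speed of sound of about 343 m/s (air at roughly 20 °C), and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 12_000       # sampling rate stated in the text
SPEED_OF_SOUND_M_S = 343.0    # assumption: air at about 20 degrees Celsius

def dead_time_to_distance(dead_samples, fs=SAMPLE_RATE_HZ, c=SPEED_OF_SOUND_M_S):
    """Path length of the direct sound for a given dead time in samples."""
    return dead_samples / fs * c

delay_ms = 1000.0 * 40 / SAMPLE_RATE_HZ        # about 3.3 ms
distance_m = dead_time_to_distance(40)         # about 1.14 m
```

Under these assumptions, 40 samples correspond to a loudspeaker-to-microphone path of a little over one metre, which is plausible for a car interior.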
  • As indicated above, the costly projection P4 (IFFT, right-sided window in the time domain, FFT) can be replaced without noticeable loss of quality by a relatively simple convolution in the frequency domain, which saves 2 FFTs. For this, consider Fig. 16. In a first step, the right-sided 128-sample rectangular window in the time domain of the ideal projection (Fig. 16a) is replaced by a 128-sample Hamming window (Fig. 16b). Compared with the rectangular window, this has the advantage of a much narrower spectrum. As Fig. 17 shows, for the rectangular window the real part of the spectrum consists of a single line (the DC component), whereas the antisymmetric imaginary part consists of many slowly decaying lines alternating with zeros. In contrast, the complex spectrum of the Hamming window (Fig. 18) is limited to a total of 7 lines, of which only 3 are different from zero in the symmetric real part and only 4 in the antisymmetric imaginary part. All lines further out are negligible. This special property of the Hamming window makes it possible to replace the multiplication in the time domain (Fig. 16b) by a convolution with the associated 7-line spectrum in the frequency domain, thus saving one IFFT and one FFT (Fig. 16c).
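The 7-line property can be checked numerically. The sketch below assumes the common Hamming coefficients 0.54 and 0.46 in their periodic form over 128 samples, placed in the right half of a 256-sample block; the patent does not state the exact window definition, and the function names are hypothetical.

```python
import cmath
import math

N = 256      # FFT length used in the text
HALF = 128   # window length

# Right-sided window: zeros in samples 0..127, Hamming taper in samples 128..255.
window = [0.0] * HALF + [0.54 - 0.46 * math.cos(2 * math.pi * m / HALF)
                         for m in range(HALF)]

def dft_line(v, k):
    """Single DFT line k of the sequence v."""
    n = len(v)
    return sum(v[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))

# The 7 central spectral lines k = -3..3 of the window.
lines = {k: dft_line(window, k) for k in range(-3, 4)}

# Share of the window energy carried by these 7 lines (total via Parseval).
energy_7 = sum(abs(c) ** 2 for c in lines.values())
energy_total = N * sum(w * w for w in window)
fraction = energy_7 / energy_total

def apply_window_by_convolution(spectrum):
    """Approximate time-domain windowing by a 7-tap circular convolution."""
    n = len(spectrum)
    return [sum(lines[d] * spectrum[(k - d) % n] for d in lines) / n
            for k in range(n)]

# Spectrum of a unit impulse at sample 0, which lies in the zeroed half.
suppressed = apply_window_by_convolution([1 + 0j] * N)
```

Under these assumptions the lines at k = 0 and k = ±2 are essentially real, those at k = ±1 and k = ±3 essentially imaginary, and together they carry almost all of the window energy, so each output bin needs only 7 complex multiplications instead of an IFFT and FFT pair.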
  • In principle, the projection P1 (IFFT, left-sided rectangular window, FFT) can of course also be replaced by a corresponding convolution in the frequency domain with the complex-conjugate 7-line spectrum. Experiments have shown, however, that savings at this point come at the cost of a significant deterioration of the transient response. Nevertheless, economical solutions are possible, because in the FLMS algorithm of Fig. 8 the 3 projections P1 need not all be processed in the same 256-sample input data block. The input data blocks of length 256, overlapping by 128 samples, are numbered consecutively, starting arbitrarily with "1", as outlined in Fig. 19a. If, for example, the input data blocks are counted modulo 3, the 3 sub-filter projections can be calculated not in parallel (Fig. 19b) but sequentially in successive blocks (Fig. 19). With the ideal projection P1, only 2 FFTs instead of 6 are then needed per data block. It has been found that the compensation of the radio signal still works adequately if the intervals between the sub-filter projections are made even larger. If the blocks are counted modulo 6, for example, a projection has to be calculated only in every second block (Fig. 19d). Even a reduction to an interval of four blocks between two successive P1 calculations, by means of modulo-12 counting, still gives usable results (Fig. 19e).
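The staggering can be expressed as a simple schedule over the block counter. The sketch assumes that the three projections are spread evenly over one counting period; the function name `p1_schedule` and the zero-based block index are illustrative choices.

```python
def p1_schedule(block_index, modulo, num_parts=3):
    """Index of the sub-filter whose projection P1 is computed in this block, or None.

    Assumes modulo is a multiple of num_parts, so that the projections are
    spaced evenly (spacing = modulo // num_parts blocks).
    """
    spacing = modulo // num_parts
    slot = block_index % modulo
    if slot % spacing == 0:
        return slot // spacing
    return None

def ffts_per_block(block_index, modulo):
    """Each ideal projection P1 costs 2 transforms (one IFFT and one FFT)."""
    return 0 if p1_schedule(block_index, modulo) is None else 2

mod3 = [p1_schedule(b, 3) for b in range(6)]     # one projection in every block
mod6 = [p1_schedule(b, 6) for b in range(6)]     # one projection in every 2nd block
mod12 = [p1_schedule(b, 12) for b in range(12)]  # one projection in every 4th block
```

With modulo-12 counting each sub-filter is re-projected only once every 12 blocks, so the average cost of P1 drops from 6 transforms per block to 0.5.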
  • The performance of the FLMS algorithm with 3 sub-filters according to the block diagram of Fig. 8, with sequential calculation of the ideal projection P1 in the time grid of Fig. 19e and with the projection P4 realized by convolution in the frequency domain (Fig. 16c) with a complex 7-line spectrum (Fig. 18), is demonstrated on the basis of 3 measurement scenes.
  • The first of these scenes, Z2, involves voice input of digits while the radio loudspeaker emits nearly white noise at a relatively high volume. The corresponding 100,000-sample microphone signal is shown at the top of Fig. 20 and the resulting output signal at the bottom of Fig. 20. A listening comparison shows that the output signal is clearly freed of noise compared with the microphone signal. The time course of the speech pause feature, together with the constant threshold th, is shown at the top of Fig. 21, and the speech pauses derived from it, or the associated switch positions, at the bottom of Fig. 21. Finally, Fig. 22 shows, analogously to Fig. 15, the impulse response (a) and transfer function (b) found at the end of the scene from the current coefficients, and the corresponding quantities (c), (d) from the speech pause setting. It can be clearly seen that the current impulse response found at the end of the scene is disturbed by speech input, while the impulse response calculated from the coefficient sets stored in the last speech pause is of high quality.
  • The first 100000 samples of a measurement scene Z3, with pop music on the radio and fluent to fast speech from the person sitting at the rear right, are shown as the microphone signal y at the top of Fig. 23. After about 10000 samples (0.83 s) the radio signal is usefully suppressed (Fig. 23, bottom). Even with the voice input that begins in the last third of this scene, the pop music suppression is effectively maintained, so that speech intelligibility is markedly improved compared with the microphone signal. After a long speech pause, the feature no longer falls below the threshold because the subsequent speech input contains no pauses (Fig. 24). For this reason, the impulse response at the end of the scene based on the stored coefficients, shown at the bottom of Fig. 25, is relatively outdated: it was last current about 2.3 s earlier (215 blocks * 10.7 ms). Again, the current impulse response (Fig. 25, top) shows strong interference originating from the speech input. As a comparison with the similar scene Z1 of Figs. 11 to 15 shows, the quality of the interference suppression remains high despite the greatly reduced computational effort.
  • The last scene, Z4, shown in Fig. 26, was recorded without voice input and serves to demonstrate once more the music suppression properties of the described FLMS algorithm. After about 18000 samples, or 1.5 s, the music is effectively suppressed, as can be seen at the bottom of Fig. 26. This property is maintained with unchanged quality until the end of the scene. Fig. 27 shows that the speech pause feature fea remains predominantly below the threshold th. The times during which the stored coefficients are used are therefore very short. The impulse response and transfer function obtained from the current coefficients are therefore essentially identical to the corresponding curves obtained from the speech pause coefficients.
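
The modulo counting of input blocks described above can be illustrated with a short Python sketch. It is not taken from the patent: the function names `p1_schedule` and `ffts_for_p1`, the 0-based sub-filter indices, and the assignment of a particular sub-filter to a particular block are assumptions made for illustration, since Fig. 19 is not reproduced here. The sketch only assumes that the projections are evenly spaced and that each projection P1 costs one IFFT and one FFT.

```python
def p1_schedule(block_number, modulus, num_subfilters=3):
    """Return the 0-based index of the sub-filter whose projection P1 is
    computed in this block, or None if no projection is due.

    Blocks are numbered from 1. Assumption: projections are spaced evenly,
    modulus // num_subfilters blocks apart.
    """
    if modulus % num_subfilters != 0:
        raise ValueError("modulus must be a multiple of num_subfilters")
    spacing = modulus // num_subfilters
    phase = (block_number - 1) % modulus
    if phase % spacing != 0:
        return None
    return phase // spacing


def ffts_for_p1(block_number, modulus, num_subfilters=3):
    """Transforms spent on P1 in this block (one IFFT plus one FFT, or none)."""
    due = p1_schedule(block_number, modulus, num_subfilters)
    return 0 if due is None else 2


# Print the plan for twelve consecutive blocks under each counting scheme.
for modulus in (3, 6, 12):
    plan = [p1_schedule(n, modulus) for n in range(1, 13)]
    print(f"modulo {modulus}: {plan}")
```

Under these assumptions, twelve blocks with modulo-12 counting contain three projections, so P1 costs 6 transforms over those twelve blocks, compared with 72 if all three projections were computed in every block.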

Claims (10)

  1. Method of eliminating interference in a microphone signal due to components of a source signal which is present as a reference signal (x) and after passing through a transmission path with a priori unknown transmission function (G), is superimposed on a voice signal (s) as an interference signal (r) in the microphone signal, by adaptive simulation of the interference signal and compensation of the actual and the simulated interference signal in an output signal, wherein the microphone signal is likewise transformed into the frequency domain, the signal compensation occurs in the frequency domain and the output signal present in the frequency domain is linked with the reference signal present in the frequency domain for adaptation of the simulation, wherein for simulation of the interference signal an adaptive filtering function of a simulation filter is applied to the reference signal, characterised in that the occurrence of the voice signal in the microphone signal is detected and when a voice signal occurs the filtering function set before the occurrence of the voice signal is retained in order to form the output signal.
  2. Method as claimed in Claim 1, wherein the output signal spectrum is transformed into the time domain, the length of the time signal is doubled by placing zeros in front of it, and the resulting signal is transformed back into the frequency domain and used for simulation of the transmission function.
  3. Method as claimed in Claim 1, wherein the output signal spectrum is convoluted with the spectrum of a Hamming time window and is used for simulation of the transmission function.
  4. Method as claimed in Claim 1, wherein the filtering function is predetermined by a coefficient vector with adaptively adjusted coefficients.
  5. Method as claimed in Claim 1, wherein when the voice signal is detected the adaptive readjustment of a current filtering function is continued in addition to the formation of the output signal.
  6. Method as claimed in Claim 5, wherein the occurrence of the voice signal is detected from a change in the current filtering function.
  7. Method as claimed in Claim 6, wherein the change in the current filtering function is smoothed over time for detection of the occurrence of a voice signal.
  8. Method as claimed in any one of the preceding claims, in which the filtering function is divided into several partial filtering functions for successive segments of a total pulse response from all partial filters and is applied to reference signal spectra during time segments of the segmented reference time signal which are offset in time.
  9. Method as claimed in Claim 8, wherein the adaptation of the filtering function is carried out in parallel for the partial filters.
  10. Method as claimed in Claim 9, wherein the adaptation of the filtering function for the individual partial filters is carried out sequentially in time.
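
The switching between a current and a stored filtering function (claims 1 and 5 to 7) might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `CoefficientHold`, the squared-difference measure of coefficient change, and the exponential smoothing factor are hypothetical choices. Only the overall behaviour follows the text: a feature derived from the change of the current filter is smoothed and compared with a constant threshold th, and the stored coefficients are used while the feature lies above it.

```python
class CoefficientHold:
    """Keep a stored coefficient set for use while speech is detected."""

    def __init__(self, num_coeffs, threshold, smoothing=0.9):
        self.threshold = threshold   # constant threshold (th in the description)
        self.smoothing = smoothing   # hypothetical smoothing factor
        self.feature = 0.0           # smoothed change of the current filter (fea)
        self.previous = [0.0] * num_coeffs
        self.stored = [0.0] * num_coeffs

    def update(self, current):
        """Take the freshly adapted coefficients of one block and return
        the coefficients to use for forming the output signal."""
        # Assumed change measure: squared distance to the previous block.
        change = sum(abs(c - p) ** 2 for c, p in zip(current, self.previous))
        self.feature = (self.smoothing * self.feature
                        + (1.0 - self.smoothing) * change)
        self.previous = list(current)
        if self.feature < self.threshold:
            # Speech pause: trust the current filter and store it.
            self.stored = list(current)
            return list(current)
        # Speech detected: adaptation may continue, output uses the stored set.
        return list(self.stored)


hold = CoefficientHold(num_coeffs=2, threshold=0.01, smoothing=0.5)
hold.update([0.0, 0.0])          # no change, treated as a speech pause
hold.update([0.05, 0.0])         # small change, still a pause, set is stored
used = hold.update([1.0, 1.0])   # large change, stored set is returned
```

The adaptation of the current coefficients runs outside this class, so it can continue during speech as in claim 5 while the output is formed with the stored set.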
EP19990106123 1998-04-03 1999-04-01 Method for noise suppression in a microphone signal Expired - Lifetime EP0948237B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE19814971 1998-04-03
DE1998114971 DE19814971A1 (en) 1998-04-03 1998-04-03 Procedure for the elimination of interference from a microphone signal

Publications (3)

Publication Number Publication Date
EP0948237A2 (en) 1999-10-06
EP0948237A3 (en) 2006-02-08
EP0948237B1 (en) 2008-06-11

Family

ID=7863491

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19990106123 Expired - Lifetime EP0948237B1 (en) 1998-04-03 1999-04-01 Method for noise suppression in a microphone signal

Country Status (4)

Country Link
US (1) US6895095B1 (en)
EP (1) EP0948237B1 (en)
AT (1) AT398326T (en)
DE (2) DE19814971A1 (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19958836A1 (en) * 1999-11-29 2001-05-31 Deutsche Telekom Ag In car communication system has individual microphones and loudspeakers allows easy conversation
DE10041885A1 (en) * 2000-08-25 2002-03-07 Mueller Bbm Gmbh Speech signal transmission system e.g. for motor vehicle hands-free telephone, adjusts filter so that difference signal is minimum for given harmonic frequency
DE10052991A1 (en) * 2000-10-19 2002-05-02 Deutsche Telekom Ag Determining spatial acoustic and electroacoustic parameters, involves conducting signal conversion steps in room with sound source, electroacoustic converters in predefined arrangement
DE10221990B4 (en) * 2002-05-17 2006-10-12 Audi Ag Reduction of noise on car radios with bus connections
US20040047475A1 (en) * 2002-09-09 2004-03-11 Ford Global Technologies, Inc. Audio noise cancellation system for a sensor in an automotive vehicle
JP2005218010A (en) * 2004-02-02 2005-08-11 Matsushita Electric Ind Co Ltd Intravehicle data transmission system
US8170879B2 (en) 2004-10-26 2012-05-01 Qnx Software Systems Limited Periodic signal enhancement system
US7716046B2 (en) * 2004-10-26 2010-05-11 Qnx Software Systems (Wavemakers), Inc. Advanced periodic signal enhancement
US8306821B2 (en) 2004-10-26 2012-11-06 Qnx Software Systems Limited Sub-band periodic signal enhancement system
US7949520B2 (en) 2004-10-26 2011-05-24 QNX Software Sytems Co. Adaptive filter pitch extraction
US8543390B2 (en) 2004-10-26 2013-09-24 Qnx Software Systems Limited Multi-channel periodic signal enhancement system
US7525440B2 (en) * 2005-06-01 2009-04-28 Bose Corporation Person monitoring
EP1848243B1 (en) * 2006-04-18 2009-02-18 Harman/Becker Automotive Systems GmbH Multi-channel echo compensation system and method
AT445966T (en) * 2006-05-08 2009-10-15 Harman Becker Automotive Sys Echo reduction for time-variant systems
AT436151T (en) * 2006-05-10 2009-07-15 Harman Becker Automotive Sys Compensation of multi-channel echoes by decorrelation
EP1879181B1 (en) * 2006-07-11 2014-05-21 Nuance Communications, Inc. Method for compensation audio signal components in a vehicle communication system and system therefor
US20080063122A1 (en) * 2006-09-07 2008-03-13 Gwo-Jia Jong Method for suppressing co-channel interference from different frequency
AT522078T (en) * 2006-12-18 2011-09-15 Harman Becker Automotive Sys Echo compensation with low complexity
US7577257B2 (en) * 2006-12-21 2009-08-18 Verizon Services Operations, Inc. Large scale quantum cryptographic key distribution network
US20080225688A1 (en) * 2007-03-14 2008-09-18 Kowalski John M Systems and methods for improving reference signals for spatially multiplexed cellular systems
EP1995940B1 (en) * 2007-05-22 2011-09-07 Harman Becker Automotive Systems GmbH Method and apparatus for processing at least two microphone signals to provide an output signal with reduced interference
EP2018034B1 (en) 2007-07-16 2011-11-02 Nuance Communications, Inc. Method and system for processing sound signals in a vehicle multimedia system
US8850154B2 (en) 2007-09-11 2014-09-30 2236008 Ontario Inc. Processing system having memory partitioning
US8209514B2 (en) 2008-02-04 2012-06-26 Qnx Software Systems Limited Media processing system having resource partitioning
US8904400B2 (en) 2007-09-11 2014-12-02 2236008 Ontario Inc. Processing system having a partitioning component for resource partitioning
US8694310B2 (en) 2007-09-17 2014-04-08 Qnx Software Systems Limited Remote control server protocol system
EP2222091B1 (en) * 2009-02-23 2013-04-24 Nuance Communications, Inc. Method for determining a set of filter coefficients for an acoustic echo compensation means
JP6062861B2 (en) * 2011-10-07 2017-01-18 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Encoding apparatus and encoding method
DE102014214143B4 (en) * 2014-03-14 2015-12-31 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for processing a signal in the frequency domain
US20180166073A1 (en) * 2016-12-13 2018-06-14 Ford Global Technologies, Llc Speech Recognition Without Interrupting The Playback Audio
DE102017101497A1 (en) 2017-01-26 2018-07-26 Infineon Technologies Ag Micro Electro Mechanical System (MEMS) circuit and method for reconstructing a disturbance

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8601604A (en) * 1986-06-20 1988-01-18 Philips Nv FREQUENCY DOMAIN BLOCK-adaptive digital filter.
JP2748626B2 (en) * 1989-12-29 1998-05-13 日産自動車株式会社 Active noise control device
US5649012A (en) * 1995-09-15 1997-07-15 Hughes Electronics Method for synthesizing an echo path in an echo canceller
US5937060A (en) * 1996-02-09 1999-08-10 Texas Instruments Incorporated Residual echo suppression
JP3654470B2 (en) * 1996-09-13 2005-06-02 日本電信電話株式会社 Echo canceling method for subband multi-channel audio communication conference

Also Published As

Publication number Publication date
DE19814971A1 (en) 1999-10-07
EP0948237A3 (en) 2006-02-08
DE59914782D1 (en) 2008-07-24
AT398326T (en) 2008-07-15
US6895095B1 (en) 2005-05-17
EP0948237A2 (en) 1999-10-06

Similar Documents

Publication Publication Date Title
Araki et al. Exploring multi-channel features for denoising-autoencoder-based speech enhancement
CA2713127C (en) Apparatus and method for computing control information for an echo suppression filter and apparatus and method for computing a delay value
DE60316704T2 (en) Multi-channel language recognition in unusual environments
KR100382024B1 (en) Device and method for processing speech
KR100594563B1 (en) Signal noise reduction by spectral subtraction using linear convolution and causal filtering
AU740951B2 (en) Method for Noise Reduction, Particularly in Hearing Aids
Gustafsson et al. A psychoacoustic approach to combined acoustic echo cancellation and noise reduction
EP1208689B1 (en) Acoustical echo cancellation device
KR100860805B1 (en) Voice enhancement system
US8355511B2 (en) System and method for envelope-based acoustic echo cancellation
EP1779644B1 (en) Systems and methods for echo cancellation and noise reduction
US8229106B2 (en) Apparatus and methods for enhancement of speech
CA2566751C (en) Noise reduction for automatic speech recognition
EP1046273B1 (en) Methods and apparatus for providing comfort noise in communications systems
US7062040B2 (en) Suppression of echo signals and the like
EP1143416B1 (en) Time domain noise reduction
DE69531710T2 (en) Method and device for reducing noise in speech signals
Allen et al. Multimicrophone signal‐processing technique to remove room reverberation from speech signals
JP5450567B2 (en) Method and system for clear signal acquisition
EP0682801B1 (en) A noise reduction system and device, and a mobile radio station
EP1803288B1 (en) Echo cancellation
JP2685031B2 (en) Noise cancellation method and noise cancellation device
US5553014A (en) Adaptive finite impulse response filtering method and apparatus
EP1806739B1 (en) Noise suppressor
La Bouquin-Jeannes et al. Enhancement of speech degraded by coherent and incoherent noise using a cross-spectral estimator

Legal Events

Date Code Title Description
AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent to:

Free format text: AL;LT;LV;MK;RO;SI

RAP1 Rights of an application transferred

Owner name: DAIMLERCHRYSLER AG

RAP1 Rights of an application transferred

Owner name: HARMAN BECKER AUTOMOTIVE SYSTEMS (BECKER DIVISION)

RAP1 Rights of an application transferred

Owner name: HARMAN BECKER AUTOMOTIVE SYSTEMS GMBH

AX Request for extension of the european patent to:

Extension state: AL LT LV MK RO SI

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17P Request for examination filed

Effective date: 20060807

AKX Designation fees paid

Designated state(s): AT CH DE ES FR GB IT LI NL SE

17Q First examination report despatched

Effective date: 20070803

RIC1 Information provided on ipc code assigned before grant

Ipc: H04R 3/00 20060101ALI20070927BHEP

Ipc: G10L 21/02 20060101AFI20070927BHEP

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT CH DE ES FR GB IT LI NL SE

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REF Corresponds to:

Ref document number: 59914782

Country of ref document: DE

Date of ref document: 20080724

Kind code of ref document: P

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20080611

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20080911

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20080922

26N No opposition filed

Effective date: 20090312

PGFP Annual fee paid to national office [announced from national office to epo]

Ref country code: AT

Payment date: 20090401

Year of fee payment: 11

PGFP Annual fee paid to national office [announced from national office to epo]

Ref country code: CH

Payment date: 20090427

Year of fee payment: 11

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100401

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100430

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100430

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 59914782

Country of ref document: DE

Representative's name: GRUENECKER, KINKELDEY, STOCKMAIR & SCHWANHAEUS, DE

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 59914782

Country of ref document: DE

Representative's name: GRUENECKER, KINKELDEY, STOCKMAIR & SCHWANHAEUS, DE

Effective date: 20120411

Ref country code: DE

Ref legal event code: R081

Ref document number: 59914782

Country of ref document: DE

Owner name: NUANCE COMMUNICATIONS, INC. (N.D.GES.D. STAATE, US

Free format text: FORMER OWNER: HARMAN BECKER AUTOMOTIVE SYSTEMS GMBH, 76307 KARLSBAD, DE

Effective date: 20120411

Ref country code: DE

Ref legal event code: R082

Ref document number: 59914782

Country of ref document: DE

Representative's name: GRUENECKER PATENT- UND RECHTSANWAELTE PARTG MB, DE

Effective date: 20120411

REG Reference to a national code

Ref country code: FR

Ref legal event code: TP

Owner name: NUANCE COMMUNICATIONS, INC., US

Effective date: 20120924

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 18

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 19

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 20

PGFP Annual fee paid to national office [announced from national office to epo]

Ref country code: IT

Payment date: 20180420

Year of fee payment: 20

Ref country code: FR

Payment date: 20180426

Year of fee payment: 20

PGFP Annual fee paid to national office [announced from national office to epo]

Ref country code: GB

Payment date: 20180427

Year of fee payment: 20

Ref country code: DE

Payment date: 20180629

Year of fee payment: 20

REG Reference to a national code

Ref country code: DE

Ref legal event code: R071

Ref document number: 59914782

Country of ref document: DE

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20190331

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20190331