EP2362390B1 - Noise suppression - Google Patents

Noise suppression

Info

Publication number
EP2362390B1
Authority
EP
European Patent Office
Prior art keywords
noise
audio frame
threshold value
horn
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP10250748.0A
Other languages
German (de)
French (fr)
Other versions
EP2362390A1 (en)
Inventor
Murali Mohan Deshpande
Sudeendra Maddur Gundurao
Rob Goyens
Wouter Joos Tirry
Jeremy Thomas Davies
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NXP BV
Original Assignee
NXP BV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NXP BV filed Critical NXP BV
Publication of EP2362390A1
Application granted
Publication of EP2362390B1
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain


Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)

Description

  • The invention relates to detection and suppression or removal of noise in audio signals, with particular relevance for radio communication devices such as hand-portable radiotelephones.
  • Communication systems such as mobile phones are often used outdoors, which places strong requirements on the performance of any noise suppression systems present. One of the most dominant noise types is street noise, in particular that caused by motorized traffic. Traffic noise can generally be classified into various types, but in many countries the sound of horns used by cars and other vehicles such as autorickshaws strongly dominates the sound scene. Horn sounds in particular are perceived as annoying because they tend to be loud, and they can adversely affect the quality and intelligibility of a conversation over a mobile phone, or even inhibit communication altogether.
  • Current noise suppression algorithms can conveniently be divided into two categories, being single channel noise suppression systems and multi-channel noise suppression systems.
  • Most single-channel noise suppression systems are based on modification of the spectral amplitudes of an audio signal by means of a gain function. The calculation of a gain vector for implementing the gain function is carried out based on noise component estimation. A low SNR (signal to noise ratio) will significantly degrade the performance of such a noise estimator. In order to avoid speech degradation during low SNR situations, a conservative noise estimation process can be adopted, in which only stationary and slowly-varying non-stationary components are tracked by the noise estimator. This method can also be used independently of a speech detector. A faster noise component estimator to track non-stationary components would need to make use of a speech detector or even a dedicated speech model.
  • Multi-channel noise suppression systems can suppress both stationary and non-stationary noise. Most multi-channel noise suppressors rely on desired-speech detectors for calculating various parameters to obtain a noise reference. The gain vector is then evaluated based on the estimated noise.
  • Speech detectors or speech models are never 100% reliable, leading to speech degradation caused by imperfect gain calculations resulting from imperfect detector decisions. This problem is particularly prevalent under low SNR conditions. In addition, because the processing delay must be kept to a minimum in communication systems, only a short time window is available for deciding and estimating the noise. A time delay of longer than a few tens of milliseconds can have a noticeable impact on telephone conversations.
  • An example of a single channel noise suppression system 10 is illustrated in figure 1. An input audio signal z(n) consists of the sum of a desired speech signal s(n) and a noise signal sn(n). The audio input signal is sectioned into overlapping blocks and windowed. The windowed signal is transformed from the time domain into the frequency domain using an FFT (Fast Fourier Transform). These steps are represented by the windowing and FFT block 11 in figure 1. The magnitude spectrum obtained is then modified by a correction block 12 that applies a gain function in order to obtain an estimate of the speech signal, ŝ(n), which is then output by the system after inverse FFT and desectioning steps 13, 14. The phase of the signal is left unchanged. The correction to the amplitude spectrum is obtained using a gain function that is determined for each frame and for each frequency bin in the frame.
  • Various existing methods are known for calculating the gain function based on different error criteria, for example as disclosed by R. Martin in "Spectral Subtraction Based on Minimum Statistics", Signal Processing VII: Theories and Application, pp. 1182-1185, EUSIPCO, 1994.
  • The gain vector obtained is used for modifying the real FFT of the audio signal frames according to the following relationship:

    $$\hat{s}(i,k) = zampl(i,k) \cdot Gain(i,k)$$

    where zampl(i,k) corresponds to the magnitude spectrum of the input audio signal, Gain(i,k) is the gain factor and ŝ(i,k) is the modified magnitude spectrum of the input audio signal. The index i corresponds to the frequency bin number and k corresponds to the frame number.
  • This modified amplitude spectrum is fed, together with the unmodified phase, to an IFFT (Inverse Fast Fourier Transform) block 13 for obtaining, after desectioning 14, the output signal ŝ(n) in which the noise sn(n) present in the input signal z(n) has been suppressed.
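  • As a concrete illustration of the spectral-modification pipeline just described, the following sketch applies a per-bin gain vector to a single windowed frame and reconstructs the time-domain signal with the original phase. It is a minimal NumPy example under assumed parameters (a Hann window, a 10 ms frame at 8 kHz, a unity gain vector), not the exact processing chain of the patent.

```python
import numpy as np

def suppress_frame(frame, gain, window):
    """Apply a per-bin gain to one windowed frame (blocks 11-13 of figure 1).

    frame  : 1-D array of N time-domain samples
    gain   : 1-D array of N/2+1 gain factors, i.e. Gain(i,k) for this frame
    window : analysis window of length N
    """
    spectrum = np.fft.rfft(frame * window)      # windowing + FFT
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)                  # phase is left unchanged
    modified = gain * magnitude                 # s_hat(i,k) = zampl(i,k) * Gain(i,k)
    return np.fft.irfft(modified * np.exp(1j * phase), n=len(frame))  # inverse FFT

# Example: one 10 ms frame at 8 kHz with a unity gain vector (no suppression).
fs, N = 8000, 80
frame = np.random.randn(N)
output = suppress_frame(frame, np.ones(N // 2 + 1), np.hanning(N))
```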
  • There are certain inherent limitations with existing noise-suppression algorithms, one of which relates to suppressing non-stationary noise such as horn sounds, as this requires steering the noise estimator to catch up with such noise using a speech detector. This approach has limitations because horn signals tend to have high energy and highly harmonic spectra that are normally detected incorrectly as speech, and under low SNR conditions the presence of such horn signals can cause speech detector performance to deteriorate. Furthermore, horn signals tend to occur only for very short durations (typically less than 1 second), so a noise estimator without a speech detector cannot normally be used effectively.
  • It is an object of the invention to address one or more of the above mentioned problems.
  • US 2006/0200344 discloses a method of reducing noise in an audio signal, in which a furrow filter is used to select spectral components that are narrow in frequency but relatively broad in time and a bar filter is used to select spectral components that are broad in frequency but relatively narrow in time, the relative energy distributions of the filters analysed to determine the optimal proportion of spectral components for an output signal.
  • US 2007/0192102 discloses a method of adaptively aligning windows to extract features according to the types and characteristics of voice signals, in which window lengths based on the window update points in a corresponding order are determined by employing the concept of a higher order peak and windows aligned according to window lengths.
  • M. Szczerba & A. Czyzewski, in "Pitch Detection Enhancement Employing Music Prediction", Journal of Intelligent Information Systems, 24:2/3, pp 223-251, 2005, disclose pitch detection methods widely used for extracting musical data from digital signals.
  • According to the invention there is provided a method of suppressing noise in an audio signal, as defined by the appended claims.
  • The method optionally comprises, for each of the sampled audio frames:
    • transforming the audio frame from the time domain into the frequency domain; and
    • converting the resulting noise-suppressed audio frame back to the time domain,
    • wherein the gain function comprises a gain vector applied in the frequency domain.
  • In alternative embodiments, the gain function may comprise a filter that is applied in the time domain, the filter being a notch filter having one or more notches at frequencies corresponding to the one or more detected spectral peaks.
  • The step of comparing a measure of high frequency content in the audio frame to a threshold value may comprise computing a sum of differences between consecutive samples, comparing the sum of differences to a second threshold value, and determining noise is present if the sum of differences exceeds the second threshold value.
  • The step of comparing a measure of high frequency content in the audio frame to a threshold value may comprise computing a measure of energy in the audio frame, comparing this measure of energy to a third threshold value, and determining noise is present if the measure of energy exceeds the third threshold value.
  • The step of comparing a measure of high frequency content in the audio frame to a threshold value may comprise:
    • computing a sum of differences between consecutive samples;
    • comparing the sum of differences to a second threshold value;
    • computing a measure of energy in the audio frame;
    • comparing the measure of energy to a third threshold value; and
    • determining noise is present if the sum of the first and second numbers exceeds the first threshold value, the sum of differences exceeds the second threshold value and the measure of energy exceeds the third threshold value.
  • Generally, therefore, one or more of the above threshold values may be used in determining whether noise is present in each audio frame and, in a particular preferred embodiment, all three thresholds are used to determine whether noise is present.
  • Detecting a noise pattern in the sampled audio frame may be done by comparing frequencies of the spectrum of the audio frame with an average spectrum of the audio frame, a spectral peak being detected if a magnitude of a frequency exceeds the average spectrum by a preset factor.
  • The high frequency region of the audio signal spectrum is preferably a region over 2kHz. In preferred embodiments, this high frequency region will extend between 2kHz and half the frequency at which the audio signal is sampled.
  • The gain function may comprise a first gain function configured to emphasise a speech signal in the audio frame and a second gain function configured to suppress noise detected in the audio frame. The first gain function may be derived from a conventional speech detection process.
  • The audio signal in the method will typically comprise a speech signal and a noise signal, and the invention is particularly suited for when the noise signal is a vehicle horn noise.
  • The noise signal will generally be periodic and will have a harmonic structure, or in other words will comprise a fundamental frequency component and one or more harmonic components at other frequencies.
  • Embodiments according to the invention may be incorporated into a hand-portable radio communications device such as a radiotelephone that comprises a noise suppression module configured to perform the method of the invention.
  • The invention may also be embodied in a computer program for causing a computer to perform the method, which may be provided on a data carrier such as a memory chip, a computer-readable disc or other type of storage medium.
  • In a general aspect, the invention is based on using a noise signal detector and a filtering mechanism to suppress horn-like noise signals. The invention can be used together with single-channel or multi-channel noise suppression systems or as a standalone system for suppressing noise in the form of horn-like signals, thereby enhancing audio intelligibility and quality.
  • Advantages of the invention relate to the detection of horn-like noise patterns instead of detection of speech. The detection of horn-like noise can be done more accurately than speech for low SNR situations, thereby making use of the invention more appropriate when an input audio signal is strongly affected by high energy high frequency non-stationary noise such as horn noises.
  • The detection of noise according to the invention operates on individual audio frames, and therefore operates effectively instantaneously. This type of detection can be used to steer, or modify, a noise suppression system that incorporates other noise suppression methods or used as a standalone noise removal system to specifically remove horn-like signals when detected.
  • The noise estimation part of an existing system could in practice be modified to adapt aggressively during presence of a horn signal. However, a generic solution would require a very reliable speech detector to avoid the problem of the noise component estimator being significantly biased by speech. Various methods have been tried in this direction but without success. Instead of trying to implement a robust speech detector in a noise suppression system that is also capable of handling horn-like signals, the invention provides a detector specifically directed to horn-like signals and uses this for suppression or removal of noise by spectral modification. The invention therefore offers a simpler solution to the problem of dealing with a particular type of noise that is likely to occur in practice.
  • Embodiments of the present invention will now be described by way of example and with reference to the accompanying drawings in which:
    • figure 1 is a schematic block diagram of a noise suppression system;
    • figure 2 is a schematic block diagram of a standalone horn noise suppression system;
    • figure 3 is a schematic block diagram of a horn noise suppression system as part of a noise detection and suppression system;
    • figures 4a and 4b are time and frequency domain representations of a single sampled audio frame of a rickshaw horn recording;
    • figures 5a and 5b are time and frequency domain representations of a single sampled audio frame of a car horn recording;
    • figures 6a and 6b are time and frequency domain representations of a single sampled audio frame of a truck horn recording;
    • figures 7a and 7b are time and frequency domain representations of a single sampled audio frame of a motorcycle horn recording;
    • figure 8 is a flow diagram illustrating operation of an exemplary embodiment of a time domain horn noise detector;
    • figure 9 is a block diagram illustrating operation of a horn noise suppression system as part of a noise detection and suppression system; and
    • figure 10 is a block diagram illustrating operation of a standalone horn noise suppression system.
  • Exemplary embodiments according to the invention comprise in general the following two steps:
    1. Detection of a horn signal; and
    2. Suppression or removal of the detected horn-like signal.
  • The horn detection and suppression system can be a standalone system or it could form part of a larger noise detection and suppression system. A basic block diagram of a standalone horn removal system 20 is shown in figure 2. A horn detection decision is made by a horn detector system 21, and a horn removal system 22 operates to suppress or remove the detected horn signal by applying a spectral gain function to the signal. The horn detector system 21 provides the input signal z to the horn removal system 22, together with an indication, provided in this case by a single bit horn detection flag, of whether horn noise has been detected in the frame in question.
  • A basic block diagram of a horn suppressor system provided as part of a larger noise detection and suppression system 30 is shown in figure 3. A noise suppression system 31 receives an input from a horn detector 21, which detects horn sounds on the input signal z. The noise suppression system 31 comprises a gain modification module 32 that is configured to compute a new gain for suppressing horn noise patterns whenever such horn sounds are detected. If no horn noise is detected, the gain modification module 32 suppresses noise in a conventional way, for example by the use of speech detection.
  • When designing a horn noise detector, it is necessary to understand the difference in characteristics between speech and horn-like sounds. Speech signals generally have the following characteristics:
    • Limited zero-crossings in the time-domain signal, with signal energy concentrated in low frequency bands (i.e. < 2 kHz), at least for voiced sounds;
    • Limited high-frequency transitions (typically < 20%);
    • The energy of any high-frequency transitions is negligible;
    • For unvoiced sounds, i.e. sounds other than vowel sounds that may have certain characteristics similar to noise (for example plosives such as 'b' and 'p' sounds made when the vocal folds are apart but are not vibrating), most energy is concentrated in the high frequency region. However, unvoiced speech signals have a low overall energy.
  • Horn-like signals, on the other hand, generally have the following characteristics:
    • A high number of zero crossings and high energy in high frequency bands (> 2 kHz);
    • High-frequency transitions occurring frequently (> 80%);
    • Dominant high frequencies present;
    • Energy of high-frequency transitions is considerable;
    • Harmonic in nature, i.e. having a fundamental frequency and one or more harmonic over/undertone frequencies.
  • Figures 4 to 7 illustrate the main characteristics of typical horn-like signals. In figures 4a, 5a, 6a and 7a, the time-domain representations of audio recordings of rickshaw, car, truck and motorcycle horns respectively are shown, while in figures 4b, 5b, 6b and 7b the corresponding frequency domain representations of the same signals are shown. In each case, the audio frame comprises 80 samples, extending over a sample window of 10ms, i.e. corresponding to a sampling frequency of 8kHz, resulting in a maximum sampled frequency range of 0-4kHz. In each case, high frequency variations are visible as large variations in the values of alternate or consecutive samples. In most cases, the principal component of a horn noise will be augmented by other frequency components, though in some cases, as in the motorcycle horn noise in figures 7a and 7b, the principal frequency component at around 3200Hz dominates. In other cases, such as the truck horn noise of figures 6a and 6b, multiple frequency components of roughly equal magnitude are present in addition to the principal component at 3200Hz.
  • Short duration audio frames such as these tend to have a poor frequency resolution when represented in the frequency domain. Detection of horn noise based on time domain analysis methods has however been found to be advantageous on frames of duration as short as 10ms.
  • High-frequency sample variations
  • Horn-like signals are highly varying and have harmonic spectra, i.e. generally comprise a fundamental frequency component together with harmonics at related frequencies. This characteristic can be used to detect such signals by determining the number of zero-crossing variations present in each frame. As used herein, the term 'zero crossings' refers to samples that fall either side of a zero line 41 (figure 4a). For a sampling frequency of 8kHz, the highest number of zero crossings will occur when sampling a 4kHz sine wave signal, where each sample alternates between a positive side of the signal and a negative side.
  • Two parameters related to zero-crossings can be used in detecting horn-like noise:
    • First Order Consecutive Sample Variations (FOCSV), which are herein defined by consecutive samples that lead to a change in sign; and
    • First Order Alternate Sample variations (FOASV), defined by alternate samples leading to a change in sign.
  • As an illustrative example, if x represents a frame of input audio samples and i represents a sample number in the frame, then the parameter FOCSV is computed as follows:

    $$\text{PrevDiff} = x(i-1) - x(i-2)$$
    $$\text{CurDiff} = x(i) - x(i-1)$$
    $$\text{FOCSV} = \begin{cases} 1, & \text{if PrevDiff} < 0 \text{ and CurDiff} > 0 \\ 1, & \text{if PrevDiff} > 0 \text{ and CurDiff} < 0 \\ 0, & \text{otherwise} \end{cases}$$
  • In other words, the FOCSV parameter is determined to be 1 if both a previous pair and a current pair of samples involve a change in sign, and is zero otherwise.
  • The parameter FOASV, on the other hand, is determined as follows:

    $$\text{PrevDiff} = x(i-2) - x(i-3)$$
    $$\text{CurDiff} = x(i) - x(i-1)$$
    $$\text{FOASV} = \begin{cases} 1, & \text{if PrevDiff} < 0 \text{ and CurDiff} > 0 \\ 1, & \text{if PrevDiff} > 0 \text{ and CurDiff} < 0 \\ 0, & \text{otherwise} \end{cases}$$
  • In other words, the FOASV parameter is determined to be 1 if two pairs of samples separated from each other by an intermediate sample involve a change in sign, and is zero otherwise.
  • In a frame containing N samples, the total number of high-frequency sample variations (TotalHFVariations) can be determined using the following relation:

    $$\text{TotalHFVariations} = \sum_{i=0}^{N-2} \text{FOCSV}(i) + \sum_{i=0,2,4,\ldots}^{N-2} \text{FOASV}(i)$$

    where the terms FOCSV and FOASV are defined as above, and i is the sample number in each frame (which ranges from 0 to N-1). A frame can thereby be classified based on TotalHFVariations as being a horn or a non-horn frame. In practice, TotalHFVariations has been observed to be higher for frames having horn-like signals. A threshold (ThresholdHFV) was determined experimentally considering a range of various signals. The following relationships can therefore be used to determine the presence of horn-like signals in each frame, based on this parameter:
    • TotalHFVariations < ThresholdHFV, for non-horn signals
    • TotalHFVariations ≥ ThresholdHFV, for horn signals.
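  • A minimal sketch of this high-frequency variation count is given below. It follows the FOCSV/FOASV relations above, but the loop limits, the threshold value and the example frame are assumptions introduced for illustration; the patent only states that the threshold is determined experimentally.

```python
import numpy as np

def total_hf_variations(x):
    """Count FOCSV (consecutive) and FOASV (alternate) sign variations in a frame."""
    x = np.asarray(x, dtype=float)
    count = 0
    # FOCSV: the difference between consecutive samples changes sign
    for i in range(2, len(x)):
        prev_diff = x[i - 1] - x[i - 2]
        cur_diff = x[i] - x[i - 1]
        if prev_diff * cur_diff < 0:            # one positive, one negative
            count += 1
    # FOASV: differences separated by one intermediate sample change sign
    for i in range(3, len(x), 2):
        prev_diff = x[i - 2] - x[i - 3]
        cur_diff = x[i] - x[i - 1]
        if prev_diff * cur_diff < 0:
            count += 1
    return count

THRESHOLD_HFV = 40                              # hypothetical value for illustration
frame = np.sin(2 * np.pi * 3200 * np.arange(80) / 8000)   # 10 ms of a 3.2 kHz tone
horn_like = total_hf_variations(frame) >= THRESHOLD_HFV
```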
  • Energy of difference between consecutive samples
  • Horn-like signals also exhibit large amplitude differences between consecutive samples, which correspond to the signals having a high energy. The energy of the difference signal will therefore be comparatively higher for horn-like signals when compared to non-horn-like signals. A term representing this energy can be based on a First Order Consecutive Sample Difference (FOCSD) computed for the signal samples. This may be defined as follows:

    $$\text{FOCSD}(i) = x(i) - x(i-1)$$
    $$\text{FOCSDEnergy} = \sum_{i=0}^{N-2} \text{FOCSD}(i) \cdot \text{FOCSD}(i)$$
  • In other words, the FOCSD energy parameter for a frame is determined from a sum of the squares of the differences between consecutive samples.
  • It has been observed that this FOCSDEnergy will be higher for frames having horn-like signals than for frames having non-horn-like signals. The following relations can be used to classify frames with horn and non-horn content:
    • FOCSDEnergy < ThresholdEnergy, for non-horn signals
    • FOCSDEnergy > ThresholdEnergy, for horn signals.
  • The threshold term ThresholdEnergy was determined by analyzing the variations in FOCSDEnergy in relation to the actual signal energy for various signals.
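  • The FOCSD energy test can be sketched as follows; the threshold value shown is a placeholder, since the patent derives it empirically from the relationship between FOCSDEnergy and the actual signal energy.

```python
import numpy as np

def focsd_energy(x):
    """Sum of squared first-order consecutive sample differences."""
    diff = np.diff(np.asarray(x, dtype=float))  # FOCSD(i) = x(i) - x(i-1)
    return float(np.sum(diff * diff))           # FOCSDEnergy

THRESHOLD_ENERGY = 5.0                          # hypothetical threshold
frame = np.sin(2 * np.pi * 3200 * np.arange(80) / 8000)
horn_like = focsd_energy(frame) > THRESHOLD_ENERGY
```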
  • Instantaneous signal energy
  • Horn signals are generally non-stationary, occurring for only short durations (typically less than 4 seconds and often less than 1 second). Considering frame processing using blocks of 10ms each, horn signals can therefore span up to 400 frames. Horn signals have a high energy content throughout their duration. This property can be used to discriminate horn signals from unvoiced speech signals that may also have significant high-frequency content. The following relations can be used to classify frames with horn and non-horn content:
    • InstantaneousBlockEnergy < ThresholdAvEnergy, for non-horn signals
    • InstantaneousBlockEnergy > ThresholdAvEnergy, for horn signals.
  • ThresholdAvEnergy can be determined by analyzing the variations of InstantaneousBlockEnergy in relation to an average signal energy for various unvoiced signals and horn-like signals.
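  • The patent does not give an explicit formula for InstantaneousBlockEnergy; the sketch below assumes the mean squared sample value of the block and a placeholder threshold.

```python
import numpy as np

def instantaneous_block_energy(x):
    """Assumed definition: mean squared sample value of one block."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x * x))

THRESHOLD_AV_ENERGY = 0.1                       # hypothetical threshold
frame = np.sin(2 * np.pi * 3200 * np.arange(80) / 8000)
high_energy = instantaneous_block_energy(frame) > THRESHOLD_AV_ENERGY
```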
  • The presence of a horn sound in a signal block is preferably decided based on all of the above three criteria. A flow diagram illustrating the process of determining whether noise is present in an audio frame is shown in figure 8. The process repeats (i.e. between points marked 'A') by operating on consecutive frames until there are no more frames to analyse, when the process ends (step 812).
  • As a first step 801, a time domain narrowband audio signal is sampled at 8kHz, resulting in successive blocks each consisting of N samples. For each block (step 802), three different tests are carried out. A first test involves computing the TotalHFVariations parameter, as detailed above (step 803), and comparing this parameter with a first threshold value ThresholdHFV (step 804). If the threshold value is not exceeded, the horn detection flag for that block is set to false (step 805), and the process continues to the next block (step 806).
  • A second test involves computing the FOCSDEnergy parameter, as detailed above (step 807), and comparing this parameter with a second threshold value ThresholdEnergy (step 808). If the threshold value is not exceeded, the horn detection flag for that block is set to false (step 805), and the process continues to the next block (step 806).
  • A third test involves computing the InstantaneousBlockEnergy parameter, as detailed above (step 809), and comparing this parameter with a third threshold value ThresholdAvEnergy (step 810). If the threshold value is not exceeded, the horn detection flag for that block is set to false (step 805), and the process continues to the next block (step 806).
  • Only if all three of the above threshold tests are passed does the process proceed to setting the horn detection flag for that block to true (step 811).
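  • Combining the three tests, the per-block decision of figure 8 can be expressed as below; it reuses the helper functions and hypothetical thresholds sketched earlier and only sets the horn detection flag when all three tests pass.

```python
def horn_detected(block,
                  threshold_hfv=40,
                  threshold_energy=5.0,
                  threshold_av_energy=0.1):
    """Return True only if all three threshold tests pass (steps 803-811).

    Uses total_hf_variations(), focsd_energy() and instantaneous_block_energy()
    from the earlier sketches; the default thresholds are illustrative only.
    """
    if total_hf_variations(block) < threshold_hfv:                  # steps 803-804
        return False
    if focsd_energy(block) <= threshold_energy:                     # steps 807-808
        return False
    if instantaneous_block_energy(block) <= threshold_av_energy:    # steps 809-810
        return False
    return True                                                     # step 811
```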
  • Following detection of horn noise in each frame, each frame is subjected to a noise suppression process. Depending on whether horn noise was detected in a frame, and whether the noise suppression system incorporates a conventional noise suppression process, the noise suppression process may i) leave the frame unchanged, ii) implement a conventional noise suppression process with no horn noise suppression, iii) implement horn noise suppression alone, or iv) implement both conventional noise suppression and horn noise suppression.
  • An exemplary noise detection and suppression system is illustrated in figure 9, in which a horn noise detection and suppression system is incorporated with a conventional speech-based noise suppression system. The characteristics of detected horn signals are incorporated into a modified gain vector to produce a modified magnitude spectrum that is adjusted to emphasise detected speech and to suppress any detected horn noises.
  • In a first step 901, an input signal z(n) is transformed into windowed FFT frames of size N, the value for N being chosen such that the signal can be considered to be stationary within each frame. A time domain input audio signal frame of N samples is thereby transformed into a frequency domain frame of N/2+1 samples. The magnitude spectrum zampl(i) (step 902) is then used in the computation of a gain vector, and the phase part of the frame is neglected.
  • In step 903, assuming horn noise has been identified in a preceding time domain test (as described above), spectral peaks present in the magnitude spectrum zampl(i) are identified, which are taken to represent the horn signal present. This results in one or more indices of spectral peaks from the magnitude spectral bin values, which are used in a secondary gain computation step (step 907). The level for identifying the peaks is determined by calculating an average spectrum zampl_avg, given by the following equation:

    $$zampl_{avg} = \frac{1}{N/2+1}\sum_{i=0}^{N/2} zampl(i)$$

    The magnitude spectral bin values are then compared to zampl_avg multiplied by a peak detection factor α, and a decision is made on whether to classify a particular bin of the magnitude spectrum as a peak value according to the following relationship:

    $$zampl(i) \geq \alpha \cdot zampl_{avg} \;\Rightarrow\; i \text{ is detected as a peak index}$$
  • The spectral bin indices satisfying the above relationship are identified as spectral peak bin numbers. The spectral peak indices identified are stored and used later in the gain computation, for modifying the gain vector when a horn sound is detected.
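  • A sketch of this peak-picking step is shown below; the peak detection factor α is not specified numerically in the text, so the value used here is an assumption.

```python
import numpy as np

def spectral_peak_bins(zampl, alpha=4.0):
    """Return indices i with zampl(i) >= alpha * zampl_avg (assumed alpha)."""
    zampl = np.asarray(zampl, dtype=float)
    zampl_avg = zampl.mean()                    # average of the N/2+1 magnitude bins
    return np.flatnonzero(zampl >= alpha * zampl_avg)

# Example: peaks of a 10 ms frame containing a 3.2 kHz tone sampled at 8 kHz.
fs, N = 8000, 80
frame = np.sin(2 * np.pi * 3200 * np.arange(N) / fs)
zampl = np.abs(np.fft.rfft(np.hanning(N) * frame))
peak_bins = spectral_peak_bins(zampl)           # bin k corresponds to k * fs / N Hz
```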
  • The noise floor used by the noise suppression system for each frame is calculated by the Noise Floor Update block 904. The Noise Floor Estimate (NFE) is calculated by searching for minima of the spectral bins over multiple frames, and the noise floor used for each frame, i.e. the Current Noise Floor (CNF), is updated in this block. The outputs of this block 904 are CNF(i) and NFE(i). CNF is used in the subsequent gain calculation steps. An output from a speech detection block 905 is used by the Noise Floor Update block 904 for calculating NFE and CNF.
  • A gain computation block 906 receives CNF(i,k), zampl(i,k) and Y_N (a gain factor), where i corresponds to a spectral bin number and k is the frame number. Computation of a gain, Gain_ss(i,k), is given by the following relationship:

    $$Gain_{ss}(i,k) = \frac{zampl(i,k) - Y_N \cdot CNF(i,k)}{zampl(i,k)}$$
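  • The primary gain computation can be sketched as follows; the guard against division by zero and the clamping of negative gains are assumptions added for numerical safety, not steps stated in the text.

```python
import numpy as np

def spectral_subtraction_gain(zampl, cnf, y_n=1.0, gain_floor=0.0):
    """Gain_ss(i,k) = (zampl(i,k) - y_n * CNF(i,k)) / zampl(i,k)."""
    zampl = np.asarray(zampl, dtype=float)
    cnf = np.asarray(cnf, dtype=float)
    gain = (zampl - y_n * cnf) / np.maximum(zampl, 1e-12)   # avoid division by zero
    return np.clip(gain, gain_floor, None)                  # assumed lower clamp
```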
  • In addition to the gain computation 906, a secondary gain computation block 907 is used to modify the gain Gain_ss(i,k) computed above. In this block 907, a secondary gain vector is computed based on the previously defined horn noise detection information. The resulting secondary gain vector Gain_sec(i,k) is of size (N/2+1) and all the values in this vector are initialized to 1 in every frame before modification. This initialized value ensures that the gain computed by the gain computation block 906 is used when no horn-like signals have been detected in the present frame. The secondary gain computation block 907 takes the horn detection flag and bin numbers calculated by the spectral peak detection block 903. The secondary gain Gain_sec(i,k) is calculated using the following relationship:

    $$Gain_{sec}(i,k) = 0$$

    where i is a spectral peak bin corresponding to a frequency above 2000Hz. The secondary gain vector is used for modifying the gain calculated by the gain computation block 906, as represented by combining block 908.
  • The resulting new gain vector, Gain_new(i,k), that is then used for noise suppression is thereby computed using the following relationship:

    $$Gain_{new}(i,k) = Gain_{sec}(i,k) \cdot Gain_{ss}(i,k)$$
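  • The secondary gain and its combination with the primary gain can be sketched as follows; the bin-to-frequency conversion assumes an N-point FFT at sampling rate fs, and the helper reuses the peak indices and Gain_ss vector from the earlier sketches.

```python
import numpy as np

def combined_gain(gain_ss, peak_bins, horn_flag, fs=8000, n_fft=80):
    """Gain_new(i,k) = Gain_sec(i,k) * Gain_ss(i,k).

    Gain_sec starts at 1 for every bin; when the horn flag is set, bins that
    were flagged as spectral peaks and lie above 2 kHz are zeroed.
    """
    gain_ss = np.asarray(gain_ss, dtype=float)
    gain_sec = np.ones_like(gain_ss)                 # initialized to 1 in every frame
    if horn_flag:
        bin_width_hz = fs / n_fft                    # frequency resolution per bin
        for i in peak_bins:
            if i * bin_width_hz > 2000.0:            # only peaks above 2 kHz
                gain_sec[i] = 0.0
    return gain_sec * gain_ss
```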
  • This new gain vector is applied to the real FFT data (in block 909), resulting in a modified magnitude spectrum (block 910). The modified spectrum 910 is passed through an inverse FFT block 912, resulting in a noise-suppressed signal. An equivalent operation is possible in the time domain, for example by applying a notch filter where the desired frequency response of the filter corresponds to the gain vector, i.e. where one or more notches in the filter correspond to the one or more spectral peaks that represent the noise that is to be suppressed.
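  • For the time-domain alternative mentioned above, one possible realisation is a cascade of IIR notch filters centred on the detected peak frequencies. The sketch below uses SciPy's iirnotch design; the quality factor and the example peak frequency are assumptions, and the patent itself only requires that the filter's frequency response correspond to the gain vector.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def notch_out_peaks(x, peak_freqs_hz, fs=8000, q=30.0):
    """Cascade one notch filter per detected peak frequency (assumed Q factor)."""
    y = np.asarray(x, dtype=float)
    for f0 in peak_freqs_hz:
        b, a = iirnotch(f0, q, fs=fs)    # notch centred at f0 Hz
        y = lfilter(b, a, y)
    return y

# Example: attenuate an assumed 3.2 kHz horn component in a noisy signal.
fs = 8000
t = np.arange(800) / fs
noisy = np.sin(2 * np.pi * 3200 * t) + 0.1 * np.random.randn(t.size)
cleaned = notch_out_peaks(noisy, [3200.0], fs=fs)
```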
  • Illustrated in figure 10 is a block diagram of a noise detection and suppression system 1000 in which a standalone horn suppression system is used, the main difference between this system and the system 900 in figure 9 being that a speech detection part of the system is not used. As before, an input signal z(n) is windowed and transformed to the frequency domain (block 1001), resulting in a magnitude spectrum (block 1002). A gain computation block 1004 takes in the spectral bin numbers from a spectral peak detection block 1003, the spectral bin numbers corresponding to spectral peaks identified in the magnitude spectrum. All the elements of the gain vector are initialized to 1 in every frame before modification. On horn detection, the gain computation block 1004 computes a gain vector Gain(i,k) using the following relationship:

    $$Gain(i,k) = 0$$

    where i is a spectral peak bin corresponding to a frequency above 2000Hz.
  • This gain vector is then applied to the real FFT data (in combining block 1005), resulting in a modified magnitude spectrum (block 1006), to which the phase part 1007 is applied. The modified spectrum is passed through an inverse FFT block 1008 and a noise-suppressed signal is output.
  • Applications of embodiments of the invention described herein include speech enhancement devices used in communication/recording, audio enhancement during capture, editing and playback, and in audio scene analysis and steering of other processes such as noise adaptive audio or ringtone playback.
  • Other embodiments are also intentionally within the scope of the invention, which is defined by the following claims.

Claims (13)

  1. A method of suppressing noise in an audio signal, the method comprising:
    receiving an audio signal (801);
    dividing the received audio signal into a series of sampled audio frames (802); and
    for each of the sampled audio frames:
    i) determining whether noise is present in the audio frame by detecting a noise pattern in the sampled audio frame having one or more spectral peaks in a high frequency region of the audio signal spectrum; and
    ii) if noise is determined to be present in the audio frame (811), applying (908, 1005) a gain function to suppress the one or more spectral peaks in the sampled audio frame,
    wherein the step of determining whether noise is present in the audio frame comprises comparing (803,804,807,808,809,810) a measure of high frequency content in the audio frame to a threshold value by computing (803) a first number of consecutive samples of opposite sign and a second number of alternate samples of opposite sign, comparing (804) a sum of the first and second numbers to a first threshold value, and determining (811, 805) noise is present if the sum exceeds the first threshold value.
  2. The method of claim 1 comprising, for each of the sampled audio frames:
    transforming (901, 1001) the audio frame from the time domain into the frequency domain; and
    converting (912, 1008) the resulting noise-suppressed audio frame back to the time domain,
    wherein the gain function comprises a gain vector applied in the frequency domain.
  3. The method of claim 1 wherein the step of comparing a measure of high frequency content in the audio frame to a threshold value comprises computing (807) a sum of differences between consecutive samples, comparing (808) the sum of differences to a second threshold value, and determining (811, 805) noise is present if the sum of differences exceeds the second threshold value.
  4. The method of claim 1 or claim 3 wherein the step of comparing a measure of high frequency content in the audio frame to a threshold value comprises computing (809) a measure of energy in the audio frame, comparing (810) this measure of energy to a third threshold value, and determining (811, 805) noise is present if the measure of energy exceeds the third threshold value.
  5. The method of claim 1 wherein the step of comparing a measure of high frequency content in the audio frame to a threshold value comprises:
    computing (807) a sum of differences between consecutive samples;
    comparing (808) the sum of differences to a second threshold value;
    computing (809) a measure of energy in the audio frame;
    comparing (810) the measure of energy to a third threshold value; and
    determining (811, 805) noise is present if the sum of the first and second numbers exceeds the first threshold value, the sum of differences exceeds the second threshold value, and the measure of energy exceeds the third threshold value.
  6. The method of any preceding claim wherein the step of determining whether noise is present in the audio frame is carried out in the time domain.
  7. The method of claim 6 wherein detecting a noise pattern in the sampled audio frame comprises comparing frequencies of the spectrum of the audio frame with an average spectrum of the audio frame, a spectral peak being detected if a magnitude of a frequency exceeds the average spectrum by a preset factor.
  8. The method of any preceding claim wherein the high frequency region of the audio signal spectrum exceeds 2 kHz.
  9. The method of claim 1 or claim 2 wherein the gain function comprises a combination of a first gain function configured to emphasise a speech signal in the audio frame and a second gain function configured to suppress noise detected in the audio frame.
  10. The method of any preceding claim wherein the noise signal has a harmonic structure.
  11. The method of claim 10 wherein the noise signal is a vehicle horn noise.
  12. A hand-portable radio communications device comprising a noise suppression module configured to perform the method of any one of claims 1 to 11.
  13. A computer program for causing a computer to perform the method of any one of claims 1 to 11.
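The time-domain detection criteria recited in claims 1 and 3 to 5 can be sketched in Python/NumPy as follows. The threshold values t1, t2 and t3, the function name detect_noise and the use of absolute differences are assumptions for illustration only; the claims do not fix particular values or implementations.

    import numpy as np

    def detect_noise(frame, t1, t2, t3):
        x = np.asarray(frame, dtype=float)

        # Claim 1: count consecutive samples of opposite sign and alternate samples of opposite sign.
        consecutive = np.count_nonzero(x[:-1] * x[1:] < 0)
        alternate = np.count_nonzero(x[:-2] * x[2:] < 0)

        # Claim 3: a sum of differences between consecutive samples (taken here as absolute differences).
        diff_sum = np.sum(np.abs(np.diff(x)))

        # Claim 4: a measure of energy in the audio frame.
        energy = np.sum(x * x)

        # Claim 5: noise is determined to be present only when all three measures exceed their thresholds.
        return (consecutive + alternate) > t1 and diff_sum > t2 and energy > t3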
EP10250748.0A 2010-02-12 2010-04-09 Noise suppression Active EP2362390B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IN312DE2010 2010-02-12

Publications (2)

Publication Number Publication Date
EP2362390A1 EP2362390A1 (en) 2011-08-31
EP2362390B1 true EP2362390B1 (en) 2016-01-06

Family

ID=42799759

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10250748.0A Active EP2362390B1 (en) 2010-02-12 2010-04-09 Noise suppression

Country Status (1)

Country Link
EP (1) EP2362390B1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103632676B (en) * 2013-11-12 2016-08-24 广州海格通信集团股份有限公司 A kind of low signal-to-noise ratio voice de-noising method
CN103632681B (en) * 2013-11-12 2016-09-07 广州海格通信集团股份有限公司 A kind of spectral envelope silence detection method
CN104869209B (en) * 2015-04-24 2017-12-12 广东小天才科技有限公司 A kind of method and device for adjusting mobile terminal recording
JP6477295B2 (en) * 2015-06-29 2019-03-06 株式会社Jvcケンウッド Noise detection apparatus, noise detection method, and noise detection program
CN106356071B (en) * 2016-08-30 2019-10-25 广州市百果园网络科技有限公司 A kind of noise detecting method and device
WO2022045395A1 (en) * 2020-08-27 2022-03-03 임재윤 Audio data correction method and device for removing plosives

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7742914B2 (en) * 2005-03-07 2010-06-22 Daniel A. Kosek Audio spectral noise reduction method and apparatus
KR100735417B1 (en) * 2006-01-24 2007-07-04 삼성전자주식회사 Method of align window available to sampling peak feature in voice signal and the system thereof

Also Published As

Publication number Publication date
EP2362390A1 (en) 2011-08-31

Similar Documents

Publication Publication Date Title
US8521521B2 (en) System for suppressing passing tire hiss
US8073689B2 (en) Repetitive transient noise removal
EP1918910B1 (en) Model-based enhancement of speech signals
US8600073B2 (en) Wind noise suppression
EP2151822B1 (en) Apparatus and method for processing and audio signal for speech enhancement using a feature extraction
US8271279B2 (en) Signature noise removal
EP1745468B1 (en) Noise reduction for automatic speech recognition
EP2362390B1 (en) Noise suppression
EP2546831B1 (en) Noise suppression device
EP1744305B1 (en) Method and apparatus for noise reduction in sound signals
EP2202730B1 (en) Noise detection apparatus, noise removal apparatus, and noise detection method
US8352257B2 (en) Spectro-temporal varying approach for speech enhancement
US20050288923A1 (en) Speech enhancement by noise masking
EP2151821A1 (en) Noise-reduction processing of speech signals
EP1806739A1 (en) Noise suppressor
US9613633B2 (en) Speech enhancement
US8326621B2 (en) Repetitive transient noise removal
EP1995722B1 (en) Method for processing an acoustic input signal to provide an output signal with reduced noise
CN104575513B (en) The processing system of burst noise, the detection of burst noise and suppressing method and device
Chandra et al. Usable speech detection using the modified spectral autocorrelation peak to valley ratio using the LPC residual
US7451082B2 (en) Noise-resistant utterance detector
JP5193130B2 (en) Telephone voice section detecting device and program thereof
Ding Speech enhancement in transform domain
KR20200038292A (en) Low complexity detection of speech speech and pitch estimation
Górriz et al. Bispectra analysis-based vad for robust speech recognition

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

AX Request for extension of the european patent

Extension state: AL BA ME RS

17P Request for examination filed

Effective date: 20120229

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602010029859

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G10L0021020000

Ipc: G10L0021020800

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 21/0208 20130101AFI20150619BHEP

Ipc: G10L 21/0232 20130101ALI20150619BHEP

INTG Intention to grant announced

Effective date: 20150713

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

INTG Intention to grant announced

Effective date: 20151103

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 769454

Country of ref document: AT

Kind code of ref document: T

Effective date: 20160215

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602010029859

Country of ref document: DE

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 7

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20160106

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 769454

Country of ref document: AT

Kind code of ref document: T

Effective date: 20160106

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160407

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160406

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160506

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160430

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160506

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602010029859

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

26N No opposition filed

Effective date: 20161007

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: LU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160409

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160430

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160406

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 8

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160409

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 9

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20100409

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160430

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20160106

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 602010029859

Country of ref document: DE

Owner name: GOODIX TECHNOLOGY (HK) COMPANY LIMITED, CN

Free format text: FORMER OWNER: NXP B.V., EINDHOVEN, NL

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20200917 AND 20200923

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230425

Year of fee payment: 14

Ref country code: DE

Payment date: 20230420

Year of fee payment: 14

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20230419

Year of fee payment: 14