EP0655731B1 - Noise suppressor available in pre-processing and/or post-processing of a speech signal - Google Patents

Noise suppressor available in pre-processing and/or post-processing of a speech signal

Info

Publication number
EP0655731B1
Authority
EP
European Patent Office
Prior art keywords
signal
speech
noise
feature parameter
produce
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP19940118782
Other languages
German (de)
French (fr)
Other versions
EP0655731A3 (en)
EP0655731A2 (en)
Inventor
Kazunori C/O Nec Corporation Ozawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of EP0655731A2
Publication of EP0655731A3
Application granted
Publication of EP0655731B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility

Definitions

  • This invention relates to a noise suppressor for use in suppressing a noise signal from a speech signal.
  • a speech signal is subjected to pre-processing before the speech signal is encoded into a sequence of encoded signals.
  • pre-processing has been made to judge either a speech duration or a non-speech duration, in an article which is contributed by J.F. Lynch, Jr. et al to IEEE and which is entitled "SPEECH/SILENCE SEGMENTATION FOR REAL-TIME CODING VIA RULE BASED ADAPTIVE ENDPOINT DETECTION" (Proceedings ICASSP, pages 1348-1351, 1987).
  • description is made only about detection between the speech duration and the non-speech duration but is not made about suppressing a noise signal from the speech signal during the pre-processing.
  • speech encoding is usually carried out in connection not only with the spectrum but also with a phase component of the speech signal. This shows that a noise component included in the phase component cannot be removed by the above-mentioned method.
  • the spectrum subtraction is disadvantageous in that the noise component can not be completely suppressed from the speech signal.
  • the spectrum subtraction can not be applied on post-processing which is carried out after the encoded signal sequence is decoded into a sequence of decoded signals.
  • EP-A-459364 discloses a noise signal prediction system according to the preamble of claim 1.
  • a speech signal is given in the form of a sequence of digital speech signals to be subjected to pre-processing and post-processing to suppress a noise signal from the speech signal.
  • the pre-processing is carried out in response to an input signal specified by the digital speech signal sequence which is not encoded yet while the post-processing is carried out in response to an input signal specified by the digital speech signal sequence which is already decoded. Therefore, it is noted that the terms "digital speech signal sequence” and "input signal” may be used in two different meanings hereinunder so as to include both the pre-processing and the post-processing.
  • the input signal includes the speech signal (namely, the digital speech signal sequence) and the noise signal and may be therefore considered as a combination of the digital speech signal sequence and the noise signal.
  • feature parameters are extracted from the input signal and may be, for example, selected one or ones of spectrum parameters representative of features of a spectrum in the input signal, pitch prediction gains representative of periodicity of the input signal, and the like.
  • the feature parameters are used to determine either a speech duration or a non-speech duration by comparing the feature parameters with a threshold level.
  • a preliminary sound source signal which specifies a sound source is obtained by the use of the input signal and the feature parameters on the pre-processing and the post-processing. Specifically, the preliminary sound source signal appears in the form of an error signal which is produced on the pre-processing by allowing the input signal to pass through an inverse filter controlled by the feature parameters.
  • the preliminary sound source signal appears in the form of a decoder output signal or a sequence of decoded signals which is decoded by the use of the feature parameters.
  • the speech signal has an amplitude greater than the noise signal in the preliminary sound source signal, it is possible to suppress the noise signal alone by comparing an amplitude of the preliminary sound source signal with a predetermined threshold level and to therefore attain a noise-suppressed signal.
  • the noise-suppressed signal is reproduced by the use of the feature parameters into a noise-free output signal on the pre-processing or is produced as a noise-free decoded signal on the post-processing.
  • the noise-free output signal may be encoded by an encoder after the pre-processing while the noise-free decoded signal may be converted into an audio signal after the post-processing.
  • Noise suppression may be carried out only within a selected one of the speech duration or the non-speech duration or within both the speech duration and the non-speech duration.
  • this invention makes it possible to suppress the noise signal on a waveform by the use of the feature parameters and is applicable to both the pre-processing and the post-processing.
  • a noise suppressor is applicable to the pre-processing and is therefore supplied through an input terminal 10 with an input signal IN which includes a speech signal and a noise signal superposed on the speech signal.
  • the speech signal is given in the form of a sequence of digital speech signals.
  • the input signal IN is given to a frame division circuit 11 and is divided by the frame division circuit 11 into a plurality of frames each of which has a length of, for example, 40 milliseconds.
  • Each frame is further subdivided by a subframe division circuit 12 into a plurality of subframes each of which has a length of, for example, eight milliseconds.
  • the input signal IN is divided into the subframes, as mentioned above, and is sent in the form of a divided input signal sequence x(n) either at every frame or at every subframe to a feature parameter calculator 15 on one hand and to a noise suppression circuit 20 on the other hand.
  • the divided input signal sequence x(n) may be referred to as an internal input signal.
  • the feature parameter calculator 15 is supplied with the internal input signal x(n) at every subframe.
  • the feature parameter calculator 15 at first places a window to extract a piece of the internal input signal x(n) in relation to each subframe.
  • the window is longer than each subframe length and may be, for example, 24 milliseconds.
  • the feature parameter calculator 15 calculates, as feature parameters, spectrum parameters indicative of features of a spectrum in the input signal, pitch prediction gains indicative of periodicity of the speech signal, and an average amplitude in each subframe. In this event, average power may be calculated in the feature parameter calculator 15. Such calculations of the feature parameters are known in the art and will not be described any longer. In any event, the feature parameters are produced as feature parameter signals from the feature parameter calculator 15.
  • the feature parameter calculator 15 shown in Fig. 1 calculates the spectrum parameters of a predetermined order which may be, for example, a tenth order.
  • linear prediction coefficients ai
  • the Burg analysis is used to calculate the linear prediction coefficients.
  • the Burg analysis is described in detail in a book (pages 82 to 87) which is written by Nakamizo et al and which is titled "Signal Analysis and System Identification” published by Corona Company Ltd, Tokyo, in 1988. Accordingly, description will be omitted from the instant specification as regards the Burg analysis.
  • the linear prediction coefficients may be also calculated by the use of a covariance method or a correlation method.
  • the pitch prediction gains are also calculated in the feature parameter calculator 15.
  • the feature parameter calculator 15 supplies a speech detection circuit 25 and the noise suppression circuit 20 with the feature parameter signals representative of the feature parameters, as mentioned above.
  • the speech detection circuit 25 detects or determines either the speech duration or the non-speech duration of the speech signal in response to at least one of the feature parameters. To this end, a wide variety of methods can be applied to determine the speech duration or the non-speech duration.
  • the illustrated speech detection circuit 25 at first smooths the pitch prediction gains Pg and the average amplitude R to obtain smoothed pitch prediction gains Pg' and a smoothed average amplitude R' and thereafter compares the smoothed pitch prediction gains Pg' and the smoothed average amplitude R' with first and second threshold values TH1 and TH2, respectively.
  • the speech detection circuit 25 judges that the non-speech duration lasts in the internal input signal x(n). Otherwise, the speech detection circuit 25 judges that the speech duration lasts in the internal input signal x(n). Thus, the non-speech and the speech durations are detected by the speech detection circuit 25.
  • the first and the second threshold values TH1 and TH2 may be invariable or variable with time.
  • the speech detection circuit 25 comprises a calculation circuit for calculating the smoothed values (namely, the smoothed pitch prediction gains Pg' and the smoothed average amplitude R') in accordance with Equation 4 and a comparator unit for comparing the smoothed values with the first and the second threshold values TH1 and TH2.
  • the illustrated speech detection circuit 25 can produce the smoothed average amplitude R' at every frame or at every subframe and a detection signal DT representative of either the speech or the non-speech duration at every frame or at every subframe.
  • the smoothed average amplitude R' is delivered to a memory circuit 30 while the detection signal DT is sent to the noise suppression circuit 20.
  • the noise suppression circuit 20 is operable to suppress the noise signal within at least one of the speech and the non-speech durations.
  • the noise suppression circuit 20 comprises an inverse filter 201 supplied with the internal input signal x(n) from the input terminal 10 through the frame and the subframe division circuits 11 and 12.
  • the feature parameters ai are also supplied from the feature parameter calculator 15 to the inverse filter 201.
  • the inverse filter 201 carries out an inverse filtering operation to produce an inverse-filtered signal e(n) which may be called a preliminary sound source signal because the inverse-filtered signal e(n) specifies a sound source.
  • the inverse-filtered signal e(n) is dependent on the feature parameters and specifies the sound source.
  • the inverse-filtered signal e(n) includes a speech signal component and a noise signal component superposed on the speech signal component and appears in the form of a continuous signal.
  • the inverse filter 201 may be simply called a filter circuit.
  • the inverse-filtered signal e(n) is specified by a comparatively large amplitude pulse within a portion of the speech signal component appearing in the speech duration because the speech signal has a pitch.
  • the inverse-filtered signal e(n) exhibits a comparatively small amplitude within a portion of the noise signal.
  • the noise suppression circuit 20 illustrated in Fig. 2 comprises a threshold value calculation circuit 202 supplied with the smoothed average amplitude R' which is calculated by the feature parameter calculator 15 in accordance with Equation 4 and which is memorized into the memory circuit 30.
  • the threshold value TH1 is determined by the average amplitude R memorized in the memory circuit 30.
  • the inverse-filtered signal e(n) and the threshold value signal are sent to a suppressor unit 203 which is also given the detection signal DT from the speech detection circuit 25.
  • the suppressor unit 203 is put into an active state or into an inactive state in response to the detection signal DT. In this event, the suppressor unit 203 may suppress the noise signal within at least one of the speech duration and the non-speech duration. In the illustrated example, it is assumed that the suppressor unit 203 is put into the active state within the non-speech duration in response to the detection signal DT, although the suppressor unit 203 may be put into the active state within the speech duration.
  • the suppressor unit 203 compares the inverse-filtered signal e(n) with the threshold value signal.
  • the suppressor unit 203 attenuates the inverse-filtered signal e(n) by a predetermined amount or renders the inverse-filtered signal e(n) into zero when the inverse-filtered signal e(n) is smaller than the threshold value TH1.
  • the suppressor unit 203 produces a noise-suppressed signal e' specified by: where K is greater than zero and smaller than unity.
  • a combination of the threshold value calculation circuit 202 and the suppressor unit 203 serves to suppress the noise signal included in the inverse-filtered signal e(n) and to produce the noise-suppressed signal e'(n) and may be collectively called a noise suppression portion.
  • the noise-suppressed signal e'(n) is sent to a reproduction circuit 204 together with the feature parameters ai.
  • the reproduction circuit 204 reproduces the noise-suppressed signal e'(n) into a noise-suppressed speech signal x'(n) with reference to the feature parameters ai.
  • the noise-suppressed speech signal x'(n) is delivered through an output terminal 35 of the noise suppression circuit 20 to an encoder (not shown) to be encoded.
  • the noise-suppressed speech signal x'(n) is produced during the pre-processing prior to the encoding. Since the noise-suppression is carried out with reference to the feature parameters of the input signal IN, a phase component of the noise signal can also be suppressed in the above-mentioned example.
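  • As a hedged illustration of the suppressor unit 203 and the reproduction circuit 204 described above, the following Python sketch attenuates small-amplitude samples of e(n) and then passes the result through the all-pole synthesis filter that undoes the inverse filter. The equation for e'(n) is not reproduced in this text, so the rule used here (multiply by K when the magnitude of e(n) is below the threshold, otherwise pass the sample unchanged) is an assumption based on the surrounding description. The function names, the `active` flag standing in for the detection signal DT, and the zero initial filter state are likewise hypothetical.

```python
def suppress(e, threshold, k=0.5, active=True):
    """Attenuate samples of e(n) whose magnitude is below the threshold.

    Assumed rule (the source equation is not available here):
    e'(n) = k * e(n) if |e(n)| < threshold, else e(n).
    `active` stands in for the detection signal DT.
    """
    if not active:
        return list(e)
    return [k * v if abs(v) < threshold else v for v in e]


def reproduce(e_suppressed, a, history=None):
    """Synthesis filter: x'(n) = e'(n) + sum_{i=1}^{P} a[i-1] * x'(n-i).

    `history` holds at least P previous output samples (oldest first);
    zeros are assumed when it is omitted.
    """
    order = len(a)
    past = list(history) if history is not None else [0.0] * order
    if len(past) < order:
        raise ValueError("history must hold at least len(a) samples")
    out = past[len(past) - order:]
    for v in e_suppressed:
        m = len(out)
        out.append(v + sum(a[i - 1] * out[m - i] for i in range(1, order + 1)))
    return out[order:]
```

  • With a single coefficient a = [0.5], an excitation of [1.0, 0.0, 0.0, 0.0] is reproduced as a decaying sequence, which is the inverse of the inverse-filtering operation carried out by the inverse filter 201.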
  • a noise suppressor (depicted at 40) according to a second embodiment of this invention is operable to carry out post-processing after decoding.
  • the illustrated noise suppressor 40 is connected to a decoder 45 which is supplied as a decoder input signal or an input signal DIN with feature parameters of a speech signal and an index signal related to a sound source.
  • the decoder 45 itself may be similar to that known in the art and produces a sequence of decoded sound source signals v(n) representative of a sound source together with the feature parameters and the index signal, in a known manner.
  • the decoded sound source signal sequence v(n) and the feature parameters and the index signal are sent to the noise suppressor 40.
  • the decoded sound source signal sequence v(n) is given to a noise suppression circuit which is depicted at 50 and which is operable in a manner to be described later in detail.
  • the illustrated noise suppressor 40 comprises a speech detection circuit 25' and a memory circuit 30' which may be similar to those illustrated in Fig. 1, respectively. From this fact, it is readily understood that the speech detection circuit 25' is operated in response to the feature parameters, such as the spectrum parameters, the pitch prediction gains Pg, and the average amplitude R, to detect either the speech duration or the non-speech duration.
  • the speech detection circuit 25' supplies the noise suppression circuit 50 with a detection signal DT' indicative of either the speech duration or the non-speech duration.
  • the speech detection circuit 25' calculates the smoothed average amplitude R' which is stored in the memory circuit 30'.
  • the noise suppression circuit 50 comprises a threshold calculator 501 supplied with the smoothed average amplitude R' to calculate a threshold value signal representative of a threshold value TH2, like in the threshold value calculation circuit 202.
  • the threshold value signal is given to the suppressor unit 502 together with the detection signal DT'.
  • the suppressor unit 502 is put into an active state within at least one of the speech and the non-speech durations.
  • the illustrated suppressor unit 502 becomes active only within the non-speech duration, like in the suppressor unit 203.
  • the suppressor unit 502 produces a sequence of noise-suppressed sound source signals v'(n) given by: where K is identical with K shown in Equation 7.
  • the threshold value TH2 may be equal to that of Equation 7.
  • the noise-suppressed sound source signals v'(n) are sent to a speech reproducing circuit 52 which is supplied with the feature parameters from the decoder 45.
  • the speech reproducing circuit 52 reproduces the noise-suppressed sound signals into a reproduced speech signal with reference to the feature parameters in a known manner.
  • the reproduced speech signal is delivered to a loudspeaker or the like.
  • the noise suppressor according to this invention can be used in post-processing the decoded sound source signals DIN in the above-mentioned manner.
  • the feature parameters need not be always restricted to the linear prediction coefficients but may be any other parameters known in the art.
  • the speech detection circuit 25 or 25' may be operated in a manner different from that illustrated in Figs. 1 and 3.
  • the post-processing can be carried out to suppress the noise signal even when the feature parameters are not transmitted from a transmitter and are not received by the decoder 45 (Fig. 3).
  • the speech signal is once reproduced by a receiver to form a reproduced speech waveform and to thereafter calculate feature parameters from the reproduced speech waveform in the manner mentioned in conjunction with Fig. 1.
  • the calculated feature parameters can be used to suppress the noise signal in the above-mentioned manner.
  • the noise suppression is possible during both the pre-processing and the post-processing of the speech signal. Moreover, it is also possible to suppress not only the noise signal appearing within the non-speech duration but also a non-speech signal superposed on the speech signal appearing within the speech duration. Such suppression can be accomplished on the waveform.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Description

  • This invention relates to a noise suppressor for use in suppressing a noise signal from a speech signal.
  • As a rule, a speech signal is subjected to pre-processing before it is encoded into a sequence of encoded signals. For example, pre-processing that judges whether a segment is a speech duration or a non-speech duration is described in an article contributed by J.F. Lynch, Jr. et al to IEEE and entitled "SPEECH/SILENCE SEGMENTATION FOR REAL-TIME CODING VIA RULE BASED ADAPTIVE ENDPOINT DETECTION" (Proceedings ICASSP, pages 1348-1351, 1987). The article describes only the discrimination between the speech duration and the non-speech duration and says nothing about suppressing a noise signal from the speech signal during the pre-processing. In other words, Lynch et al do not consider pre-processing that suppresses the noise signal from the speech signal. In practice, even if the pre-processing described in the article were used for noise suppression, it would be difficult to suppress the noise signal, namely, a non-speech signal, within the speech duration.
  • On the other hand, spectrum subtraction has been proposed in JP-A-2-278298 to remove a noise component from the speech signal before the speech signal is encoded into a sequence of encoded signals. With this method, only the noise spectrum that results from the noise component is subtracted from the spectrum of the noisy speech signal, and the result is produced as a noise-subtracted speech signal. Thus, the noise-subtracted speech signal might be free from the noise component on the spectrum.
  • However, it is to be noted that speech encoding is usually carried out in connection not only with the spectrum but also with a phase component of the speech signal. This shows that a noise component included in the phase component cannot be removed by the above-mentioned method.
  • Therefore, the spectrum subtraction is disadvantageous in that the noise component can not be completely suppressed from the speech signal.
  • Moreover, the spectrum subtraction can not be applied on post-processing which is carried out after the encoded signal sequence is decoded into a sequence of decoded signals.
  • At any rate, no consideration is given at all to suppressing a noise component on post-processing, even though noise suppression is necessary after decoding.
  • EP-A-459364 discloses a noise signal prediction system according to the preamble of claim 1.
  • It is an object of this invention to provide a noise suppressor which is capable of completely suppressing a noise component or signal from a speech signal.
  • It is another object of this invention to provide a noise suppressor of the type described, which can be used either on pre-processing or on post-processing of the speech signal.
  • It is still another object of this invention to provide a noise suppressor of the type described, which can suppress the noise signal not only within a speech duration but also within a non-speech duration.
  • These objects are attained with the features of the claims.
  • Fig. 1 is a block diagram of a noise suppressor according to a first embodiment of this invention;
  • Fig. 2 is a block diagram for use in describing a part of the noise suppressor illustrated in Fig. 1;
  • Fig. 3 is a block diagram of a noise suppressor according to a second embodiment of this invention; and
  • Fig. 4 is a block diagram for use in describing a part of the noise suppressor illustrated in Fig. 3.
  • Description will first be made of the principle of this invention so as to facilitate an understanding of this invention. Herein, it is assumed that a speech signal is given in the form of a sequence of digital speech signals to be subjected to pre-processing and post-processing to suppress a noise signal from the speech signal. In addition, the pre-processing is carried out in response to an input signal specified by the digital speech signal sequence which is not yet encoded, while the post-processing is carried out in response to an input signal specified by the digital speech signal sequence which is already decoded. Therefore, it is noted that the terms "digital speech signal sequence" and "input signal" may each be used hereinafter in two different meanings so as to cover both the pre-processing and the post-processing.
  • At any rate, the input signal includes the speech signal (namely, the digital speech signal sequence) and the noise signal and may be therefore considered as a combination of the digital speech signal sequence and the noise signal.
  • According to this invention, feature parameters are extracted from the input signal and may be, for example, selected one or ones of spectrum parameters representative of features of a spectrum in the input signal, pitch prediction gains representative of periodicity of the input signal, and the like. The feature parameters are used to determine either a speech duration or a non-speech duration by comparing the feature parameters with a threshold level.
  • Briefly, a preliminary sound source signal which specifies a sound source is obtained by the use of the input signal and the feature parameters on the pre-processing and the post-processing. Specifically, the preliminary sound source signal appears in the form of an error signal which is produced on the pre-processing by allowing the input signal to pass through an inverse filter controlled by the feature parameters.
  • On the other hand, the preliminary sound source signal appears in the form of a decoder output signal or a sequence of decoded signals which is decoded by the use of the feature parameters.
  • Since the speech signal has an amplitude greater than the noise signal in the preliminary sound source signal, it is possible to suppress the noise signal alone by comparing an amplitude of the preliminary sound source signal with a predetermined threshold level and to therefore attain a noise-suppressed signal. The noise-suppressed signal is reproduced by the use of the feature parameters into a noise-free output signal on the pre-processing or is produced as a noise-free decoded signal on the post-processing. The noise-free output signal may be encoded by an encoder after the pre-processing while the noise-free decoded signal may be converted into an audio signal after the post-processing.
  • Noise suppression may be carried out only within a selected one of the speech duration or the non-speech duration or within both the speech duration and the non-speech duration. Thus, this invention makes it possible to suppress the noise signal on a waveform by the use of the feature parameters and is applicable to both the pre-processing and the post-processing.
  • Referring to Fig. 1, a noise suppressor according to a first embodiment of this invention is applicable to the pre-processing and is therefore supplied through an input terminal 10 with an input signal IN which includes a speech signal and a noise signal superposed on the speech signal. As mentioned before, the speech signal is given in the form of a sequence of digital speech signals. The input signal IN is given to a frame division circuit 11 and is divided by the frame division circuit 11 into a plurality of frames each of which has a length of, for example, 40 milliseconds. Each frame is further subdivided by a subframe division circuit 12 into a plurality of subframes each of which has a length of, for example, eight milliseconds.
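  • The frame and subframe division described above can be sketched in Python as follows. This is only an illustration: the sampling rate of 8 kHz, the function name, and the choice to drop a trailing partial frame are assumptions that do not appear in the specification, which gives only the example lengths of 40 milliseconds and eight milliseconds.

```python
def split_frames(samples, sample_rate_hz=8000, frame_ms=40, subframe_ms=8):
    """Divide samples into frames, each returned as a list of subframes.

    The 8 kHz sampling rate is an assumed value; a trailing partial frame
    is dropped in this sketch.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 320 samples at 8 kHz
    sub_len = sample_rate_hz * subframe_ms // 1000  # 64 samples at 8 kHz
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        subframes = [frame[i:i + sub_len] for i in range(0, frame_len, sub_len)]
        frames.append(subframes)
    return frames
```

  • Under these assumptions each 40-millisecond frame yields five subframes of 64 samples, and each subframe is what the later stages receive as x(n).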
  • The input signal IN is divided into the subframes, as mentioned above, and is sent in the form of a divided input signal sequence x(n) either at every frame or at every subframe to a feature parameter calculator 15 on one hand and to a noise suppression circuit 20 on the other hand. Herein, the divided input signal sequence x(n) may be referred to as an internal input signal.
  • In the illustrated example, the feature parameter calculator 15 is supplied with the internal input signal x(n) at every subframe. The feature parameter calculator 15 at first places a window to extract a piece of the internal input signal x(n) in relation to each subframe. The window is longer than each subframe length and may be, for example, 24 milliseconds.
  • Thereafter, the feature parameter calculator 15 calculates, as feature parameters, spectrum parameters indicative of features of a spectrum in the input signal, pitch prediction gains indicative of periodicity of the speech signal, and an average amplitude in each subframe. In this event, average power may be calculated in the feature parameter calculator 15. Such calculations of the feature parameters are known in the art and will not be described any further. In any event, the feature parameters are produced as feature parameter signals from the feature parameter calculator 15.
  • Herein, it is to be noted that the feature parameter calculator 15 shown in Fig. 1 calculates the spectrum parameters of a predetermined order which may be, for example, a tenth order. In addition, the following description will be made on the assumption that linear prediction coefficients ai are used as the spectrum parameters. Although such linear prediction coefficients are calculated by using a well-known LPC analysis, Burg analysis, or the like, it is assumed in connection with the illustrated example that the Burg analysis is used to calculate the linear prediction coefficients. The Burg analysis is described in detail in a book (pages 82 to 87) which is written by Nakamizo et al and which is titled "Signal Analysis and System Identification" published by Corona Company Ltd, Tokyo, in 1988. Accordingly, description will be omitted from the instant specification as regards the Burg analysis.
  • Alternatively, the linear prediction coefficients may be also calculated by the use of a covariance method or a correlation method.
  • As mentioned before, the pitch prediction gains are also calculated in the feature parameter calculator 15. The pitch prediction gains are represented by Pg and are given by: $$P_g = \frac{\sum_{n=0}^{N-1} x^2(n)}{\sum_{n=0}^{N-1} x^2(n) - \left(\sum_{n=0}^{N-1} x(n)\,x(n-T)\right)^2 \Big/ \sum_{n=0}^{N-1} x^2(n-T)} \quad (1)$$ where T is a delay time representative of a pitch period; n, a sample number; and N, a maximum sample number.
  • Instead of Equation (1), the pitch prediction gains Pg can be simply calculated by the use of the following equation: $$P_g = \frac{\sum_{n=0}^{N-1} x(n)\,x(n-T)}{\sum_{n=0}^{N-1} x^2(n-T)} \quad (2)$$
  • The average amplitude is represented by R and is given by: $$R = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n) \quad (3)$$
  • Herein, it is readily possible to implement circuits for calculating the above-mentioned linear prediction coefficients, the pitch prediction gains Pg, and the average amplitude R by a combination of conventional circuit elements. Accordingly, specific circuits for calculating the linear prediction coefficients, the pitch prediction gains Pg, and the average amplitude will not be described further.
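  • The three formulas above can be sketched in Python as follows. This is a minimal illustration, not the circuit of the specification: the function names, the buffer layout (at least T past samples followed by the N samples of the current subframe), and the handling of zero denominators are assumptions. The average amplitude is computed exactly as the formula is written in this text, that is, as the mean of x squared.

```python
def _sums(buf, n_samples, lag):
    """Return (sum x^2(n), sum x(n)x(n-T), sum x^2(n-T)) over the subframe.

    buf holds past samples followed by the current subframe (its last
    n_samples entries); at least `lag` past samples are required.
    """
    start = len(buf) - n_samples
    if start < lag:
        raise ValueError("buffer needs at least `lag` samples of history")
    cur = buf[start:]
    delayed = buf[start - lag:len(buf) - lag]
    energy = sum(v * v for v in cur)
    cross = sum(c * d for c, d in zip(cur, delayed))
    delayed_energy = sum(v * v for v in delayed)
    return energy, cross, delayed_energy


def pitch_prediction_gain(buf, n_samples, lag):
    """Signal energy divided by the pitch-prediction residual energy."""
    energy, cross, delayed_energy = _sums(buf, n_samples, lag)
    if delayed_energy == 0:
        return 1.0  # assumed convention: no prediction is possible
    residual = energy - cross * cross / delayed_energy
    if residual <= 0:
        return float("inf") if energy > 0 else 1.0
    return energy / residual


def pitch_prediction_gain_simple(buf, n_samples, lag):
    """Simplified form: cross-correlation over delayed energy."""
    _, cross, delayed_energy = _sums(buf, n_samples, lag)
    return cross / delayed_energy if delayed_energy else 0.0


def average_amplitude(subframe):
    """Mean of x^2(n) over the subframe, as the formula is written here."""
    return sum(v * v for v in subframe) / len(subframe)
```

  • For a perfectly periodic input whose period equals T, the residual energy vanishes, so the first form grows without bound while the simplified form equals one.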
  • Thus, the feature parameter calculator 15 supplies a speech detection circuit 25 and the noise suppression circuit 20 with the feature parameter signals representative of the feature parameters, as mentioned above. In the illustrated example, the speech detection circuit 25 detects or determines either the speech duration or the non-speech duration of the speech signal in response to at least one of the feature parameters. To this end, a wide variety of methods can be applied to determine the speech duration or the non-speech duration. For example, the illustrated speech detection circuit 25 at first smooths the pitch prediction gains Pg and the average amplitude R to obtain smoothed pitch prediction gains Pg' and a smoothed average amplitude R' and thereafter compares the smoothed pitch prediction gains Pg' and the smoothed average amplitude R' with first and second threshold values TH1 and TH2, respectively.
  • The above-mentioned smoothing operation of the pitch prediction gains Pg and the average amplitude R is carried out in accordance with the following equation: $$P'_j = (1 - \delta)\,P'_{j-1} + \delta \cdot P \tag{4}$$ where P is representative of the pitch prediction gains or the average amplitude to be smoothed; δ is representative of a time constant for smoothing and takes a value between 0 and 1, both exclusive; and P'j and P'j-1 are representative of smoothed values at time instants j and j-1.
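  • The smoothing recursion can be sketched as a first-order recursive average. The function name `smooth` and the initial value of the smoothed quantity are assumptions; the specification does not state how the recursion is started.

```python
from typing import Iterable, List


def smooth(values: Iterable[float], delta: float, initial: float = 0.0) -> List[float]:
    """Apply P'_j = (1 - delta) * P'_{j-1} + delta * P to a sequence of values.

    `initial` plays the role of P'_{-1}; its default of 0.0 is an assumption.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    smoothed = []
    previous = initial
    for p in values:
        previous = (1.0 - delta) * previous + delta * p
        smoothed.append(previous)
    return smoothed
```

  • A small δ makes the smoothed value follow the input slowly, while a δ close to unity makes it track the input almost immediately.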
  • As a result of comparison, when the smoothed pitch prediction gains Pg' and the smoothed average amplitude R' are lower than the first and the second threshold values TH1 and TH2, respectively, the speech detection circuit 25 judges that the non-speech duration lasts in the internal input signal x(n). Otherwise, the speech detection circuit 25 judges that the speech duration lasts in the internal input signal x(n). Thus, the non-speech and the speech durations are detected by the speech detection circuit 25. In the example, the first and the second threshold values TH1 and TH2 may be invariable or variable with time.
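  • The decision rule described above reduces to two comparisons per frame or subframe. The following sketch assumes fixed thresholds and uses the hypothetical function name `detect`; a time-varying threshold would simply be passed in with a different value at each call.

```python
def detect(pg_smoothed: float, r_smoothed: float, th1: float, th2: float) -> str:
    """Return "non-speech" only when both smoothed values fall below their thresholds.

    pg_smoothed is compared with th1 and r_smoothed with th2; every other
    combination is judged to be "speech".
    """
    if pg_smoothed < th1 and r_smoothed < th2:
        return "non-speech"
    return "speech"
```

  • Because both conditions must hold, a frame with a low amplitude but a high pitch prediction gain is still judged to belong to the speech duration in this sketch.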
  • As mentioned before, the speech detection circuit 25 comprises a calculation circuit for calculating the smoothed values (namely, the smoothed pitch prediction gains Pg' and the smoothed average amplitude R') in accordance with Equation 4 and a comparator unit for comparing the smoothed values with the first and the second threshold values TH1 and TH2. As a result, the illustrated speech detection circuit 25 can produce the smoothed average amplitude R' at every frame or at every subframe and a detection signal DT representative of either the speech or the non-speech duration at every frame or at every subframe.
  • The smoothed average amplitude R' is delivered to a memory circuit 30 while the detection signal DT is sent to the noise suppression circuit 20.
  • Referring to Fig. 2 in addition to Fig. 1, the noise suppression circuit 20 is operable to suppress the noise signal within at least one of the speech and the non-speech durations. In Fig. 2, the noise suppression circuit 20 comprises an inverse filter 201 supplied with the internal input signal x(n) from the input terminal 10 through the frame and the subframe division circuits 11 and 12. The feature parameters ai are also supplied from the feature parameter calculator 15 to the inverse filter 201. The inverse filter 201 carries out an inverse filtering operation to produce an inverse-filtered signal e(n) which may be called a preliminary sound source signal because the inverse-filtered signal e(n) specifies a sound source. Herein, the inverse-filtered signal e(n) is given by: $$e(n) = x(n) - \sum_{i=1}^{P} a_i\,x(n-i) \tag{5}$$ where P represents an order of the inverse filter 201. Thus, the inverse-filtered signal e(n) is dependent on the feature parameters and specifies the sound source.
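  • The inverse filtering operation can be sketched as follows. Samples preceding the start of the buffer are taken to be zero, which is an assumption made for brevity; an actual implementation would presumably carry the filter state over from the preceding frame. The function name `inverse_filter` is hypothetical.

```python
from typing import List, Sequence


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Compute e(n) = x(n) - sum_{i=1..P} a_i * x(n - i), with P = len(a).

    a[0] holds a_1, a[1] holds a_2, and so on. Samples before n = 0 are
    treated as zero (assumption).
    """
    order = len(a)
    e = []
    for n in range(len(x)):
        predicted = sum(
            a[i - 1] * x[n - i] for i in range(1, order + 1) if n - i >= 0
        )
        e.append(x[n] - predicted)
    return e
```

  • For example, a decaying sequence 1, 0.5, 0.25, 0.125 filtered with the single coefficient a1 = 0.5 leaves only the initial pulse in e(n).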
  • The inverse-filtered signal e(n) includes a speech signal component and a noise signal component superposed on the speech signal component and appears in the form of a continuous signal. The inverse filter 201 may be simply called a filter circuit.
  • Now, it is to be noted that the inverse-filtered signal e(n) is specified by a comparatively large amplitude pulse within a portion of the speech signal component appearing in the speech duration because the speech signal has a pitch. On the other hand, the inverse-filtered signal e(n) exhibits a comparatively small amplitude within a portion of the noise signal.
  • Accordingly, it is possible to suppress the noise signal by comparing the inverse-filtered signal e(n) with a threshold level TH1.
  • More specifically, the noise suppression circuit 20 illustrated in Fig. 2 comprises a threshold value calculation circuit 202 supplied with the smoothed average amplitude R' which is calculated by the feature parameter calculator 15 in accordance with Equation 4 and which is memorized into the memory circuit 30. The threshold value calculation circuit 202 calculates the threshold value TH1 given by: $$TH_1 = K_2 \cdot R' \tag{6}$$ to produce a threshold value signal representative of the threshold value TH1, where K2 is greater than zero. Thus, the threshold value TH1 is determined by the smoothed average amplitude R' memorized in the memory circuit 30.
  • The inverse-filtered signal e(n) and the threshold value signal are sent to a suppressor unit 203 which is also given the detection signal DT from the speech detection circuit 25. The suppressor unit 203 is put into an active state or into an inactive state in response to the detection signal DT. In this event, the suppressor unit 203 may suppress the noise signal within at least one of the speech duration and the non-speech duration. In the illustrated example, it is assumed that the suppressor unit 203 is put into the active state within the non-speech duration in response to the detection signal DT, although the suppressor unit 203 may be put into the active state within the speech duration.
  • In addition, the suppressor unit 203 compares the inverse-filtered signal e(n) with the threshold value signal. The suppressor unit 203 attenuates the inverse-filtered signal e(n) by a predetermined amount or renders the inverse-filtered signal e(n) into zero when the inverse-filtered signal e(n) is smaller than the threshold value TH1. As a result, the suppressor unit 203 produces a noise-suppressed signal e' specified by:
    Figure 00150001
    where K is greater than zero and smaller than unity.
  • At any rate, a combination of the threshold value calculation circuit 202 and the suppressor unit 203 serves to suppress the noise signal included in the inverse-filtered signal e(n) and to produce the noise-suppressed signal e'(n) and may be collectively called a noise suppression portion.
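  • The noise suppression portion can be sketched as follows. Equation 7 appears only as an image in this text, so the exact form used below is an assumption drawn from the surrounding description: the magnitude of e(n) is compared with TH1 = K2·R', samples below the threshold are scaled by K, and all other samples pass unchanged. The `active` flag stands in for the detection signal DT, and the function names are hypothetical.

```python
from typing import List, Sequence


def threshold(r_smoothed: float, k2: float) -> float:
    """Threshold value TH1 = K2 * R', with K2 greater than zero."""
    if k2 <= 0.0:
        raise ValueError("k2 must be greater than zero")
    return k2 * r_smoothed


def suppress(
    e: Sequence[float],
    r_smoothed: float,
    k2: float,
    k: float,
    active: bool = True,
) -> List[float]:
    """Attenuate samples of e(n) whose magnitude is below the threshold.

    Assumed form: e'(n) = k * e(n) if |e(n)| < TH1, otherwise e(n).
    When `active` is False (suppressor unit inactive), e(n) is passed through.
    """
    if not active:
        return list(e)
    th = threshold(r_smoothed, k2)
    return [k * sample if abs(sample) < th else sample for sample in e]
```

  • Setting `k` to zero in this sketch corresponds to rendering the sub-threshold samples into zero rather than attenuating them.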
  • The noise-suppressed signal e'(n) is sent to a reproduction circuit 204 together with the feature parameters ai. The reproduction circuit 204 reproduces the noise-suppressed signal e'(n) into a noise-suppressed speech signal x'(n) with reference to the feature parameters ai. In this event, the noise-suppressed speech signal x'(n) is given by: $$x'(n) = e'(n) + \sum_{i=1}^{P} a_i\,x'(n-i) \tag{8}$$
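  • The reproduction step is a recursive (all-pole) filter driven by the noise-suppressed signal. The sketch below assumes zero output history before the first sample and uses the hypothetical function name `reproduce`.

```python
from typing import List, Sequence


def reproduce(e_suppressed: Sequence[float], a: Sequence[float]) -> List[float]:
    """Compute x'(n) = e'(n) + sum_{i=1..P} a_i * x'(n - i), with P = len(a).

    a[0] holds a_1. Output samples before n = 0 are treated as zero
    (assumption).
    """
    order = len(a)
    out: List[float] = []
    for n in range(len(e_suppressed)):
        feedback = sum(
            a[i - 1] * out[n - i] for i in range(1, order + 1) if n - i >= 0
        )
        out.append(e_suppressed[n] + feedback)
    return out
```

  • With the same coefficients ai and zero initial state, this filter is the exact inverse of the inverse filter, so an unmodified e(n) would be reproduced into the original x(n); only the samples altered by the suppressor unit change the output.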
  • The noise-suppressed speech signal x'(n) is delivered through an output terminal 35 of the noise suppression circuit 20 to an encoder (not shown) to be encoded. Thus, the noise-suppressed speech signal x'(n) is produced during the pre-processing prior to the encoding. Since the noise-suppression is carried out with reference to the feature parameters of the input signal IN, a phase component of the noise signal can also be suppressed in the above-mentioned example.
  • Referring to Fig. 3, a noise suppressor (depicted at 40) according to a second embodiment of this invention is operable to carry out post-processing after decoding. To this end, the illustrated noise suppressor 40 is connected to a decoder 45 which is supplied, as a decoder input signal or an input signal DIN, with feature parameters of a speech signal and an index signal related to a sound source. The decoder 45 itself may be similar to that known in the art and produces a sequence of decoded sound source signals v(n) representative of a sound source together with the feature parameters and the index signal, in a known manner. The decoded sound source signal sequence v(n), the feature parameters, and the index signal are sent to the noise suppressor 40.
  • In the noise suppressor 40, the decoded sound source signal sequence v(n) is given to a noise suppression circuit which is depicted at 50 and which is operable in a manner to be described later in detail. Furthermore, the illustrated noise suppressor 40 comprises a speech detection circuit 25' and a memory circuit 30' which may be similar to those illustrated in Fig. 1, respectively. From this fact, it is readily understood that the speech detection circuit 25' is operated in response to the feature parameters, such as the spectrum parameters, the pitch prediction gains Pg, and the average amplitude R, to detect either the speech duration or the non-speech duration. Thus, the speech detection circuit 25' supplies the noise suppression circuit 50 with a detection signal DT' indicative of either the speech duration or the non-speech duration. Like in Fig. 1, the speech detection circuit 25' calculates the smoothed average amplitude R' which is stored in the memory circuit 30'.
  • Referring to Fig. 4 together with Fig. 3, the noise suppression circuit 50 comprises a threshold calculator 501 supplied with the smoothed average amplitude R' to calculate a threshold value signal representative of a threshold value TH2, like in the threshold value calculation circuit 202. The threshold value signal is given to the suppressor unit 502 together with the detection signal DT'.
  • The suppressor unit 502 is put into an active state within at least one of the speech and the non-speech durations. Herein, it is assumed that the illustrated suppressor unit 502 becomes active only within the non-speech duration, like in the suppressor unit 203. In any event, the suppressor unit 502 produces a sequence of noise-suppressed sound source signals v'(n) given by:
    Figure 00170001
    where K is identical with K shown in Equation 7. The threshold value TH2 may be equal to that of Equation 7.
  • Turning back to Fig. 3, the noise-suppressed sound source signals v'(n) are sent to a speech reproducing circuit 52 which is supplied with the feature parameters from the decoder 45. The speech reproducing circuit 52 reproduces the noise-suppressed sound source signals v'(n) into a reproduced speech signal with reference to the feature parameters in a known manner. The reproduced speech signal is delivered to a loudspeaker or the like.
  • Thus, the noise suppressor according to this invention can be used in post-processing the decoded sound source signals DIN in the above-mentioned manner.
  • While this invention has thus far been described in conjunction with a few embodiments thereof, it will readily be possible for those skilled in the art to put this invention into practice in various other manners. For example, the feature parameters need not always be restricted to the linear prediction coefficients but may be any other parameters known in the art. In addition, it is possible to use parameters other than the average amplitude and the pitch prediction gains. The speech detection circuit 25 or 25' may be operated in a manner different from that illustrated in Figs. 1 and 3.
  • Moreover, the post-processing can be carried out to suppress the noise signal even when the feature parameters are not transmitted from a transmitter and are not received by the decoder 45 (Fig. 3). In this case, the speech signal is once reproduced by a receiver to form a reproduced speech waveform and to thereafter calculate feature parameters from the reproduced speech waveform in the manner mentioned in conjunction with Fig. 1. Thus, the calculated feature parameters can be used to suppress the noise signal in the above-mentioned manner.
  • With this structure, the noise suppression is possible during both the pre-processing and the post-processing of the speech signal. Moreover, it is also possible to suppress not only the noise signal appearing within the non-speech duration but also a non-speech signal superposed on the speech signal appearing within the speech duration. Such suppression can be accomplished on the waveform.

Claims (7)

  1. A noise suppressor supplied with an internal input signal (IN) which includes both a speech signal and a noise signal to produce an output signal substantially free from said noise signal, said speech signal being specified by a sound source, said noise suppressor comprising feature parameter calculating means (15) supplied with said internal input signal for calculating a feature parameter specifying a feature of said speech signal to produce a feature parameter signal representative of said feature parameter, and noise suppressing means (20) coupled to said feature parameter calculating means (15) for suppressing said noise signal from said internal input signal to produce said output signal, wherein said noise suppressing means comprises:
    a suppression unit (203) for suppressing the noise signal from a residual signal (e(n)) by estimating said noise signal to produce a noise-suppressed signal (e'(n)); and
    output means (204) for producing said noise-suppressed signal as said output signal; and is characterized by
    filter means (201) supplied with said feature parameter signal (ai) and said internal input signal for filtering said internal input signal (x(n)) to produce a filtered signal which is dependent on said feature parameter (ai) and which specifies said sound source in that said residual signal (e(n)) is calculated which represents the difference between said feature parameter signal representation and said internal input signal; wherein the suppression unit is coupled to said filter means (201).
  2. A noise suppressor as claimed in Claim 1, said speech signal being divisible into a speech duration and a non-speech duration, wherein said noise suppressor (20) further comprises:
    speech detection means (25) coupled to said feature parameter calculating means (15) for detecting said speech and said non-speech durations in response to the feature parameter signal to produce a detection signal representative of either one of said speech and said non-speech durations;
    average calculation means (30) coupled to said speech detection means for calculating an average value of either power or an amplitude within said non-speech duration to produce an average signal representative of said average value;
    said noise suppressing means (20) further comprising:
    threshold level calculating means (202) for calculating a threshold level from said average signal to supply said suppression unit (203) with a threshold level signal (TH1) representative of said threshold level, to make said suppression unit (203) compare said filtered signal with said threshold level signal, and to make said suppression unit suppress said noise signal.
  3. A noise suppressor as claimed in Claim 2,
    wherein said suppression unit (203) is further supplied with said detection signal (DT) to be put into an active state within at least one of said speech and said non-speech durations.
  4. A noise suppressor as claimed in Claim 1, 2 or 3,
    wherein said feature parameter calculating means (15) calculates, as said feature parameter (ai), spectrum parameters representative of a spectrum of said internal input signal, a pitch period of said internal input signal, and an average amplitude of said internal input signal.
  5. A noise suppressor according to claims 1, 2, 3, or 4, wherein said internal input signal is divided into a sequence of frames each of which lasts for a predetermined interval of time, said speech signal is generated by a sound source and has a spectrum specified by at least one feature parameter and is divisible into a speech duration and a non-speech duration, said noise suppressor comprises feature parameter calculating means for calculating said at least one feature parameter to produce a feature parameter signal representative of said at least one feature parameter and speech detection means coupled to said feature parameter calculating means (15) for detecting said speech and said non-speech durations in response to the feature parameter signal to produce a detection signal representative of either one of said speech and said non-speech durations,
    average memory means is coupled to said speech detection means for memorizing an average value of either one of power and an amplitude of said internal input signal within said non-speech duration to produce an average signal representative of said average value; and
    said noise suppressing means (20) is coupled to said feature parameter calculating means (15), said speech detection means, and said average calculating means for suppressing said noise signal with reference to said feature parameter signal, said detection signal, said average signal, and said internal input signal to produce said output signal.
  6. A noise suppressor operable in response to a feature parameter signal specifying a speech signal and to a sound source signal (v(n)) representative of a sound source of said speech signal to suppress a noise signal from the sound source signal and to produce an output signal (v'(n)) substantially free from said noise signal, said speech signal being divisible into a speech duration and a non-speech duration, said sound source signal appearing in the form of an error signal which is produced on the preprocessing by allowing an input signal to pass through an inverse filter controlled by said feature parameter signal,
    said noise suppressor being characterized by:
    a noise suppressing circuit (50) for suppressing said noise signal from said sound source signal with reference to said feature parameter signal to produce a noise-suppressed signal (v'(n));
    means (52) for producing said noise-suppressed signal as said output signal.
  7. A noise suppressor as claimed in Claim 6, characterized by:
    speech detection means (25') supplied with said feature parameter signals for detecting said speech and said non-speech durations to produce a detection signal representative of either one of said speech and said non-speech durations; and
    average memory means (30') coupled to said speech detection means for memorizing an average value of either one of power and an amplitude of said speech signal within said non-speech duration to produce an average signal representative of said average value;
    said noise suppressing circuit (50) suppressing said noise signal with reference to said average signal also.
EP19940118782 1993-11-29 1994-11-29 Noise suppressor available in pre-processing and/or post-processing of a speech signal Expired - Lifetime EP0655731B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP29717693 1993-11-29
JP29717693A JP2739811B2 (en) 1993-11-29 1993-11-29 Noise suppression method
JP297176/93 1993-11-29

Publications (3)

Publication Number Publication Date
EP0655731A2 (en) 1995-05-31
EP0655731A3 (en) 1997-05-28
EP0655731B1 (en) 2000-03-29

Family

ID=17843166

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19940118782 Expired - Lifetime EP0655731B1 (en) 1993-11-29 1994-11-29 Noise suppressor available in pre-processing and/or post-processing of a speech signal

Country Status (3)

Country Link
EP (1) EP0655731B1 (en)
JP (1) JP2739811B2 (en)
DE (1) DE69423703T2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3591068B2 (en) * 1995-06-30 2004-11-17 ソニー株式会社 Noise reduction method for audio signal
US7225001B1 (en) 2000-04-24 2007-05-29 Telefonaktiebolaget Lm Ericsson (Publ) System and method for distributed noise suppression
KR101235830B1 (en) * 2007-12-06 2013-02-21 한국전자통신연구원 Apparatus for enhancing quality of speech codec and method therefor
CN103168326A (en) * 2010-08-11 2013-06-19 骨声通信有限公司 Background sound removal for privacy and personalization use
JP6759898B2 (en) 2016-09-08 2020-09-23 富士通株式会社 Utterance section detection device, utterance section detection method, and computer program for utterance section detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU633673B2 (en) * 1990-01-18 1993-02-04 Matsushita Electric Industrial Co., Ltd. Signal processing device
KR950013551B1 (en) * 1990-05-28 1995-11-08 마쯔시다덴기산교 가부시기가이샤 Noise signal predictting dvice
JPH05188994A (en) * 1992-01-07 1993-07-30 Sony Corp Noise suppression device

Also Published As

Publication number Publication date
JP2739811B2 (en) 1998-04-15
JPH07152395A (en) 1995-06-16
EP0655731A3 (en) 1997-05-28
DE69423703D1 (en) 2000-05-04
EP0655731A2 (en) 1995-05-31
DE69423703T2 (en) 2000-07-27

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE FR GB NL

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE FR GB NL

17P Request for examination filed

Effective date: 19970415

17Q First examination report despatched

Effective date: 19980414

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB NL

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 11/06 A, 7G 10L 101/12 B

REF Corresponds to:

Ref document number: 69423703

Country of ref document: DE

Date of ref document: 20000504

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20121130

Year of fee payment: 19

Ref country code: DE

Payment date: 20121121

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20121128

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NL

Payment date: 20121116

Year of fee payment: 19

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 69423703

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: V1

Effective date: 20140601

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20131129

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20140731

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140601

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140603

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 69423703

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G10L0011060000

Ipc: G10L0021026400

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20131129

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20131202

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 69423703

Country of ref document: DE

Effective date: 20140603

Ref country code: DE

Ref legal event code: R079

Ref document number: 69423703

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G10L0011060000

Ipc: G10L0021026400

Effective date: 20141103