WO2001022401A1

WO2001022401A1 - Processing circuit for correcting audio signals, receiver, communication system, mobile apparatus and related method

Info

Publication number: WO2001022401A1
Application number: PCT/EP2000/008874
Authority: WO
Inventors: Charkani (El Hassani), Ahmed, N.
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 1999-09-20
Filing date: 2000-09-08
Publication date: 2001-03-29
Also published as: EP1131815A1; JP2003510643A; KR20010080476A; CN1322347A

Abstract

Transmission of audio signals, partitioned into frames, between a transmitter and a receiver may induce annoying artifacts in the signal. The invention aims at reducing these artifacts in the decoded frames by deriving a magnitude (CAL) representative of the energy of a frame and comparing this magnitude with a derived estimate (EST). A frame is corrected (CORR) when the magnitude exceeds the estimate by a large amount (DSP).

Description

Processing circuit for correcting audio signals, receiver, communication system, mobile apparatus and related method

FIELD OF THE INVENTION

The invention relates to a processing circuit for processing a digital audio signal transmitted as a series of samples and partitioned into successive frames, a frame consisting of a set of successive samples. The invention also relates to a receiver for receiving a digital signal transmitted as a series of samples and partitioned into successive frames, a frame consisting of a set of successive samples.

The invention also relates to a communication system for transmitting data frames between a transmitter and a receiver. The invention also relates to a mobile apparatus provided with a processing circuit for carrying out a method in accordance with the invention on a speech signal before the speech signal is heard by a user.

The invention relates to a method of processing a digital signal transmitted as a series of samples and partitioned into successive frames, a frame being formed by a set of successive samples.

The invention may be particularly relevant when applied to the processing of audio signals in mobile radiotelephony.

BACKGROUND OF THE INVENTION

A GSM audio signal transmitted from a transmitter to a receiver follows a conventional baseband transmission path in a transmission system, as shown in Fig.1. First, a block unit 1, comprising a microphone and an analog-to-digital converter, receives a speech signal Si and converts it into a digital speech signal Sd. The digital speech signal Sd is then successively encoded by a speech encoder 2 and a channel encoder 3. These two encoding steps compress, on average, the amount of data transmitted at any time. They therefore allow the overall bandwidth of the transmission system to be used more efficiently, so that several phone calls can be processed simultaneously. The channel encoder 3 encodes the signal coming from the speech encoder 2 for error detection and correction purposes and provides an encoded speech signal E. The encoded speech signal E is then conveyed to a channel decoder 4 for channel decoding and then to a correction unit 5, so that errors owing to coding and transmission can be partially removed from the signal coming from the channel decoder 4. A partially corrected signal comes from the correction unit 5 and is then decoded by a speech decoder 6. A decoded output signal So finally comes from the speech decoder 6 and is delivered to a block unit 7, comprising a digital-to-analog converter and a loudspeaker, which provides an audible analog signal to the user.

In a conventional baseband transmission path like the one shown in Fig.1, successive encoding and transmission of the signal may induce artifacts in the decoded output signal So. The signal So is transmitted as a series of signal samples and may be partitioned into successive frames composed of signal samples; the artifacts mentioned above may appear as unwanted, annoying high-level frames or as distorted frames.

The International application WO 98/38764 describes a device for frame-error detection, wherein a frame is defined as being abnormal when a determined logical combination of several different comparison criteria is satisfied. For each frame an energy value is derived on the basis of the energy of previous frames. A frame may be defined as abnormal when the energy of the frame is above a determined threshold. A frame may also be defined as abnormal when the comparison of the energy of the frame with the energy of a preceding frame exceeds a maximum value. In this document, the calculation of the energy of a frame uses parameters from the speech encoder. Moreover, the proposed method can only be applied to correct signals coming from a normative Full-Rate codec.

SUMMARY OF THE INVENTION

An object of the invention is to improve the quality of an audio signal after any successive encoding, transmission and decoding. To this end, a processing circuit as described in the introduction comprises calculation means for deriving a magnitude representative of the energy of a frame, estimation means for deriving an estimate of said magnitude representative of the energy of the frame on the basis of the magnitude representative of the energy of at least a previous frame and correction means for processing the samples of a frame when the difference between the magnitude representative of the energy of said frame and the corresponding estimate is greater than a predetermined threshold.

For each frame, the derived magnitude is representative of the energy of that frame and does not depend on the energy of any previous frame. This energy value may be referred to as a short-term energy. The estimate of the magnitude, on the other hand, takes into account the energy of at least one previous frame. For a given frame, these two values are compared, and the difference between the magnitude representative of the energy and the estimate of said magnitude shall not exceed a predetermined threshold. In other words, the magnitude representative of the energy of a frame shall not abnormally exceed the corresponding estimate. A frame whose magnitude is excessively greater than the corresponding estimate is considered an abnormal frame and is corrected by attenuating the level of its samples.

Unlike the device described in the prior art, a processing circuit proposed by the invention is not limited to the processing of signals issuing from a certain type of codec. An advantage of the invention is therefore that it allows the correction of a signal coming from any sort of speech codec, because the correction according to the invention is performed after speech decoding. Besides, in the prior art the energy is derived for frames of a fixed length, this length being given by the Full-Rate codec requirements: in the cited document, each frame is 20 ms long and is composed of 160 samples. According to the invention, the magnitude representative of the energy may be derived for frames of any length. Consequently, an advantage of the invention is that it allows a detection and correction of erroneous frames which may be adapted to the signal and, as a result, may be more precise than that of the device proposed in the prior art.

In an embodiment of the invention, said estimate is derived on the basis of at least one previous frame previously processed by the correction means when needed. The correction may use well-known techniques, such as attenuating a frame detected as erroneous or replacing it by an interpolation of other frames detected as valid.

In this embodiment, the frames used in the calculation of the estimate of the magnitude of the energy of a frame are either normal or previously corrected by the correction means. In the device proposed by the prior art, a given frame is detected as being abnormal on the basis of a comparison of a certain value of the energy of the frame with the energy of a single previous non-erroneous frame. However, this non-erroneous reference frame may in fact be an abnormal frame that was not detected as such by the device. This error may propagate, because many subsequent abnormal frames may then not be detected by a device according to the prior art. In a processing circuit according to the invention, the detection of abnormal frames takes into account the evolution of the energy over a duration of several previous frames, independently of the type of frame. Any user may therefore be protected against any sudden high-level sound which could be dangerous for the user's auditory system.

In a preferred embodiment of the invention, the predetermined threshold depends on the estimate of the magnitude representative of the energy of the frame.

In this preferred embodiment, it becomes possible to adapt the threshold as a function of the value of the estimate of the energy, which depends on the estimate of the energy of at least one previous frame. This allows a detection of abnormal frames which may be more precise than the one proposed in the prior art. The threshold may also be adapted with respect to external parameters, which may for example represent the context of the previous frames. Such external parameters may indicate, e.g., a quality of the received signal.

BRIEF DESCRIPTION OF THE DRAWINGS

The particular aspects of the invention will now be explained with reference to the embodiments described hereinafter and considered in connection with the accompanying drawings, in which:

Fig.1 is a block diagram of a conventional baseband transmission path of a communication system,

Fig.2 is a communication system in accordance with the invention,

Fig.3 is a block diagram of a receiver in accordance with the invention,

Fig.4 is a processing circuit in accordance with the invention,

Fig.5 is a flowchart depicting a processing method in accordance with the invention,

Fig.6 is a graphic showing a maximum allowed deviation according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

Fig.2 shows a possible embodiment of a communication system 100, according to the invention, for transmitting encoded audio data frames D between at least one transmitter 101 and at least one receiver 102 through a communication channel 103. In this embodiment the data D are, for example, audio data transmitted between a base station 101 of a mobile telephony system and a cellular phone 102. The audio data D are conveyed in an environment 103, which may be air. Depending on the type of the communication, which can be bidirectional, the receiver 102 may be the transmitter and the transmitter 101 may be the receiver. This situation occurs when, for example, the cellular phone 102 sends a message towards the base station 101. Besides, any encoder may be used for the encoding of the audio data D, such as, for example, an EFR (Enhanced Full-Rate) encoder, an HR (Half-Rate) encoder or an AMR (Adaptive Multi-Rate) encoder.

Fig.3 is a block diagram of a possible embodiment of a receiver 102 comprising a processing circuit 10 according to the invention. In this embodiment, the receiver 102 first comprises a demodulator DEM for demodulating a received encoded signal. For example, an encoded GSM signal is conveyed through the air at a frequency of 900 MHz. The demodulator receives this GSM signal and converts the signal into the baseband frequency range. The receiver 102 also comprises a channel decoder 4, a speech decoder 5 and a correction unit 6 such as described previously. The processing circuit 10 according to the invention is located after the speech decoder 5 and the correction unit 6 and processes the digital signal So coming from the correction unit 6. A corrected signal Scorr results therefrom. This corrected signal Scorr is applied to the digital-to-analog converter of the unit 7 and to the loudspeaker of the unit 7. As mentioned previously, the signal So may contain annoying artifacts. The processing circuit 10 improves the audio quality of the signal So, resulting in the corrected signal Scorr, cleared of the artifacts initially present in the signal So.

According to an alternative of the invention, the processing unit 10 may be controlled by external parameters delivered, e.g., by the channel decoder 4 or the speech decoder 5. These external parameters may be quality indicators for indicating a quality of the received signal. Such an indicator may be, e.g., of the type of the RX_QUAL parameter from the GSM (Global System for Mobile Communications) recommendation.

A preferred embodiment of a processing circuit in accordance with the invention is depicted in Fig.4. The processing circuit 10 comprises calculation means CAL which receives the digital decoded signal So, transmitted as a succession of samples x(k). There, the signal So is partitioned into frames Fn. A frame Fn is thus composed of a set of samples x(k) of the digital signal So, x(k) being the k-th sample of the signal So and Fn being the n-th frame of the signal So. In this preferred embodiment of the invention and in the following paragraphs, the frames Fn are consecutive, do not overlap each other and are of the same length L, where L is the number of samples x(k) per frame Fn. The calculation means CAL then derives for each frame Fn a magnitude Mn representative of the energy of the frame Fn. In an embodiment of the invention, this magnitude Mn is derived as the arithmetic mean of the absolute values of the samples x(k) of the frame Fn, as shown in Equation (1).

$$M_n = \frac{1}{L} \sum_{x(k) \in F_n} |x(k)| \qquad (1)$$

Other ways of deriving the magnitude Mn representative of the energy may be valid, and any other calculation of a mean of the samples x(k) of the frame Fn may be implemented, such as, for example, the quadratic mean of the samples x(k) of the frame Fn. This derived magnitude Mn may be considered the short-term energy of the frame Fn, since the value of the magnitude Mn does not depend on previous frames.
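The two ways of deriving the magnitude Mn named above can be illustrated by a minimal sketch. The function names and the sample values are assumptions made for illustration only; they do not appear in the description.

```python
import math

def mean_abs_magnitude(frame):
    """Arithmetic mean of the absolute sample values, as described for Equation (1)."""
    return sum(abs(x) for x in frame) / len(frame)

def quadratic_mean_magnitude(frame):
    """Quadratic mean of the samples, the alternative mentioned in the text."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Hypothetical frame of L = 4 samples x(k).
frame = [3, -4, 0, 5]
print(mean_abs_magnitude(frame))        # 3.0
print(quadratic_mean_magnitude(frame))  # square root of 12.5
```

Both functions use only the samples of the frame passed to them, which is why the result may be regarded as a short-term energy.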

The processing circuit 10 also comprises estimation means EST for deriving an estimate En of the magnitude Mn of the energy of the frame Fn from the estimate of the magnitude representative of the energy of at least one previous frame. In an embodiment of the invention, any previous frame used in the calculation of the estimate En has been previously corrected by the processing unit 10. A simple way of deriving the estimate En is to use Equation (2) as follows:

$$E_n = (1 - \alpha) M_n + \alpha E_{n-1} \qquad (2)$$

with α being positive and lower than 1 and En-1 being the estimate of the energy of the previous frame Fn-1. The obtained estimate En of the energy of the frame Fn is then stored in a memory unit MEM. When using Equation (2), it must be noted that only the first few previous frames have a strong influence on the value of the estimate En of the magnitude Mn. Indeed, in Equation (2) the magnitude Mn-j representative of the energy of the frame Fn-j, which is the j-th previous frame of the frame Fn, is multiplied by a coefficient α^j(1-α), with α lower than 1, and therefore becomes negligible in the calculation of the estimate En as j becomes greater. Thus, in an embodiment of the invention, this estimate En may be obtained by low-pass filtering the magnitude Mn representative of the energy of the frame Fn.
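As a minimal sketch, the recursion of Equation (2) can be compared with the explicit weighted sum it implies, in which the magnitude of the j-th previous frame carries the weight α^j(1-α). The value α = 0.9, the initial estimate E0 = 0 and the magnitudes used below are assumptions chosen for illustration; the description does not specify them.

```python
def update_estimate(m_n, e_prev, alpha=0.9):
    """One step of Equation (2): E_n = (1 - alpha) * M_n + alpha * E_{n-1}."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be positive and lower than 1")
    return (1.0 - alpha) * m_n + alpha * e_prev

def unrolled_estimate(magnitudes, e0, alpha=0.9):
    """The same estimate written as an explicit weighted sum over the frames."""
    n = len(magnitudes)
    total = (alpha ** n) * e0
    for j in range(n):
        # magnitudes[n - 1 - j] belongs to the j-th previous frame (j = 0 is the current one).
        total += (alpha ** j) * (1.0 - alpha) * magnitudes[n - 1 - j]
    return total

magnitudes = [100.0, 120.0, 90.0, 110.0]   # hypothetical M_1 .. M_4
estimate = 0.0                             # assumed initial estimate E_0
for m in magnitudes:
    estimate = update_estimate(m, estimate)

# Both forms agree up to rounding.
assert abs(estimate - unrolled_estimate(magnitudes, 0.0)) < 1e-9
```

With α = 0.9 the weight of a frame ten frames back is already below 4 % of the total, which is the sense in which older frames become negligible.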

For a given frame Fn, the derived magnitude Mn and the estimate En are transmitted to a comparison unit COMP. The comparison unit COMP computes a deviation Dn of the magnitude Mn representative of the energy of the frame Fn from the estimate En of this magnitude Mn. Dn is the difference between the derived magnitude Mn and the estimate En, as shown in Equation (3).

$$D_n = M_n - E_n \qquad (3)$$

This deviation Dn is then compared with a threshold Tn, determined in a threshold estimator TD.

For the purpose of calculating the threshold Tn, the estimate En is also transmitted to the unit TD. The threshold Tn is the maximum allowed deviation Dn of the magnitude Mn representative of the energy of the frame Fn from its corresponding estimate En. This threshold Tn may be fixed to a given value for all possible values of the magnitude Mn of the frames Fn of the signal So. However, the value of the threshold Tn may also be a function of the estimate En of the magnitude Mn. A possible function is depicted in Fig.5. In this example, Tn is the minimum of a first, increasing function f1 and a second, decreasing function f2, both being functions of the estimate En.
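A threshold of this shape, the minimum of an increasing and a decreasing function of the estimate En, may be sketched as follows. The description gives no formulas for the two functions; the linear forms, all numeric constants and the clamping at zero are assumptions made for illustration only.

```python
def threshold(e_n, offset_up=200.0, slope_up=0.5, ceiling=12000.0, slope_down=0.5):
    """T_n = min(f1(E_n), f2(E_n)) with hypothetical linear f1 and f2."""
    f1 = offset_up + slope_up * e_n       # increasing in E_n
    f2 = ceiling - slope_down * e_n       # decreasing in E_n
    # Assumption: the allowed deviation is never negative.
    return max(0.0, min(f1, f2))

for e_n in (0.0, 11800.0, 20000.0):
    print(e_n, threshold(e_n))
```

With these constants the threshold is small for near-silent frames, largest around En = 11800 where the two lines cross, and small again for frames of high energy.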

An allowed deviation Dn for a given value of the estimate En has to be situated in the upper-right, positive quadrant of the graph of Fig.5, but cannot take the values which are situated in a part denoted A. The upper limit of the part A is defined, for a given value of the estimate En, by the minimum of the values f1(En) and f2(En). In this example, Tn is chosen low both when the estimate En is low and when it is high. Indeed, for frames Fn of low energy, such as almost silent frames, and for frames of high energy, the magnitude Mn shall not exceed the corresponding estimate En by far; otherwise an annoying and even dangerous sudden noise would result for a listener.

In another embodiment of the invention, the threshold Tn may be a function of both the estimate En and the standard deviation of the magnitude representative of the energy. The possible values of the threshold Tn may be retrieved from a two-dimensional look-up table. This table may contain the possible values of the threshold Tn for different values of the estimate En and of the standard deviation.

When, for a frame Fn, the deviation of the magnitude Mn from the corresponding estimate is greater than the maximum allowed deviation Tn, the frame is considered abnormal and corrupted with artifacts. Any abnormal frame Fn is corrected in a correcting unit CORR. An abnormal frame Fn is corrected by multiplying the samples x(k) of the frame Fn by a positive coefficient strictly lower than 1. This results in a corrected frame Fcorr, whose samples x'(k) are attenuated in comparison with the samples x(k) of the original abnormal frame Fn. When the samples of an abnormal frame are multiplied by a zero coefficient, the frame Fn is totally muted.
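The table-based threshold and the attenuation of an abnormal frame described above can be sketched together. The bin edges, the table values and the default coefficient are hypothetical; the description states only that such a table may be indexed by the estimate En and the standard deviation, and that the samples are multiplied by a coefficient lower than 1.

```python
import bisect

# Hypothetical bin edges and threshold values; the text gives no numbers.
ESTIMATE_EDGES = [500.0, 5000.0]     # split E_n into low / medium / high
STD_EDGES = [100.0, 1000.0]          # split the standard deviation likewise
THRESHOLD_TABLE = [
    [200.0, 400.0, 600.0],           # low E_n: small allowed deviation
    [1000.0, 2000.0, 3000.0],        # medium E_n: larger allowed deviation
    [300.0, 600.0, 900.0],           # high E_n: small allowed deviation again
]

def lookup_threshold(e_n, std):
    """Read T_n from the two-dimensional table indexed by E_n and the standard deviation."""
    row = bisect.bisect_right(ESTIMATE_EDGES, e_n)
    col = bisect.bisect_right(STD_EDGES, std)
    return THRESHOLD_TABLE[row][col]

def attenuate(frame, coefficient=0.25):
    """Multiply every sample by a coefficient in [0, 1); a coefficient of 0 mutes the frame."""
    if not 0.0 <= coefficient < 1.0:
        raise ValueError("coefficient must be in [0, 1)")
    return [coefficient * x for x in frame]

print(lookup_threshold(1000.0, 500.0))   # 2000.0
print(attenuate([8, -4]))                # [2.0, -1.0]
```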

When, for a frame Fn, the deviation of the magnitude Mn from the corresponding estimate En is lower than the maximum allowed deviation Tn, the frame is considered normal and may be further transmitted without modification. This case includes the situation where the magnitude Mn is lower than the estimate En which means that the deviation Dn is negative. In this situation there is no need for correcting the frame Fn.
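The decision between a normal and an abnormal frame reduces to one comparison on the deviation of Equation (3). In the sketch below the function name and the numeric values are assumptions; the case where the deviation exactly equals the threshold is not addressed by the description and is treated here as normal.

```python
def is_abnormal(m_n, e_n, t_n):
    """Return True when D_n = M_n - E_n is greater than the allowed deviation T_n."""
    deviation = m_n - e_n
    return deviation > t_n

# Hypothetical values: estimate 1000, allowed deviation 700.
print(is_abnormal(5000.0, 1000.0, 700.0))  # True: the magnitude exceeds the estimate by far
print(is_abnormal(1200.0, 1000.0, 700.0))  # False: deviation within the allowed range
print(is_abnormal(300.0, 1000.0, 700.0))   # False: negative deviation, no correction needed
```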

The corrected frames Fcorr and the normal frames Fn now form the corrected signal Scorr, which is transmitted to a digital-to-analog converter to be converted back to an analog signal sent to a loudspeaker, for example. In this embodiment of the invention, any corrected frame Fcorr, being the corrected version of an initial frame Fn, is also transmitted to the estimation means EST for derivation of the estimate of the magnitude of the energy of the frame Fcorr, resulting in a corrected estimate. This corrected estimate is stored in the memory unit MEM and replaces the previous estimate En derived for the initial abnormal frame Fn. This corrected estimate may be used in Equation (2) in the calculation of the estimate of a subsequent frame. Thus, in this embodiment of the invention, any calculation of an estimate based on Equation (2) is performed on the basis of estimate values corresponding to normal or corrected frames.

This embodiment of a processing circuit according to the invention is by no means a limitation of the invention. It is also within the scope of the invention to consider the processing unit 10 of Fig.4 as a DSP unit itself which, in fact, would consist of the calculation means CAL, the estimation means EST, the memory unit MEM, the block unit TD, the comparison unit COMP and the correction means all together. The memory unit MEM could also be external or internal to the DSP.

Fig.6 is a flowchart depicting a processing method according to the invention. In this flowchart, each block represents a method step. A first step, not represented here, consists of partitioning the digital signal So into the successive frames Fn. Then, in a step 20, the magnitude Mn of the frame Fn is derived as performed in the calculation means CAL of Fig.4. In a step 30, the estimate En of the magnitude Mn is derived as performed in the estimation means EST of Fig.4. In a step 40, the threshold Tn is derived as performed in the unit TD of Fig.4. Then, in a step 50, the deviation Dn is derived and compared to the threshold Tn. Depending on this comparison, the frame Fn may be detected as abnormal and be corrected in a step 60. When the frame Fn is normal, the frame is not modified and is transmitted as such. When Dn is greater than the threshold Tn, the frame Fn is abnormal and is corrected into a frame Fcorr. In an embodiment of the invention, the estimation performed in step 30 is recalculated for the corrected frame Fcorr.
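The sequence of steps 20 to 60 may be sketched end to end as follows. This is an illustration under stated assumptions rather than the claimed implementation: the fixed threshold of 1000, the coefficient 0.25, α = 0.9, the seeding of the first estimate with the first magnitude and the unchanged pass-through of a trailing partial frame are all choices made here and not taken from the description.

```python
def mean_abs(frame):
    """Magnitude M_n as the mean of the absolute sample values."""
    return sum(abs(x) for x in frame) / len(frame)

def process_signal(samples, frame_len=160, alpha=0.9, coefficient=0.25,
                   threshold_fn=lambda e_n: 1000.0):
    """Run steps 20 to 60 on non-overlapping frames and return the corrected samples."""
    out = []
    prev_estimate = None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = list(samples[start:start + frame_len])
        m_n = mean_abs(frame)                                   # step 20
        # Assumption: the first frame seeds the estimate with its own magnitude.
        e_prev = m_n if prev_estimate is None else prev_estimate
        e_n = (1.0 - alpha) * m_n + alpha * e_prev              # step 30
        t_n = threshold_fn(e_n)                                 # step 40
        if m_n - e_n > t_n:                                     # step 50
            frame = [coefficient * x for x in frame]            # step 60
            # Step 30 is repeated for the corrected frame, so that later
            # estimates rely on normal or corrected frames only.
            e_n = (1.0 - alpha) * mean_abs(frame) + alpha * e_prev
        out.extend(frame)
        prev_estimate = e_n
    out.extend(samples[len(out):])   # trailing partial frame passes through unchanged
    return out

quiet = [100, -100, 100, -100] * 10
loud = [20000, -20000, 20000, -20000]
corrected = process_signal(quiet + loud + quiet, frame_len=4)
print(corrected[40:44])   # the loud frame, attenuated by the coefficient
```

In this example only the single loud frame is attenuated; the quiet frames before and after it are returned unchanged.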

It is also within the scope of the invention that the length L of the frames is not necessarily fixed and may vary during the processing of the signal So. For example, a short length L, such as a length of 40 samples, may be chosen when a precise correction is required and when the signal fluctuates a lot. A long length L may be chosen, however, when the signal remains within a small range. Each time the length L of the frames is modified, the process is preferably reinitialized. Furthermore, the frames may overlap each other when a fine detection is required. It must be noted that in this text, the word "comprising" does not exclude the presence of elements or steps other than those listed in a claim.
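The choice of frame length and the optional overlap of frames mentioned above can be illustrated by a small partitioning sketch. The function name and the `hop` parameter (the distance in samples between the starts of two consecutive frames) are assumptions introduced for illustration.

```python
def partition(samples, frame_len, hop=None):
    """Split samples into frames of frame_len samples; hop < frame_len gives overlapping frames."""
    if hop is None:
        hop = frame_len               # consecutive, non-overlapping frames
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

signal = list(range(8))
print(partition(signal, 4))           # two non-overlapping frames
print(partition(signal, 4, hop=2))    # three frames, each overlapping its neighbour by half
```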

Claims

CLAIMS:

1. A processing circuit for processing a digital audio signal transmitted as a series of samples and partitioned into successive frames, a frame consisting of a set of successive samples characterized in that it comprises calculation means for deriving a magnitude representative of the energy of a frame, estimation means for deriving an estimate of said magnitude representative of the energy of the frame from the magnitude representative of the energy of at least a previous frame and correction means for processing the samples of a frame when the difference between the magnitude of the energy of said frame and the corresponding estimate is greater than a predetermined threshold.

2. A processing circuit as claimed in claim 1, characterized in that the frames are of the same length.

3. A processing circuit as claimed in claim 1 or 2, characterized in that said estimate is derived from at least a previous frame previously processed by the correction means when needed.

4. A processing circuit as claimed in any one of the claims 1 to 3, characterized in that the estimate of the energy of a frame is obtained by low-pass filtering the energy of the frame.

5. A processing circuit as claimed in any one of the claims 1 to 4, characterized in that the predetermined threshold depends on the estimate of the magnitude representative of the energy of said frame.

6. A processing circuit as claimed in any one of the claims 1 to 5, characterized in that the predetermined threshold is lowered for low and high values of the estimate of the magnitude representative of the energy of a frame.

7. A receiver for receiving a digital signal transmitted as a series of samples and partitioned into successive frames, a frame being formed by a set of successive samples characterized in that it comprises a processing circuit as claimed in any one of the claims 1 to 6.

8. A communication system for transmitting data frames between a transmitter and a receiver, the receiver comprising a processing circuit as claimed in any one of the claims 1 to 6.

9. A mobile apparatus comprising a speech decoder for providing a decoded speech signal, characterized in that the mobile apparatus further comprises a processing circuit as claimed in any one of the claims 1 to 6 for processing the decoded speech signal.

10. A method of processing a digital signal transmitted as a series of samples and partitioned into successive frames, a frame consisting of a set of successive samples, characterized in that the method comprises, for a frame, the steps of: deriving a magnitude representative of the energy of the frame, deriving an estimate of said magnitude from the magnitude representative of the energy of at least a previous frame, attenuating the values of the samples of said frame when the difference between the magnitude representing the energy and the corresponding estimate is greater than a predetermined threshold.