EP1145227A1

EP1145227A1 - Method and device for error concealment in an encoded audio-signal and method and device for decoding an encoded audio signal

Info

Publication number: EP1145227A1
Application number: EP00926896A
Authority: EP
Inventors: Pierre Lauber; Martin Dietz; Jürgen HERRE; Reinhold BÖHM; Ralph Sperschneider; Daniel Homm
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 1999-05-07
Filing date: 2000-04-12
Publication date: 2001-10-17
Anticipated expiration: 2020-04-12
Also published as: JP3623449B2; EP1145227B1; ATE221244T1; DE19921122C1; US7003448B1; DE50000306D1; WO2000068934A1; JP2002544550A

Abstract

In a method for concealing an error in an encoded audio signal, a set of spectral coefficients is subdivided into at least two sub-bands (14), whereupon the sub-bands are subjected to a reverse transform (16). A specific prediction is performed (18) for each quasi-time signal of a sub-band to obtain an estimated temporal representation for a sub-band of a set of spectral coefficients following the current set. A forward transform (20) of the time signal of each sub-band provides estimated spectral coefficients which can be used (28) instead of erroneous spectral coefficients of a following set of spectral coefficients, e.g. in order to conceal transmission errors. Transforming at the sub-band level provides independence from transform characteristics such as block length, window type and MDCT algorithm while at the same time preserving spectral processing for error concealment. Thus the spectral characteristics of audio signals can also be taken into account during error concealment.

Description

Method and device for concealing an error in a coded audio signal and method and device for decoding a coded audio signal


The present invention relates to the coding or decoding of audio signals and in particular to the concealment of errors ("error concealment") in digitally coded audio signals.

With the increasing spread of modern audio encoders and corresponding audio decoders that operate according to one of the MPEG standards, the transmission of coded audio signals via radio networks or via wired networks, such as the Internet, has already become very important. When coded audio signals are transmitted by digital broadcasting, and likewise when they are transmitted via wired networks, the transmission channel is not ideal, so coded audio signals can be disturbed during transmission. The decoder therefore faces the task of dealing with transmission errors and of "concealing" them. Error concealment manipulates transmission errors in some way in order to improve the subjective auditory impression of a decoded audio signal affected by such errors.

Several error concealment methods are already known. The simplest type of error concealment is muting. If a decoder detects that data is missing or incorrect, it switches the playback off. The missing data are thus replaced by a zero signal. This prevents the decoder from outputting excessively loud or unpleasant noises because of a transmission error. Due to psychoacoustic effects, however, the sudden drop in signal energy, and the sudden rise when the decoder again outputs error-free data, are perceived as unpleasant.

Another known method that avoids the sudden drop and rise of signal energy is data repetition. If, for example, a block or several blocks of audio data fail, some of the most recently received data is repeated in a loop until error-free, i.e. intact, audio data is available again. However, this method leads to annoying artifacts. If only short parts of the audio signal are repeated, the repeated signal sounds machine-like, with a fundamental frequency equal to the repetition frequency, regardless of the original signal. If longer parts are repeated, certain echo effects arise, which are also perceived as annoying.
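
The two known methods described above can be sketched as follows. This is only a minimal illustration: the function names and the representation of the audio data as a list of blocks, each a list of samples, are assumptions made here and are not taken from any standard.

```python
def conceal_by_muting(blocks, lost):
    """Replace every lost block with a zero signal of the same length."""
    return [[0.0] * len(b) if i in lost else list(b)
            for i, b in enumerate(blocks)]


def conceal_by_repetition(blocks, lost):
    """Replace every lost block with the block that was output last."""
    out = []
    for i, b in enumerate(blocks):
        if i in lost and out:
            out.append(list(out[-1]))       # repeat the last output block
        elif i in lost:
            out.append([0.0] * len(b))      # nothing to repeat yet: mute
        else:
            out.append(list(b))
    return out


# Hypothetical data: three blocks of two samples, the middle block is lost.
blocks = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
muted = conceal_by_muting(blocks, lost={1})
repeated = conceal_by_repetition(blocks, lost={1})
```

Muting produces the abrupt energy drop mentioned above, while repetition keeps the energy but loops old material, which is the source of the machine-like or echo-like artifacts.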

In the case of block-oriented transformation encoders/decoders, in which a spectral representation of a temporal audio signal is used, it would also be possible to perform a spectral-value-based prediction when audio data is faulty. If it is determined that spectral values in a block are defective, these spectral values can be predicted, i.e. estimated, from the spectral values of a preceding block or of several preceding blocks. The predicted spectral values correspond to the faulty spectral values within certain limits if the audio signal is relatively stationary, i.e. if the signal envelope of the audio signal does not change too rapidly. If, for example, a method operating according to the MPEG-AAC standard (ISO/IEC 13818-7 MPEG-2 Advanced Audio Coding) is considered, a normal block of coded audio data has 1024 spectral values. Spectral-value-based prediction therefore requires 1024 predictors working in parallel in the decoder in order to be able to predict all spectral values, for example in the event of a complete block failure ("frame loss").

A disadvantage of this method is the relatively high computing effort, which currently makes real-time decoding of a received multimedia or audio data signal impossible.

Another major disadvantage of this method is caused by the transformation algorithm used, the modified discrete cosine transform (MDCT). It is well known that the MDCT algorithm does not provide an ideal Fourier spectrum, but rather a "spectrum" that differs from an ideal Fourier spectrum. Studies have shown that, for example, a sine time function, whose Fourier spectrum has a single spectral line at the frequency of the sine function, has an MDCT "spectrum" with a dominant spectral coefficient at the frequency of the sine function, but also with additional spectral coefficients at other frequency values. In addition, the level of the MDCT "spectrum" of a sine function is not constant, but varies from block to block. Furthermore, the MDCT is not strictly energy conserving. It can thus be stated that the MDCT works exactly in conjunction with an inverse MDCT, but that the MDCT spectrum differs significantly from a Fourier spectrum. A spectral-value-based prediction of MDCT spectral coefficients has therefore proven inadequate when high quality is required.
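
The behaviour of the MDCT for a sine function can be reproduced with a small numerical experiment. The following sketch assumes a naive textbook MDCT with a sine window, a block of 2N = 32 samples and a sine whose frequency lies at the centre of coefficient 4. These parameters are chosen here purely for illustration and are not the AAC parameters.

```python
import math


def mdct(block):
    """Naive O(N^2) MDCT of a 2N-sample block using a sine window.

    Returns N coefficients.
    """
    n_coeffs = len(block) // 2
    out = []
    for k in range(n_coeffs):
        acc = 0.0
        for n, x in enumerate(block):
            window = math.sin(math.pi * (n + 0.5) / (2 * n_coeffs))
            acc += x * window * math.cos(
                math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
        out.append(acc)
    return out


N = 16                               # coefficients per block (assumed)
K0 = 4                               # coefficient whose centre frequency is used
omega = math.pi / N * (K0 + 0.5)     # radians per sample
signal = [math.sin(omega * n) for n in range(3 * N)]

# Two successive blocks with 50% overlap.
block0 = mdct(signal[0:2 * N])
block1 = mdct(signal[N:3 * N])
```

Comparing `block0[K0]` with `block1[K0]` shows that the coefficient at the sine frequency changes from block to block although the input is stationary, and `block1[K0 + 1]` shows a clearly non-zero neighbouring coefficient. A predictor that operates on individual MDCT coefficients has to cope with exactly this fluctuation.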

Another disadvantage of spectral-value-based prediction, in particular in connection with modern audio coding methods, is that modern audio coding methods use different window lengths and window shapes. In order to prevent the quantization noise introduced by quantizing the MDCT spectral coefficients from "smearing" over a long block when the audio signal to be coded changes rapidly, i.e. in the case of transients or attacks, and thus to prevent so-called pre-echoes, modern transformation coders use short windows for transient audio signals, i.e. audio signals with attacks, in order to increase the temporal resolution at the expense of the frequency resolution. As a consequence, however, a spectral-value-based prediction must take into account both the window length and the window shape (there are also transition windows that lead from windowing with short blocks to long blocks and vice versa), which further complicates spectral-value-based prediction and would noticeably impair computing efficiency.

DE 40 34 017 A1 relates to a method for detecting errors in the transmission of frequency-coded digital signals. In this case, an error function is formed from frequency coefficients of past and possibly future blocks, on the basis of which the occurrence of an error is determined. An incorrect frequency coefficient is no longer used for the evaluation of subsequent blocks.

DE 197 35 675 A1 discloses a method for concealing errors in an audio data stream. For this, the spectral energy of a subgroup of intact audio data is calculated. After forming a template for replacement data based on the spectral energy calculated for the subgroup of the intact audio data, replacement data for faulty or non-existent audio data corresponding to the subgroup are generated on the basis of the template.

The object of the present invention is to provide a precise and flexible error concealment for audio signals, which can be implemented with limited computing effort.

This object is achieved by a method for concealing an error according to claim 1 and by a device for concealing an error according to claim 12.

Another object of the present invention is to provide error-proof and flexible decoding of audio signals.

This object is achieved by a method for decoding a coded audio signal according to claim 10 and by a device for decoding a coded audio signal according to claim 13.

The present invention is based on the finding that the disadvantages of spectral-value-based prediction, namely the dependency on the transformation algorithm used and on window shape and block length, can be avoided by using, for error concealment, a prediction that operates in the "quasi" time domain. For this purpose, a set of spectral values, which preferably corresponds to one long block or to a number of short blocks, is divided into subbands. A subband of the current set of spectral coefficients can then be transformed back in order to obtain a time signal that corresponds to the spectral coefficients of the subband. In order to generate estimates for a subsequent set of spectral coefficients, a prediction is carried out on the basis of the time signal of this subband.

It should be noted that this prediction takes place in the quasi-time domain, since the time signal on which the prediction is based is only the time signal of a subband of the coded audio signal and not the time signal of the entire spectrum of the audio signal. The time signal generated by prediction is subjected to a forward transformation in order to obtain estimated, i.e. predicted, spectral coefficients for the subband of the following set of spectral coefficients. If it is now found that one or more faulty spectral coefficients are present in the following set of spectral coefficients, the faulty spectral coefficients can be replaced by the estimated, i.e. predicted, spectral coefficients.

In contrast to pure spectral value-based prediction, the method according to the invention for concealing errors requires less computation effort, since, because of the grouping of spectral coefficients, predictions only have to be carried out for each subband and no longer for each spectral coefficient. In addition, the method according to the invention provides a high degree of flexibility, since the properties of the signals to be processed can be taken into account.

The error concealment according to the present invention works particularly well for tonal signals. However, it has been found that tonal signal components tend to occur in the lower frequency range of the spectrum of an audio signal, while the higher-frequency signal components tend to be non-stationary, i.e. noisy. "Noisy signal components" in the sense of the present description are signal components that are not very stationary. These noisy signal components do not necessarily have to represent noise in the classic sense; they may simply be rapidly changing useful signals.

In order to further reduce the computing effort, the present invention therefore makes it possible to subject only lower-frequency signal components to a prediction, while higher-frequency signal components are not processed at all. In other words, it is possible to subject only the lower subband(s) to a backward transformation, a prediction and a forward transformation.

Compared to a complete transformation of the entire audio signal into the time domain and a prediction of the entire temporal audio signal from block to block using a so-called "long-term" predictor, this feature of the present invention represents a significant advantage, since according to the invention the advantages of prediction in the time domain are combined with the advantages of spectral decomposition. Only the spectral decomposition makes it possible to take into account frequency-dependent properties of the audio signal. The number of subbands generated when dividing the set of spectral coefficients can be selected as desired. If only two subbands are selected, there is already the advantage that the tonality in the lower frequency range of the audio signal can be taken into account. If, on the other hand, a very large number of subbands is selected, the predictor in the quasi-time domain will have a relatively short length, such that its delay does not become too great. Since the individual subbands are preferably processed in parallel, many parallel predictor circuits would be necessary in an embodiment of the present invention that uses a hard-wired integrated circuit.

If the present invention is used in connection with a transformation encoder which uses different block lengths, there is the advantage that the predictor itself is independent of block length ("frame length") and window shape ("window shape"). In addition, the backward transformation eliminates the dependency on the transformation algorithm itself, which was explained above with regard to the MDCT. Furthermore, the concept according to the invention for error concealment provides estimated spectral coefficients which are in phase because of the backward transformation, the prediction in the time domain and the forward transformation, i.e. a predicted spectral coefficient causes no phase jumps in the time signal relative to the time signal of a preceding intact set of spectral coefficients. Tonal signals can thus be substituted so well for incorrect or missing signal components that an ordinary listener will, in the vast majority of cases, not even notice that an error has occurred.

Finally, the method according to the invention is particularly suitable for combination with the error concealment technique described in DE 197 35 675 A1, which is suitable for the substitution of noisy signal components. If tonal signal components of a missing block are concealed by the method according to the invention, and noisy signal components are concealed by the aforementioned known method, which is based on an energy similarity between substituted data and intact data, completely failed blocks can be concealed such that they are almost inaudible to a normal listener.

Preferred embodiments of the present invention are explained in detail below with reference to the accompanying drawings, in which:

Fig. 1 shows a decoder which has an error concealment device according to the invention;

Fig. 2 is a more detailed block diagram of the error concealment device of Fig. 1;

Fig. 3 shows a more detailed block diagram of the error concealment device from Fig. 1, which also has noise substitution and operates based on the prediction gain;

Fig. 4 shows a flow diagram for the method according to the invention for error concealment;

Fig. 5 is a detailed block diagram of a preferred embodiment of the error concealer for an MPEG-2 AAC decoder;

Fig. 6 is a detailed block diagram of the predictor of Fig. 5; and

Fig. 7 is a schematic representation of the block structure according to the AAC standard.

Fig. 1 shows a block diagram of a decoder according to a preferred embodiment of the present invention. The decoder block diagram shown in Fig. 1 basically corresponds to the MPEG-2 AAC decoder as defined in the MPEG-2 AAC 13818-7 standard. The encoded audio signal first goes into a bitstream demultiplexer 100, which separates spectral data and side information. The Huffman-coded spectral coefficients are then fed into a Huffman decoder 200 in order to obtain quantized spectral values from the Huffman code words. The quantized spectral values are then fed into an inverse quantizer 300 and subsequently multiplied, scale factor band by scale factor band, by the corresponding scale factors. Following the inverse quantizer 300, the decoder according to the invention can have several additional functionalities, such as a middle/side stage, a prediction stage, a TNS stage etc., as defined in the standard.

According to a preferred embodiment of the present invention, the decoder comprises, immediately before a synthesis filter bank 400, an error concealer 500 which operates according to the invention and ensures that the effects of transmission errors in the encoded audio signal fed into the bitstream demultiplexer 100 are alleviated or made completely inaudible. In other words, the error concealer 500 causes transmission errors to be concealed, i.e. they are not audible, or only slightly audible, in the temporal audio signal at the output of the synthesis filter bank.

Fig. 2 shows a general block diagram of the error concealer 500. It includes a backward transformation device 502, a device 504 for generating estimated values and a forward transformation device 506. Both the backward transformation device 502 and the forward transformation device 506 can be controlled via a block type line 508, depending on the block type that is currently present. The error concealment device 500 further comprises a parallel branch that routes the input spectral coefficients directly from the input to the output, bypassing the backward transformation device 502, the device 504 for generating estimated values and the forward transformation device 506. This parallel branch comprises a time delay stage 510, which ensures that the estimated spectral coefficients for a following block at the output of the forward transformation device 506 are present at an error replacement device 512 at the same time as the "real", possibly faulty spectral coefficients for the following block, so that faulty real spectral coefficients for the following block can be replaced by estimated spectral coefficients for the following block. This spectral-value-based replacement is represented by a switch symbol 512 in Fig. 2. It should be noted that the error replacement device 512 can operate either spectral value by spectral value or block by block. Depending on the requirement, it can also operate subband by subband. The following set of spectral coefficients is then available at the output of the error replacement device 512, with the originally faulty spectral coefficients replaced by estimated spectral coefficients, i.e. with the errors concealed.
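
The three granularities of the error replacement device 512 can be illustrated by a short sketch. The function names and the use of per-coefficient Boolean error flags are assumptions made for this illustration; how the flags are obtained is left open here.

```python
def replace_faulty(received, estimated, faulty):
    """Pick the estimated value wherever a coefficient is flagged as faulty."""
    return [e if bad else r for r, e, bad in zip(received, estimated, faulty)]


def flags_from_subbands(faulty_subbands, num_subbands, subband_size):
    """Expand a set of faulty subband indices into one flag per coefficient."""
    return [b in faulty_subbands
            for b in range(num_subbands)
            for _ in range(subband_size)]


# Hypothetical set of four coefficients in two subbands; 99.0 is corrupted.
received = [1.0, 2.0, 99.0, 4.0]
estimated = [1.1, 2.1, 3.1, 4.1]

by_value = replace_faulty(received, estimated, [False, False, True, False])
by_subband = replace_faulty(received, estimated, flags_from_subbands({1}, 2, 2))
by_block = replace_faulty(received, estimated, [True] * len(received))
```

In all three cases the same switch is used; only the resolution of the error flags differs.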

At this point, it should be noted that the block diagram shown in Fig. 2 represents only part of the error concealment device 500. This representation was chosen for reasons of clarity. As will be explained in more detail with reference to Fig. 5 using a preferred exemplary embodiment of the present invention, the circuit shown in Fig. 2 is preceded by a device for subdivision into subbands. Analogously, the error replacement device 512 is followed by a device for undoing the division into subbands, such that the filter bank 400 (Fig. 1) obtains a "normal" set of spectral coefficients without noticing anything of the preceding error concealment. The error concealer 500 (Fig. 1) thus comprises a plurality of the circuits described with reference to Fig. 2, one circuit for each subband. The parallel circuits are connected on the input side by the device for dividing and on the output side by the device for undoing the division, as will be explained in detail later.

It has already been pointed out that modern transformation coders use short windows to increase the temporal resolution in the case of transients in an audio signal to be coded. It is common for the number of temporal samples or the number of spectral coefficients in a long window or block to be an integer multiple of the number of temporal samples or spectral coefficients in a short window or block. An advantageous effect of the present invention is that the device 504 for generating estimated values can work independently of the transformation used, the block length used or the window type used. For this reason, both the backward transformation device 502 and the forward transformation device 506 are controlled in a block-type-dependent manner, so that the same number of temporal samples is always supplied to, and taken from, the device 504 for generating estimated values.

To further illustrate this property, reference is made below to Fig. 7, which illustrates the situation for MPEG-2 AAC. Fig. 7 comprises a time axis 700, along which the extent of a long block 702 is shown. A long block contains 2048 samples, resulting in 1024 spectral coefficients when a 50% window overlap is used, as is known. Background information on the modified discrete cosine transform (MDCT) used and on the window overlap can be found in the standard already cited. Fig. 7 also shows eight short blocks 704, each of which has 256 samples and, because of the 50% overlap, yields 128 spectral coefficients. For reasons of clarity, the overlap of the short blocks and the overlap of the long block with a preceding long block or with a preceding or subsequent start or stop window are not shown in Fig. 7. In any event, it can be seen from Fig. 7 that the number of spectral coefficients of a long block is eight times the number of spectral coefficients of a short block. In other words, a long block covers the same duration of the audio signal as eight short blocks.

As shown in Fig. 2, the backward transformation device 502 is controlled via the block type line 508 in such a way that, in the case of short blocks, it carries out eight successive backward transformations of the spectral coefficients in corresponding subbands of the short blocks and simply strings the resulting quasi-time signals together serially, in order to supply the device 504 for generating estimated values with a time signal of a certain length. Analogously, the forward transformation device 506 carries out eight successive forward transformations on the values that are serially output by the device 504 for generating estimated values. This "work cycle" thus means that the same number of spectral coefficients is output in the case of short blocks as in the case of long blocks. The spectral coefficients that are output by the error concealment device 500 in one "work cycle" are referred to, in the sense of the present invention, as a set of estimated spectral coefficients. For practical reasons, the number of spectral coefficients in a set corresponds to the number of spectral coefficients in a long block and to the number of spectral coefficients of eight short blocks. Of course, any other ratio between long and short blocks can be used, for example 2, 4 or 16. Usually the number of spectral coefficients of a long block will be divisible by the number of spectral coefficients of a short block. Should this not be the case for some reason, however, the number of spectral coefficients in a set would correspond to the least common multiple of the long and short block sizes, such that independence from the block type is achieved at the predictor level, i.e. in the device 504 for generating estimated values.
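
The relationship between window length, number of spectral coefficients and set size can be checked with a few lines of arithmetic. The helper names below are assumptions made for this illustration, and the pair 960/128 is a purely hypothetical example of block sizes that do not divide evenly.

```python
import math


def coefficients_per_block(window_samples):
    """With 50% window overlap, a window of 2N samples yields N coefficients."""
    return window_samples // 2


def set_size(long_coeffs, short_coeffs):
    """Least common multiple of the long and short block sizes."""
    return long_coeffs * short_coeffs // math.gcd(long_coeffs, short_coeffs)


long_coeffs = coefficients_per_block(2048)     # long block of Fig. 7
short_coeffs = coefficients_per_block(256)     # short block of Fig. 7

aac_set = set_size(long_coeffs, short_coeffs)
transforms_long = aac_set // long_coeffs       # transformations per work cycle
transforms_short = aac_set // short_coeffs

# Hypothetical sizes where the long block is not a multiple of the short block.
odd_set = set_size(960, 128)
```

For the block sizes of Fig. 7 the set is exactly one long block or eight short blocks, so the predictor always sees the same number of values per work cycle.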

Fig. 3, which represents a preferred development of the error concealment device from Fig. 2, is discussed below. In particular, the error concealment device is expanded by a noise substitution device 514 which, depending on a prediction gain signal 516, can be connected to the error replacement device 512 via a noise substitution switch 518 instead of the forward transformation device 506. The noise substitution device 514 works according to the method described in DE 197 35 675 A1 in order to approximate noisy signal components in the audio signal. Since the spectral components are noisy, the phase of the spectral coefficients is no longer taken into account, but only the energy of several spectral coefficients in a subgroup. Depending on the energy in a subgroup of the last available intact audio data, the noise substitution device 514 generates a corresponding subgroup of spectral coefficients, the energy in the subgroup of generated spectral coefficients corresponding to, or being derived from, the energy of the corresponding subgroup of the preceding spectral coefficients. The phases of the spectral coefficients generated during noise substitution, however, are determined randomly.
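
The principle of energy-based noise substitution can be sketched as follows. This is not the method of DE 197 35 675 A1 itself, only a minimal illustration under simplifying assumptions: each generated coefficient in a subgroup gets the same magnitude, and the "random phase" of a real-valued coefficient is modelled as a random sign. All names are hypothetical.

```python
import math
import random


def substitute_noise(last_intact, group_size, rng):
    """Generate replacement coefficients with the same energy per subgroup.

    Every subgroup of `group_size` coefficients keeps the energy of the
    corresponding subgroup of the last intact data; signs are random.
    """
    out = []
    for start in range(0, len(last_intact), group_size):
        group = last_intact[start:start + group_size]
        energy = sum(c * c for c in group)
        magnitude = math.sqrt(energy / len(group))
        out.extend(magnitude * rng.choice((-1.0, 1.0)) for _ in group)
    return out


def group_energies(coeffs, group_size):
    """Energy of each subgroup, used here only to check the result."""
    return [sum(c * c for c in coeffs[i:i + group_size])
            for i in range(0, len(coeffs), group_size)]


rng = random.Random(0)                                   # fixed seed for repeatability
last_intact = [3.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]   # hypothetical data
substitute = substitute_noise(last_intact, 4, rng)
```

The substituted data reproduces the spectral energy distribution of the intact data but carries no phase information, which is why it suits noisy rather than tonal components.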

The noise substitution switch 518 is controlled by the prediction gain signal 516. In general, the prediction gain relates to the ratio of the output signal of the device 504 for generating estimated values to its input signal. If it is found that the output signal differs relatively little from the input signal in a subband, it can be assumed that the audio signal in this subband is relatively stationary, i.e. tonal. If, on the other hand, the output signal of the predictor differs greatly from the input signal, it can be assumed that the signal is non-stationary, i.e. atonal or noisy. In this case, noise substitution will give better results than prediction, because noisy signals by their nature cannot be predicted reliably. For example, the noise substitution switch 518 could be controlled to connect the forward transformation device 506 to the error replacement device 512 if the prediction gain exceeds a certain threshold, or to connect the noise substitution device 514 to the error replacement device 512 if the prediction gain falls below this threshold, in order to combine both substitution methods optimally.
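
The control of switch 518 can be sketched as follows. The text above does not give a formula for the prediction gain; the sketch assumes the common definition as the ratio of signal energy to prediction error energy. The threshold value and all names are hypothetical.

```python
import math


def prediction_gain(actual, predicted):
    """Signal energy divided by prediction error energy (assumed definition)."""
    signal_energy = sum(a * a for a in actual)
    error_energy = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if error_energy == 0.0:
        return math.inf
    return signal_energy / error_energy


def choose_source(gain, threshold):
    """Model of switch 518: prediction for high gain, noise substitution otherwise."""
    return "prediction" if gain > threshold else "noise_substitution"


THRESHOLD = 10.0                       # hypothetical value

actual = [1.0, -1.0, 1.0, -1.0]
good_prediction = [0.9, -0.9, 0.9, -0.9]
poor_prediction = [-1.0, 1.0, -1.0, 1.0]

tonal_choice = choose_source(prediction_gain(actual, good_prediction), THRESHOLD)
noisy_choice = choose_source(prediction_gain(actual, poor_prediction), THRESHOLD)
```

A subband whose signal is predicted well is treated as tonal and concealed by prediction; a subband with a large prediction error is handed to noise substitution.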

The error concealment method according to the invention is discussed in more detail below with reference to Fig. 4. First, a current set of spectral coefficients is received (10). For the sake of clarity, Fig. 4 assumes that the current set of spectral coefficients contains only intact spectral coefficients or has already been subjected to an error concealment method according to Fig. 2 or 3. The current set of spectral coefficients is processed by the filter bank 400 (Fig. 1) and output, for example, to a loudspeaker (12). In addition, the current set of spectral coefficients is used to predict, i.e. to estimate, a subsequent set of spectral coefficients. For this purpose, the current set of spectral coefficients is subdivided into subbands (14). In the case of a long block, the division into subbands takes place in such a way that only one subband per frequency range is generated per block. In the case of short blocks, the current set of spectral coefficients comprises a plurality of temporally successive complete spectra. Corresponding subbands are then generated in step 14 for each complete spectrum, i.e. a plurality of subbands for each frequency range per set of spectral coefficients.

After the subdivision into subbands, a backward transformation is carried out for each subband (16). In the case of long blocks, i.e. when the number of spectral coefficients of a block corresponds to the number of spectral coefficients of a set, a single backward transformation per subband is carried out before proceeding to the prediction 18. In the case of short blocks, several backward transformations are carried out, one for the corresponding subband of each "short" spectrum, before a prediction 18 is then carried out for all of these subbands together.

The prediction 18 takes place in the quasi-time domain, i.e. for each subband "time" signal, in order to obtain an estimated subband time signal for the following set. This estimated quasi-time signal is then subjected to a forward transformation 20, the forward transformation being carried out once for a long block or N times for short blocks, where N is the ratio of the number of spectral coefficients of a long block to the number of spectral coefficients of a short block. After step 20, estimated spectral coefficients are available for each subband. In a step 22, the subdivision introduced in step 14 is undone, such that after step 22 an estimated following set of spectral coefficients is available.
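
Steps 14 to 22 can be summarised in a short sketch for the long-block case. Several simplifications are assumed here: an orthonormal DCT-IV, which is its own inverse, stands in for the IMDCT/MDCT pair, the set has only 64 coefficients in 4 subbands, and the predictor is a placeholder that simply holds the current quasi-time signal. All names are hypothetical.

```python
import math


def dct4(values):
    """Orthonormal DCT-IV; applying it twice returns the input."""
    n = len(values)
    scale = math.sqrt(2.0 / n)
    return [scale * sum(v * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                        for i, v in enumerate(values))
            for k in range(n)]


def split_into_subbands(coeffs, num_subbands):
    """Step 14: divide a set into equally sized subbands."""
    size = len(coeffs) // num_subbands
    return [coeffs[i * size:(i + 1) * size] for i in range(num_subbands)]


def estimate_next_set(current_set, num_subbands, predict):
    """Steps 14 to 22 for one set of spectral coefficients (long block)."""
    estimated = []
    for band in split_into_subbands(current_set, num_subbands):
        quasi_time = dct4(band)                  # step 16: backward transformation
        predicted_time = predict(quasi_time)     # step 18: prediction
        estimated.extend(dct4(predicted_time))   # step 20; concatenation is step 22
    return estimated


def hold_predictor(quasi_time):
    """Placeholder predictor: assumes the next quasi-time signal equals this one."""
    return list(quasi_time)


current = [math.sin(0.1 * i) for i in range(64)]     # hypothetical coefficients
estimate = estimate_next_set(current, 4, hold_predictor)
```

With the placeholder predictor the estimate reproduces the current set, which confirms that the transform pair and the subdivision are transparent. A real implementation would plug an adaptive predictor into `predict`.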

In step 24, the decoder receives the following set of spectral coefficients. This set is subjected to an error detection 26 to determine whether one spectral coefficient, several spectral coefficients or even all spectral coefficients of the following set are faulty. The error detection takes place in a manner known to those skilled in the art, for example by checking a CRC checksum (CRC = Cyclic Redundancy Code) over a frame. If it is determined that a checksum calculated from the transmitted data differs from the checksum transmitted with the data, the estimated spectral coefficients generated by step 22 can be used instead of the spectral coefficients of the defective block. The faulty spectral coefficients are thus exchanged for the estimated spectral coefficients (28). Finally, the error-concealed spectral coefficients of the following set are processed in order to output the temporal samples (30).
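
Steps 26 and 28 can be sketched at frame level as follows. The sketch uses the CRC-32 from Python's `zlib` module as a stand-in checksum; this is an assumption for illustration and not the CRC defined for AAC frames. The function names and the payload are hypothetical.

```python
import zlib


def frame_is_intact(payload, transmitted_crc):
    """Step 26: recompute the checksum and compare it with the transmitted one."""
    return zlib.crc32(payload) == transmitted_crc


def select_coefficients(received, estimated, intact):
    """Step 28: keep received coefficients if intact, otherwise use the estimate."""
    return list(received) if intact else list(estimated)


payload = b"frame"                       # hypothetical frame payload
crc = zlib.crc32(payload)                # checksum sent along with the frame
corrupted = b"frbme"                     # same frame with one damaged byte

received = [0.5, -0.25, 0.125]
estimated = [0.4, -0.2, 0.1]

ok_output = select_coefficients(received, estimated, frame_is_intact(payload, crc))
bad_output = select_coefficients(received, estimated, frame_is_intact(corrupted, crc))
```

A frame-level checksum can only flag a whole frame, so in this sketch all coefficients of a damaged frame are replaced.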

The flow chart of Fig. 4 effectively represents a snapshot of the processing from one set of spectral coefficients to the next set of spectral coefficients. Of course, when the flow chart of Fig. 4 is implemented, only a single filter bank 400 (Fig. 1) is used to perform steps 12 and 30. In the same way, only a single device for receiving the current set of spectral coefficients and the following set of spectral coefficients is required in order to implement steps 10 and 24. In a device which implements the method according to the invention, the temporal synchronicity for steps 10 and 24 is ensured by the time delay stage 510 in the parallel branch (Fig. 2).

Fig. 5 shows a more detailed illustration of the general block diagram of Fig. 2 using the example of an MPEG-2 AAC decoder which has the error concealment device 500 according to the invention. As has already been explained with reference to Fig. 2, the error concealment device 500 (Fig. 1) comprises a device 520 for dividing the blocks of spectral coefficients into preferably 32 subbands. In the case of long blocks, each subband has 32 spectral coefficients. Since the subbands of the short blocks cover the same frequency ranges, each subband has 4 spectral coefficients in the case of short blocks. A division of the entire spectrum into subbands of the same size is preferred for reasons of simplicity, but a subdivision into unequal subbands would also be possible, for example based on the psychoacoustic frequency groups. Each subband is then subjected to an inverse modified discrete cosine transform (IMDCT). In the case of long blocks, the IMDCT runs once and receives 32 input values. In the case of short blocks, eight consecutive IMDCTs are carried out, each with 4 spectral coefficients, such that 32 quasi-time samples again result at the output. These are then fed to the predictor 504, which in turn generates 32 estimated quasi-time samples that are transformed using the MDCT 506. In the case of long blocks, a single MDCT with 32 temporal values is carried out, while in the case of short blocks eight temporally successive MDCTs with 4 samples each are carried out. Although only one branch, for the zeroth subband, is shown in Fig. 5, it should be noted that, if the subbands are all of the same length, there is an identical branch for each subband. If the subbands have different lengths, the orders of the IMDCT and MDCT are adapted to them. Parallel processing lends itself to a practical implementation. However, serial processing of the subbands in succession is of course also possible if appropriate storage capacity is provided.
The output values of the MDCT 506 for each subband are fed into a device 522 for undoing the division, i.e. into an inverse division device, in order to output an estimated set of spectral values, in the preferred embodiment at the AAC-MDCT level.
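
The bookkeeping of the division device 520 and the inverse division device 522 can be sketched as follows, for equally sized subbands only. The function names and the dummy coefficient values are assumptions made for this illustration; the transforms themselves are omitted.

```python
NUM_SUBBANDS = 32


def divide(block):
    """Model of device 520: split one block into 32 equally sized subbands."""
    size = len(block) // NUM_SUBBANDS
    return [block[b * size:(b + 1) * size] for b in range(NUM_SUBBANDS)]


def undo_division(subbands):
    """Model of device 522: concatenate the subbands into one block again."""
    return [c for band in subbands for c in band]


# Dummy data: one long block and eight short blocks.
long_block = [float(i) for i in range(1024)]
short_blocks = [[float(i) for i in range(128)] for _ in range(8)]

long_bands = divide(long_block)                    # 32 subbands of 32 values
short_bands = [divide(b) for b in short_blocks]    # 8 x 32 subbands of 4 values

# Values handled per work cycle in subband 0 for short blocks:
# 4 coefficients from each of the 8 short blocks.
subband0_short = [c for t in range(8) for c in short_bands[t][0]]
```

In both cases 32 values per subband are processed in one work cycle, which is what allows a predictor of fixed length to be used for long and short blocks alike.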

FIG. 6 shows a more detailed illustration of the predictor 504. In the preferred exemplary embodiment, the core of the predictor 504 is a so-called LMSL predictor 504a, which has a length n = 32. Details of the LMSL predictor can be found in the book "Adaptive Signal Processing", Bernard Widrow, Samuel Stearns, Prentice-Hall, 1995, p. 99 ff. A time delay stage 504b is connected upstream of the LMSL predictor 504a. The predictor 504 further comprises a parallel-serial converter 504c on the input side and a serial-parallel converter 504d on the output side. It also has a prediction gain calculation device 504e, which compares the output of the predictor 504a with the input signal in order to determine whether a stationary or a non-stationary signal has been processed. On the output side, the prediction gain calculation device 504e supplies the prediction gain signal 516, which is used to control the switch 518 (FIG. 3) in order to use either predicted spectral coefficients or spectral coefficients obtained by noise substitution for error concealment. The predictor 504, in its implementation as an LMSL predictor, also includes two switches 504f and 504g, each of which has two switch positions. Switch position "1" relates to the case in which the spectral coefficients of the following block are error-free, while switch position "2" relates to the case in which the spectral coefficients of the following block are faulty. FIG. 6 shows the case in which the spectral coefficients are faulty. In this case, a reference signal with a value of 0 is fed into the predictor at switch 504g instead of the input signal. In the case of error-free spectral coefficients (switch position "1" of switch 504g), on the other hand, the output values of the parallel-serial converter are fed into the LMSL predictor from below.
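
The interplay of adaptation, switch positions and prediction gain can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain normalized LMS (NLMS) predictor as a stand-in and is not the LMSL predictor of the cited book; the step size `mu`, the regularization `eps`, the class and function names, and the free-running feedback of the estimate in the faulty case are all hypothetical choices, not taken from the original.

```python
import math


class NlmsPredictor:
    """One-step linear predictor adapted with normalized LMS (stand-in only)."""

    def __init__(self, order=32, mu=0.5, eps=1e-9):
        self.weights = [0.0] * order
        self.history = [0.0] * order  # past samples only, most recent first
        self.mu = mu
        self.eps = eps

    def step(self, sample=None):
        """Return the estimate for the next quasi-time sample.

        sample given -> error-free case (position "1"): adapt the weights.
        sample None  -> faulty case (position "2"): weights stay frozen and
                        the estimate itself is fed back (an assumption).
        """
        estimate = sum(w * h for w, h in zip(self.weights, self.history))
        if sample is None:
            new_value = estimate
        else:
            error = sample - estimate
            norm = self.eps + sum(h * h for h in self.history)
            self.weights = [w + self.mu * error * h / norm
                            for w, h in zip(self.weights, self.history)]
            new_value = sample
        self.history = [new_value] + self.history[:-1]
        return estimate


def prediction_gain_db(samples, estimates):
    """Ratio of signal energy to prediction-error energy in dB."""
    signal_energy = sum(s * s for s in samples)
    error_energy = sum((s - e) ** 2 for s, e in zip(samples, estimates))
    if signal_energy == 0.0:
        return 0.0
    if error_energy == 0.0:
        return float("inf")
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical test signal: a stationary sinusoid, 64 intact blocks of 32
# quasi-time samples followed by one block that is treated as lost.
signal = [math.sin(2.0 * math.pi * n / 16.0) for n in range(65 * 32)]
predictor = NlmsPredictor()
estimates = [predictor.step(x) for x in signal[:64 * 32]]
gain_db = prediction_gain_db(signal[63 * 32:64 * 32], estimates[63 * 32:])
concealed = [predictor.step() for _ in range(32)]
```

A high gain indicates a stationary (tonal) signal for which the predicted values are a plausible replacement; a low gain would instead point to noise substitution.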

If the error concealment method according to the invention is used in connection with an AAC encoder, it is preferred to use the corresponding transformation algorithms (MDCT or IMDCT) for all forward and backward transformations. For error concealment, however, it is not necessary that the same transformation method that was used when coding the audio signal is used for the backward or forward transformation in order to form the spectral coefficients.
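
Since the transform pair used for concealment need not match the encoder's transform, the subband-level backward and forward transforms can be illustrated with any mutually inverse pair. The sketch below uses an orthonormal DCT-IV as a stand-in. This is an assumption made for brevity: it is not the windowed MDCT/IMDCT with overlap-add that AAC uses, and the function name is hypothetical.

```python
import math


def dct_iv(values):
    """Orthonormal DCT-IV; with this scaling the transform is its own inverse."""
    n = len(values)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(v * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                    for i, v in enumerate(values))
        for k in range(n)
    ]


# Long block: one transform of order 32 per subband.
subband = [float(k % 5) for k in range(32)]   # hypothetical coefficients
quasi_time = dct_iv(subband)                  # backward transform
restored = dct_iv(quasi_time)                 # forward transform

# Short blocks: eight successive transforms of order 4 per subband,
# concatenated so that 32 quasi-time samples result again.
short_parts = [[float(b + k) for k in range(4)] for b in range(8)]
quasi_time_short = [x for part in short_parts for x in dct_iv(part)]
```

The order of the transform (32 or 4) is far lower than the 1024 or 128 coefficients of a full block, which is what makes the result a quasi-time signal per subband rather than the decoded audio signal.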

Because of the subdivision of the spectrum into subbands and because of the individual transformations for each subband, frequency-time domain transformations with a lower order than the frequency resolution are used for each subband. Dedicated predicted values for tonal signal components are thus generated at the intermediate level by means of the predictor. Time-frequency domain transformations of a lower order than the original frequency resolution are used as the forward transformation / synthesis, the same order being chosen as for the frequency-time domain transformation used. The error concealment according to the invention thus provides flexibility, on the one hand, by utilizing prior knowledge of the spectral properties of audio signals and, on the other hand, independence from the transformation method used in the encoder, because the estimated values are generated in the quasi-time domain, i.e. not at the spectral coefficient level. If the prediction in the quasi-time domain is used to replace tonal signal components, and if noise substitution is used for noisy spectral components, then errors are concealed for a large class of audio signals in such a way that almost no audible interference occurs, even when a block is lost completely. Experiments have shown that with test signals that are not too critical, normal listeners, i.e. untrained test listeners, noticed irregularities in the audio signal in only one out of 10 cases, even when a block was lost completely.
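
The choice between predicted coefficients and noise substitution can be sketched as follows. This is a minimal illustration, assuming that the prediction gain is compared with a fixed threshold and that the substituted noise matches the energy of the last intact subband; the threshold value, the random-sign noise model and all names are hypothetical and are not specified in the original.

```python
import math
import random


def noise_substitute(reference_band, rng):
    """Random-sign values with the same energy as the reference subband."""
    rms = math.sqrt(sum(c * c for c in reference_band) / len(reference_band))
    return [rms * rng.choice((-1.0, 1.0)) for _ in reference_band]


def conceal_band(predicted_band, previous_band, gain_db,
                 threshold_db=6.0, rng=None):
    """Pick the replacement for one faulty subband.

    High prediction gain -> tonal component: use the predicted coefficients.
    Low prediction gain  -> noisy component: use noise substitution.
    threshold_db is an illustrative value only.
    """
    rng = rng or random.Random(0)
    if gain_db >= threshold_db:
        return list(predicted_band)
    return noise_substitute(previous_band, rng)


# Hypothetical subband data.
predicted = [0.9, -0.2, 0.1, 0.0]
previous = [1.0, -0.3, 0.2, 0.1]
tonal_result = conceal_band(predicted, previous, gain_db=25.0)
noisy_result = conceal_band(predicted, previous, gain_db=1.0)
```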

Claims

1. A method of concealing an error in an encoded audio signal, the encoded audio signal having successive sets of spectral coefficients, a set of spectral coefficients being a spectral representation for a set of audio samples, comprising the following steps:

Subdividing (14) a current set of spectral coefficients into at least two subbands with different frequency ranges, a subband of the at least two subbands having at least two spectral coefficients;

Backward transforming (16) the spectral coefficients of the one subband to obtain a temporal representation of the at least two spectral coefficients of the one subband;

Performing (18) a prediction using the temporal representation of the at least two spectral coefficients of the one subband in order to obtain an estimated temporal representation for a subband of a set following the current set, the subband of the following set covering the same frequency range as the subband of the current set;

Forward transforming (20) the estimated temporal representation to obtain at least two estimated spectral coefficients for the subband of the following set;

Determining (26) whether a spectral coefficient of the subband of the following set is defective; and

in response to the step of determining, if there is an erroneous spectral coefficient, using (28) an estimated spectral coefficient instead of the erroneous spectral coefficient of the following set in order to conceal the erroneous spectral coefficient of the following set.

2. The method of claim 1, wherein the one subband that is processed in the step of backward transforming (16) has low-frequency spectral coefficients, while the other of the at least two sub-bands has higher-frequency spectral coefficients.

3. The method of claim 1 or 2, wherein the number of spectral coefficients in a set of spectral coefficients is equal to the number of spectral coefficients in a block (702) of a first length and to N times the number of spectral coefficients in a block (704) of a second length, and in which N blocks (704) of the second length occur in succession, wherein

the dividing step (14) is carried out such that the subbands of the blocks of the first length comprise the same frequency ranges as the subbands of the blocks of the second length, such that the number of spectral coefficients of a subband of the block of the first length is equal to N times the number of spectral coefficients of the corresponding subband of the block of the second length;

the step of back-transforming (16) is performed sequentially for each corresponding subband of the N blocks of the second length to obtain a temporal representation of the spectral coefficients of corresponding subbands of the N blocks of the second length;

the step of performing (18) a prediction is carried out with the temporal representation of all corresponding subbands of the N blocks of the second length; and

the step of forward transforming (20) is carried out successively for each corresponding subband of the N blocks of the second length.

4. The method according to any one of the preceding claims, in which in the step of dividing (14) a plurality of subbands is generated such that all subbands together form the spectral representation of the encoded audio signal in a set of spectral coefficients.

5. The method as claimed in one of the preceding claims, in which, after the step of determining (26) whether a spectral coefficient of a subband is faulty, the following step is carried out:

Determining (504e) whether the spectral coefficient represents a tonal portion of the uncoded audio signal based on a comparison of the spectral coefficient with the corresponding estimated spectral coefficient;

if the spectral coefficient is determined to be tonal, using the estimated spectral coefficient, and if the spectral coefficient is determined to be non-tonal, performing noise substitution (514) for an erroneous spectral coefficient of the following set.

6. The method according to any one of claims 3 to 5, wherein the spectral coefficients are MDCT coefficients, the length of a set corresponds to the length of a long block and is 1024 MDCT coefficients, while a set of spectral coefficients comprises eight blocks of short length, each of which has 128 MDCT coefficients, and in which 32 subbands, each with 32 MDCT coefficients for a long block or 4 MDCT coefficients for a short block, are formed in the dividing step.

7. The method according to any one of the preceding claims, wherein in the step of performing (18) the prediction an adaptive feedback predictor (504a) is used, which is preferably an LMSL predictor.

8. The method as claimed in one of the preceding claims, in which the transformation algorithm on which the encoded audio signal is based is the same transformation algorithm which is used in the step of reverse transformation (16) and in the step of forward transformation (20).

9. The method of any preceding claim, wherein the transform algorithm used in the reverse transform step (16) is exactly inverse to the transform algorithm used in the forward transform step (20).

10. A method of decoding an encoded audio signal having successive sets of spectral coefficients, a set of spectral coefficients being a spectral representation for a set of audio samples, comprising the following steps:

Receiving (10) a current set of spectral coefficients;

Backward transforming (16) the spectral coefficients of the one subband to obtain a temporal representation of the at least two spectral coefficients of the one subband;

Performing (18) a prediction using the temporal representation of the at least two spectral coefficients of the one subband in order to obtain an estimated temporal representation for a subband of a set following the current set, the subband of the following set covering the same frequency range as the subband of the current set;

Receiving (24) a subsequent set of spectral coefficients and dividing the following set into subbands covering the same frequency range as the subbands of the current set;

Determining (26) whether a spectral coefficient of the subband of the following set is defective;

in response to the step of determining, if there is an erroneous spectral coefficient, using (28) an estimated spectral coefficient instead of the erroneous spectral coefficient of the following set in order to conceal the erroneous spectral coefficient of the following set; and

Processing (30) the following set using the estimated spectral coefficient used in the using (28) step to obtain the following set of audio samples.

11. The method according to claim 10, wherein the spectral coefficients of the encoded audio signal are entropy-encoded and quantized, and wherein the step of receiving (10) the current set or the following set is followed by the following steps:

Undoing (200) the entropy coding to obtain quantized spectral coefficients;

Requantizing (300) the quantized spectral coefficients to obtain requantized spectral coefficients;

and wherein the processing step comprises the following step:

Reverse transforming (400) the following set using a transform algorithm that is inverse to the transform algorithm used to transform to obtain the spectral coefficients of the encoded audio signal.

12. An apparatus for concealing an error in a coded audio signal, the coded audio signal having successive sets of spectral coefficients, a set of spectral coefficients being a spectral representation for a set of audio samples, with the following features:

means (520) for dividing (14) a current set of spectral coefficients into at least two subbands with different frequency ranges, a subband of the at least two subbands having at least two spectral coefficients;

means (502) for reverse transforming (16) the spectral coefficients of the one subband to obtain a temporal representation of the at least two spectral coefficients of the one subband;

means (504) for performing (18) a prediction using the temporal representation of the at least two spectral coefficients of the one subband to obtain an estimated temporal representation for a subband of a set following the current set, the subband of the following set covering the same frequency range as the subband of the current set;

means (506) for forward transforming (20) the estimated temporal representation to obtain at least two estimated spectral coefficients for the subband of the following set;

means for determining (26) whether a spectral coefficient of the subband of the following set is defective; and

means (512) for using (28) an estimated spectral coefficient instead of an erroneous spectral coefficient of the following set in order to conceal the erroneous spectral coefficient of the following set.

13. An apparatus for decoding an encoded audio signal having successive sets of spectral coefficients, a set of spectral coefficients being a spectral representation for a set of audio samples, with the following features:

means (100) for receiving (10) a current set of spectral coefficients;

means (502) for reverse transforming (16) the spectral coefficients of the one subband in order to obtain a temporal representation of the at least two spectral coefficients of the one subband;

means (504) for performing (18) a prediction using the temporal representation of the at least two spectral coefficients of the one subband in order to obtain an estimated temporal representation for a subband of a set following the current set, the subband of the following set covering the same frequency range as the subband of the current set;

means (502, 510) for receiving (24) a following set of spectral coefficients and dividing the following set into subbands covering the same frequency range as the subbands of the current set;

means for determining (26) whether a spectral coefficient of the subband of the following set is defective;

means (512) for using (28) an estimated spectral coefficient instead of an erroneous spectral coefficient of the following set in order to conceal the erroneous spectral coefficient of the following set; and

means for processing (30) the following set using the estimated spectral coefficient to obtain the following set of audio samples.