WO2000039933A1 - Method and devices for coding or decoding an audio signal or a bit stream - Google Patents

Method and devices for coding or decoding an audio signal or a bit stream

Publication number
WO2000039933A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
spectral
code words
spectral values
priority
Prior art date
Application number
PCT/EP1998/008475
Other languages
German (de)
English (en)
Inventor
Ralph Sperschneider
Martin Dietz
Andreas Ehret
Karlheinz Brandenburg
Heinz GERHÄUSER
Ali Nowbakht-Irani
Pierre Lauber
Roland Bitto
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to DE19747119A priority Critical patent/DE19747119C2/de
Priority claimed from DE19840853A external-priority patent/DE19840853B4/de
Priority to DE19840853A priority patent/DE19840853B4/de
Priority to EP98119235A priority patent/EP0911981B1/fr
Priority to CA002356869A priority patent/CA2356869C/fr
Priority to JP2000591732A priority patent/JP3580777B2/ja
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. filed Critical Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority to AU21636/99A priority patent/AU754877B2/en
Priority to US09/869,401 priority patent/US6975254B1/en
Priority to KR10-2001-7008203A priority patent/KR100391935B1/ko
Priority to PCT/EP1998/008475 priority patent/WO2000039933A1/fr
Publication of WO2000039933A1 publication Critical patent/WO2000039933A1/fr
Priority to JP2004099418A priority patent/JP3902642B2/ja
Priority to JP2004099417A priority patent/JP4168000B2/ja
Priority to JP2004099419A priority patent/JP3978194B2/ja

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/66 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission
    • H04B1/662 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission using a time/frequency relationship, e.g. time compression or expansion
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/35 Unequal or adaptive error protection, e.g. by providing a different level of protection according to significance of source information or by adapting the coding according to the change of transmission channel characteristics
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Definitions

  • the present invention relates to methods and devices for coding or decoding an audio signal or a bit stream, which can carry out error-resistant entropy coding or decoding and in particular error-resistant Huffman coding or decoding.
  • Modern audio coding methods or decoding methods which operate for example according to the MPEG layer 3 standard, are able to compress the data rate of audio signals by a factor of 12, for example, without noticeably deteriorating the quality thereof.
  • an audio signal is sampled, whereby a sequence of discrete-time samples is obtained.
  • this sequence of discrete-time samples is windowed using suitable window functions to obtain windowed blocks of temporal samples.
  • A block of time-windowed samples is then transformed into the frequency domain by means of a filter bank, a modified discrete cosine transform (MDCT) or another suitable device in order to obtain spectral values which collectively represent the audio signal in the frequency domain.
  • MDCT modified discrete cosine transformation
  • the interference introduced by the quantization, i.e. the quantization noise
  • the spectral values are therefore divided into so-called scale factor bands, which should correspond to the frequency groups of the human ear.
  • Spectral values in a scale factor band are multiplied by a scale factor in order to scale the spectral values of that scale factor band as a whole.
  • the scale factor bands scaled by the scale factor are then quantized, whereupon quantized spectral values arise.
  • Huffman coding is usually used for entropy coding.
  • Huffman coding is a variable-length coding, i.e. the length of the code word for a value to be coded depends on its probability of occurrence. Logically, the shortest code, i.e. the shortest code word, is assigned to the most likely character, so that very good redundancy reduction can be achieved with Huffman coding.
  • An example of a well-known coding with variable length is the Morse alphabet.
  • Huffman codes are used to code the quantized spectral values.
  • a modern audio coder which works for example according to the MPEG-2 AAC standard, uses various Huffman code tables for coding the quantized spectral values, which are assigned to the spectrum in sections according to certain criteria. 2 or 4 spectral values are always coded together in one code word.
  • A difference between the MPEG-2 AAC method and the MPEG Layer 3 method is that different scale factor bands, i.e. different spectral values, can be grouped into any number of spectral sections or "sections".
  • A spectral section or "section" comprises at least four spectral values, but preferably more than four spectral values.
  • the entire frequency range of the spectral values is therefore divided into adjacent sections, with one section representing a frequency band, such that all sections together comprise the entire frequency range which is covered by the spectral values after the transformation thereof.
  • a section is now assigned a so-called Huffman table from a plurality of such tables, as is the case with the MPEG Layer 3 method, in order to achieve maximum redundancy reduction.
  • the Huffman code words for the spectral values are now in ascending frequency order in the bit stream of the AAC method, which usually has 1024 spectral values.
  • The information about the table used in each frequency section is transmitted in the side information. This situation is shown in FIG. 2.
  • FIG. 2 illustrates the exemplary case in which the bit stream includes 10 Huffman code words. If a code word is always formed from one spectral value, 10 spectral values can be coded here. Usually, however, 2 or 4 spectral values are always coded together by one code word, which is why FIG. 2 represents a part of the coded bit stream which comprises 20 or 40 spectral values. In the case where each Huffman code word comprises 2 spectral values, the code word labeled No. 1 represents the first 2 spectral values, the length of code word No. 1 being relatively small, which means that the values of the first two spectral values, i.e. the two lowest frequency coefficients, occur relatively frequently.
  • The code word with the number 2, however, has a relatively large length, which means that the magnitudes of the 3rd and 4th spectral coefficients are relatively rare in the encoded audio signal, which is why they are encoded with a relatively large number of bits. From FIG. 2 it can also be seen that the code words with the numbers 3, 4 and 5, which represent the spectral coefficients 5 and 6, 7 and 8, and 9 and 10 respectively, also occur relatively frequently, since the length of the individual code words is relatively small. The same applies to the code words with the numbers 6-10.
  • the Huffman code words for the coded spectral values are arranged linearly increasing in frequency in the bit stream when considering a bit stream which is generated by a known coding device.
  • A major disadvantage of Huffman codes in the case of faulty channels is error propagation. For example, assume that code word No. 2 in FIG. 2 is disturbed. The length of this incorrect code word No. 2 is then also changed with a certain, not low probability; it therefore differs from the correct length. In the example of FIG. 2, if the length of code word No. 2 has been changed by a fault, it is no longer possible for a decoder to determine the beginnings of code words 3 - 10, i.e. almost the entire represented audio signal is affected. All code words after the disturbed code word can therefore no longer be decoded correctly, since it is not known where these code words begin and an incorrect starting point was chosen due to the error.
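The error propagation described above can be reproduced with a toy prefix code. The following is a minimal sketch; the four-symbol code and the helper names `encode`, `decode` and `flip` are assumptions chosen for illustration and are not taken from any AAC code table.

```python
# Toy prefix code (hypothetical, not an AAC table): symbol -> bit string.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(symbols):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Decode sequentially; each code word start depends on the previous one."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    return out


def flip(bits, index):
    """Simulate a single-bit transmission error at the given position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


original = ["a", "c", "a", "b", "d", "a"]
stream = encode(original)
damaged = flip(stream, 1)  # disturb the first bit of the second code word

clean_result = decode(stream)
damaged_result = decode(damaged)
```

In this sketch a single flipped bit changes the length of the second code word, so the decoder returns a different number of symbols and the following symbols end up at the wrong positions.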
  • European patent No. 0612156 proposes, as a solution to the problem of error propagation, to arrange part of the variable-length code words in a raster and to distribute the remaining code words into the remaining gaps, so that the beginning of a code word can be found more easily without complete decoding or in the event of faulty transmission.
  • the known method provides a partial remedy for the error propagation by rearranging the code words.
  • a fixed place in the bit stream is agreed for some code words, while the remaining gaps are available for the remaining code words. This does not cost any additional bits, but prevents error propagation among the reordered code words in the event of an error.
  • The decisive parameter for the efficiency of the known method is how the grid is determined in practical use, i.e. how many raster points have to be used, the spacing of the raster points, etc.
  • the European patent 0612156 provides, in addition to the general advice to use a grid to contain the propagation of errors, no further information on how the grid should be designed efficiently, on the one hand to enable error-proof coding and, on the other hand, efficient coding.
  • The object of the present invention is to provide a concept for the robust, yet efficient coding and decoding of an audio signal or a bit stream. This object is achieved by a method for coding an audio signal according to claim 1 or 9, by a device for coding an audio signal according to claim 21 or 22, by a method for decoding a bit stream according to claim 23 or 24 and by a device for decoding a bit stream according to claim 25 or 26.
  • The present invention is based on the knowledge that the grid already proposed must be designed or occupied in a certain way in order to achieve efficient coding or decoding in addition to error-robust coding or decoding. It is essential here that the code words which are obtained by entropy coding in the form of a Huffman coding are inherently of different lengths, since the greatest coding gain is achieved when the value to be coded which occurs most frequently is assigned the shortest possible code word. In contrast, a value to be coded that occurs relatively rarely still leads to a statistically optimal amount of data despite the relatively long code word assigned to it. Code words which are obtained by Huffman coding therefore have different lengths per se.
  • priority code words are placed at the raster points such that, despite a possible error in the bit stream due to the raster, the start of the priority code words can always be reliably determined by a decoder.
  • Priority code words are to be understood as code words that are psychoacoustically important. This means that the spectral values which are coded by so-called priority code words contribute significantly to the auditory impression of a decoded audio signal. If, for example, the audio signal has a large proportion of speech, the priority code words could be the code words which represent spectral values at lower frequencies, since the essential spectral information in this case occurs in the low spectral range.
  • the priority code words could be the code words which are assigned to the spectral values in the corresponding middle frequency range, since these are then the psychoacoustically significant spectral values.
  • Psychoacoustically important spectral values could also be spectral values that have a large magnitude, i.e. a large signal energy, in comparison to other spectral values in the spectrum.
  • Psychoacoustically less significant code words, which are also referred to as non-priority code words, fill the grid. They are therefore not aligned with raster points, but are "sorted" into the spaces that are still free after the priority code words have been positioned on the raster points.
  • the priority code words associated with spectral values which are psychoacoustically important are thus arranged in a grid in such a way that the beginning of the priority code words coincides with the grid points.
  • the spectral values are grouped into spectral sections, with a different code table being assigned to each spectral section.
  • The assignment of a code table to a spectral section is based on signal statistical aspects, i.e. on which code table is optimally suited for coding a spectral section. The assignment of a code table to a spectral section is already known in the art.
  • a grid is now used which has several groups of grid points which are equidistant from one another in such a way that the spacing of the grid points of a group of grid points depends on the code table which is used for coding a spectral section. In a different spectral section, a different code table is used in order to achieve an optimal data reduction.
  • Another group of equidistant raster points is in turn assigned to the other code table, the distance between two raster points of this other group of raster points depending on the corresponding further code table.
  • The dependence of the distance between two raster points in the different groups of raster points can be determined in at least three different ways.
  • the maximum length of a code word in a code table is determined.
  • The distance between two raster points in the group of raster points which is assigned to this code table can now be chosen to be equal to or greater than the maximum code word length in the code table, such that the longest code word of this code table has space in the raster.
  • the distance between two raster points of another group of raster points, which in turn corresponds to another code table is determined in accordance with the maximum code word length of this other code table.
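The first alternative, in which the raster point spacing of each group equals the maximum code word length of the associated code table, can be sketched as follows. The two small tables and the names `max_codeword_length` and `raster_points` are hypothetical; real code tables are considerably larger.

```python
def max_codeword_length(table):
    """Length in bits of the longest code word in a code table."""
    return max(len(bits) for bits in table.values())


def raster_points(groups):
    """Return the bit offsets of all raster points.

    groups is a list of (code_table, number_of_intervals) pairs; each group
    starts at the last raster point of the previous group.
    """
    points = [0]
    for table, intervals in groups:
        spacing = max_codeword_length(table)
        for _ in range(intervals):
            points.append(points[-1] + spacing)
    return points


# Hypothetical prefix-code tables (value -> code word), not AAC tables.
TABLE_N = {0: "0", 1: "10", 2: "110", 3: "1110", 4: "1111"}  # longest: 4 bits
TABLE_M = {0: "0", 1: "10", 2: "11"}                         # longest: 2 bits

points = raster_points([(TABLE_N, 2), (TABLE_M, 2)])
```

Here the first group of raster points is spaced 4 bits apart and the second group 2 bits apart, with the two groups sharing one raster point at their boundary.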
  • The second alternative, which is described below, can also contribute to an increase in the number of raster points. Due to the inherent properties of the Huffman code, code words that occur less frequently tend to be longer than code words that occur more frequently. Therefore, if the raster point spacing is selected to be equal to or greater than the length of the longest code word of a table, mostly code words that are shorter than the raster point spacing are inserted into the grid. The raster point spacing can therefore also be chosen to be smaller than the length of the longest code word in a table. If a code word then occurs during coding that does not fit into the grid, the remainder that does not fit into the grid is entered into the bit stream at another suitable point that is not aligned with the grid.
  • an arrangement of the code words distributed in terms of frequency can be used, this method also being referred to as "scrambling". This has the advantage that so-called "burst" errors do not lead to incorrect decoding of a complete frequency band, but only produce small interferences in several different frequency ranges.
  • Instead of an arrangement of the code words that increases linearly with frequency, an arrangement can also be used in which, e.g., only every nth code word (e.g. every 2nd or every 3rd or every 4th, ...) is arranged in the grid. This makes it possible to cover the largest possible spectral range by means of priority code words, i.e. to protect it against error propagation, if the number of possible raster points is less than the number of priority code words.
  • It is also preferred that the priority code words be determined in a certain manner in order to achieve efficient operation. It is therefore preferable to abandon the assumption that the psychoacoustically important code words, i.e. the priority code words, are those that encode low-frequency spectral values. This will very often be the case, but it does not always have to be.
  • Priority code words are code words that encode psychoacoustically important spectral lines, which are usually spectral values with high energy. It is equally important that high-energy spectral lines do not arise due to errors.
  • an indicator is used that is already given implicitly.
  • the indicator depends on the code table used.
  • Code table No. 1 includes, e.g., spectral values with an absolute value from -1 to +1, while code table No. 11 can code spectral values from -8191 to +8191.
  • The higher the code table, the larger the range of values allowed by it. This means that code tables with low numbers represent only relatively small values and therefore only allow relatively small errors, while code tables with higher numbers can represent relatively large values and therefore also relatively large errors.
  • The most important code table is therefore the highest code table (code table No. 11 in the AAC standard), since this code table permits escape values in a range between -2^13 + 1 (-8191) and +2^13 - 1 (+8191).
  • short windows are used for transient signals in the AAC standard.
  • the frequency resolution is reduced in favor of a higher temporal resolution.
  • A determination of the priority code words is carried out in such a way that spectral values that are psychoacoustically significant, i.e. spectral values at lower frequencies or spectral values from higher code tables, can be placed on raster points.
  • An interleaving by scale factor band, as carried out e.g. in the AAC standard, is cancelled.
  • FIG. 2 shows an arrangement of code words according to the prior art that increases linearly with frequency.
  • priority code words are hatched in FIG. 2, which shows a known arrangement of code words of different lengths that increases linearly with frequency.
  • priority code words are code words No. 1 - No. 5.
  • The code words which are assigned to spectral values at low frequencies are priority code words if, for example, the audio signal contains a high speech component or a relatively large number of low-frequency tones.
  • Codewords No. 6-10 in FIG. 2 relate to higher-frequency spectral values, which do contribute to the overall impression of the decoded audio signal, but which have no significant impact on the hearing impression and are therefore less important psychoacoustically.
  • FIG. 1 now shows a bit stream which has a number of raster points 10-18, the distance between raster point 10 and raster point 12 being designated as D1, while the distance between raster point 14 and raster point 16 is designated as D2.
  • the priority code words 1 and 2 are now aligned in a raster in order to ensure that the essential spectral components which are in the lower frequency range in the example signal shown in FIG. 2 are not subjected to error propagation during coding.
  • Non-priority code words, which are not hatched in the figures, are arranged after the priority code words to fill the grid. It is not necessary that the non-priority code words fit into the grid in one piece, since the length of a Huffman code word is evident from the code word itself. A decoder therefore knows whether it has only read in part of a code word.
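The placement of priority code words on raster points and the filling of the remaining gaps with non-priority code words can be sketched as follows. This is a minimal illustration under stated assumptions: a single raster spacing, code words given as bit strings in priority order, and a hypothetical rule that the number of raster points is the total bit count divided by the spacing. The function name `interleave` is likewise an assumption.

```python
def interleave(codewords, spacing):
    """Place leading code words on raster points, pour the rest into the gaps.

    codewords: bit strings, most important first (assumed ordering).
    spacing:   raster point spacing in bits.
    Returns (bit stream, number of priority code words). The stream has
    exactly as many bits as the input, so the raster costs no extra bits.
    """
    total = sum(len(word) for word in codewords)
    n_slots = min(total // spacing, len(codewords))
    priority, rest = codewords[:n_slots], codewords[n_slots:]

    stream = [None] * total
    for slot, word in enumerate(priority):
        if len(word) > spacing:
            raise ValueError("priority code word longer than raster spacing")
        start = slot * spacing  # beginning coincides with a raster point
        stream[start:start + len(word)] = list(word)

    # Non-priority code words may be split across several gaps.
    filler = iter("".join(rest))
    for pos in range(total):
        if stream[pos] is None:
            stream[pos] = next(filler)
    return "".join(stream), n_slots


words = ["10", "1110", "0", "110", "10", "0"]
stream, n_priority = interleave(words, spacing=4)
```

In this example three code words start at bit offsets 0, 4 and 8, and the bits of the remaining three code words occupy the free positions in between and after them.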
  • The second part of FIG. 1 already relates to the second aspect of the present invention. If the grid spacing D1 were not changed to a smaller grid spacing D2, a grid with the spacing D1 in which all priority code words 1 to 5 are to be arranged would lead to such a long bit stream that there would not be enough non-priority code words, so to speak, to fill in all the gaps remaining in the grid. Therefore, only as many priority code words are extracted from an audio signal as can be accommodated in the bit stream, so that essentially no vacancies remain, that is, so that the bit stream is not extended unnecessarily.
  • the width of the grid is therefore set as a function of the code table used.
  • spectral values can be grouped into spectral sections, with each spectral section then being assigned a code table that is optimal for the same, taking signal statistical aspects into account.
  • the maximum code word length in one code table usually differs from the maximum code word length in another table.
  • The spectral values represented by code words 1 and 2 belong to a first spectral section, while the spectral values represented by code words 3-10 belong to a second spectral section.
  • the bit stream is now rasterized using two groups of raster points, the first group of raster points having raster points 10, 12 and 14, while the second group of raster points has raster points 14, 16 and 18.
  • the Huffman code table n has been assigned to the spectral section 0, while the Huffman code table m has been assigned to the spectral section 1.
  • code word 2 is the longest code word in table n that was assigned to spectral section 0.
  • the grid spacing of the 1st group of grid points is set greater or preferably equal to the maximum length of the code word in table n, that is, code word No. 2 in the example.
  • The width of the grid is set depending on the code table used. It should be noted that in this case the table used must already be known in the decoder for decoding. However, this is the case, since a code table number is transmitted as side information anyway for each spectral section, by means of which a decoder can identify the code table used from a predetermined set of 11 different Huffman tables in this example.
  • Escape tables are used on the one hand to keep the code tables relatively short, but on the other hand to be able to encode relatively large values using the short code tables in conjunction with an escape table. In the case of a value that exceeds the value range of a code table, the code word for this spectral value takes a predetermined value which signals to the decoder that an escape table has additionally been used in the encoder.
  • If, for example, a code table covers the values 0-2, a value of 3 in the code table would signal to the decoder that an escape table is being used.
  • a value of the escape table is assigned to the code word with the value 3 of the "basic" code table, which together with the maximum value of the basic code table gives the corresponding spectral value.
  • The spacing of the raster points of a group is no longer set equal to the length of the longest code word in a code table, but rather to the length of the longest code word belonging to a code table that actually occurs in a bit stream.
  • The longest code word in the escape table that actually occurs with conventional audio signals is typically about 20 bits long. It is therefore possible to increase the number of raster points, and thereby the number of priority code words that can be aligned with raster points, by additionally transmitting the length of the longest code word of a block.
  • The grid length then results from the minimum of the actually occurring maximum code word length and the theoretical maximum code word length of the table currently being used. To determine the minimum, it is possible to use the actually occurring maximum code word length of each code table in an audio frame or only the longest code word of all code tables in an audio frame. This option also works for non-escape tables, i.e. for "basic" Huffman tables, but not nearly as efficiently as for the escape tables.
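The minimum rule for the grid length can be illustrated with a short sketch. The table, the frame contents and the 24-bit budget are hypothetical values, and the function name `raster_spacing` is an assumption.

```python
def raster_spacing(table_codewords, occurring_codewords):
    """Spacing = min(actually occurring maximum, theoretical table maximum)."""
    theoretical_max = max(len(word) for word in table_codewords)
    actual_max = max(len(word) for word in occurring_codewords)
    return min(actual_max, theoretical_max)


# Hypothetical code table and the code words that actually occur in a frame.
TABLE = ["0", "10", "110", "1110", "1111"]
FRAME = ["0", "10", "10", "110", "0"]

spacing = raster_spacing(TABLE, FRAME)

# Effect on the number of raster points for an assumed budget of 24 bits.
total_bits = 24
points_with_table_max = total_bits // max(len(word) for word in TABLE)
points_with_actual_max = total_bits // spacing
```

Taking the minimum matters when the transmitted length refers to the longest code word of all code tables in a frame, since that length may exceed the theoretical maximum of the table currently in use.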
  • the transmission of the maximum length of a code word in a spectral section or block has another favorable side effect.
  • the decoder can recognize whether a longer code word is present in a bit stream that may be disturbed.
  • Long code words usually mean a high energy of the spectral values. If a very long code word is caused by a transmission error, extremely audible interference can occur.
  • the transmission of the maximum length thus enables detection of such an error in most cases and thus countermeasures, which are, for example, simply hiding this code word that is too long, or a more complicated concealment measure.
  • the flexibility would lead to essentially assigning a raster point to each code word, which of course is only possible with considerable effort.
  • The arrangement of the raster points, i.e. the determination of the raster point spacings as a function of the code tables for each spectral section, however, allows a very efficient approximation to the optimal state, especially since by no means all code words are psychoacoustically important, and especially since all code words that are psychoacoustically less important are sorted into the bit stream between the psychoacoustically important code words arranged in the grid, so to speak, so that no unused places remain in the bit stream.
  • the arrangement in the bit stream which increases linearly with frequency is abandoned and the code words for different spectral values are "scrambled".
  • In FIG. 1, a somewhat interleaved arrangement of the code words that is linear in frequency can be seen, since the hatched priority code words are arranged in ascending frequency order and since the non-priority code words, which are not hatched, are also sorted into the bit stream in ascending frequency order.
  • Should a so-called "burst" error now occur in the bit stream shown in FIG. 1, i.e. a fault which damages several successive code words, then code words 6, 7a, 2, 3 and 7b, for example, would be destroyed simultaneously.
  • the priority code words and optionally also the non-priority code words for the spectral values are no longer arranged in ascending frequency order but "mixed" in such a way that they have a random or pseudo-random frequency arrangement.
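A pseudo-random frequency arrangement that the decoder can undo might look like the following sketch. It assumes that encoder and decoder share a fixed seed a priori; the seed value and the function names are hypothetical, and Python's `random.Random` merely stands in for whatever pseudo-random generator an actual codec would agree on.

```python
import random


def permutation(count, seed):
    """Pseudo-random order of indices, reproducible from the shared seed."""
    order = list(range(count))
    random.Random(seed).shuffle(order)
    return order


def scramble(codewords, seed):
    """Reorder code words so that neighbours in frequency are spread out."""
    return [codewords[i] for i in permutation(len(codewords), seed)]


def unscramble(scrambled, seed):
    """Restore the ascending frequency order in the decoder."""
    restored = [None] * len(scrambled)
    for position, index in enumerate(permutation(len(scrambled), seed)):
        restored[index] = scrambled[position]
    return restored


SEED = 1998  # hypothetical value agreed on by encoder and decoder
words = ["cw1", "cw2", "cw3", "cw4", "cw5", "cw6"]
mixed = scramble(words, SEED)
recovered = unscramble(mixed, SEED)
```

With such an arrangement, a burst error that destroys adjacent code words in the bit stream affects spectral values that are spread over different frequency ranges after unscrambling.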
  • According to a fourth aspect of the present invention, instead of an arrangement of the priority code words or the non-priority code words that increases linearly with frequency, an arrangement is possible which arranges, e.g., only every nth code word in the grid and sorts the remaining code words in between.
  • The number of raster points for a bit stream is limited by the total length and the selected raster point spacing. If, for example, sampling with a low bandwidth is considered, the case may arise that the majority of the code words are psychoacoustically important code words, since the entire signal has a theoretically possible useful bandwidth of 8 kHz if a sampling rate of 16 kHz is used.
  • the methods and devices for decoding a bit stream work essentially in mirror image of the coding described.
  • In a general method for decoding a bit stream which represents an encoded audio signal, the encoded bit stream having code words of different lengths from a code table and a grid with equidistant raster points (10, 12, 14), the code words including priority code words which represent specific spectral values that are psychoacoustically significant compared to other spectral values, and the priority code words being aligned with raster points, (a) the distance D1 between two adjacent raster points is detected. If the distance between two raster points is known, (b) the priority code words aligned with the raster points can be rearranged in the encoded bit stream such that a frequency-linear arrangement thereof is obtained, the beginning of a priority code word coinciding with a raster point.
  • the priority code words are now again in the general frequency-linear arrangement shown in FIG. 2, with which (c) the priority code words can be decoded again with a code table to which they belong in order to obtain decoded spectral values.
  • the spacing of the raster points can be detected simply by taking from the side information of the bit stream which code table was coded with. Depending on the coding described, the distance is then, for example, the length of the longest code word in this table, which could be fixed in the encoder. If the distance is the length of the longest code word actually occurring in a part of the bit stream to which a code table is assigned, it is communicated to the decoder by means of the side information which is assigned to the bit stream, etc.
  • To re-sort the priority code words as well as the non-priority code words, the decoder applies, e.g., a pointer to the encoded bit stream. If the raster spacing is known to the decoder, it can jump to a raster point in the case of a frequency-linear arrangement of the priority code words and read in the code word beginning there. When the reading of a code word is finished, the pointer jumps to the next raster point and repeats the process described. Once all the priority code words have been read in, the non-priority code words are still in the bit stream.
  • The non-priority code words are already arranged linearly with frequency and can be decoded and transformed back without further sorting.
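The pointer-based reading described above can be sketched as follows. The code table, the example bit stream (three priority code words at bit offsets 0, 4 and 8) and the assumption that the decoder knows the number of priority code words are all hypothetical; the function names are illustrative only.

```python
# Hypothetical prefix-code table (code word -> value), not an AAC table.
TABLE = {"0": 0, "10": 1, "110": 2, "1110": 3, "1111": 4}


def read_codeword(bits, pos):
    """Read bits from pos until they form a complete code word.

    Raises IndexError if the stream ends inside a code word.
    """
    word = ""
    while word not in TABLE:
        word += bits[pos + len(word)]
    return word


def deinterleave(stream, spacing, n_priority):
    """Read priority code words at raster points, then the remaining bits."""
    used = [False] * len(stream)
    values = []
    for slot in range(n_priority):
        start = slot * spacing            # pointer jumps to the raster point
        word = read_codeword(stream, start)
        values.append(TABLE[word])
        for i in range(start, start + len(word)):
            used[i] = True

    # The bits left over form the non-priority code words in sequence.
    rest = "".join(bit for bit, taken in zip(stream, used) if not taken)
    pos = 0
    while pos < len(rest):
        word = read_codeword(rest, pos)
        values.append(TABLE[word])
        pos += len(word)
    return values


decoded = deinterleave("1011111000100", spacing=4, n_priority=3)
```

Because each priority code word is located by its raster point rather than by the end of its predecessor, a damaged priority code word does not shift the starting points of the other priority code words.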
  • Scramble information can either be transmitted as side information, or the scramble distribution is fixed a priori and thus also known to the decoder from the outset.
  • For the fourth aspect, too, there is always the option of agreeing on a fixed distribution or making it variable and then communicating it to the decoder via side information.
  • After determining a grid for a coded bit stream, by determining the grid spacing when using a single code table or the grid spacings when using multiple code tables, the priority code words must be positioned in the grid such that the start of each priority code word coincides with a grid point.
  • this positioning is carried out in such a way that code words from a sort of sort table are sorted sequentially into the quasi-empty grid. It starts with the first code word of the sort table.
  • priority code words By sorting the code words into the sort table, influence can be exerted on the priority code words, priority code words always being the code words of the sort table that have space in the grid, ie are available for the grid points.
  • code words from the sort table for which there are no more raster points there is no other option than to insert them into the free spaces of the bit stream.
  • the number of priority code words is not determined beforehand. Priority code words are written until the available memory for the coded bit stream is full, i.e. H. until no further priority code word can be written.
  • the size of the memory is determined from the total number of bits previously used for the spectral data, i. H. no additional bits are required due to the screening.
  • the memory is therefore limited by the number of code words so that the coding efficiency does not drop due to the grid arrangement. Of course, all code words could be placed on grid points to make them error-free. However, this would lead to a considerable decrease in the coding efficiency since the bits that remain free between the raster points are unused.
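The placement rule described above can be sketched in Python. This is an illustration under stated assumptions rather than the patent's implementation: code words are modelled as bit strings already in sort-table order, the name `place_on_grid` is hypothetical, and the sketch assumes a single raster spacing that is at least as long as the longest code word.

```python
def place_on_grid(sorted_codewords, spacing):
    """Write code words (bit strings in sort-table order) into a raster.

    Returns (bit stream, number of priority code words).
    """
    if any(len(word) > spacing for word in sorted_codewords):
        raise ValueError("spacing must be at least the longest code word length")

    # The memory is exactly as large as the code words themselves:
    # the raster costs no additional bits.
    total_bits = sum(len(word) for word in sorted_codewords)
    stream = [None] * total_bits

    # Priority code words: each starts exactly on a raster point.
    n_priority = 0
    for k, word in enumerate(sorted_codewords):
        start = k * spacing
        if start + len(word) > total_bits:
            break                                # memory full, no further raster slot
        stream[start:start + len(word)] = list(word)
        n_priority += 1

    # Remaining code words are written bit by bit into the free spaces.
    remaining = iter("".join(sorted_codewords[n_priority:]))
    for i, bit in enumerate(stream):
        if bit is None:
            stream[i] = next(remaining)
    return "".join(stream), n_priority
```

With the code words `["111", "0", "10", "110"]` and a spacing of 3, the first three code words land on the raster points at bits 0, 3 and 6, and the bits of the fourth fill the gaps, giving the 9-bit stream `111011100`. The number of free positions always equals the number of leftover bits, because the buffer length is the sum of all code word lengths.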
  • The first aspect of the present invention relates to determining the priority code words, i.e. the code words that represent spectral values which are psychoacoustically significant compared with other spectral values.
  • A psychoacoustically significant spectral line is, for example, one that contains more energy than another spectral line. In general, the more energy a spectral line contains, the more important it is. It is therefore important that high-energy spectral lines are not disturbed, and just as important that high-energy spectral lines do not arise through errors.
  • So far it has been assumed that the high-energy spectral lines lie preferably in the lower part of the spectrum. This is true in many cases, but not in all.
  • The present invention dispenses with this assumption by using an implicit indicator to estimate the energy of the spectral line coded in a code word, or of the spectral lines when several spectral lines are coded in one code word.
  • This indicator is the code book or code table used, which can be, for example, a Huffman code table.
  • Eleven tables are used in the AAC standard. The value ranges of these tables differ significantly.
  • The maximum absolute values in Tables 1 to 11 are as follows:
  • The maximum error depends on the table. Taking into account the sign, which is either explicitly contained in the table or transmitted outside it, the maximum error for each table is twice the absolute value mentioned.
  • The priority code words are determined on the basis of the code table used, the indicator being the highest absolute value and, implicitly, the code table number.
  • Code words whose code table has the largest range of values are taken into account first. They are followed by code words whose code table has the second-largest range of values, and so on.
  • Table 11 is therefore taken into account first, followed by tables 9 and 10, until finally tables 1 and 2 follow with the lowest priority.
  • The priority code words placed at raster points are thus the code words in the sort table for which raster points are available.
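The table-based ordering can be sketched as a stable sort. The list of maximum absolute values did not survive in this copy of the text, so the `MAX_ABS` figures below are an assumption: they are the values commonly cited for the AAC spectral Huffman code books (with 8191 for the escape table 11) and are consistent with the stated order 11, then 9 and 10, down to 1 and 2. The function names and the `(code word, table number)` representation are hypothetical.

```python
# Assumed largest absolute value per AAC code table (not taken from this text).
MAX_ABS = {1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4,
           7: 7, 8: 7, 9: 12, 10: 12, 11: 8191}


def max_error(table):
    """Worst-case error of a disturbed code word: twice the absolute value,
    because the sign can flip as well."""
    return 2 * MAX_ABS[table]


def sort_by_table_priority(entries):
    """Order (codeword, table_number) pairs by descending value range.

    sorted() is stable, so code words from equally ranked tables keep
    their original (frequency-linear) order.
    """
    return sorted(entries, key=lambda entry: MAX_ABS[entry[1]], reverse=True)
```

The front of the returned list is what would be written to raster points first; how many entries actually become priority code words depends on how many raster points fit into the bit stream.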
  • The second aspect of the present invention relates to the use of short sampling windows ("short windows"), as opposed to long windows, for transforming discrete-time samples of the audio signal into the frequency domain in order to obtain spectral values which represent the audio signal.
  • Short sampling windows ("short windows") are defined in the AAC standard as well as in the Layer 3 standard. With short windows, several short MDCTs are used instead of one long MDCT.
  • A group of eight MDCTs with 128 output values each is used. This increases the temporal resolution of the encoder at the expense of frequency resolution.
  • Short windows are used for transient signals. If short windows are used, there are, e.g. in AAC, eight consecutive complete spectra, i.e. eight sets of spectral values, each set spanning the entire spectrum. Compared with long windows, however, the spacing between the spectral values is eight times as large. This reflects the reduced frequency resolution, in exchange for which the encoder obtains a higher time resolution.
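The factor of eight can be checked with a short calculation. The sampling rate $f_s$ is a symbol introduced here for illustration, and the long-window MDCT size of 1024 output values is the usual AAC figure rather than something stated in this passage:

```latex
\Delta f_{\text{long}} = \frac{f_s}{2 \cdot 1024}, \qquad
\Delta f_{\text{short}} = \frac{f_s}{2 \cdot 128} = 8\,\Delta f_{\text{long}}, \qquad
8 \times 128 = 1024 .
```

Eight short windows therefore deliver the same number of spectral values as one long window, but each short spectrum samples the frequency axis eight times more coarsely.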
  • If each group contains only one window, eight sets of scale factors must be transmitted.
  • To achieve better data compression, the AAC standard usually combines several windows into one group, taking psychoacoustic requirements into account, which reduces the number of scale factors to be transmitted.
  • The spectral data are transmitted sequentially group by group, i.e. written into the coded bit stream. Within the groups, so-called interleaving is carried out.
  • Presorting is performed according to the second aspect of the present invention.
  • The grouping and the scale-factor-band-wise consideration are abandoned.
  • A new presort is carried out, in units of spectral lines.
  • Each unit contains 4 spectral lines.
  • Each window thus contains 32 units, which corresponds to 128 spectral lines.
  • The spectral data are arranged as follows:
  • This pre-sorting ensures that the corresponding spectral ranges of all windows are adjacent to each other, i.e. that the low-frequency spectral values from the individual sets of spectral values are written into the front area of the sorting table before the high-frequency spectral values. If the spectral values in the lower spectral range are of particular psychoacoustic importance, the code words can be sorted from the sorting table into the grid starting from this pre-sorting.
  • Pre-sorting the code words into a sorting table corresponds to determining the priority code words, since the table itself determines which code words can, with a high degree of probability, be written at raster points.
  • The code words that can very probably be positioned at raster points, i.e. the priority code words, are the code words at the beginning, i.e. in the front or upper area, of the sort table.
  • This presorting need not be carried out by means of a sort table; it can also be carried out by indexing the individual code words, the indexing determining the order in which the indexed code words are written into the bit stream.
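The unit-wise presort can be sketched in Python. The exact listing of the arrangement is missing from this copy, so the order used here (unit 0 of every window, then unit 1 of every window, and so on) is inferred from the description that the same spectral range of all windows ends up adjacent; the function name `presort_units` and the data layout are hypothetical.

```python
def presort_units(windows, lines_per_unit=4):
    """Interleave short-window spectra unit by unit.

    windows: list of equally long sequences, one per short window,
             each holding that window's spectral lines in frequency order.
    Returns one flat list: unit 0 of window 0, unit 0 of window 1, ...,
    unit 1 of window 0, and so on.
    """
    n_lines = len(windows[0])
    if n_lines % lines_per_unit != 0:
        raise ValueError("window length must be a multiple of the unit size")

    table = []
    for unit in range(n_lines // lines_per_unit):
        lo = unit * lines_per_unit
        for window in windows:
            table.extend(window[lo:lo + lines_per_unit])
    return table
```

For eight windows of 128 lines each, the result has 1024 entries; the first 32 entries are the four lowest lines of each of the eight windows, so low frequencies from all windows sit at the front of the sort table. The indexing variant mentioned above would compute the same order as a list of positions instead of moving the data.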
  • The code tables are two-dimensional or four-dimensional, i.e. a code word encodes two or four spectral values, respectively. It is therefore expedient to group four spectral lines, or a multiple thereof, into one unit, since code words which code the same frequency range can then be sorted in immediate succession.
  • The number of spectral lines in a unit is therefore preferably divisible by the different dimensions of the code tables, i.e. the number of lines per unit must be a common multiple of the numbers of lines per code word, ideally the least common multiple.
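The unit size follows directly from the code table dimensions. A minimal sketch, with the hypothetical helper name `lines_per_unit`:

```python
from functools import reduce
from math import gcd


def lines_per_unit(dimensions):
    """Least common multiple of the code table dimensions
    (spectral lines per code word)."""
    return reduce(lambda a, b: a * b // gcd(a, b), dimensions)
```

For two- and four-dimensional tables, `lines_per_unit([2, 4])` gives 4, which matches the unit size of 4 spectral lines used above: a unit then always holds a whole number of code words, either two or one.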
  • The present invention becomes particularly efficient when the first and the second aspect are combined. Once the unit-wise rearrangement according to the invention has been carried out for short windows, the priority code words can be determined by means of the code table indicator: the result of the unit rearrangement is rearranged again so that the code words from higher code tables become priority code words that are positioned at fixed grid points, which achieves a high level of error robustness. This combination is not strictly necessary, but gives the best results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Method for coding an audio signal in which, to obtain a coded bit stream, discrete-time samples of the audio signal are transformed into the frequency domain to obtain spectral values. The spectral values are coded using a code table having a limited number of code words of different lengths to obtain spectral values coded by code words, the length of a code word assigned to a spectral value being shorter the higher the probability of occurrence of that spectral value. A raster is then determined for the coded bit stream, the raster having equidistant raster points and the spacing of the raster points depending on the code table or tables. To obtain error-robust Huffman coding, priority code words, which represent particular spectral values that are psychoacoustically significant compared with other spectral values, are arranged in the raster such that each priority code word coincides with a raster point.
PCT/EP1998/008475 1997-10-24 1998-12-28 Procede et dispositifs pour le codage ou le decodage d'un signal audio ou d'un train de bits WO2000039933A1 (fr)

Priority Applications (12)

Application Number Priority Date Filing Date Title
DE19747119A DE19747119C2 (de) 1997-10-24 1997-10-24 Verfahren und Vorrichtungen zum Codieren bzw. Decodieren eines Audiosignals bzw. eines Bitstroms
DE19840853A DE19840853B4 (de) 1997-10-24 1998-09-07 Verfahren und Vorrichtungen zum Codieren eines Audiosignals
EP98119235A EP0911981B1 (fr) 1997-10-24 1998-10-12 Méthode et dispositif de codage/ décodage d'un signal audio
PCT/EP1998/008475 WO2000039933A1 (fr) 1997-10-24 1998-12-28 Procede et dispositifs pour le codage ou le decodage d'un signal audio ou d'un train de bits
JP2000591732A JP3580777B2 (ja) 1998-12-28 1998-12-28 オーディオ信号又はビットストリームの符号化又は復号化のための方法及び装置
CA002356869A CA2356869C (fr) 1998-12-28 1998-12-28 Procede et dispositifs pour le codage ou le decodage d'un signal audio ou d'un train de bits
AU21636/99A AU754877B2 (en) 1998-12-28 1998-12-28 Method and devices for coding or decoding an audio signal or bit stream
US09/869,401 US6975254B1 (en) 1998-12-28 1998-12-28 Methods and devices for coding or decoding an audio signal or bit stream
KR10-2001-7008203A KR100391935B1 (ko) 1998-12-28 1998-12-28 오디오 신호를 코딩 또는 디코딩하는 방법 및 디바이스
JP2004099418A JP3902642B2 (ja) 1997-10-24 2004-03-30 オーディオ信号又はビットストリームの符号化又は復号化のための方法及び装置
JP2004099417A JP4168000B2 (ja) 1997-10-24 2004-03-30 オーディオ信号又はビットストリームの符号化又は復号化のための方法及び装置
JP2004099419A JP3978194B2 (ja) 1997-10-24 2004-03-30 オーディオ信号又はビットストリームの復号化のための装置及び方法

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
DE19747119A DE19747119C2 (de) 1997-10-24 1997-10-24 Verfahren und Vorrichtungen zum Codieren bzw. Decodieren eines Audiosignals bzw. eines Bitstroms
DE19840853A DE19840853B4 (de) 1997-10-24 1998-09-07 Verfahren und Vorrichtungen zum Codieren eines Audiosignals
PCT/EP1998/008475 WO2000039933A1 (fr) 1997-10-24 1998-12-28 Procede et dispositifs pour le codage ou le decodage d'un signal audio ou d'un train de bits
JP2004099418A JP3902642B2 (ja) 1997-10-24 2004-03-30 オーディオ信号又はビットストリームの符号化又は復号化のための方法及び装置
JP2004099417A JP4168000B2 (ja) 1997-10-24 2004-03-30 オーディオ信号又はビットストリームの符号化又は復号化のための方法及び装置
JP2004099419A JP3978194B2 (ja) 1997-10-24 2004-03-30 オーディオ信号又はビットストリームの復号化のための装置及び方法

Publications (1)

Publication Number Publication Date
WO2000039933A1 true WO2000039933A1 (fr) 2000-07-06

Family

ID=36972705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP1998/008475 WO2000039933A1 (fr) 1997-10-24 1998-12-28 Procede et dispositifs pour le codage ou le decodage d'un signal audio ou d'un train de bits

Country Status (3)

Country Link
JP (3) JP4168000B2 (fr)
DE (1) DE19747119C2 (fr)
WO (1) WO2000039933A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7610195B2 (en) 2006-06-01 2009-10-27 Nokia Corporation Decoding of predictively coded data using buffer adaptation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19907728C2 (de) 1999-02-23 2001-03-01 Fraunhofer Ges Forschung Vorrichtung und Verfahren zum Erzeugen eines Datenstroms und Vorrichtung und Verfahren zum Lesen eines Datenstroms
DE19907729C2 (de) * 1999-02-23 2001-02-22 Fraunhofer Ges Forschung Verfahren und Vorrichtung zum Erzeugen eines Datenstroms aus Codeworten variabler Länge und Verfahren und Vorrichtung zum Lesen eines Datenstroms aus Codeworten variabler Länge

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0717503A2 (fr) * 1989-04-17 1996-06-19 Fraunhofer-Gesellschaft Zur Förderung Der Angewandten Forschung E.V. Procédé de codage et de décodage numérique
EP0492537A1 (fr) * 1990-12-21 1992-07-01 Matsushita Electric Industrial Co., Ltd. Dispositif d'enregistrement d'information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHAW-MIN LEI: "THE CONSTRUCTION OF EFFICIENT VARIABLE-LENGTH CODES WITH CLEAR SYNCHRONIZING CODEWORDS FOR DIGITAL VIDEO APPLICATIONS", VISUAL COMMUNICATION AND IMAGE PROCESSING '91: VISUAL COMMUNICATION, BOSTON, NOV. 11 - 13, 1991, vol. PART 2, no. VOL. 1605, 11 November 1991 (1991-11-11), KOU-HU TZOU;TOSHIO KOGA, pages 863 - 873, XP000479292 *

Also Published As

Publication number Publication date
DE19747119C2 (de) 2003-01-16
DE19747119A1 (de) 1999-04-29
JP3902642B2 (ja) 2007-04-11
JP2004264860A (ja) 2004-09-24
JP3978194B2 (ja) 2007-09-19
JP2004234020A (ja) 2004-08-19
JP4168000B2 (ja) 2008-10-22
JP2004234021A (ja) 2004-08-19

Similar Documents

Publication Publication Date Title
EP1112621B1 (fr) Dispositif et procede pour effectuer un codage entropique de mots d'information et dispositif et procede pour decoder des mots d'information ayant subi un codage entropique
EP0910927B1 (fr) Procede de codage et de decodage de valeurs spectrales stereophoniques
DE3943879B4 (de) Digitales Codierverfahren
EP0193143B1 (fr) Procédé de transmission d'un signal audio
EP0609300B1 (fr) Procede de transmission ou de memorisation d'un signal audio code numerise compose d'une sequence de blocs d'informations, par des canaux brouilles
DE19921122C1 (de) Verfahren und Vorrichtung zum Verschleiern eines Fehlers in einem codierten Audiosignal und Verfahren und Vorrichtung zum Decodieren eines codierten Audiosignals
DE3639753C2 (fr)
DE60015448T2 (de) Teilband-Audiokodiersystem
EP0954909A1 (fr) Procede de codage d'un signal audio
DE60022837T2 (de) Vorrichtung zur Teilbandcodierung
EP1155498B1 (fr) Dispositif et procede pour produire un flux de donnees, et dispositif et procede pour lire un flux de donnees
DE19747119C2 (de) Verfahren und Vorrichtungen zum Codieren bzw. Decodieren eines Audiosignals bzw. eines Bitstroms
EP1458103B1 (fr) Méthode et dispositif de codage/décodage d'un signal audio
DE19742201C1 (de) Verfahren und Vorrichtung zum Codieren von Audiosignalen
DE19907729C2 (de) Verfahren und Vorrichtung zum Erzeugen eines Datenstroms aus Codeworten variabler Länge und Verfahren und Vorrichtung zum Lesen eines Datenstroms aus Codeworten variabler Länge

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): AU CA JP KR US
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase Ref country code: JP; Ref document number: 2000 591732; Kind code of ref document: A; Format of ref document f/p: F
WWE Wipo information: entry into national phase Ref document number: 21636/99; Country of ref document: AU
ENP Entry into the national phase Ref document number: 2356869; Country of ref document: CA; Ref country code: CA; Ref document number: 2356869; Kind code of ref document: A; Format of ref document f/p: F
WWE Wipo information: entry into national phase Ref document number: 1020017008203; Country of ref document: KR
WWE Wipo information: entry into national phase Ref document number: 09869401; Country of ref document: US
WWP Wipo information: published in national office Ref document number: 1020017008203; Country of ref document: KR
WWG Wipo information: grant in national office Ref document number: 21636/99; Country of ref document: AU
WWG Wipo information: grant in national office Ref document number: 1020017008203; Country of ref document: KR