WO2003088212A1 - Device and method for encoding a discrete-time audio signal and device and method for decoding coded audio data - Google Patents

Device and method for encoding a discrete-time audio signal and device and method for decoding coded audio data

Publication number
WO2003088212A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
integer
spectral values
difference
discrete
Prior art date
Application number
PCT/EP2002/013623
Other languages
German (de)
English (en)
Inventor
Ralf Geiger
Thomas Sporer
Karlheinz Brandenburg
Jürgen HERRE
Jürgen Koller
Joachim Deguara
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V
Priority date
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V.
Priority to AU2002358578A priority Critical patent/AU2002358578B2/en
Priority to KR1020047016744A priority patent/KR100892152B1/ko
Priority to AT02792858T priority patent/ATE305655T1/de
Priority to DE50204426T priority patent/DE50204426D1/de
Priority to EP02792858A priority patent/EP1495464B1/fr
Priority to JP2003585070A priority patent/JP4081447B2/ja
Priority to CA002482427A priority patent/CA2482427C/fr
Publication of WO2003088212A1 publication Critical patent/WO2003088212A1/fr
Priority to US10/966,780 priority patent/US7275036B2/en
Priority to HK05109316A priority patent/HK1077391A1/xx

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components

Definitions

  • The present invention relates to audio coding and audio decoding, and in particular to scalable coding/decoding algorithms with a psychoacoustic first scaling layer and a second scaling layer that comprises additional audio data for lossless decoding.
  • Modern audio coding methods such as MPEG Layer 3 (MP3) or MPEG AAC use transformations such as the so-called modified discrete cosine transform (MDCT) in order to obtain a block-wise frequency representation of an audio signal.
  • Such an audio encoder usually receives a stream of discrete-time audio samples. The stream of audio samples is windowed to obtain a windowed block of, for example, 1024 or 2048 windowed audio samples.
  • Various window functions are used for windowing, e.g. a sine window.
  • The windowed, time-discrete audio samples are then converted into a spectral representation using a filter bank.
  • In principle this can be a Fourier transform or, for special reasons, a variant of the Fourier transform, such as an FFT or, as has been stated, an MDCT.
  • the block of audio spectral values at the output of the filter bank can then be further processed as required.
  • The audio spectral values are quantized, the quantization levels typically being selected so that the quantization noise introduced by the quantization lies below the psychoacoustic masking threshold, i.e. is "masked away".
  • The quantization is a lossy coding.
  • The quantized spectral values are then entropy-encoded, for example by means of Huffman coding.
  • A bit stream multiplexer forms a bit stream from the entropy-coded quantized spectral values; this bit stream can be stored or transmitted.
  • the bit stream is divided into coded quantized spectral values and side information using a bit stream demultiplexer.
  • the entropy-coded quantized spectral values are first entropy-decoded in order to obtain the quantized spectral values.
  • The quantized spectral values are then inversely quantized in order to obtain decoded spectral values. These contain quantization noise, which, however, lies below the psychoacoustic masking threshold and will therefore be inaudible.
  • These spectral values are then converted into a temporal representation by means of a synthesis filter bank in order to obtain time-discrete decoded audio samples.
  • A transformation algorithm inverse to the forward transformation algorithm must be used in the synthesis filter bank.
  • In FIG. 4a, for example, 2048 time-discrete audio samples are first taken and windowed using a device 402.
  • the window which the device 402 embodies has a window length of 2N samples and supplies a block of 2N windowed samples on the output side.
  • a second block of 2N windowed samples is formed by means of a device 404, which is shown separately from the device 402 in FIG. 4a only for reasons of clarity.
  • the 2048 samples fed into device 404 are not the discrete-time audio samples immediately following the first window, but rather contain the second half of the samples windowed by device 402 and additionally only contain 1024 "new" samples.
  • The overlap is symbolically represented by a device 406 in FIG. 4a, which causes a degree of overlap of 50%.
  • Both the 2N windowed samples output by means 402 and the 2N windowed samples output by means 404 are then subjected to the MDCT algorithm by means 408 and 410, respectively.
  • the device 408 supplies N spectral values for the first window in accordance with the known MDCT algorithm, while the device 410 also supplies N spectral values, but for the second window, with an overlap of 50% between the first window and the second window.
  • The N spectral values of the first window are fed to a device 412, which performs an inverse modified discrete cosine transform. The same applies to the N spectral values of the second window. These are fed to a device 414, which also carries out an inverse modified discrete cosine transform. The device 412 and the device 414 deliver 2N samples for the first window and 2N samples for the second window, respectively.
  • a sample y1 of the second half of the first window, that is, with an index N + k
  • a sample y2 from the first half of the second window, that is, with an index k
  • If the window function implemented by the device 402 or 404 is designated w(k), where the index k represents the time index, then the following condition must be fulfilled:
  • The window weight w(k) squared plus the window weight w(N + k) squared gives 1, that is, w(k)^2 + w(N + k)^2 = 1, where k runs from 0 to N - 1. If a sine window is used, whose window weights follow the first half-wave of the sine function, this condition is always fulfilled, since the square of the sine and the square of the cosine together give the value 1 for every angle.
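The window condition and the overlap-add reconstruction described above can be checked numerically. The following is a minimal sketch, not the patent's implementation: the exact sine window formula with the half-sample offset, the MDCT phase term, the 2/N scaling of the inverse transform, and the use of the same window again at synthesis are assumptions taken from common MDCT practice, and all function names are hypothetical.

```python
import math

def sine_window(N):
    # Assumed sine window of length 2N: w(k) = sin(pi * (k + 0.5) / (2N)).
    return [math.sin(math.pi * (k + 0.5) / (2 * N)) for k in range(2 * N)]

def mdct(block):
    # 2N windowed samples -> N spectral values (direct O(N^2) evaluation).
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(spec):
    # N spectral values -> 2N time-aliased samples (scaling 2/N assumed for windowed use).
    N = len(spec)
    return [2.0 / N * sum(spec[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

N = 8
w = sine_window(N)

# Condition from the text: w(k)^2 + w(N + k)^2 = 1 for k = 0 .. N-1.
window_ok = all(abs(w[k] ** 2 + w[N + k] ** 2 - 1.0) < 1e-12 for k in range(N))

samples = [((37 * i) % 23) - 11 for i in range(3 * N)]   # arbitrary integer test samples
first = [w[k] * samples[k] for k in range(2 * N)]        # first window (device 402)
second = [w[k] * samples[N + k] for k in range(2 * N)]   # second window, 50% overlap (device 404)

y1 = imdct(mdct(first))    # devices 408 and 412
y2 = imdct(mdct(second))   # devices 410 and 414

# Overlap-add: second half of window 1 (index N + k) plus first half of window 2 (index k).
decoded = [w[N + k] * y1[N + k] + w[k] * y2[k] for k in range(N)]
max_error = max(abs(decoded[k] - samples[N + k]) for k in range(N))
```

Note that `first` and `second` already contain non-integer values even though `samples` are integers, which is the effect the following paragraph identifies as a disadvantage.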
  • A disadvantage of the windowing described in FIG. 4a with subsequent MDCT is that, when a sine window is considered, windowing multiplies each discrete-time sample by a floating-point number, since the sine of an angle between 0 and 180 degrees is not an integer except at 90 degrees. So even if integer discrete-time samples are windowed, floating-point numbers result after windowing.
  • the residual signal which may still be present is encoded with a time-domain encoder and written into a bit stream which, in addition to the time-domain-encoded residual signal, also comprises encoded spectral values which have been quantized in accordance with the quantizer settings which were present at the time the iteration was terminated.
  • The quantizer used does not have to be controlled by a psychoacoustic model, so that the coded spectral values obtained are typically quantized more accurately than the psychoacoustic model would require.
  • The first lossy data compression module comprises, e.g., an MPEG encoder, which takes a block-wise digital signal as its input signal and generates the compressed bit stream.
  • The coding is undone and an encoded/decoded signal is generated. This signal is compared to the original input signal by subtracting the encoded/decoded signal from the original input signal. The error signal is then fed into a second module, where lossless bit conversion is used. This conversion has two steps.
  • The first step is to convert from a two's complement format to a sign-magnitude format.
  • the second step is to convert from a vertical magnitude sequence to a horizontal bit sequence in a processing block.
  • the lossless data conversion is carried out in order to maximize the number of zeros or to maximize the number of successive zeros in a sequence in order to achieve the best possible compression of the temporal error signal which is present as a sequence of digital numbers.
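The two conversion steps can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names, the fixed number of magnitude bits, and the separate sign list are hypothetical and are not taken from the cited scheme.

```python
def to_sign_magnitude(value):
    # Step 1: represent a (two's complement) integer as a sign bit and a magnitude.
    return (1 if value < 0 else 0, abs(value))

def to_bit_planes(values, num_bits):
    # Step 2: turn the "vertical" list of magnitudes into "horizontal" bit sequences,
    # one row per bit position, most significant bit first.
    pairs = [to_sign_magnitude(v) for v in values]
    signs = [s for s, _ in pairs]
    planes = [[(m >> bit) & 1 for _, m in pairs]
              for bit in range(num_bits - 1, -1, -1)]
    return signs, planes

def from_bit_planes(signs, planes):
    # Inverse conversion, to show that no information is lost.
    magnitudes = [0] * len(signs)
    for plane in planes:
        magnitudes = [(m << 1) | b for m, b in zip(magnitudes, plane)]
    return [-m if s else m for s, m in zip(signs, magnitudes)]

error_block = [0, -1, 2, 0, 1, -3, 0, 0]   # hypothetical small error signal
signs, planes = to_bit_planes(error_block, num_bits=3)
restored = from_bit_planes(signs, planes)
```

For small error values the upper bit planes consist mostly of zeros, whereas the two's complement form of a small negative number has many leading ones; this is why the conversion tends to produce long runs of zeros.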
  • This principle is based on a bit slice arithmetic coding (BSAC) scheme, which is described in the specialist publication "Multi-Layer Bit Sliced Bit Rate Scalable Audio Coder", 103rd AES Convention, Preprint No. 4520, 1997.
  • BSAC bit slice arithmetic coding
  • A disadvantage of the concepts described above is the fact that the data for the lossless extension layer, i.e. the additional data required to achieve lossless decoding of the audio signal, must be obtained in the time domain.
  • Complete decoding, including frequency/time conversion, is required to obtain the encoded/decoded signal in the time domain. The error signal is then calculated by sample-wise difference formation between the original audio input signal and the encoded/decoded audio signal, which is lossy due to the psychoacoustic coding.
  • This concept is particularly disadvantageous in that the encoder that generates the audio data stream requires both a complete time/frequency conversion device, such as a filter bank or an MDCT algorithm, for the forward transformation and, solely to generate the error signal, a complete inverse filter bank or a complete synthesis algorithm.
  • the encoder must therefore, in addition to its inherent encoder functionalities, also contain the complete decoder functionality. If the encoder is implemented in software, both memory capacities and processor capacities are required for this, which leads to an encoder implementation with increased effort.
  • the object of the present invention is to create a less complex concept by means of which an audio data stream can be generated which can be decoded at least almost without loss.
  • This object is achieved by a device for encoding a discrete-time audio signal according to claim 1, by a method for encoding a discrete-time audio signal according to claim 21, by a device for decoding encoded audio data according to claim 22, by a method for decoding encoded audio data according to claim 31, or by a computer program according to claim 32 or 33.
  • The present invention is based on the finding that the additional audio data which enable lossless decoding of the audio signal can be obtained by providing a block of quantized spectral values as usual and then inversely quantizing it in order to obtain inversely quantized spectral values that are lossy due to the quantization using a psychoacoustic model. These inversely quantized spectral values are then rounded to obtain a rounding block of rounded inversely quantized spectral values.
  • An integer transformation algorithm is used as a reference for the difference formation. From a block of integer time-discrete samples it generates an integer block of spectral values, which has only integer spectral values.
  • The combination of the spectral values in the rounding block and in the integer block is now performed spectrally, that is to say in the frequency domain, so that no synthesis algorithm, that is to say an inverse filter bank or an inverse MDCT algorithm, etc., is required in the encoder itself. Due to the integer transformation algorithm and the rounded quantization values, the combination block which has the difference spectral values comprises only integer values, which can be entropy-coded in any known manner. It should be noted that any entropy encoder can be used for entropy coding of the combination block, such as a Huffman encoder or an arithmetic encoder.
  • Any encoder can also be used to encode the quantized spectral values of the quantization block.
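The difference formation in the frequency domain can be sketched as follows. This is an illustration under stated assumptions, not the patent's encoder: a uniform quantizer with a fixed step stands in for the psychoacoustically controlled quantizer, the spectral values are made-up numbers, and all names are hypothetical.

```python
import math

def quantize(spectrum, step):
    # Stand-in for the psychoacoustic quantizer: uniform quantization with one step size.
    return [math.floor(x / step + 0.5) for x in spectrum]

def inverse_quantize(quantized, step):
    return [q * step for q in quantized]

def round_block(values):
    # Rounding function mapping real numbers to integers (round half up).
    return [math.floor(v + 0.5) for v in values]

def difference_block(integer_block, rounding_block):
    # Combination in the frequency domain: integer spectrum minus rounded lossy spectrum.
    return [a - b for a, b in zip(integer_block, rounding_block)]

mdct_spectrum = [103.4, -57.9, 12.2, -3.6, 0.8]   # hypothetical floating-point spectrum
integer_block = [103, -58, 12, -4, 1]             # hypothetical integer-transform spectrum
step = 7.5

quantization_block = quantize(mdct_spectrum, step)
rounding_block = round_block(inverse_quantize(quantization_block, step))
diff = difference_block(integer_block, rounding_block)
```

No inverse transform appears anywhere in this encoder-side sketch; the difference values are integers by construction and can be passed directly to an entropy encoder.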
  • The coding/decoding concept according to the invention is compatible with modern coding tools, such as window switching, TNS or mid/side coding for multi-channel audio signals.
  • An MDCT is used to provide a quantization block of spectral values quantized using a psychoacoustic model.
  • It is preferred to use a so-called IntMDCT as the integer transformation algorithm.
  • Alternatively, the conventional MDCT can be dispensed with and the IntMDCT can be used as an approximation of the MDCT: the integer spectrum obtained by the integer transformation algorithm is supplied to a psychoacoustic quantizer to obtain quantized IntMDCT spectral values, which are then inversely quantized and rounded again to be compared with the original integer spectral values.
  • In this case only the IntMDCT is needed, which generates integer spectral values from integer, discrete-time samples.
  • Processors typically work with integers, or each floating-point number can be represented as an integer. If integer arithmetic is used in a processor, rounding of the inversely quantized spectral values can be dispensed with, since the arithmetic of the processor yields rounded values anyway, namely within the accuracy of the LSB, i.e. the least significant bit. In this case, completely lossless processing is achieved, i.e. processing within the accuracy of the processor system used. Alternatively, however, rounding can be carried out to a coarser accuracy by rounding the difference signal in the combination block to the accuracy defined by a rounding function. Introducing rounding beyond the inherent rounding of a processor system provides the flexibility to influence the "degree" of losslessness of the coding in order to create an almost lossless encoder in the sense of data compression.
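A coarser rounding of the difference values can be sketched as below. The choice of rounding to multiples of a power of two, and the name `coarse_round`, are assumptions made for illustration; the text only requires some rounding function that defines the accuracy.

```python
def coarse_round(value, shift):
    # Round an integer to the nearest multiple of 2**shift.
    # shift = 0 keeps full LSB accuracy (lossless case).
    step = 1 << shift
    return ((value + (step >> 1)) >> shift) << shift

difference_values = [-2, 2, -3, -4, 1, 13, -27]        # hypothetical difference block
lossless = [coarse_round(v, 0) for v in difference_values]
near_lossless = [coarse_round(v, 2) for v in difference_values]
max_deviation = max(abs(a - b) for a, b in zip(difference_values, near_lossless))
```

With `shift = 2` each value deviates from the original by at most 2, and the two lowest bits of every rounded value are zero, so fewer bits remain to be coded; this is one way to trade accuracy against data rate.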
  • the decoder according to the invention is characterized in that both the psychoacoustically encoded audio data and the additional audio data are extracted from the audio data, subjected to a possibly existing entropy decoding and then processed as follows. First, the quantization block is inversely quantized in the decoder and rounded using the same rounding function that was also used in the encoder, in order then to be added to the entropy-decoded additional audio data.
  • The decoder then has both a psychoacoustically compressed spectral representation of the audio signal and a lossless representation of the audio signal. The psychoacoustically compressed spectral representation must be converted into the time domain in order to obtain a lossy coded/decoded audio signal, while the lossless representation is converted into the time domain using an integer transformation algorithm inverse to the integer transformation algorithm used in the encoder, in order to obtain an audio signal which is encoded/decoded losslessly or, as has been explained, almost losslessly.
  • FIG. 1 shows a block diagram of a preferred device for processing discrete-time audio samples in order to obtain integer values from which integer spectral values can be determined;
  • FIG. 3 shows a representation to illustrate the decomposition of the MDCT with 50 percent overlap into rotations and DCT-IV operations;
  • FIG. 4a is a schematic block diagram of a known encoder with MDCT and 50 percent overlap;
  • FIG. 4b is a block diagram of a known decoder for decoding the values generated by FIG. 4a;
  • FIG. 5 shows a basic block diagram of a preferred encoder according to the invention
  • FIG. 6 shows a basic block diagram of an alternative encoder preferred according to the invention.
  • FIG. 7 shows a basic block diagram of a decoder preferred according to the invention
  • FIG. 8a shows a schematic representation of a bit stream with a first scaling layer and a second scaling layer;
  • FIG. 8b shows a schematic representation of a bit stream with a first scaling layer and a plurality of further scaling layers;
  • FIG. 9 shows a schematic representation of binary-coded differential spectral values to illustrate possible scalings with regard to the accuracy (bits) of the differential spectral values and / or with regard to the frequency (sampling rate) of the differential spectral values.
  • the encoder according to the invention shown in FIG. 5 includes an input 50, into which a time-discrete audio signal can be fed, and an output 52, from which coded audio data can be output.
  • the time-discrete audio signal fed in at input 50 is fed into a device 52 for supplying a quantization block, which on the output side supplies a quantization block of the time-discrete audio signal, which has 54 quantized spectral values of the time-discrete audio signal 50 using a psychoacoustic model.
  • the encoder according to the invention further comprises a device for generating an integer block using an integer transformation algorithm 56, the integer algorithm being effective to generate integer spectral values from integer discrete-time samples.
  • the encoder according to the invention further comprises means 58 for inversely quantizing the quantization block output by means 52 and, if an accuracy other than processor accuracy is required, a rounding function. If you want to go as far as the accuracy of the processor system as it has been executed, the rounding function is already inherent in the inverse quantization of the quantization block, since a processor that has an integer arithmetic is not capable anyway to deliver non-integer values.
  • the device 58 thus supplies a so-called rounding block, which comprises inversely quantized spectral values that are integers, that is, have been inherently or explicitly rounded.
  • Both the rounding block and the integer block are fed to a combination device which, using a difference formation, supplies a difference block with difference spectral values, where the expression "difference block" is intended to indicate that the difference spectral values are values that include differences between the integer block and the rounding block.
  • Both the quantization block which is output from the device 52 and the difference block which is output from the difference-forming device 58 are fed to a processing device 60 which, e.g., performs normal processing of the quantization block and further, e.g., entropy-codes the difference block.
  • The processing device 60 outputs coded audio data at the output 52, which include both information about the quantization block and information about the difference block.
  • the discrete-time audio signal is converted into its spectral representation by means of an MDCT and then quantized.
  • the device 52 for supplying the quantization block thus consists of the MDCT device 52a and a quantizer 52b.
  • The processing device 60 shown in FIG. 5 is likewise shown as a bitstream coding device 60a for bitstream coding the quantization block which is output by the device 52b, and an entropy encoder 60b for entropy coding the difference block.
  • The bitstream encoder 60a outputs the psychoacoustically encoded audio data, while the entropy encoder 60b outputs an entropy-coded difference block.
  • the two output data of blocks 60a and 60b can be combined in a suitable manner in a bit stream which has the psychoacoustically encoded audio data as the first scaling layer and which has the additional audio data for lossless decoding as the second scaling layer.
  • the scaled bit stream then corresponds to the encoded audio data shown in FIG. 5 at the output 52 of the encoder.
  • the MDCT block 52a of FIG. 6 can be omitted, as is indicated in FIG. 5 by a dashed arrow 62.
  • The integer spectrum supplied by the integer transformation device 56 is then fed both into the difference-forming device 58 and into the quantizer 52b of FIG. 6.
  • The spectral values generated by the integer transformation are here used, to a certain extent, as an approximation of a conventional MDCT spectrum.
  • This embodiment has the advantage that only the IntMDCT algorithm is present in the encoder, and that the IntMDCT algorithm and the MDCT algorithm need not both be present in the encoder.
  • The solid blocks and lines represent a conventional audio encoder according to one of the MPEG standards, while the dashed blocks and lines represent the extension of such a conventional MPEG encoder. It can thus be seen that no fundamental change to the usual MPEG encoder is required, but that the extension according to the invention, which provides the additional audio data for lossless coding by means of an integer transformation, can be added without changing the basic encoder/decoder structure.
  • FIG. 7 shows a basic block diagram of a decoder according to the invention for decoding the coded audio data output at the output 52 of FIG. 5. These are first broken down into psychoacoustically coded audio data on the one hand and the additional audio data on the other.
  • the psychoacoustically encoded audio data are fed to a conventional bitstream decoder 70, while the additional audio data, if they have been entropy-encoded in the encoder, are entropy-decoded by means of an entropy decoder 72.
  • At the output of the bitstream decoder 70 of FIG. 7 there are quantized spectral values which are fed to an inverse quantizer 74, which in principle can be constructed identically to the inverse quantizer in the device of FIG. 6.
  • A rounding device 76 is also provided in the decoder, which performs the same rounding algorithm or the same rounding function for mapping a real number to an integer as is implemented in the device 58 of FIG. 6.
  • In a combiner 78 on the decoder side, the rounded inversely quantized spectral values are combined with the entropy-decoded additional audio data, preferably additively, so that the decoder has, first, inversely quantized spectral values at the output of the device 74 and, second, integer spectral values at the output of the combiner 78.
  • The spectral values at the output of the device 74 can then be converted into the time domain by means of a device 80 for carrying out an inverse modified discrete cosine transform in order to obtain a lossy, psychoacoustically coded and again decoded audio signal.
  • By means of a device 82 for performing an inverse integer MDCT (inverse IntMDCT), the output signal of the combiner 78 is also converted into its temporal representation in order to produce a losslessly coded/decoded audio signal or, if a correspondingly coarse rounding has been used, an almost losslessly encoded and decoded audio signal.
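The decoder-side combination can be sketched in a few lines. The uniform inverse quantizer, the example numbers, and the function names are assumptions for illustration; the inverse MDCT (device 80) and the inverse IntMDCT (device 82) are deliberately left out, so the sketch ends at the two spectra.

```python
import math

def inverse_quantize(quantized, step):
    # Stand-in for the inverse quantizer 74 (uniform step size assumed).
    return [q * step for q in quantized]

def round_block(values):
    # Same rounding function as assumed on the encoder side (rounding device 76).
    return [math.floor(v + 0.5) for v in values]

def decode_spectra(quantization_block, difference_block, step):
    lossy = inverse_quantize(quantization_block, step)
    rounded = round_block(lossy)
    # Combiner 78: add the decoded difference values to the rounded lossy spectrum.
    integer = [r + d for r, d in zip(rounded, difference_block)]
    return lossy, integer

# Hypothetical decoded layers: quantized values and entropy-decoded difference values.
lossy_spectrum, integer_spectrum = decode_spectra(
    [14, -8, 2, 0, 0], [-2, 2, -3, -4, 1], step=7.5)
```

The first return value would feed the conventional synthesis filter bank, the second the inverse integer transform; a decoder without the second layer simply stops after `inverse_quantize`.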
  • A particularly preferred embodiment of the entropy encoder 60b of FIG. 6 is discussed below. Since a conventional modern MPEG encoder has several code tables that are selected depending on average statistics of the quantized spectral values, it is preferred to use the same code tables or codebooks also for the entropy coding of the difference block at the output of the combiner 58. Since the magnitude of the difference block, that is to say of the remaining IntMDCT spectrum, depends on the accuracy of the quantization, a codebook selection for the entropy encoder 60b can be carried out without additional side information.
  • The spectral coefficients, that is to say the quantized spectral values in the quantization block, are grouped into scale factor bands, the spectral values being weighted with an amplification factor which is derived from a corresponding scale factor assigned to a scale factor band. Because in this known encoder concept a non-uniform quantizer is used to quantize the weighted spectral values, the size of the residual values, i.e. the spectral values at the output of combiner 58, depends not only on the scale factors but also on the quantized values themselves. However, since both the scale factors and the quantized spectral values are contained in the bit stream generated by the device 60a of FIG. 6, that is to say in the psychoacoustically coded audio data, it is preferred to carry out a codebook selection in the encoder depending on the size of the difference spectral values, and to determine in the decoder the code tables used in the encoder on the basis of both the scale factors transmitted in the bit stream and the quantized values.
  • Since no side information has to be transmitted to entropy-encode the difference spectral values at the output of the combiner 58, the entropy coding leads purely to data rate compression, without any signaling bits in the data stream having to be spent as side information for the entropy encoder 60b.
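The idea of selecting a codebook without side information can be illustrated with a rule that both encoder and decoder evaluate from data they already share. Everything specific in this sketch is an assumption: the power-law reconstruction with exponent 4/3, the gain 2^(sf/4), the mapping from expected residual size to a codebook index, and the function names are hypothetical and are not specified in the text.

```python
import math

def local_step(quantized_value, scale_factor):
    # Width of the quantization interval around a quantized value for an assumed
    # non-uniform (power-law) quantizer with gain 2**(scale_factor / 4).
    gain = 2.0 ** (scale_factor / 4.0)
    a = abs(quantized_value)
    return gain * ((a + 1) ** (4.0 / 3.0) - a ** (4.0 / 3.0))

def select_codebook(quantized_value, scale_factor, num_codebooks=8):
    # Expected residual magnitude is about half the local step width;
    # use its bit length as a hypothetical codebook index.
    expected = int(math.ceil(local_step(quantized_value, scale_factor) / 2.0))
    return min(expected.bit_length(), num_codebooks - 1)

# Encoder and decoder both know the scale factor and the quantized value,
# so both arrive at the same codebook without any signaling bits.
encoder_choice = select_codebook(10, 16)
decoder_choice = select_codebook(10, 16)
fine_quantization_choice = select_codebook(0, 0)
```

The design point is only that the selection is a deterministic function of transmitted data; any other rule with that property would serve equally well.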
  • a window switch is used to avoid pre-echoes in transient audio signal areas.
  • This technique is based on the possibility of individually selecting window shapes in each half of the MDCT window and allows the block size to be varied in successive blocks.
  • In the same way, the integer transformation algorithm in the form of the IntMDCT, which is discussed with reference to FIGS. 1 to 3, is also designed to use different window shapes in the windowing and in the time-domain aliasing portion of the MDCT decomposition. It is therefore preferred to use the same window decisions both for the integer transformation algorithm and for the transformation algorithm used to generate the quantization block.
  • TNS Temporal Noise Shaping
  • MS mid/side stereo coding
  • With TNS coding, just as with MS coding, the spectral values are modified before quantization.
  • The integer transformation algorithm is designed to allow both TNS coding and mid/side coding of integer spectral values.
  • the TNS technique is based on an adaptive forward prediction of the MDCT values over frequency.
  • The same prediction filter that is signal-adaptively calculated by a conventional TNS module is preferably also used to predict the integer spectral values. If this results in non-integer values, a subsequent rounding can be used to obtain integer values again. This rounding preferably takes place after each prediction step.
  • the original spectrum can be reconstructed in the decoder by using the inverse filter and the same rounding function.
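Prediction over frequency with rounding after each prediction step can be sketched as follows. The filter coefficients, the filter order, the direction of prediction from lower to higher frequency, and the function names are assumptions for illustration; a real TNS module computes its filter signal-adaptively.

```python
import math

def predict(history, coeffs):
    # Predict the next spectral value from already known lower-frequency values,
    # then round so that the prediction is an integer.
    acc = sum(c * h for c, h in zip(coeffs, reversed(history)))
    return math.floor(acc + 0.5)

def tns_forward(spectrum, coeffs):
    # Encoder side: replace each integer spectral value by its prediction residual.
    order = len(coeffs)
    return [x - predict(spectrum[max(0, k - order):k], coeffs)
            for k, x in enumerate(spectrum)]

def tns_inverse(residual, coeffs):
    # Decoder side: inverse filter with the same rounding function.
    order = len(coeffs)
    spectrum = []
    for k, e in enumerate(residual):
        spectrum.append(e + predict(spectrum[max(0, k - order):k], coeffs))
    return spectrum

coeffs = [0.9, -0.35]                                  # hypothetical prediction filter
int_spectrum = [120, 101, 87, 60, 44, -3, -17, 5]      # hypothetical integer spectral values
residual = tns_forward(int_spectrum, coeffs)
reconstructed = tns_inverse(residual, coeffs)
```

Because the decoder forms each prediction from the same integers with the same rounding, the residual stays integer and the original spectrum is recovered exactly.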
  • MS coding can also be applied to IntMDCT spectral values by using rounded Givens rotations with an angle of π/4 based on the lifting scheme. This allows the original IntMDCT values to be reconstructed in the decoder.
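A rounded Givens rotation built from lifting steps can be sketched as follows. The decomposition of a rotation into three lifting steps with coefficients (cos α - 1)/sin α and sin α is the standard one; the rounding function, the assignment of the two outputs, and the function names are assumptions for illustration.

```python
import math

ANGLE = math.pi / 4
P = (math.cos(ANGLE) - 1.0) / math.sin(ANGLE)   # coefficient of the first and third lifting step
S = math.sin(ANGLE)                             # coefficient of the middle lifting step

def rnd(v):
    return math.floor(v + 0.5)

def rotate(x, y):
    # Three lifting steps, each adding a rounded multiple of the other value.
    x = x + rnd(P * y)
    y = y + rnd(S * x)
    x = x + rnd(P * y)
    return x, y

def rotate_inverse(x, y):
    # Undo the lifting steps in reverse order by subtracting the same rounded terms.
    x = x - rnd(P * y)
    y = y - rnd(S * x)
    x = x - rnd(P * y)
    return x, y

left, right = 100, 0                      # hypothetical integer spectral values of two channels
a, b = rotate(left, right)
restored = rotate_inverse(a, b)
exact = (math.cos(ANGLE) * left - math.sin(ANGLE) * right,
         math.sin(ANGLE) * left + math.cos(ANGLE) * right)
```

Each lifting step changes only one of the two values by an amount computed from the other, so it can be undone exactly regardless of the rounding, while the result stays close to the exact rotation.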
  • the concept according to the invention is backwards compatible.
  • The perceptual encoder or decoder is not changed, but only extended. Additional information for the lossless components can be transmitted in a backwards-compatible manner in the perceptually coded bit stream, for example in the "Ancillary Data" field with MPEG-2 AAC.
  • In the decoder, this additional data can be evaluated and, together with the quantized MDCT spectrum from the perceptual decoder, the IntMDCT spectrum can be reconstructed without loss.
  • scalable data streams comprise different scaling layers, of which at least the lowest scaling layer can be transmitted and decoded independently of the higher scaling layers. Additional scaling layers or enhancement layers are added in the case of scalable processing of data of the first scaling layer or basic layer.
  • a fully equipped encoder can generate a scaled data stream which has a first scaling layer and which in principle has any number of other scaling layers.
  • The coded signal can still be transmitted via the transmission channel, but only in the form of the first scaling layer or a specific number of further scaling layers, the specific number being smaller than the total number of scaling layers generated by the encoder.
  • Alternatively, the encoder, adapted to the channel to which it is connected, can from the outset generate only the basic scaling layer, or a first scaling layer and a number of further scaling layers depending on the channel.
  • The scalable concept also has the advantage that it is backwards compatible. This means that a decoder that is only able to process the first scaling layer can simply ignore the second and further scaling layers in the data stream and still produce a useful output signal. If, on the other hand, the decoder is a typically more modern decoder which can process a plurality of scaling layers from the scaled data stream, this decoder can be addressed with the same data stream as a basic decoder.
  • the basic scalability consists in that the quantization block, i.e. the output of the bit stream encoder 60a, is written into a first scaling layer 81 of FIG. 8, which contains the psychoacoustically coded data, e.g. for one frame.
  • the preferably entropy-coded difference spectral values, which are generated by the combination device 58, are written with simple scalability into the second scaling layer, which is denoted by 82 in FIG. 8a and thus comprises the additional audio data for one frame.
  • both scaling layers 81 and 82 can be transmitted to the decoder.
  • the transmission channel is a narrow-band transmission channel in which only the first scaling layer "fits".
  • the second scaling layer can simply be removed from the data stream before the transmission, so that a decoder is only addressed with the first scaling layer.
  • a "basic decoder" that can only process the psychoacoustically encoded data can simply omit the second scaling layer 82 if it has received it via a broadband transmission channel.
  • the decoder is a fully equipped decoder, which includes both a psychoacoustic decoding algorithm and an integer decoding algorithm, so this fully equipped decoder can use both the first scaling layer and the second scaling layer for decoding to produce a losslessly encoded and decoded output signal.
  • the psychoacoustically encoded data for a frame are again contained in a first scaling layer.
  • the second scaling layer of Fig. 8a is now scaled more finely, so that several scaling layers arise from this second scaling layer in Fig. 8a, e.g. a (smaller) second scaling layer, a third scaling layer, a fourth scaling layer, etc.
  • FIG. 9 schematically represents binary coded spectral values.
  • Each line 90 in FIG. 9 represents a binary coded difference spectral value.
  • the difference spectral values are ordered according to frequency, as indicated by an arrow 91. A difference spectral value 92 thus has a higher frequency than the difference spectral value 90.
  • the first column of the tableau in FIG. 9 represents the most significant bit (MSB) of a difference spectral value.
  • the second column represents the bit with the significance MSB-1.
  • the third column represents a bit with the MSB-2 value.
  • the third to last column represents a bit with the LSB + 2 value.
  • the penultimate column represents a bit with the significance LSB + 1.
  • the last column represents a bit with the significance LSB, i.e. the least significant bit of a difference spectral value.
  • accuracy scaling is carried out such that, for the second scaling layer, e.g. the 16 most significant bits of a difference spectral value are taken and then, if desired, entropy encoded by the entropy encoder 60b.
  • a decoder that uses the second scaling layer obtains, on the output side, difference spectral values with an accuracy of 16 bits, so that the second scaling layer together with the first scaling layer supplies a losslessly decoded audio signal in CD quality. It is known that CD quality audio samples are 16 bits wide.
  • the encoder can also generate a third scaling layer, which comprises the last eight bits of a difference spectral value and is likewise entropy encoded as required (device 60 of FIG. 6).
  • a fully equipped decoder which receives the data stream with the first scaling layer, the second scaling layer (16 most significant bits of the difference spectral values) and the third scaling layer (8 least significant bits of a difference spectral value) can use all three scaling layers to deliver a losslessly coded/decoded audio signal in studio quality, i.e. with a word width of 24 bits at the output of the decoder.
  • the word width is 16 bits for an audio CD, while 24 bits or 20 bits are used in the studio domain.
  • the audio signal represented with 24-bit accuracy is represented in the integer spectral range with the aid of the inverse IntMDCT and scalably combined with a hearing-adapted MDCT-based audio encoder output signal.
  • the transmitted values are preferably scaled back to the original range, for example 24 bits, by multiplying them by, for example, 2^8.
  • An inverse IntMDCT is then applied to the correspondingly scaled back values.
  • in the accuracy scaling according to the invention in the frequency domain, it is further preferred to also exploit the redundancy in the LSBs. If an audio signal has very little energy, for example in the upper frequency range, this is also reflected in the IntMDCT spectrum in very small values, which are, for example, significantly smaller than the values possible with, e.g., 8 bits (-128, ..., 127). This manifests itself in a compressibility of the LSB values of the IntMDCT spectrum. Furthermore, it is pointed out that with very small difference spectral values, a number of bits from MSB to MSB-n are typically zero, and that the first, leading 1 in a binary coded difference spectral value occurs only at a bit with the significance MSB-n-1. In such a case, if a difference spectral value in the second scaling layer comprises only zeros, entropy coding is particularly well suited for further data compression.
  • a sampling rate scalability is preferred for the second scaling layer 82 of FIG. 8a.
  • Sampling rate scalability is achieved in that in the second scaling layer, as shown on the right in FIG. 9, the difference spectral values are contained up to a first cut-off frequency, while in a third scaling layer the difference spectral values with a frequency between the first cut-off frequency and the maximum frequency are included.
  • a further scaling can also be carried out, so that several scaling layers are made from the entire frequency range.
  • the second scaling layer in FIG. 9 comprises difference spectral values up to a frequency of 24 kHz, which corresponds to a sampling rate of 48 kHz.
  • the third scaling layer then contains the difference spectral values from 24 kHz to 48 kHz, which corresponds to a sampling rate of 96 kHz.
  • in the second scaling layer and the third scaling layer, not all bits of a difference spectral value necessarily have to be encoded.
  • the second scaling layer could comprise the bits MSB to MSB-x of the difference spectral values up to a certain cut-off frequency.
  • a third scaling layer could then comprise the bits MSB to MSB-x of the difference spectral values from the first cutoff frequency to the maximum frequency.
  • a fourth scaling layer could then include the remaining bits for the difference spectral values up to the cutoff frequency.
  • the last scaling layer could then include the remaining bits of the difference spectral values for the upper frequencies. This concept will result in dividing the tableau in Fig. 9 into four quadrants, each quadrant representing a scaling layer.
  • a scalability between 48 kHz and 96 kHz sampling rate is described in a preferred embodiment of the present invention.
  • the signal sampled at 96 kHz is initially encoded and transmitted only up to half of its spectrum in the lossless extension layer in the IntMDCT domain. If the upper part is not additionally transmitted, it is assumed to be zero in the decoder. With the inverse IntMDCT (same length as in the encoder), a 96 kHz signal is generated which contains no energy in the upper frequency range and can therefore be downsampled to 48 kHz without loss of quality.
  • the accuracy scaling can be softened to a certain extent.
  • the first scaling layer can also have spectral values with, e.g., more than 16 bits, the next scaling layer then containing the remaining difference.
  • the second scaling layer thus has the difference spectral values with less accuracy, while in the next scaling layer the rest, i.e. the difference between the complete spectral values and the spectral values contained in the second scaling layer, is transmitted. A variable reduction in accuracy is thus achieved.
  • the inventive method for coding or decoding is preferably stored on a digital storage medium, such as a floppy disk, with electronically readable control signals, wherein the control signals can cooperate with a programmable computer system so that the coding and/or decoding method is carried out.
  • the methods according to the invention can thus be implemented in a computer program with a program code for carrying out the methods according to the invention when the program runs on a computer.
  • the IntMDCT transformation algorithm which is described in "Audio Coding Based on Integer Transforms", 111th AES Convention, New York, 2001, is dealt with as an example of an integer transformation algorithm.
  • the IntMDCT is particularly favorable because it has the attractive properties of the MDCT, such as a good spectral representation of the audio signal, a critical sampling and a block overlap.
  • the good approximation of the MDCT by an IntMDCT also allows only one transform algorithm to be used in the encoder, as indicated by an arrow 62 in FIG. 5.
  • the essential properties of this special form of an integer transformation algorithm are explained with reference to FIGS. 1 to 4.
  • FIG. 1 shows an overview diagram for the device according to the invention for processing discrete-time samples, which represent an audio signal, in order to obtain integer values on which the Int-MDCT integer transformation algorithm is based.
  • the discrete-time samples are windowed by the device shown in FIG. 1 and optionally converted into a spectral representation.
  • the discrete-time samples, which are fed into the device at an input 10, are windowed with a window w having a length corresponding to 2N discrete-time samples.
  • the windowed samples can be converted into a spectral representation by the device 14 for executing an integer DCT. The integer DCT is designed to generate N output values from N input values, in contrast to the MDCT function 408 of FIG. 4a, which generates only N spectral values from 2N windowed sample values on the basis of the MDCT equation.
  • a discrete-time sample value which is selected by the device 16, lies in the first quarter of the window.
  • the other discrete-time sample lies in the second quarter of the window, as is explained in more detail with reference to FIG. 3.
  • the vector generated by the device 16 is now subjected to a rotation matrix of the dimension 2 x 2, this operation not being carried out directly, but by means of several so-called lifting matrices.
  • a lifting matrix has the property that it has only one element which depends on the window w and is not equal to "1" or "0".
  • Each of the three lifting matrices to the right of the equal sign has the value "1" as the main diagonal elements. Furthermore, in each lifting matrix, one secondary diagonal element is equal to 0, and the other secondary diagonal element depends on the angle of rotation α.
  • the vector is now multiplied by the third lifting matrix, i.e. the lifting matrix on the far right in the equation above, to obtain a first result vector.
  • This is represented in FIG. 1 by a device 18.
  • the first result vector is now rounded with an arbitrary rounding function, which maps the set of real numbers into the set of integers, as represented by a device 20 in FIG. 1.
  • a rounded first result vector is obtained at the output of the device 20.
  • the rounded first result vector is now fed into a device 22 for multiplying it by the middle, ie second, lifting matrix in order to obtain a second result vector which is rounded in a device 24 in order to obtain a rounded second result vector.
  • the rounded second result vector is now fed into a device 26 for multiplying it by the first lifting matrix, i.e. the lifting matrix on the left in the above equation, in order to obtain a third result vector, which is finally rounded by means of a device 28 to obtain integer windowed sample values at the output 12. If a spectral representation of these values is desired, they must then be processed by the device 14 in order to obtain integer spectral values at a spectral output 30.
  • the device 14 is preferably designed as an integer DCT.
  • the coefficients of the DCT-IV form an orthonormal N x N matrix.
  • Each orthogonal N x N matrix can be broken down into N(N-1)/2 Givens rotations, as stated in the specialist publication P. P. Vaidyanathan, "Multirate Systems And Filter Banks", Prentice Hall, Englewood Cliffs, 1993. It should be noted that there are also other decompositions.
  • while the DCT-IV has non-symmetric basis functions, i.e. a cosine quarter wave, a cosine 3/4 wave, a cosine 5/4 wave, a cosine 7/4 wave, etc., the discrete cosine transformation of, e.g., type II (DCT-II) has axisymmetric and point-symmetric basis functions.
  • the 0th basis function has a constant component, the first basis function is a half cosine wave, the second basis function is a complete cosine wave, etc. Because the DCT-II takes special account of the constant component, it is used for video coding, but not for audio coding, since, in contrast to video coding, the constant (DC) component is not relevant for audio coding.
  • An MDCT with a window length of 2N can be reduced to a discrete cosine transformation of type IV with a length N. This is achieved by performing the TDAC operation explicitly in the time domain and then applying the DCT-IV. With a 50% overlap, the left half of the window for a block t overlaps the right half of the previous block, i.e. block t-1.
  • the overlapping part of two successive blocks t-1 and t is preprocessed in the time domain, ie before the transformation, as follows, ie processed between input 10 and output 12 of FIG. 1: ...
  • the values marked with the tilde are the values at the output 12 of Fig. 1, while the x values denoted without a tilde in the above equation are the values at the input 10 or behind the means 16 for selecting. The running index k runs from 0 to N/2-1, while w represents the window function.
  • window functions w can be used as long as they meet this TDAC condition.
  • a cascaded encoder and decoder is described below with reference to FIG. 2.
  • the discrete-time samples x(0) to x(2N-1), which are "windowed" together by a window, are first selected by the device 16 of FIG. 1 in such a way that the sample value x(0) and the sample value x(N-1), i.e. a sample from the first quarter of the window and a sample from the second quarter of the window, are selected to form the vector at the output of device 16.
  • the intersecting arrows schematically represent the lifting multiplications and subsequent rounding of the devices 18, 20 or 22, 24 or 26, 28 in order to obtain the integer windowed samples at the input of the DCT-IV blocks.
  • a second vector is further formed from the samples x(N/2-1) and x(N/2), i.e. again a sample from the first quarter of the window and a sample from the second quarter of the window are selected and again processed by the algorithm described in FIG. 1. Similarly, all other sample pairs from the first and second quarters of the window are processed. The same processing is performed for the third and fourth quarters of the first window. Now there are 2N windowed integer samples at the output 12, which are fed into a DCT-IV transformation as shown in FIG. 2. In particular, the integer windowed samples of the second and third quarters are fed into a DCT.
  • the windowed integer samples of the first quarter of the window are processed in a preceding DCT-IV together with the windowed integer samples of the fourth quarter of the previous window.
  • the fourth quarter of the windowed integer samples in FIG. 2 is fed together with the first quarter of the next window into a DCT-IV transformation.
  • the middle integer DCT-IV transformation 32 shown in FIG. 2 now supplies N integer spectral values y(0) to y(N-1). These integer spectral values can now, for example, simply be entropy encoded without an intervening quantization being necessary, because the windowing and transformation provide integer output values.
  • a decoder is shown in the right half of FIG. 2.
  • the decoder consisting of inverse transformation and "inverse windowing" works inversely to the encoder. It is known that an inverse DCT-IV can be used for the inverse transformation of a DCT-IV, as shown in FIG. 2.
  • the output values of the decoder DCT-IV 34 are now, as shown in FIG. 2, inversely processed with the corresponding values of the preceding transformation or the subsequent transformation, in order to generate discrete-time audio samples x(0) to x(2N-1) again from the integer windowed sample values at the output of the device 34 and of the preceding and subsequent transformations.
  • the values x, y on the right side of Equation 6 are integers. However, this does not apply to the value x sin α.
  • the rounding function r must be introduced here, as in the following equation
  • the device 24 carries out this operation.
  • the inverse mapping (in the decoder) is defined as follows:
  • the minus sign before the rounding operation shows that the integer approximation of the lifting step can be reversed without introducing an error. Applying this approximation to each of the three lifting steps leads to an integer approximation of the Givens rotation.
  • the rounded rotation (in the encoder) can be reversed (in the decoder) without introducing an error by going through the inverse rounded lifting steps in reverse order, i.e. when, in decoding, the algorithm of Fig. 1 is performed from bottom to top.
  • Givens rotation is therefore broken down into lifting matrices which are carried out sequentially, with a rounding step being introduced after each lifting matrix multiplication, in such a way that the floating point numbers are rounded off immediately after they have arisen, in such a way that before every multiplication of a result vector by a lifting matrix the result vector only has integers.
  • any PCM samples, for example as they are stored on a CD, are integer values whose range of values varies depending on the bit width, i.e. depending on whether the discrete-time digital input values are 16-bit values or
  • the transformation shown provides integer output values instead of floating point values. It provides a perfect reconstruction so that no error is introduced when performing a forward and then a reverse transformation.
  • the transformation is a replacement for the modified discrete cosine transformation.
  • other transformation methods can also be carried out in integers, as long as they can be decomposed into rotations and the rotations can be decomposed into lifting steps.
  • the integer MDCT has the most favorable properties of the MDCT. It has an overlapping structure, which results in better frequency selectivity than with non-overlapping block transformations. Because of the TDAC function, which is already taken into account in the window before the transformation, a critical sampling is maintained so that the total number of spectral values which represent an audio signal is equal to the total number of input samples.
  • integer processing lends itself to an efficient hardware implementation, since only multiplication steps are used, which can easily be broken down into shift/add steps that can be implemented simply and quickly in hardware.
  • Software implementation is of course also possible.
  • the integer transformation provides a good spectral representation of the audio signal and still remains in the range of the integers. When applied to tonal parts of an audio signal, it results in good energy concentration.
  • An efficient lossless coding scheme can thus be built up by simply cascading the windowing/transformation shown in Fig. 1 with an entropy encoder.
  • stacked coding using escape values, as used in MPEG AAC, is favorable. It is preferred to scale down all values by a certain power of two until they fit into a desired code table, and then additionally code the omitted least significant bits. Compared to the alternative of using larger code tables, the described alternative is cheaper in terms of memory consumption for storing the code tables.
  • An almost lossless encoder could also be obtained by simply omitting certain of the least significant bits.
  • TNS Open loop prediction
  • TNS Closed Loop Predictor
  • quantization after prediction adapts the resulting quantization noise to the temporal structure of the audio signal and therefore prevents pre-echoes in psychoacoustic audio encoders.
  • the second alternative, i.e. with a closed-loop predictor, is more suitable for lossless audio coding, since the closed-loop prediction allows an exact reconstruction of the input signal.
  • middle-side coding can also be used without loss if a rounded rotation with an angle π/4 is used.
  • the rounded rotation has the advantage of energy conservation.
  • the use of so-called joint stereo coding can be switched on or off for each band, as is also done in the MPEG AAC standard. Additional angles of rotation can also be taken into account in order to be able to reduce redundancy between two channels more flexibly.
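
The rounded lifting steps described in the list above for the Givens rotation can be illustrated with a minimal Python sketch. It assumes the rotation matrix [[cos α, -sin α], [sin α, cos α]] and its standard factorization into three lifting matrices with off-diagonal entries (cos α - 1)/sin α, sin α and (cos α - 1)/sin α. The function names and the use of Python's built-in round as the rounding function r are illustrative choices and are not taken from the patent text.

```python
import math


def lift_rotate(x, y, alpha, rnd=round):
    """Integer approximation of a rotation by alpha (encoder side).

    x and y are integers; alpha must not be a multiple of pi,
    because the lifting coefficient divides by sin(alpha).
    """
    c = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x + rnd(c * y)  # rightmost (third) lifting matrix, then rounding
    y = y + rnd(s * x)  # middle (second) lifting matrix, then rounding
    x = x + rnd(c * y)  # leftmost (first) lifting matrix, then rounding
    return x, y


def lift_rotate_inverse(x, y, alpha, rnd=round):
    """Exact inverse (decoder side): same steps, reverse order, minus signs."""
    c = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x - rnd(c * y)
    y = y - rnd(s * x)
    x = x - rnd(c * y)
    return x, y


if __name__ == "__main__":
    rotated = lift_rotate(1000, -250, math.pi / 4)
    print(rotated, lift_rotate_inverse(*rotated, math.pi / 4))
```

Because the decoder subtracts exactly the rounded terms that the encoder added, in reverse order, the original integer pair is recovered without error for any deterministic rounding function, while the forward result stays close to the unrounded rotation.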

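The accuracy and sampling-rate scaling of the difference spectral values, i.e. the four quadrants of the tableau in Fig. 9, can be sketched in Python as follows. This is only an illustration under stated assumptions: the difference spectral values are taken to be signed integers ordered by frequency, the split is 16 most significant bits plus 8 least significant bits, entropy coding is omitted, and the layer names and the functions split_layers and merge_layers are hypothetical.

```python
def split_layers(diff_values, cutoff, lsb_bits=8):
    """Split signed integer difference spectral values into four layers.

    cutoff is the index of the first value above the cut-off frequency;
    lsb_bits is the number of least significant bits moved to the
    refinement layers.
    """
    mask = (1 << lsb_bits) - 1
    msb = [v >> lsb_bits for v in diff_values]  # arithmetic shift keeps the sign
    lsb = [v & mask for v in diff_values]       # always in 0 .. 2**lsb_bits - 1
    return {
        "msb_low": msb[:cutoff],
        "msb_high": msb[cutoff:],
        "lsb_low": lsb[:cutoff],
        "lsb_high": lsb[cutoff:],
    }


def merge_layers(layers, total, cutoff, lsb_bits=8):
    """Rebuild the values from the layers that arrived; missing layers count as zero."""
    def part(name, length):
        return list(layers.get(name, [0] * length))

    msb = part("msb_low", cutoff) + part("msb_high", total - cutoff)
    lsb = part("lsb_low", cutoff) + part("lsb_high", total - cutoff)
    # Scaling back by 2**lsb_bits restores the original value range.
    return [(m << lsb_bits) + l for m, l in zip(msb, lsb)]


if __name__ == "__main__":
    values = [-54321, -1, 0, 255, 256, 12345]
    layers = split_layers(values, cutoff=3)
    print(merge_layers(layers, len(values), 3) == values)
```

A decoder that receives only the two MSB layers obtains each value with the lower 8 bits set to zero, which corresponds to the scaling back by 2^8 mentioned in the list above; a decoder that receives only the layers below the cut-off index treats the upper part of the spectrum as zero.
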
Abstract

According to the present invention, a discrete-time audio signal is processed (52) in order to provide a quantization block with quantized spectral values. An integer spectral representation is generated from the discrete-time audio signal by using an integer transform algorithm (56). The quantization block, which has been generated using a psychoacoustic model (54), is inversely quantized and rounded (58) in order to form a difference between the integer spectral values and the rounded, inversely quantized spectral values. The quantization block alone yields, after decoding, a lossy psychoacoustically coded/decoded audio signal, whereas the quantization block together with the combination block yields, on decoding, a losslessly or almost losslessly coded and decoded audio signal. Generating the difference signal in the frequency domain allows a simplified encoder/decoder structure to be obtained.
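
The layering described in the abstract can be illustrated with a small Python sketch. It is a simplification under explicit assumptions: a uniform quantizer with a single step size stands in for the psychoacoustically controlled quantizer, the MDCT and integer spectra are passed in as ready-made lists, and the function names are hypothetical.

```python
def encode_layers(mdct_spec, int_spec, step):
    """Return (quantization block, difference spectral values).

    mdct_spec: floating-point spectral values fed to the lossy quantizer.
    int_spec:  integer spectral values of the same block.
    step:      quantizer step size (assumption: uniform quantizer).
    """
    quant = [round(v / step) for v in mdct_spec]      # base layer
    approx = [round(q * step) for q in quant]         # inverse quantize and round
    diff = [i - a for i, a in zip(int_spec, approx)]  # enhancement layer
    return quant, diff


def decode_lossy(quant, step):
    """Base-layer-only decoding: rounded, inversely quantized spectrum."""
    return [round(q * step) for q in quant]


def decode_lossless(quant, diff, step):
    """Base layer plus difference values restores the integer spectrum exactly."""
    return [a + d for a, d in zip(decode_lossy(quant, step), diff)]


if __name__ == "__main__":
    q, d = encode_layers([100.4, -37.9, 5.2, 0.3], [100, -38, 5, 0], 8.0)
    print(q, d, decode_lossless(q, d, 8.0))
```

The decoder repeats the same inverse quantization and rounding as the encoder, so adding the difference values reproduces the integer spectrum exactly, regardless of how coarse the base layer is.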
PCT/EP2002/013623 2002-04-18 2002-12-02 Dispositif et procede pour coder un signal audio a temps discret et dispositif et procede pour decoder des donnees audio codees WO2003088212A1 (fr)

Priority Applications (9)

Application Number Priority Date Filing Date Title
AU2002358578A AU2002358578B2 (en) 2002-04-18 2002-12-02 Device and method for encoding a time-discrete audio signal and device and method for decoding coded audio data
KR1020047016744A KR100892152B1 (ko) 2002-04-18 2002-12-02 시간-이산 오디오 신호를 부호화하기 위한 장치 및 방법그리고 부호화 오디오 데이터를 복호화하기 위한 장치 및방법
AT02792858T ATE305655T1 (de) 2002-04-18 2002-12-02 Vorrichtung und verfahren zum codieren eines zeitdiskreten audiosignals und vorrichtung und verfahren zum decodieren von codierten audiodaten
DE50204426T DE50204426D1 (de) 2002-04-18 2002-12-02 Vorrichtung und verfahren zum codieren eines zeitdiskreten audiosignals und vorrichtung und verfahren zum decodieren von codierten audiodaten
EP02792858A EP1495464B1 (fr) 2002-04-18 2002-12-02 Dispositif et procede pour coder un signal audio a temps discret et dispositif et procede pour decoder des donnees audio codees
JP2003585070A JP4081447B2 (ja) 2002-04-18 2002-12-02 時間離散オーディオ信号を符号化する装置と方法および符号化されたオーディオデータを復号化する装置と方法
CA002482427A CA2482427C (fr) 2002-04-18 2002-12-02 Dispositif et procede pour coder un signal audio a temps discret et dispositif et procede pour decoder des donnees audio codees
US10/966,780 US7275036B2 (en) 2002-04-18 2004-10-15 Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
HK05109316A HK1077391A1 (en) 2002-04-18 2005-10-20 Device and method for coding and decoding audio signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE10217297A DE10217297A1 (de) 2002-04-18 2002-04-18 Vorrichtung und Verfahren zum Codieren eines zeitdiskreten Audiosignals und Vorrichtung und Verfahren zum Decodieren von codierten Audiodaten
DE10217297.8 2002-04-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US10/966,780 Continuation US7275036B2 (en) 2002-04-18 2004-10-15 Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data

Publications (1)

Publication Number Publication Date
WO2003088212A1 true WO2003088212A1 (fr) 2003-10-23

Family

ID=28798541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2002/013623 WO2003088212A1 (fr) 2002-04-18 2002-12-02 Dispositif et procede pour coder un signal audio a temps discret et dispositif et procede pour decoder des donnees audio codees

Country Status (9)

Country Link
EP (1) EP1495464B1 (fr)
JP (1) JP4081447B2 (fr)
KR (1) KR100892152B1 (fr)
CN (1) CN1258172C (fr)
AT (1) ATE305655T1 (fr)
CA (1) CA2482427C (fr)
DE (2) DE10217297A1 (fr)
HK (1) HK1077391A1 (fr)
WO (1) WO2003088212A1 (fr)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007509362A (ja) * 2003-10-10 2007-04-12 エージェンシー フォー サイエンス,テクノロジー アンド リサーチ デジタル信号をスケーラブルビットストリームにエンコードする方法、及びスケーラブルビットストリームをデコードする方法
EP1852849A1 (fr) * 2006-05-05 2007-11-07 Deutsche Thomson-Brandt Gmbh Procédé et appareil d'encodage sans perte d'un signal source utilisant un courant de données encodées avec perte et un courant d'extension de données encodées sans perte
EP1883067A1 (fr) * 2006-07-24 2008-01-30 Deutsche Thomson-Brandt Gmbh Méthode et appareil pour l'encodage sans perte d'un signal source, utilisant un flux de données encodées avec pertes et un flux de données d'extension sans perte.
KR100813193B1 (ko) * 2004-02-13 2008-03-13 프라운호퍼-게젤샤프트 츄어 푀르더룽 데어 안게반텐 포르슝에.파우. 정보 신호의 양자화 방법 및 장치
KR100814673B1 (ko) * 2004-02-13 2008-03-18 프라운호퍼-게젤샤프트 츄어 푀르더룽 데어 안게반텐 포르슝에.파우. 오디오 부호화
JP2009266250A (ja) * 2003-09-29 2009-11-12 Agency For Science Technology & Research 時間ドメインから周波数ドメインへ及びそれとは逆にデジタル信号を変換する方法
US7761303B2 (en) 2005-08-30 2010-07-20 Lg Electronics Inc. Slot position coding of TTT syntax of spatial audio coding application
US7774199B2 (en) 2005-10-05 2010-08-10 Lg Electronics Inc. Signal processing using pilot based coding
US8037114B2 (en) 2004-12-13 2011-10-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method for creating a representation of a calculation result linearly dependent upon a square of a value
US8095358B2 (en) 2005-10-24 2012-01-10 Lg Electronics Inc. Removing time delays in signal paths
CN103038822A (zh) * 2010-07-30 2013-04-10 高通股份有限公司 用于多级形状向量量化的系统、方法、设备和计算机可读媒体
US8494667B2 (en) 2005-06-30 2013-07-23 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
US9093065B2 (en) 2006-09-20 2015-07-28 Thomson Licensing Method and device for transcoding audio signals exclduing transformation coefficients below −60 decibels
US9208792B2 (en) 2010-08-17 2015-12-08 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for noise injection
WO2018175119A1 (fr) * 2017-03-22 2018-09-27 IMMERSION SERVICES LLC dba IMMERSION NETWORKS Système et procédé de traitement de données audio
US11281312B2 (en) 2018-01-08 2022-03-22 Immersion Networks, Inc. Methods and apparatuses for producing smooth representations of input motion in time and space

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102006051673A1 (de) * 2006-11-02 2008-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Nachbearbeiten von Spektralwerten und Encodierer und Decodierer für Audiosignale
DE102007003187A1 (de) 2007-01-22 2008-10-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Erzeugen eines zu sendenden Signals oder eines decodierten Signals
KR101149448B1 (ko) 2007-02-12 2012-05-25 삼성전자주식회사 오디오 부호화 및 복호화 장치와 그 방법
EP2015293A1 (fr) * 2007-06-14 2009-01-14 Deutsche Thomson OHG Procédé et appareil pour coder et décoder un signal audio par résolution temporelle à commutation adaptative dans le domaine spectral
CN103594090B (zh) * 2007-08-27 2017-10-10 爱立信电话股份有限公司 使用时间分辨率能选择的低复杂性频谱分析/合成
EP2063417A1 (fr) * 2007-11-23 2009-05-27 Deutsche Thomson OHG Formage de l'erreur d'arrondi pour le codage et décodage basés sur des transformées entières
EP2144230A1 (fr) 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Schéma de codage/décodage audio à taux bas de bits disposant des commutateurs en cascade
EP3640941A1 (fr) * 2008-10-08 2020-04-22 Fraunhofer Gesellschaft zur Förderung der Angewand Schéma connectable de codage/décodage audio multirésolution
WO2011122875A2 (fr) * 2010-03-31 2011-10-06 한국전자통신연구원 Procédé et dispositif de codage, et procédé et dispositif de décodage
JP5799707B2 (ja) * 2011-09-26 2015-10-28 ソニー株式会社 オーディオ符号化装置およびオーディオ符号化方法、オーディオ復号装置およびオーディオ復号方法、並びにプログラム
EP2830058A1 (fr) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Codage audio en domaine de fréquence supportant la commutation de longueur de transformée
CN105632503B (zh) * 2014-10-28 2019-09-03 南宁富桂精密工业有限公司 信息隐藏方法及系统
EP3471271A1 (fr) * 2017-10-16 2019-04-17 Acoustical Beauty Convolutions améliorées de signaux numériques utilisant une optimisation des exigences de bits d'un signal numérique cible
WO2019091576A1 (fr) * 2017-11-10 2019-05-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Codeurs audio, décodeurs audio, procédés et programmes informatiques adaptant un codage et un décodage de bits les moins significatifs
CN107911122A (zh) * 2017-11-13 2018-04-13 南京大学 基于分解压缩的分布式光纤振动传感数据无损压缩方法
EP3775821A1 (fr) 2018-04-11 2021-02-17 Dolby Laboratories Licensing Corporation Fonctions de perte basées sur la perception pour le codage et le décodage audio sur la base d'un apprentissage automatique
DE102019204527B4 (de) * 2019-03-29 2020-11-19 Technische Universität München Kodierungs-/dekodierungsvorrichtungen und verfahren zur kodierung/dekodierung von vibrotaktilen signalen
KR102250835B1 (ko) * 2019-08-05 2021-05-11 국방과학연구소 수동 소나의 협대역 신호를 탐지하기 위한 lofar 또는 demon 그램의 압축 장치

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GEIGER R ET AL: "AUDIO CODING BASED ON INTEGER TRANSFORMS", PREPRINTS OF PAPERS PRESENTED AT THE 111TH AES CONVENTION, no. 5471, 30 November 2001 (2001-11-30), pages 1 - 9, XP008006797 *
GEIGER R ET AL: "INTMDCT-A LINK BETWEEN PERCEPTUAL AND LOSSLESS AUDIO CODING", 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP), vol. 3 OF 4, 13 May 2002 (2002-05-13) - 17 May 2002 (2002-05-17), Orlando, Florida, USA, pages II - 1813-II-1816, XP001097166, ISBN: 0-7803-7402-9 *
HANS, M.; SCHAFER, R.W.: "Lossless compression of digital audio", SIGNAL PROCESSING MAGAZINE, IEEE, vol. 18, no. 4, July 2001 (2001-07-01), pages 21 - 32, XP001053611 *
MORIYA T ET AL: "A design of lossy and lossless scalable audio coding", 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). ISTANBUL, TURKEY, JUNE 5-9, 2000, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY: IEEE, US, vol. 2 OF 6, 5 June 2000 (2000-06-05), pages II889 - II892, XP002901946, ISBN: 0-7803-6294-2 *
NOLL P AND LIEBCHEN T: "DIGITAL AUDIO: FROM LOSSLESS TO TRANSPARENT CODING", IEEE SIGNAL PROCESSING WORKSHOP, 1999, Poznan, pages 53 - 60, XP000926389 *
RAAD, M. AND MERTINS, A.: "From lossy to lossless audio coding using SPIHT", PROC. OF THE 5TH INT. CONF. ON DIGITAL AUDIO EFFECTS, DAFX-02, 26 September 2002 (2002-09-26) - 28 September 2002 (2002-09-28), Hamburg, Germany, pages 245 - 250, XP002255007, Retrieved from the Internet <URL:www.unibw-hamburg.de/EWEB/ANT/dafx2002/papers/DAFX02_Raad_Mertins_lossy_lossless_coding.pdf> [retrieved on 20030918] *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
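JP2009266250A (ja) * 2003-09-29 2009-11-12 Agency For Science Technology & Research Method for transforming a digital signal from the time domain into the frequency domain and vice versa
JP4849466B2 (ja) * 2003-10-10 2012-01-11 Agency for Science, Technology and Research Method for encoding a digital signal into a scalable bitstream, and method for decoding a scalable bitstream
JP2007509362A (ja) * 2003-10-10 2007-04-12 Agency for Science, Technology and Research Method for encoding a digital signal into a scalable bitstream, and method for decoding a scalable bitstream
KR100813193B1 (ko) * 2004-02-13 2008-03-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for quantizing an information signal
KR100814673B1 (ko) * 2004-02-13 2008-03-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio coding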
US8037114B2 (en) 2004-12-13 2011-10-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method for creating a representation of a calculation result linearly dependent upon a square of a value
US8494667B2 (en) 2005-06-30 2013-07-23 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
US7761303B2 (en) 2005-08-30 2010-07-20 Lg Electronics Inc. Slot position coding of TTT syntax of spatial audio coding application
US7774199B2 (en) 2005-10-05 2010-08-10 Lg Electronics Inc. Signal processing using pilot based coding
US8095358B2 (en) 2005-10-24 2012-01-10 Lg Electronics Inc. Removing time delays in signal paths
US8095357B2 (en) 2005-10-24 2012-01-10 Lg Electronics Inc. Removing time delays in signal paths
US8326618B2 (en) 2006-05-05 2012-12-04 Thomson Licensing Method and apparatus for lossless encoding of a source signal, using a lossy encoded data steam and a lossless extension data stream
WO2007128662A1 (fr) * 2006-05-05 2007-11-15 Thomson Licensing Method and apparatus for lossless encoding of a source signal, using a lossy encoded data stream and a lossless extension data stream
EP1852849A1 (fr) * 2006-05-05 2007-11-07 Deutsche Thomson-Brandt Gmbh Method and apparatus for lossless encoding of a source signal, using a lossy encoded data stream and a lossless extension data stream
EP1883067A1 (fr) * 2006-07-24 2008-01-30 Deutsche Thomson-Brandt Gmbh Method and apparatus for lossless encoding of a source signal, using a lossy encoded data stream and a lossless extension data stream
WO2008012211A1 (fr) * 2006-07-24 2008-01-31 Thomson Licensing Method and apparatus for lossless encoding of a source signal using a lossy encoded data stream and a lossless extension data stream
US9093065B2 (en) 2006-09-20 2015-07-28 Thomson Licensing Method and device for transcoding audio signals exclduing transformation coefficients below −60 decibels
US9236063B2 (en) 2010-07-30 2016-01-12 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for dynamic bit allocation
CN103038822A (zh) * 2010-07-30 2013-04-10 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for multi-stage shape vector quantization
US8924222B2 (en) 2010-07-30 2014-12-30 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for coding of harmonic signals
CN103038822B (zh) * 2010-07-30 2015-05-27 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for multi-stage shape vector quantization
US9208792B2 (en) 2010-08-17 2015-12-08 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for noise injection
WO2018175119A1 (fr) * 2017-03-22 2018-09-27 IMMERSION SERVICES LLC dba IMMERSION NETWORKS System and method for processing audio data
US10339947B2 (en) 2017-03-22 2019-07-02 Immersion Networks, Inc. System and method for processing audio data
US10354668B2 (en) 2017-03-22 2019-07-16 Immersion Networks, Inc. System and method for processing audio data
US10354669B2 (en) 2017-03-22 2019-07-16 Immersion Networks, Inc. System and method for processing audio data
US10354667B2 (en) 2017-03-22 2019-07-16 Immersion Networks, Inc. System and method for processing audio data
US10861474B2 (en) 2017-03-22 2020-12-08 Immersion Networks, Inc. System and method for processing audio data
US11289108B2 (en) 2017-03-22 2022-03-29 Immersion Networks, Inc. System and method for processing audio data
US11562758B2 (en) 2017-03-22 2023-01-24 Immersion Networks, Inc. System and method for processing audio data into a plurality of frequency components
US11823691B2 (en) 2017-03-22 2023-11-21 Immersion Networks, Inc. System and method for processing audio data into a plurality of frequency components
US11281312B2 (en) 2018-01-08 2022-03-22 Immersion Networks, Inc. Methods and apparatuses for producing smooth representations of input motion in time and space

Also Published As

Publication number Publication date
HK1077391A1 (en) 2006-02-10
CN1258172C (zh) 2006-05-31
KR100892152B1 (ko) 2009-04-10
ATE305655T1 (de) 2005-10-15
EP1495464A1 (fr) 2005-01-12
JP2005527851A (ja) 2005-09-15
KR20050007312A (ko) 2005-01-17
JP4081447B2 (ja) 2008-04-23
EP1495464B1 (fr) 2005-09-28
DE10217297A1 (de) 2003-11-06
CA2482427C (fr) 2010-01-19
DE50204426D1 (de) 2005-11-03
CA2482427A1 (fr) 2003-10-23
AU2002358578A1 (en) 2003-10-27
CN1625768A (zh) 2005-06-08

Similar Documents

Publication Publication Date Title
EP1495464B1 (fr) Device and method for coding a time-discrete audio signal and device and method for decoding coded audio data
EP1502255B1 (fr) Device and method for scalable coding and device and method for scalable decoding
EP1647009B1 (fr) Method and device for processing a signal
DE19747132C2 (de) Methods and devices for coding audio signals and methods and devices for decoding a bit stream
EP1609084B1 (fr) Device and method for conversion into a transformed representation or for inverse conversion of the transformed representation
DE60310716T2 (de) System for audio coding with filling of spectral gaps
DE602004004818T2 (de) Audio signal encoding or decoding
DE102006022346B4 (de) Information signal coding
DE19811039B4 (de) Methods and devices for coding and decoding audio signals
DE69731677T2 (de) Improved joint stereo coding method with temporal envelope shaping
EP1397799B1 (fr) Method and device for processing time-discrete audio sample values
DE69737489T2 (de) Shaping of the perceptible noise signal in the time domain by means of LPC prediction in the frequency domain
US7275036B2 (en) Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data
EP1654674B1 (fr) Device and method for processing at least two input values
DE4320990B4 (de) Method for redundancy reduction
DE102006051673A1 (de) Device and method for post-processing spectral values and encoder and decoder for audio signals
WO1999017587A1 (fr) Method and device for coding a time-discrete stereo signal
DE102019204527B4 (de) Coding/decoding devices and methods for coding/decoding vibrotactile signals
DE19742201C1 (de) Method and device for coding audio signals
DE19829284C2 (de) Method and device for processing a time-domain stereo signal and method and device for decoding an audio bit stream coded using prediction over frequency
DE10065363A1 (de) Device and method for decoding a coded data signal

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2482427

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2002792858

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 10966780

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 1020047016744

Country of ref document: KR

Ref document number: 2002358578

Country of ref document: AU

Ref document number: 2003585070

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 20028289749

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2002792858

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020047016744

Country of ref document: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWG Wipo information: grant in national office

Ref document number: 2002792858

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 2002358578

Country of ref document: AU