CN1258172C - Device and method for encoding a time-discrete audio signal and method for decoding coded audio data

Info

Publication number: CN1258172C
Application number: CNB028289749A
Authority: CN
Inventors: 拉尔夫·盖格; 托马斯·思博尔; 卡尔海因兹·勃兰登堡; 朱尔根·赫尔; 朱尔根·科洛尔; 乔吉姆·德格拉
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2002-04-18
Filing date: 2002-12-02
Publication date: 2006-05-31
Anticipated expiration: 2022-12-02
Also published as: DE10217297A1; WO2003088212A1; EP1495464B1; EP1495464A1; CA2482427A1; KR100892152B1; AU2002358578A1; HK1077391A1; CA2482427C; JP4081447B2; KR20050007312A; DE50204426D1; JP2005527851A; CN1625768A; ATE305655T1

Abstract

According to the invention, a time-discrete audio signal is processed (52) to provide a quantization block with quantized spectral values. In addition, an integer spectral representation is generated from the time-discrete audio signal using an integer transform algorithm (56). The quantization block, which has been generated using a psychoacoustic model (54), is inversely quantized and rounded (58) so that a difference can be formed between the integer spectral values and the rounded inversely quantized spectral values. After decoding, the quantization block alone yields a lossy, psychoacoustically coded/decoded audio signal, whereas the quantization block together with the combination block yields a lossless or almost lossless coded and decoded audio signal. Generating the difference signal in the frequency domain allows a simpler encoder/decoder structure.

Description

Apparatus and method for encoding an audio signal and for decoding coded audio data

Technical field

The present invention relates to audio encoding/decoding and, in particular, to scalable encoding/decoding algorithms that comprise a psychoacoustic first scaling layer and a second scaling layer containing auxiliary audio data for lossless decoding.

Background art

Modern audio coding methods, such as MPEG Layer 3 (MP3) or MPEG AAC, use transforms such as the so-called modified discrete cosine transform (MDCT) to obtain a block-wise frequency representation of an audio signal. Such an audio encoder usually receives a stream of time-discrete audio samples. The stream of audio samples is windowed to obtain a windowed block of, for example, 1024 or 2048 windowed audio samples. Various window functions are used for windowing, such as a sine window.

Subsequently, the windowed time-discrete audio samples are converted into a spectral representation by means of a filter bank. In principle, a Fourier transform or, for special reasons, a variant of the Fourier transform such as the FFT, or the MDCT mentioned above, may be used for this purpose. The block of audio spectral values at the output of the filter bank can then be processed further as required. In the audio encoders cited above, the audio spectral values are quantized next, the quantization steps typically being chosen so that the quantization noise introduced by quantizing lies below the psychoacoustic masking threshold, i.e. is "masked away". Quantization is a lossy coding. To reduce the amount of data further, the quantized spectral values are then entropy coded, for example by Huffman coding. By adding side information, such as scale factors, a bit stream multiplexer forms, from the entropy-coded quantized spectral values, a bit stream that can be stored or transmitted.

In the audio decoder, the bit stream is split into coded quantized spectral values and side information by a bit stream demultiplexer. The entropy-coded quantized spectral values are first entropy decoded to obtain the quantized spectral values. The quantized spectral values are then inversely quantized to obtain decoded spectral values that contain quantization noise; this noise, however, lies below the psychoacoustic masking threshold and is therefore inaudible. These spectral values are then converted into a time representation by a synthesis filter bank to obtain time-discrete decoded audio samples. The synthesis filter bank must use a transform algorithm that is the inverse of the one used in the encoder. In addition, the windowing must be undone after the frequency-time conversion.

To obtain good frequency selectivity, modern audio encoders typically use block overlap. Such a case is shown in Fig. 4a. First, for example, 2048 time-discrete audio samples are taken and windowed by means 402. The window embodied by means 402 has a window length of 2N samples and provides a block of 2N windowed samples at its output. To obtain a window overlap, a second block of 2N windowed samples is formed by means 404, which is shown separately from means 402 in Fig. 4a only for clarity. The 2048 samples fed to means 404, however, are not the time-discrete audio samples immediately following the first window; they contain the second half of the samples windowed by means 402 and, in addition, only 1024 "new" samples. The overlap is shown symbolically by means 406 in Fig. 4a and results in a degree of overlap of 50%. The 2N windowed samples output by means 402 and the 2N windowed samples output by means 404 are then subjected to the MDCT algorithm by means 408 and 410, respectively. According to the known MDCT algorithm, means 408 provides N spectral values for the first window, and means 410 likewise provides N spectral values, but for the second window, the first and second windows overlapping by 50%.
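The 50% block overlap described above can be illustrated with a minimal sketch. The function name `overlapping_blocks` and the toy values are illustrative assumptions; the windowing itself is omitted, and only the selection of samples for successive blocks of length 2N is shown.

```python
def overlapping_blocks(samples, n):
    """Split samples into blocks of length 2*n that advance by n samples."""
    blocks = []
    start = 0
    while start + 2 * n <= len(samples):
        blocks.append(samples[start:start + 2 * n])
        start += n  # a hop of n samples gives a 50% overlap
    return blocks


samples = list(range(12))                  # stand-in for time-discrete audio samples
blocks = overlapping_blocks(samples, n=4)  # 2N = 8 samples per block

# The second half of each block reappears as the first half of the next block,
# so every block contributes only n "new" samples.
shared = blocks[0][4:] == blocks[1][:4]
```

With N = 1024 the same function would deliver the 2048-sample blocks mentioned in the text, each containing 1024 new samples.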

In the decoder, as shown in Fig. 4b, the N spectral values of the first window are fed to means 412, which performs an inverse modified discrete cosine transform. Likewise, the N spectral values of the second window are fed to means 414, which also performs an inverse modified discrete cosine transform. Means 412 and means 414 each provide 2N samples, for the first window and for the second window respectively.

In means 416, designated TDAC (time domain aliasing cancellation) in Fig. 4b, the fact that the two windows overlap is taken into account. In particular, a sample y1 of the second half of the first window, i.e. with index N+k, is added to a sample y2 of the first half of the second window, i.e. with index k, so that N decoded time-domain samples result at the output, i.e. in the decoder.

It should be noted that, through the function of means 416, also referred to as the add function, the windowing performed in the encoder shown in Fig. 4a is taken into account more or less automatically, so that no explicit "inverse windowing" has to take place in the decoder shown in Fig. 4b.

If the window function implemented by means 402 or 404 is denoted w(k), where the index k is the time index, the following condition must hold: the squared window weight w(k), added to the squared window weight w(N+k), equals 1 for k running from 0 to N-1. If a sine window is used, whose weights follow the first half-wave of the sine function, this condition is always satisfied, because the square of the sine and the square of the cosine of any angle sum to 1.
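The window condition and the aliasing cancellation performed in means 416 can be checked numerically with a small sketch. It relies on a common MDCT convention that is not spelled out in this text and is therefore an assumption: the sine window is sampled at half-sample offsets, the inverse transform is scaled by 2/N, and the same window is applied once more to the inverse-transformed samples before the addition, which is why squared window weights appear in the condition. All function names are illustrative.

```python
import math
import random


def sine_window(n):
    """Sine window of length 2*n: w(k) = sin(pi * (k + 0.5) / (2 * n))."""
    return [math.sin(math.pi * (k + 0.5) / (2 * n)) for k in range(2 * n)]


def mdct(block):
    """Map 2*n windowed samples to n spectral values (direct O(n^2) form)."""
    n = len(block) // 2
    return [
        sum(block[k] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
            for k in range(2 * n))
        for m in range(n)
    ]


def imdct(spec):
    """Map n spectral values back to 2*n samples that still contain aliasing."""
    n = len(spec)
    return [
        2.0 / n * sum(spec[m] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                      for m in range(n))
        for k in range(2 * n)
    ]


n = 8
w = sine_window(n)

# Window condition: w(k)^2 + w(N+k)^2 = 1 for k = 0 .. N-1.
condition_error = max(abs(w[k] ** 2 + w[n + k] ** 2 - 1.0) for k in range(n))

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(3 * n)]
first = [w[k] * x[k] for k in range(2 * n)]        # window over samples 0 .. 2N-1
second = [w[k] * x[n + k] for k in range(2 * n)]   # window over samples N .. 3N-1

y1 = [w[k] * v for k, v in enumerate(imdct(mdct(first)))]
y2 = [w[k] * v for k, v in enumerate(imdct(mdct(second)))]

# Add function: second half of y1 plus first half of y2 gives samples N .. 2N-1.
decoded = [y1[n + k] + y2[k] for k in range(n)]
tdac_error = max(abs(decoded[k] - x[n + k]) for k in range(n))
```

Both `condition_error` and `tdac_error` stay at the level of floating-point rounding noise, which illustrates that the aliasing terms of the two overlapping blocks cancel when the window condition holds.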

A disadvantage of the windowing method with subsequent MDCT described with reference to Fig. 4a is that the windowing is achieved by multiplying the time-discrete samples by floating-point numbers, as is the case for a sine window, since the sine of an angle between 0 and 180 degrees does not yield an integer except at 90 degrees. Even if integer time-discrete samples are windowed, floating-point numbers therefore result after windowing.
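A short sketch makes this disadvantage concrete. The sample values are arbitrary stand-ins for integer PCM samples, and the half-sample offset of the window is the same assumed convention as is common for MDCT windows.

```python
import math

n = 4
window = [math.sin(math.pi * (k + 0.5) / (2 * n)) for k in range(2 * n)]
samples = [100, -250, 32767, 12, 0, -1, 4096, 7]   # integer input samples

windowed = [w * s for w, s in zip(window, samples)]

# Count the non-zero samples whose windowed value is no longer an integer.
non_integer_count = sum(
    1 for value, sample in zip(windowed, samples)
    if sample != 0 and value != round(value)
)
```

Every non-zero sample in this example leaves the integer grid after windowing, so a subsequent floating-point transform cannot deliver integer spectral values without an additional rounding or quantization step.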

Therefore, even if no psychoacoustic encoder is used, i.e. if lossless coding is to be achieved, quantization is necessary at the output of means 408 and 410 in order to allow a reasonably manageable entropy coding.

If known transforms, such as those described with reference to Fig. 4a, are to be used for lossless audio coding, either a very fine quantization must be used so that the error resulting from rounding the floating-point numbers can be neglected, or the error signal must be coded additionally, for example in the time domain.

Concepts of the first kind, in which the quantization is set so finely that the error resulting from rounding the floating-point numbers is negligible, are disclosed, for example, in German patent DE 197 42 201 C1. Here, an audio signal is converted into its spectral representation and quantized to obtain quantized spectral values. The quantized spectral values are then inversely quantized, transformed into the time domain and compared with the original audio signal. If the error, i.e. the error between the original audio signal and the quantized/inversely quantized audio signal, lies above an error threshold, the quantizer is set to a finer resolution by way of feedback, and the comparison is repeated. The iteration ends when the error falls below the error threshold. The residual signal that may still remain is coded by a time-domain encoder and written into a bit stream which, in addition to the time-domain coded residual signal, also contains the coded spectral values quantized according to the quantizer settings present when the iteration was terminated. It should be noted that the quantizer used need not be controlled by a psychoacoustic model, so that the coded spectral values are typically quantized more precisely than a psychoacoustic model would require.

The publication "A Design of Lossy and Lossless Scalable Audio Coding" (T. Moriya et al., Proc. ICASSP, 2000) describes a scalable encoder that comprises, for example, an MPEG encoder as a first lossy data compression module, which takes a block-wise digital signal as its input and generates the compressed bit stream. In a local decoder that is also present, the coding is undone again and a coded/decoded signal is generated. This signal is compared with the original input signal by subtracting the coded/decoded signal from the original input signal. The error signal is then fed to a second module, where a lossless bit conversion is applied. This conversion has two steps. The first step is a conversion from two's complement format to sign-magnitude format. The second step is a conversion from a vertical magnitude sequence to a horizontal bit sequence within a processing block. The lossless data conversion is carried out so as to maximize the number of zeros, or the number of consecutive zeros in a sequence, in order to compress as well as possible the time-domain error signal, which is present as a sequence of digital numbers. This principle is based on the bit slice arithmetic coding (BSAC) scheme presented in the publication "Multi-Layer Bit Sliced Bit Rate Scalable Audio Coder" (103rd AES Convention, Preprint No. 4520, 1997).
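The two conversion steps can be illustrated as follows. This is only a sketch of the steps as they are summarized above, not the algorithm of the cited publication; the function names, the 4-bit word length and the sample values are assumptions.

```python
def to_sign_magnitude(value, bits):
    """Return (sign_bit, magnitude_bits), magnitude bits ordered MSB first."""
    magnitude = abs(value)
    if magnitude >= (1 << bits):
        raise ValueError("value does not fit into the given word length")
    sign = 1 if value < 0 else 0
    return sign, [(magnitude >> b) & 1 for b in range(bits - 1, -1, -1)]


def bit_planes(values, bits):
    """Regroup the magnitudes 'horizontally': one bit sequence per bit plane."""
    rows = [to_sign_magnitude(v, bits)[1] for v in values]
    return [[row[p] for row in rows] for p in range(bits)]


error_signal = [3, -2, 0, 1, -1, 0, 2, -3]   # small time-domain error values
signs = [to_sign_magnitude(v, 4)[0] for v in error_signal]
planes = bit_planes(error_signal, 4)         # planes[0] is the MSB plane
```

In two's complement, a small negative value such as -1 consists entirely of ones; in sign-magnitude form its upper bits are zero. After the regrouping, the upper bit planes of a small error signal therefore consist of long runs of zeros, which is what the subsequent coding exploits.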

The disadvantage of the concepts described above is that the data for the lossless enhancement layer, i.e. the auxiliary data required to achieve lossless decoding of the audio signal, have to be obtained in the time domain. This means that a complete decoding, including a frequency/time conversion, is required to obtain the coded/decoded signal in the time domain, so that the error signal can be calculated by forming the sample-wise difference between the original audio input signal and the coded/decoded audio signal, which is lossy because of the psychoacoustic coding. This concept is particularly disadvantageous in that the encoder generating the audio data stream requires both a complete time/frequency conversion means, such as a filter bank or an MDCT algorithm, for the forward transform and, solely for generating the error signal, a complete inverse filter bank or a complete synthesis algorithm. In addition to its inherent encoder functionality, the encoder must therefore also contain the complete decoder functionality. If the encoder is implemented in software, this requires both memory and processor capacity, which makes the encoder implementation more expensive.

Summary of the invention

The object of the present invention is to provide a less expensive concept by which audio data can be generated that can be decoded at least almost without loss.

This object is achieved by a device for encoding a time-discrete audio signal according to claim 1, a method for encoding a time-discrete audio signal according to claim 21, a device for decoding coded audio data according to claim 22, a method for decoding coded audio data according to claim 31, or a computer program according to claim 32 or 33.

The present invention is based on the finding that the auxiliary audio data enabling lossless decoding of the audio signal can be obtained by providing, as usual, a block of quantized spectral values and then inversely quantizing it to obtain inversely quantized spectral values, which are lossy because of the quantization by means of a psychoacoustic model. These inversely quantized spectral values are then rounded to obtain a rounding block of rounded inversely quantized spectral values. As the reference for forming the difference, the invention uses an integer transform algorithm that generates, from a block of integer time-discrete samples, an integer block consisting only of integer spectral values. According to the invention, the rounding block and the integer block are now combined spectral value by spectral value, i.e. in the frequency domain, so that no synthesis algorithm, i.e. no inverse filter bank or inverse MDCT algorithm, is needed in the encoder itself. Because of the integer transform algorithm and the rounded quantization values, the combination block containing the difference spectral values comprises only integer values, which can be entropy coded in any known manner. It should be noted that any entropy encoder may be used for entropy coding the combination block, such as a Huffman encoder or an arithmetic encoder.
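The formation of the difference in the frequency domain can be sketched as follows. The uniform quantizer with a single step size is only a stand-in for a psychoacoustically controlled quantizer, and the spectral values are hypothetical numbers rather than the output of a real MDCT or IntMDCT; all names are illustrative assumptions.

```python
def quantize(spectrum, step):
    """Stand-in for the psychoacoustic quantizer: uniform, one step size (assumed)."""
    return [round(v / step) for v in spectrum]


def rounding_block(indices, step):
    """Inverse quantization followed by rounding to integers."""
    return [round(q * step) for q in indices]


def difference_block(int_spectrum, indices, step):
    """Integer block minus rounding block, formed spectral value by spectral value."""
    rounded = rounding_block(indices, step)
    return [i - r for i, r in zip(int_spectrum, rounded)]


mdct_spectrum = [812.4, -97.8, 41.3, -5.6, 2.2, 0.4]   # hypothetical MDCT output
int_spectrum = [812, -98, 41, -6, 2, 0]                # hypothetical IntMDCT output
step = 7.3                                             # hypothetical quantizer step

quant_block = quantize(mdct_spectrum, step)            # lossy layer
diff_block = difference_block(int_spectrum, quant_block, step)  # auxiliary data
```

No inverse transform appears anywhere in this encoder-side sketch: the difference values are small integers obtained directly from spectral values, and they can be passed on to any entropy encoder.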

Any encoder may likewise be used for coding the quantized spectral values of the quantization block, such as the known tools commonly used in modern audio encoders.

It should be noted that the inventive encoding/decoding concept is compatible with modern coding tools such as window switching, TNS, or center/side coding for multi-channel audio signals.

In a preferred embodiment of the present invention, an MDCT is used to provide the quantization block of spectral values quantized using a psychoacoustic model. In addition, a so-called IntMDCT is preferably used as the integer transform algorithm.

In an alternative embodiment of the present invention, the conventional MDCT can be dispensed with and the IntMDCT used as an approximation of the MDCT: the integer spectrum obtained by the integer transform algorithm is fed to the psychoacoustic quantizer to obtain quantized IntMDCT spectral values, which are then inversely quantized and rounded so that they can be compared with the original integer spectral values. In this case only a single transform is required, namely the IntMDCT, which generates integer spectral values from integer time-discrete samples.

Typically, processors operate on integers, or every floating-point number can be represented as an integer. If integer arithmetic is used in a processor, the rounding of the inversely quantized spectral values can be omitted, since rounded values are present anyway because of the processor arithmetic, namely within the accuracy of the LSB, i.e. the least significant bit. In this case, completely lossless processing is achieved, i.e. processing within the accuracy of the processor system used. Alternatively, however, rounding to a coarser accuracy can be performed, so that the difference signal in the combination block is rounded to an accuracy determined by a rounding function. Introducing rounding beyond the inherent rounding of the processor system adds flexibility in that the "degree" of losslessness of the coding can be influenced, so as to produce an encoder that is almost lossless in the data compression sense.

The inventive decoder is characterized in that both the psychoacoustically coded audio data and the auxiliary audio data are extracted from the coded audio data, subjected to entropy decoding where applicable, and then processed as follows. First, the quantization block is inversely quantized in the decoder and rounded using the same rounding function that was used in the encoder, and the result is then added to the entropy-decoded auxiliary audio data. A psychoacoustically compressed spectral representation of the audio signal and a lossless representation of the audio signal are then both present in the decoder. The psychoacoustically compressed spectral representation is transformed into the time domain to obtain a lossy coded/decoded audio signal, while the lossless representation is transformed into the time domain using an integer transform algorithm inverse to the one used in the encoder, to obtain a lossless or, as explained above, almost lossless coded/decoded audio signal.

Description of drawings

These and other objects and features of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings:

Fig. 1 is a block diagram of a preferred device for processing time-discrete audio samples to obtain integer values from which integer spectral values can be determined;

Fig. 2 is a schematic illustration of the decomposition of an MDCT and an inverse MDCT into Givens rotations and two DCT-IV operations;

Fig. 3 is an illustration of the decomposition of the MDCT with 50% overlap into rotations and DCT-IV operations;

Fig. 4a is a schematic block diagram of a known encoder with MDCT and 50% overlap;

Fig. 4b is a block diagram of a known decoder for decoding the values generated according to Fig. 4a;

Fig. 5 is a schematic block diagram of a preferred encoder according to the invention;

Fig. 6 is a schematic block diagram of an alternative preferred inventive encoder;

Fig. 7 is a schematic block diagram of a preferred inventive decoder;

Fig. 8a is a schematic illustration of a bit stream with a first scaling layer and a second scaling layer;

Fig. 8b is a schematic illustration of a bit stream with a first scaling layer and several further scaling layers;

Fig. 9 is a schematic illustration of binary-coded difference spectral values, illustrating the possible scalings with respect to the accuracy (bits) of the difference spectral values and/or with respect to the frequency (sampling rate) of the difference spectral values.

Embodiment

Inventive encoder circuits (Fig. 5 and Fig. 6) and a preferred inventive decoder circuit (Fig. 7) are discussed below with reference to Figs. 5 to 7. The inventive encoder shown in Fig. 5 comprises an input 50, to which a time-discrete audio signal is fed, and an output 52, at which coded audio data are output. The time-discrete audio signal at input 50 is fed to means 52 for providing a quantization block, which delivers at its output a quantization block of the time-discrete audio signal comprising spectral values of the time-discrete audio signal 50 quantized using a psychoacoustic model 54. The inventive encoder further comprises means for generating an integer block using an integer transform algorithm 56, the integer transform algorithm being operative to generate integer spectral values from integer time-discrete samples.

The inventive encoder further comprises means 58 for inversely quantizing the quantization block output by means 52 and, if an accuracy other than the processor accuracy is required, a rounding function. As explained, if rounding is to the accuracy of the processor system, the rounding function is inherently contained in the inverse quantization of the quantization block, since a processor with integer arithmetic cannot provide non-integer values anyway. Means 58 thus provides a so-called rounding block, which comprises inversely quantized spectral values that are rounded to integers either inherently or explicitly. Both the rounding block and the integer block are fed to a combining means which, by forming a difference, provides a difference block with difference spectral values; the term "difference block" indicates that the difference spectral values are values comprising differences between the integer block and the rounding block.

Both the quantization block output by means 52 and the difference block output by the difference-forming means 58 are fed to processing means 60, which, for example, processes the quantization block in the usual manner and entropy codes the difference block. Processing means 60 outputs, at output 52, coded audio data containing information on the quantization block as well as information on the difference block.

In a first preferred embodiment, shown in Fig. 6, the time-discrete audio signal is converted into its spectral representation by means of an MDCT and then quantized. Means 52 for providing the quantization block thus consists of MDCT means 52a and a quantizer 52b.

In addition, the integer block is preferably generated using an IntMDCT 56 as the integer transform algorithm.

In Fig. 6, the processing means 60 of Fig. 5 is shown as a bit stream encoder 60a, which bit stream codes the quantization block output by means 52b, and an entropy encoder 60b, which entropy codes the difference block. The bit stream encoder 60a outputs the psychoacoustically coded audio data, while the entropy encoder 60b outputs an entropy-coded difference block. The two output blocks of blocks 60a and 60b can be combined in a suitable manner into a bit stream that contains the psychoacoustically coded audio data as a first scaling layer and the auxiliary audio data for lossless decoding as a second scaling layer. The scaled bit stream then corresponds to the coded audio data at output 52 of the encoder shown in Fig. 5.

In an alternative preferred embodiment, the MDCT block 52a of Fig. 6 can be omitted, as indicated in Fig. 5 by a dashed arrow 62. In this case, the integer spectrum provided by the integer transform means 56 is fed both to the difference-forming means 58 and to the quantizer 52b of Fig. 6. The spectral values generated by the integer transform are here used, in a sense, as an approximation of a conventional MDCT spectrum. This embodiment has the advantage that only the IntMDCT algorithm is present in the encoder, rather than both the IntMDCT algorithm and the MDCT algorithm.

Referring again to Fig. 6, it should be noted that the solid blocks and lines represent a conventional audio encoder according to one of the MPEG standards, while the dashed blocks and lines represent the extension of such a conventional MPEG encoder. It can thus be seen that no fundamental change to the conventional MPEG encoder is required; instead, the inventive acquisition of the auxiliary audio data for lossless coding by means of an integer transform can be added without changing the basic encoder/decoder structure.

Fig. 7 shows a schematic block diagram of an inventive decoder for decoding the coded audio data output at output 52 of Fig. 5. The coded audio data are first split into psychoacoustically coded audio data on the one hand and auxiliary audio data on the other. The psychoacoustically coded audio data are fed to a conventional bit stream decoder 70, while the auxiliary audio data, if they were entropy coded in the encoder, are entropy decoded by an entropy decoder 72. Quantized spectral values are present at the output of bit stream decoder 70 in Fig. 7 and are fed to an inverse quantizer 74, which can in principle be constructed identically to the inverse quantizer in means 58 of Fig. 6. If an accuracy other than the processor accuracy is required, rounding means 76 is also provided in the decoder; it implements the same algorithm or rounding function for mapping a real number to an integer as is implemented in means 58 of Fig. 6. In a decoder-side combiner 78, the rounded inversely quantized spectral values are combined, preferably additively and spectral value by spectral value, with the entropy-decoded auxiliary audio data, so that in the decoder the inversely quantized spectral values are present at the output of means 74 on the one hand and integer spectral values are present at the output of combiner 78 on the other.

The spectral values at the output of means 74 can then be transformed into the time domain by means 80 for performing an inverse modified discrete cosine transform, to obtain a lossy, psychoacoustically coded and decoded audio signal. The output signal of combiner 78 is likewise converted into its time representation by means 82 for performing an inverse integer MDCT (IntMDCT), to produce a losslessly coded/decoded audio signal or, if coarser rounding has been applied, an almost losslessly coded and decoded audio signal.
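The decoder-side combination can be sketched in a few lines. As before, the uniform inverse quantizer with a single step size and all numeric values are assumptions made for illustration; the inverse MDCT and inverse IntMDCT that would follow are not shown.

```python
def dequantize(indices, step):
    """Stand-in inverse quantizer (uniform, assumed) for inverse quantizer 74."""
    return [q * step for q in indices]


def round_block(values):
    """Rounding function; it must be identical in encoder and decoder."""
    return [round(v) for v in values]


def decode_layers(quant_block, diff_block, step):
    """Return the lossy spectrum and the reconstructed integer spectrum."""
    lossy_spectrum = dequantize(quant_block, step)       # would feed the inverse MDCT
    rounded = round_block(lossy_spectrum)
    integer_spectrum = [r + d for r, d in zip(rounded, diff_block)]  # inverse IntMDCT input
    return lossy_spectrum, integer_spectrum


# Hypothetical encoder-side values for one block.
int_spectrum = [812, -98, 41, -6, 2, 0]
step = 7.3
quant_block = [111, -13, 6, -1, 0, 0]
diff_block = [i - r for i, r in
              zip(int_spectrum, round_block(dequantize(quant_block, step)))]

lossy, lossless = decode_layers(quant_block, diff_block, step)
```

Because the decoder repeats exactly the inverse quantization and rounding that the encoder used when it formed the difference, adding the difference values restores the integer spectrum bit for bit, whatever the rounding rule is, as long as both sides share it.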

A particularly preferred embodiment of the entropy encoder 60b of Fig. 6 is discussed below. In a conventional modern MPEG encoder, there are several code tables that are selected according to the average statistics of the quantized spectral values. The same code tables or code books are preferably also used for entropy coding the difference block at the output of combiner 58. Since the magnitude of the difference block, i.e. of the residual IntMDCT spectrum, depends on the accuracy of the quantization, the code book selection for entropy encoder 60b can be performed without additional side information.

In an MPEG-2 AAC encoder, the spectral coefficients, i.e. the quantized spectral values, are grouped into scale factor bands in the quantization block, the spectral values being weighted with a gain factor derived from the scale factor associated with the respective scale factor band. Since in this known encoder concept a non-uniform quantizer is used to quantize the weighted spectral values, the magnitude of the residual values, i.e. of the spectral values at the output of combiner 58, depends not only on the scale factors but also on the quantized values themselves. However, since both the scale factors and the quantized spectral values are contained in the bit stream generated by means 60a of Fig. 6, i.e. in the psychoacoustically coded audio data, it is preferred to perform the code book selection in the encoder according to the magnitude of the difference spectral values and to determine the code table used also in the decoder on the basis of the scale factors and quantized values transmitted in the bit stream. Since no side information needs to be transmitted for entropy coding the difference spectral values at the output of combiner 58, the entropy coding results only in a data rate compression, without any signaling bits having to be spent in the data stream as side information for entropy encoder 60b.
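The idea that the magnitude of a difference value can be bounded from data already present in the first layer can be sketched as follows. The power-law quantizer below only mimics the shape of a non-uniform quantizer with exponent 3/4; its rounding rule, the way the gain is passed in, and the code book limits are assumptions and are not taken from any standard. The bound also assumes that the integer spectral value itself is what was quantized, as in the embodiment using only the IntMDCT; with a separate MDCT, the approximation error between the two transforms would have to be added.

```python
import math


def quantize(value, gain):
    """Power-law quantizer: index = sign * floor((|x| / gain) ** 0.75 + 0.5)."""
    index = math.floor((abs(value) / gain) ** 0.75 + 0.5)
    return -index if value < 0 else index


def dequantize(index, gain):
    """Inverse quantizer: sign * gain * |index| ** (4/3)."""
    magnitude = gain * abs(index) ** (4.0 / 3.0)
    return -magnitude if index < 0 else magnitude


def residual_bound(index, gain):
    """Bound on |x - round(dequantize(index))| computed from index and gain only."""
    upper_edge = gain * (abs(index) + 0.5) ** (4.0 / 3.0)
    # Distance to the far edge of the quantizer cell, plus 0.5 for the rounding.
    return upper_edge - gain * abs(index) ** (4.0 / 3.0) + 0.5


def select_codebook(index, gain, limits=(1, 2, 4, 8, 16)):
    """Pick the first hypothetical code book whose largest value covers the bound."""
    bound = residual_bound(index, gain)
    for book, limit in enumerate(limits):
        if bound <= limit:
            return book
    return len(limits)  # hypothetical escape code book for larger residuals


index = quantize(37, 2.0)
book = select_codebook(index, 2.0)   # same result in encoder and decoder
```

Since `select_codebook` depends only on the quantized index and the gain, a decoder that has read the first layer can repeat the selection without any signaling bits, which is the property described above.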

In an audio encoder according to the MPEG-2 AAC standard, window switching is used to avoid pre-echoes in transient audio signal regions. This technique is based on the possibility of selecting the window shape individually in each half of the MDCT window, and it makes it possible to vary the block size in successive blocks. Likewise, the integer transform algorithm in the form of the IntMDCT, which is explained with reference to Figs. 1 to 3, is implemented so that it also uses different window shapes in the windowing and in the time-domain aliasing section of the MDCT decomposition. It is therefore preferred to use the same window decisions for the integer transform algorithm as for the transform algorithm generating the quantization block.

An encoder according to MPEG-2 AAC also contains several further coding tools, of which only TNS (temporal noise shaping) and center/side (CS) stereo coding are mentioned here. In TNS coding, as in CS coding, the spectral values are modified before quantization. As a result, the difference between the IntMDCT values, i.e. the integer block, and the quantized MDCT values increases. According to the invention, the integer transform algorithm is therefore designed to admit both TNS coding and center/side coding of integer spectral values. The TNS technique is based on an adaptive forward prediction of the MDCT values over frequency. The same prediction filter that a conventional TNS module calculates in a signal-adaptive manner is preferably also used to predict the integer spectral values; if non-integer values result, subsequent rounding can be applied to produce integer values again. This rounding preferably takes place after each prediction step. In the decoder, the original spectrum can be reconstructed by applying the inverse filter and the same rounding function. Likewise, CS coding can be applied to IntMDCT spectral values by using rounded Givens rotations with an angle of π/4 based on the lifting scheme. The original IntMDCT values can thus be reconstructed in the decoder.
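A rounded Givens rotation based on the lifting scheme can be sketched as follows. The decomposition of a rotation into three lifting steps is a standard identity; the function names, the assignment of the two channels to the rotation inputs and the sample values are assumptions.

```python
import math


def lifting_rotate(x, y, angle):
    """Integer-to-integer approximation of a Givens rotation (three lifting steps)."""
    t = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x = x + round(t * y)   # each step changes one value by a rounded function
    y = y + round(s * x)   # of the other value, so it can be undone exactly
    x = x + round(t * y)
    return x, y


def lifting_rotate_inverse(x, y, angle):
    """Undo lifting_rotate by running the steps backwards with subtraction."""
    t = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x = x - round(t * y)
    y = y - round(s * x)
    x = x - round(t * y)
    return x, y


left, right = 1200, 1150                         # hypothetical integer spectral values
a, b = lifting_rotate(left, right, math.pi / 4)  # roughly (l - r)/sqrt(2), (l + r)/sqrt(2)
restored = lifting_rotate_inverse(a, b, math.pi / 4)
```

The outputs approximate the difference and sum signals scaled by 1/sqrt(2) to within about one unit, yet the inverse restores the original integers exactly, because every rounded term is recomputed from a value that the inverse already knows.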

It should be noted that the inventive concept, in the preferred embodiment using the IntMDCT as the integer transform algorithm, can be applied to all MDCT-based perceptual audio encoders. Merely by way of example, such encoders are encoders according to MPEG-4 AAC Scalable, MPEG-4 AAC Low Delay, MPEG-4 BSAC, MPEG-4 Twin VQ, Dolby AC-3, etc.

In particular, it should be noted that the inventive concept is backward compatible. The perceptual encoder or decoder is not changed, only extended. The auxiliary information for the lossless component can be transmitted in a backward compatible manner within the perceptually coded bit stream, for example in the "ancillary data" field in MPEG-2 AAC. The extension to the previous perceptual decoder, drawn with dashed lines in Fig. 7, can evaluate this auxiliary data and, together with the quantized MDCT spectrum from the perceptual decoder, reconstruct the IntMDCT spectrum without loss.

The inventive concept of psychoacoustic coding supplemented by lossless or almost lossless coding is particularly suitable for generating, transmitting and decoding scalable data streams. Scalable data streams are known to comprise several scaling layers, of which at least the lowest scaling layer can be transmitted and decoded independently of the higher scaling layers. When data are processed in a scalable manner, further scaling layers, or enhancement layers, are added to the first scaling layer, or base layer. A fully equipped encoder can thus generate a scalable data stream that has a first scaling layer and, in principle, any number of further scaling layers. One advantage of the scalability concept is that, if a broadband transmission channel is available, the scaled data stream generated by the encoder can be transmitted completely, i.e. including all scaling layers, over the broadband transmission channel. If, however, only a narrowband transmission channel is available, the coded signal can still be transmitted over the channel, but only in the form of the first scaling layer or of a certain number of further scaling layers, this number being smaller than the total number of scaling layers generated by the encoder. Of course, an encoder that is connected to a channel and adapted to it can also directly generate the base scaling layer, or first scaling layer, together with a channel-dependent number of further scaling layers.

At demoder one end, but expansion concept also has an advantage, and that is exactly backwards-compatible.This means that the demoder that can only handle first extension layer ignored second and other extension layer in the data stream, and can produce a useful output signal.But, if demoder is a typical modern more demoder, can handle a plurality of extension layers in the extended data stream, this scrambler can be handled identical data stream as basic demoder so.

In the present invention, the basic scalability is that the quantization block, i.e. the output of the bit stream encoder 60a, is written into the first extension layer 81 of Fig. 8, which, with reference to Fig. 6, contains the psychoacoustically coded data, for example for one frame. The difference spectral values produced by the combiner 58, preferably entropy coded, are written into the second extension layer, denoted 82 in Fig. 8a, which therefore contains the auxiliary audio data for the frame.

If the transmission channel from the encoder to the decoder is a broadband channel, both extension layers 81 and 82 can be sent to the decoder. If it is a narrowband channel into which only the first extension layer "fits", the second extension layer can simply be removed from the data stream before transmission, so that the decoder receives only the first extension layer.

At the decoder, a "basic decoder" that can process only psychoacoustically coded data can simply ignore the second extension layer if it receives it over a broadband channel. A full decoder that contains both the psychoacoustic decoding algorithm and the integer decoding algorithm can use the first and second extension layers together to produce a losslessly coded and decoded output signal.
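The two-layer structure described above can be sketched as follows. This is only an illustration under simplifying assumptions: a uniform quantizer with a fixed step size stands in for the quantizer controlled by the psychoacoustic model, entropy coding is omitted, and the names `encode_layers`, `decode_layers` and `step` are hypothetical.

```python
def encode_layers(float_spectrum, int_spectrum, step):
    """Return (first layer, second layer) for one frame.

    float_spectrum: spectral values fed to the (stand-in) quantizer
    int_spectrum:   integer spectral values from the integer transform
    step:           assumed fixed quantizer step size (illustrative only)
    """
    # First layer: quantized spectral values.
    quantized = [round(v / step) for v in float_spectrum]
    # Inverse-quantize and round, exactly as the decoder will do.
    rounded = [round(q * step) for q in quantized]
    # Second layer: integer difference to the integer spectrum.
    difference = [i - r for i, r in zip(int_spectrum, rounded)]
    return quantized, difference


def decode_layers(quantized, step, difference=None):
    """Decode the first layer alone, or both layers when available."""
    rounded = [round(q * step) for q in quantized]
    if difference is None:
        return rounded  # lossy result of a basic decoder
    return [r + d for r, d in zip(rounded, difference)]  # exact integer spectrum
```

A basic decoder calls `decode_layers(quantized, step)` and ignores the second layer; a full decoder passes the difference values as well and recovers the integer spectrum exactly, because both sides compute the same rounded inverse-quantized values.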

In a preferred embodiment of the invention, shown schematically in Fig. 8a, the psychoacoustically coded data for a frame are again placed in the first extension layer. The second extension layer of Fig. 8a, however, is scaled more finely, so that it yields several extension layers, for example a (smaller) second extension layer, a third extension layer, a fourth extension layer and so on.

The difference spectral values output by the adder 58 are particularly well suited to finer scaling, as illustrated in Fig. 9, which shows binary-coded spectral values schematically. Each row 90 in Fig. 9 represents one binary-coded difference spectral value. The difference spectral values are ordered by frequency, as indicated by arrow 91, so that difference spectral value 92 has a higher frequency than difference spectral value 90. The first column of the table in Fig. 9 holds the most significant bit (MSB) of each difference spectral value, the second column the bit of significance MSB-1, and the third column the bit of significance MSB-2. The third-to-last column holds the bit of significance LSB+2, the second-to-last column the bit of significance LSB+1, and the last column the least significant bit (LSB) of each difference spectral value.

In a preferred embodiment of the invention, precision scaling is achieved by placing, for example, the 16 most significant bits of each difference spectral value in the second extension layer, entropy coded by the entropy coder 60b if desired. A decoder that uses the second extension layer obtains difference spectral values with 16-bit precision, so that the second extension layer together with the first provides a losslessly decoded audio signal of CD quality. CD-quality audio samples are known to be 16 bits wide.

If, on the other hand, an audio signal of studio quality is fed to the encoder, i.e. a signal with 24 bits per sample, the encoder can additionally produce a third extension layer that contains the last 8 bits of each difference spectral value, again entropy coded as required (device 60 of Fig. 6).

A full decoder that receives the first extension layer, the second extension layer (the 16 most significant bits of the difference spectral values) and the third extension layer (the remaining 8 lower-order bits of the difference spectral values) can use all three layers to provide a losslessly coded and decoded audio signal of studio quality, i.e. with a sample word width of 24 bits at the decoder output.

It should be noted that audio signals in the studio domain have longer sample words than those in the consumer domain. In the consumer domain, the word width on an audio CD is 16 bits, whereas in the studio domain it is 24 or 20 bits.

With the concept of scaling in the IntMDCT domain described above, all three precisions (16, 20 or 24 bits), or indeed any precision scaled in steps of as little as 1 bit, can be coded scalably.

Here, the audio signal given with 24-bit precision is represented in the integer spectral domain by means of the inverse IntMDCT and scalably combined with the output signal of the MDCT-based perceptual audio coder.

The integer difference values needed for the lossless representation are then not coded completely in one extension layer but first coded with lower precision; only the further extension layers carry the remainder needed for the exact representation. Alternatively, a further extension layer can represent each difference spectral value completely, for example with 24 bits, so that decoding this layer no longer requires the layers below it. This increases the size of the bit stream but, where channel bandwidth is not a concern, simplifies the decoder, because extension layers no longer need to be combined there and a single extension layer always suffices for decoding.

If, for example, the lower 8 bits (LSBs) shown in Fig. 9 are initially not transmitted, scalability between 24 bits and 16 bits is achieved.

To transform values transmitted with lower precision back to the time domain, the transmitted values are preferably scaled back to the original range, for example 24 bits, by multiplying them by 2^8. The inverse IntMDCT is then applied to the rescaled values.
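A minimal sketch of this precision scaling, assuming the difference spectral values are held as signed Python integers and split by plain bit shifts. The function names are illustrative and entropy coding is left out.

```python
def split_precision(values, low_bits=8):
    """Split signed integers into a coarse layer and a refinement layer."""
    mask = (1 << low_bits) - 1
    coarse = [v >> low_bits for v in values]  # upper bits; arithmetic shift keeps the sign
    fine = [v & mask for v in values]         # the low_bits least significant bits
    return coarse, fine


def merge_precision(coarse, fine=None, low_bits=8):
    """Rebuild values from the coarse layer, optionally refined by the fine layer."""
    if fine is None:
        # Only the coarse layer arrived: scale back by 2**low_bits.
        return [c << low_bits for c in coarse]
    return [(c << low_bits) + f for c, f in zip(coarse, fine)]
```

With both layers the original values are recovered exactly. With the coarse layer alone each value is off by less than 2^8, which is the precision given up by not transmitting the 8 LSBs.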

In the precision scaling in the frequency domain according to the invention, redundancy in the LSBs is preferably exploited as well. If, for example, an audio signal has very little energy in the upper frequency range, this shows up as very small values in the IntMDCT spectrum, for example values considerably smaller than those representable with 8 bits (-128, ..., 127), and it is reflected in the compressibility of the LSBs of the IntMDCT spectral values. It should also be noted that, in very small difference spectral values, a number of bits from the MSB downwards (MSB, MSB-1, ...) are typically zero, and the first leading 1 of the binary-coded difference spectral value does not occur until a bit of significance MSB-n-1. When the difference spectral values in the second extension layer consist only of zeros in this way, entropy coding is particularly suited to further data compression.

According to a further embodiment of the invention, the second extension layer 82 of Fig. 8a is preferably scaled by sampling rate. Sampling-rate scalability is achieved by placing the difference spectral values up to a first cutoff frequency in the second extension layer, as indicated on the right of Fig. 9, and the difference spectral values between the first cutoff frequency and the maximum frequency in a further extension layer. Further subdivision is of course possible, giving several extension layers across the whole frequency range.

In a preferred embodiment of the invention, the second extension layer in Fig. 9 contains the difference spectral values up to 24 kHz, corresponding to a sampling rate of 48 kHz. The third extension layer contains the difference spectral values from 24 kHz to 48 kHz, corresponding to a sampling rate of 96 kHz.

It should further be noted that the second and third extension layers need not contain all bits of the difference spectral values. In a combined form of scalability, the second extension layer can contain bits MSB to MSB-X of the difference spectral values up to a certain cutoff frequency. The third extension layer can then contain bits MSB to MSB-X of the difference spectral values from the first cutoff frequency to the maximum frequency. The fourth extension layer can contain the remaining bits of the difference spectral values up to the cutoff frequency, and the last extension layer the remaining bits of the difference spectral values at the higher frequencies. This divides the table in Fig. 9 into four quadrants, each of which represents one extension layer.
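The four-quadrant division can be sketched as below. This assumes the difference spectral values are stored as signed integers in order of increasing frequency, that the cutoff is given as an index into that list, and that missing layers are replaced by zeros at the decoder. The layer names and function names are hypothetical.

```python
def split_quadrants(values, cutoff, low_bits):
    """Split frequency-ordered integers into four layers (Fig. 9 quadrants)."""
    mask = (1 << low_bits) - 1
    low_f, high_f = values[:cutoff], values[cutoff:]
    return {
        "layer2": [v >> low_bits for v in low_f],   # upper bits, up to the cutoff
        "layer3": [v >> low_bits for v in high_f],  # upper bits, above the cutoff
        "layer4": [v & mask for v in low_f],        # remaining bits, up to the cutoff
        "layer5": [v & mask for v in high_f],       # remaining bits, above the cutoff
    }


def merge_quadrants(layers, cutoff, total, low_bits):
    """Rebuild `total` values; any layer that is absent counts as zeros."""
    zeros_low, zeros_high = [0] * cutoff, [0] * (total - cutoff)
    upper = layers.get("layer2", zeros_low) + layers.get("layer3", zeros_high)
    lower = layers.get("layer4", zeros_low) + layers.get("layer5", zeros_high)
    return [(u << low_bits) + l for u, l in zip(upper, lower)]
```

Passing all four layers to `merge_quadrants` restores the values exactly; passing only `layer2` gives a coarse, band-limited approximation in which everything above the cutoff is zero.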

For frequency scalability, a preferred embodiment of the invention provides scalability between sampling rates of 48 kHz and 96 kHz. In the lossless extension layer, only the lower half of the IntMDCT spectrum of the signal sampled at 96 kHz is coded and transmitted at first. If the upper half is not transmitted as well, it is assumed to be zero in the decoder. The inverse IntMDCT (of the same length as in the encoder) then produces a 96 kHz signal that contains no energy in the upper frequency range and can therefore be subsampled to 48 kHz without loss of quality.

The scaling of the difference spectral values into the quadrants of Fig. 9 with fixed boundaries described above is advantageous with regard to the size of the extension layers, because an extension layer then in fact only has to contain, for example, 16 bits or 8 bits, or the spectral values up to the cutoff frequency or above it.

As an alternative, the quadrant boundaries of Fig. 9 can be "softened" to some extent. For frequency scalability this means that no so-called "brick-wall low-pass" is applied, in which the difference spectral values are unchanged below the cutoff frequency and zero above it. Instead, the difference spectral values can be filtered by an arbitrary low-pass that already attenuates spectral values somewhat below the cutoff frequency and leaves the difference spectral values above the cutoff frequency with some energy, albeit decreasing. The extension layer generated in this way then also contains spectral values above the cutoff frequency; because these are relatively small, they can be entropy coded efficiently. In this case the highest extension layer contains the difference between the complete difference spectral values and the spectral values contained in the second extension layer.

Precision scaling can be softened in the same way. The first extension layer can, for example, also contain spectral values with more than 16 bits, with the next extension layer again carrying the difference. In general, the second extension layer contains the difference spectral values with lower precision, and the next extension layer transmits the remainder, i.e. the difference between the complete spectral values and those contained in the second extension layer. This yields a variable reduction in precision.

The inventive encoding or decoding method is preferably stored on a digital storage medium with electronically readable control signals, such as a floppy disk, where the control signals can cooperate with a programmable computer system so that the encoding and/or decoding method is carried out. In other words, the invention is also a computer program product with program code stored on a machine-readable carrier for carrying out the encoding and/or decoding method when the program product runs on a computer, and a computer program with program code for carrying out the method when the program runs on a computer.

In the following, the IntMDCT transform algorithm described in "Audio Coding Based on Integer Transforms" (111th AES Convention, New York, 2001) is presented as an example of an integer transform algorithm. The IntMDCT is particularly favourable because it has the attractive properties of the MDCT, such as a good spectral representation of the audio signal, critical sampling and block overlap. Because the IntMDCT is a good approximation of the MDCT, it is even possible to use only one transform algorithm in the encoder of Fig. 5, as indicated by arrow 62 of Fig. 5. The essential properties of this particular integer transform algorithm are explained with reference to Figs. 1 to 4.

Fig. 1 shows an overview of the preferred inventive device for processing time-discrete samples representing an audio signal so as to obtain the integer values on which the IntMDCT integer transform algorithm operates. The time-discrete samples are windowed by the device of Fig. 1 and optionally converted to a spectral representation. The time-discrete samples fed into the device at input 10 are windowed with a window w whose length corresponds to 2N time-discrete samples, giving integer windowed samples at output 12 that are suitable for conversion to a spectral representation by a transform, in particular by the device 14 for carrying out an integer DCT. The integer DCT produces N output values from N input values, in contrast to the MDCT function 408 of Fig. 4a, which, by the MDCT equation, produces only N spectral values from 2N windowed samples.

To window the time-discrete samples, device 16 first selects two time-discrete samples that together form a vector of time-discrete samples. One sample selected by device 16 lies in the first quarter of the window and the other in the second quarter, as explained in more detail with reference to Fig. 3. A rotation matrix of dimension 2 × 2 is applied to the vector produced by device 16; this operation is not carried out in one step but by means of several so-called lifting matrices.

A lifting matrix has the property that it contains only one element that depends on the window w and differs from 0 and 1.

The factorization of wavelet transforms into lifting steps is described in "Factoring Wavelet Transforms Into Lifting Steps" (Ingrid Daubechies and Wim Sweldens, preprint, Bell Laboratories, Lucent Technologies, 1996). In general, a lifting scheme is a simple relation between perfectly reconstructing filter pairs that share the same low-pass or high-pass filter. Every pair of complementary filters can be factorized into lifting steps. This applies in particular to Givens rotations. Consider the case in which the polyphase matrix is a Givens rotation. Then:

\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \sin\alpha & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \qquad (1)

Each of the three lifting matrices on the right-hand side of the equal sign has the value 1 for both main-diagonal elements. In addition, in each lifting matrix one off-diagonal element equals 0 and the other off-diagonal element depends on the rotation angle α.
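As a quick numerical check of equation (1), the sketch below multiplies the three lifting matrices and compares the product with the Givens rotation. It uses plain Python lists; the helper names are illustrative, and the factorization requires sin α ≠ 0.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lifting_matrices(alpha):
    """Return the three lifting matrices of equation (1), left to right."""
    t = (math.cos(alpha) - 1.0) / math.sin(alpha)  # undefined for sin(alpha) == 0
    upper = [[1.0, t], [0.0, 1.0]]
    lower = [[1.0, 0.0], [math.sin(alpha), 1.0]]
    return upper, lower, upper


def rotation(alpha):
    """The Givens rotation on the left-hand side of equation (1)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]
```

Multiplying the matrices returned by `lifting_matrices(alpha)` in order reproduces `rotation(alpha)` up to floating-point error.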

The vector is now multiplied by the third lifting matrix, i.e. the rightmost lifting matrix in the equation above, to obtain a first result vector; this is shown by device 18 in Fig. 1. As shown by device 20 in Fig. 1, the first result vector is then rounded with an arbitrary rounding function that maps the set of real numbers onto the set of integers, so that a rounded first result vector is obtained at the output of device 20. The rounded first result vector is fed into device 22, which multiplies it by the middle, i.e. second, lifting matrix to obtain a second result vector, which is in turn rounded in device 24 to obtain a rounded second result vector. The rounded second result vector is fed into device 26, which multiplies it by the leftmost, i.e. first, lifting matrix of the equation above to obtain a third result vector, which is finally rounded by device 28. The integer windowed samples are thus obtained at output 12 and, if a spectral representation is desired, must be processed by device 14 to obtain integer spectral values at spectral output 30.

Device 14 is preferably realized as integer DCT.

The discrete cosine transform of type IV (DCT-IV) with length N is given by the following equation:

X_{t}(m) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} x(k) \cos\left(\frac{\pi}{4N} (2k+1)(2m+1)\right) \qquad (2)

The coefficients of the DCT-IV form an orthonormal N × N matrix. As described in P. P. Vaidyanathan, "Multirate Systems and Filter Banks" (Prentice Hall, Englewood Cliffs, 1993), every orthogonal N × N matrix can be decomposed into N(N-1)/2 Givens rotations. It should be noted that further decompositions also exist.
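Equation (2) can be written down directly as a floating-point reference. This is a plain O(N²) sketch for illustration, not the integer DCT of device 14; the function name is an assumption.

```python
import math


def dct_iv(x):
    """Direct evaluation of the DCT-IV of equation (2) for a list of numbers."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[k] * math.cos(math.pi / (4 * n) * (2 * k + 1) * (2 * m + 1))
                    for k in range(n))
        for m in range(n)
    ]
```

Because the coefficient matrix is orthonormal and symmetric in k and m, applying `dct_iv` twice returns the input (up to floating-point error), and the transform preserves the energy of the input vector.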

For a classification of the various DCT algorithms, see H. S. Malvar, "Signal Processing With Lapped Transforms", Artech House, 1992. In general, DCT algorithms are distinguished by the type of their basis functions. The DCT-IV preferred here has non-symmetric basis functions, i.e. a quarter cosine wave, a 3/4 cosine wave, a 5/4 cosine wave, a 7/4 cosine wave and so on, whereas the discrete cosine transform of, for example, type II (DCT-II) has axially symmetric and point-symmetric basis functions. Its zeroth basis function is a DC component, the first is half a cosine wave, the second a whole cosine wave, and so on. Because the DCT-II gives special weight to the DC component, it is used in video coding but not in audio coding, since, unlike in video coding, the DC component is not relevant in audio coding.

The following explains how the rotation angle α of the Givens rotation depends on the window function.

An MDCT with a window length of 2N can be reduced to a type-IV discrete cosine transform of length N. This is achieved by carrying out the TDAC operation explicitly in the time domain and then applying the DCT-IV. With 50% overlap, the left half of the window for block t overlaps the right half of the preceding block t-1. The overlapping parts of two consecutive blocks t-1 and t are preprocessed in the time domain, i.e. before the transform, between input 10 and output 12 of Fig. 1, as follows:

\begin{pmatrix} \tilde{x}_{t}(k) \\ \tilde{x}_{t-1}(N-1-k) \end{pmatrix} = \begin{pmatrix} w(\frac{N}{2}+k) & -w(\frac{N}{2}-1-k) \\ w(\frac{N}{2}-1-k) & w(\frac{N}{2}+k) \end{pmatrix} \begin{pmatrix} x_{t}(\frac{N}{2}+k) \\ x_{t}(\frac{N}{2}-1-k) \end{pmatrix} \qquad (3)

Values marked with a tilde are the values at output 12 of Fig. 1, whereas the x values without a tilde in the equation above are the values at input 10, or after the selection device 16. The index k runs from 0 to N/2-1, and w denotes the window function.

The TDAC condition on the window function w implies:

w\left(\frac{N}{2}+k\right)^{2} + w\left(\frac{N}{2}-1-k\right)^{2} = 1 \qquad (4)

For certain angles α_k, k = 0, ..., N/2-1, this preprocessing in the time domain can be written as a Givens rotation, as explained above.

The angle α of the Givens rotation depends on the window function w as follows:

\alpha = \arctan\left[ w(N/2-1-k) / w(N/2+k) \right] \qquad (5)

It should be noted that any window function w can be used as long as it satisfies the TDAC condition.
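As an illustration of equations (4) and (5), the sketch below computes the rotation angles for a sine window of length 2N. The sine window is an assumption made for this example; the text only requires that the window satisfy the TDAC condition. The function names are hypothetical.

```python
import math


def sine_window(N):
    """Sine window of length 2N (assumed example of a TDAC-compliant window)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def rotation_angles(w, N):
    """Angles alpha_k of equation (5) for k = 0 .. N/2 - 1 (N even)."""
    return [math.atan(w[N // 2 - 1 - k] / w[N // 2 + k]) for k in range(N // 2)]
```

For such a window, cos α_k equals w(N/2+k) and sin α_k equals w(N/2-1-k), so the matrix in equation (3) is exactly the Givens rotation by α_k.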

A cascaded encoder and decoder are described below with reference to Fig. 2. The time-discrete samples x(0) to x(2N-1) that are windowed together by one window are first selected by device 16 of Fig. 1 in such a way that sample x(0) and sample x(N-1), i.e. one sample from the first quarter of the window and one from the second quarter, form the vector at the output of device 16. The crossing arrows schematically represent the lifting multiplications and subsequent roundings of devices 18, 20, then 22, 24, then 26, 28, which yield the integer windowed samples at the input of the DCT-IV blocks.

When the first vector has been processed as described above, a second vector is selected from samples x(N/2-1) and x(N/2), i.e. again one sample from the first quarter of the window and one from the second quarter, and is processed by the algorithm described for Fig. 1. All other pairs of samples from the first and second quarters of the window are processed in the same way, as are the third and fourth quarters of the first window. As Fig. 2 shows, N windowed integer samples are then available at output 12 and are fed into a DCT-IV transform. In particular, the windowed integer samples of the second and third quarters are fed into one DCT. The windowed integer samples of the first quarter of the window are processed in a preceding DCT-IV together with the windowed integer samples of the fourth quarter of the preceding window. Similarly, the windowed integer samples of the fourth quarter in Fig. 2 are fed into a DCT-IV transform together with the windowed integer samples of the first quarter of the following window. The central integer DCT-IV transform 32 shown in Fig. 2 provides N integer spectral values y(0) to y(N-1). Because windowing and transform both yield integer output values, these integer spectral values can be entropy coded directly, without any quantization.

The right half of Fig. 2 shows a decoder. It consists of the inverse transform and "inverse windowing" and operates inversely to the encoder. As is known, a DCT-IV is inverted by an inverse DCT-IV, as shown in Fig. 2. The output values of the decoder DCT-IV 34 are then processed inversely together with the corresponding values of the preceding and following transforms, as shown in Fig. 2, so that the time-discrete audio samples x(0) to x(2N-1) are recovered from the integer windowed samples at the output of device 34 and of the preceding and following transforms.

The operation at the output is carried out by an inverse Givens rotation, i.e. blocks 26, 28, then 22, 24, then 18, 20 are traversed in the opposite direction. This is explained in more detail using the second lifting matrix of equation (1). When (in the encoder) the second result vector is formed by multiplying the rounded first result vector by the second lifting matrix (device 22), the following expression results:

(x, y) \mapsto (x, y + x \sin\alpha) \qquad (6)

The values x, y on the right-hand side of equation (6) are integers. This does not apply to the value x sin α, however. Here a rounding function r must be introduced, as in the following equation:

(x, y) \mapsto (x, y + r(x \sin\alpha)) \qquad (7)

This operation is carried out by device 24.

The inverse mapping (in the decoder) is defined as follows:

(x', y') \mapsto (x', y' - r(x' \sin\alpha)) \qquad (8)

Because of the minus sign before the rounding operation, it is clear that the integer approximation of the lifting step can be inverted without introducing any error. Applying this approximation to each of the three lifting steps yields an integer approximation of the Givens rotation. The rounded rotation (in the encoder) can be inverted (in the decoder) without introducing an error by traversing the inverse rounded lifting steps in reverse order, i.e. by executing the algorithm of Fig. 1 from bottom to top during decoding.
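The rounded rotation and its exact inverse can be sketched as follows. The built-in `round` is used as the rounding function r; that choice and the function names are assumptions for illustration, since the text allows any rounding function as long as encoder and decoder use the same one.

```python
import math


def rounded_rotation(x, y, alpha):
    """Integer approximation of a Givens rotation via three rounded lifting steps."""
    t = (math.cos(alpha) - 1.0) / math.sin(alpha)  # requires sin(alpha) != 0
    s = math.sin(alpha)
    x = x + round(t * y)  # rightmost (third) lifting matrix, then rounding
    y = y + round(s * x)  # middle (second) lifting matrix, then rounding
    x = x + round(t * y)  # leftmost (first) lifting matrix, then rounding
    return x, y


def inverse_rounded_rotation(x, y, alpha):
    """Undo rounded_rotation exactly: same steps, reverse order, minus signs."""
    t = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x - round(t * y)
    y = y - round(s * x)
    x = x - round(t * y)
    return x, y
```

Each inverse step recomputes the same rounded term from a value that the forward step left unchanged, which is why integer inputs are recovered exactly even though the forward result only approximates the true rotation.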

If the rounding function r is point-symmetric, the inverse rounded rotation is identical to the rounded rotation by the angle -α, which is given by:

\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \qquad (9)

In this case the lifting matrices for the decoder, i.e. for the inverse Givens rotation, follow directly from equation (1) by simply replacing the term "sin α" with "-sin α".

The decomposition of a conventional MDCT with overlapping windows 40 to 60 is illustrated once more with reference to Fig. 3. Windows 40 to 60 each overlap by 50%. For each window, Givens rotations are first carried out within the first and second quarters of the window and within the third and fourth quarters, as indicated schematically by arrows 48. The rotated values, i.e. the windowed integer samples, are then fed into an N-to-N DCT such that the second and third quarters of a window, or the fourth quarter of one window and the first quarter of the following window, are converted together to a spectral representation by a DCT-IV algorithm.

The conventional Givens rotations are thus decomposed into lifting matrices that are applied sequentially, with a rounding step after each multiplication by a lifting matrix, so that floating-point numbers are rounded as soon as they arise and the result vector contains only integers before each multiplication by a lifting matrix.

The output values therefore always remain integers, and integer input values are preferably used as well. This is not a restriction, because any PCM samples, such as those stored on a CD, are integer values whose range depends on the bit width, i.e. on whether the time-discrete digital input values are 16-bit or 24-bit values. As explained, the whole process is invertible by carrying out the inverse rotations in reverse order. An integer approximation of the MDCT with perfect reconstruction, i.e. a lossless transform, therefore exists.

The transform shown provides integer output values instead of floating-point values. It provides perfect reconstruction, so that no error is introduced when a forward transform is followed by an inverse transform. According to a preferred embodiment of the invention, the transform replaces the modified discrete cosine transform. Other transform methods can also be carried out in integer form, however, provided that they can be decomposed into rotations and the rotations into lifting steps.

The integer MDCT has most of the favourable properties of the MDCT. It has an overlapping structure, which gives better frequency selectivity than non-overlapping block transforms. Because the TDAC function is already taken into account in the windowing before the transform, critical sampling is maintained, so that the total number of spectral values representing an audio signal equals the total number of input samples.

Compared with a conventional MDCT, which provides floating-point values, the preferred integer transform described here raises the noise only in spectral regions with very low signal level, whereas this noise increase is not noticeable at significant signal levels. In return, integer processing lends itself to an efficient hardware implementation, because only multiplication steps are used, which can readily be decomposed into shift and add steps that are simple and fast to implement in hardware. A software implementation is of course also possible.

The integer transform provides a good spectral representation of the audio signal while remaining in the integer domain. Applied to tonal parts of an audio signal, it results in good energy concentration. An efficient lossless coding scheme can thus be built by simply cascading the windowing/transform of Fig. 1 with an entropy coder. In particular, stacked coding with escape values, as used in MPEG AAC, is favourable. Preferably all values are scaled down by a suitable power of two until they fit into a desired code table, and the omitted least significant bits are then coded additionally. Compared with the alternative of using larger code tables, this is more favourable in terms of the memory needed to store the code table. A nearly lossless coder can also be obtained by simply omitting some of the least significant bits.
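The power-of-two reduction can be sketched as follows. This is a simplified illustration, not the MPEG AAC escape mechanism: it assumes a hypothetical code table that covers the integer range from -limit to limit-1 and uses one common shift for all values of a block.

```python
def reduce_to_table(values, limit):
    """Shift all values right until each fits the range -limit .. limit-1.

    Returns (shift, reduced values, omitted least significant bits).
    """
    shift = 0
    while any(not (-limit <= (v >> shift) < limit) for v in values):
        shift += 1
    mask = (1 << shift) - 1
    return shift, [v >> shift for v in values], [v & mask for v in values]


def restore(shift, reduced, omitted):
    """Rebuild the values from the reduced values and the omitted bits."""
    return [(r << shift) + o for r, o in zip(reduced, omitted)]
```

Coding the reduced values with the small table and the omitted bits separately keeps the scheme lossless; passing zeros instead of the omitted bits to `restore` gives the nearly lossless variant, with an error below 2^shift per value.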

Especially for tonal signals, entropy coding of the integer spectral values enables a high coding gain. For transient parts of the signal the coding gain is low, because transient signals have a flat spectrum, i.e. only a small number of spectral values are equal or nearly equal to 0. As described in J. Herre, J. D. Johnston, "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)", 101st AES Convention, Los Angeles, 1996, preprint 4384, this flatness can nevertheless be exploited by linear prediction in the frequency domain. One alternative is prediction with an open loop, another is prediction with a closed loop. The first alternative, the open-loop predictor, is called TNS. Quantization after the prediction adapts the resulting quantization noise to the temporal structure of the audio signal and thus prevents pre-echoes in psychoacoustic audio coders. For lossless audio coding the second alternative, the closed-loop predictor, is more suitable, because closed-loop prediction allows exact reconstruction of the input signal. When this technique is applied to the integer spectrum generated, a rounding step must follow each stage of the prediction filter in order to stay in the integer domain. Using the inverse filter and the same rounding function, the original spectrum can be reproduced exactly.
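A minimal sketch of lossless prediction over an integer spectrum with rounding, assuming a single-stage first-order predictor with a fixed, hypothetical coefficient `coeff`; a real coder would use a higher-order filter with adapted coefficients.

```python
def predict_encode(spectrum, coeff):
    """Replace each integer spectral value by its rounded prediction residual."""
    residual, previous = [], 0
    for value in spectrum:
        residual.append(value - round(coeff * previous))  # rounding keeps integers
        previous = value  # the decoder reconstructs this value exactly
    return residual


def predict_decode(residual, coeff):
    """Inverse filter using the same rounding function as the encoder."""
    spectrum, previous = [], 0
    for r in residual:
        value = r + round(coeff * previous)
        spectrum.append(value)
        previous = value
    return spectrum
```

Because the decoder forms the same rounded prediction from the same previously reconstructed value, the original integer spectrum is reproduced exactly.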

To exploit redundancy between two channels for data reduction, mid-side coding can also be used losslessly if a rounded rotation by the angle π/4 is applied. Compared with the alternative of computing the sum and difference of the left and right channels of a stereo signal, the rounded rotation has the advantage of preserving energy. The use of these so-called joint stereo coding techniques can be switched on or off for each band, as is also done in the MPEG AAC standard. Other rotation angles can also be considered in order to reduce redundancy between two channels more flexibly.

Claims

1. A device for encoding a time-discrete audio signal to obtain encoded audio data, comprising:

means (52) for providing a quantization block of spectral values of the time-discrete audio signal, quantized using a psychoacoustic model (54);

means (58) for inverse quantizing the quantization block and rounding the inverse-quantized spectral values, to obtain a rounding block of rounded inverse-quantized spectral values;

means (56) for generating an integer block of integer spectral values using an integer transform algorithm, the integer transform algorithm being designed to generate the integer block of spectral values from a block of integer time-discrete samples;

combining means (58) for forming a difference block depending on a spectral-value-wise difference between the rounding block and the integer block, to obtain a difference block having difference spectral values; and

means (60) for processing the quantization block and the difference block, to generate encoded audio data comprising information on the quantization block and information on the difference block.

2. The device as claimed in claim 1, wherein the means (52) for providing is designed to generate, by means of an MDCT, an MDCT block of MDCT spectral values from a time block of time-discrete audio signal values, and

to quantize the MDCT block using the psychoacoustic model, to generate the quantization block comprising quantized MDCT spectral values.

3. The device as claimed in claim 2, wherein the means (56) for generating the integer block is designed to perform an IntMDCT on the time block, to generate the integer block comprising IntMDCT spectral values.

4. The device as claimed in any one of the preceding claims, wherein the means (52) for providing is designed to calculate the quantization block using a floating-point transform algorithm.

5. The device as claimed in claim 1, wherein the means (52) for providing is designed to calculate the quantization block using the integer block generated by the means (56) for generating.

6. The device as claimed in claim 1,

wherein the means (60) for processing is designed to entropy code (60a) the quantization block, to obtain an entropy-coded quantization block;

to entropy code (60b) the difference block, to obtain an entropy-coded difference block; and

to convert the entropy-coded quantization block into a first extension layer of a scalable data stream representing the encoded audio data, and to convert the entropy-coded difference block into a second extension layer of the scalable data stream.

7. The device as claimed in claim 6,

wherein the means (60) for processing is designed to use one of a plurality of code tables, depending on the quantized spectral values, for entropy coding the quantization block, and

wherein the means (60) for processing is designed to select one of a plurality of code tables, depending on a property of a quantizer usable in the quantization for generating the quantization block, for entropy coding the difference block.

8. The device as claimed in claim 1,

wherein the means (52) for providing is designed to select one of a plurality of windows, depending on a property of the audio signal, for windowing a time block of audio signal values; and

wherein the means (56) for generating is designed to make the same window selection for the integer transform algorithm.

9. The device as claimed in claim 1,

wherein the means for generating is designed to use an integer transform algorithm comprising:

windowing the time-discrete samples with a window (w) having a length corresponding to 2N time-discrete samples, to provide windowed time-discrete samples for converting the time-discrete samples into a spectral representation by means of a transform that generates N output values from N input values, wherein the windowing comprises the following substeps:

selecting (16) a time-discrete sample from one quarter of the window and a time-discrete sample from another quarter of the window, to obtain a vector of time-discrete samples;

applying to the vector a square rotation matrix whose dimension matches the dimension of the vector, wherein the rotation matrix can be represented by a plurality of lifting matrices, and wherein a lifting matrix comprises only one element that depends on the window (w) and is not equal to 1 or 0, wherein the substep of applying comprises the following substeps:

multiplying (18) the vector by a lifting matrix, to obtain a first result vector;

rounding a component of the first result vector with a rounding function (r) that maps a real number to an integer, to obtain a rounded first result vector; and

sequentially performing the steps of multiplying (22) by a further lifting matrix and rounding (24) until all lifting matrices have been processed, to obtain a rotated vector comprising an integer windowed sample from the one quarter of the window and an integer windowed sample from the other quarter of the window, and

performing the step of windowing for all time-discrete samples of the remaining quarters of the window, to obtain 2N filtered integer values; and

converting (14) N windowed integer samples into a spectral representation by means of an integer DCT applied to the filtered integer samples of the second and third quarters of the window, to obtain N integer spectral values.

10. The device as claimed in claim 1,

wherein the means (52) for providing the quantization block is designed to perform a prediction of spectral values over frequency using a prediction filter before a quantization step (52b), to obtain prediction residual spectral values that represent the quantization block after quantization;

wherein prediction means is further provided, which is designed to perform a prediction over frequency of the integer spectral values of the integer block, and wherein rounding means is further provided for rounding the prediction residual spectral values obtained from the integer spectral values, which represent the rounding block.

11. The device as claimed in claim 1,

wherein the time-discrete audio signal comprises at least two channels;

wherein the means (52) for providing is designed to perform mid/side coding with spectral values of the time-discrete audio signal, to obtain the quantization block after quantization of the mid/side spectral values, and

wherein the means (56) for generating the integer block is designed to also perform mid/side coding corresponding to the mid/side coding of the means (52) for providing.

12. The device as claimed in claim 1, wherein the means (60) for processing is designed to generate an MPEG-2 AAC data stream in which auxiliary information for the integer transform algorithm is introduced in an ancillary data field.

13. The device as claimed in claim 1,

wherein the means (60) for processing is designed to output the encoded audio data as a data stream having a plurality of extension layers.

14. The device as claimed in claim 13,

wherein the means (60) for processing is designed to insert information on the quantization block in a first extension layer (81) and information on the difference block in a second extension layer (82).

15. The device as claimed in claim 13,

wherein the means (60) for processing is designed to insert information on the quantization block in a first extension layer and information on the difference block in at least a second and a third extension layer.

16. The device as claimed in claim 15,

wherein the second extension layer contains difference spectral values with reduced precision, and one or more higher extension layers contain the remaining part of the difference spectral values.

17. The device as claimed in claim 15,

wherein the information on the difference block comprises binary-coded difference spectral values;

wherein the second extension layer for the difference spectral values contains a number of bits from the most significant bit (MSB) down to a less significant bit (MSB-x) of the difference spectral values; and

wherein the third extension layer contains a number of bits from the next less significant bit (MSB-x-1) down to the least significant bit (LSB).

18. The device as claimed in claim 17,

wherein the time-discrete audio signal is present in the form of samples having a width of 24 bits, and

wherein the means (60) for processing is designed to insert the 16 more significant bits of the difference spectral values in the second extension layer and the remaining 8 bits of the difference spectral values in the third extension layer, so that a decoder achieves CD quality using the second extension layer and achieves studio quality when the third extension layer is also used.

19. The device as claimed in claim 15,

wherein the means (60) for processing is designed to insert in the second extension layer at least a part of the difference spectral values representing a low-pass filtered signal, and to insert in a further extension layer the difference between the difference spectral values in the second extension layer and the original difference spectral values.

20. The device as claimed in claim 15,

wherein the means (60) for processing is designed to insert in the second extension layer at least a part of the difference spectral values up to a certain cutoff frequency, and to insert in the third extension layer at least a part of the difference spectral values from the certain cutoff frequency up to a higher frequency.

21. A method for encoding a time-discrete audio signal to obtain encoded audio data, comprising:

providing (52) a quantization block of spectral values of the time-discrete audio signal, quantized using a psychoacoustic model (54);

inverse quantizing (58) the quantization block and rounding the inverse-quantized spectral values, to obtain a rounding block of rounded inverse-quantized spectral values;

generating (56) an integer block of integer spectral values using an integer transform algorithm, the integer transform algorithm generating the integer block of spectral values from a block of integer time-discrete samples;

forming (58) a difference block depending on a spectral-value-wise difference between the rounding block and the integer block, to obtain a difference block having difference spectral values; and

processing (60) the quantization block and the difference block, to generate encoded audio data comprising information on the quantization block and information on the difference block.

22. A device for decoding encoded audio data, the encoded audio data having been generated from a time-discrete audio signal by providing (52) a quantization block of spectral values of the time-discrete audio signal quantized using a psychoacoustic model (54), by inverse quantizing (58) the quantization block and rounding the inverse-quantized spectral values to obtain a rounding block of rounded inverse-quantized spectral values, by generating (56) an integer block of integer spectral values using an integer transform algorithm that generates the integer block of spectral values from a block of integer time-discrete samples, and by forming (58) a difference block depending on a spectral-value-wise difference between the rounding block and the integer block to obtain a difference block having difference spectral values, the device comprising:

means (70) for processing the encoded audio data, to obtain a quantization block and a difference block;

means (74) for inverse quantizing and rounding the quantization block, to obtain an integer inverse-quantized quantization block;

means (78) for combining, spectral value by spectral value, the integer inverse-quantized quantization block and the difference block, to obtain a combination block; and

means (82) for generating a time representation of the time-discrete audio signal using the combination block and an integer transform algorithm that is inverse to the integer transform algorithm.

23. The device as claimed in claim 22,

wherein the encoded audio data is scalable and comprises a plurality of extension layers;

wherein the means (70) for processing the encoded audio data is designed to determine the quantization block from the encoded audio data as a first extension layer and to determine the difference block from the encoded audio data as a second extension layer.

24. The device as claimed in claim 22,

wherein the information on the difference block comprises binary-coded difference spectral values,

wherein the encoded audio data is scalable and comprises a plurality of extension layers,

wherein the means (70) for processing the encoded audio data is designed to determine the quantization block from the encoded audio data as a first extension layer and to extract, as a second extension layer, a representation of the difference spectral values with reduced precision.

25. The device as claimed in claim 24,

wherein the means (70) for processing the encoded audio data is designed to extract, as the second extension layer, a number of bits from the most significant bit down to a less significant bit, the less significant bit being higher than the least significant bit of a difference spectral value, and

wherein the means (82) for generating the time representation of the time-discrete audio signal is designed to synthetically generate the missing bits of the difference spectral values before applying the inverse integer transform algorithm.

26. The device as claimed in claim 25,

wherein the means (82) for generating is designed, for the synthetic generation, to upscale the second extension layer using a scale factor equal to 2^n, where n is the number of less significant bits not contained in the second extension layer, or to use a dithering algorithm for the synthetic generation.

27. The device as claimed in claim 22,

wherein the encoded audio data is scalable and comprises a plurality of extension layers, and

wherein the means (70) for processing the encoded audio data is designed to determine the quantization block from the encoded audio data as a first extension layer and to determine low-pass filtered difference spectral values as a second extension layer.

28. The device as claimed in claim 22,

wherein the means (70) for processing the encoded audio data is designed to determine the quantization block from the encoded audio data as a first extension layer, and to determine difference spectral values up to a first cutoff frequency as a second extension layer, the first cutoff frequency being lower than the maximum frequency of a difference spectral value that can be generated in an encoder.

29. The device as claimed in claim 28,

wherein the means (82) for generating the time representation is designed to set to a predetermined value those input values of the full-length inverse integer transform algorithm that lie above the cutoff frequency of the second extension layer, and, after applying the inverse integer transform algorithm, to downsample the time representation of the time-discrete audio signal by a factor selected to correspond to the ratio of the maximum frequency of a difference spectral value that can be generated by an encoder to the cutoff frequency.

30. The device as claimed in claim 29,

wherein the predetermined value for all input values above the cutoff frequency is zero.

31. A method for decoding encoded audio data, the encoded audio data having been generated from a time-discrete audio signal by providing, inverse quantizing, generating, forming and processing, the method comprising:

processing (70) the encoded audio data, to obtain a quantization block and a difference block;

inverse quantizing (74) and rounding the quantization block, to obtain an integer inverse-quantized quantization block;

combining (78), spectral value by spectral value, the integer inverse-quantized quantization block and the difference block, to obtain a combination block; and

generating (82) a time representation of the time-discrete audio signal using the combination block and an integer transform algorithm that is inverse to the integer transform algorithm.
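For illustration only, and not forming part of the claims, the difference formation in the frequency domain and the splitting of the difference values into a coarse and a fine layer can be sketched as follows. The uniform quantizer with a single `step` stands in for the psychoacoustically controlled quantizer, the spectra are passed in as plain lists instead of being produced by an MDCT and an IntMDCT, and all function names are assumptions made for this example.

```python
import math


def round_to_int(value):
    # Rounding function used identically in encoder and decoder.
    return math.floor(value + 0.5)


def encode(int_spectrum, float_spectrum, step):
    # Lossy layer: quantized spectral values (uniform quantizer, assumed).
    quant = [round_to_int(v / step) for v in float_spectrum]
    # Inverse quantize and round to obtain the rounding block.
    rounded = [round_to_int(q * step) for q in quant]
    # Difference block: integer block minus rounding block, per value.
    diff = [x - r for x, r in zip(int_spectrum, rounded)]
    return quant, diff


def decode(quant, diff, step):
    # Repeat the inverse quantization and rounding, then add the differences.
    rounded = [round_to_int(q * step) for q in quant]
    return [r + d for r, d in zip(rounded, diff)]


def split_layers(diff, low_bits):
    # Coarse layer: higher-order bits; fine layer: the remaining low bits.
    coarse = [d >> low_bits for d in diff]
    fine = [d - (c << low_bits) for d, c in zip(diff, coarse)]
    return coarse, fine


def merge_layers(coarse, fine, low_bits):
    # Upscale the coarse layer by 2**low_bits and add the fine layer.
    return [(c << low_bits) + f for c, f in zip(coarse, fine)]
```

A decoder that receives only `quant` reproduces the lossy signal; adding the full `diff` restores the integer spectrum exactly, and a decoder that receives only the coarse layer can upscale it by `2**low_bits` to approximate the difference values.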