CN1525438A

CN1525438A - Stereo audio encoding method and device, audio stream decoding method and device

Info

Publication number: CN1525438A
Application number: CNA031650384A
Authority: CN
Inventors: 金重会; 金尚煜; 吴殷美
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2002-12-14
Filing date: 2003-09-19
Publication date: 2004-09-01
Anticipated expiration: 2023-09-17
Also published as: KR100923297B1; CN1901043B; CN1276407C; CN1901043A; KR20040053501A

Abstract

A stereo audio coding method, an apparatus thereof, a decoding method and an apparatus thereof are provided to supply more stable sound quality at a low frequency bandwidth and encode stereo audio into less bits. A conversion unit(11) converts audio samples obtained from a plurality of channels, respectively. A psychological sound unit(12) supplies attack sensing information to the conversion unit(11), ties the converted audio signal as proper subband signals, calculates a masking threshold in each subband by using the masking phenomenon due to the interaction of each signal and then supplies the calculated value to a quantizing unit(13). The quantizing unit(13) quantizes correlation cancellation samples of each band on the basis of corresponding scale factor information and supplies the quantized samples. A bit packing unit(14) codes the quantized samples without loss and packages the coded samples by the unit of frame.

Description

Stereo audio coding method and device, audio stream coding/decoding method and device

Technical field

The present invention relates to the Code And Decode voice data, and especially to relate to the voice data that obtains be that stereo audio data is carried out Methods for Coding and device thereof from a plurality of passages, be used for the method and the device thereof of decoded audio stream.

Background technology

Stereo audio, promptly the composite signal of the sound signal that is provided by a plurality of passages is compared with the single audio frequency that provides from single passage, and it provides stereo and has high demand for the audience.

Yet, storage or transmit stereo audio, i.e. the composite signal of a plurality of single audio frequency signals that provided by a plurality of passages is with storage or to transmit that single audio frequency compares be complicated and costliness.

Summary of the invention

The invention provides and be used for the method and apparatus that coding/decoding has the voice data of scalability, can provide fine granularity scalability (FGS) to have than low-complexity simultaneously thus.

The present invention also is provided for the method and apparatus that coding/decoding has the voice data of scalability, even also can provide better audio quality that FGS is provided simultaneously in lower layer thus.

According to an aspect of the present invention, a kind of method of the voice data with scalability of being used to encode is provided, comprising the stereo audio coding method, this method comprises carries out wavelet transform to n the audio sample that obtains from the n passage, utilize the n*n de-correlation-matrix from the audio sample of n wavelet transform, to eliminate the interchannel redundant information, quantize therefrom to have eliminated the sampling of redundant information, and the sampling of this quantification of nondestructively encoding, n is the integer more than or equal to 2 here.

Elimination interchannel redundant information comprises according to predetermined cost function selects stereo tupe or normal mode, if select this stereo tupe, by the audio sample of n conversion is determined so that the n*n de-correlation-matrix of the n*n of an entropy minimization element multiplies each other and obtains one or more decorrelations and sample with having, so that minimize entropy, and quantize this sampling and comprise and quantize this decorrelation sampling.

According to a further aspect in the invention, a kind of stereo audio coding method is provided, comprise a conversion left side and right audio sample, multiply by the decorrelation sampling that has obtained therefrom to eliminate the interchannel redundant information by the de-correlation-matrix of formula 1 expression by audio sample, formula 1 left side and right conversion:

[\begin{matrix} a & b \\ c & d \end{matrix}] . . . 1

Wherein a, b, c and d are real numbers, quantize this decorrelation sampling, and the sampling of this quantification of nondestructively encoding.

The audio sample that multiply by this left side and right conversion comprises according to predetermined cost function selects stereo tupe or normal mode, if select this stereo tupe,, n converting audio frequency sampling obtain one or more decorrelation samplings by being multiply by the de-correlation-matrix with a, b, c and d element of being determined so that minimizing entropy.

This stereo audio coding method comprises that further coding is used to indicate this stereo tupe whether selecteed flag information and the stereo process information that comprises the matrix element information of element a, the b, c and the d that notifys de-correlation-matrix, and will this stereo process information and the sampling by the lossless coding acquisition be packaged in the frame unit.

In obtaining the decorrelation sampling, utilize de-correlation-matrix to obtain the decorrelation sampling by the stereo processing in centre/side, formula 2 by formula 2 expressions:

[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] . . . 2

In obtaining the decorrelation sampling, utilize the de-correlation-matrix of representing by formula 3 to handle and obtain the decorrelation sampling, formula 3 by intensity:

[\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] . . . 3

And this quantised samples of lossless coding further comprises the coding of the level difference between a left side and the right audio sample as additional information.

In obtaining the decorrelation sampling, utilize de-correlation-matrix to obtain the decorrelation sampling, formula 4 by formula 4 expressions:

[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}] . . . 4

Wherein θ represents the sound source direction of a left side and right audio sample.

In the quantification of decorrelation sampling, carry out quantification based on psychoacoustic model.

And the lossless coding of this quantised samples is arithmetic coding or huffman coding, and the lossless coding of this quantised samples comprises based on a plurality of predetermined layer with scalability quantised samples is encoded.

According to a further aspect of the invention, a kind of method that is used for decoded audio stream is provided, comprise the sampling that obtains to quantize by this audio stream of nondestructively decoding, sampling by this quantification of re-quantization obtains n sampling, by the audio sample that the n*n correlation matrix obtains n conversion is multiply by in n sampling, and obtain n audio sample corresponding to the n passage by on the audio sample of n conversion, carrying out unfilial son's wave conversion, wherein n is the integer more than or equal to 2.

The audio sample that obtains n conversion comprises and determines whether to have selected stereo tupe, and if select this stereo tupe, multiply by the audio sample that correlation matrix obtains n conversion by this n is sampled.

According to another aspect of the present invention, a kind of method that is used for decoded audio stream is provided, comprise the sampling that obtains to quantize by this audio stream of losslessly encoding, the sampling that should quantize by re-quantization obtains the decorrelation sampling, the correlation matrix that multiply by formula 4 expressions by 2 decorrelations are sampled obtains the audio sample of a left side and right conversion, formula 4:

{[\begin{matrix} a & b \\ c & d \end{matrix}]}^{- 1} . . . 4

Wherein, a, b, c and d are real numbers, and by inverse transformation should a left side and the audio sample of right conversion obtain a left side and right audio sample.

In the audio sample that obtains a left side and right conversion, utilize correlation matrix that following formula 5 represents to obtain the sampling of this left side and right conversion, formula 5 by the stereo processing in centre/side:

{[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}]}^{- 1} . . . 5

The audio sample that obtains a left side and right conversion comprises and utilizes the correlation matrix of following formula 6 expressions to handle the sampling that obtains left and right audio frequency conversion by intensity, formula 6:

[\begin{matrix} 1 & 0 \\ level_diff & 0 \end{matrix}] . . . 6

Redundancy believes that wherein level_diff is the level difference between a left side and the right audio sample, this level difference of inverse transformation, and from the level difference of inverse transformation, obtain a left side and right audio sample.

In the audio sample that obtains a left side and right conversion, the correlation matrix that utilizes following formula 7 to represent obtains the sampling of this left side and right conversion, formula 7:

{[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}]}^{- 1} . . . 7

Wherein θ represents the direction of a left side and right audio sample.

The losslessly encoding of this audio stream is arithmetic decoding or Hofmann decoding, and is included as a plurality of predetermined layer with scalability audio stream is decoded.

According to another aspect of the present invention, a kind of stereo audio coding device is provided, be included in n the audio sample that from the n passage, obtains respectively and carry out the converter unit of wavelet transform, multiply by the redundant information elimination unit that the n*n de-correlation-matrix is removed the interchannel redundant information by audio sample with n conversion, the quantifying unit that the sampling of therefrom having eliminated redundant information is quantized, and come the position packaged unit of execute bit packing by this quantised samples of lossless coding, wherein n is the integer more than or equal to 2.

This redundant information is eliminated the unit and is selected stereo tupe or normal mode according to predetermined cost function, if select this stereo tupe, multiply by matrix by audio sample and obtain one or more decorrelation samplings with n*n element determining like this with n conversion, so that the entropy minimization of the sampling that obtains after removing the interchannel redundant information, and quantifying unit quantizes this decorrelation sampling.

Another aspect according to the present invention, a kind of stereo audio coding device is provided, comprise and provide the psychologic acoustics unit that closes psychological acoustic model information, converter unit based on a relevant psychoacoustic model information conversion left side and right audio sample, multiply by the redundant information elimination unit that has obtained therefrom to eliminate the decorrelation sampling of interchannel redundant information by the de-correlation-matrix of formula 1 expression by audio sample, formula 1 with a left side and right conversion:

[\begin{matrix} a & b \\ c & d \end{matrix}] . . . 1

Wherein a, b, c and d are real numbers, and quantifying unit is based on this decorrelation of psychoacoustic model information quantization sampling, and the sampling of position packaged unit by this quantification of lossless coding is with the pack unit of framing, position.

This redundant information is eliminated the unit and is selected stereo tupe or normal mode according to predetermined cost function, if and selected stereo tupe, multiply by the de-correlation-matrix that comprises element a, the b, c and the d that determine like this by audio sample and obtain one or more decorrelation samplings, so that the entropy minimization of the sampling that after removing the interchannel redundant information, obtains conversion.

Whether the stereo tupe of this packaged unit coding indication selecteed flag information and the stereo process information that comprises the matrix element information of element a, the b, c and the d that notify de-correlation-matrix.

This redundant information is eliminated the unit and is obtained the decorrelation sampling, and the de-correlation-matrix of being represented by formula 2 by the stereo processing and utilizing in centre/side obtains this decorrelation sampling, formula 2:

[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] . . . 2

This redundant information is eliminated the de-correlation-matrix of being represented by formula 3 by the intensity processing and utilizing unit and is obtained the decorrelation sampling, formula 3:

[\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] . . . 3

And the position packaged unit is encoded as additional information to the level difference between a left side and the right audio sample.

The de-correlation-matrix that this redundant information is eliminated following formula 4 expressions of unit by using obtains the decorrelation sampling, formula 4:

[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}] . . . 4

Wherein θ represents the direction of the sound source of a left side and right audio sample.

The position packaged unit is a plurality of predetermined layer with scalability this quantised samples of encoding.

Interchangeable, a kind of device that is used for decoded audio stream is provided, the unit of unpacking that comprises the sampling that obtains to quantize by this audio stream of losslessly encoding, the sampling of being somebody's turn to do quantification by re-quantization is to obtain the inverse quantization unit of n sampling, by the redundant information recovery unit that the n*n correlation matrix obtains the audio sample of n conversion is multiply by in n sampling, and by carry out the inverse transformation block that unfilial son's wave conversion obtains n audio sample of corresponding n passage on the audio sample of n conversion, wherein n is the integer more than or equal to 2.

And, a kind of device that is used for decoded audio stream is provided, comprise the unit of unpacking that obtains quantised samples by this audio stream of losslessly encoding, obtain the inverse quantization unit of decorrelation sampling by this quantised samples of re-quantization, by multiply by the sample redundant information recovery unit of the audio sample that obtains a left side and right conversion of 2 decorrelations by the correlation matrix of formula 4 expression, formula 4:

{[\begin{matrix} a & b \\ c & d \end{matrix}]}^{- 1} . . . 4

Wherein a, b, c and d are real numbers, and by inverse transformation should a left side and the audio sample of right conversion obtain the inverse transformation block of a left side and right audio sample.

This redundant information recovery unit utilizes the correlation matrix acquisition left side of following formula 5 expressions and the audio sample of right conversion, formula 5:

{[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}]}^{- 1} . . . 5

And the correlation matrix that this redundant information recovery unit utilizes following formula 7 to represent obtains the audio sample of a left side and right conversion, formula 7:

{[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}]}^{- 1} . . . 7

Wherein θ represents the direction of a left side and right audio sample.

Interchangeable, a kind of device that is used for decoded audio stream is provided, comprise the unit of unpacking that obtains quantised samples by this audio stream of losslessly encoding, obtain the inverse quantization unit of n sampling by this quantised samples of re-quantization, the correlation matrix that utilizes 6 expressions of following formula is handled the redundant information recovery unit of the sampling that obtains a left side and right audio frequency conversion, formula 6 by intensity:

[\begin{matrix} 1 & 0 \\ level_diff & 0 \end{matrix}] . . . 6

Wherein level_diff is the level difference between a left side and the right audio sample, and this level difference of transformation by reciprocal direction and obtain the reverse transformation unit of a left side and right audio sample from the level difference of transformation by reciprocal direction.

Arithmetic decoding or Hofmann decoding are carried out in this unit of unpacking, and are this audio stream of decoding of a plurality of predetermined layer with scalability.

Description of drawings

By preferred embodiments of the present invention will be described in detail with reference to the annexed drawings, above-mentioned target of the present invention and advantage will become clearer, wherein:

Fig. 1 is the block diagram of code device according to the preferred embodiment of the invention;

Fig. 2 is the detailed diagram that redundant information is eliminated the unit among Fig. 1;

Fig. 3 is the wavelet diagrammatic sketch according to the psychoacoustic model structure;

Fig. 4 is the reference diagram that is explained in more detail redundant information elimination unit operations in the stereo tupe;

Fig. 5 is the frame assumption diagram that forms the audio stream of coding according to the present invention;

Fig. 6 is the block diagram according to decoding device of the present invention;

Fig. 7 is the detailed diagram of redundant information recovery unit among Fig. 6;

Fig. 8 is a process flow diagram of explaining coding method according to an embodiment of the invention;

Fig. 9 is a process flow diagram of explaining coding method according to another embodiment of the present invention;

Figure 10 is a process flow diagram of explaining the coding method of another embodiment according to the present invention;

Figure 11 explains the process flow diagram of coding/decoding method according to an embodiment of the invention;

Figure 12 explains the process flow diagram of coding/decoding method according to another embodiment of the present invention;

Figure 13 is a process flow diagram of explaining the coding/decoding method of another embodiment according to the present invention;

Embodiment

Describe the preferred embodiments of the present invention in detail referring now to accompanying drawing.

With reference to figure 1, this code device becomes less position by eliminating the interchannel redundant data with audio data coding, and comprises converter unit 11, psychologic acoustics unit 12, and quantifying unit 13, position packaged unit 14 and redundant information are eliminated unit 15.

The audio sample that these converter unit 11 conversion obtain from a plurality of passages.In more detail, this converter unit 11 is received as the pulse code modulation (pcm) voice data of time-domain audio signal, and is frequency-region signal with reference to the information of the relevant psychoacoustic model that is provided by psychologic acoustics unit 12 with this signal transformation.When the difference between the characteristic of the sound signal that the people can feel when not being too big, between the characteristic of the signal that the people can feel in the frequency-domain audio signals that obtains by conversion and the characteristic of the signal that the people can not feel very big difference is arranged in time domain.Therefore, be assigned to the figure place of frequency band separately, can improve the efficient of compression by difference.In this embodiment, converter unit 11 is carried out wavelet transform according to the wavelet structure by the psychoacoustic model structure shown in Figure 3 on a left side and right audio sample.Left side audio sample is the pcm audio data that obtain from left passage, and right audio sample is the pcm audio data that obtain from right passage.In MDCT, because unnecessary high frequency decomposes in low-frequency band, even slight distortion also can cause the reduction that people's ear can be felt.Yet in wavelet transform, time/frequency resolution is more suitable, so that also can provide more stable audio quality in having the low layer of low-frequency band.Therefore, this wavelet transform provides higher-quality sound for the audience.

This psychologic acoustics unit 12 provides the information of the psychological acoustic model in pass such as the impact sensory information for converter unit 11, and the sound signal of converter unit 11 conversion is grouped into suitable subband signal.And this psychologic acoustics unit 12 calculates masking threshold in each subband by using the masking effect that causes in the interaction between the signal separately, and provides this threshold value to quantifying unit 13.This masking threshold is because the maximal value of the signal that the interaction people between the sound signal can not feel.In this embodiment, this psychologic acoustics unit 12 forces down the masking threshold that (BMLD) calculates stereo component by using the ears masking level.

This redundant information is eliminated unit 15 by multiply by de-correlation-matrix to eliminate the interchannel redundant information from a plurality of converting audio frequency samplings of respective channel.When the quantity of passage was n, this de-correlation-matrix was the n dimension matrix with n*n element, and has inverse matrix.This n*n element is determined so that the entropy of the audio sample that obtains after eliminating redundant information is minimized.This de-correlation-matrix can offer each terminal node that is used for wavelet transform, or is provided in the unit of the frame that forms coded audio stream.The sampling of therefrom having eliminated redundant information is called the decorrelation sampling.The quantity of decorrelation sampling is less than or equal to the quantity of passage.In this embodiment, there is no need all audio samples and all offer the elimination that redundancy message elimination unit 15 is used for redundant information.In the situation that has relative bulk redundancy information, the efficient of data compression has improved.Yet in the situation that has relative small amount of redundancy information, the efficient of data compression can be ignored, and has strengthened the complicacy of handling.Therefore, consider the amount of redundant information, determine whether that audio sample is sent to redundant information elimination unit 15 and is used for stereo processing.

Since the audio sample of wavelet transform time/the frequency resolution rate is not constant, be not easy to carry out the audio sample operation by frequency line.Therefore, compare not high with other classic methods based on the efficient of waveform transformation elimination interchannel redundant information.On the other hand, according to the present invention, this redundant information is eliminated unit 15 and is utilized and multiply by de-correlation-matrix, obtains more high-level efficiency in the interchannel redundant information removing thus from the audio sample of wavelet transform.

Quantifying unit 13 is based on the scale factor information corresponding to sound signal, and the sound signal of scalar quantization in each frequency band is so that the masking threshold that the size of quantization noise provides less than psychologic acoustics unit 12 in the frequency band can not be felt this noise with the person who happens to be on hand for an errand.Then, the sampling of these quantifying unit 13 these quantifications of output.In other words, by use masking threshold that in psychologic acoustics unit 12, calculates and the noise ratio that in each frequency band, produces be masking by noise than (NMR), this quantifying unit 13 is carried out and is quantized so that the NMR value is 0 decibel or littler in whole frequency band.This NMR is 0 decibel or means that more for a short time the people can not feel this quantizing noise.

This this quantised samples of packaged unit 14 lossless codings and the sampling that will encode are the unit packing with the frame.The representative example of lossless coding comprises arithmetic coding and huffman coding.A frame is the elementary cell that constitutes coding stream.Can determine the form of frame in various manners.The indication redundant information is eliminated the flag information whether processing of unit 15 is employed, and the information of relevant de-correlation-matrix is a kind of additional information, and it is encoded and is packaged in the unit of frame and sends decode system to.Eliminate under the situation that the stereo processing of unit 15 is set to carry out all the time in redundant information, maybe can understand the information whether relevant stereo processing is employed, then there is no need to transmit this information and give decode system if decode system is understood.

Especially, 14 pairs of additional informations of this packaged unit and the quantised samples that belongs to each layer are encoded, and encoded signals is packaged in the hierarchy.This additional information comprises proportional band information, coding band information, their ratio factor information, and the encoding model information in every layer.This proportional band information and coding band information can be packed as heading message, and send decoding device to afterwards.In addition, this proportional band information and coding band information can be encoded and pack as every layer additional information, and send decoding device afterwards to.This proportional band information and coding band information can not send decoding device to, because they wash for decode system and understand in some cases.

Fig. 2 is the detailed diagram that redundant information is eliminated unit 15.

With reference to figure 2, this redundant information is eliminated the unit and is selected stereo tupe or normal mode according to predetermined cost function.This cost function determines whether to offer more high quality audio of people's ear, in other words, selects this stereo tupe based on psychoacoustic model.If the selection normal mode, input sample is output without any processing.If select stereo tupe, this input sample multiply by de-correlation-matrix of the present invention and output is sampled as one or more decorrelations.

Fig. 4 is the reference diagram that is explained in more detail the operation of redundant information elimination unit 15 in stereo tupe.

With reference to figure 4, this breath of coughing is eliminated unit 15 and is received left converting audio frequency sampling left and right converting audio frequency sampling right, and multiply by the de-correlation-matrix of formula 1 expression, and output decorrelation sampling s1 and s2.

[\begin{matrix} s 1 \\ s 2 \end{matrix}] = [\begin{matrix} a & b \\ c & d \end{matrix}] [\begin{matrix} left \\ right \end{matrix}] . . . 1

Wherein It is de-correlation-matrix.In this matrix, element a, b, c and d are optimised, so that minimize the entropy of this decorrelation sampling s1 and s2.Therefore, the stereo processing in centre/side of Advanced Audio Coding (AAC) and intensity stereo are handled and can be utilized de-correlation-matrix to represent.

For example, this redundant information is eliminated unit 15 and can be obtained decorrelation sampling s1 and s2 based on centre/side stereo processing use formula 2:

[\begin{matrix} s 1 \\ s 2 \end{matrix}] = [\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] [\begin{matrix} left \\ right \end{matrix}] . . . 2

And i last of the twelve Earthly Branches redundant information is eliminated unit 15 also can obtain decorrelation sampling s1 and s2 based on intensity stereo processing use formula 3:

[\begin{matrix} s 1 \\ s 2 \end{matrix}] = [\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] [\begin{matrix} left \\ right \end{matrix}]

Or

[\begin{matrix} s 1 \\ s 2 \end{matrix}] = [\begin{matrix} 0 & 1 \\ 0 & 0 \end{matrix}] [\begin{matrix} left \\ right \end{matrix}] . . . 3

Wherein left represents left converting audio frequency sampling, and right represents right converting audio frequency sampling, and

With

It is de-correlation-matrix.Under the situation of the de-correlation-matrix of using formula 3 expressions, the level difference between 14 pairs of left sides of this packaged unit and the right audio sample is encoded, and with this coded signal packing framing.Can determine the packed locations of level difference in frame between left audio sample and the right audio sample in various manners.Especially, because de-correlation-matrix

With Do not have inverse matrix, therefore should use predetermined matrix in decoding, this will be described later.

Further, when have directionality in left converting audio frequency sampling left and right converting audio frequency sampling right, this redundant information is eliminated unit 15 and can be utilized following formula 4 to obtain the decorrelation sampling:

[\begin{matrix} s 1 \\ s 2 \end{matrix}] = [\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}] [\begin{matrix} left \\ right \end{matrix}] . . . 4

Wherein Be de-correlation-matrix, and θ represent the direction of left audio sample and right audio sample.

Fig. 5 is the structural drawing that forms the frame of the audio stream of coding according to the present invention.

Comprise heading message, additional information and the voice data of encoding with reference to 5, one frames of figure according to the present invention.

This additional information comprises stereo process information, and this information is the information that relevant interchannel redundant information is eliminated in the encoded stereo audio frequency according to the present invention.This stereo process information comprises flag information and matrix element information.This flag information notice is during encoding, and whether stereo tupe is selected, and whether the step of elimination interchannel redundant information is carried out.This matrix element information notice forms the element of used de-correlation-matrix.Interchangeable, under the situation of selecting stereo tupe all the time, can omit this flag information.

Fig. 6 is the block diagram according to decoding device of the present invention.

With reference to figure 6, this decoding device is decoded to the audio stream of coding after eliminating the interchannel redundant information, and comprises the unit 21 of unpacking, inverse quantization unit 22, inverse transformation block 23 and redundant information recovery unit 25.

This unit 21 of unpacking separates the voice data of heading message, additional information and coding from the frame that constitutes audio stream, and decodes them.Especially, this coding audio data of losslessly encoding.In the present invention, the voice data of this unpack unit 21 arithmetic decodings or this coding of Hofmann decoding and obtain quantised samples.

Especially, this unit 21 of unpacking splits bit stream to destination layer, and the bit stream in every layer is decoded.In other words, the additional information that comprises corresponding every layer ratio factor information and encoding model information is decoded, and afterwards based on the encoding model information that obtains, and the coded quantization sampling that belongs to every layer is decoded and recovered this quantised samples.

Therebetween, by decode in every layer additional information or from the heading message of bit stream, obtain this proportional band information and the coding band information.Interchangeable, this decoding device can be stored this proportional band information and coding band information in advance.

This inverse quantization unit 22 is carried out re-quantization according to the ratio factor information corresponding to each sampling to the quantised samples that the unit 21 of unpacking obtains, and obtains the decorrelation sampling.

This redundant information recovery unit 25 is by multiply by the audio sample that correlation matrix obtains n conversion with n decorrelation sampling.In more detail, this redundant information recovery unit 25 is included in by the flag information in the stereo process information in the additional information of unit 21 decodings of unpacking by ginseng, determines whether stereo tupe is selected.If this stereo tupe is selected, this redundant information recovery unit 21 is by multiply by the audio sample that the correlation matrix that obtains based on matrix element information obtains n conversion with n decorrelation sampling.For example, this redundant information recovery unit 25 obtains left converting audio frequency sampling and the sampling of right converting audio frequency by the 2*2 correlation matrix is multiply by in 2 decorrelation samplings.

Unfilial son's wave conversion is carried out in 23 pairs of n converting audio frequency samplings of this inverse transformation block, and obtains n the audio sample (n is the integer more than or equal to 2) corresponding to the n passage.For example, inverse transformation is carried out in 23 pairs of left converting audio frequency samplings of this inverse transformation block and the sampling of right converting audio frequency, and obtains a left side and right audio sample.In other words, the interchannel redundant information that these inverse transformation block 23 recoveries have been eliminated during encoding, frequency/time map is transformed to them pcm audio data and exports this transform data corresponding to the audio sample of each passage.

Fig. 7 is the detailed diagram of redundant information recovery unit among Fig. 6.

With reference to figure 7, this redundant information recovery unit 25 receives decorrelation sampling s1 and s2, and exports the sampling with redundant information of recovering by formula 5, that is, this left side and right transformed samples are respectively:

[\begin{matrix} left \\ right \end{matrix}] = {[\begin{matrix} a & b \\ c & d \end{matrix}]}^{- 1} [\begin{matrix} s 1 \\ s 2 \end{matrix}] . . . 5

Wherein

It is correlation matrix.This correlation matrix can be provided on each terminal node that is used for unfilial son's wave conversion or is provided at each frame that constitutes audio stream.Here, obtain matrix element a, b, c and d in the matrix element information in the stereo process information from be included in additional information.Therefore, handling the sampling of therefrom having eliminated the interchannel redundant information by the stereo processing in centre/side of AAC and intensity stereo can utilize correlation matrix to be resumed sampling corresponding to each passage.

As the example of formula 5, this redundant information recovery unit 25 can recover this left side and right transformed samples from the sampling of therefrom having eliminated the interchannel redundant information by the stereo processing in centre/side of AAC based on formula 6:

[\begin{matrix} left \\ right \end{matrix}] = {[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}]}^{- 1} [\begin{matrix} s 1 \\ s 2 \end{matrix}] . . . 6

And, can handle the sampling of therefrom having eliminated the interchannel redundant information by intensity based on formula 7 and revert to this left side and right transformed samples:

[\begin{matrix} left \\ right \end{matrix}] = [\begin{matrix} 1 & 0 \\ level_diff & 0 \end{matrix}] [\begin{matrix} s 1 \\ s 2 \end{matrix}]

Or

[\begin{matrix} left \\ right \end{matrix}] = [\begin{matrix} 0 & 1 \\ 0 & level_diff \end{matrix}] [\begin{matrix} s 1 \\ s 2 \end{matrix}] . . . 7

Wherein left represents left converting audio frequency sampling, and right represents right converting audio frequency sampling, and With

It is respectively correlation matrix.Especially,

With

Be not the inverse matrix of the de-correlation-matrix in coding, used, and level_diff is the level difference between a left side and the right audio sample, and transmits from coded system.

Further, this redundant information recovery unit 25 utilizes following formula 8 can conversion therefrom to eliminate the sampling of interchannel redundant information based on the directivity of sound source, and sampling is reverted to a left side and right transformed samples.

[\begin{matrix} left \\ right \end{matrix}] = [\begin{matrix} \cos θ & \sin θ \\ - \sin θ & \cos θ \end{matrix}] [\begin{matrix} s 1 \\ s 2 \end{matrix}] . . . 8

Wherein

Be de-correlation-matrix, and θ represent the directivity of a left side and right audio sample.

Based on said structure, with the method for describing according to coding/decoding stereo sound frequency sampling of the present invention.

Fig. 8 is the process flow diagram of explaining according to the coding method of the embodiment of the invention.

With reference to figure 8, in step 801, to constituting the pcm audio data of stereo audio, promptly the n that obtains from the n passage audio sample carried out wavelet transform.In step 802, utilize the n*n de-correlation-matrix to eliminate the interchannel redundant information of this n wavelet transform audio sample.Here, the n*n matrix element is determined so that the entropy minimization of the sampling that obtains after eliminating the interchannel redundant information.Below, in step 803, quantification is carried out in the sampling of therefrom having eliminated the interchannel redundant information by the reference psychoacoustic model.In step 804, the sampling that lossless coding should quantize.Here, n is the integer more than or equal to 2.

Fig. 9 is a process flow diagram of explaining coding method according to another embodiment of the present invention.

With reference to figure 9, in step 901, to constituting the pcm audio data of stereo audio, promptly the n that obtains from the n passage audio sample carried out wavelet transform.In step 902, determine to have selected stereo tupe or normal mode based on predetermined cost function.If selected stereo tupe, in step 903, utilize de-correlation-matrix from the wavelet transform sampling, to eliminate the interchannel redundant information.In de-correlation-matrix, matrix element is determined so that the entropy minimization of the sampling that obtains after eliminating the interchannel redundant information.Below, in step 904, therefrom eliminated the sampling of redundant information by the reference psychoacoustic model and carried out quantification.In step 905, the sampling that lossless coding should quantize.If selected normal mode, the step 903 of eliminating the interchannel redundant information is skipped over, and quantizes this wavelet transform sampling in step 904, and in this quantised samples of step 905 lossless coding.

Figure 10 is a process flow diagram of explaining coding method according to yet another embodiment of the invention.

With reference to Figure 10, in step 1001, to constituting the pcm audio data of stereo audio, promptly a left side and right audio sample are carried out wavelet transform.In step 1002, by the wavelet transform voice data be multiply by de-correlation-matrix Eliminate the interchannel redundant information.Matrix element a, b, c and d are determined so that the entropy minimization of the sampling that obtains after eliminating the interchannel redundant information.Below, in step 1003, quantification is carried out in the sampling of therefrom having eliminated redundant information by the reference psychoacoustic model.In step 1004, the sampling that lossless coding should quantize.In step 1005, for predetermined destination layer, i.e. multilayer, scalable ground execute bit is packed.

Figure 11 is the process flow diagram of explaining according to the coding/decoding method of the embodiment of the invention.

With reference to Figure 11, in 1101, receive audio stream and nondestructively decoding, and obtain the sampling of quantification.In step 1102, the sampling of this quantification is by re-quantization.In step 1103,, n re-quantization sampling obtain n converting audio frequency sampling afterwards by being multiply by the n*n correlation matrix.In step 1104, this n converting audio frequency sampling is by unfilial son's wave conversion, and acquisition is corresponding to n audio sample of n passage.Here, n is the integer more than or equal to 2.

Figure 12 explains the process flow diagram of coding/decoding method according to another embodiment of the present invention.

With reference to Figure 12, in step 1201, receive audio stream and be the unit losslessly encoding, and obtain the sampling that quantizes with the frame.In step 1202, the sampling of this quantification by re-quantization to obtain n sampling.In step 1203, by determining whether to have selected stereo tupe with reference to the stereo process information in the additional information that is included in the frame.If selected stereo tupe, in step 1204, obtain the n*n correlation matrix, and it be multiply by n sampling by the stereo process information of reference.This correlation matrix is provided on each node that is used for unfilial son's wave conversion or is provided at each frame place.In step 1205, the sampling of n converting audio frequency is by unfilial son's wave conversion and obtain n audio sample corresponding to the n passage.Here, n is the integer more than or equal to 2.If selected normal mode, the step 1204 of recovering the interchannel redundant information is skipped over, and handles directly entering into step 1205.

Figure 13 is a process flow diagram of explaining coding/decoding method according to yet another embodiment of the invention.

With reference to Figure 13, in step 1301, receive audio stream and be the unit losslessly encoding, and obtain the sampling that quantizes with the frame.In step 1302, the sampling of this quantification is sampled to obtain decorrelation by re-quantization.In step 1303, obtain correlation matrix by matrix element information with reference to the stereo process information in the additional information that is included in the frame And this matrix be multiply by 2 decorrelation samplings, obtain a left side and right transformed samples thus.This correlation matrix is provided on each node that is used for unfilial son's wave conversion or is provided at each frame place.In step 1304, left and right converting audio frequency is sampled by unfilial son's wave conversion, and obtains a left side and the right audio sample of corresponding 2 passages.

If the audio stream that receives be with the hierarchy packing having the bit stream of scalability, before step 1201 shown in Figure 12 and step 1301 shown in Figure 13, can carry out unpacking of intended target layer.

As mentioned above,, in low-frequency band, can provide more stable audio quality, and stereo audio can encode with less position, and will consider people's psychologic acoustics characteristic well according to the present invention.In other words, can improve audio quality, and utilize corresponding matrix can eliminate and recover the interchannel redundant information effectively by the psychologic acoustics characteristic of describing the people well.

In the prior art, for example, the stereo processing in centre/side or intensity stereo are handled the high frequency that is used to resemble MDCT and are decomposed, for time/the frequency resolution rate is not constant situation, this data processing is very difficult.Yet, according to the present invention, introduce matrix operation and be used for wavelet transform, effectively eliminate the interchannel redundant information thus and realize the stereo or intensity stereo processing in centre/side.

Claims

1. stereo audio coding method comprises:

N the audio sample that obtains from the n passage carried out wavelet transform;

Utilize the n*n de-correlation-matrix from the audio sample of n wavelet transform, to eliminate the interchannel redundant information;

Quantize therefrom to have eliminated the sampling of redundant information; And

The sampling that lossless coding has quantized, wherein n is the integer more than or equal to 2.

2. stereo audio coding method as claimed in claim 1, wherein eliminate the interchannel redundant information and comprise:

Select stereo tupe or normal mode according to predetermined cost function; And

If select this stereo tupe, multiply by n*n de-correlation-matrix by audio sample and obtain one or more decorrelation samplings with n*n element determining like this with n conversion, so that entropy minimization, and quantize described sampling and comprise and quantize described decorrelation sampling.

3. stereo audio coding method as claimed in claim 1, wherein de-correlation-matrix is provided at each terminal node that is used for wavelet transform, or is provided at each the frame place that obtains after the lossless coding.

4. stereo audio coding method as claimed in claim 1, wherein whether this cost function can be provided for people's ear according to high quality audio more and determine whether to select this stereo tupe.

5. stereo audio coding method comprises:

A conversion left side and right audio sample;

Multiply by the decorrelation sampling that has obtained therefrom to eliminate the interchannel redundant information by the de-correlation-matrix of following formula 1 expression by audio sample with a left side and right conversion:

[\begin{matrix} a & b \\ c & d \end{matrix}] . . . 1

Wherein a, b, c and d are real numbers;

Quantize described decorrelation sampling; And

The sampling that lossless coding has quantized.

6. stereo audio coding method as claimed in claim 5, the audio sample that wherein multiply by this left side and right conversion comprises:

Select stereo tupe or normal mode according to predetermined cost function; And

If select this stereo tupe, by being multiply by the de-correlation-matrix with a, b, c and d element of determining like this, n converting audio frequency sampling obtain one or more decorrelation samplings, so that entropy minimization.

7. stereo audio coding method as claimed in claim 6 further comprises:

Whether this stereo tupe of coded representation selecteed flag information, and the stereo process information that comprises the matrix element information of element a, the b, c and the d that notify de-correlation-matrix; And

Be packaged in the frame unit with this stereo process information with by the sampling that lossless coding obtains.

8. stereo audio coding method as claimed in claim 7, wherein de-correlation-matrix is provided at each terminal node that is used for wavelet transform, or is provided at each the frame place that obtains after the lossless coding.

9. stereo audio coding method as claimed in claim 5, wherein in obtaining the decorrelation sampling, utilize the de-correlation-matrix of following formula 2 expressions to obtain this decorrelation sampling by the stereo processing in centre/side:

[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] . . . 2

10. stereo audio coding method as claimed in claim 5, wherein in obtaining the decorrelation sampling, the de-correlation-matrix of utilizing following formula 3 to represent is handled by intensity and is obtained the decorrelation sampling:

[\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] . . . 3

11. stereo audio coding method as claimed in claim 5 wherein in obtaining the decorrelation sampling, utilizes the de-correlation-matrix of following formula 4 expressions to obtain the decorrelation sampling:

[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}] . . . 4

12. stereo audio coding method as claimed in claim 5, wherein whether this cost function can be provided for people's ear according to high quality audio more and determine whether to select this stereo tupe.

13. stereo audio coding method as claimed in claim 5 wherein in the quantification of decorrelation sampling, is carried out quantification based on psychoacoustic model.

14. stereo audio coding method as claimed in claim 5, wherein the lossless coding of quantised samples is arithmetic coding or huffman coding,

15. stereo audio coding method as claimed in claim 5, wherein the lossless coding of quantised samples comprises based on a plurality of predetermined layer with scalability quantised samples is encoded.

16. a method that is used for decoded audio stream comprises:

The sampling that obtains to quantize by the losslessly encoding audio stream;

Sampling by the described quantification of re-quantization obtains n sampling;

By the audio sample that the n*n correlation matrix obtains n conversion is multiply by in n sampling; And

Obtain n audio sample corresponding to the n passage by the audio sample of n conversion being carried out unfilial son's wave conversion, wherein n is the integer more than or equal to 2.

17. coding/decoding method as claimed in claim 16, the audio sample that wherein obtains n conversion comprises:

Determine whether to have selected stereo tupe; And

If selected this stereo tupe, by the audio sample that correlation matrix obtains n conversion is multiply by in described n sampling.

18. a method that is used for decoded audio stream comprises:

The sampling that obtains to quantize by the losslessly encoding audio stream;

Sampling by the described quantification of re-quantization obtains the decorrelation sampling;

By the audio sample that is obtained a left side and right conversion by the correlation matrix of following formula 4 expressions is multiply by in 2 decorrelation samplings:

{[\begin{matrix} a & b \\ c & d \end{matrix}]}^{- 1} . . . 4

Wherein a, b, c and d are real numbers; And

By inverse transformation should a left side and the audio sample of right conversion obtain a left side and right audio sample.

19. coding/decoding method as claimed in claim 18, wherein correlation matrix is provided at each node that is used for unfilial son's wave conversion, or is provided at each frame place.

20. coding/decoding method as claimed in claim 18, wherein in the audio sample that obtains a left side and right conversion, the correlation matrix that utilizes following formula 5 to represent obtains the sampling of a left side and right conversion by the stereo processing in centre/side:

{[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}]}^{- 1} . . . 5

21. coding/decoding method as claimed in claim 18, the audio sample that wherein obtains a left side and right conversion comprise that the correlation matrix that utilizes following formula 6 expressions handles the sampling that obtains left and right audio frequency conversion by intensity:

[\begin{matrix} 1 & 0 \\ level_diff & 0 \end{matrix}] . . . 6

Wherein, level_diff is the level difference between a left side and the right audio sample;

This level difference of inverse transformation; And

From the level difference of inverse transformation, obtain a left side and right audio sample.

22. coding/decoding method as claimed in claim 18, wherein in the audio sample that obtains a left side and right conversion, the correlation matrix that utilizes following formula 7 to represent obtains the sampling of this left side and right conversion:

{[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}]}^{- 1} . . . 7

Wherein θ represents the direction of a left side and right audio sample.

23. coding/decoding method as claimed in claim 18, wherein the losslessly encoding of audio stream is arithmetic decoding or Hofmann decoding.

24. coding/decoding method as claimed in claim 18, wherein the losslessly encoding of audio stream is included as a plurality of predetermined layer with scalability audio stream is decoded.

25. a stereo audio coding device comprises:

N the audio sample that obtains respectively carried out the converter unit of wavelet transform from the n passage;

Multiply by the redundant information elimination unit that the n*n de-correlation-matrix is eliminated the interchannel redundant information by audio sample with n conversion;

The quantifying unit that the sampling of therefrom having eliminated redundant information is quantized; And

By the position packaged unit that the described quantised samples of lossless coding comes execute bit to pack, wherein n is the integer more than or equal to 2.

26. stereo audio coding device as claim 25, wherein this redundant information is eliminated the unit and is selected stereo tupe or normal mode according to predetermined cost function, if and this stereo tupe of selection, multiply by matrix by audio sample and obtain one or more decorrelation samplings with n*n element determining like this with n conversion, so that the entropy minimization of the sampling that obtains after removing the interchannel redundant information, and quantifying unit quantizes described decorrelation sampling.

27. a stereo audio coding device comprises:

Provide the psychologic acoustics unit that closes psychological acoustic model information;

Based on the information conversion left side of relevant psychoacoustic model and the converter unit of right audio sample;

Multiply by the redundant information elimination unit that has obtained therefrom to eliminate the decorrelation sampling of interchannel redundant information by the de-correlation-matrix of following formula 1 expression by audio sample with a left side and right conversion:

[\begin{matrix} a & b \\ c & d \end{matrix}] . . . 1

Wherein a, b, c and d are real numbers;

Quantize the quantifying unit of described decorrelation sampling based on the information of relevant psychoacoustic model; And

Sampling by the described quantification of lossless coding is the position packaged unit of unit with the position packing with the frame.

28. stereo audio coding device as claim 27, wherein this redundant information is eliminated the unit and is selected stereo tupe or normal mode according to predetermined cost function, if and selected stereo tupe, multiply by the de-correlation-matrix that comprises element a, the b, c and the d that determine like this by audio sample and obtain one or more decorrelation samplings, so that the entropy minimization of the sampling that after eliminating the interchannel redundant information, obtains conversion.

29. as the stereo audio coding device of claim 27, wherein encode by selecteed flag information and the stereo process information that comprises the matrix element information of element a, the b, c and the d that notify de-correlation-matrix to representing stereo tupe for this packaged unit.

30. as the stereo audio coding device of claim 29, wherein this redundant information is eliminated the unit and obtained the decorrelation sampling, the de-correlation-matrix of representing by the following formula 2 of the stereo processing and utilizing in centre/side obtains this decorrelation sampling:

[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] . . . 2

31. as the stereo audio coding device of claim 29, wherein this redundant information is eliminated the de-correlation-matrix of representing by the following formula 3 of intensity processing and utilizing the unit and is obtained the decorrelation sampling:

[\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] . . . 3

And the position packaged unit encodes the level difference between a left side and the right audio sample as additional information.

32. as the stereo audio coding device of claim 29, wherein this redundant information is eliminated the de-correlation-matrix acquisition decorrelation sampling of following formula 4 expressions of unit by using:

[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}] . . . 4

33. as the stereo audio coding device of claim 29, wherein this packaged unit is carried out arithmetic coding or huffman coding.

34. as the stereo audio coding device of claim 29, its meta packaged unit is a plurality of predetermined layer with scalability this quantised samples of encoding.

35. a device that is used for decoded audio stream comprises:

Obtain the unit of unpacking of quantised samples by the losslessly encoding audio stream;

Obtain the inverse quantization unit of n sampling by the sampling of the described quantification of re-quantization;

By the redundant information recovery unit that the n*n correlation matrix obtains the audio sample of n conversion is multiply by in n sampling; And

Carry out the inverse transformation block that unfilial son's wave conversion obtains n audio sample of corresponding n passage by the audio sample to n conversion, wherein n is the integer more than or equal to 2.

36. as the decoding device of claim 35, wherein this redundant information recovery unit determines whether to have selected stereo tupe, and if selected this stereo tupe, obtain n converting audio frequency sampling by n sampling being multiply by correlation matrix.

37. a device that is used for decoded audio stream comprises:

Obtain the inverse quantization unit of decorrelation sampling by the described quantised samples of re-quantization;

By the redundant information recovery unit that is obtained the audio sample of a left side and right conversion by the correlation matrix of following formula 4 expressions is multiply by in 2 decorrelation samplings:

{[\begin{matrix} a & b \\ c & d \end{matrix}]}^{- 1} . . . 4

Wherein a, b, c and d are real numbers; And

By inverse transformation should a left side and the audio sample of right conversion obtain the inverse transformation block of a left side and right audio sample.

38. as the decoding device of claim 37, wherein correlation matrix is provided at each node that is used for unfilial son's wave conversion, or is provided at each frame place.

39. as the decoding device of claim 37, wherein this redundant information recovery unit utilizes the correlation matrix of following formula 5 expressions to obtain the audio sample of a left side and right conversion:

{[\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}]}^{- 1} . . . 5

40. as the decoding device of claim 37, wherein this redundant information recovery unit utilizes the correlation matrix of following formula 7 expressions to obtain the audio sample of a left side and right conversion:

{[\begin{matrix} \cos θ & - \sin θ \\ \sin θ & \cos θ \end{matrix}]}^{- 1} . . . 7

Wherein θ represents the direction of a left side and right audio sample.

41. a device that is used for decoded audio stream comprises:

Obtain the inverse quantization unit of n sampling by the described quantised samples of re-quantization;

Utilize the correlation matrix of following formula 6 expressions to handle the redundant information recovery unit that obtains a left side and right audio frequency transformed samples by intensity:

[\begin{matrix} 1 & 0 \\ level_diff & 0 \end{matrix}] . . . 6

Wherein level_diff is the level difference between a left side and the right audio sample; And

This level difference of inverse transformation also obtains the inverse transformation block of a left side and right audio sample from the level difference of inverse transformation.

42. as the decoding device of claim 37, arithmetic decoding or Hofmann decoding are carried out in the unit of wherein unpacking.

43. as the decoding device of claim 37, the unit of wherein unpacking is a plurality of predetermined layer decoded audio stream with scalability.