CN101379552B - Apparatus and method for encoding/decoding signal - Google Patents

Apparatus and method for encoding/decoding signal

Info

Publication number
CN101379552B
CN101379552B · CN2007800045087A · CN200780004508A
Authority
CN
China
Prior art keywords
audio signal
down-mix signal
information
domain
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007800045087A
Other languages
Chinese (zh)
Other versions
CN101379552A (en)
Inventor
郑亮源
房熙锡
吴贤午
金东秀
林宰显
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Priority claimed from PCT/KR2007/000676 (WO2007091849A1)
Publication of CN101379552A
Application granted
Publication of CN101379552B

Landscapes

  • Stereophonic System (AREA)

Abstract

An encoding method and apparatus and a decoding method and apparatus are provided. The decoding method includes extracting a three-dimensional (3D) down-mix signal and spatial information from an input bitstream, removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal, and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal. Accordingly, it is possible to efficiently encode multi-channel signals with 3D effects and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.

Description

Apparatus and method for encoding/decoding a signal
Technical field
The present invention relates to an encoding/decoding method and an encoding/decoding apparatus, and more particularly, to an encoding/decoding apparatus capable of processing audio signals so that three-dimensional (3D) sound effects can be produced, and an encoding/decoding method using the apparatus.
Background technology
An encoding apparatus down-mixes a multi-channel signal into a signal with fewer channels, and transmits the down-mixed signal to a decoding apparatus. The decoding apparatus then restores a multi-channel signal from the down-mixed signal, and reproduces the restored multi-channel signal with three or more speakers, such as 5.1-channel speakers.
Multi-channel signals may also be reproduced by 2-channel speakers such as headphones. In this case, in order to make the user feel as if the sounds output by the 2-channel speakers were reproduced from three or more sound sources, it is necessary to develop three-dimensional (3D) processing techniques for encoding or decoding multi-channel signals so that 3D effects can be produced.
Summary of the invention
Technical Problem
The present invention provides an encoding/decoding apparatus and an encoding/decoding method which can reproduce multi-channel signals with 3D effects in various reproduction environments by efficiently processing signals.
Technical Solution
According to an aspect of the present invention, there is provided a decoding method of decoding a signal from an input bitstream, the decoding method including: extracting an arbitrary down-mix signal and compensation information necessary for compensating the arbitrary down-mix signal from the input bitstream; compensating the arbitrary down-mix signal using the compensation information; and generating a three-dimensional (3D) down-mix signal by performing a 3D rendering operation on the compensated arbitrary down-mix signal.
According to another aspect of the present invention, there is provided a decoding method of decoding a signal from an input bitstream, the decoding method including: extracting an arbitrary down-mix signal and compensation information necessary for compensating the arbitrary down-mix signal from the input bitstream; combining the compensation information with filter information regarding a filter used for a 3D rendering operation; and generating a 3D down-mix signal by performing a 3D rendering operation on the arbitrary down-mix signal using the filter information obtained by the combination.
According to another aspect of the present invention, there is provided a decoding apparatus which decodes a signal from an input bitstream, the decoding apparatus including: a bit unpacking unit which extracts an arbitrary down-mix signal and compensation information necessary for compensating the arbitrary down-mix signal from the input bitstream; a down-mix compensation unit which compensates the arbitrary down-mix signal using the compensation information; and a 3D rendering unit which generates a 3D down-mix signal by performing a 3D rendering operation on the compensated arbitrary down-mix signal.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing any of the above-described decoding methods.
Advantageous Effects
According to the present invention, it is possible to efficiently encode multi-channel signals with 3D effects, and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.
Brief Description of the Drawings
Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention;
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention;
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention;
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 7 is a block diagram of a three-dimensional (3D) rendering apparatus according to an embodiment of the present invention;
Figs. 8 to 11 illustrate bitstreams according to embodiments of the present invention;
Fig. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal according to an embodiment of the present invention;
Fig. 13 is a block diagram of an arbitrary down-mix signal compensation/3D rendering unit according to an embodiment of the present invention;
Fig. 14 is a block diagram of a decoding apparatus for processing a compatible down-mix signal according to an embodiment of the present invention;
Fig. 15 is a block diagram of a down-mix compatibility processing/3D rendering unit according to an embodiment of the present invention; and
Fig. 16 is a block diagram of a decoding apparatus for canceling crosstalk according to an embodiment of the present invention.
Preferred Embodiments of the Present Invention
The present invention will hereinafter be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention. Referring to Fig. 1, an encoding unit 100 includes a multi-channel encoder 110, a three-dimensional (3D) rendering unit 120, a down-mix encoder 130, and a bit packing unit 140.
The multi-channel encoder 110 down-mixes a multi-channel signal having a plurality of channels into a down-mix signal, such as a stereo or mono signal, and generates spatial information regarding the channels of the multi-channel signal. The spatial information is needed to restore a multi-channel signal from the down-mix signal.
Examples of the spatial information include a channel level difference (CLD), which indicates the difference between the energy levels of a pair of channels; a channel prediction coefficient (CPC), which is a prediction coefficient used to generate a 3-channel signal based on a 2-channel signal; an inter-channel correlation (ICC), which indicates the correlation between a pair of channels; and a channel time difference (CTD), which is the time interval between a pair of channels.
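As an intuition aid for two of these parameters, the sketch below computes a CLD (in dB) and an ICC for a channel pair from frame energies and a lag-0 cross-correlation. The exact definitions, banding, and quantization used by a real spatial audio codec differ; this is only a minimal illustration.

```python
import math

def cld_db(left, right, eps=1e-12):
    """Channel level difference: energy ratio of a channel pair, in dB."""
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    return 10.0 * math.log10((e_l + eps) / (e_r + eps))

def icc(left, right, eps=1e-12):
    """Inter-channel correlation: normalized cross-correlation at lag 0."""
    cross = sum(l * r for l, r in zip(left, right))
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    return cross / math.sqrt((e_l + eps) * (e_r + eps))

left = [0.5, -0.3, 0.2, 0.1]
right = [0.25, -0.15, 0.1, 0.05]  # same shape, half the amplitude
print(round(cld_db(left, right), 2))  # energy ratio of 4 -> 6.02 dB
print(round(icc(left, right), 3))     # perfectly correlated -> 1.0
```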
The 3D rendering unit 120 generates a 3D down-mix signal based on the down-mix signal. The 3D down-mix signal may be a 2-channel signal with three or more directivities, and can thus be reproduced with 3D effects by 2-channel speakers such as headphones. In other words, the 3D down-mix signal can be reproduced by 2-channel speakers so that the user feels as if the 3D down-mix signal were reproduced from a sound source having three or more channels. The direction of a sound source can be determined based on at least one of the intensity difference between two sounds respectively input to the two ears, the time interval between the two sounds, and the phase difference between the two sounds. Therefore, the 3D rendering unit 120 can convert the down-mix signal into a 3D down-mix signal based on how humans determine the 3D position of a sound source with their sense of hearing.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a filter. In this case, filter-related information, e.g., filter coefficients, may be input to the 3D rendering unit 120 from an external source. The 3D rendering unit 120 may use the spatial information provided by the multi-channel encoder 110 to generate the 3D down-mix signal based on the down-mix signal. More specifically, the 3D rendering unit 120 may convert the down-mix signal into a 3D down-mix signal by converting the down-mix signal into an imaginary multi-channel signal using the spatial information, and filtering the imaginary multi-channel signal.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a head-related transfer function (HRTF) filter.
An HRTF is a transfer function which describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and whose value varies according to the direction and altitude of the sound source. If a signal with no directivity is filtered using an HRTF, the signal sounds as if it were reproduced from a certain direction.
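The filtering can be pictured as convolving the down-mix with a pair of head-related impulse responses (HRIRs), one per ear. The toy responses below are invented for illustration (measured HRIRs are typically hundreds of taps long); the delayed, attenuated right-ear response mimics a source located to the listener's left.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filtering: y[n] = sum_k h[k] * x[n-k]."""
    out = []
    for n in range(len(signal) + len(coeffs) - 1):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if 0 <= n - k < len(signal):
                acc += h * signal[n - k]
        out.append(acc)
    return out

# Hypothetical toy HRIRs: the right-ear response is delayed and attenuated,
# as for a source to the listener's left.
hrir_left = [1.0, 0.4]
hrir_right = [0.0, 0.0, 0.5, 0.2]   # 2-sample interaural delay

mono = [1.0, 0.0, 0.0, 0.0]         # unit impulse as the down-mix frame
ear_l = fir_filter(mono, hrir_left)
ear_r = fir_filter(mono, hrir_right)
print(ear_l)  # [1.0, 0.4, 0.0, 0.0, 0.0]
print(ear_r)  # the impulse reappears 2 samples later, at half amplitude
```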
The 3D rendering unit 120 may perform the 3D rendering operation in a frequency domain, for example, a discrete Fourier transform (DFT) domain or a fast Fourier transform (FFT) domain. In this case, the 3D rendering unit 120 may perform DFT or FFT before the 3D rendering operation, and may perform inverse DFT (IDFT) or inverse FFT (IFFT) after the 3D rendering operation.
The 3D rendering unit 120 may perform the 3D rendering operation in a quadrature mirror filter (QMF)/hybrid domain. In this case, the 3D rendering unit 120 may perform QMF/hybrid analysis and synthesis operations before or after the 3D rendering operation.
The 3D rendering unit 120 may also perform the 3D rendering operation in a time domain. The 3D rendering unit 120 may determine in which domain to perform the 3D rendering operation according to the required sound quality and the processing capability of the encoding/decoding apparatus.
The down-mix encoder 130 encodes the down-mix signal output by the multi-channel encoder 110 or the 3D down-mix signal output by the 3D rendering unit 120. The down-mix encoder 130 may encode either signal using an audio encoding method such as an advanced audio coding (AAC) method, an MPEG layer-3 (MP3) method, or a bit-sliced arithmetic coding (BSAC) method.
The down-mix encoder 130 may encode a non-3D down-mix signal or a 3D down-mix signal. In this case, both an encoded non-3D down-mix signal and an encoded 3D down-mix signal may be included in a bitstream to be transmitted.
The bit packing unit 140 generates a bitstream based on the spatial information and either the encoded non-3D down-mix signal or the encoded 3D down-mix signal.
The bitstream generated by the bit packing unit 140 may include spatial information, down-mix identification information indicating whether the down-mix signal included in the bitstream is a non-3D down-mix signal or a 3D down-mix signal, and information identifying the filter used by the 3D rendering unit 120 (for example, HRTF coefficient information).
In other words, the bitstream generated by the bit packing unit 140 may include at least one of a non-3D down-mix signal that has not yet been 3D-processed and an encoder 3D down-mix signal obtained by a 3D processing operation performed by the encoding apparatus, together with down-mix identification information identifying the type of the down-mix signal included in the bitstream.
Which of the non-3D down-mix signal and the encoder 3D down-mix signal is to be included in the bitstream generated by the bit packing unit 140 may be selected by a user, or may be determined according to the capabilities of the encoding/decoding apparatus shown in Fig. 1 and the characteristics of the reproduction environment.
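For illustration only, a down-mix identification flag of the kind described above could be carried in a one-byte header as sketched below. The actual bitstream syntax is defined by the codec specification, not by this description; the field layout here is a hypothetical example.

```python
import struct

# Hypothetical 1-byte header: bits 0-1 carry the down-mix type, bit 2
# signals whether HRTF coefficient information follows.
DOWNMIX_NON3D, DOWNMIX_ENCODER_3D, DOWNMIX_BOTH = 0, 1, 2

def pack_header(downmix_type, has_hrtf_coeffs):
    flags = (downmix_type & 0x03) | (int(has_hrtf_coeffs) << 2)
    return struct.pack("B", flags)

def parse_header(data):
    (flags,) = struct.unpack("B", data[:1])
    return flags & 0x03, bool(flags & 0x04)

hdr = pack_header(DOWNMIX_ENCODER_3D, has_hrtf_coeffs=True)
print(parse_header(hdr))  # (1, True)
```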
The HRTF coefficient information may include the coefficients of the inverse function of the HRTF used by the 3D rendering unit 120. Alternatively, the HRTF coefficient information may include only brief information on the coefficients of the HRTF used by the 3D rendering unit 120, for example, envelope information of the HRTF coefficients. If a bitstream including the coefficients of the inverse function of the HRTF is transmitted to a decoding apparatus, the decoding apparatus does not need to perform an HRTF coefficient conversion operation, and the amount of computation of the decoding apparatus can thus be reduced.
The bitstream generated by the bit packing unit 140 may also include information regarding an energy variation in a signal caused by HRTF-based filtering, that is, information regarding the difference between the energy of the signal to be filtered and the energy of the filtered signal, or the ratio of the energy of the signal to be filtered to the energy of the filtered signal.
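A sketch of how such energy information could be formed at the encoder and used at the decoder, assuming the ratio is transmitted in dB (the actual representation is a codec design choice, not specified here):

```python
import math

def energy(sig):
    return sum(x * x for x in sig)

def energy_ratio_db(pre, post, eps=1e-12):
    """Ratio of pre-filtering to post-filtering energy, in dB, as an
    encoder might transmit so a decoder can undo the level change."""
    return 10.0 * math.log10((energy(pre) + eps) / (energy(post) + eps))

pre = [1.0, -1.0, 1.0, -1.0]
post = [0.5, -0.5, 0.5, -0.5]   # filtering halved the amplitude
gain_db = energy_ratio_db(pre, post)
print(round(gain_db, 2))  # 6.02 dB

# The decoder can compensate by scaling the filtered signal back up:
scale = 10 ** (gain_db / 20.0)
restored = [x * scale for x in post]
```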
The bitstream generated by the bit packing unit 140 may also include information indicating whether it includes HRTF coefficients. If HRTF coefficients are included in the bitstream generated by the bit packing unit 140, the bitstream may also include information indicating whether it includes the coefficients of the HRTF used by the 3D rendering unit 120 or the coefficients of the inverse function of the HRTF.
Still referring to Fig. 1, a first decoding apparatus 200 includes a bit unpacking unit 210, a down-mix decoder 220, a 3D rendering unit 230, and a multi-channel decoder 240.
The bit unpacking unit 210 receives an input bitstream from the encoding unit 100, and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 220 decodes the encoded down-mix signal. The down-mix decoder 220 may decode the encoded down-mix signal using an audio decoding method such as an AAC method, an MP3 method, or a BSAC method.
As described above, the encoded down-mix signal extracted from the input bitstream may be an encoded non-3D down-mix signal or an encoded encoder 3D down-mix signal. Information indicating whether the encoded down-mix signal extracted from the input bitstream is an encoded non-3D down-mix signal or an encoded encoder 3D down-mix signal may be included in the input bitstream.
If the encoded down-mix signal extracted from the input bitstream is an encoder 3D down-mix signal, it can be readily reproduced after being decoded by the down-mix decoder 220.
On the other hand, if the encoded down-mix signal extracted from the input bitstream is a non-3D down-mix signal, it can be decoded by the down-mix decoder 220, and the down-mix signal obtained by the decoding can be converted into a decoder 3D down-mix signal by a 3D rendering operation performed by a third renderer 233. The decoder 3D down-mix signal can then be readily reproduced.
The 3D rendering unit 230 includes a first renderer 231, a second renderer 232, and the third renderer 233. The first renderer 231 generates a down-mix signal by performing a 3D rendering operation on the encoder 3D down-mix signal provided by the down-mix decoder 220. For example, the first renderer 231 may generate a non-3D down-mix signal by removing 3D effects from the encoder 3D down-mix signal. The 3D effects of the encoder 3D down-mix signal may not be completely removed by the first renderer 231, in which case the down-mix signal output by the first renderer 231 may retain some 3D effects.
The first renderer 231 may convert the 3D down-mix signal provided by the down-mix decoder 220 into a down-mix signal from which the 3D effects have been removed, using an inverse filter of the filter used by the 3D rendering unit 120 of the encoding unit 100. Information regarding the filter used by the 3D rendering unit 120, or regarding the inverse filter of that filter, may be included in the input bitstream.
The filter used by the 3D rendering unit 120 may be an HRTF filter. In this case, the coefficients of the HRTF used by the encoding unit 100, or the coefficients of the inverse function of the HRTF, may also be included in the input bitstream. If the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream, they can be inversely converted, and the result of the inverse conversion can be used during the 3D rendering operation performed by the first renderer 231. If the coefficients of the inverse function of the HRTF used by the encoding unit 100 are included in the input bitstream, they can be readily used during the 3D rendering operation performed by the first renderer 231 without any inverse conversion operation, and the amount of computation of the first decoding apparatus 200 can thus be reduced.
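The advantage of transmitting inverse-HRTF coefficients can be seen in a small DFT-domain sketch: if the decoder already holds H^-1, undoing the rendering reduces to a per-bin multiplication with no inversion step. The 4-point "HRTF" below is a toy sequence, not a measured response.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Toy 4-point "HRTF" spectrum and its inverse (H must be non-zero in every bin).
h = [1.0, 0.5, 0.0, 0.0]
H = dft(h)
H_inv = [1.0 / v for v in H]          # what an encoder could transmit directly

x = [0.8, -0.2, 0.4, 0.1]             # one down-mix frame
Y = [a * b for a, b in zip(dft(x), H)]           # 3D rendering: multiply by H
x_rec = idft([a * b for a, b in zip(Y, H_inv)])  # undo it with H^-1, no inversion
print([round(v.real, 6) for v in x_rec])  # recovers [0.8, -0.2, 0.4, 0.1]
```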
The input bitstream may also include filter information (for example, information indicating whether the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream) and information indicating whether the filter information has been inversely converted.
The multi-channel decoder 240 generates a 3D multi-channel signal with three or more channels based on the down-mix signal from which the 3D effects have been removed and the spatial information extracted from the input bitstream.
The second renderer 232 may generate a 3D down-mix signal with 3D effects by performing a 3D rendering operation on the down-mix signal from which the 3D effects have been removed. In other words, the first renderer 231 removes the 3D effects from the encoder 3D down-mix signal provided by the down-mix decoder 220; thereafter, the second renderer 232 may generate a combined 3D down-mix signal with the 3D effects desired by the first decoding apparatus 200 by performing a 3D rendering operation, using a filter of the first decoding apparatus 200, on the down-mix signal obtained by the removal performed by the first renderer 231.
The first decoding apparatus 200 may include a renderer into which two or more of the first, second, and third renderers 231, 232, and 233 are combined to perform their operations.
The bitstream generated by the encoding unit 100 may be input to a second decoding apparatus 300 having a different structure from that of the first decoding apparatus 200. The second decoding apparatus 300 may generate a 3D down-mix signal based on a down-mix signal included in the bitstream input thereto.
More specifically, the second decoding apparatus 300 includes a bit unpacking unit 310, a down-mix decoder 320, and a 3D rendering unit 330. The bit unpacking unit 310 receives the input bitstream from the encoding unit 100, and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 320 decodes the encoded down-mix signal. The 3D rendering unit 330 performs a 3D rendering operation on the decoded down-mix signal, so that the decoded down-mix signal is converted into a 3D down-mix signal.
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention. Referring to Fig. 2, the encoding apparatus includes 3D rendering units 400 and 420 and a multi-channel encoder 410. A detailed description of encoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 2, the 3D rendering units 400 and 420 may be disposed in front of and behind the multi-channel encoder 410, respectively. A multi-channel signal may therefore be 3D-rendered by the 3D rendering unit 400, and the 3D-rendered multi-channel signal may then be encoded by the multi-channel encoder 410, thereby generating a pre-processed encoder 3D down-mix signal. Alternatively, the multi-channel signal may be down-mixed by the multi-channel encoder 410, and the down-mixed signal may then be 3D-rendered by the 3D rendering unit 420, thereby generating a post-processed encoder 3D down-mix signal.
Information indicating whether the multi-channel signal was 3D-rendered before or after being down-mixed may be included in the bitstream to be transmitted.
Both of the 3D rendering units 400 and 420 may instead be disposed in front of, or behind, the multi-channel encoder 410.
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention. Referring to Fig. 3, the decoding apparatus includes 3D rendering units 430 and 450 and a multi-channel decoder 440. A detailed description of decoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 3, the 3D rendering units 430 and 450 may be disposed in front of and behind the multi-channel decoder 440, respectively. The 3D rendering unit 430 may remove 3D effects from an encoder 3D down-mix signal and input the down-mix signal obtained by the removal to the multi-channel decoder 440. Then, the multi-channel decoder 440 may decode the down-mix signal input thereto, thereby generating a pre-processed 3D multi-channel signal. Alternatively, the multi-channel decoder 440 may restore a multi-channel signal from the encoded 3D down-mix signal, and the 3D rendering unit 450 may then remove the 3D effects from the restored multi-channel signal, thereby generating a post-processed 3D multi-channel signal.
If the encoder 3D down-mix signal provided by the encoding apparatus was generated by performing a 3D rendering operation and then a down-mixing operation, it may be decoded by performing a multi-channel decoding operation and then a 3D rendering operation. On the other hand, if the encoder 3D down-mix signal was generated by performing a down-mixing operation and then a 3D rendering operation, it may be decoded by performing a 3D rendering operation and then a multi-channel decoding operation.
Information indicating whether an encoded 3D down-mix signal was obtained by performing a 3D rendering operation before or after a down-mixing operation may be extracted from the bitstream transmitted by the encoding apparatus.
Both of the 3D rendering units 430 and 450 may instead be disposed in front of, or behind, the multi-channel decoder 440.
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention. Referring to Fig. 4, the encoding apparatus includes a multi-channel encoder 500, a 3D rendering unit 510, a down-mix encoder 520, and a bit packing unit 530. A detailed description of encoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 4, the multi-channel encoder 500 generates a down-mix signal and spatial information based on an input multi-channel signal. The 3D rendering unit 510 generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
Whether to perform the 3D rendering operation on the down-mix signal may be selected by a user, or may be determined according to the capability of the encoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix encoder 520 encodes the down-mix signal generated by the multi-channel encoder 500 or the 3D down-mix signal generated by the 3D rendering unit 510.
The bit packing unit 530 generates a bitstream based on the spatial information and either the encoded down-mix signal or the encoded encoder 3D down-mix signal. The bitstream generated by the bit packing unit 530 may include down-mix identification information indicating whether the encoded down-mix signal included in the bitstream is a non-3D down-mix signal without 3D effects or an encoder 3D down-mix signal with 3D effects. More specifically, the down-mix identification information may indicate whether the bitstream generated by the bit packing unit 530 includes a non-3D down-mix signal, an encoder 3D down-mix signal, or both.
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 5, the decoding apparatus includes a bit unpacking unit 540, a down-mix decoder 550, and a 3D rendering unit 560. A detailed description of decoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 5, the bit unpacking unit 540 extracts an encoded down-mix signal, spatial information, and down-mix identification information from an input bitstream. The down-mix identification information indicates whether the encoded down-mix signal is an encoded non-3D down-mix signal without 3D effects or an encoded 3D down-mix signal with 3D effects.
If the input bitstream includes both a non-3D down-mix signal and a 3D down-mix signal, only one of them may be extracted from the input bitstream, as selected by a user or determined according to the capability of the decoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix decoder 550 decodes the encoded down-mix signal. If the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is an encoder 3D down-mix signal obtained by performing a 3D rendering operation, it can be readily reproduced.
On the other hand, if the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is a down-mix signal without 3D effects, the 3D rendering unit 560 may generate a decoder 3D down-mix signal by performing a 3D rendering operation on the down-mix signal obtained by the decoding.
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 6, the decoding apparatus includes a bit unpacking unit 600, a down-mix decoder 610, a first 3D rendering unit 620, a second 3D rendering unit 630, and a filter information storage unit 640. A detailed description of decoding processes identical to those of the embodiment of Fig. 1 will be omitted.
The bit unpacking unit 600 extracts an encoded encoder 3D down-mix signal and spatial information from an input bitstream. The down-mix decoder 610 decodes the encoded encoder 3D down-mix signal.
The first 3D rendering unit 620 removes 3D effects from the encoder 3D down-mix signal obtained by the decoding performed by the down-mix decoder 610, using an inverse filter of the filter used by the encoding apparatus to perform a 3D rendering operation. The second 3D rendering unit 630 generates a combined 3D down-mix signal with 3D effects by performing a 3D rendering operation on the down-mix signal obtained by the removal performed by the first 3D rendering unit 620, using a filter stored in the decoding apparatus.
The second 3D rendering unit 630 may perform the 3D rendering operation using a filter whose characteristics differ from those of the filter used by the encoding apparatus. For example, the second 3D rendering unit 630 may perform the 3D rendering operation using an HRTF whose coefficients differ from those of the HRTF used by the encoding apparatus.
The filter information storage unit 640 stores filter information regarding filters used for 3D rendering, for example, HRTF coefficient information. The second 3D rendering unit 630 may generate the combined 3D down-mix signal using the filter information stored in the filter information storage unit 640.
The filter information storage unit 640 may store a plurality of pieces of filter information respectively corresponding to a plurality of filters. In this case, one of the pieces of filter information may be selected by a user, or may be selected according to the capability of the decoding apparatus or the required sound quality.
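A hypothetical sketch of such a filter store and selection rule follows; all names and the fallback policy are illustrative, since the description only requires that one of several stored filter sets be selectable by the user or according to device capability.

```python
# Hypothetical filter store keyed by profile name; "taps" stands in for the
# complexity a device can afford, "hrtf" for the stored coefficient set.
FILTER_STORE = {
    "low_complexity": {"taps": 32,  "hrtf": "generic_short"},
    "high_quality":   {"taps": 256, "hrtf": "generic_long"},
    "user_measured":  {"taps": 256, "hrtf": "personalized"},
}

def select_filter(user_choice=None, max_taps=None):
    """Prefer an explicit user choice; otherwise pick the longest filter
    the device capability allows."""
    if user_choice in FILTER_STORE:
        return FILTER_STORE[user_choice]
    candidates = [f for f in FILTER_STORE.values()
                  if max_taps is None or f["taps"] <= max_taps]
    return max(candidates, key=lambda f: f["taps"])

print(select_filter(max_taps=64))      # {'taps': 32, 'hrtf': 'generic_short'}
print(select_filter("user_measured"))  # the personalized set wins when chosen
```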
Different people may have different ear structures, and the HRTF coefficients optimized for different individuals may therefore differ from one another. The decoding apparatus shown in Fig. 6 can generate a 3D down-mix signal optimized for a particular user. In addition, the decoding apparatus shown in Fig. 6 can generate a 3D down-mix signal having the 3D effects of the HRTF filter desired by the user, regardless of the type of HRTF provided by the supplier of the 3D down-mix signal.
Fig. 7 is a block diagram of a 3D rendering apparatus according to an embodiment of the present invention. Referring to Fig. 7, the 3D rendering apparatus includes first and second domain converting units 700 and 720 and a 3D rendering unit 710. In order to perform a 3D rendering operation in a predetermined domain, the first and second domain converting units 700 and 720 may be disposed in front of and behind the 3D rendering unit 710, respectively.
Referring to Fig. 7, an input down-mix signal may be converted into a frequency-domain down-mix signal by the first domain converting unit 700. More specifically, the first domain converting unit 700 may convert the input down-mix signal into a DFT-domain down-mix signal or an FFT-domain down-mix signal by performing DFT or FFT.
The 3D rendering unit 710 generates a multi-channel signal by applying spatial information to the frequency-domain down-mix signal provided by the first domain converting unit 700. Thereafter, the 3D rendering unit 710 generates a 3D down-mix signal by filtering the multi-channel signal.
The 3D down-mix signal generated by the 3D rendering unit 710 is converted into a time-domain 3D down-mix signal by the second domain converting unit 720. More specifically, the second domain converting unit 720 may perform IDFT or IFFT on the 3D down-mix signal generated by the 3D rendering unit 710.
During frequency domain 3D reduction audio signal converted time domain 3D reduction audio signal to, loss of data or the data distortion of aliasing and so on may occur.
In order to generate a multi-channel signal and a 3D down-mix signal in the frequency domain, the spatial information of each parameter band may be mapped to the frequency domain, and a plurality of filter coefficients may be converted into the frequency domain.
The 3D rendering unit 710 may generate a 3D down-mix signal by multiplying the frequency-domain down-mix signal provided by the first domain converting unit 700, the spatial information, and the filter coefficients.
A time-domain signal obtained from the multiplication of a down-mix signal, spatial information, and filter coefficients that are all represented in an M-point frequency domain has M valid signals. In order to represent the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain, an M-point DFT or an M-point FFT may be performed.
A valid signal is a signal that does not necessarily have a value of 0. For example, a total of x valid signals may be generated by obtaining x signals through the sampling of an audio signal. If y of these x valid signals are zeroed out by zero-padding, the number of valid signals decreases to (x-y). When a signal having a valid signals is convolved with a signal having b valid signals, a total of (a+b-1) valid signals are obtained.
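The (a+b-1) valid-signal count stated above can be checked numerically. The sketch below (NumPy, with illustrative lengths and random positive sample values, none of which come from the patent) convolves a zero-padded signal having a valid samples with one having b valid samples and measures the non-zero span of the result.

```python
import numpy as np

# Illustrative check: convolving a signal with `a` valid (non-zero) samples
# and a signal with `b` valid samples yields (a + b - 1) valid samples.
a, b = 5, 3
x = np.concatenate([np.random.rand(a) + 1.0, np.zeros(4)])  # a valid samples, then zeros
h = np.concatenate([np.random.rand(b) + 1.0, np.zeros(6)])  # b valid samples, then zeros
y = np.convolve(x, h)                                       # time-domain convolution

valid = np.flatnonzero(y)                 # indices of non-zero output samples
num_valid = int(valid[-1] - valid[0] + 1)  # span of the valid region
```

Because the sample values are strictly positive, no cancellation occurs and the valid region of `y` spans exactly (a+b-1) samples.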
Multiplying a down-mix signal, spatial information, and filter coefficients in an M-point frequency domain can provide the same effect as convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. A signal having (3*M-2) valid signals can be generated by converting the down-mix signal, the spatial information, and the filter coefficients of the M-point frequency domain into the time domain and convolving the results of the conversion.
Therefore, the number of valid signals in a signal obtained by multiplying the down-mix signal, the spatial information, and the filter coefficients in the frequency domain and converting the result of the multiplication into the time domain may differ from the number of valid signals in a signal obtained by convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. As a result, aliasing may occur during the conversion of a frequency-domain 3D down-mix signal into a time-domain signal.
In order to prevent aliasing, the sum of the number of valid signals of the down-mix signal in the time domain, the number of valid signals of the spatial information mapped to the frequency domain, and the number of valid signals of the filter coefficients must not exceed M. The number of valid signals of the spatial information mapped to the frequency domain may be determined by the number of points of the frequency domain. In other words, if spatial information represented for each parameter band is mapped to an N-point frequency domain, the number of valid signals of the spatial information is N.
With reference to FIG. 7, the first domain converting unit 700 includes a first zero-padding unit 701 and a first frequency-domain converting unit 702. The 3D rendering unit 710 includes a mapping unit 711, a time-domain converting unit 712, a second zero-padding unit 713, a second frequency-domain converting unit 714, a multi-channel signal generation unit 715, a third zero-padding unit 716, a third frequency-domain converting unit 717, and a 3D down-mix signal generation unit 718.
The first zero-padding unit 701 performs a zero-padding operation on a down-mix signal having X samples in the time domain so that the number of samples of the down-mix signal increases from X to M. The first frequency-domain converting unit 702 converts the zero-padded down-mix signal into an M-point frequency-domain signal. The zero-padded down-mix signal has M samples, of which only X samples are valid signals.
The mapping unit 711 maps the spatial information of each parameter band to an N-point frequency domain. The time-domain converting unit 712 converts the spatial information obtained by the mapping performed by the mapping unit 711 into the time domain. The spatial information obtained by the conversion performed by the time-domain converting unit 712 has N samples.
The second zero-padding unit 713 performs a zero-padding operation on the spatial information having N samples in the time domain so that the number of samples of the spatial information increases from N to M. The second frequency-domain converting unit 714 converts the zero-padded spatial information into an M-point frequency-domain signal. The zero-padded spatial information has M samples, of which only N samples are valid.
The multi-channel signal generation unit 715 generates a multi-channel signal by multiplying the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714. The multi-channel signal generated by the multi-channel signal generation unit 715 has M valid signals. On the other hand, a multi-channel signal obtained by convolving, in the time domain, the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714 has (X+N-1) valid signals.
The third zero-padding unit 716 may perform a zero-padding operation on Y filter coefficients represented in the time domain so that the number of samples increases to M. The third frequency-domain converting unit 717 converts the zero-padded filter coefficients into an M-point frequency domain. The zero-padded filter coefficients have M samples, of which only Y samples are valid signals.
The 3D down-mix signal generation unit 718 generates a 3D down-mix signal by multiplying the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717. The 3D down-mix signal generated by the 3D down-mix signal generation unit 718 has M valid signals. On the other hand, a 3D down-mix signal obtained by convolving, in the time domain, the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717 has (X+N+Y-2) valid signals.
It is possible to prevent aliasing by setting the M-point frequency domain used by the first, second, and third frequency-domain converting units 702, 714, and 717 to satisfy the equation M ≥ (X+N+Y-2). In other words, aliasing can be prevented by having the first, second, and third frequency-domain converting units 702, 714, and 717 perform an M-point DFT or an M-point FFT satisfying the equation M ≥ (X+N+Y-2).
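The condition M ≥ (X+N+Y-2) can be illustrated with a short NumPy sketch (the lengths X, N, and Y below are chosen for illustration only): when each of the three time-domain sequences is zero-padded to M points, the product of their M-point DFTs transforms back to exactly the time-domain double convolution, i.e. no aliasing occurs.

```python
import numpy as np

X, N, Y = 6, 4, 3      # valid lengths: down-mix, spatial info, filter coefficients
M = X + N + Y - 2      # smallest M that satisfies M >= (X+N+Y-2)

d = np.random.rand(X)  # down-mix signal (time domain)
s = np.random.rand(N)  # spatial information mapped back to the time domain
h = np.random.rand(Y)  # filter coefficients (time domain)

# Frequency-domain path: zero-pad each sequence to M points and
# multiply the three M-point DFTs, then convert back to the time domain.
freq = np.fft.fft(d, M) * np.fft.fft(s, M) * np.fft.fft(h, M)
fast = np.fft.ifft(freq).real

# Time-domain path: direct double convolution, length (X+N-1)+(Y-1) == M.
direct = np.convolve(np.convolve(d, s), h)

# With M >= (X+N+Y-2) the two paths agree sample-for-sample.
ok = np.allclose(fast, direct)
```

With a smaller M the circular convolution implied by the DFT product would wrap the tail of the result back onto its head, which is the time-domain aliasing the condition prevents.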
The conversion into the frequency domain may be performed using a filter bank other than a DFT filter bank, an FFT filter bank, or a QMF bank. The generation of a 3D down-mix signal may be performed using an HRTF filter.
The number of valid signals of the spatial information may be adjusted using a method other than the methods described above, or using whichever of the above methods is most efficient and requires the least amount of computation.
Aliasing may occur not only when a signal, a coefficient, or spatial information is converted from the frequency domain into the time domain or vice versa, but also when a signal, a coefficient, or spatial information is converted from the QMF domain into a hybrid domain or vice versa. The above-described methods of preventing aliasing may also be used to prevent aliasing from occurring during the conversion of a signal, a coefficient, or spatial information from the QMF domain into a hybrid domain or vice versa.
The spatial information used to generate a multi-channel signal or a 3D down-mix signal may vary. As a result of variations in the spatial information, discontinuities may occur in an output signal and be heard as noise.
The noise in the output signal can be reduced using a smoothing method by which the spatial information is prevented from changing rapidly.
For example, when a first frame and a second frame are adjacent to each other and first spatial information applied to the first frame differs from second spatial information applied to the second frame, a discontinuity is likely to occur between the first frame and the second frame.
In this case, the second spatial information may be compensated for using the first spatial information, or the first spatial information may be compensated for using the second spatial information, so that the difference between the first spatial information and the second spatial information is reduced, and thus the noise caused by the discontinuity between the first and second frames can be reduced. More specifically, at least one of the first spatial information and the second spatial information may be replaced with the average of the first spatial information and the second spatial information, thereby reducing the noise.
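A minimal sketch of the averaging step just described, with a hypothetical helper name and illustrative CLD values (neither taken from the patent):

```python
# Assumed helper, not defined by the patent: replace both frames'
# differing spatial values with their mean so that the inter-frame
# difference, and hence the discontinuity noise, is reduced.
def smooth_pair(first, second):
    mean = (first + second) / 2.0
    return mean, mean  # both frames now carry the averaged value

cld_frame1, cld_frame2 = 10.0, 4.0          # spatial info of two adjacent frames
s1, s2 = smooth_pair(cld_frame1, cld_frame2)
# After smoothing, the difference between the two frames' values is zero.
```

A real implementation might replace only one of the two values, or interpolate across several frames; the averaging shown here is the specific variant the text mentions.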
Noise is also likely to be generated by a discontinuity between a pair of adjacent parameter bands. For example, when a first parameter band and a second parameter band are adjacent to each other and third spatial information corresponding to the first parameter band differs from fourth spatial information corresponding to the second parameter band, a discontinuity may occur between the first and second parameter bands.
In this case, the third spatial information may be compensated for using the fourth spatial information, or the fourth spatial information may be compensated for using the third spatial information, so that the difference between the third and fourth spatial information is reduced and the noise caused by the discontinuity between the first and second parameter bands can be reduced. More specifically, at least one of the third spatial information and the fourth spatial information may be replaced with the average of the third and fourth spatial information, thereby reducing the noise.
Noise caused by a discontinuity between a pair of adjacent frames or between a pair of adjacent parameter bands can also be reduced using methods other than those described above.
More specifically, each frame may be multiplied by a window such as a Hanning window, and an overlap-and-add scheme may be applied to the results of the multiplication so that the variation between frames is reduced. Alternatively, an output signal to which a plurality of pieces of spatial information are applied may be smoothed so that abrupt variations of the output signal between a plurality of frames can be prevented.
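The windowed overlap-and-add scheme can be sketched as follows (frame contents, lengths, and the 50% hop are illustrative assumptions; `np.hanning` stands in for the Hanning window):

```python
import numpy as np

frame_len, hop = 8, 4
win = np.hanning(frame_len)

# Two adjacent frames whose level jumps from 1.0 to 3.0 -- without
# smoothing this jump would appear as a discontinuity in the output.
frames = [np.ones(frame_len), np.ones(frame_len) * 3.0]

# Multiply each frame by the window and overlap-and-add the results.
out = np.zeros(hop * (len(frames) - 1) + frame_len)
for i, f in enumerate(frames):
    out[i * hop:i * hop + frame_len] += win * f
# In the overlap region (samples 4..7) the output crossfades from the
# first frame's level toward the second frame's level instead of jumping.
```

Because the symmetric Hanning window used here is not exactly power-complementary at this hop, the sketch illustrates only the crossfading effect, not perfect reconstruction.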
The decorrelation between channels in the DFT domain may be adjusted using spatial information such as an ICC value, as follows.
The degree of decorrelation may be adjusted by multiplying a coefficient of a signal input to a one-to-two (OTT) or two-to-three (TTT) box by a predetermined value. The predetermined value may be defined by the following expression: (A + (1-A*A)^0.5 * i), where A indicates the ICC value applied to a predetermined band of the OTT or TTT box and i indicates the imaginary unit. The imaginary part may be positive or negative.
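As a quick check of the expression above (a sketch, not normative code): since A^2 + (1-A^2) = 1, the factor has unit magnitude, so multiplying a DFT-domain coefficient by it rotates the coefficient's phase without changing its energy.

```python
# Sketch of the decorrelation factor w = A + (1 - A*A)**0.5 * i,
# where A is the ICC value of the band; the sign of the imaginary
# part may be positive or negative, as the text notes.
def decorrelation_factor(icc, sign=+1.0):
    return complex(icc, sign * (1.0 - icc * icc) ** 0.5)

w = decorrelation_factor(0.6)   # illustrative ICC value
mag = abs(w)                    # unit magnitude: energy is preserved
```

Applying `w` to a frequency-domain coefficient thus adjusts inter-channel correlation (via phase) while leaving the energy-related spatial cues intact.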
The predetermined value may be weighted by a weighting factor according to the characteristics of the signal, for example, the energy level of the signal, the energy characteristics of each frequency of the signal, or the type of box to which the ICC value A is applied. As a result of introducing the weighting factor, the degree of decorrelation can be further adjusted, and a smoothing or interpolation method can be applied between frames.
As described above with reference to FIG. 7, a 3D down-mix signal can be generated in the frequency domain using an HRTF or a head-related impulse response (HRIR) converted into the frequency domain.
Alternatively, a 3D down-mix signal can be generated by convolving an HRIR and a down-mix signal in the time domain. A 3D down-mix signal generated in the frequency domain may be left in the frequency domain without being subjected to an inverse domain conversion.
In order to convolve an HRIR and a down-mix signal in the time domain, a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter may be used.
As described above, an encoding apparatus or a decoding apparatus according to an embodiment of the present invention can generate a 3D down-mix signal using a first method involving an HRTF of the frequency domain or an HRIR converted into the frequency domain, a second method involving the convolution of an HRIR in the time domain, or a combination of the first and second methods.
FIGS. 8 through 11 illustrate bitstreams according to embodiments of the present invention.
With reference to FIG. 8, a bitstream includes a multi-channel decoding information field that contains information necessary for generating a multi-channel signal, a 3D rendering information field that contains information necessary for generating a 3D down-mix signal, and a header field that contains header information necessary for using the information included in the multi-channel decoding information field and the 3D rendering information field. The bitstream may include only one or two of the multi-channel decoding information field, the 3D rendering information field, and the header field.
With reference to FIG. 9, a bitstream containing side information necessary for a decoding operation may include a specific configuration header field that contains header information for a whole encoded signal, and a plurality of frame data fields that contain side information regarding a plurality of frames. More specifically, each of the frame data fields may include a frame header field that contains header information of a corresponding frame and a frame parameter data field that contains spatial information of the corresponding frame. Alternatively, each of the frame data fields may include only a frame parameter data field.
Each of the frame parameter data fields may include a plurality of modules, each module comprising a flag and parameter data. A module is a data set that contains parameter data such as spatial information and other data necessary for improving the sound quality of a signal, such as a down-mix gain and smoothing data.
A module may include no flag if module data regarding information specified by the frame header field is received without an additional flag, if the information specified by the frame header field is further classified, or if an additional flag and data are received together for information not specified by the frame header field.
Side information regarding a 3D down-mix signal, for example, HRTF coefficient information, may be included in at least one of the specific configuration header field, the frame header fields, and the frame parameter data fields.
With reference to FIG. 10, a bitstream may include a plurality of multi-channel decoding information fields that contain information necessary for generating a multi-channel signal and a plurality of 3D rendering information fields that contain information necessary for generating a 3D down-mix signal.
Upon receiving the bitstream, a decoding apparatus may perform a decoding operation using either the multi-channel decoding information fields or the 3D rendering information fields, and may skip whichever of the multi-channel decoding information fields and the 3D rendering information fields are not used in the decoding operation. In this case, which of the multi-channel decoding information fields and the 3D rendering information fields are to be used for the decoding operation may be determined according to the type of signal to be reproduced.
In other words, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields and read the information included in the multi-channel decoding information fields. On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields and read the information included in the 3D rendering information fields.
Some the method for skipping in a plurality of fields in bit stream is as follows.
At first, the field length information about the bit size of field can be included in bit stream.In this case, can skip this field by skipping corresponding to the bit number of field bit size.Field length information can be arranged on the beginning of field.
The second, synchronization character can be arranged on end or the beginning of field.In this case, can skip this field by the position location field based on synchronization character.
The 3rd, if determine in advance and fixed the length of field, can skip this field by skipping corresponding to the data volume of the length of this field.Fixed field length information about field length can be included in bit stream or be stored in decoding device.
The 4th, can utilize two or more the combination in above-mentioned field skipping method to skip one of a plurality of fields.
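The first of these methods can be sketched as follows; the 16-bit big-endian byte-length prefix is an assumption made for illustration, not a format defined by the patent.

```python
import io
import struct

# Hypothetical sketch: each skippable field begins with a 16-bit length
# (in bytes), so a decoder that does not need the field can advance the
# read position past the whole field without parsing its contents.
def skip_length_prefixed_field(stream):
    size, = struct.unpack(">H", stream.read(2))  # field length information
    stream.seek(size, io.SEEK_CUR)               # skip the field body

# A 3-byte field (length prefix + body) followed by one byte of other data.
buf = io.BytesIO(struct.pack(">H", 3) + b"\x01\x02\x03" + b"\xaa")
skip_length_prefixed_field(buf)
next_byte = buf.read(1)  # reading resumes at the data after the field
```

The synchronization-word and fixed-length methods differ only in how the skip distance is discovered; the seek itself is the same.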
Field skip information, i.e., information necessary for skipping a field, such as field length information, a synchronization word, or fixed field length information, may be included in one of the specific configuration header field, the frame header fields, and the frame parameter data fields illustrated in FIG. 9, or may be included in a field other than those illustrated in FIG. 9.
For example, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields with reference to field length information, a synchronization word, or fixed field length information placed at the beginning of each of the 3D rendering information fields, and read the information included in the multi-channel decoding information fields.
On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields with reference to field length information, a synchronization word, or fixed field length information placed at the beginning of each of the multi-channel decoding information fields, and read the information included in the 3D rendering information fields.
A bitstream may include information indicating whether the data included in the bitstream is necessary for generating a multi-channel signal or necessary for generating a 3D down-mix signal.
However, even if the bitstream includes no spatial information such as a CLD and only includes data necessary for generating a 3D down-mix signal (for example, HRTF filter coefficients), a multi-channel signal can still be reproduced using the data necessary for generating the 3D down-mix signal, without the need for spatial information.
For example, a stereo parameter, which is spatial information regarding two channels, is obtained from the down-mix signal. Then, the stereo parameter is converted into spatial information regarding a plurality of channels to be reproduced, and a multi-channel signal is generated by applying the spatial information obtained by the conversion to the down-mix signal.
On the other hand, even if the bitstream only includes data necessary for generating a multi-channel signal, the down-mix signal can be reproduced without an additional decoding operation, or a 3D down-mix signal can be reproduced by performing 3D processing on the down-mix signal using an additional HRTF filter.
If the bitstream includes both data necessary for generating a multi-channel signal and data necessary for generating a 3D down-mix signal, the user may be allowed to decide whether to reproduce a multi-channel signal or a 3D down-mix signal.
Methods of skipping data will hereinafter be described in detail with reference to the corresponding syntaxes.
Syntax 1 illustrates a method of decoding an audio signal in units of frames.
[syntax 1]
SpatialFrame()
{
FramingInfo();
bsIndependencyFlag;
OttData();
TttData();
SmgData();
TempShapeData();
if(bsArbitraryDownmix){
ArbitraryDownmixData();
}
if(bsResidualCoding){
ResidualData();
}
}
In Syntax 1, OttData() and TttData() are modules representing parameters (such as spatial information including CLD, ICC, and CPC values) necessary for recovering a multi-channel signal from a down-mix signal, and SmgData(), TempShapeData(), ArbitraryDownmixData(), and ResidualData() are modules representing information necessary for improving sound quality by correcting signal distortions that may occur during an encoding operation.
For example, if only the parameters such as CLD, ICC, or CPC values and the information included in the module ArbitraryDownmixData() are used during a decoding operation, the modules SmgData() and TempShapeData(), which are located between the modules TttData() and ArbitraryDownmixData(), are unnecessary. Therefore, it is efficient to skip the modules SmgData() and TempShapeData().
A method of skipping modules according to an embodiment of the present invention will hereinafter be described in detail with reference to Syntax 2.
[syntax 2]
:
TttData();
SkipData(){
bsSkipBits;
}
SmgData();
TempShapeData();
if(bsArbitraryDownmix){
ArbitraryDownmixData();
}
:
With reference to Syntax 2, a module SkipData() may be placed ahead of the module(s) to be skipped, and the bit size of the module(s) to be skipped is specified in the module SkipData() as bsSkipBits.
In other words, assuming that the modules SmgData() and TempShapeData() are to be skipped and that the combined bit size of the modules SmgData() and TempShapeData() is 150, the modules SmgData() and TempShapeData() can be skipped by setting bsSkipBits to 150.
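The bsSkipBits mechanism of Syntax 2 can be sketched with a toy bit reader (the reader class and the bit contents below are assumptions for illustration, not part of the syntax):

```python
# Minimal bit reader over a string of '0'/'1' characters, assumed
# here purely to illustrate skipping by a bit count.
class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0
    def read(self, n):
        out = self.bits[self.pos:self.pos + n]
        self.pos += n
        return out
    def skip(self, n):
        self.pos += n

# 8 bits of earlier module data, 150 bits of SmgData()+TempShapeData(),
# then 2 bits standing in for ArbitraryDownmixData().
reader = BitReader("0" * 8 + "1" * 150 + "01")
bs_skip_bits = 150            # value announced by SkipData()
reader.read(8)                # consume the earlier module data
reader.skip(bs_skip_bits)     # jump over SmgData() and TempShapeData()
remainder = reader.read(2)    # decoding resumes at the next module
```

Because the skip distance is carried in the bitstream itself, the decoder needs no knowledge of the skipped modules' internal structure.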
A method of skipping modules according to another embodiment of the present invention will hereinafter be described in detail with reference to Syntax 3.
[syntax 3]
:
TttData();
bsSkipSyncflag;
SmgData();
TempShapeData();
bsSkipSyncword;
if(bsArbitraryDownmix){
ArbitraryDownmixData();
}
:
With reference to Syntax 3, unnecessary modules can be skipped using bsSkipSyncflag and bsSkipSyncword, where bsSkipSyncflag is a flag indicating whether a synchronization word is used and bsSkipSyncword is a synchronization word that may be placed at the end of the module(s) to be skipped.
More specifically, if the flag bsSkipSyncflag is set so that the synchronization word is used, one or more modules between the flag bsSkipSyncflag and the synchronization word bsSkipSyncword, i.e., the modules SmgData() and TempShapeData(), can be skipped.
With reference to FIG. 11, a bitstream may include a multi-channel header field that contains header information necessary for reproducing a multi-channel signal, a 3D rendering header field that contains header information necessary for reproducing a 3D down-mix signal, and a plurality of multi-channel decoding information fields that contain data necessary for reproducing a multi-channel signal.
In order to reproduce a multi-channel signal, the decoding apparatus may skip the 3D rendering header field and read data from the multi-channel header field and the multi-channel decoding information fields.
The method of skipping the 3D rendering header field is the same as the field skipping methods described above with reference to FIG. 10, and thus a detailed description thereof is omitted.
In order to reproduce a 3D down-mix signal, the decoding apparatus may read data from the multi-channel decoding information fields and the 3D rendering header field. For example, the decoding apparatus may generate a 3D down-mix signal using a down-mix signal included in the multi-channel decoding information fields and HRTF coefficient information included in the 3D rendering header field.
FIG. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal according to an embodiment of the present invention. With reference to FIG. 12, an arbitrary down-mix signal is a down-mix signal other than the down-mix signal generated by a multi-channel encoder 801 included in an encoding apparatus 800. Detailed descriptions of processes identical to those of the embodiment of FIG. 1 are omitted.
With reference to FIG. 12, the encoding apparatus 800 includes the multi-channel encoder 801, a spatial information synthesis unit 802, and a comparison unit 803.
The multi-channel encoder 801 down-mixes an input multi-channel signal into a stereo or mono down-mix signal, and generates basic spatial information necessary for recovering a multi-channel signal from the down-mix signal.
The comparison unit 803 compares the down-mix signal with the arbitrary down-mix signal and generates compensation information based on the result of the comparison. The compensation information is necessary for compensating for the arbitrary down-mix signal so that the arbitrary down-mix signal can be converted to approximate the down-mix signal. A decoding apparatus can compensate for the arbitrary down-mix signal using the compensation information, and recover a multi-channel signal using the compensated arbitrary down-mix signal. The recovered multi-channel signal is more similar to the original input multi-channel signal than a multi-channel signal recovered from the arbitrary down-mix signal without compensation.
The compensation information may be the difference between the down-mix signal and the arbitrary down-mix signal. The decoding apparatus may compensate for the arbitrary down-mix signal by adding, to the arbitrary down-mix signal, the difference between the down-mix signal and the arbitrary down-mix signal.
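The difference-based compensation just described can be sketched in a few lines (the per-sample signal values are illustrative; in practice the difference would typically be a per-band down-mix gain, as the following paragraphs explain):

```python
import numpy as np

# Encoder side: the down-mix produced by the multi-channel encoder and
# an externally supplied arbitrary down-mix of the same content.
encoder_downmix   = np.array([0.5, -0.2, 0.8])
arbitrary_downmix = np.array([0.4, -0.1, 0.9])

# The compensation information is the difference between the two,
# transmitted to the decoder as side information.
compensation = encoder_downmix - arbitrary_downmix

# Decoder side: adding the difference back converts the arbitrary
# down-mix into (an approximation of) the encoder's down-mix.
restored = arbitrary_downmix + compensation
ok = np.allclose(restored, encoder_downmix)
```

With exact differences the restoration is perfect, as here; with a quantized down-mix gain it is approximate, which is why the extension (residual) information discussed below may supplement it.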
The difference between the down-mix signal and the arbitrary down-mix signal may be a down-mix gain indicating the difference between the energy levels of the down-mix signal and the arbitrary down-mix signal.
The down-mix gain may be determined for each frequency band, for each time/time slot, and/or for each channel. For example, one part of the down-mix gain may be determined for each frequency band, while another part of the down-mix gain is determined for each time slot.
The down-mix gain may be determined for each parameter band or for each frequency band optimized for the arbitrary down-mix signal. A parameter band is a frequency interval to which spatial information of a parameter type is applied.
The difference between the energy levels of the down-mix signal and the arbitrary down-mix signal may be quantized. The resolution of the quantization levels used to quantize this energy-level difference may be the same as, or different from, the resolution of the quantization levels used to quantize a CLD between the down-mix signal and the arbitrary down-mix signal. In addition, the quantization of the energy-level difference may involve using all or some of the quantization levels used to quantize the CLD between the down-mix signal and the arbitrary down-mix signal.
Since the difference between the energy levels of the down-mix signal and the arbitrary down-mix signal is generally smaller than the CLD between the down-mix signal and the arbitrary down-mix signal, the quantization levels used to quantize the energy-level difference may have finer resolution than the quantization levels used to quantize the CLD.
The compensation information for compensating for the arbitrary down-mix signal may be extension information that includes residual information specifying components of the input multi-channel signal that cannot be recovered using the arbitrary down-mix signal or the down-mix gain. The decoding apparatus can recover such components of the input multi-channel signal using the extension information, thereby recovering a signal that is hardly distinguishable from the original input multi-channel signal.
The method that generates extend information is as follows.
Multi-channel encoder 801 can generate the information relevant with the component that reduces the input multi-channel signal that audio signal lacks as the first extend information.Decoding device can recover hardly the signal that can distinguish with original input multi-channel signal by the first extend information being applied to utilize reduction audio signal and basic spatial information to generate multi-channel signal.
Perhaps, multi-channel encoder 801 can utilize reduction audio signal and fundamental space information to recover multi-channel signal, and the difference of multi-channel signal that generates the multi-channel signal that recovers and original input is as the first extend information.
Comparing unit 803 can generate the component of the reduction audio signal that lacks with any reduction audio signal---namely can not utilize the component of the reduction audio signal of reduction audio mixing gain compensation---, and relevant information is as the second extend information.Decoding device can utilize any reduction audio signal and the second extend information to recover almost the signal that can not distinguish with the reduction audio signal.
In addition to the above methods, extend information also can utilize various residual error interpretation methods to generate.
Both a down-mix gain and the extension information can be used as compensation information. More specifically, a down-mix gain and extension information may be obtained for the entire frequency band of the down-mix signal and used together as the compensation information. Alternatively, a down-mix gain may be used as the compensation information for one portion of the frequency band of the down-mix signal, and the extension information may be used as the compensation information for another portion of the frequency band of the down-mix signal. For example, the extension information may be used as the compensation information for a low-frequency band of the down-mix signal, and a down-mix gain may be used as the compensation information for a high-frequency band of the down-mix signal.
Extension information regarding portions of the down-mix signal other than its low-frequency band that can considerably affect sound quality, such as peaks or notches, may also be used as the compensation information.
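The band-split use of compensation information described above can be illustrated with a small sketch. This is a toy model under assumed conventions: a frequency-domain signal as a list of bin values, a hypothetical crossover index separating the low band (compensated additively with residual extension values) from the high band (compensated multiplicatively with a down-mix gain); none of these values come from the patent.

```python
# Illustrative sketch: below an assumed crossover bin index, residual
# (extension information) values are added back; at and above it, a scalar
# down-mix gain is applied. All values here are illustrative assumptions.

def compensate_by_band(bins, residual_low, gain_high, crossover):
    """Compensate low band with residuals and high band with a gain."""
    out = []
    for i, b in enumerate(bins):
        if i < crossover:
            out.append(b + residual_low[i])  # extension information (residual)
        else:
            out.append(b * gain_high)        # down-mix gain
    return out
```

The same structure applies when the roles are swapped, e.g. extension information on peaks or notches outside the low band.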
Spatial information synthesis unit 802 synthesizes the basic spatial information (for example, CLD, CPC, ICC, and CTD) and the compensation information, thereby generating spatial information. In other words, the spatial information transmitted to a decoding apparatus may include the basic spatial information, down-mix gains, and the first and second extension information.
The spatial information may be included in a bitstream together with the arbitrary down-mix signal, and the bitstream may be transmitted to a decoding apparatus.
The extension information and the arbitrary down-mix signal may be encoded using an audio encoding method such as the AAC method, the MP3 method, or the BSAC method. The extension information and the arbitrary down-mix signal may be encoded using the same audio encoding method or different audio encoding methods.
If the extension information and the arbitrary down-mix signal are encoded using the same audio encoding method, a decoding apparatus can decode both the extension information and the arbitrary down-mix signal using a single audio decoding method. In this case, since the arbitrary down-mix signal can always be decoded, the extension information can also always be decoded. However, since the arbitrary down-mix signal is generally input to a decoding apparatus as a pulse code modulation (PCM) signal, the type of audio codec used to encode the arbitrary down-mix signal may not be readily identified, and therefore the type of audio codec used to encode the extension information may not be readily identified either.
Therefore, audio codec information regarding the type of audio codec used to encode the arbitrary down-mix signal and the extension information may be inserted into the bitstream.
More specifically, the audio codec information may be inserted into a specific configuration header field of the bitstream. In this case, a decoding apparatus can extract the audio codec information from the specific configuration header field of the bitstream, and decode the arbitrary down-mix signal and the extension information using the extracted audio codec information.
On the other hand, if the arbitrary down-mix signal and the extension information are encoded using different encoding methods, the extension information may not be decodable. In this case, since the end of the extension information cannot be identified, no further decoding operation can be performed.
To address this problem, audio codec information regarding the types of audio codecs respectively used to encode the arbitrary down-mix signal and the extension information may be inserted into the specific configuration header field of the bitstream. Then, a decoding apparatus can read the audio codec information from the specific configuration header field of the bitstream and decode the extension information using the read information. If the decoding apparatus does not include a decoding unit capable of decoding the extension information, the extension information may not be decoded further, and the information immediately following the extension information can be read instead.
Audio codec information regarding the type of audio codec used to encode the extension information may be represented by a syntax element included in the specific configuration header field of the bitstream. For example, the audio codec information may be represented by a 4-bit syntax element bsResidualCodecType, as indicated in Table 1 below.
Table 1
[Table 1: mapping of bsResidualCodecType values to audio codec types]
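Reading a 4-bit syntax element like bsResidualCodecType from a configuration header can be sketched as below. The value-to-codec mapping used here is an illustrative assumption (the actual mapping is defined by Table 1), and the BitReader helper is a generic MSB-first bit cursor, not part of the patent.

```python
# Hypothetical sketch of parsing the 4-bit bsResidualCodecType element.
# CODEC_TABLE is an assumed mapping for illustration only.

CODEC_TABLE = {0: "AAC", 1: "MP3", 2: "BSAC"}  # assumed values

class BitReader:
    """Minimal MSB-first bit reader over a byte string."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0  # pos counts bits

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def parse_residual_codec_type(reader: BitReader) -> str:
    code = reader.read_bits(4)  # bsResidualCodecType is 4 bits wide
    return CODEC_TABLE.get(code, "reserved")
```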
The extension information may include not only residual information but also channel extension information. The channel extension information is information necessary for expanding a multi-channel signal, obtained by decoding using the spatial information, into a multi-channel signal with more channels. For example, the channel extension information may be information necessary for expanding a 5.1- or 7.1-channel signal into a 9.1-channel signal.
The extension information may be included in a bitstream, and the bitstream may be transmitted to a decoding apparatus. Then, the decoding apparatus can compensate the down-mix signal, or expand a multi-channel signal, using the extension information. However, the decoding apparatus may skip the extension information instead of extracting it from the bitstream. For example, when generating a multi-channel signal using a 3D down-mix signal included in the bitstream, or when generating a 3D down-mix signal using a down-mix signal included in the bitstream, the decoding apparatus may skip the extension information.
The method of skipping the extension information included in the bitstream may be the same as one of the field skipping methods described above with reference to Figure 10.
For example, the extension information may be skipped using at least one of: bit size information attached at the beginning of the bitstream containing the extension information and indicating the bit size of the extension information; a syncword attached at the beginning or end of the field containing the extension information; and fixed bit size information indicating a fixed bit size of the extension information. The bit size information, the syncword, and the fixed bit size information may all be included in the bitstream. The fixed bit size information may also be stored in the decoding apparatus.
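The first of those skip methods, skipping by explicit bit-size information, can be sketched as follows. The 16-bit width of the size field and the BitCursor helper are illustrative assumptions; the patent only states that a bit-size field precedes the extension information.

```python
# Minimal sketch of skipping an extension-information field, assuming the
# field is preceded by a 16-bit length (in bits). Width is an assumption.

class BitCursor:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0  # pos counts bits

    def read_bits(self, n: int) -> int:
        v = 0
        for _ in range(n):
            v = (v << 1) | ((self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return v

def skip_extension_by_size(cur: BitCursor, size_field_bits: int = 16) -> int:
    """Read the bit-size header, then advance past the payload it describes."""
    payload_bits = cur.read_bits(size_field_bits)  # bit size information
    cur.pos += payload_bits                        # skip without decoding
    return payload_bits
```

A syncword-based skip would instead scan forward for the marker pattern; a fixed-size skip would advance by a constant stored in the decoder.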
Referring to Figure 12, decoding unit 810 includes down-mix compensating unit 811, 3D rendering unit 815, and multi-channel decoder 816.
Down-mix compensating unit 811 compensates the arbitrary down-mix signal using the compensation information included in the spatial information, for example, a down-mix gain or the extension information.
3D rendering unit 815 generates a decoder 3D down-mix signal by performing a 3D rendering operation on the compensated down-mix signal. Multi-channel decoder 816 generates a 3D multi-channel signal using the compensated down-mix signal and the basic spatial information included in the spatial information.
Down-mix compensating unit 811 may compensate the arbitrary down-mix signal in the following manner.
If the compensation information is a down-mix gain, down-mix compensating unit 811 compensates the energy level of the arbitrary down-mix signal using the down-mix gain, so that the arbitrary down-mix signal can be converted into a signal similar to the down-mix signal.
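The down-mix-gain case can be sketched directly. This is a minimal model assuming the gain is derived from the ratio of the two signals' energy levels, one scalar per band; the helper names are illustrative.

```python
# Sketch of energy-level compensation with a down-mix gain. The gain
# derivation (sqrt of energy ratio) is an assumption for illustration.

def downmix_gain(reference, arbitrary):
    """Encoder side: gain relating the two signals' energy levels."""
    e_ref = sum(s * s for s in reference)
    e_arb = sum(s * s for s in arbitrary)
    return (e_ref / e_arb) ** 0.5

def apply_downmix_gain(samples, gain):
    """Decoder side: scale the arbitrary down-mix toward the reference level."""
    return [s * gain for s in samples]
```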
If the compensation information is the second extension information, down-mix compensating unit 811 can compensate for the components missing from the arbitrary down-mix signal using the second extension information.
Multi-channel decoder 816 can generate a multi-channel signal by sequentially applying a pre-matrix M1, a mixing matrix M2, and a post-matrix M3 to the down-mix signal. In this case, the second extension information can be used to compensate the down-mix signal while the mixing matrix M2 is being applied to the down-mix signal. In other words, the second extension information can be used to compensate a down-mix signal to which the pre-matrix M1 has already been applied.
As described above, each of a plurality of channels can be selectively compensated by applying the extension information during the generation of a multi-channel signal. For example, if the extension information is applied to the center channel of the mixing matrix M2, the left- and right-channel components of the down-mix signal can be compensated by the extension information. If the extension information is applied to the left channel of the mixing matrix M2, the left-channel component of the down-mix signal can be compensated by the extension information.
Both a down-mix gain and the extension information can be used as the compensation information. For example, a low-frequency band of the arbitrary down-mix signal can be compensated using the extension information, and a high-frequency band of the arbitrary down-mix signal can be compensated using a down-mix gain. In addition, portions of the arbitrary down-mix signal other than its low-frequency band that can considerably affect sound quality, such as peaks or notches, can also be compensated using the extension information. Information regarding the portions to be compensated by the extension information can be included in the bitstream. Information indicating whether the down-mix signal included in the bitstream is an arbitrary down-mix signal, and information indicating whether the bitstream includes the compensation information, can also be included in the bitstream.
To prevent clipping of the down-mix signal generated by encoding unit 800, the down-mix signal can be divided by a predetermined gain. The predetermined gain may have a static value or a dynamic value.
Down-mix compensating unit 811 can restore the original down-mix signal by compensating the down-mix signal, which has been attenuated to prevent clipping, using the predetermined gain.
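The clipping-prevention scheme is a divide-then-multiply pair. The sketch below assumes a static gain derived from the signal's peak against a full-scale limit of 1.0; the derivation is illustrative, since the patent leaves the gain's value open.

```python
# Sketch of clipping prevention: encoder divides by a gain so no sample
# exceeds full scale; decoder multiplies the gain back to restore the level.
# FULL_SCALE and the peak-based gain derivation are assumptions.

FULL_SCALE = 1.0

def attenuate(samples):
    """Encoder side: scale down so the peak fits within full scale."""
    peak = max(abs(s) for s in samples)
    gain = max(peak / FULL_SCALE, 1.0)  # only attenuate, never amplify
    return [s / gain for s in samples], gain

def restore(samples, gain):
    """Decoder side: undo the attenuation with the known gain."""
    return [s * gain for s in samples]
```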
The arbitrary down-mix signal compensated by down-mix compensating unit 811 can be readily reproduced. Alternatively, an arbitrary down-mix signal yet to be compensated can be input to 3D rendering unit 815 and converted into a decoder 3D down-mix signal by 3D rendering unit 815.
Referring to Figure 12, down-mix compensating unit 811 includes a first domain converter 812, a compensation processor 813, and a second domain converter 814.
The first domain converter 812 converts the domain of the arbitrary down-mix signal into a predetermined domain. The compensation processor 813 compensates the arbitrary down-mix signal in the predetermined domain using the compensation information, for example, a down-mix gain or the extension information.
The compensation of the arbitrary down-mix signal can be performed in the QMF/hybrid domain. For this, the first domain converter 812 can perform QMF/hybrid analysis on the arbitrary down-mix signal. The first domain converter 812 may also convert the domain of the arbitrary down-mix signal into a domain other than the QMF/hybrid domain, for example, a frequency domain such as the DFT or FFT domain. The compensation of the arbitrary down-mix signal can likewise be performed in a domain other than the QMF/hybrid domain, for example, a frequency domain or the time domain.
The second domain converter 814 converts the domain of the compensated arbitrary down-mix signal back into the same domain as the original arbitrary down-mix signal. More specifically, the second domain converter 814 does so by inversely performing the domain conversion operation performed by the first domain converter 812.
For example, the second domain converter 814 can convert the compensated arbitrary down-mix signal into a time-domain signal by performing QMF/hybrid synthesis on the compensated arbitrary down-mix signal. Likewise, the second domain converter 814 can perform an IDFT or IFFT on the compensated arbitrary down-mix signal.
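The analyze-compensate-synthesize chain can be sketched with a plain DFT/IDFT pair standing in for the first and second domain converters (the patent equally allows QMF/hybrid analysis and synthesis). The per-bin gain vector is an illustrative stand-in for the compensation information.

```python
# Sketch of the domain-converter chain: forward DFT (first domain converter),
# per-bin compensation (compensation processor), inverse DFT (second domain
# converter). Naive O(N^2) transforms for clarity; gains are assumptions.
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def compensate_in_frequency_domain(x, bin_gains):
    X = dft(x)                                 # first domain converter
    Y = [g * b for g, b in zip(bin_gains, X)]  # compensation processor
    return idft(Y)                             # second domain converter
```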
Similarly to 3D rendering unit 710 shown in Figure 7, 3D rendering unit 815 can perform the 3D rendering operation on the compensated arbitrary down-mix signal in a frequency domain, the QMF/hybrid domain, or the time domain. For this, 3D rendering unit 815 may include a domain converter (not shown). The domain converter converts the domain of the compensated arbitrary down-mix signal into the domain in which the 3D rendering operation is to be performed, or converts the domain of a signal obtained by the 3D rendering operation.
The domain in which compensation processor 813 compensates the arbitrary down-mix signal may be the same as, or different from, the domain in which 3D rendering unit 815 performs the 3D rendering operation on the compensated arbitrary down-mix signal.
Figure 13 is a block diagram of a down-mix compensation/3D rendering unit 820 according to an embodiment of the present invention. Referring to Figure 13, down-mix compensation/3D rendering unit 820 includes a first domain converter 821, a second domain converter 822, a compensation/3D rendering processor 823, and a third domain converter 824.
Down-mix compensation/3D rendering unit 820 can perform both a compensation operation and a 3D rendering operation on the arbitrary down-mix signal in a single domain, thereby reducing the amount of computation of a decoding apparatus.
More specifically, the first domain converter 821 converts the domain of the arbitrary down-mix signal into a first domain in which the compensation operation and the 3D rendering operation are to be performed. The second domain converter 822 converts the spatial information, which includes the basic spatial information necessary for generating a multi-channel signal and the compensation information necessary for compensating the arbitrary down-mix signal, so that the spatial information becomes applicable in the first domain. The compensation information may include at least one of a down-mix gain and the extension information.
For example, the second domain converter 822 can map the compensation information corresponding to a parameter band in the QMF/hybrid domain onto the frequency domain, so that the compensation information becomes readily applicable in the frequency domain.
The first domain may be a frequency domain such as the DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those stated herein.
A time delay may occur during the conversion of the compensation information. To address this problem, the second domain converter 822 can perform a delay compensation operation so that the time delay between the domain of the compensation information and the first domain can be compensated for.
The compensation/3D rendering processor 823 performs the compensation operation on the arbitrary down-mix signal in the first domain using the converted spatial information, and then performs the 3D rendering operation on the signal obtained by the compensation operation. The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation in an order different from that stated herein.
The compensation/3D rendering processor 823 can also perform the compensation operation and the 3D rendering operation on the arbitrary down-mix signal simultaneously. For example, the compensation/3D rendering processor 823 can generate a compensated 3D down-mix signal by performing a 3D rendering operation on the arbitrary down-mix signal in the first domain using a new filter coefficient, which is a combination of the compensation information and an existing filter coefficient ordinarily used in the 3D rendering operation.
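Why the combined filter works: in a frequency-domain bin, applying a compensation gain and then a rendering coefficient equals a single multiplication by their product. The sketch below demonstrates that equivalence; all coefficient values are illustrative assumptions (the real rendering coefficients would be, e.g., HRTF-derived).

```python
# Sketch of folding compensation into the rendering filter: two passes of
# per-bin multiplication equal one pass with the combined coefficient.
# All numeric values are illustrative assumptions.

def render_two_pass(bins, comp, render):
    compensated = [b * c for b, c in zip(bins, comp)]
    return [b * h for b, h in zip(compensated, render)]

def render_one_pass(bins, comp, render):
    combined = [c * h for c, h in zip(comp, render)]  # new filter coefficient
    return [b * k for b, k in zip(bins, combined)]
```

The one-pass form halves the per-bin multiplications, which matches the stated aim of reducing the decoder's computation.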
The third domain converter 824 converts the domain of the 3D down-mix signal generated by the compensation/3D rendering processor 823 into the frequency domain.
Figure 14 is a block diagram of a decoding apparatus 900 for processing a compatible down-mix signal according to an embodiment of the present invention. Referring to Figure 14, decoding apparatus 900 includes a first multi-channel decoder 910, a down-mix compatibility processing unit 920, a second multi-channel decoder 930, and a 3D rendering unit 940. A detailed description of decoding processes identical to those of the embodiment of Figure 1 will be omitted.
A compatible down-mix signal is a down-mix signal that can be decoded by two or more multi-channel decoders. In other words, a compatible down-mix signal is a down-mix signal initially optimized for a predetermined multi-channel decoder that can then be converted, by a compatibility processing operation, into a signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder.
Referring to Figure 14, assume that the input compatible down-mix signal is optimized for the first multi-channel decoder 910. For the second multi-channel decoder 930 to decode the input compatible down-mix signal, the down-mix compatibility processing unit 920 can perform a compatibility processing operation on the input compatible down-mix signal, so that the input compatible down-mix signal can be converted into a signal optimized for the second multi-channel decoder 930. The first multi-channel decoder 910 generates a first multi-channel signal by decoding the input compatible down-mix signal. The first multi-channel decoder 910 can generate a multi-channel signal by decoding using only the input compatible down-mix signal, without requiring spatial information.
The second multi-channel decoder 930 generates a second multi-channel signal using the down-mix signal obtained by the compatibility processing operation performed by the down-mix compatibility processing unit 920. The 3D rendering unit 940 can generate a decoder 3D down-mix signal by performing a 3D rendering operation on the down-mix signal obtained by the compatibility processing operation performed by the down-mix compatibility processing unit 920.
A compatible down-mix signal optimized for a predetermined multi-channel decoder can be converted into a down-mix signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder using compatibility information such as an inverse matrix. For example, when there are first and second multi-channel encoders using different encoding methods and first and second multi-channel decoders using different decoding methods, an encoding apparatus can apply a matrix to a down-mix signal generated by the first multi-channel encoder, thereby generating a compatible down-mix signal optimized for the second multi-channel decoder. Then, a decoding apparatus can apply an inverse matrix to the compatible down-mix signal generated by the encoding apparatus, thereby generating a compatible down-mix signal optimized for the first multi-channel decoder.
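The matrix/inverse-matrix round trip can be sketched for the stereo case. The 2x2 compatibility matrix below is an illustrative assumption, not a value from the patent; the point is only that applying the inverse on the decoder side recovers the signal optimized for the original decoder.

```python
# Sketch of compatibility processing: encoder applies M to a stereo
# down-mix, decoder applies M's inverse. M's values are assumptions.

def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

M = [[1.0, 0.5],
     [0.5, 1.0]]  # assumed compatibility matrix (must be invertible)
```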
With reference to Figure 14, the compatible processing unit 920 of reduction audio mixing can utilize inverse matrix to carry out the compatible operation of processing to the compatibility reduction audio signal of input, thereby generates the reduction audio signal of optimizing for the second multi-channel decoder 930.
The relevant information of the inverse matrix of using with the compatible processing unit 920 of reduction audio mixing can be stored in decoding device 900 in advance, maybe can be included in the bit stream that code device transmits.In addition, to be included in reduction audio signal in incoming bit stream be to reduce arbitrarily audio signal or the information of compatible reduction audio signal can be included in incoming bit stream in indication.
With reference to Figure 14, the compatible processing unit 920 of reduction audio mixing comprises the first territory converter 921, compatible processor 922 and the second territory converter 923.
The territory of the compatibility reduction audio signal that the first territory converter 921 will be inputted converts predetermined domain to, and compatible processor 922 utilizes the compatibility information such as inverse matrix to carry out the compatible operation of processing, and makes the compatible reduction of the input audio signal in predetermined domain can be converted into the signal of optimizing for the second multi-channel decoder 930.
Compatible processor 922 can be carried out the compatible operation of processing in the QMF/ hybrid domain.For this reason, the first territory converter 921 can be carried out the QMF/ hybrid analysis to the compatibility reduction audio signal of input.Equally, the first territory converter 921 can convert the territory of the compatibility reduction audio signal of input to the territory except the QMF/ hybrid domain, for example, frequency domain such as DFT or FFT territory, and compatible processor 922 can be in the territory except the QMF/ hybrid domain---as carrying out the compatible operation of processing in frequency domain or time domain.
The territory that operates the compatibility reduction audio signal of obtaining is processed in the second territory converter 923 conversions by compatibility.More specifically, the second territory converter 923 can convert the territory identical with the compatible reduction of original input audio signal to process the territory that operates the compatibility reduction audio signal of obtaining by compatibility by oppositely carrying out by the performed territory conversion operations of the first territory converter 921.
For example, the second territory converter 923 can be by carrying out synthetic will the processing by compatibility of QMF/ hybrid domain and operate the compatibility of obtaining and reduce audio signal and convert time-domain signal to being processed compatibility reduction audio signal that operation obtains by compatibility.Perhaps, the second territory converter 923 can be carried out IDFT or IFFT to the compatibility reduction audio signal of being obtained by the compatible processing operation.
3D rendering unit 940 can be played up operation in frequency domain, QMF/ hybrid domain or time domain, compatibility reduction audio signal execution 3D that obtained by compatibility processing operation.For this reason, this 3D rendering unit 940 can comprise territory converter (not shown).The territory of the compatibility reduction audio signal that the territory converter will be inputted converts to wherein will carry out the territory that 3D plays up operation, or the territory that operates the signal that obtains is played up in conversion by 3D.
Wherein compatible processor 922 is carried out compatible territory of processing operation and can be carried out 3D with 3D rendering unit 940 wherein to play up the territory of operation identical or different.
Figure 15 is a block diagram of a down-mix compatibility processing/3D rendering unit 950 according to an embodiment of the present invention. Referring to Figure 15, the down-mix compatibility processing/3D rendering unit 950 includes a first domain converter 951, a second domain converter 952, a compatibility/3D rendering processor 953, and a third domain converter 954.
The down-mix compatibility processing/3D rendering unit 950 performs the compatibility processing operation and the 3D rendering operation in a single domain, thereby reducing the amount of computation of a decoding apparatus.
The first domain converter 951 converts the input compatible down-mix signal into a first domain in which the compatibility processing operation and the 3D rendering operation are to be performed. The second domain converter 952 converts the spatial information and the compatibility information, for example an inverse matrix, so that the spatial information and the compatibility information become applicable in the first domain.
For example, the second domain converter 952 can map an inverse matrix corresponding to a parameter band in the QMF/hybrid domain onto the frequency domain, so that the inverse matrix becomes readily applicable in the frequency domain.
The first domain may be a frequency domain such as the DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those stated herein.
A time delay may occur during the conversion of the spatial information and the compatibility information.
To address this problem, the second domain converter 952 can perform a delay compensation operation so that the time delay between the domains of the spatial information and the compatibility information and the first domain can be compensated for.
The compatibility/3D rendering processor 953 performs the compatibility processing operation on the input compatible down-mix signal in the first domain using the converted compatibility information, and then performs a 3D rendering operation on the compatible down-mix signal obtained by the compatibility processing operation. The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation in an order different from that stated herein.
The compatibility/3D rendering processor 953 can also perform the compatibility processing operation and the 3D rendering operation on the input compatible down-mix signal simultaneously. For example, the compatibility/3D rendering processor 953 can generate a 3D down-mix signal by performing a 3D rendering operation on the input compatible down-mix signal in the first domain using a new filter coefficient, which is a combination of the compatibility information and an existing filter coefficient ordinarily used in the 3D rendering operation.
The third domain converter 954 converts the domain of the 3D down-mix signal generated by the compatibility/3D rendering processor 953 into the frequency domain.
Figure 16 is a block diagram of a decoding apparatus for canceling crosstalk according to an embodiment of the present invention. Referring to Figure 16, the decoding apparatus includes a bit unpacking unit 960, a down-mix decoder 970, a 3D rendering unit 980, and a crosstalk cancellation unit 990. A detailed description of decoding processes identical to those of the embodiment of Figure 1 will be omitted.
A 3D down-mix signal output by 3D rendering unit 980 can be reproduced by headphones. However, when the 3D down-mix signal is reproduced by speakers distant from the user, inter-channel crosstalk is likely to occur.
Therefore, the decoding apparatus may include the crosstalk cancellation unit 990, which performs a crosstalk cancellation operation on the 3D down-mix signal.
The decoding apparatus may also perform a sound field processing operation.
Sound field information used in the sound field processing operation, that is, information identifying the space in which the 3D down-mix signal is to be reproduced, may be included in the input bitstream transmitted by an encoding apparatus, or may be selected by the decoding apparatus.
The input bitstream may include reverberation time information. A filter used in the sound field processing operation can be controlled according to the reverberation time information.
The sound field processing operation can be performed differently for an early part and a late reverberation part. For example, the early part can be processed using an FIR filter, and the late reverberation part can be processed using an IIR filter.
More specifically, the sound field processing operation can be performed on the early part by performing a convolution operation in the time domain using the FIR filter, or by performing a multiplication operation in the frequency domain and converting the result of the multiplication operation into the time domain. The sound field processing operation can be performed on the late reverberation part in the time domain.
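The two sound-field paths can be sketched as a direct FIR convolution for the early part and a simple recursion for the late reverberation part. The filter coefficients and the one-pole IIR form are illustrative assumptions; a real reverberator would use impulse responses shaped by the reverberation time information.

```python
# Sketch of the split sound-field processing: FIR convolution for early
# reflections, one-pole IIR recursion for late reverberation. Coefficients
# are illustrative assumptions.

def fir_early(x, h):
    """Direct time-domain convolution (early part)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def iir_late(x, feedback=0.5):
    """One-pole recursion y[n] = x[n] + feedback * y[n-1] (late part)."""
    y, prev = [], 0.0
    for xn in x:
        prev = xn + feedback * prev
        y.append(prev)
    return y
```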
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage, and a carrier wave (e.g., data transmission through the Internet). The computer-readable recording medium can be distributed over a plurality of computer systems connected to a network so that computer-readable code is written thereto and executed therefrom in a decentralized manner. Functional programs, code, and code segments needed for realizing the present invention can be easily construed by one of ordinary skill in the art.
As described above, according to the present invention, it is possible to efficiently encode multi-channel signals with 3D effects and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.
Industrial applicability
Other realizations are within the scope of the following claims. For example, grouping, data coding, and entropy coding according to the present invention can be applied to various applications and various products. A storage medium storing data to which an aspect of the present invention is applied is within the scope of the present invention.

Claims (10)

1. A method of decoding a signal from an input bitstream, comprising:
extracting a down-mix signal from the input bitstream;
extracting spatial information and down-mix information from the input bitstream, the down-mix information indicating whether the down-mix signal is an arbitrary down-mix signal;
if the down-mix information indicates that the down-mix signal is an arbitrary down-mix signal, extracting compensation information necessary for compensating the arbitrary down-mix signal from the input bitstream, and compensating the arbitrary down-mix signal using the compensation information; and
generating a three-dimensional (3D) down-mix signal by multiplying the compensated arbitrary down-mix signal, the spatial information, and a filter coefficient in a frequency domain,
wherein:
the arbitrary down-mix signal is a down-mix signal other than a down-mix signal generated by an encoding apparatus,
the spatial information is determined when a multi-channel signal is down-mixed into the down-mix signal generated by the encoding apparatus, and
the compensation information comprises information regarding a difference between the down-mix signal generated by the encoding apparatus and the arbitrary down-mix signal.
2. the method for claim 1, is characterized in that, described compensated information comprises the reduction audio mixing gain information relevant with the ratio of the energy level of the energy level of the reduction audio signal that is generated by code device and described any reduction audio signal.
3. the method for claim 1, is characterized in that, described compensated information with from quantize about the different resolution of the spatial information of a plurality of sound channels.
4. the method for claim 1, is characterized in that, the described compensation of described any reduction audio signal is comprised:
Described any reduction audio signal is converted to the second territory from the first territory;
Utilize the described any reduction audio signal in described compensated information compensation described the second territory; And
Described any reduction audio signal through compensation is converted to described the first territory from described the second territory.
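The three-step pattern of claim 4 (convert, compensate, convert back) can be sketched with the time domain as the first domain and the frequency domain as the second; this choice of domains and the per-bin gain representation are illustrative assumptions, not limitations of the claim:

```python
import numpy as np

def compensate_in_second_domain(signal, per_bin_gains):
    """Claim-4-style sketch: convert the arbitrary downmix from a first
    (time) domain to a second (frequency) domain, compensate there with
    the compensation information, and convert back."""
    spectrum = np.fft.rfft(signal)                 # first -> second domain
    spectrum = spectrum * per_bin_gains            # compensate in second domain
    return np.fft.irfft(spectrum, n=len(signal))   # second -> first domain

sig = np.linspace(-1.0, 1.0, 256)
gains = np.full(len(np.fft.rfft(sig)), 2.0)        # uniform compensation gain
out = compensate_in_second_domain(sig, gains)
```

Because the transform is linear, a uniform gain of 2 in the second domain simply doubles the signal in the first domain, which makes the round trip easy to verify.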
5. the method for claim 1, is characterized in that, described generation 3D reduction audio signal comprises:
Described any reduction audio signal through compensation is converted to will carry out the 3D that 3D plays up operation and play up the territory; And
In described 3D plays up the territory, described any reduction audio signal through compensation is carried out 3D and play up operation.
6. An apparatus for decoding a signal from an input bitstream, comprising:
A bit unpacking unit which extracts a downmix signal, spatial information, and downmix information from the input bitstream, the downmix information indicating whether the downmix signal is an arbitrary downmix signal, and which, if the downmix information indicates that the downmix signal is an arbitrary downmix signal, extracts compensation information necessary for compensating the arbitrary downmix signal from the input bitstream;
A downmix compensation unit which compensates the arbitrary downmix signal using the compensation information; and
A 3D rendering unit which generates a 3D downmix signal by multiplying the compensated arbitrary downmix signal, the spatial information, and filter coefficients in a frequency domain,
wherein:
The arbitrary downmix signal is a downmix signal other than a downmix signal generated by an encoding apparatus,
The spatial information is determined when a multi-channel signal is downmixed into the downmix signal generated by the encoding apparatus, and
The compensation information comprises information regarding a difference between the downmix signal generated by the encoding apparatus and the arbitrary downmix signal.
7. The apparatus of claim 6, wherein the compensation information comprises downmix gain information regarding a ratio of an energy level of the downmix signal generated by the encoding apparatus to an energy level of the arbitrary downmix signal.
8. The apparatus of claim 6, wherein the compensation information is quantized with a resolution different from that of the spatial information regarding a plurality of channels.
9. The apparatus of claim 6, wherein the downmix compensation unit comprises:
A first domain conversion unit which converts the arbitrary downmix signal from a first domain to a second domain;
A compensation processor which compensates the arbitrary downmix signal in the second domain using the compensation information; and
A second domain conversion unit which converts the compensated arbitrary downmix signal from the second domain back to the first domain.
10. The apparatus of claim 6, wherein the 3D rendering unit converts the compensated arbitrary downmix signal from a third domain to a fourth domain, performs a 3D rendering operation on the compensated arbitrary downmix signal in the fourth domain, and converts the signal obtained by the 3D rendering operation from the fourth domain back to the third domain.
CN2007800045087A 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal Active CN101379552B (en)

Applications Claiming Priority (17)

Application Number Priority Date Filing Date Title
US76574706P 2006-02-07 2006-02-07
US60/765,747 2006-02-07
US77147106P 2006-02-09 2006-02-09
US60/771,471 2006-02-09
US77333706P 2006-02-15 2006-02-15
US60/773,337 2006-02-15
US77577506P 2006-02-23 2006-02-23
US60/775,775 2006-02-23
US78175006P 2006-03-14 2006-03-14
US60/781,750 2006-03-14
US78251906P 2006-03-16 2006-03-16
US60/782,519 2006-03-16
US79232906P 2006-04-17 2006-04-17
US60/792,329 2006-04-17
US79365306P 2006-04-21 2006-04-21
US60/793,653 2006-04-21
PCT/KR2007/000676 WO2007091849A1 (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Publications (2)

Publication Number Publication Date
CN101379552A CN101379552A (en) 2009-03-04
CN101379552B true CN101379552B (en) 2013-06-19

Family

ID=40422032

Family Applications (7)

Application Number Title Priority Date Filing Date
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045157A Active CN101385076B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications Before (4)

Application Number Title Priority Date Filing Date
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045157A Active CN101385076B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Country Status (1)

Country Link
CN (7) CN101379555B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2452569T3 (en) * 2009-04-08 2014-04-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device, procedure and computer program for mixing upstream audio signal with downstream mixing using phase value smoothing
JP2011217139A (en) * 2010-03-31 2011-10-27 Sony Corp Signal processing device and method, and program
CN102884789B (en) * 2010-05-11 2017-04-12 瑞典爱立信有限公司 Video signal compression coding
WO2014013294A1 (en) * 2012-07-19 2014-01-23 Nokia Corporation Stereo audio signal encoder
EP2757559A1 (en) * 2013-01-22 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
AU2014244722C1 (en) * 2013-03-29 2017-03-02 Samsung Electronics Co., Ltd. Audio apparatus and audio providing method thereof
US9854377B2 (en) 2013-05-29 2017-12-26 Qualcomm Incorporated Interpolation for decomposed representations of a sound field
EP3062534B1 (en) * 2013-10-22 2021-03-03 Electronics and Telecommunications Research Institute Method for generating filter for audio signal and parameterizing device therefor
US9922656B2 (en) * 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
CN106415712B (en) * 2014-05-30 2019-11-15 高通股份有限公司 Device and method for rendering high-order ambiophony coefficient
US10140996B2 (en) * 2014-10-10 2018-11-27 Qualcomm Incorporated Signaling layers for scalable coding of higher order ambisonic audio data
AU2016312404B2 (en) * 2015-08-25 2020-11-26 Dolby International Ab Audio decoder and decoding method
US10074373B2 (en) * 2015-12-21 2018-09-11 Qualcomm Incorporated Channel adjustment for inter-frame temporal shift variations
CN108039175B (en) * 2018-01-29 2021-03-26 北京百度网讯科技有限公司 Voice recognition method and device and server
CN113035209B (en) * 2021-02-25 2023-07-04 北京达佳互联信息技术有限公司 Three-dimensional audio acquisition method and three-dimensional audio acquisition device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69523643T2 (en) * 1994-02-25 2002-05-16 Henrik Moller Binaural synthesis, head-related transfer function, and their use
AU751900B2 (en) * 1998-03-25 2002-08-29 Lake Technology Limited Audio signal processing method and apparatus
DE19847689B4 (en) * 1998-10-15 2013-07-11 Samsung Electronics Co., Ltd. Apparatus and method for three-dimensional sound reproduction
US6574339B1 (en) * 1998-10-20 2003-06-03 Samsung Electronics Co., Ltd. Three-dimensional sound reproducing apparatus for multiple listeners and method thereof
EP1211857A1 (en) * 2000-12-04 2002-06-05 STMicroelectronics N.V. Process and device of successive value estimations of numerical symbols, in particular for the equalization of a data communication channel of information in mobile telephony
US7583805B2 (en) * 2004-02-12 2009-09-01 Agere Systems Inc. Late reverberation-based synthesis of auditory scenes
EP1315148A1 (en) * 2001-11-17 2003-05-28 Deutsche Thomson-Brandt Gmbh Determination of the presence of ancillary data in an audio bitstream
US8340302B2 (en) * 2002-04-22 2012-12-25 Koninklijke Philips Electronics N.V. Parametric representation of spatial audio
KR100773539B1 (en) * 2004-07-14 2007-11-05 삼성전자주식회사 Multi channel audio data encoding/decoding method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. Breebaart et al., "MPEG Spatial Audio Coding / MPEG Surround: Overview and Current Status," Audio Engineering Society 119th Convention, 2005 (entire document). *

Also Published As

Publication number Publication date
CN101379555B (en) 2013-03-13
CN101379555A (en) 2009-03-04
CN101385076B (en) 2012-11-28
CN101379552A (en) 2009-03-04
CN101385076A (en) 2009-03-11
CN101379553A (en) 2009-03-04
CN101385075B (en) 2015-04-22
CN101385077B (en) 2012-04-11
CN101385077A (en) 2009-03-11
CN101379553B (en) 2012-02-29
CN101379554A (en) 2009-03-04
CN101385075A (en) 2009-03-11
CN101379554B (en) 2012-09-19

Similar Documents

Publication Publication Date Title
CN101379552B (en) Apparatus and method for encoding/decoding signal
US9626976B2 (en) Apparatus and method for encoding/decoding signal
RU2406164C2 (en) Signal coding/decoding device and method
MX2008009565A (en) Apparatus and method for encoding/decoding signal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant