CN101385076A - Apparatus and method for encoding/decoding signal - Google Patents

Apparatus and method for encoding/decoding signal

Info

Publication number
CN101385076A
Authority
CN
China
Prior art keywords
audio signal
down-mix
information
signal
channel
Prior art date
Legal status
Granted
Application number
CNA2007800045157A
Other languages
Chinese (zh)
Other versions
CN101385076B (en
Inventor
郑亮源
房熙锡
吴贤午
金东秀
林宰显
Current Assignee
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date
Filing date
Publication date
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Priority claimed from PCT/KR2007/000668 (WO2007091842A1)
Publication of CN101385076A
Application granted
Publication of CN101385076B
Legal status: Active
Anticipated expiration

Abstract

An encoding method and apparatus and a decoding method and apparatus are provided. The decoding method includes extracting a three-dimensional (3D) down-mix signal and spatial information from an input bitstream, removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal, and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal. Accordingly, it is possible to efficiently encode multi-channel signals with 3D effects and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.

Description

Apparatus and method for encoding/decoding signal
Technical field
The present invention relates to an encoding/decoding method and an encoding/decoding apparatus, and more particularly, to an encoding/decoding apparatus which can process audio signals so that three-dimensional (3D) sound effects can be produced, and to an encoding/decoding method using the encoding/decoding apparatus.
Background art
An encoding apparatus down-mixes a multi-channel signal into a signal with fewer channels and transmits the down-mixed signal to a decoding apparatus. The decoding apparatus then restores a multi-channel signal from the down-mixed signal and reproduces the restored multi-channel signal through three or more loudspeakers, for example, 5.1-channel loudspeakers.
A multi-channel signal may also be reproduced by 2-channel loudspeakers such as headphones. In this case, in order to make the user perceive the sound output by the 2-channel loudspeakers as if it were reproduced from three or more sound sources, it is necessary to develop a three-dimensional (3D) processing technique for encoding or decoding multi-channel signals so that 3D effects can be produced.
Summary of the invention
Technical problem
The present invention provides an encoding/decoding apparatus and an encoding/decoding method which can reproduce multi-channel signals in various reproduction environments by efficiently processing signals having 3D effects.
Technical solution
According to an aspect of the present invention, there is provided a decoding method of restoring a multi-channel signal, the decoding method including: extracting a three-dimensional (3D) down-mix signal and spatial information from an input bitstream; removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal; and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal.
According to another aspect of the present invention, there is provided a decoding method of restoring a multi-channel signal, the decoding method including: extracting a 3D down-mix signal and spatial information from an input bitstream; generating a multi-channel signal using the 3D down-mix signal and the spatial information; and removing 3D effects from the multi-channel signal by performing a 3D rendering operation on the multi-channel signal.
According to another aspect of the present invention, there is provided an encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method including: encoding the multi-channel signal into a down-mix signal with fewer channels; generating spatial information regarding the plurality of channels; generating a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal; and generating a bitstream including the 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided an encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method including: performing a 3D rendering operation on the multi-channel signal; encoding the multi-channel signal obtained by the 3D rendering operation into a 3D down-mix signal with fewer channels; generating spatial information regarding the plurality of channels; and generating a bitstream including the 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided a decoding apparatus for restoring a multi-channel signal, the decoding apparatus including: a bit unpacking unit which extracts an encoded 3D down-mix signal and spatial information from an input bitstream; a down-mix decoder which decodes the encoded 3D down-mix signal; a 3D rendering unit which removes 3D effects from the decoded 3D down-mix signal obtained by the decoding performed by the down-mix decoder, by performing a 3D rendering operation on the decoded 3D down-mix signal; and a multi-channel decoder which generates a multi-channel signal using the spatial information and a down-mix signal obtained by the removal performed by the 3D rendering unit.
According to another aspect of the present invention, there is provided a decoding apparatus for restoring a multi-channel signal, the decoding apparatus including: a bit unpacking unit which extracts an encoded 3D down-mix signal and spatial information from an input bitstream; a down-mix decoder which decodes the encoded 3D down-mix signal; a multi-channel decoder which generates a multi-channel signal using the spatial information and the 3D down-mix signal obtained by the decoding performed by the down-mix decoder; and a 3D rendering unit which removes 3D effects from the multi-channel signal by performing a 3D rendering operation on the multi-channel signal.
According to another aspect of the present invention, there is provided an encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus including: a multi-channel encoder which encodes the multi-channel signal into a down-mix signal with fewer channels and generates spatial information regarding the plurality of channels; a 3D rendering unit which generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal; a down-mix encoder which encodes the 3D down-mix signal; and a bit packing unit which generates a bitstream including the encoded 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided an encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus including: a 3D rendering unit which performs a 3D rendering operation on the multi-channel signal; a multi-channel encoder which encodes the multi-channel signal obtained by the 3D rendering operation into a 3D down-mix signal with fewer channels and generates spatial information regarding the plurality of channels; a down-mix encoder which encodes the 3D down-mix signal; and a bit packing unit which generates a bitstream including the encoded 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided a bitstream including: a data field which includes data regarding a 3D down-mix signal; a filter information field which includes filter information identifying a filter used to generate the 3D down-mix signal; a first header field which includes information indicating whether the filter information field includes the filter information; a second header field which includes information indicating whether the filter information field includes coefficients of the filter or coefficients of an inverse filter of the filter; and a spatial information field which includes spatial information regarding a plurality of channels.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing any of the above-described decoding methods or encoding methods.
Advantageous effects
According to the present invention, it is possible to efficiently encode multi-channel signals having 3D effects, and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.
Brief description of the drawings
Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention;
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention;
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention;
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 7 is a block diagram of a three-dimensional (3D) rendering apparatus according to an embodiment of the present invention;
Figs. 8 to 11 illustrate bitstreams according to embodiments of the present invention;
Fig. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal according to an embodiment of the present invention;
Fig. 13 is a block diagram of an arbitrary down-mix signal compensation/3D rendering unit according to an embodiment of the present invention;
Fig. 14 is a block diagram of a decoding apparatus for processing a compatible down-mix signal according to an embodiment of the present invention;
Fig. 15 is a block diagram of a down-mix compatibility processing/3D rendering unit according to an embodiment of the present invention; and
Fig. 16 is a block diagram of a decoding apparatus for canceling crosstalk according to an embodiment of the present invention.
Preferred embodiments of the present invention
The present invention will hereinafter be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention. Referring to Fig. 1, an encoding unit 100 includes a multi-channel encoder 110, a three-dimensional (3D) rendering unit 120, a down-mix encoder 130, and a bit packing unit 140.
The multi-channel encoder 110 down-mixes a multi-channel signal having a plurality of channels into a down-mix signal such as a stereo or mono signal, and generates spatial information regarding the channels of the multi-channel signal. The spatial information is needed to restore a multi-channel signal from the down-mix signal.
Examples of the spatial information include a channel level difference (CLD), which indicates the difference between the energy levels of a pair of channels, channel prediction coefficients (CPC), which are prediction coefficients used to generate a 3-channel signal based on a 2-channel signal, inter-channel correlation (ICC), which indicates the correlation between a pair of channels, and a channel time difference (CTD), which is the time interval between a pair of channels.
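As a purely illustrative sketch (not part of the described apparatus), CLD and ICC for a pair of channel signals could be computed per frame as follows; the dB formulation of CLD and the normalized cross-correlation formulation of ICC are assumptions of this example:

import numpy as np

def channel_level_difference(ch1, ch2, eps=1e-12):
    # CLD: ratio of the energies of the two channels, expressed in dB (assumed formulation).
    return 10.0 * np.log10((np.sum(ch1 ** 2) + eps) / (np.sum(ch2 ** 2) + eps))

def inter_channel_correlation(ch1, ch2, eps=1e-12):
    # ICC: normalized cross-correlation of the two channels (assumed formulation).
    return np.sum(ch1 * ch2) / (np.sqrt(np.sum(ch1 ** 2) * np.sum(ch2 ** 2)) + eps)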
The 3D rendering unit 120 generates a 3D down-mix signal based on the down-mix signal. The 3D down-mix signal may be a 2-channel signal having three or more directivities, and can therefore be reproduced with 3D effects by 2-channel loudspeakers such as headphones. In other words, the 3D down-mix signal can be reproduced by 2-channel loudspeakers in such a way that the user perceives it as if it were reproduced from a sound source having three or more channels. The direction of a sound source can be determined based on at least one of the difference between the intensities of two sounds respectively input to the two ears, the time interval between the two sounds, and the difference between the phases of the two sounds. Therefore, the 3D rendering unit 120 can convert the down-mix signal into the 3D down-mix signal based on how humans use their sense of hearing to determine the 3D position of a sound source.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a filter. In this case, filter-related information, for example, filter coefficients, may be input to the 3D rendering unit 120 from an external source. The 3D rendering unit 120 may use the spatial information provided by the multi-channel encoder 110 to generate the 3D down-mix signal based on the down-mix signal. More specifically, the 3D rendering unit 120 may convert the down-mix signal into the 3D down-mix signal by converting the down-mix signal into a virtual multi-channel signal using the spatial information and filtering the virtual multi-channel signal.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a head-related transfer function (HRTF) filter.
An HRTF is a transfer function which describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and returns a value that varies according to the direction and altitude of the sound source. If a signal having no directivity is filtered using the HRTF, the signal is heard as if it were reproduced from a certain direction.
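For illustration only, binaural rendering of a stereo down-mix with HRTF impulse responses can be sketched as four FIR convolutions, one per source-channel/ear pair; the impulse responses named here are hypothetical placeholders rather than coefficients defined by the invention:

import numpy as np

def hrtf_render_stereo(left, right, h_ll, h_lr, h_rl, h_rr):
    # Each ear receives the sum of both down-mix channels filtered with the
    # corresponding (hypothetical) HRTF impulse responses.
    out_left = np.convolve(left, h_ll) + np.convolve(right, h_rl)
    out_right = np.convolve(left, h_lr) + np.convolve(right, h_rr)
    return out_left, out_right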
The 3D rendering unit 120 may perform the 3D rendering operation in a frequency domain, for example, a discrete Fourier transform (DFT) domain or a fast Fourier transform (FFT) domain. In this case, the 3D rendering unit 120 may perform DFT or FFT before the 3D rendering operation, or may perform inverse DFT (IDFT) or inverse FFT (IFFT) after the 3D rendering operation.
The 3D rendering unit 120 may perform the 3D rendering operation in a quadrature mirror filter (QMF)/hybrid domain. In this case, the 3D rendering unit 120 may perform QMF/hybrid analysis and synthesis operations before or after the 3D rendering operation.
The 3D rendering unit 120 may also perform the 3D rendering operation in a time domain. The 3D rendering unit 120 may determine in which domain the 3D rendering operation is to be performed according to the required sound quality and the processing capability of the encoding/decoding apparatus.
The down-mix encoder 130 encodes the down-mix signal output by the multi-channel encoder 110 or the 3D down-mix signal output by the 3D rendering unit 120. The down-mix encoder 130 may encode the down-mix signal output by the multi-channel encoder 110 or the 3D down-mix signal output by the 3D rendering unit 120 using an audio coding method such as an advanced audio coding (AAC) method, an MPEG layer-3 (MP3) method, or a bit-sliced arithmetic coding (BSAC) method.
The down-mix encoder 130 may encode a non-3D down-mix signal or a 3D down-mix signal. In this case, both the encoded non-3D down-mix signal and the encoded 3D down-mix signal may be included in a bitstream to be transmitted.
The bit packing unit 140 generates a bitstream based on the spatial information and either the encoded non-3D down-mix signal or the encoded 3D down-mix signal.
The bitstream generated by the bit packing unit 140 may include the spatial information, down-mix identification information indicating whether the down-mix signal included in the bitstream is a non-3D down-mix signal or a 3D down-mix signal, and information identifying the filter used by the 3D rendering unit 120 (for example, HRTF coefficient information).
In other words, the bitstream generated by the bit packing unit 140 may include at least one of a non-3D down-mix signal which has not yet been 3D-processed and an encoder 3D down-mix signal obtained by a 3D processing operation performed by the encoding apparatus, together with down-mix identification information identifying the type of the down-mix signal included in the bitstream.
Which of the non-3D down-mix signal and the encoder 3D down-mix signal is to be included in the bitstream generated by the bit packing unit 140 may be selected by the user or determined according to the capabilities of the encoding/decoding apparatus illustrated in Fig. 1 and the characteristics of the reproduction environment.
The HRTF coefficient information may include the coefficients of the inverse function of the HRTF used by the 3D rendering unit 120. The HRTF coefficient information may include only brief information regarding the coefficients of the HRTF used by the 3D rendering unit 120, for example, envelope information of the HRTF coefficients. If a bitstream including the coefficients of the inverse function of the HRTF is transmitted to a decoding apparatus, the decoding apparatus does not need to perform an HRTF coefficient conversion operation, and the amount of computation of the decoding apparatus can therefore be reduced.
The bitstream generated by the bit packing unit 140 may also include information regarding an energy variation in a signal caused by the HRTF-based filtering, that is, information regarding the difference between the energy of the signal to be filtered and the energy of the filtered signal, or the ratio of the energy of the signal to be filtered to the energy of the filtered signal.
The bitstream generated by the bit packing unit 140 may also include information indicating whether it includes HRTF coefficients. If HRTF coefficients are included in the bitstream generated by the bit packing unit 140, the bitstream may also include information indicating whether it includes the coefficients of the HRTF used by the 3D rendering unit 120 or the coefficients of the inverse function of the HRTF.
Referring to Fig. 1, a first decoding unit 200 includes a bit unpacking unit 210, a down-mix decoder 220, a 3D rendering unit 230, and a multi-channel decoder 240.
The bit unpacking unit 210 receives an input bitstream from the encoding unit 100 and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 220 decodes the encoded down-mix signal. The down-mix decoder 220 may decode the encoded down-mix signal using an audio decoding method such as an AAC method, an MP3 method, or a BSAC method.
As described above, the encoded down-mix signal extracted from the input bitstream may be an encoded non-3D down-mix signal or an encoded, encoder 3D down-mix signal. Information indicating whether the encoded down-mix signal extracted from the input bitstream is an encoded non-3D down-mix signal or an encoded, encoder 3D down-mix signal may be included in the input bitstream.
If the encoded down-mix signal extracted from the input bitstream is an encoder 3D down-mix signal, it can be readily reproduced after being decoded by the down-mix decoder 220.
On the other hand, if the encoded down-mix signal extracted from the input bitstream is a non-3D down-mix signal, it may be decoded by the down-mix decoder 220, and the down-mix signal obtained by the decoding may be converted into a decoder 3D down-mix signal by a 3D rendering operation performed by a third renderer 233. The decoder 3D down-mix signal can then be readily reproduced.
The 3D rendering unit 230 includes a first renderer 231, a second renderer 232, and the third renderer 233. The first renderer 231 generates a down-mix signal by performing a 3D rendering operation on the encoder 3D down-mix signal provided by the down-mix decoder 220. For example, the first renderer 231 may generate a non-3D down-mix signal by removing 3D effects from the encoder 3D down-mix signal. The 3D effects of the encoder 3D down-mix signal may not be completely removed by the first renderer 231; in this case, the down-mix signal output by the first renderer 231 may still have some 3D effects.
The first renderer 231 may convert the 3D down-mix signal provided by the down-mix decoder 220 into a down-mix signal from which the 3D effects are removed, using an inverse filter of the filter used by the 3D rendering unit 120 of the encoding unit 100. Information regarding the filter used by the 3D rendering unit 120, or the inverse filter of that filter, may be included in the input bitstream.
The filter used by the 3D rendering unit 120 may be an HRTF filter. In this case, the coefficients of the HRTF used by the encoding unit 100, or the coefficients of the inverse function of the HRTF, may also be included in the input bitstream. If the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream, the HRTF coefficients may be inversely converted, and the result of the inverse conversion may be used during the 3D rendering operation performed by the first renderer 231. If the coefficients of the inverse function of the HRTF used by the encoding unit 100 are included in the input bitstream, they can be readily used during the 3D rendering operation performed by the first renderer 231 without any inverse conversion operation; in this case, the amount of computation of the first decoding unit 200 can be reduced.
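A minimal sketch of how an inverse HRTF filter might be derived when only the forward coefficients are available is given below; the regularized frequency-domain inversion and the parameter values are assumptions of this example, and in practice the inverse coefficients may simply be read from the bitstream as described above:

import numpy as np

def inverse_hrtf_filter(hrtf_ir, num_points=1024, eps=1e-3):
    # Regularized inversion of an HRTF impulse response in the frequency domain
    # (illustrative only; eps guards against division by near-zero magnitudes).
    spectrum = np.fft.rfft(hrtf_ir, num_points)
    inv_spectrum = np.conj(spectrum) / (np.abs(spectrum) ** 2 + eps)
    return np.fft.irfft(inv_spectrum, num_points)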
The input bitstream may also include filter information (for example, information indicating whether the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream) and information indicating whether the filter information has been inversely converted.
The multi-channel decoder 240 generates a 3D multi-channel signal having three or more channels based on the down-mix signal from which the 3D effects have been removed and the spatial information extracted from the input bitstream.
The second renderer 232 may generate a 3D down-mix signal having 3D effects by performing a 3D rendering operation on the down-mix signal from which the 3D effects have been removed. In other words, the first renderer 231 removes 3D effects from the encoder 3D down-mix signal provided by the down-mix decoder 220. Thereafter, the second renderer 232 may generate a combined 3D down-mix signal having the 3D effects desired by the first decoding unit 200 by performing a 3D rendering operation on the down-mix signal obtained by the removal performed by the first renderer 231, using a filter of the first decoding unit 200.
The first decoding unit 200 may include a renderer in which two or more of the first, second, and third renderers 231, 232, and 233, which perform the above-described operations, are combined.
The bitstream generated by the encoding unit 100 may also be input to a second decoding unit 300 having a different structure from the first decoding unit 200. The second decoding unit 300 can generate a 3D down-mix signal based on a down-mix signal included in the bitstream input thereto.
More specifically, the second decoding unit 300 includes a bit unpacking unit 310, a down-mix decoder 320, and a 3D rendering unit 330. The bit unpacking unit 310 receives an input bitstream from the encoding unit 100 and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 320 decodes the encoded down-mix signal. The 3D rendering unit 330 performs a 3D rendering operation on the decoded down-mix signal so that the decoded down-mix signal can be converted into a 3D down-mix signal.
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention. Referring to Fig. 2, the encoding apparatus includes 3D rendering units 400 and 420 and a multi-channel encoder 410. A detailed description of encoding processes that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 2, the 3D rendering units 400 and 420 may be disposed in front of and behind the multi-channel encoder 410, respectively. Thus, a multi-channel signal may be 3D-rendered by the 3D rendering unit 400, and the 3D-rendered multi-channel signal may then be encoded by the multi-channel encoder 410, thereby generating a pre-processed encoder 3D down-mix signal. Alternatively, the multi-channel signal may be down-mixed by the multi-channel encoder 410, and the down-mixed signal may then be 3D-rendered by the 3D rendering unit 420, thereby generating a post-processed encoder 3D down-mix signal.
Information indicating whether the multi-channel signal was 3D-rendered before or after being down-mixed may be included in the bitstream to be transmitted.
Both of the 3D rendering units 400 and 420 may be disposed in front of or behind the multi-channel encoder 410.
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention. Referring to Fig. 3, the decoding apparatus includes 3D rendering units 430 and 450 and a multi-channel decoder 440. A detailed description of decoding processes that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 3, the 3D rendering units 430 and 450 may be disposed in front of and behind the multi-channel decoder 440, respectively. The 3D rendering unit 430 may remove 3D effects from an encoder 3D down-mix signal and input the down-mix signal obtained by the removal to the multi-channel decoder 440. The multi-channel decoder 440 may then decode the down-mix signal input thereto, thereby generating a pre-processed 3D multi-channel signal. Alternatively, the multi-channel decoder 440 may restore a multi-channel signal from the encoded 3D down-mix signal, and the 3D rendering unit 450 may remove 3D effects from the restored multi-channel signal, thereby generating a post-processed 3D multi-channel signal.
If the encoder 3D down-mix signal provided by the encoding apparatus was generated by performing a 3D rendering operation followed by a down-mix operation, it may be decoded by performing a multi-channel decoding operation followed by a 3D rendering operation. On the other hand, if the encoder 3D down-mix signal was generated by performing a down-mix operation followed by a 3D rendering operation, it may be decoded by performing a 3D rendering operation followed by a multi-channel decoding operation.
Information indicating whether the encoded 3D down-mix signal was obtained by performing a 3D rendering operation before or after a down-mix operation may be extracted from the bitstream transmitted by the encoding apparatus.
Both of the 3D rendering units 430 and 450 may be disposed in front of or behind the multi-channel decoder 440.
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention. Referring to Fig. 4, the encoding apparatus includes a multi-channel encoder 500, a 3D rendering unit 510, a down-mix encoder 520, and a bit packing unit 530. A detailed description of encoding processes that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 4, the multi-channel encoder 500 generates a down-mix signal and spatial information based on an input multi-channel signal. The 3D rendering unit 510 generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
Whether to perform a 3D rendering operation on the down-mix signal may be selected by the user or determined according to the capabilities of the encoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix encoder 520 encodes the down-mix signal generated by the multi-channel encoder 500 or the 3D down-mix signal generated by the 3D rendering unit 510.
The bit packing unit 530 generates a bitstream based on the spatial information and either the encoded down-mix signal or the encoded, encoder 3D down-mix signal. The bitstream generated by the bit packing unit 530 may include down-mix identification information indicating whether the encoded down-mix signal included in the bitstream is a non-3D down-mix signal without 3D effects or an encoder 3D down-mix signal with 3D effects. More specifically, the down-mix identification information may indicate whether the bitstream generated by the bit packing unit 530 includes a non-3D down-mix signal, an encoder 3D down-mix signal, or both.
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 5, the decoding apparatus includes a bit unpacking unit 540, a down-mix decoder 550, and a 3D rendering unit 560. A detailed description of decoding processes that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 5, the bit unpacking unit 540 extracts an encoded down-mix signal, spatial information, and down-mix identification information from an input bitstream. The down-mix identification information indicates whether the encoded down-mix signal is an encoded non-3D down-mix signal without 3D effects or an encoded 3D down-mix signal with 3D effects.
If the input bitstream includes both a non-3D down-mix signal and a 3D down-mix signal, only one of them may be extracted from the input bitstream according to the user's selection, the capabilities of the decoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix decoder 550 decodes the encoded down-mix signal. If the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is an encoder 3D down-mix signal obtained by a 3D rendering operation, it can be readily reproduced.
On the other hand, if the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is a down-mix signal that has not yet been 3D-processed, the 3D rendering unit 560 may generate a decoder 3D down-mix signal by performing a 3D rendering operation on the down-mix signal obtained by the decoding performed by the down-mix decoder 550.
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 6, the decoding apparatus includes a bit unpacking unit 600, a down-mix decoder 610, a first 3D rendering unit 620, a second 3D rendering unit 630, and a filter information storage unit 640. A detailed description of decoding processes that are the same as in the embodiment of Fig. 1 will be omitted.
The bit unpacking unit 600 extracts an encoded, encoder 3D down-mix signal and spatial information from an input bitstream. The down-mix decoder 610 decodes the encoded, encoder 3D down-mix signal.
The first 3D rendering unit 620 removes 3D effects from the encoder 3D down-mix signal obtained by the decoding performed by the down-mix decoder 610, using an inverse filter of the filter of the encoding apparatus used to perform a 3D rendering operation. The second 3D rendering unit 630 generates a combined 3D down-mix signal having 3D effects by performing a 3D rendering operation on the down-mix signal obtained by the removal performed by the first 3D rendering unit 620, using a filter stored in the decoding apparatus.
The second 3D rendering unit 630 may perform the 3D rendering operation using a filter whose characteristics differ from those of the filter of the encoding apparatus used to perform a 3D rendering operation. For example, the second 3D rendering unit 630 may perform the 3D rendering operation using an HRTF whose coefficients differ from those of the HRTF used by the encoding apparatus.
The filter information storage unit 640 stores filter information regarding the filters that can be used to perform 3D rendering, for example, HRTF coefficient information. The second 3D rendering unit 630 may generate the combined 3D down-mix signal using the filter information stored in the filter information storage unit 640.
The filter information storage unit 640 may store a plurality of pieces of filter information respectively corresponding to a plurality of filters. In this case, one of the pieces of filter information may be selected by the user or chosen according to the capabilities of the decoding apparatus or the required sound quality.
People of different races may have different ear structures, and HRTF coefficients optimized for different individuals may therefore differ from one another. The decoding apparatus illustrated in Fig. 6 can generate a 3D down-mix signal optimized for the user. In addition, the decoding apparatus illustrated in Fig. 6 can generate a 3D down-mix signal having the 3D effects of the HRTF filter desired by the user, regardless of the type of HRTF provided by the supplier of the 3D down-mix signal.
Fig. 7 is a block diagram of a 3D rendering apparatus according to an embodiment of the present invention. Referring to Fig. 7, the 3D rendering apparatus includes first and second domain converting units 700 and 720 and a 3D rendering unit 710. In order to perform a 3D rendering operation in a predetermined domain, the first and second domain converting units 700 and 720 may be disposed in front of and behind the 3D rendering unit 710, respectively.
Referring to Fig. 7, an input down-mix signal may be converted into a frequency-domain down-mix signal by the first domain converting unit 700. More specifically, the first domain converting unit 700 may convert the input down-mix signal into a DFT-domain down-mix signal or an FFT-domain down-mix signal by performing DFT or FFT.
The 3D rendering unit 710 generates a multi-channel signal by applying spatial information to the frequency-domain down-mix signal provided by the first domain converting unit 700. Thereafter, the 3D rendering unit 710 generates a 3D down-mix signal by filtering the multi-channel signal.
The 3D down-mix signal generated by the 3D rendering unit 710 is converted into a time-domain 3D down-mix signal by the second domain converting unit 720. More specifically, the second domain converting unit 720 may perform IDFT or IFFT on the 3D down-mix signal generated by the 3D rendering unit 710.
During the conversion of the frequency-domain 3D down-mix signal into the time-domain 3D down-mix signal, data loss or data distortion such as aliasing may occur.
In order to generate a multi-channel signal and a 3D down-mix signal in the frequency domain, the spatial information of each parameter band may be mapped to the frequency domain, and a plurality of filter coefficients may be converted to the frequency domain.
The 3D rendering unit 710 may generate a 3D down-mix signal by multiplying together the frequency-domain down-mix signal provided by the first domain converting unit 700, the spatial information, and the filter coefficients.
A time-domain signal obtained by multiplying a down-mix signal, spatial information, and a plurality of filter coefficients, all represented in an M-point frequency domain, has M valid signals. In order to represent the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain, an M-point DFT or an M-point FFT may be performed.
Here, a valid signal is a signal whose value is not zero. For example, a total of x valid signals may be obtained by taking x samples from an audio signal. If y of the x valid signals are zero-padded, the number of valid signals decreases to (x-y). Thereafter, when a signal having a valid signals and a signal having b valid signals are convolved, a total of (a+b-1) valid signals is obtained.
Multiplying the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain can provide the same effect as convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. A signal having (3*M-2) valid signals can be generated by converting the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain to the time domain and convolving the results of the conversion.
Therefore, the number of valid signals in a signal obtained by multiplying the down-mix signal, the spatial information, and the filter coefficients in the frequency domain and converting the result of the multiplication to the time domain may differ from the number of valid signals in a signal obtained by convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. As a result, aliasing may occur during the conversion of the frequency-domain 3D down-mix signal into a time-domain signal.
In order to prevent aliasing, the sum of the number of valid signals of the down-mix signal in the time domain, the number of valid signals of the spatial information mapped to the frequency domain, and the number of valid signals of the filter coefficients must not be greater than M. The number of valid signals of the spatial information mapped to the frequency domain may be determined by the number of points of the frequency domain. In other words, if the spatial information represented for each parameter band is mapped to an N-point frequency domain, the number of valid signals of the spatial information may be N.
Referring to Fig. 7, the first domain converting unit 700 includes a first zero-padding unit 701 and a first frequency-domain converting unit 702. The 3D rendering unit 710 includes a mapping unit 711, a time-domain converting unit 712, a second zero-padding unit 713, a second frequency-domain converting unit 714, a multi-channel signal generation unit 715, a third zero-padding unit 716, a third frequency-domain converting unit 717, and a 3D down-mix signal generation unit 718.
The first zero-padding unit 701 performs a zero-padding operation on a down-mix signal having X samples in the time domain so that the number of samples of the down-mix signal increases from X to M. The first frequency-domain converting unit 702 converts the zero-padded down-mix signal into an M-point frequency-domain signal. The zero-padded down-mix signal has M samples, of which only X samples are valid signals.
The mapping unit 711 maps the spatial information of each parameter band to an N-point frequency domain. The time-domain converting unit 712 converts the spatial information obtained by the mapping performed by the mapping unit 711 to the time domain. The spatial information obtained by the conversion performed by the time-domain converting unit 712 has N samples.
The second zero-padding unit 713 performs a zero-padding operation on the spatial information having N samples in the time domain so that the number of samples of the spatial information increases from N to M. The second frequency-domain converting unit 714 converts the zero-padded spatial information into an M-point frequency-domain signal. The zero-padded spatial information has M samples, of which only N samples are valid.
The multi-channel signal generation unit 715 generates a multi-channel signal by multiplying the down-mix signal provided by the first frequency-domain converting unit 702 by the spatial information provided by the second frequency-domain converting unit 714. The multi-channel signal generated by the multi-channel signal generation unit 715 has M valid signals. On the other hand, a multi-channel signal obtained by convolving, in the time domain, the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714 has (X+N-1) valid signals.
The third zero-padding unit 716 may perform a zero-padding operation on Y filter coefficients represented in the time domain so that the number of samples increases to M. The third frequency-domain converting unit 717 converts the zero-padded filter coefficients into the M-point frequency domain. The zero-padded filter coefficients have M samples, of which only Y samples are valid signals.
The 3D down-mix signal generation unit 718 generates a 3D down-mix signal by multiplying the multi-channel signal generated by the multi-channel signal generation unit 715 by the filter coefficients provided by the third frequency-domain converting unit 717. The 3D down-mix signal generated by the 3D down-mix signal generation unit 718 has M valid signals. On the other hand, a 3D down-mix signal obtained by convolving, in the time domain, the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717 has (X+N+Y-2) valid signals.
It is possible to prevent aliasing by setting the M-point frequency domain used by the first, second, and third frequency-domain converting units 702, 714, and 717 so as to satisfy the equation M ≥ (X+N+Y-2). In other words, aliasing can be prevented by having the first, second, and third frequency-domain converting units 702, 714, and 717 perform an M-point DFT or an M-point FFT satisfying the equation M ≥ (X+N+Y-2).
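As a numerical illustration of this condition (a sketch only, with arbitrarily chosen lengths), zero-padding three time-domain sequences of lengths X, N, and Y to M = X+N+Y-2 points before an M-point FFT makes the product of their spectra equal to their cascaded linear convolution, so no aliasing occurs when the result is converted back to the time domain:

import numpy as np

X, N, Y = 64, 16, 32
M = X + N + Y - 2                      # smallest size that avoids time-domain aliasing
rng = np.random.default_rng(0)
downmix = rng.standard_normal(X)       # X valid time-domain samples
spatial = rng.standard_normal(N)       # N valid samples of mapped spatial information
coeffs = rng.standard_normal(Y)        # Y valid filter coefficients

# Frequency-domain route: zero-pad to M points, multiply, convert back.
product = np.fft.fft(downmix, M) * np.fft.fft(spatial, M) * np.fft.fft(coeffs, M)
freq_route = np.fft.ifft(product).real

# Time-domain route: cascaded linear convolution with (X+N+Y-2) valid samples.
time_route = np.convolve(np.convolve(downmix, spatial), coeffs)

assert np.allclose(freq_route, time_route)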
The conversion to the frequency domain may be performed using a filter bank other than a DFT filter bank, an FFT filter bank, or a QMF bank. The generation of a 3D down-mix signal may be performed using an HRTF filter.
The number of valid signals of the spatial information may be adjusted using a method other than the above-described methods, or using whichever of the above-described methods is most efficient and requires the least amount of computation.
Aliasing may occur not only when a signal, coefficients, or spatial information is converted between the frequency domain and the time domain, but also when a signal, coefficients, or spatial information is converted between the QMF domain and the hybrid domain. The above-described method of preventing aliasing may also be used to prevent aliasing from occurring when a signal, coefficients, or spatial information is converted from the QMF domain to the hybrid domain or vice versa.
The spatial information used to generate a multi-channel signal or a 3D down-mix signal may vary. As a result of the variation of the spatial information, a discontinuity perceived as noise may occur in an output signal.
The noise in the output signal may be reduced using a smoothing method by which the spatial information is prevented from varying abruptly.
For example, when first spatial information applied to a first frame and second spatial information applied to a second frame differ from each other while the first and second frames are adjacent to each other, a discontinuity is highly likely to occur between the first and second frames.
In this case, the second spatial information may be compensated for using the first spatial information, or the first spatial information may be compensated for using the second spatial information, so that the difference between the first spatial information and the second spatial information is reduced, and noise caused by the discontinuity between the first and second frames can thus be reduced. More specifically, at least one of the first spatial information and the second spatial information may be replaced with the average of the first and second spatial information, thereby reducing the noise.
Noise is also likely to occur due to a discontinuity between a pair of adjacent parameter bands. For example, when third spatial information corresponding to a first parameter band and fourth spatial information corresponding to a second parameter band differ from each other while the first and second parameter bands are adjacent to each other, a discontinuity may occur between the first and second parameter bands.
In this case, the third spatial information may be compensated for using the fourth spatial information, or the fourth spatial information may be compensated for using the third spatial information, so that the difference between the third spatial information and the fourth spatial information is reduced, and noise caused by the discontinuity between the first and second parameter bands can thus be reduced. More specifically, at least one of the third spatial information and the fourth spatial information may be replaced with the average of the third and fourth spatial information, thereby reducing the noise.
Noise caused by a discontinuity between a pair of adjacent frames or between a pair of adjacent parameter bands may also be reduced using methods other than those described above.
More specifically, each frame may be multiplied by a window such as a Hanning window, and an "overlap-and-add" scheme may be applied to the results of the multiplication so that the variation between frames is reduced. Alternatively, an output signal to which a plurality of pieces of spatial information are applied may be smoothed so that abrupt variations between frames of the output signal are prevented.
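The following sketch illustrates both ideas; the simple averaging of adjacent frame parameters and the Hanning-window overlap-and-add layout are assumptions made for this example:

import numpy as np

def smooth_spatial_parameters(params):
    # Replace each frame's parameters with the average of that frame and the
    # previous one, so adjacent frames cannot differ abruptly (assumed scheme).
    smoothed = params.astype(float).copy()
    smoothed[1:] = 0.5 * (params[1:] + params[:-1])
    return smoothed

def overlap_add(frames, hop):
    # Window each frame with a Hanning window and overlap-and-add the results.
    frame_len = frames.shape[1]
    window = np.hanning(frame_len)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame * window
    return out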
The decorrelation between channels in the DFT domain may be adjusted using spatial information such as ICC, as follows.
The degree of decorrelation may be adjusted by multiplying a coefficient of a signal input to a one-to-two (OTT) or two-to-three (TTT) box by a predetermined value. The predetermined value may be defined by the equation (A + (1-A*A)^0.5 * i), where A indicates an ICC value applied to a predetermined frequency band of the OTT or TTT box and i indicates an imaginary part. The imaginary part may be positive or negative.
The predetermined value may be weighted by a weighting factor according to the characteristics of the signal, for example, the energy level of the signal, the energy characteristics of each frequency of the signal, or the type of box to which the ICC value A is applied. As a result of introducing the weighting factor, the degree of decorrelation can be further adjusted, and inter-frame smoothing or interpolation may also be applied.
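A minimal sketch of this decorrelation gain is shown below; treating A as a per-band ICC value in [0, 1] and applying the weighting factor to the imaginary part only are assumptions of this example:

import numpy as np

def decorrelation_gain(icc, weight=1.0, sign=1.0):
    # Complex gain A + i*(1 - A*A)^0.5; the imaginary part may be positive or
    # negative and is scaled here by an assumed weighting factor.
    icc = np.clip(icc, 0.0, 1.0)
    return icc + 1j * sign * weight * np.sqrt(1.0 - icc * icc)

# Adjusting one frequency-domain coefficient of a signal input to an OTT/TTT box:
adjusted = (0.3 + 0.1j) * decorrelation_gain(icc=0.8)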
As described above with reference to Fig. 7, a 3D down-mix signal may be generated in the frequency domain by using an HRTF or a head-related impulse response (HRIR) converted to the frequency domain.
Alternatively, a 3D down-mix signal may be generated by convolving an HRIR and a down-mix signal in the time domain. A 3D down-mix signal generated in the frequency domain may also be left in the frequency domain without performing an inverse domain conversion.
To convolve an HRIR and a down-mix signal in the time domain, a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter may be used.
As described above, an encoding apparatus or a decoding apparatus according to an embodiment of the present invention may generate a 3D down-mix signal using a first method involving the use of an HRTF in the frequency domain or an HRIR converted to the frequency domain, a second method involving the convolution of an HRIR in the time domain, or a combination of the first and second methods.
Figs. 8 to 11 illustrate bitstreams according to embodiments of the present invention.
Referring to Fig. 8, a bitstream includes a multi-channel decoding information field which includes information needed to generate a multi-channel signal, a 3D rendering information field which includes information needed to generate a 3D down-mix signal, and a header field which includes header information needed to use the information included in the multi-channel decoding information field and the information included in the 3D rendering information field. The bitstream may include only one or two of the multi-channel decoding information field, the 3D rendering information field, and the header field.
Referring to Fig. 9, a bitstream containing the side information necessary for a decoding operation may include a specific configuration header field which includes header information of an entire encoded signal, and a plurality of frame data fields which include side information regarding a plurality of frames. More specifically, each of the frame data fields may include a frame header field which includes header information of the corresponding frame and a frame parameter data field which includes the spatial information of the corresponding frame. Alternatively, each of the frame data fields may include only a frame parameter data field.
Each of the frame parameter data fields may include a plurality of modules, each module including a flag and parameter data. A module is a set of data including parameter data such as spatial information and other data, such as down-mix gain and smoothing data, necessary for improving the sound quality of a signal.
If data regarding a module specified by the frame header field is received without any additional flag, if information specified by the frame header field is further classified, or if an additional flag and data are received together for information not specified by the frame header field, then the module data may not include any flag.
Side information regarding a 3D down-mix signal, for example, HRTF coefficient information, may be included in at least one of the specific configuration header field, the frame header field, and the frame parameter data field.
Referring to Fig. 10, a bitstream may include a plurality of multi-channel decoding information fields which include information necessary for generating a multi-channel signal and a plurality of 3D rendering information fields which include information necessary for generating a 3D down-mix signal.
Upon receiving the bitstream, a decoding apparatus may use either the multi-channel decoding information fields or the 3D rendering information fields to perform a decoding operation, and may skip whichever of the multi-channel decoding information fields and the 3D rendering information fields are not used in the decoding operation. In this case, which of the multi-channel decoding information fields and the 3D rendering information fields are to be used to perform the decoding operation may be determined according to the type of signal to be reproduced.
In other words, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields and read the information included in the multi-channel decoding information fields. On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields and read the information included in the 3D rendering information fields.
Methods of skipping some of a plurality of fields in a bitstream are as follows.
First, field length information regarding the bit size of a field may be included in the bitstream. In this case, the field can be skipped by skipping a number of bits corresponding to the bit size of the field. The field length information may be placed at the beginning of the field.
Second, a synchronization word may be placed at the end or the beginning of a field. In this case, the field can be skipped by locating the field based on the synchronization word.
Third, if the length of a field is determined and fixed in advance, the field can be skipped by skipping an amount of data corresponding to the length of the field. Fixed field length information regarding the length of the field may be included in the bitstream or stored in the decoding apparatus.
Fourth, a field may be skipped using a combination of two or more of the above-described field skipping methods.
Field skip information, that is, information necessary for skipping a field, such as field length information, a synchronization word, or fixed field length information, may be included in one of the specific configuration header field, the frame header field, and the frame parameter data field illustrated in Fig. 9, or may be included in a field other than those illustrated in Fig. 9.
For example, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields with reference to field length information, a synchronization word, or fixed field length information placed at the beginning of each 3D rendering information field, and read the information included in the multi-channel decoding information fields.
On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields with reference to field length information, a synchronization word, or fixed field length information placed at the beginning of each multi-channel decoding information field, and read the information included in the 3D rendering information fields.
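The first of these skipping methods could look like the following sketch of a bitstream reader; the reader class and the 16-bit width of the field length code are hypothetical and serve only to illustrate skipping a field by its signalled bit size:

class BitReader:
    # Minimal big-endian bit reader over a bytes object (illustrative only).
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def skip_bits(self, n):
        self.pos += n

def skip_field(reader):
    # Field length information at the beginning of the field gives its bit size;
    # skipping that many bits skips the rest of the field (assumed 16-bit length code).
    field_size = reader.read_bits(16)
    reader.skip_bits(field_size)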
Bit stream can comprise that the data that indication is included in this bit stream generate the necessary or generation 3D reduction audio signal information necessary of multi-channel signal.
Yet, even bit stream does not comprise any spatial information such as CLD, and only comprise that generating 3D (for example reduces the necessary data of audio signal, the hrtf filter coefficient), also can reproduce multi-channel signal, and not need spatial information by utilizing the necessary data of generation 3D reduction audio signal to decode.
For example, obtain the stereo parameter of conduct from the reduction audio signal about the spatial information of two sound channels.Then, convert stereo parameter to spatial information, and generate multi-channel signal by putting on the reduction audio signal by the spatial information that conversion is obtained about a plurality of sound channels to be reproduced.
On the other hand, even only comprising, bit stream generates the necessary data of multi-channel signal, also can reproduce the reduction audio signal and not need additional decode operation, maybe can reproduce 3D reduction audio signal by utilizing additional hrtf filter that reduction audio signal execution 3D is handled.
Generate the necessary data of multi-channel signal and generate the necessary data of 3D reduction audio signal if bit stream comprises, then can allow the user to determine to reproduce multi-channel signal or 3D reduction audio signal.
To describe the method for skipping data in detail with reference to corresponding sentence structure separately hereinafter.
Syntax 1 illustrates a method of decoding an audio signal on a frame-by-frame basis.
[Syntax 1]
SpatialFrame()
{
FramingInfo();
bsIndependencyFlag;
OttData();
TttData();
SmgData();
TempShapeData();
if(bsArbitraryDownmix){
ArbitraryDownmixData();
}
if(bsResidualCoding){
ResidualData();
}
}
In Syntax 1, OttData() and TttData() are modules representing the parameters (such as spatial information including CLD, ICC and CPC) necessary for recovering a multi-channel signal from the downmix signal, and SmgData(), TempShapeData(), ArbitraryDownmixData() and ResidualData() are modules representing information necessary for improving sound quality by correcting signal distortion that may occur during an encoding operation.
For example, if only parameters such as CLD, ICC or CPC and the information contained in the module ArbitraryDownmixData() are used during a decoding operation, the modules SmgData() and TempShapeData(), which are located between the modules TttData() and ArbitraryDownmixData(), are unnecessary. It is therefore efficient to skip the modules SmgData() and TempShapeData().
A method of skipping modules according to an embodiment of the present invention will be described below in detail with reference to Syntax 2.
[Syntax 2]
:
TttData();
SkipData(){
bsSkipBits;
}
SmgData();
TempShapeData();
if(bsArbitraryDownmix){
ArbitraryDownmixData();
}
:
Referring to Syntax 2, a module SkipData() may be placed before the module(s) to be skipped, and the bit size of the module(s) to be skipped is specified as bsSkipBits within the module SkipData().
In other words, assuming that the modules SmgData() and TempShapeData() are to be skipped and that the combined bit size of the modules SmgData() and TempShapeData() is 150 bits, the modules SmgData() and TempShapeData() can be skipped by setting bsSkipBits to 150.
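A possible decoder-side reading of Syntax 2, assuming the BitReader/read_bits() helper from the earlier sketch; the 16-bit width used here for bsSkipBits is an assumption for illustration only and is not specified by the text.
/* Syntax 2 sketch: SkipData() carries bsSkipBits, the combined bit size of
 * the following modules (e.g. SmgData() and TempShapeData()), so a decoder
 * that does not need them can jump straight to the next module.            */
static void parse_skip_data_and_skip(BitReader *br)
{
    unsigned bsSkipBits = read_bits(br, 16); /* e.g. 150 in the example above    */
    br->pos_bits += bsSkipBits;              /* skip SmgData() + TempShapeData() */
}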
A method of skipping modules according to another embodiment of the present invention will be described below in detail with reference to Syntax 3.
[Syntax 3]
:
TttData();
bsSkipSyncflag;
SmgData();
TempShapeData();
bsSkipSyncword;
if(bsArbitraryDownmix){
ArbitraryDownmixData();
}
:
Referring to Syntax 3, unnecessary modules can be skipped by using bsSkipSyncflag and bsSkipSyncword, where bsSkipSyncflag is a flag indicating whether a synchronization word is used, and bsSkipSyncword is a synchronization word that may be placed at the end of the module(s) to be skipped.
More specifically, if the flag bsSkipSyncflag is set so that a synchronization word is used, one or more modules located between the flag bsSkipSyncflag and the synchronization word bsSkipSyncword, i.e., the modules SmgData() and TempShapeData(), can be skipped.
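A minimal sketch of the syncword-based skipping of Syntax 3, again assuming the BitReader/read_bits() helper above; the bit-by-bit search and the syncword width passed as a parameter are illustrative assumptions, not requirements of the text.
/* Skip forward until bsSkipSyncword is found; on success the reader is
 * positioned just after the syncword, i.e. past the skipped modules.      */
static int skip_until_syncword(BitReader *br, unsigned syncword, unsigned sync_bits)
{
    while (br->pos_bits + sync_bits <= br->size_bits) {
        size_t saved = br->pos_bits;
        if (read_bits(br, sync_bits) == syncword)
            return 0;                 /* syncword found, skipping complete  */
        br->pos_bits = saved + 1;     /* advance one bit and try again      */
    }
    return -1;                        /* syncword not found                 */
}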
Referring to Figure 11, a bitstream may include a multi-channel header field containing header information necessary for reproducing a multi-channel signal, a 3D rendering header field containing header information necessary for reproducing a 3D downmix signal, and a plurality of multi-channel decoding information fields containing data necessary for reproducing a multi-channel signal.
In order to reproduce a multi-channel signal, the decoding apparatus may skip the 3D rendering header field and read data from the multi-channel header field and the multi-channel decoding information fields.
The method of skipping the 3D rendering header field is the same as the field-skipping methods described above with reference to Figure 10, and a detailed description thereof is therefore omitted.
In order to reproduce a 3D downmix signal, the decoding apparatus may read data from the multi-channel decoding information fields and the 3D rendering header field. For example, the decoding apparatus may generate a 3D downmix signal using a downmix signal contained in the multi-channel decoding information fields and HRTF coefficient information contained in the 3D rendering header field.
Figure 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary downmix signal according to an embodiment of the present invention. Referring to Figure 12, an arbitrary downmix signal is a downmix signal other than the downmix signal generated by the multi-channel encoder 801 included in the encoding apparatus 800. A detailed description of processes identical to those of the embodiment of Figure 1 is omitted.
Referring to Figure 12, the encoding apparatus 800 includes the multi-channel encoder 801, a spatial information synthesis unit 802, and a comparison unit 803.
The multi-channel encoder 801 downmixes an input multi-channel signal into a stereo or mono downmix signal, and generates basic spatial information necessary for recovering a multi-channel signal from the downmix signal.
The comparison unit 803 compares the downmix signal with the arbitrary downmix signal, and generates compensation information based on the result of the comparison. The compensation information is information necessary for compensating the arbitrary downmix signal so that the arbitrary downmix signal can be converted into a signal approximating the downmix signal. The decoding apparatus may compensate the arbitrary downmix signal using the compensation information, and may recover a multi-channel signal using the compensated arbitrary downmix signal. The multi-channel signal thus recovered is more similar to the original input multi-channel signal than a multi-channel signal recovered from an arbitrary downmix signal that has not been compensated.
The compensation information may be the difference between the downmix signal and the arbitrary downmix signal. The decoding apparatus may compensate the arbitrary downmix signal by adding, to the arbitrary downmix signal, the difference between the downmix signal and the arbitrary downmix signal.
The difference between the downmix signal and the arbitrary downmix signal may be a downmix gain indicating the difference between the energy levels of the downmix signal and the arbitrary downmix signal.
The downmix gain may be determined for each frequency band, for each time/time slot, and/or for each channel. For example, one part of the downmix gain may be determined for each frequency band, and another part of the downmix gain may be determined for each time slot.
The downmix gain may be determined for each parameter band or for each frequency band optimized for the arbitrary downmix signal. A parameter band is a frequency interval to which parameter-type spatial information is applied.
The difference between the energy levels of the downmix signal and the arbitrary downmix signal may be quantized. The resolution of the quantization levels used for quantizing the energy-level difference between the downmix signal and the arbitrary downmix signal may be the same as or different from the resolution of the quantization levels used for quantizing a CLD between the downmix signal and the arbitrary downmix signal. In addition, the quantization of the energy-level difference between the downmix signal and the arbitrary downmix signal may involve using all or some of the quantization levels used for quantizing the CLD between the downmix signal and the arbitrary downmix signal.
Since the resolution of the energy-level difference between the downmix signal and the arbitrary downmix signal is generally lower than the resolution of the CLD between the downmix signal and the arbitrary downmix signal, the quantization levels used for quantizing the energy-level difference may have smaller values than the quantization levels used for quantizing the CLD.
The compensation information for compensating the arbitrary downmix signal may be extension information including residual information, which specifies components of the input multi-channel signal that cannot be recovered using the arbitrary downmix signal or the downmix gain. The decoding apparatus may use the extension information to recover the components of the input multi-channel signal that could not otherwise be recovered using the arbitrary downmix signal or the downmix gain, and can thereby recover a signal that is hardly distinguishable from the original input multi-channel signal.
Methods of generating the extension information are as follows.
The multi-channel encoder 801 may generate, as first extension information, information regarding the components of the input multi-channel signal that are missing from the downmix signal. The decoding apparatus may recover a signal that is hardly distinguishable from the original input multi-channel signal by applying the first extension information to a multi-channel signal generated using the downmix signal and the basic spatial information.
Alternatively, the multi-channel encoder 801 may recover a multi-channel signal using the downmix signal and the basic spatial information, and may generate, as the first extension information, the difference between the recovered multi-channel signal and the original input multi-channel signal.
The comparison unit 803 may generate, as second extension information, information regarding the components of the downmix signal that are missing from the arbitrary downmix signal, i.e., the components of the downmix signal that cannot be compensated for using the downmix gain. The decoding apparatus may recover a signal that is hardly distinguishable from the downmix signal using the arbitrary downmix signal and the second extension information.
In addition to the above methods, the extension information may also be generated using various residual coding methods.
Both the downmix gain and the extension information may be used as compensation information. More specifically, the downmix gain and the extension information may be obtained for the entire frequency band of the downmix signal and used together as compensation information. Alternatively, the downmix gain may be used as compensation information for one part of the frequency band of the downmix signal, and the extension information may be used as compensation information for another part of the frequency band of the downmix signal. For example, the extension information may be used as compensation information for a low-frequency band of the downmix signal, and the downmix gain may be used as compensation information for a high-frequency band of the downmix signal.
Extension information regarding portions of the downmix signal other than the low-frequency band, such as peaks or notches that considerably affect sound quality, may also be used as compensation information.
The spatial information synthesis unit 802 synthesizes the basic spatial information (for example, CLD, CPC, ICC and CTD) and the compensation information, thereby generating spatial information. In other words, the spatial information transmitted to the decoding apparatus may include the basic spatial information, the downmix gain, and the first and second extension information.
The spatial information may be included in a bitstream together with the arbitrary downmix signal, and the bitstream may be transmitted to the decoding apparatus.
The extension information and the arbitrary downmix signal may be encoded using an audio coding method such as the AAC method, the MP3 method, or the BSAC method. The extension information and the arbitrary downmix signal may be encoded using the same audio coding method or using different audio coding methods.
If the extension information and the arbitrary downmix signal are encoded using the same audio coding method, the decoding apparatus can decode both of them using a single audio decoding method. In this case, since the arbitrary downmix signal can always be decoded, the extension information can also always be decoded. However, since the arbitrary downmix signal is generally input to the decoding apparatus as a pulse code modulation (PCM) signal, the type of audio codec used to encode the arbitrary downmix signal may not be readily identified, and accordingly, the type of audio codec used to encode the extension information may not be readily identified either.
Therefore, audio codec information regarding the type of audio codec used to encode the arbitrary downmix signal and the extension information may be inserted into the bitstream.
More specifically, the audio codec information may be inserted into the specific configuration header field of the bitstream. In this case, the decoding apparatus may extract the audio codec information from the specific configuration header field of the bitstream, and may decode the arbitrary downmix signal and the extension information using the extracted audio codec information.
On the other hand, if the arbitrary downmix signal and the extension information are encoded using different coding methods, the extension information may not be decodable. In this case, since the end of the extension information cannot be identified, no further decoding operation can be performed.
In order to address this problem, audio codec information regarding the types of the audio codecs respectively used to encode the arbitrary downmix signal and the extension information may be inserted into the specific configuration header field of the bitstream. Then, the decoding apparatus may read the audio codec information from the specific configuration header field of the bitstream and decode the extension information using the read information. If the decoding apparatus does not include any decoding unit capable of decoding the extension information, the decoding of the extension information cannot proceed further, and the information immediately following the extension information may be read instead.
The audio codec information regarding the type of audio codec used to encode the extension information may be represented by a syntax element included in the specific configuration header field of the bitstream. For example, the audio codec information may be represented by a 4-bit syntax element bsResidualCodecType, as indicated in Table 1 below.
Table 1
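As a purely illustrative sketch, a decoder could read this element as follows, assuming the BitReader/read_bits() helper from the earlier sketch; the mapping of the sixteen possible values to codec types is given by Table 1 and is not reproduced here.
/* Read the 4-bit bsResidualCodecType element from the specific configuration
 * header; interpreting the value requires the mapping of Table 1.           */
static unsigned read_residual_codec_type(BitReader *br)
{
    return read_bits(br, 4);   /* bsResidualCodecType */
}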
The extension information may include not only residual information but also channel extension information. The channel extension information is information necessary for extending a multi-channel signal obtained by decoding with the spatial information into a signal having more channels. For example, the channel extension information may be information necessary for extending a 5.1-channel signal or a 7.1-channel signal into a 9.1-channel signal.
The extension information may be included in a bitstream, and the bitstream may be transmitted to the decoding apparatus. The decoding apparatus may then compensate a downmix signal or extend a multi-channel signal using the extension information. However, the decoding apparatus may also skip the extension information instead of extracting it from the bitstream. For example, in the case of generating a multi-channel signal using a 3D downmix signal included in the bitstream, or generating a 3D downmix signal using a downmix signal included in the bitstream, the decoding apparatus may skip the extension information.
The method of skipping the extension information included in the bitstream may be the same as one of the field-skipping methods described above with reference to Figure 10.
For example, the extension information may be skipped using at least one of bit size information attached to the beginning of the bitstream portion containing the extension information and indicating the bit size of the extension information, a synchronization word attached to the beginning or the end of the field containing the extension information, and fixed bit size information indicating a fixed bit size of the extension information. The bit size information, the synchronization word, and the fixed bit size information may all be included in the bitstream. The fixed bit size information may alternatively be stored in the decoding apparatus.
Referring to Figure 12, the decoding unit 810 includes a downmix compensation unit 811, a 3D rendering unit 815, and a multi-channel decoder 816.
The downmix compensation unit 811 compensates the arbitrary downmix signal using the compensation information included in the spatial information, for example, using the downmix gain or the extension information.
The 3D rendering unit 815 generates a decoder 3D downmix signal by performing a 3D rendering operation on the compensated downmix signal. The multi-channel decoder 816 generates a 3D multi-channel signal using the compensated downmix signal and the basic spatial information included in the spatial information.
The downmix compensation unit 811 may compensate the arbitrary downmix signal in the following manner.
If the compensation information is a downmix gain, the downmix compensation unit 811 compensates the energy level of the arbitrary downmix signal using the downmix gain, so that the arbitrary downmix signal can be converted into a signal approximating the downmix signal.
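A minimal sketch of this energy-level compensation with a per-parameter-band downmix gain; the band layout, the linear-gain format, and the function name are assumptions made for illustration (a real system might, for instance, transmit the gains quantized in dB).
/* Multiply each parameter band of the arbitrary downmix by the transmitted
 * downmix gain for that band so that its energy level approaches that of
 * the encoder downmix.  band_offset[] has nbands+1 entries delimiting the
 * samples (or frequency bins) that belong to each band.                    */
static void compensate_arbitrary_downmix(float *band_samples,
                                         const int *band_offset,
                                         const float *downmix_gain,
                                         int nbands)
{
    for (int b = 0; b < nbands; b++)
        for (int k = band_offset[b]; k < band_offset[b + 1]; k++)
            band_samples[k] *= downmix_gain[b];
}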
If the compensation information is the second extension information, the downmix compensation unit 811 may compensate for the components missing from the arbitrary downmix signal using the second extension information.
The multi-channel decoder 816 may generate a multi-channel signal by sequentially applying a pre-matrix M1, a mix matrix M2, and a post-matrix M3 to the downmix signal. In this case, the second extension information may be used to compensate the downmix signal while the mix matrix M2 is applied to it. In other words, the second extension information may be used to compensate a downmix signal to which the pre-matrix M1 has already been applied.
As described above, each of a plurality of channels can be selectively compensated by applying the extension information during the generation of the multi-channel signal. For example, if the extension information is applied to the center channel of the mix matrix M2, the left-channel and right-channel components of the downmix signal can be compensated by the extension information. If the extension information is applied to the left channel of the mix matrix M2, the left-channel component of the downmix signal can be compensated by the extension information.
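For orientation only, the sequential application of the three matrices can be pictured with the following simplified sketch; it treats M1, M2 and M3 as plain dense matrix-vector products per time/frequency tile, ignores decorrelation and the point at which the extension information is injected, and uses illustrative dimensions, so it is not a faithful rendering of the actual decoding matrices.
/* y = M * x for a dense rows x cols matrix stored row-major. */
static void mat_vec(const float *M, const float *x, float *y, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (int c = 0; c < cols; c++)
            acc += M[r * cols + c] * x[c];
        y[r] = acc;
    }
}
/* Multi-channel reconstruction as a chain of matrices applied to the
 * (compensated) downmix vector x of one time/frequency tile:
 *   v   = M1 * x   (pre-matrix)
 *   w   = M2 * v   (mix matrix; extension information may compensate here)
 *   out = M3 * w   (post-matrix)                                           */
static void apply_m1_m2_m3(const float *M1, const float *M2, const float *M3,
                           const float *x, float *out,
                           int nin, int n1, int n2, int n3)
{
    float v[16], w[16];               /* assumes n1 <= 16 and n2 <= 16 */
    mat_vec(M1, x, v, n1, nin);
    mat_vec(M2, v, w, n2, n1);
    mat_vec(M3, w, out, n3, n2);
}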
Both the downmix gain and the extension information may be used as compensation information. For example, a low-frequency band of the arbitrary downmix signal may be compensated using the extension information, and a high-frequency band of the arbitrary downmix signal may be compensated using the downmix gain. In addition, portions of the arbitrary downmix signal other than the low-frequency band, such as peaks or notches that considerably affect sound quality, may also be compensated using the extension information. Information regarding the portions to be compensated by the extension information may be included in the bitstream. Information indicating whether a downmix signal included in the bitstream is an arbitrary downmix signal and information indicating whether the bitstream contains compensation information may also be included in the bitstream.
In order to prevent clipping of the downmix signal generated by the encoding unit 800, the downmix signal may be divided by a predetermined gain. The predetermined gain may have a static value or a dynamic value.
The downmix compensation unit 811 may restore the original downmix signal by compensating, using the predetermined gain, the downmix signal that has been attenuated to prevent clipping.
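A minimal sketch of this clipping-prevention scheme with a static gain; the function names are hypothetical, and the text also allows a dynamic gain value.
/* Encoder side: attenuate the downmix by a predetermined gain g to avoid
 * clipping.  Decoder side: multiply by the same gain to restore the level. */
static void attenuate_downmix(float *x, int n, float g)
{
    for (int i = 0; i < n; i++) x[i] /= g;
}
static void restore_downmix(float *x, int n, float g)
{
    for (int i = 0; i < n; i++) x[i] *= g;
}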
An arbitrary downmix signal compensated by the downmix compensation unit 811 can readily be reproduced. Alternatively, an arbitrary downmix signal yet to be compensated may be input to the 3D rendering unit 815 and may be converted into a decoder 3D downmix signal by the 3D rendering unit 815.
Referring to Figure 12, the downmix compensation unit 811 includes a first domain converter 812, a compensation processor 813, and a second domain converter 814.
The first domain converter 812 converts the domain of the arbitrary downmix signal into a predetermined domain. The compensation processor 813 compensates the arbitrary downmix signal in the predetermined domain using the compensation information, for example, the downmix gain or the extension information.
The compensation of the arbitrary downmix signal may be performed in the QMF/hybrid domain. To this end, the first domain converter 812 may perform QMF/hybrid analysis on the arbitrary downmix signal. The first domain converter 812 may also convert the domain of the arbitrary downmix signal into a domain other than the QMF/hybrid domain, for example, a frequency domain such as the DFT or FFT domain. The compensation of the arbitrary downmix signal may likewise be performed in a domain other than the QMF/hybrid domain, for example, a frequency domain or the time domain.
The second domain converter 814 converts the domain of the compensated arbitrary downmix signal into the same domain as that of the original arbitrary downmix signal. More specifically, the second domain converter 814 converts the domain of the compensated arbitrary downmix signal into the same domain as that of the original arbitrary downmix signal by inversely performing the domain conversion operation performed by the first domain converter 812.
For example, the second domain converter 814 may convert the compensated arbitrary downmix signal into a time-domain signal by performing QMF/hybrid synthesis on the compensated arbitrary downmix signal. Likewise, the second domain converter 814 may perform an IDFT or IFFT on the compensated arbitrary downmix signal.
Like the 3D rendering unit 710 illustrated in Figure 7, the 3D rendering unit 815 may perform a 3D rendering operation on the compensated arbitrary downmix signal in a frequency domain, the QMF/hybrid domain, or the time domain. To this end, the 3D rendering unit 815 may include a domain converter (not shown). The domain converter converts the domain of the compensated arbitrary downmix signal into a domain in which the 3D rendering operation is to be performed, or converts the domain of a signal obtained by the 3D rendering operation.
The domain in which the compensation processor 813 compensates the arbitrary downmix signal may be the same as or different from the domain in which the 3D rendering unit 815 performs the 3D rendering operation on the compensated arbitrary downmix signal.
Figure 13 is a block diagram of a downmix compensation/3D rendering unit 820 according to an embodiment of the present invention. Referring to Figure 13, the downmix compensation/3D rendering unit 820 includes a first domain converter 821, a second domain converter 822, a compensation/3D rendering processor 823, and a third domain converter 824.
The downmix compensation/3D rendering unit 820 may perform a compensation operation and a 3D rendering operation on an arbitrary downmix signal in a single domain, thereby reducing the amount of computation of the decoding apparatus.
More specifically, the first domain converter 821 converts the domain of the arbitrary downmix signal into a first domain in which the compensation operation and the 3D rendering operation are to be performed. The second domain converter 822 converts spatial information, which includes basic spatial information necessary for generating a multi-channel signal and compensation information necessary for compensating the arbitrary downmix signal, so that the spatial information becomes applicable to the first domain. The compensation information may include at least one of a downmix gain and extension information.
For example, the second domain converter 822 may map compensation information corresponding to a parameter band in the QMF/hybrid domain onto frequency bands, so that the compensation information can easily be applied in the frequency domain.
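The band mapping mentioned here can be pictured with the following sketch; the mapping table param_of_band[] and the piecewise-constant expansion are assumptions for illustration only.
/* Expand per-parameter-band compensation values onto the (finer) bands of
 * the first domain so that they can be applied directly to the transformed
 * downmix.  param_of_band[b] gives the parameter band to which band b of
 * the first domain belongs.                                                 */
static void map_params_to_bands(const float *param_values,   /* nparams      */
                                const int   *param_of_band,  /* nbands       */
                                float       *band_values,    /* nbands (out) */
                                int nbands)
{
    for (int b = 0; b < nbands; b++)
        band_values[b] = param_values[param_of_band[b]];
}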
The first domain may be a frequency domain such as the DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those set forth herein.
A time delay may occur during the conversion of the compensation information. In order to address this problem, the second domain converter 822 may perform a delay compensation operation, so that the time delay between the domain of the compensation information and the first domain can be compensated for.
The compensation/3D rendering processor 823 performs a compensation operation on the arbitrary downmix signal in the first domain using the converted spatial information, and then performs a 3D rendering operation on a signal obtained by the compensation operation. The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation in an order different from that set forth herein.
The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation on the arbitrary downmix signal at the same time. For example, the compensation/3D rendering processor 823 may generate a compensated 3D downmix signal by performing a 3D rendering operation on the arbitrary downmix signal in the first domain using new filter coefficients, which are a combination of the compensation information and the existing filter coefficients ordinarily used in the 3D rendering operation.
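One way to picture such combined coefficients, assuming one complex filter coefficient per band and a real-valued compensation gain per band; both assumptions and the structure names are illustrative only.
typedef struct { float re, im; } cpx;
/* Fold a per-band compensation gain into the HRTF-derived rendering
 * coefficient for that band, so that compensation and 3D rendering are
 * carried out in a single pass over the signal.                          */
static void combine_gain_and_filter(const float *comp_gain,  /* per band */
                                    const cpx   *hrtf_coef,  /* per band */
                                    cpx         *new_coef,   /* per band */
                                    int nbands)
{
    for (int b = 0; b < nbands; b++) {
        new_coef[b].re = comp_gain[b] * hrtf_coef[b].re;
        new_coef[b].im = comp_gain[b] * hrtf_coef[b].im;
    }
}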
The third domain converter 824 converts the domain of the 3D downmix signal generated by the compensation/3D rendering processor 823 into the frequency domain.
Figure 14 is a block diagram of a decoding apparatus 900 for processing a compatible downmix signal according to an embodiment of the present invention. Referring to Figure 14, the decoding apparatus 900 includes a first multi-channel decoder 910, a downmix compatibility processing unit 920, a second multi-channel decoder 930, and a 3D rendering unit 940. A detailed description of decoding processes identical to those of the embodiment of Figure 1 is omitted.
A compatible downmix signal is a downmix signal that can be decoded by two or more multi-channel decoders. In other words, a compatible downmix signal is a downmix signal that is initially optimized for a predetermined multi-channel decoder and can afterwards be converted, by a compatibility processing operation, into a signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder.
Referring to Figure 14, assume that the input compatible downmix signal is optimized for the first multi-channel decoder 910. In order for the second multi-channel decoder 930 to decode the input compatible downmix signal, the downmix compatibility processing unit 920 may perform a compatibility processing operation on the input compatible downmix signal, so that the input compatible downmix signal can be converted into a signal optimized for the second multi-channel decoder 930. The first multi-channel decoder 910 generates a first multi-channel signal by decoding the input compatible downmix signal. The first multi-channel decoder 910 may generate a multi-channel signal by decoding using only the input compatible downmix signal, without spatial information.
The second multi-channel decoder 930 generates a second multi-channel signal using a downmix signal obtained by the compatibility processing operation performed by the downmix compatibility processing unit 920. The 3D rendering unit 940 may generate a decoder 3D downmix signal by performing a 3D rendering operation on the downmix signal obtained by the compatibility processing operation performed by the downmix compatibility processing unit 920.
A compatible downmix signal optimized for a predetermined multi-channel decoder can be converted, using compatibility information such as an inverse matrix, into a downmix signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder. For example, when there are first and second multi-channel encoders using different encoding methods and first and second multi-channel decoders using different encoding/decoding methods, the encoding apparatus may apply a matrix to a downmix signal generated by the first multi-channel encoder, thereby generating a compatible downmix signal optimized for the second multi-channel decoder. The decoding apparatus may then apply an inverse matrix to the compatible downmix signal generated by the encoding apparatus, thereby generating a compatible downmix signal optimized for the first multi-channel decoder.
Referring to Figure 14, the downmix compatibility processing unit 920 may perform a compatibility processing operation on the input compatible downmix signal using an inverse matrix, thereby generating a downmix signal optimized for the second multi-channel decoder 930.
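A minimal sketch of such a compatibility processing operation for a stereo downmix, assuming a 2x2 inverse matrix applied sample by sample (or band by band); the matrix values and the in-place processing are illustrative assumptions.
/* Apply a 2x2 inverse matrix to the (L, R) pair of a stereo compatible
 * downmix so that a downmix optimized for one multi-channel decoder is
 * converted into one optimized for another.  The matrix entries come from
 * the bitstream or from tables stored in the decoding apparatus.          */
static void apply_inverse_matrix(const float inv[2][2], float *left, float *right, int n)
{
    for (int i = 0; i < n; i++) {
        float l = left[i], r = right[i];
        left[i]  = inv[0][0] * l + inv[0][1] * r;
        right[i] = inv[1][0] * l + inv[1][1] * r;
    }
}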
Information regarding the inverse matrix used by the downmix compatibility processing unit 920 may be stored in the decoding apparatus 900 in advance, or may be included in a bitstream transmitted by the encoding apparatus. In addition, information indicating whether the downmix signal included in the input bitstream is an arbitrary downmix signal or a compatible downmix signal may be included in the input bitstream.
Referring to Figure 14, the downmix compatibility processing unit 920 includes a first domain converter 921, a compatibility processor 922, and a second domain converter 923.
The first domain converter 921 converts the domain of the input compatible downmix signal into a predetermined domain, and the compatibility processor 922 performs a compatibility processing operation using compatibility information such as an inverse matrix, so that the input compatible downmix signal in the predetermined domain can be converted into a signal optimized for the second multi-channel decoder 930.
The compatibility processor 922 may perform the compatibility processing operation in the QMF/hybrid domain. To this end, the first domain converter 921 may perform QMF/hybrid analysis on the input compatible downmix signal. The first domain converter 921 may also convert the domain of the input compatible downmix signal into a domain other than the QMF/hybrid domain, for example, a frequency domain such as the DFT or FFT domain, and the compatibility processor 922 may perform the compatibility processing operation in a domain other than the QMF/hybrid domain, such as a frequency domain or the time domain.
The second domain converter 923 converts the domain of the compatible downmix signal obtained by the compatibility processing operation. More specifically, the second domain converter 923 may convert the domain of the compatible downmix signal obtained by the compatibility processing operation into the same domain as that of the original input compatible downmix signal by inversely performing the domain conversion operation performed by the first domain converter 921.
For example, the second domain converter 923 may convert the compatible downmix signal obtained by the compatibility processing operation into a time-domain signal by performing QMF/hybrid synthesis on it. Alternatively, the second domain converter 923 may perform an IDFT or IFFT on the compatible downmix signal obtained by the compatibility processing operation.
The 3D rendering unit 940 may perform a 3D rendering operation on the compatible downmix signal obtained by the compatibility processing operation in a frequency domain, the QMF/hybrid domain, or the time domain. To this end, the 3D rendering unit 940 may include a domain converter (not shown). The domain converter converts the domain of the input compatible downmix signal into a domain in which the 3D rendering operation is to be performed, or converts the domain of a signal obtained by the 3D rendering operation.
The domain in which the compatibility processor 922 performs the compatibility processing operation may be the same as or different from the domain in which the 3D rendering unit 940 performs the 3D rendering operation.
Figure 15 is a block diagram of a downmix compatibility processing/3D rendering unit 950 according to an embodiment of the present invention. Referring to Figure 15, the downmix compatibility processing/3D rendering unit 950 includes a first domain converter 951, a second domain converter 952, a compatibility/3D rendering processor 953, and a third domain converter 954.
The downmix compatibility processing/3D rendering unit 950 performs a compatibility processing operation and a 3D rendering operation in a single domain, thereby reducing the amount of computation of the decoding apparatus.
The first domain converter 951 converts the input compatible downmix signal into a first domain in which the compatibility processing operation and the 3D rendering operation are to be performed. The second domain converter 952 converts spatial information and compatibility information, for example an inverse matrix, so that the spatial information and the compatibility information become applicable to the first domain.
For example, the second domain converter 952 may map an inverse matrix corresponding to a parameter band in the QMF/hybrid domain onto the frequency domain, so that the inverse matrix can easily be applied in the frequency domain.
The first domain may be a frequency domain such as the DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those set forth herein.
A time delay may occur during the conversion of the spatial information and the compatibility information.
In order to address this problem, the second domain converter 952 may perform a delay compensation operation, so that the time delay between the domain of the spatial information and the compatibility information and the first domain can be compensated for.
The compatibility/3D rendering processor 953 performs a compatibility processing operation on the input compatible downmix signal in the first domain using the converted compatibility information, and then performs a 3D rendering operation on the compatible downmix signal obtained by the compatibility processing operation. The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation in an order different from that set forth herein.
The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation on the input compatible downmix signal at the same time. For example, the compatibility/3D rendering processor 953 may generate a 3D downmix signal by performing a 3D rendering operation on the input compatible downmix signal in the first domain using new filter coefficients, which are a combination of the compatibility information and the existing filter coefficients ordinarily used in the 3D rendering operation.
The third domain converter 954 converts the domain of the 3D downmix signal generated by the compatibility/3D rendering processor 953 into the frequency domain.
Figure 16 is a block diagram of a decoding apparatus for cancelling crosstalk according to an embodiment of the present invention. Referring to Figure 16, the decoding apparatus includes a bit unpacking unit 960, a downmix decoder 970, a 3D rendering unit 980, and a crosstalk cancellation unit 990. A detailed description of decoding processes identical to those of the embodiment of Figure 1 is omitted.
A 3D downmix signal output by the 3D rendering unit 980 may be reproduced by headphones. However, when the 3D downmix signal is reproduced by speakers located away from the user, inter-channel crosstalk is likely to occur.
Therefore, the decoding apparatus may include the crosstalk cancellation unit 990, which performs a crosstalk cancellation operation on the 3D downmix signal.
The decoding apparatus may also perform a sound field processing operation.
Sound field information used in the sound field processing operation, i.e., information identifying the space in which the 3D downmix signal is to be reproduced, may be included in the input bitstream transmitted by the encoding apparatus, or may be selected by the decoding apparatus.
The input bitstream may include reverberation time information. A filter used in the sound field processing operation may be controlled according to the reverberation time information.
The sound field processing operation may be performed differently for an early part and a late reverberation part. For example, the early part may be processed using an FIR filter, and the late reverberation part may be processed using an IIR filter.
More specifically, the sound field processing operation may be performed on the early part by performing a convolution operation in the time domain using the FIR filter, or by performing a multiplication operation in the frequency domain and converting the result of the multiplication into the time domain. The sound field processing operation may be performed on the late reverberation part in the time domain.
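A minimal sketch of this split processing, assuming a direct time-domain FIR convolution for the early part (the frequency-domain multiply-and-inverse-transform alternative mentioned above is omitted) and a one-pole IIR feedback for the late reverberation part; the filter lengths and coefficients are illustrative assumptions only.
/* Early part: FIR filtering by direct convolution in the time domain. */
static void fir_early(const float *x, float *y, int n, const float *h, int taps)
{
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int t = 0; t < taps && t <= i; t++)
            acc += h[t] * x[i - t];
        y[i] = acc;
    }
}
/* Late reverberation part: a simple recursive (IIR) tail in the time domain,
 * with feedback coefficient a controlling the decay (reverberation time).   */
static void iir_late(const float *x, float *y, int n, float a)
{
    float state = 0.0f;
    for (int i = 0; i < n; i++) {
        state = x[i] + a * state;
        y[i] = state;
    }
}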
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage, and a carrier wave (for example, data transmission through the Internet). The computer-readable recording medium can be distributed over a plurality of computer systems connected to a network, so that computer-readable code is written thereto and executed therefrom in a decentralized manner. Functional programs, code, and code segments needed for realizing the present invention can easily be construed by one of ordinary skill in the art.
As described above, according to the present invention, it is possible to efficiently encode multi-channel signals with 3D effects, and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of the reproduction environment.
Industrial Applicability
Other implementations are within the scope of the following claims. For example, grouping, data decoding, and entropy decoding according to the present invention can be applied to a wide variety of applications and products. A data storage medium storing data that uses an aspect of the present invention is within the scope of the invention.

Claims (20)

1. A decoding method of recovering a multi-channel signal, the decoding method comprising:
extracting a three-dimensional (3D) downmix signal and spatial information from an input bitstream;
removing 3D effects from the 3D downmix signal by performing a 3D rendering operation on the 3D downmix signal; and
generating a multi-channel signal using the spatial information and a downmix signal obtained by the removal.
2. The decoding method of claim 1, wherein the removal is performed using an inverse filter of a filter used to generate the 3D downmix signal.
3. The decoding method of claim 2, wherein information regarding the filter is extracted from the input bitstream.
4. The decoding method of claim 1, wherein the removal is performed using an inverse function of a head-related transfer function (HRTF) used to generate the 3D downmix signal.
5. The decoding method of claim 4, wherein information regarding coefficients of the HRTF or coefficients of the inverse function of the HRTF is extracted from the input bitstream.
6. The decoding method of claim 1, wherein the input bitstream comprises at least one of information indicating whether the input bitstream includes filter information identifying a filter used to perform the 3D rendering operation, and information indicating whether the filter information specifies an inverse filter of a filter used to generate a 3D downmix signal.
7. The decoding method of claim 1, wherein the removal comprises performing the 3D rendering operation in one of a discrete Fourier transform (DFT) domain, a fast Fourier transform (FFT) domain, a quadrature mirror filter (QMF)/hybrid domain, and a time domain.
8. The decoding method of claim 1, further comprising decoding the 3D downmix signal.
9. A decoding method of recovering a multi-channel signal, the decoding method comprising:
extracting a 3D downmix signal and spatial information from an input bitstream;
generating a multi-channel signal using the 3D downmix signal and the spatial information; and
removing 3D effects from the multi-channel signal by performing a 3D rendering operation on the multi-channel signal.
10. An encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method comprising:
encoding the multi-channel signal into a downmix signal having fewer channels;
generating spatial information regarding the plurality of channels;
generating a 3D downmix signal by performing a 3D rendering operation on the downmix signal; and
generating a bitstream comprising the 3D downmix signal and the spatial information.
11. The encoding method of claim 10, wherein the generation of the 3D downmix signal comprises performing the 3D rendering operation using an HRTF.
12. The encoding method of claim 11, wherein the bitstream comprises at least one of information regarding coefficients of the HRTF and information regarding coefficients of an inverse function of the HRTF.
13. The encoding method of claim 10, wherein the generation of the 3D downmix signal comprises performing the 3D rendering operation in one of a DFT domain, an FFT domain, a QMF/hybrid domain, and a time domain.
14. An encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method comprising:
performing a 3D rendering operation on the multi-channel signal;
encoding a multi-channel signal obtained by the 3D rendering operation into a 3D downmix signal having fewer channels;
generating spatial information regarding the plurality of channels; and
generating a bitstream comprising the 3D downmix signal and the spatial information.
15. A decoding apparatus for recovering a multi-channel signal, the decoding apparatus comprising:
a bit unpacking unit which extracts an encoded 3D downmix signal and spatial information from an input bitstream;
a downmix decoder which decodes the encoded 3D downmix signal;
a 3D rendering unit which removes 3D effects from the decoded 3D downmix signal obtained by the decoding performed by the downmix decoder, by performing a 3D rendering operation on the decoded 3D downmix signal; and
a multi-channel decoder which generates a multi-channel signal using the spatial information and a downmix signal obtained by the removal performed by the 3D rendering unit.
16. A decoding apparatus for recovering a multi-channel signal, the decoding apparatus comprising:
a bit unpacking unit which extracts an encoded 3D downmix signal and spatial information from an input bitstream;
a downmix decoder which decodes the encoded 3D downmix signal;
a multi-channel decoder which generates a multi-channel signal using the spatial information and a 3D downmix signal obtained by the decoding performed by the downmix decoder; and
a 3D rendering unit which removes 3D effects from the multi-channel signal by performing a 3D rendering operation on the multi-channel signal.
17. An encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus comprising:
a multi-channel encoder which encodes the multi-channel signal into a downmix signal having fewer channels and generates spatial information regarding the plurality of channels;
a 3D rendering unit which generates a 3D downmix signal by performing a 3D rendering operation on the downmix signal;
a downmix encoder which encodes the 3D downmix signal; and
a bit packing unit which generates a bitstream comprising the encoded 3D downmix signal and the spatial information.
18. An encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus comprising:
a 3D rendering unit which performs a 3D rendering operation on the multi-channel signal;
a multi-channel encoder which encodes a multi-channel signal obtained by the 3D rendering operation into a 3D downmix signal having fewer channels and generates spatial information regarding the plurality of channels;
a downmix encoder which encodes the 3D downmix signal; and
a bit packing unit which generates a bitstream comprising the encoded 3D downmix signal and the spatial information.
19. A computer-readable recording medium having recorded thereon a computer program for executing the decoding method of any one of claims 1 to 9 or the encoding method of any one of claims 10 to 14.
20. A bitstream comprising:
a data field containing information regarding a 3D downmix signal;
a filter information field containing filter information identifying a filter used to generate the 3D downmix signal;
a first header field containing information indicating whether the filter information field contains the filter information;
a second header field containing information indicating whether the filter information field contains coefficients of the filter or coefficients of an inverse filter of the filter; and
a spatial information field containing spatial information regarding a plurality of channels.
CN2007800045157A 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal Active CN101385076B (en)

Applications Claiming Priority (17)

Application Number Priority Date Filing Date Title
US76574706P 2006-02-07 2006-02-07
US60/765,747 2006-02-07
US77147106P 2006-02-09 2006-02-09
US60/771,471 2006-02-09
US77333706P 2006-02-15 2006-02-15
US60/773,337 2006-02-15
US77577506P 2006-02-23 2006-02-23
US60/775,775 2006-02-23
US78175006P 2006-03-14 2006-03-14
US60/781,750 2006-03-14
US78251906P 2006-03-16 2006-03-16
US60/782,519 2006-03-16
US79232906P 2006-04-17 2006-04-17
US60/792,329 2006-04-17
US79365306P 2006-04-21 2006-04-21
US60/793,653 2006-04-21
PCT/KR2007/000668 WO2007091842A1 (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Publications (2)

Publication Number Publication Date
CN101385076A true CN101385076A (en) 2009-03-11
CN101385076B CN101385076B (en) 2012-11-28

Family

ID=40422032

Family Applications (7)

Application Number Title Priority Date Filing Date
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045157A Active CN101385076B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications After (5)

Application Number Title Priority Date Filing Date
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Country Status (1)

Country Link
CN (7) CN101385077B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102884789A (en) * 2010-05-11 2013-01-16 瑞典爱立信有限公司 Video signal compression coding

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2550525C2 (en) * 2009-04-08 2015-05-10 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Форшунг Е.Ф. Hardware unit, method and computer programme for expansion conversion of compressed audio signal using smoothed phase value
JP2011217139A (en) * 2010-03-31 2011-10-27 Sony Corp Signal processing device and method, and program
US9865269B2 (en) 2012-07-19 2018-01-09 Nokia Technologies Oy Stereo audio signal encoder
EP2757559A1 (en) 2013-01-22 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
US9549276B2 (en) 2013-03-29 2017-01-17 Samsung Electronics Co., Ltd. Audio apparatus and audio providing method thereof
US10499176B2 (en) 2013-05-29 2019-12-03 Qualcomm Incorporated Identifying codebooks to use when coding spatial components of a sound field
WO2015060652A1 (en) * 2013-10-22 2015-04-30 연세대학교 산학협력단 Method and apparatus for processing audio signal
US9922656B2 (en) * 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
BR112016028215B1 (en) * 2014-05-30 2022-08-23 Qualcomm Incorporated GETTING SCATTERED INFORMATION FOR HIGHER ORDER AMBISSONIC AUDIO RENDERERS
US10140996B2 (en) * 2014-10-10 2018-11-27 Qualcomm Incorporated Signaling layers for scalable coding of higher order ambisonic audio data
CN111970629B (en) 2015-08-25 2022-05-17 杜比实验室特许公司 Audio decoder and decoding method
US10074373B2 (en) * 2015-12-21 2018-09-11 Qualcomm Incorporated Channel adjustment for inter-frame temporal shift variations
CN108039175B (en) 2018-01-29 2021-03-26 北京百度网讯科技有限公司 Voice recognition method and device and server
CN113035209B (en) * 2021-02-25 2023-07-04 北京达佳互联信息技术有限公司 Three-dimensional audio acquisition method and three-dimensional audio acquisition device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK0912077T3 (en) * 1994-02-25 2002-02-18 Henrik Moller Binaural synthesis, head-related transfer functions and their applications
EP1072089B1 (en) * 1998-03-25 2011-03-09 Dolby Laboratories Licensing Corp. Audio signal processing method and apparatus
DE19847689B4 (en) * 1998-10-15 2013-07-11 Samsung Electronics Co., Ltd. Apparatus and method for three-dimensional sound reproduction
US6574339B1 (en) * 1998-10-20 2003-06-03 Samsung Electronics Co., Ltd. Three-dimensional sound reproducing apparatus for multiple listeners and method thereof
EP1211857A1 (en) * 2000-12-04 2002-06-05 STMicroelectronics N.V. Process and device of successive value estimations of numerical symbols, in particular for the equalization of a data communication channel of information in mobile telephony
US7583805B2 (en) * 2004-02-12 2009-09-01 Agere Systems Inc. Late reverberation-based synthesis of auditory scenes
EP1315148A1 (en) * 2001-11-17 2003-05-28 Deutsche Thomson-Brandt Gmbh Determination of the presence of ancillary data in an audio bitstream
ATE426235T1 (en) * 2002-04-22 2009-04-15 Koninkl Philips Electronics Nv DECODING DEVICE WITH DECORORATION UNIT
KR100773539B1 (en) * 2004-07-14 2007-11-05 삼성전자주식회사 Multi channel audio data encoding/decoding method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102884789A (en) * 2010-05-11 2013-01-16 瑞典爱立信有限公司 Video signal compression coding
CN102884789B (en) * 2010-05-11 2017-04-12 瑞典爱立信有限公司 Video signal compression coding

Also Published As

Publication number Publication date
CN101379552B (en) 2013-06-19
CN101379552A (en) 2009-03-04
CN101379555A (en) 2009-03-04
CN101379553B (en) 2012-02-29
CN101385075A (en) 2009-03-11
CN101379554A (en) 2009-03-04
CN101385075B (en) 2015-04-22
CN101385076B (en) 2012-11-28
CN101385077B (en) 2012-04-11
CN101385077A (en) 2009-03-11
CN101379553A (en) 2009-03-04
CN101379554B (en) 2012-09-19
CN101379555B (en) 2013-03-13

Similar Documents

Publication Publication Date Title
CN101379555B (en) Apparatus and method for encoding/decoding signal
US9626976B2 (en) Apparatus and method for encoding/decoding signal
RU2406164C2 (en) Signal coding/decoding device and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1128810

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1128810

Country of ref document: HK