CN101385075B - Apparatus and method for encoding/decoding signal - Google Patents

Apparatus and method for encoding/decoding signal

Info

Publication number
CN101385075B
Authority
CN
China
Prior art keywords
audio signal
down-mix signal
information
bit stream
Prior art date
Legal status
Active
Application number
CN200780004505.3A
Other languages
Chinese (zh)
Other versions
CN101385075A (en)
Inventor
郑亮源
房熙锡
吴贤午
金东秀
林宰显
Current Assignee
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date
Filing date
Publication date
Application filed by LG Electronics Inc
Priority to CN201510128054.0A (CN104681030B)
Priority claimed from PCT/KR2007/000675 (WO2007091848A1)
Publication of CN101385075A
Application granted
Publication of CN101385075B

Links

Abstract

An encoding method and apparatus and a decoding method and apparatus are provided. The decoding method includes extracting a three-dimensional (3D) down-mix signal and spatial information from an input bitstream, removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal, and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal. Accordingly, it is possible to efficiently encode multi-channel signals with 3D effects and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.

Description

Apparatus and method for encoding/decoding a signal
Technical field
The present invention relates to an encoding/decoding method and an encoding/decoding apparatus, and more particularly, to an encoding/decoding apparatus capable of encoding/decoding an audio signal so that three-dimensional (3D) sound effects can be produced, and to an encoding/decoding method using the apparatus.
Background art
A multi-channel signal is down-mixed by an encoding apparatus into a signal having fewer channels, and the down-mix signal is transmitted to a decoding apparatus. The decoding apparatus then restores a multi-channel signal from the down-mix signal and reproduces the restored multi-channel signal using three or more loudspeakers, such as a 5.1-channel loudspeaker system.
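As a minimal illustration of down-mixing (not the patent's own method), the sketch below folds a 5.1-channel frame into stereo. The -3 dB centre/surround gains and the channel naming are assumptions made for this example only.

```python
import numpy as np

def downmix_51_to_stereo(ch):
    """Down-mix a 5.1 signal to stereo.

    `ch` maps channel names to equal-length sample arrays. The gains are
    the common ITU-R BS.775-style coefficients, used here purely for
    illustration; the patent does not prescribe specific gains.
    """
    g = 1.0 / np.sqrt(2.0)  # -3 dB for centre and surround channels
    left = ch["L"] + g * ch["C"] + g * ch["Ls"]
    right = ch["R"] + g * ch["C"] + g * ch["Rs"]
    return left, right

# Toy 5.1 input: one sample per channel (the LFE is typically dropped).
sig = {k: np.array([1.0]) for k in ("L", "R", "C", "Ls", "Rs", "LFE")}
L, R = downmix_51_to_stereo(sig)
```

Because every input channel is identical here, the two output channels come out equal; real material would of course differ per channel.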
A multi-channel signal may also be reproduced by 2-channel loudspeakers such as headphones. In this case, in order to make the user feel as if the sound output by the 2-channel loudspeakers were reproduced from three or more sound sources, it is necessary to develop three-dimensional (3D) processing techniques capable of encoding or decoding a multi-channel signal so that 3D effects can be produced.
Summary of the invention
Technical problem
The present invention provides an encoding/decoding apparatus and an encoding/decoding method that can efficiently process signals with 3D effects and reproduce multi-channel signals in various reproduction environments.
Technical solution
According to an aspect of the present invention, there is provided a decoding method of decoding a signal, the decoding method including: skipping extension information included in an input bitstream; extracting a three-dimensional (3D) down-mix signal and spatial information from the input bitstream; removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal; and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal.
According to another aspect of the present invention, there is provided a decoding method of decoding a signal, the decoding method including: skipping extension information included in an input bitstream; extracting a down-mix signal and spatial information from the input bitstream; and generating a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
According to another aspect of the present invention, there is provided an encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method including: encoding the multi-channel signal into a down-mix signal having fewer channels and generating spatial information about the plurality of channels; generating extension information including at least one of channel expansion information and residual information; generating a bitstream including the spatial information and the extension information; and inserting into the bitstream skip information necessary for skipping the extension information.
According to an aspect of the present invention, there is provided a decoding apparatus for decoding a signal, the decoding apparatus including: a bit unpacking unit which skips extension information included in an input bitstream and extracts a 3D down-mix signal and spatial information from the input bitstream; a 3D rendering unit which removes 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal; and a multi-channel decoder which generates a multi-channel signal using the spatial information and the down-mix signal obtained by the removal performed by the 3D rendering unit.
According to another aspect of the present invention, there is provided a decoding apparatus for decoding a signal, the decoding apparatus including: a bit unpacking unit which skips extension information included in an input bitstream and extracts a down-mix signal and spatial information from the input bitstream; and a 3D rendering unit which generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
According to another aspect of the present invention, there is provided an encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus including: a multi-channel encoder which encodes the multi-channel signal into a down-mix signal having fewer channels and generates spatial information about the plurality of channels; an extension information generation unit which generates extension information including at least one of channel expansion information and residual information; and a bit packing unit which generates a bitstream including the spatial information, the extension information, and skip information necessary for skipping the extension information.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing any one of the above decoding methods or the above encoding method.
Beneficial effect
According to the present invention, can encode efficiently and there is the multi-channel signal of 3D effect, and recover adaptively and reproducing audio signal with optimum tonequality according to the characteristic of reproducing environment.
Brief description of the drawings
Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention;
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention;
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention;
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 7 is a block diagram of a three-dimensional (3D) rendering apparatus according to an embodiment of the present invention;
Figs. 8 to 11 illustrate bitstreams according to embodiments of the present invention;
Fig. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal, according to an embodiment of the present invention;
Fig. 13 is a block diagram of an arbitrary down-mix signal compensation/3D rendering unit according to an embodiment of the present invention;
Fig. 14 is a block diagram of a decoding apparatus for processing a compatible down-mix signal, according to an embodiment of the present invention;
Fig. 15 is a block diagram of a down-mix compatibility processing/3D rendering unit according to an embodiment of the present invention; and
Fig. 16 is a block diagram of a decoding apparatus for canceling crosstalk, according to an embodiment of the present invention.
Best mode for carrying out the invention
The present invention will hereinafter be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention. Referring to Fig. 1, an encoding unit 100 includes a multi-channel encoder 110, a three-dimensional (3D) rendering unit 120, a down-mix encoder 130, and a bit packing unit 140.
The multi-channel encoder 110 down-mixes a multi-channel signal having a plurality of channels into a down-mix signal, such as a stereo or mono signal, and generates spatial information about the channels of the multi-channel signal. The spatial information is needed to restore the multi-channel signal from the down-mix signal.
Examples of the spatial information include a channel level difference (CLD), which indicates the difference between the energy levels of a pair of channels; channel prediction coefficients (CPC), which are predictive coefficients used to generate a 3-channel signal based on a 2-channel signal; inter-channel correlation (ICC), which indicates the correlation between a pair of channels; and a channel time difference (CTD), which is the time interval between a pair of channels.
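The CLD and ICC parameters above can be sketched with their standard definitions: an energy ratio in decibels for CLD and a normalised cross-correlation for ICC. The whole-frame, single-band implementation below is an illustrative assumption; an actual codec computes these per parameter band.

```python
import numpy as np

def channel_level_difference(x1, x2, eps=1e-12):
    """CLD in dB: ratio of the energies of a channel pair."""
    return 10.0 * np.log10((np.sum(x1 ** 2) + eps) / (np.sum(x2 ** 2) + eps))

def inter_channel_correlation(x1, x2, eps=1e-12):
    """ICC: normalised correlation between a channel pair."""
    num = np.sum(x1 * x2)
    den = np.sqrt(np.sum(x1 ** 2) * np.sum(x2 ** 2)) + eps
    return num / den

t = np.linspace(0, 1, 1000, endpoint=False)
a = np.sin(2 * np.pi * 5 * t)
b = 0.5 * a  # same waveform at half the amplitude

cld = channel_level_difference(a, b)   # ≈ 6.02 dB (b has 1/4 the energy of a)
icc = inter_channel_correlation(a, b)  # ≈ 1.0 (fully correlated)
```

A decoder given only the down-mix plus these parameters can re-distribute energy and correlation between the restored channels.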
The 3D rendering unit 120 generates a 3D down-mix signal based on the down-mix signal. The 3D down-mix signal may be a 2-channel signal having three or more directivities, and can therefore be reproduced with 3D effects by 2-channel loudspeakers such as headphones. In other words, the 3D down-mix signal can be reproduced by 2-channel loudspeakers so that the user feels as if it were being reproduced from a sound source having three or more channels. The direction of a sound source can be determined based on at least one of the difference between the intensities of two sounds respectively input to the two ears, the time interval between the two sounds, and the difference between the phases of the two sounds. Accordingly, the 3D rendering unit 120 can convert the down-mix signal into the 3D down-mix signal based on how humans determine the 3D position of a sound source with their sense of hearing.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a filter. In this case, filter-related information, such as filter coefficients, may be input to the 3D rendering unit 120 by an external source. The 3D rendering unit 120 may use the spatial information provided by the multi-channel encoder 110 to generate the 3D down-mix signal based on the down-mix signal. More specifically, the 3D rendering unit 120 may convert the down-mix signal into an imaginary multi-channel signal using the spatial information, and may then convert the imaginary multi-channel signal into the 3D down-mix signal by filtering it.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a head-related transfer function (HRTF) filter.
An HRTF is a transfer function which describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and whose value varies according to the direction and altitude of the sound source. If a signal with no directivity is filtered using the HRTF, the signal sounds as if it were being reproduced from a certain direction.
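HRTF filtering of a non-directional signal can be illustrated as convolution with a pair of head-related impulse responses (the time-domain counterpart of the HRTF). The 3-tap HRIRs below are invented for illustration; real HRIRs are measured responses hundreds of taps long.

```python
import numpy as np

def hrtf_render(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right HRIR pair.

    Convolving with head-related impulse responses makes a
    non-directional signal appear to arrive from the direction the
    pair was measured at.
    """
    return (np.convolve(mono, hrir_left),
            np.convolve(mono, hrir_right))

mono = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse as a toy input
# Toy HRIRs: the right ear receives the sound later and attenuated,
# mimicking a source to the listener's left.
left, right = hrtf_render(mono,
                          np.array([1.0, 0.3, 0.1]),
                          np.array([0.0, 0.6, 0.2]))
```

Feeding an impulse through simply reproduces each HRIR at the corresponding ear, which is the level-plus-delay cue described in the preceding paragraph.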
The 3D rendering unit 120 may perform the 3D rendering operation in a frequency domain, such as a discrete Fourier transform (DFT) domain or a fast Fourier transform (FFT) domain. In this case, the 3D rendering unit 120 may perform DFT or FFT before the 3D rendering operation, and may perform inverse DFT (IDFT) or inverse FFT (IFFT) after the 3D rendering operation.
The 3D rendering unit 120 may also perform the 3D rendering operation in a quadrature mirror filter (QMF)/hybrid domain. In this case, the 3D rendering unit 120 may perform QMF/hybrid analysis and synthesis operations before or after the 3D rendering operation.
The 3D rendering unit 120 may also perform the 3D rendering operation in the time domain. The 3D rendering unit 120 may determine in which domain to perform the 3D rendering operation according to the required sound quality and the operational capability of the encoding/decoding apparatus.
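Rendering in the DFT/FFT domain, as described above, amounts to transforming, multiplying by the filter's frequency response, and inverse-transforming. A minimal sketch, with zero-padding chosen so the circular convolution matches ordinary linear filtering:

```python
import numpy as np

def fft_filter(signal, h):
    """Apply a FIR filter in the FFT domain.

    Forward transform, multiply by the filter's frequency response,
    inverse transform: the frequency-domain analogue of time-domain
    convolution. Zero-padding to len(signal) + len(h) - 1 makes the
    circular convolution equal the linear one.
    """
    n = len(signal) + len(h) - 1
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(h, n)
    return np.fft.irfft(spectrum, n)

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, -1.0])   # simple difference filter
y = fft_filter(x, h)        # matches np.convolve(x, h): [1, 1, 1, -3]
```

For long HRTF filters this route is cheaper than direct convolution, which is one reason a frequency domain may be preferred when capability allows.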
The down-mix encoder 130 encodes the down-mix signal output by the multi-channel encoder 110 or the 3D down-mix signal output by the 3D rendering unit 120. The down-mix encoder 130 may encode these signals using an audio coding method such as an advanced audio coding (AAC) method, an MPEG layer-3 (MP3) method, or a bit-sliced arithmetic coding (BSAC) method.
The down-mix encoder 130 may encode either a non-3D down-mix signal or a 3D down-mix signal. In this case, both the encoded non-3D down-mix signal and the encoded 3D down-mix signal may be included in a bitstream to be transmitted.
The bit packing unit 140 generates a bitstream based on the spatial information and either the encoded non-3D down-mix signal or the encoded 3D down-mix signal.
The bitstream generated by the bit packing unit 140 may include the spatial information, down-mix identification information indicating whether the down-mix signal included in the bitstream is a non-3D down-mix signal or a 3D down-mix signal, and information identifying the filter used by the 3D rendering unit 120 (for example, HRTF coefficient information).
In other words, the bitstream generated by the bit packing unit 140 may include at least one of a non-3D down-mix signal not subjected to 3D processing and an encoder 3D down-mix signal obtained by the 3D processing performed by the encoding apparatus, together with down-mix identification information identifying the type of the down-mix signal included in the bitstream.
Which of the non-3D down-mix signal and the encoder 3D down-mix signal is included in the bitstream generated by the bit packing unit 140 may be determined by a user's selection or according to the capabilities of the encoding/decoding apparatus of Fig. 1 and the characteristics of the reproduction environment.
The HRTF coefficient information may include the coefficients of the inverse function of the HRTF used by the 3D rendering unit 120. Alternatively, the HRTF coefficient information may include only brief information about the coefficients of the HRTF used by the 3D rendering unit 120, for example, envelope information of the HRTF coefficients. If a bitstream including the coefficients of the inverse function of the HRTF is transmitted to a decoding apparatus, the decoding apparatus need not perform an HRTF coefficient conversion operation, and its amount of computation can thus be reduced.
The bitstream generated by the bit packing unit 140 may also include information about energy variations in the signal caused by the HRTF-based filtering, that is, information about the difference between, or the ratio of, the energy of the signal to be filtered and the energy of the filtered signal.
The bitstream generated by the bit packing unit 140 may also include information indicating whether it contains HRTF coefficients. If HRTF coefficients are included, the bitstream may further include information indicating whether it contains the coefficients of the HRTF used by the 3D rendering unit 120 or the coefficients of the inverse function of that HRTF.
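The down-mix identification information and the HRTF-coefficient indications described above are, in essence, header flags. The one-byte layout below is purely hypothetical (it is not the bitstream syntax defined by the patent or by any standard); it only shows how such flags might be packed and parsed.

```python
import struct

# Hypothetical down-mix type codes, invented for this illustration.
DOWNMIX_NON_3D, DOWNMIX_ENCODER_3D, DOWNMIX_BOTH = 0, 1, 2

def pack_header(downmix_type, has_hrtf_coefs, hrtf_is_inverse):
    """Pack three header fields into one byte (assumed layout):
    bits 0-1: down-mix type, bit 2: HRTF coefficients present,
    bit 3: the coefficients are those of the inverse HRTF."""
    flags = (downmix_type & 0x03) \
          | (int(has_hrtf_coefs) << 2) \
          | (int(hrtf_is_inverse) << 3)
    return struct.pack("B", flags)

def parse_header(data):
    """Recover the three fields from the packed byte."""
    (flags,) = struct.unpack("B", data)
    return {"downmix_type": flags & 0x03,
            "has_hrtf_coefs": bool(flags >> 2 & 1),
            "hrtf_is_inverse": bool(flags >> 3 & 1)}

hdr = pack_header(DOWNMIX_ENCODER_3D, True, True)
info = parse_header(hdr)
```

A decoder would branch on `downmix_type` to decide whether a 3D rendering (or 3D-removal) step is needed, mirroring the decision logic described for the decoding apparatuses below.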
Referring to Fig. 1, a first decoding apparatus 200 includes a bit unpacking unit 210, a down-mix decoder 220, a 3D rendering unit 230, and a multi-channel decoder 240.
The bit unpacking unit 210 receives an input bitstream from the encoding unit 100 and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 220 decodes the encoded down-mix signal, and may do so using an audio signal decoding method such as the AAC method, the MP3 method, or the BSAC method.
As described above, the encoded down-mix signal extracted from the input bitstream may be either an encoded non-3D down-mix signal or an encoded encoder 3D down-mix signal. Information indicating which of the two it is may be included in the input bitstream.
If the encoded down-mix signal extracted from the input bitstream is an encoder 3D down-mix signal, it can readily be reproduced after being decoded by the down-mix decoder 220.
On the other hand, if the encoded down-mix signal extracted from the input bitstream is a non-3D down-mix signal, it can be decoded by the down-mix decoder 220, and the down-mix signal obtained by the decoding can be converted into a decoder 3D down-mix signal by a 3D rendering operation performed by a third renderer 233. The decoder 3D down-mix signal can then readily be reproduced.
The 3D rendering unit 230 includes a first renderer 231, a second renderer 232, and a third renderer 233. The first renderer 231 generates a down-mix signal by performing a 3D rendering operation on the encoder 3D down-mix signal provided by the down-mix decoder 220; for example, it may generate a non-3D down-mix signal by removing 3D effects from the encoder 3D down-mix signal. The 3D effects of the encoder 3D down-mix signal may not be completely removed by the first renderer 231, in which case the down-mix signal output by the first renderer 231 may retain some 3D effect.
The first renderer 231 may convert the 3D down-mix signal provided by the down-mix decoder 220 into a down-mix signal from which the 3D effects have been removed, using the inverse filter of the filter used by the 3D rendering unit 120 of the encoding unit 100. Information about the filter used by the 3D rendering unit 120, or about its inverse filter, may be included in the input bitstream.
The filter used by the 3D rendering unit 120 may be an HRTF filter. In this case, the coefficients of the HRTF used by the encoding unit 100, or the coefficients of the inverse function of that HRTF, may also be included in the input bitstream. If the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream, they can be inversely converted, and the result of the inverse conversion can be used during the 3D rendering operation performed by the first renderer 231. If the coefficients of the inverse function of the HRTF are included in the input bitstream, they can readily be used during that 3D rendering operation without any inverse conversion, and the amount of computation of the first decoding apparatus 200 can thus be reduced.
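Deriving a usable inverse of the encoder's HRTF filter, as the first renderer 231 requires when only the forward coefficients are transmitted, can be sketched by regularised inversion of the filter's frequency response. This is a simplifying assumption for a well-behaved toy filter; practical HRTF inversion must handle non-minimum-phase responses with far more care.

```python
import numpy as np

def inverse_filter(h, n_fft=64, eps=1e-3):
    """Approximate inverse of a FIR filter via regularised
    frequency-domain inversion: H_inv = conj(H) / (|H|^2 + eps).
    The small eps keeps the division stable where |H| is small."""
    H = np.fft.rfft(h, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(H_inv, n_fft)

h = np.array([1.0, 0.5])   # toy "HRTF": a short, well-behaved FIR
h_inv = inverse_filter(h)

# Filtering with h and then h_inv should approximately restore the input.
x = np.zeros(32)
x[0] = 1.0                  # unit impulse
y = np.convolve(np.convolve(x, h), h_inv)
```

The cascade of the filter and its computed inverse is close to an identity, which is exactly the property the first renderer relies on when undoing the encoder's 3D processing.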
The input bitstream may also include filter information (for example, information indicating whether the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream) and information indicating whether the filter information has been inversely converted.
The multi-channel decoder 240 generates a 3D multi-channel signal having three or more channels based on the down-mix signal from which the 3D effects have been removed and the spatial information extracted from the input bitstream.
The second renderer 232 may generate a 3D down-mix signal with 3D effects by performing a 3D rendering operation on the down-mix signal from which the 3D effects have been removed. In other words, the first renderer 231 removes the 3D effects from the encoder 3D down-mix signal provided by the down-mix decoder 220, and the second renderer 232 may then generate a combined 3D down-mix signal with the 3D effects desired by the first decoding apparatus 200 by performing a 3D rendering operation, using a filter of the first decoding apparatus 200, on the down-mix signal obtained by the removal performed by the first renderer 231.
The first decoding apparatus 200 may include a single renderer combining the operations of two or more of the first, second, and third renderers 231, 232, and 233.
The bitstream generated by the encoding unit 100 may also be input to a second decoding apparatus 300 having a structure different from that of the first decoding apparatus 200. The second decoding apparatus 300 may generate a 3D down-mix signal based on the down-mix signal included in the bitstream input thereto.
More specifically, the second decoding apparatus 300 includes a bit unpacking unit 310, a down-mix decoder 320, and a 3D rendering unit 330. The bit unpacking unit 310 receives an input bitstream from the encoding unit 100 and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 320 decodes the encoded down-mix signal. The 3D rendering unit 330 performs a 3D rendering operation on the decoded down-mix signal so that the decoded down-mix signal can be converted into a 3D down-mix signal.
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention. Referring to Fig. 2, the encoding apparatus includes 3D rendering units 400 and 420 and a multi-channel encoder 410. A detailed description of the parts of the encoding process that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 2, the 3D rendering units 400 and 420 may be disposed in front of and behind the multi-channel encoder 410, respectively. Accordingly, a multi-channel signal may be 3D-rendered by the 3D rendering unit 400, and the 3D-rendered multi-channel signal may then be encoded by the multi-channel encoder 410, thereby generating a pre-processed encoder 3D down-mix signal. Alternatively, the multi-channel signal may be down-mixed by the multi-channel encoder 410, and the down-mix signal may then be 3D-rendered by the 3D rendering unit 420, thereby generating a post-processed encoder 3D down-mix signal.
Information indicating whether the multi-channel signal was 3D-rendered before or after the down-mixing may be included in the bitstream to be transmitted.
Both of the 3D rendering units 400 and 420 may be disposed in front of, or behind, the multi-channel encoder 410.
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention. Referring to Fig. 3, the decoding apparatus includes 3D rendering units 430 and 450 and a multi-channel decoder 440. A detailed description of the parts of the decoding process that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 3, the 3D rendering units 430 and 450 may be disposed in front of and behind the multi-channel decoder 440, respectively. The 3D rendering unit 430 may remove 3D effects from an encoder 3D down-mix signal and input the down-mix signal obtained by the removal to the multi-channel decoder 440. The multi-channel decoder 440 may then decode the down-mix signal input thereto, thereby generating a pre-processed 3D multi-channel signal. Alternatively, the multi-channel decoder 440 may restore a multi-channel signal from the encoded 3D down-mix signal, and the 3D rendering unit 450 may remove 3D effects from the restored multi-channel signal, thereby generating a post-processed 3D multi-channel signal.
If the encoder 3D down-mix signal provided by the encoding apparatus was generated by performing a 3D rendering operation and then a down-mixing operation, it can be decoded by performing a multi-channel decoding operation and then a 3D rendering operation. On the other hand, if the encoder 3D down-mix signal was generated by performing a down-mixing operation and then a 3D rendering operation, it can be decoded by performing a 3D rendering operation and then a multi-channel decoding operation.
Information indicating whether the encoded 3D down-mix signal was obtained by performing the 3D rendering operation before or after the down-mixing operation may be extracted from the bitstream transmitted by the encoding apparatus.
Both of the 3D rendering units 430 and 450 may be disposed in front of, or behind, the multi-channel decoder 440.
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention. Referring to Fig. 4, the encoding apparatus includes a multi-channel encoder 500, a 3D rendering unit 510, a down-mix encoder 520, and a bit packing unit 530. A detailed description of the parts of the encoding process that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 4, the multi-channel encoder 500 generates a down-mix signal and spatial information based on an input multi-channel signal. The 3D rendering unit 510 generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
Whether to perform a 3D rendering operation on the down-mix signal may be determined by a user's selection or according to the capability of the encoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix encoder 520 encodes the down-mix signal generated by the multi-channel encoder 500 or the 3D down-mix signal generated by the 3D rendering unit 510.
The bit packing unit 530 generates a bitstream based on the spatial information and either the encoded down-mix signal or the encoded encoder 3D down-mix signal. The bitstream generated by the bit packing unit 530 may include down-mix identification information indicating whether the encoded down-mix signal included in the bitstream is a non-3D down-mix signal without 3D effects or an encoder 3D down-mix signal with 3D effects. More specifically, the down-mix identification information may indicate whether the bitstream generated by the bit packing unit 530 includes a non-3D down-mix signal, an encoder 3D down-mix signal, or both.
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 5, the decoding apparatus includes a bit unpacking unit 540, a down-mix decoder 550, and a 3D rendering unit 560. A detailed description of the parts of the decoding process that are the same as in the embodiment of Fig. 1 will be omitted.
Referring to Fig. 5, the bit unpacking unit 540 extracts an encoded down-mix signal, spatial information, and down-mix identification information from an input bitstream. The down-mix identification information indicates whether the encoded down-mix signal is an encoded non-3D down-mix signal without 3D effects or an encoded 3D down-mix signal with 3D effects.
If the input bitstream includes both a non-3D down-mix signal and a 3D down-mix signal, only one of them may be selected or extracted from the input bitstream, by a user's selection or according to the capability of the decoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix decoder 550 decodes the encoded down-mix signal. If the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is an encoder 3D down-mix signal obtained by performing a 3D rendering operation, it can readily be reproduced.
On the other hand, if the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is a down-mix signal without 3D effects, the 3D rendering unit 560 may generate a decoder 3D down-mix signal by performing a 3D rendering operation on that down-mix signal.
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 6, the decoding apparatus includes a bit unpacking unit 600, a down-mix decoder 610, a first 3D rendering unit 620, a second 3D rendering unit 630, and a filter information storage unit 640. A detailed description of the parts of the decoding process that are the same as in the embodiment of Fig. 1 will be omitted.
The bit unpacking unit 600 extracts an encoded encoder 3D down-mix signal and spatial information from an input bitstream. The down-mix decoder 610 decodes the encoded encoder 3D down-mix signal.
The first 3D rendering unit 620 removes 3D effects from the encoder 3D down-mix signal obtained by the decoding performed by the down-mix decoder 610, using the inverse filter of the filter used by the encoding apparatus to perform its 3D rendering operation. The second 3D rendering unit 630 generates a combined 3D down-mix signal with 3D effects by performing a 3D rendering operation, using a filter stored in the decoding apparatus, on the down-mix signal obtained by the removal performed by the first 3D rendering unit 620.
The second 3D rendering unit 630 may perform the 3D rendering operation using a filter whose characteristics differ from those of the filter used by the encoding apparatus. For example, the second 3D rendering unit 630 may perform the 3D rendering operation using an HRTF whose coefficients differ from those of the HRTF used by the encoding apparatus.
The filter information storage unit 640 stores filter information about the filter used to perform the 3D rendering, for example, HRTF coefficient information. The second 3D rendering unit 630 may generate the combined 3D down-mix signal using the filter information stored in the filter information storage unit 640.
The filter information storage unit 640 may store a plurality of pieces of filter information respectively corresponding to a plurality of filters. In this case, one of the pieces of filter information may be selected by a user or according to the capability of the decoding apparatus or the required sound quality.
People of different races may have different ear structures, so the HRTF coefficients optimized for different individuals may differ from one another. The decoding apparatus shown in Fig. 6 can generate a 3D down-mix signal optimized for the user. In addition, the decoding apparatus shown in Fig. 6 can generate a 3D down-mix signal having the 3D effects of the HRTF filter desired by the user, regardless of the type of HRTF provided by the supplier of the 3D down-mix signal.
FIG. 7 is a block diagram of a 3D rendering apparatus according to an embodiment of the present invention. Referring to FIG. 7, the 3D rendering apparatus includes first and second domain converting units 700 and 720 and a 3D rendering unit 710. In order to perform the 3D rendering operation in a predetermined domain, the first and second domain converting units 700 and 720 may be disposed before and after the 3D rendering unit 710, respectively.
Referring to FIG. 7, an input down-mix signal may be converted into a frequency-domain down-mix signal by the first domain converting unit 700. More specifically, the first domain converting unit 700 may convert the input down-mix signal into a DFT-domain or FFT-domain down-mix signal by performing a DFT or an FFT.
The 3D rendering unit 710 generates a multi-channel signal by applying spatial information to the frequency-domain down-mix signal provided by the first domain converting unit 700. Thereafter, the 3D rendering unit 710 generates a 3D down-mix signal by filtering the multi-channel signal.
The 3D down-mix signal generated by the 3D rendering unit 710 is converted into a time-domain 3D down-mix signal by the second domain converting unit 720. More specifically, the second domain converting unit 720 may perform an IDFT or an IFFT on the 3D down-mix signal generated by the 3D rendering unit 710.
During the conversion of the frequency-domain 3D down-mix signal into a time-domain 3D down-mix signal, data loss or data distortion such as aliasing may occur.
In order to generate a multi-channel signal and a 3D down-mix signal in the frequency domain, the spatial information of each parameter band may be mapped to the frequency domain, and a plurality of filter coefficients may be converted to the frequency domain.
The 3D rendering unit 710 may generate a 3D down-mix signal by multiplying the frequency-domain down-mix signal provided by the first domain converting unit 700, the spatial information, and the filter coefficients.
A time-domain signal obtained by multiplying a down-mix signal, spatial information, and filter coefficients that are all represented in an M-point frequency domain has M valid signals. In order to represent the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain, an M-point DFT or M-point FFT may be performed.
A valid signal is a signal whose value is not necessarily 0. For example, a total of x valid signals may be generated by obtaining x signals from an audio signal through sampling. If y of the x valid signals are then zero-padded, the number of valid signals decreases to (x - y). Afterwards, if a signal having a valid signals and a signal having b valid signals are convolved, a total of (a + b - 1) valid signals are obtained.
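The (a + b - 1) valid-signal count can be checked with a short sketch (pure Python; the toy sequences are our own choice, not from the patent):

```python
def convolve(x, y):
    # direct time-domain convolution of two finite sequences
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[i + j] += xv * yv
    return out

a_sig = [1.0, 2.0, 3.0]      # a = 3 valid signals
b_sig = [0.5, -0.5]          # b = 2 valid signals
result = convolve(a_sig, b_sig)
print(len(result))           # a + b - 1 = 4
```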
Multiplying the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain provides the same effect as convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. A signal having (3*M - 2) valid signals can be generated by converting the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain to the time domain and convolving the results of the conversion.
Therefore, the number of valid signals in a signal obtained by multiplying the down-mix signal, the spatial information, and the filter coefficients in the frequency domain and converting the result of the multiplication to the time domain may differ from the number of valid signals in a signal obtained by convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. As a result, aliasing may occur while a frequency-domain 3D down-mix signal is converted into a time-domain signal.
In order to prevent aliasing, the sum of the number of valid signals of the down-mix signal in the time domain, the number of valid signals of the spatial information mapped to the frequency domain, and the number of filter coefficients must not be greater than M. The number of valid signals of the spatial information mapped to the frequency domain may be determined by the number of points of the frequency domain. In other words, if spatial information represented for each parameter band is mapped to an N-point frequency domain, the number of valid signals of the spatial information may be N.
Referring to FIG. 7, the first domain converting unit 700 includes a first zero-padding unit 701 and a first frequency-domain converting unit 702. The 3D rendering unit 710 includes a mapping unit 711, a time-domain converting unit 712, a second zero-padding unit 713, a second frequency-domain converting unit 714, a multi-channel signal generation unit 715, a third zero-padding unit 716, a third frequency-domain converting unit 717, and a 3D down-mix signal generation unit 718.
The first zero-padding unit 701 performs a zero-padding operation on a down-mix signal having X samples in the time domain so that the number of samples of the down-mix signal increases from X to M. The first frequency-domain converting unit 702 converts the zero-padded down-mix signal into an M-point frequency-domain signal. The zero-padded down-mix signal has M samples, of which only X samples are valid signals.
The mapping unit 711 maps the spatial information of each parameter band to an N-point frequency domain. The time-domain converting unit 712 converts the spatial information obtained by the mapping performed by the mapping unit 711 to the time domain. The spatial information obtained by the conversion performed by the time-domain converting unit 712 has N samples.
The second zero-padding unit 713 performs a zero-padding operation on the spatial information having N samples in the time domain so that the number of samples of the spatial information increases from N to M. The second frequency-domain converting unit 714 converts the zero-padded spatial information into an M-point frequency-domain signal. The zero-padded spatial information has M samples, of which only N samples are valid.
The multi-channel signal generation unit 715 generates a multi-channel signal by multiplying the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714. The multi-channel signal generated by the multi-channel signal generation unit 715 has M valid signals. By comparison, a multi-channel signal obtained by convolving, in the time domain, the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714 has (X + N - 1) valid signals.
The third zero-padding unit 716 may perform a zero-padding operation on Y filter coefficients represented in the time domain so that the number of samples increases to M. The third frequency-domain converting unit 717 converts the zero-padded filter coefficients to an M-point frequency domain. The zero-padded filter coefficients comprise M samples, of which only Y samples are valid signals.
The 3D down-mix signal generation unit 718 generates a 3D down-mix signal by multiplying the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717. The 3D down-mix signal generated by the 3D down-mix signal generation unit 718 has M valid signals. By comparison, a 3D down-mix signal obtained by convolving, in the time domain, the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717 has (X + N + Y - 2) valid signals.
It is possible to prevent aliasing by setting the M-point frequency domain used by the first, second, and third frequency-domain converting units 702, 714, and 717 so as to satisfy the equation M ≥ (X + N + Y - 2). In other words, aliasing can be prevented by having the first, second, and third frequency-domain converting units 702, 714, and 717 perform an M-point DFT or M-point FFT with M ≥ (X + N + Y - 2).
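The condition M ≥ (X + N + Y - 2) can be illustrated numerically. In the sketch below (toy sequences of our own, with X = 4, N = 2, Y = 3), the multiplication of M-point DFTs is modeled by its exact time-domain equivalent, circular convolution of the zero-padded sequences: with M = 7 the result matches plain linear convolution, while a smaller M wraps the tail samples around onto the head, which is the aliasing described above.

```python
def linear_conv(x, y):
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[i + j] += xv * yv
    return out

def circular_conv(x, y, m):
    # multiplying two m-point DFTs equals circular convolution of the
    # zero-padded sequences, so model that equivalence directly
    out = [0.0] * m
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[(i + j) % m] += xv * yv
    return out

downmix = [1.0, 2.0, 3.0, 4.0]   # X = 4 valid samples
spatial = [0.5, 0.5]             # N = 2
coeffs  = [1.0, -1.0, 0.5]       # Y = 3
lin = linear_conv(linear_conv(downmix, spatial), coeffs)  # X+N+Y-2 = 7 samples

M = 7  # M >= X+N+Y-2: circular result equals the linear one, no aliasing
ok = circular_conv(circular_conv(downmix, spatial, M), coeffs, M)
print(ok == lin)   # True

bad = circular_conv(circular_conv(downmix, spatial, 6), coeffs, 6)
print(bad[0])      # 1.5 = lin[0] + lin[6]: the tail wrapped onto the head
```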
The conversion to the frequency domain may be performed using a filter bank other than a DFT filter bank, an FFT filter bank, or a QMF bank. The generation of the 3D down-mix signal may be performed using an HRTF filter.
The number of valid signals of the spatial information may also be adjusted using a method other than the one described above, or using whichever of the above methods is most efficient and requires the least amount of computation.
Aliasing may occur not only while a signal, coefficients, or spatial information is converted from the frequency domain to the time domain or vice versa, but also while a signal, coefficients, or spatial information is converted from the QMF domain to a hybrid domain or vice versa. The above-described method of preventing aliasing may also be used to prevent aliasing from occurring while a signal, coefficients, or spatial information is converted from the QMF domain to a hybrid domain or vice versa.
The spatial information used to generate a multi-channel signal or a 3D down-mix signal may vary. As a result of the variation of the spatial information, signal discontinuities that are heard as noise may occur in an output signal.
The noise in the output signal can be reduced using a smoothing method, which prevents the spatial information from varying rapidly.
For example, when first spatial information applied to a first frame differs from second spatial information applied to a second frame adjacent to the first frame, a discontinuity is highly likely to occur between the first and second frames.
In this case, the second spatial information may be compensated for using the first spatial information, or the first spatial information may be compensated for using the second spatial information, so that the difference between the first and second spatial information is reduced, and the noise caused by the discontinuity between the first and second frames can thus be reduced. More specifically, at least one of the first and second spatial information may be replaced with the average of the first and second spatial information, thereby reducing the noise.
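As a minimal illustration of the averaging variant (the CLD values below are hypothetical example numbers, in dB):

```python
def smooth_adjacent(first, second):
    # replace both values with their average so the frame-to-frame
    # jump, and with it the audible discontinuity, disappears
    avg = (first + second) / 2.0
    return avg, avg

cld_frame1 = 10.0   # hypothetical CLD of the first frame (dB)
cld_frame2 = 16.0   # hypothetical CLD of the adjacent second frame (dB)
print(smooth_adjacent(cld_frame1, cld_frame2))   # (13.0, 13.0)
```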
Noise is also likely to be generated by a discontinuity between a pair of adjacent parameter bands. For example, when third spatial information corresponding to a first parameter band differs from fourth spatial information corresponding to a second parameter band adjacent to the first parameter band, a discontinuity may occur between the first and second parameter bands.
In this case, the third spatial information may be compensated for using the fourth spatial information, or the fourth spatial information may be compensated for using the third spatial information, so that the difference between the third and fourth spatial information is reduced, and the noise caused by the discontinuity between the first and second parameter bands can thus be reduced. More specifically, at least one of the third and fourth spatial information may be replaced with the average of the third and fourth spatial information, thereby reducing the noise.
The noise caused by a discontinuity between a pair of adjacent frames or between a pair of adjacent parameter bands may also be reduced using methods other than those described above.
More specifically, each frame may be multiplied by a window such as a Hanning window, and an overlap-and-add scheme may be applied to the results of the multiplication, so that the variation between frames is reduced. Alternatively, an output signal to which a plurality of pieces of spatial information are applied may be smoothed so that variation between frames of the output signal is prevented.
The decorrelation between channels in the DFT domain may be adjusted using spatial information such as ICC, as follows.
The degree of decorrelation may be adjusted by multiplying a coefficient of a signal input to a one-to-two (OTT) or two-to-three (TTT) box by a predetermined value. The predetermined value may be defined by the following expression: (A + (1 - A*A)^0.5 * i), where A indicates the ICC value applied to a predetermined frequency band of the OTT or TTT box and i indicates the imaginary unit. The imaginary part may be positive or negative.
The predetermined value may be multiplied by a weighting factor chosen according to the characteristics of the signal, for example, the energy level of the signal, the energy characteristics of the signal at each frequency, or the type of box to which the ICC value A is applied. By introducing the weighting factor, the degree of decorrelation can be adjusted further, and smoothing or interpolation may be applied between frames.
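A small sketch of the multiplier defined above (the ICC value and the DFT-domain coefficient are arbitrary example numbers):

```python
def decorrelation_multiplier(icc, weight=1.0):
    # (A + (1 - A*A)**0.5 * i): a complex factor of magnitude 1, so it
    # rotates the phase of the input coefficient without changing its
    # energy; 'weight' stands in for the optional weighting factor
    a = icc
    return weight * complex(a, (1.0 - a * a) ** 0.5)

f = decorrelation_multiplier(0.8)
print(abs(f))          # 1.0 (up to rounding): energy is preserved
coeff = 2.0 + 0.0j     # a DFT-domain coefficient of the input signal
rotated = coeff * f    # same magnitude as coeff, rotated phase
```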
As described above with reference to FIG. 7, a 3D down-mix signal can be generated in the frequency domain by using an HRTF, or a head-related impulse response (HRIR), converted to the frequency domain.
Alternatively, a 3D down-mix signal may be generated in the time domain by convolving an HRIR and a down-mix signal. A 3D down-mix signal generated in the frequency domain may be left in the frequency domain without undergoing inverse domain conversion.
In order to convolve an HRIR and a down-mix signal in the time domain, a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter may be used.
As described above, an encoding apparatus or a decoding apparatus according to an embodiment of the present invention can generate a 3D down-mix signal using a first method that uses an HRTF in the frequency domain or an HRIR converted to the frequency domain, a second method that convolves an HRIR in the time domain, or a combination of the first and second methods.
FIGS. 8 through 11 illustrate bitstreams according to embodiments of the present invention.
Referring to FIG. 8, a bitstream includes a multi-channel decoding information field that contains the information needed to generate a multi-channel signal, a 3D rendering information field that contains the information needed to generate a 3D down-mix signal, and a header field that contains the header information needed to use the information contained in the multi-channel decoding information field and in the 3D rendering information field. The bitstream may include only one or two of the multi-channel decoding information field, the 3D rendering information field, and the header field.
Referring to FIG. 9, a bitstream containing the side information necessary for a decoding operation may include a specific configuration header field that contains header information for the entire encoded signal and a plurality of frame data fields that contain side information about a plurality of frames. More specifically, each frame data field may include a frame header field that contains header information for the corresponding frame and a frame parameter data field that contains the spatial information of the corresponding frame. Alternatively, each frame data field may include only a frame parameter data field.
Each frame parameter data field may include a plurality of modules, each module comprising a flag and parameter data. A module is a set of data comprising parameter data such as spatial information and other data, such as down-mix gain and smoothing data, necessary for improving the sound quality of a signal.
A module may include no flag: for example, if module data concerning information specified by the frame header field is received without any additional flag, if the information specified by the frame header field is further classified, or if an additional flag and data not specified by the frame header are received together.
Side information about the 3D down-mix signal, such as HRTF coefficient information, may be included in at least one of the specific configuration header field, the frame header field, and the frame parameter data field.
Referring to FIG. 10, a bitstream may include a plurality of multi-channel decoding information fields containing the information necessary to generate a multi-channel signal and a plurality of 3D rendering information fields containing the information necessary to generate a 3D down-mix signal.
Upon receiving the bitstream, a decoding apparatus may use either the multi-channel decoding information fields or the 3D rendering information fields to perform a decoding operation, and may skip whichever of them is not used in the decoding operation. In this case, which of the multi-channel decoding information fields and the 3D rendering information fields is to be used may be determined according to the type of signal to be reproduced.
In other words, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields and read the information contained in the multi-channel decoding information fields. On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields and read the information contained in the 3D rendering information fields.
Methods of skipping some of the fields in a bitstream are as follows.
First, field length information about the bit size of a field may be included in the bitstream. In this case, the field can be skipped by skipping the number of bits corresponding to its bit size. The field length information may be disposed at the beginning of the field.
Second, a synchronization word may be disposed at the end or the beginning of a field. In this case, the field can be skipped by locating it based on the synchronization word.
Third, if the length of a field is determined and fixed in advance, the field can be skipped by skipping the amount of data corresponding to its length. Fixed field length information about the field length may be included in the bitstream or stored in the decoding apparatus.
Fourth, a field may be skipped using a combination of two or more of the above field skipping methods.
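The first method (a length prefix at the start of each field) can be sketched as follows; the one-byte length prefix and byte granularity are simplifications of our own, since the patent counts bit sizes:

```python
def skip_field(stream, pos):
    # read the length prefix at the start of the field, then jump
    # over that many payload bytes to reach the next field
    field_len = stream[pos]
    return pos + 1 + field_len

# two back-to-back fields: [len=3 | 3 payload bytes][len=2 | 2 payload bytes]
stream = bytes([3, 0xAA, 0xBB, 0xCC, 2, 0x11, 0x22])
pos = skip_field(stream, 0)
print(pos)              # 4: the decoder now points at the second field
print(stream[pos])      # 2: length prefix of the field it will read next
```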
Field skip information, i.e., information necessary for skipping a field, such as the field length information, the synchronization word, or the fixed field length information, may be included in one of the specific configuration header field, frame header field, and frame parameter data field shown in FIG. 9, or may be included in a field other than those shown in FIG. 9.
For example, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields with reference to the field length information disposed at the beginning of each 3D rendering information field, a synchronization word, or fixed field length information, and may read the information contained in the multi-channel decoding information fields.
On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields with reference to the field length information disposed at the beginning of each multi-channel decoding information field, a synchronization word, or fixed field length information, and may read the information contained in the 3D rendering information fields.
The bitstream may include information indicating whether the data contained in the bitstream is necessary for generating a multi-channel signal or for generating a 3D down-mix signal.
However, even if the bitstream includes no spatial information such as CLD and includes only the data necessary for generating a 3D down-mix signal (for example, HRTF filter coefficients), a multi-channel signal can still be reproduced using the decoding data necessary for generating a 3D down-mix signal, without the need for spatial information.
For example, stereo parameters, i.e., spatial information about two channels, may be obtained from a down-mix signal. The stereo parameters are then converted into spatial information about the plurality of channels to be reproduced, and a multi-channel signal is generated by applying the spatial information obtained by the conversion to the down-mix signal.
On the other hand, even if the bitstream includes only the data necessary for generating a multi-channel signal, the down-mix signal can be reproduced without an additional decoding operation, or a 3D down-mix signal can be reproduced by performing 3D processing on the down-mix signal with an additional HRTF filter.
If the bitstream includes both the data necessary for generating a multi-channel signal and the data necessary for generating a 3D down-mix signal, the user may be allowed to decide whether to reproduce a multi-channel signal or a 3D down-mix signal.
Methods of skipping data will hereinafter be described in detail with reference to the corresponding syntaxes.
Syntax 1 illustrates a method of decoding an audio signal in units of frames.
[Syntax 1]
SpatialFrame()
{
    FramingInfo();
    bsIndependencyFlag;
    OttData();
    TttData();
    SmgData();
    TempShapeData();
    if (bsArbitraryDownmix) {
        ArbitraryDownmixData();
    }
    if (bsResidualCoding) {
        ResidualData();
    }
}
In Syntax 1, OttData() and TttData() are modules representing the parameters (such as spatial information including CLD, ICC, and CPC) necessary for restoring a multi-channel signal from a down-mix signal, and SmgData(), TempShapeData(), ArbitraryDownmixData(), and ResidualData() are modules representing the information necessary for improving sound quality by correcting signal distortion that may occur during an encoding operation.
For example, if only parameters such as CLD, ICC, or CPC and the information contained in the module ArbitraryDownmixData() are used during a decoding operation, the modules SmgData() and TempShapeData(), which are disposed between the modules TttData() and ArbitraryDownmixData(), are not required. It is therefore efficient to skip the modules SmgData() and TempShapeData().
A method of skipping modules according to an embodiment of the present invention will hereinafter be described in detail with reference to Syntax 2 below.
[Syntax 2]
...
TttData();
SkipData(){
    bsSkipBits;
}
SmgData();
TempShapeData();
if (bsArbitraryDownmix) {
    ArbitraryDownmixData();
}
...
Referring to Syntax 2, a module SkipData() may be disposed before one or more modules to be skipped, and the bit size of the modules to be skipped is specified as bsSkipBits within the module SkipData().
In other words, assuming that the modules SmgData() and TempShapeData() are to be skipped and that the combined bit size of the modules SmgData() and TempShapeData() is 150, the modules SmgData() and TempShapeData() can be skipped by setting bsSkipBits to 150.
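In decoder terms, the SkipData() mechanism reduces to advancing the read position by bsSkipBits (a toy bit-reader sketch of our own, not the patent's actual parser):

```python
class BitReader:
    def __init__(self, total_bits):
        self.pos = 0                 # current read position, in bits
        self.total_bits = total_bits

    def skip(self, nbits):
        # advance past modules that do not need to be parsed
        self.pos += nbits

reader = BitReader(total_bits=1000)
bsSkipBits = 150   # combined bit size of SmgData() and TempShapeData()
reader.skip(bsSkipBits)
print(reader.pos)  # 150: parsing resumes at ArbitraryDownmixData()
```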
A method of skipping modules according to another embodiment of the present invention will hereinafter be described in detail with reference to Syntax 3.
[Syntax 3]
...
TttData();
bsSkipSyncflag;
SmgData();
TempShapeData();
bsSkipSyncword;
if (bsArbitraryDownmix) {
    ArbitraryDownmixData();
}
...
Referring to Syntax 3, unnecessary modules can be skipped using bsSkipSyncflag and bsSkipSyncword, where bsSkipSyncflag is a flag indicating whether a synchronization word is used, and bsSkipSyncword is a synchronization word that may be disposed at the end of the modules to be skipped.
More specifically, if the flag bsSkipSyncflag is set so that the synchronization word is used, one or more modules between the flag bsSkipSyncflag and the synchronization word bsSkipSyncword, i.e., the modules SmgData() and TempShapeData(), can be skipped.
Referring to FIG. 11, a bitstream may include a multi-channel header field containing the header information necessary for reproducing a multi-channel signal, a 3D rendering header field containing the header information necessary for reproducing a 3D down-mix signal, and a plurality of multi-channel decoding information fields containing the data necessary for reproducing a multi-channel signal.
In order to reproduce a multi-channel signal, the decoding apparatus may skip the 3D rendering header field and read data from the multi-channel header field and the multi-channel decoding information fields.
The method of skipping the 3D rendering header field is the same as the field skipping methods described above with reference to FIG. 10, and a detailed description thereof will therefore be omitted.
In order to reproduce a 3D down-mix signal, the decoding apparatus may read data from the multi-channel decoding information fields and the 3D rendering header field. For example, the decoding apparatus may generate a 3D down-mix signal using a down-mix signal contained in the multi-channel decoding information fields and HRTF coefficient information contained in the 3D rendering header field.
FIG. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal according to an embodiment of the present invention. Referring to FIG. 12, an arbitrary down-mix signal is a down-mix signal other than the down-mix signal generated by a multi-channel encoder 801 included in an encoding apparatus 800. A detailed description of processes identical to those of the embodiment of FIG. 1 will be omitted.
Referring to FIG. 12, the encoding apparatus 800 includes the multi-channel encoder 801, a spatial information synthesis unit 802, and a comparing unit 803.
The multi-channel encoder 801 down-mixes an input multi-channel signal into a stereo or mono down-mix signal, and generates the basic spatial information necessary for restoring a multi-channel signal from the down-mix signal.
The comparing unit 803 compares the down-mix signal with the arbitrary down-mix signal and generates compensation information based on the result of the comparison. The compensation information is necessary for compensating the arbitrary down-mix signal so that the arbitrary down-mix signal can be converted to approximate the down-mix signal. A decoding apparatus may compensate the arbitrary down-mix signal using the compensation information and restore a multi-channel signal using the compensated arbitrary down-mix signal. The restored multi-channel signal is more similar to the original input multi-channel signal than a multi-channel signal restored from an arbitrary down-mix signal that has not been compensated toward the down-mix signal generated by the multi-channel encoder 801.
The compensation information may be the difference between the down-mix signal and the arbitrary down-mix signal. The decoding apparatus may compensate the arbitrary down-mix signal by adding to it the difference between the down-mix signal and the arbitrary down-mix signal.
The difference between the down-mix signal and the arbitrary down-mix signal may be a down-mix gain indicating the difference between the energy levels of the down-mix signal and the arbitrary down-mix signal.
The down-mix gain may be determined for each frequency band, for each time/time slot, and/or for each channel. For example, one part of the down-mix gain may be determined for each frequency band, and another part may be determined for each time slot.
The down-mix gain may be determined for each parameter band or for each frequency band optimized for the arbitrary down-mix signal. A parameter band is a frequency interval to which parameter-type spatial information is applied.
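One way to picture a per-band down-mix gain (the energy-ratio definition and band edges below are illustrative assumptions, not the patent's exact formula):

```python
def downmix_gain_per_band(downmix, arbitrary, band_edges):
    # per band: the gain that scales the arbitrary down-mix's energy
    # toward the encoder down-mix's energy in that band
    gains = []
    for lo, hi in band_edges:
        e_dm = sum(s * s for s in downmix[lo:hi])
        e_arb = sum(s * s for s in arbitrary[lo:hi])
        gains.append((e_dm / e_arb) ** 0.5 if e_arb else 1.0)
    return gains

dm  = [1.0, 1.0, 2.0, 2.0]   # encoder down-mix (toy samples per bin)
arb = [0.5, 0.5, 2.0, 2.0]   # arbitrary down-mix supplied externally
gains = downmix_gain_per_band(dm, arb, [(0, 2), (2, 4)])
print(gains)                 # [2.0, 1.0]: only the first band needs boosting
```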
The difference between the energy levels of the down-mix signal and the arbitrary down-mix signal may be quantized. The resolution of the quantization levels used to quantize this energy-level difference may be the same as or different from the resolution of the quantization levels used to quantize a CLD between the down-mix signal and the arbitrary down-mix signal. In addition, the quantization of the energy-level difference may use all or part of the quantization levels for the CLD between the down-mix signal and the arbitrary down-mix signal.
Because the resolution of the energy-level difference between the down-mix signal and the arbitrary down-mix signal is generally lower than the resolution of the CLD between them, the quantization levels used to quantize the energy-level difference may have small values compared with the quantization levels used to quantize the CLD.
The compensation information for compensating the arbitrary down-mix signal may be extension information that includes residual information specifying the components of the input multi-channel signal that cannot be restored using the arbitrary down-mix signal or the down-mix gain. Using the extension information, the decoding apparatus can restore those components of the input multi-channel signal and thus restore a signal that is hardly distinguishable from the original input multi-channel signal.
Methods of generating the extension information are as follows.
The multi-channel encoder 801 may generate, as first extension information, information about the components of the input multi-channel signal that are missing from the down-mix signal. The decoding apparatus can restore a signal hardly distinguishable from the original input multi-channel signal by applying the first extension information to a multi-channel signal generated using the down-mix signal and the basic spatial information.
Alternatively, the multi-channel encoder 801 may restore a multi-channel signal using the down-mix signal and the basic spatial information, and generate the difference between the restored multi-channel signal and the original input multi-channel signal as the first extension information.
The comparing unit 803 may generate, as second extension information, information about the components of the down-mix signal that are missing from the arbitrary down-mix signal, i.e., the components of the down-mix signal that cannot be compensated for using the down-mix gain. The decoding apparatus can restore a signal almost indistinguishable from the down-mix signal using the arbitrary down-mix signal and the second extension information.
Besides the methods described above, the extension information may be generated using various residual coding methods.
Both the down-mix gain and the extension information may be used as compensation information. More specifically, both may be obtained for the entire frequency band of the down-mix signal and used together as compensation information. Alternatively, the down-mix gain may be used as compensation information for one part of the frequency band of the down-mix signal, and the extension information as compensation information for another part. For example, the extension information may be used as compensation information for the low-frequency band of the down-mix signal, and the down-mix gain as compensation information for the high-frequency band.
Extension information about parts of the down-mix signal other than the low-frequency band, such as peaks or notches that considerably affect sound quality, may also be used as compensation information.
Spatial information synthesis unit 802 synthesizes fundamental space information (such as, CLD, CPC, ICC and CTD) and compensated information, thus span information.In other words, the spatial information being sent to decoding device can comprise fundamental space information, reduction downmix gain and the first and second extend informations.
Spatial information can be included in the bitstream together with reducing arbitrarily audio signal, and can by bit stream to decoding device.
The extension information and the arbitrary down-mix signal may be encoded using an audio encoding method such as the AAC method, the MP3 method, or the BSAC method. The extension information and the arbitrary down-mix signal may be encoded using the same audio encoding method or different audio encoding methods.
If the extension information and the arbitrary down-mix signal are encoded using the same audio encoding method, a decoding apparatus can decode both of them using a single audio decoding method. In this case, since the arbitrary down-mix signal can always be decoded, the extension information can also always be decoded. However, since the arbitrary down-mix signal is generally input to the decoding apparatus as a pulse code modulation (PCM) signal, the type of audio codec used to encode the arbitrary down-mix signal may not be readily identifiable, and therefore the type of audio codec used to encode the extension information may not be readily identifiable either.
Therefore, audio codec information regarding the type of audio codec used to encode the arbitrary down-mix signal and the extension information may be inserted into the bitstream.
More specifically, the audio codec information may be inserted into a specific configuration header field of the bitstream. In this case, the decoding apparatus may extract the audio codec information from the specific configuration header field of the bitstream and use it to decode the extracted arbitrary down-mix signal and extension information.
On the other hand, if the arbitrary down-mix signal and the extension information are encoded using different encoding methods, the extension information may not be decodable. In this case, since the end of the extension information cannot be identified, no further decoding operation can be performed.
In order to address this problem, audio codec information regarding the types of audio codecs respectively used to encode the arbitrary down-mix signal and the extension information may be inserted into a specific configuration header field of the bitstream. Then, the decoding apparatus may read the audio codec information from the specific configuration header field of the bitstream and use the read information to decode the extension information. If the decoding apparatus does not include any decoding unit capable of decoding the extension information, the extension information may not be decoded further, and the information immediately following the extension information may be read instead.
Audio codec information regarding the type of audio codec used to encode the extension information may be represented by a syntax element included in the specific configuration header field of the bitstream. For example, the audio codec information may be represented by a 4-bit syntax element bsResidualCodecType, as indicated in Table 1 below.
Table 1
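How such a 4-bit syntax element could be read from a configuration header can be sketched as follows. The bit-reader class and, in particular, the value-to-codec mapping are hypothetical illustrations — the real assignments are those of Table 1 in the specification, which is not reproduced here.

```python
# Illustrative sketch only: reading a 4-bit syntax element such as
# bsResidualCodecType from a configuration header. The value-to-codec
# mapping below is a placeholder, NOT the patent's Table 1.

class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def read(self, n: int) -> int:
        """Read n bits, MSB first, and return them as an integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

HYPOTHETICAL_CODECS = {0: "AAC", 1: "MP3", 2: "BSAC"}  # placeholder mapping

header = bytes([0b0001_0000])            # first 4 bits encode the value 1
reader = BitReader(header)
bs_residual_codec_type = reader.read(4)  # 4-bit syntax element
codec = HYPOTHETICAL_CODECS.get(bs_residual_codec_type, "reserved")
```

With the placeholder mapping above, a field value of 1 selects MP3 decoding for the extension information.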
The extension information may include not only residual information but also channel extension information. The channel extension information is information necessary for extending a multi-channel signal, obtained by decoding using spatial information, into a multi-channel signal having more channels. For example, the channel extension information may be information necessary for extending a 5.1-channel signal or a 7.1-channel signal into a 9.1-channel signal.
The extension information may be included in a bitstream, and the bitstream may be transmitted to a decoding apparatus. Then, the decoding apparatus may compensate for a down-mix signal or extend a multi-channel signal using the extension information. However, the decoding apparatus may skip the extension information instead of extracting it from the bitstream. For example, when generating a multi-channel signal using a 3D down-mix signal included in the bitstream, or when generating a 3D down-mix signal using a down-mix signal included in the bitstream, the decoding apparatus may skip the extension information.
The method of skipping the extension information included in a bitstream may be the same as one of the field-skipping methods described above with reference to Figure 10.
For example, the extension information may be skipped using at least one of: bit size information which is attached to the beginning of a bitstream including the extension information and indicates the bit size of the extension information; a synchronization word attached to the beginning or end of a field including the extension information; and fixed bit size information indicating a fixed bit size of the extension information. The bit size information, the synchronization word, and the fixed bit size information may all be included in the bitstream. The fixed bit size information may also be stored in the decoding apparatus.
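The size-information variant of the skipping methods above can be sketched as follows: the decoder reads a length field and simply advances past the extension data without decoding it. The field widths (a one-byte length prefix, byte-aligned payload) are illustrative assumptions; the actual syntax uses bit-level fields.

```python
# Hedged sketch of skipping an extension field via its size information.
# A one-byte, byte-aligned length prefix is an illustrative simplification.

def skip_extension(payload: bytes, offset: int) -> int:
    """Return the offset just past an extension field whose first byte
    gives the size of the field body in bytes."""
    size = payload[offset]       # size information for the extension field
    return offset + 1 + size     # skip the length byte plus the field body

# 3-byte extension body (0xAA 0xBB 0xCC) followed by the next field (0x7F)
stream = bytes([3, 0xAA, 0xBB, 0xCC, 0x7F])
next_offset = skip_extension(stream, 0)
# decoding resumes at stream[next_offset] == 0x7F
```

The synchronization-word variant would instead scan forward for a known marker pattern; the fixed-size variant needs no per-stream length field at all.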
Referring to Figure 12, the decoding unit 810 includes a down-mix compensation unit 811, a 3D rendering unit 815, and a multi-channel decoder 816.
The down-mix compensation unit 811 compensates for an arbitrary down-mix signal using compensation information included in spatial information, e.g., using a down-mix gain or extension information.
The 3D rendering unit 815 generates a decoder 3D down-mix signal by performing a 3D rendering operation on the compensated down-mix signal. The multi-channel decoder 816 generates a 3D multi-channel signal using the compensated down-mix signal and basic spatial information included in the spatial information.
The down-mix compensation unit 811 may compensate for an arbitrary down-mix signal in the following manner.
If the compensation information is a down-mix gain, the down-mix compensation unit 811 compensates for the energy level of the arbitrary down-mix signal using the down-mix gain, so that the arbitrary down-mix signal can be converted into a signal similar to the down-mix signal.
If the compensation information is the second extension information, the down-mix compensation unit 811 may compensate for components missing from the arbitrary down-mix signal using the second extension information.
The multi-channel decoder 816 may generate a multi-channel signal by sequentially applying a pre-matrix M1, a mix matrix M2, and a post-matrix M3 to a down-mix signal. In this case, the second extension information may be used to compensate for the down-mix signal while the mix matrix M2 is applied to it. In other words, the second extension information may be used to compensate for a down-mix signal to which the pre-matrix M1 has already been applied.
As described above, by applying extension information, each of the plurality of channels from which a multi-channel signal is generated can be selectively compensated for. For example, if the extension information is applied to the center channel of the mix matrix M2, the left- and right-channel components of the down-mix signal can be compensated for by the extension information. If the extension information is applied to the left channel of the mix matrix M2, the left-channel component of the down-mix signal can be compensated for by the extension information.
Both a down-mix gain and extension information may be used as compensation information. For example, a low-frequency band of an arbitrary down-mix signal may be compensated for using extension information, and a high-frequency band of the arbitrary down-mix signal may be compensated for using a down-mix gain. In addition, parts of the arbitrary down-mix signal other than the low-frequency band that may considerably affect sound quality, such as peaks or notches, may also be compensated for using extension information. Information regarding the parts to be compensated for by the extension information may be included in the bitstream. Information indicating whether a down-mix signal included in the bitstream is an arbitrary down-mix signal, and information indicating whether the bitstream includes compensation information, may also be included in the bitstream.
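The band-split use of the two kinds of compensation information described above can be sketched per frequency band: residual-style extension information corrects the low bands exactly, while a single down-mix gain scales the high bands. The split index, gain value, and band values are illustrative assumptions.

```python
# Hedged sketch of band-split compensation: extension (residual) information
# for the low bands, a down-mix gain for the high bands. Values illustrative.

def compensate(arb_bands, gain, low_band_residual, split):
    """Additive residual correction below `split`, gain scaling above it."""
    low = [a + r for a, r in zip(arb_bands[:split], low_band_residual)]
    high = [gain * a for a in arb_bands[split:]]
    return low + high

arb_bands = [0.2, 0.4, 0.6, 0.8]   # per-band arbitrary down-mix magnitudes
residual = [0.05, -0.1]            # extension info for the two low bands

compensated = compensate(arb_bands, gain=1.25,
                         low_band_residual=residual, split=2)
```

The same structure extends naturally to the peak/notch case in the text: the residual bands need not be contiguous as long as the bitstream signals which bands the extension information covers.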
In order to prevent clipping of a down-mix signal generated by the encoding unit 800, the down-mix signal may be divided by a predetermined gain. The predetermined gain may have a static value or a dynamic value.
The down-mix compensation unit 811 may restore the original down-mix signal by compensating for the down-mix signal that was attenuated by the predetermined gain to prevent clipping.
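The clipping-prevention scheme above can be sketched in a few lines: the encoder divides by a gain so that all samples stay inside the representable range, and the decoder multiplies the same gain back. The static gain value is an illustrative assumption.

```python
# Hedged sketch of clipping prevention with a static gain (value illustrative).
# The encoder attenuates so samples fit in [-1, 1]; the decoder restores them.

CLIP_PREVENTION_GAIN = 1.5  # could also be chosen dynamically per frame

def encoder_attenuate(samples):
    return [s / CLIP_PREVENTION_GAIN for s in samples]

def decoder_restore(samples):
    return [s * CLIP_PREVENTION_GAIN for s in samples]

original = [1.2, -1.4, 0.9]              # would clip at +/-1.0 if sent as-is
sent = encoder_attenuate(original)
assert all(abs(s) <= 1.0 for s in sent)  # now safe to encode as PCM
restored = decoder_restore(sent)
```

A dynamic gain would additionally need to be transmitted (or derived) per frame so the decoder can invert it exactly.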
An arbitrary down-mix signal compensated for by the down-mix compensation unit 811 can be readily reproduced. Alternatively, the compensated arbitrary down-mix signal may be input to the 3D rendering unit 815 and converted by it into a decoder 3D down-mix signal.
Referring to Figure 12, the down-mix compensation unit 811 includes a first domain converter 812, a compensation processor 813, and a second domain converter 814.
The first domain converter 812 converts the domain of an arbitrary down-mix signal into a predetermined domain. The compensation processor 813 compensates for the arbitrary down-mix signal in the predetermined domain using compensation information, e.g., a down-mix gain or extension information.
The compensation of the arbitrary down-mix signal may be performed in the QMF/hybrid domain. For this, the first domain converter 812 may perform QMF/hybrid analysis on the arbitrary down-mix signal. The first domain converter 812 may also convert the domain of the arbitrary down-mix signal into a domain other than the QMF/hybrid domain, e.g., a frequency domain such as a DFT or FFT domain. The compensation of the arbitrary down-mix signal may likewise be performed in a domain other than the QMF/hybrid domain, e.g., a frequency domain or the time domain.
The second domain converter 814 converts the domain of the compensated arbitrary down-mix signal into the same domain as that of the original arbitrary down-mix signal. More specifically, the second domain converter 814 does this by inversely performing the domain conversion operation performed by the first domain converter 812.
For example, the second domain converter 814 may convert the compensated arbitrary down-mix signal into a time-domain signal by performing QMF/hybrid synthesis on it. Likewise, the second domain converter 814 may perform an IDFT or IFFT on the compensated arbitrary down-mix signal.
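The analysis/compensate/synthesis round trip of the two domain converters can be sketched with a naive DFT standing in for the filter bank. This is a deliberate simplification: a real implementation would use a QMF/hybrid bank or FFT, and the unit-gain "compensation" here is a placeholder for the actual per-band processing.

```python
# Hedged sketch of the first/second domain converters: forward DFT,
# per-bin compensation, inverse DFT back to the time domain. The naive
# O(N^2) DFT stands in for a QMF/hybrid analysis or an FFT.

import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]

signal = [0.0, 1.0, 0.0, -1.0]
spectrum = dft(signal)                      # first domain converter
compensated = [1.0 * b for b in spectrum]   # placeholder unit-gain compensation
roundtrip = idft(compensated)               # second domain converter (inverse)
```

Because the synthesis exactly inverts the analysis, any difference between input and output comes only from the compensation applied in between.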
Like the 3D rendering unit 710 illustrated in Figure 7, the 3D rendering unit 815 may perform a 3D rendering operation on the compensated arbitrary down-mix signal in a frequency domain, the QMF/hybrid domain, or the time domain. For this, the 3D rendering unit 815 may include a domain converter (not shown). The domain converter converts the domain of the compensated arbitrary down-mix signal into the domain in which the 3D rendering operation is to be performed, or converts the domain of a signal obtained by the 3D rendering operation.
The domain in which the compensation processor 813 compensates for the arbitrary down-mix signal may be the same as or different from the domain in which the 3D rendering unit 815 performs the 3D rendering operation on the compensated arbitrary down-mix signal.
Figure 13 is a block diagram of a down-mix compensation/3D rendering unit 820 according to an embodiment of the present invention. Referring to Figure 13, the down-mix compensation/3D rendering unit 820 includes a first domain converter 821, a second domain converter 822, a compensation/3D rendering processor 823, and a third domain converter 824.
The down-mix compensation/3D rendering unit 820 may perform a compensation operation and a 3D rendering operation on an arbitrary down-mix signal in a single domain, thereby reducing the amount of computation of a decoding apparatus.
More specifically, the first domain converter 821 converts the domain of an arbitrary down-mix signal into a first domain in which a compensation operation and a 3D rendering operation are to be performed. The second domain converter 822 converts spatial information, which includes basic spatial information necessary for generating a multi-channel signal and compensation information necessary for compensating for the arbitrary down-mix signal, so that the spatial information becomes applicable in the first domain. The compensation information may include at least one of a down-mix gain and extension information.
For example, the second domain converter 822 may map compensation information corresponding to a parameter band in the QMF/hybrid domain onto a frequency band, so that the compensation information can be readily applied in a frequency domain.
The first domain may be a frequency domain such as a DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those set forth herein.
A time delay may occur during the conversion of the compensation information. To address this, the second domain converter 822 may perform a delay compensation operation so that the time delay between the domain of the compensation information and the first domain can be compensated for.
The compensation/3D rendering processor 823 performs a compensation operation on the arbitrary down-mix signal in the first domain using the converted spatial information, and then performs a 3D rendering operation on a signal obtained by the compensation operation. The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation in an order different from that set forth herein.
The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation on the arbitrary down-mix signal simultaneously. For example, the compensation/3D rendering processor 823 may generate a compensated 3D down-mix signal by performing a 3D rendering operation on the arbitrary down-mix signal in the first domain using a new filter coefficient, which is a combination of the compensation information and an existing filter coefficient typically used in the 3D rendering operation.
The third domain converter 824 converts the domain of the 3D down-mix signal generated by the compensation/3D rendering processor 823 into a frequency domain.
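The "new filter coefficient" idea above — folding the compensation into the rendering filter so both steps run in one pass — can be verified in miniature. For a scalar compensation gain and a linear FIR rendering filter, scaling the coefficients is exactly equivalent to compensating first and rendering second. The gain, coefficients, and input are illustrative; real 3D rendering filters (e.g., HRTF-derived) are longer and per-channel.

```python
# Hedged sketch: combining a scalar compensation gain g with FIR rendering
# coefficients h into one filter g*h. Values are illustrative.

def fir(signal, coeffs):
    """Direct-form FIR filter (causal convolution, truncated to input length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

g = 0.8                 # compensation gain (illustrative)
h = [0.5, 0.3, 0.2]     # rendering filter coefficients (illustrative)
x = [1.0, -0.5, 0.25, 0.0]

two_step = fir([g * s for s in x], h)     # compensate, then render
combined = fir(x, [g * c for c in h])     # single pass with combined filter
# the two outputs agree, which is what makes the single-domain shortcut valid
```

For band-dependent compensation the same argument holds per frequency bin, which is why the unit converts the compensation information into the first domain before combining.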
Figure 14 is a block diagram of a decoding apparatus 900 for processing a compatible down-mix signal according to an embodiment of the present invention. Referring to Figure 14, the decoding apparatus 900 includes a first multi-channel decoder 910, a down-mix compatibility processing unit 920, a second multi-channel decoder 930, and a 3D rendering unit 940. A detailed description of decoding processes identical to those of the embodiment of Figure 1 will be omitted.
A compatible down-mix signal is a down-mix signal that can be decoded by two or more multi-channel decoders. In other words, a compatible down-mix signal is a down-mix signal that is initially optimized for a predetermined multi-channel decoder and can then be converted, through a compatibility processing operation, into a signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder.
Referring to Figure 14, assume that an input compatible down-mix signal is optimized for the first multi-channel decoder 910. In order for the second multi-channel decoder 930 to decode the input compatible down-mix signal, the down-mix compatibility processing unit 920 may perform a compatibility processing operation on the input compatible down-mix signal so that it can be converted into a signal optimized for the second multi-channel decoder 930. The first multi-channel decoder 910 generates a first multi-channel signal by decoding the input compatible down-mix signal. The first multi-channel decoder 910 can generate a multi-channel signal through decoding using only the input compatible down-mix signal, without requiring spatial information.
The second multi-channel decoder 930 generates a second multi-channel signal using the down-mix signal obtained by the compatibility processing operation performed by the down-mix compatibility processing unit 920. The 3D rendering unit 940 generates a decoder 3D down-mix signal by performing a 3D rendering operation on the down-mix signal obtained by the compatibility processing operation.
A compatible down-mix signal optimized for a predetermined multi-channel decoder may be converted into a down-mix signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder using compatibility information such as an inverse matrix. For example, when there are first and second multi-channel encoders using different encoding methods and first and second multi-channel decoders using different decoding methods, an encoding apparatus may apply a matrix to a down-mix signal generated by the first multi-channel encoder, thereby generating a compatible down-mix signal optimized for the second multi-channel decoder. Then, a decoding apparatus may apply an inverse matrix to the compatible down-mix signal generated by the encoding apparatus, thereby generating a compatible down-mix signal optimized for the first multi-channel decoder.
Referring to Figure 14, the down-mix compatibility processing unit 920 may perform a compatibility processing operation on the input compatible down-mix signal using an inverse matrix, thereby generating a down-mix signal optimized for the second multi-channel decoder 930.
Information regarding the inverse matrix used by the down-mix compatibility processing unit 920 may be stored in the decoding apparatus 900 in advance, or may be included in a bitstream transmitted by an encoding apparatus. In addition, information indicating whether a down-mix signal included in an input bitstream is an arbitrary down-mix signal or a compatible down-mix signal may be included in the input bitstream.
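The matrix/inverse-matrix compatibility scheme above can be sketched for a single stereo frame: the encoder-side matrix remixes the two channels, and the decoder-side inverse matrix recovers the original mix exactly. The 2x2 matrix values and the frame are illustrative; real compatibility matrices are defined per parameter band.

```python
# Hedged sketch of compatibility processing with a 2x2 matrix and its
# inverse, applied to one stereo (L, R) frame. Values are illustrative.

def apply(matrix, frame):
    (a, b), (c, d) = matrix
    l, r = frame
    return (a * l + b * r, c * l + d * r)

def inverse(matrix):
    (a, b), (c, d) = matrix
    det = a * d - b * c          # assumed nonzero (matrix must be invertible)
    return ((d / det, -b / det), (-c / det, a / det))

M = ((1.0, 0.5), (0.5, 1.0))     # encoder-side compatibility matrix
frame = (0.6, -0.2)              # one down-mix frame (L, R)

compatible = apply(M, frame)               # optimized for the other decoder
recovered = apply(inverse(M), compatible)  # back to the original optimization
```

In the bitstream case, only the inverse matrix (or enough information to derive it) needs to reach the decoder, since the forward matrix was already applied at the encoder.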
Referring to Figure 14, the down-mix compatibility processing unit 920 includes a first domain converter 921, a compatibility processor 922, and a second domain converter 923.
The first domain converter 921 converts the domain of the input compatible down-mix signal into a predetermined domain, and the compatibility processor 922 performs a compatibility processing operation using compatibility information such as an inverse matrix, so that the input compatible down-mix signal in the predetermined domain can be converted into a signal optimized for the second multi-channel decoder 930.
The compatibility processor 922 may perform the compatibility processing operation in the QMF/hybrid domain. For this, the first domain converter 921 may perform QMF/hybrid analysis on the input compatible down-mix signal. Likewise, the first domain converter 921 may convert the domain of the input compatible down-mix signal into a domain other than the QMF/hybrid domain, e.g., a frequency domain such as a DFT or FFT domain, and the compatibility processor 922 may perform the compatibility processing operation in a domain other than the QMF/hybrid domain, such as a frequency domain or the time domain.
The second domain converter 923 converts the domain of the compatible down-mix signal obtained by the compatibility processing operation. More specifically, the second domain converter 923 converts the domain of the down-mix signal obtained by the compatibility processing operation into the same domain as that of the original input compatible down-mix signal by inversely performing the domain conversion operation performed by the first domain converter 921.
For example, the second domain converter 923 may convert the compatible down-mix signal obtained by the compatibility processing operation into a time-domain signal by performing QMF/hybrid synthesis on it. Alternatively, the second domain converter 923 may perform an IDFT or IFFT on the compatible down-mix signal obtained by the compatibility processing operation.
The 3D rendering unit 940 may perform a 3D rendering operation on the compatible down-mix signal obtained by the compatibility processing operation in a frequency domain, the QMF/hybrid domain, or the time domain. For this, the 3D rendering unit 940 may include a domain converter (not shown). The domain converter converts the domain of the input compatible down-mix signal into the domain in which the 3D rendering operation is to be performed, or converts the domain of a signal obtained by the 3D rendering operation.
The domain in which the compatibility processor 922 performs the compatibility processing operation may be the same as or different from the domain in which the 3D rendering unit 940 performs the 3D rendering operation.
Figure 15 is a block diagram of a down-mix compatibility processing/3D rendering unit 950 according to an embodiment of the present invention. Referring to Figure 15, the down-mix compatibility processing/3D rendering unit 950 includes a first domain converter 951, a second domain converter 952, a compatibility/3D rendering processor 953, and a third domain converter 954.
The down-mix compatibility processing/3D rendering unit 950 performs a compatibility processing operation and a 3D rendering operation in a single domain, thereby reducing the amount of computation of a decoding apparatus.
The first domain converter 951 converts the input compatible down-mix signal into a first domain in which a compatibility processing operation and a 3D rendering operation are to be performed. The second domain converter 952 converts spatial information and compatibility information, such as an inverse matrix, so that they become applicable in the first domain.
For example, the second domain converter 952 may map an inverse matrix corresponding to a parameter band in the QMF/hybrid domain onto a frequency band, so that the inverse matrix can be readily applied in a frequency domain.
The first domain may be a frequency domain such as a DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those set forth herein.
A time delay may occur during the conversion of the spatial information and the compatibility information. To address this, the second domain converter 952 may perform a delay compensation operation so that the time delay between the domain of the spatial information and compatibility information and the first domain can be compensated for.
The compatibility/3D rendering processor 953 performs a compatibility processing operation on the input compatible down-mix signal in the first domain using the converted compatibility information, and then performs a 3D rendering operation on the compatible down-mix signal obtained by the compatibility processing operation. The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation in an order different from that set forth herein.
The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation on the input compatible down-mix signal simultaneously. For example, the compatibility/3D rendering processor 953 may generate a 3D down-mix signal by performing a 3D rendering operation on the input compatible down-mix signal in the first domain using a new filter coefficient, which is a combination of the compatibility information and an existing filter coefficient typically used in the 3D rendering operation.
The third domain converter 954 converts the domain of the 3D down-mix signal generated by the compatibility/3D rendering processor 953 into a frequency domain.
Figure 16 is a block diagram of a decoding apparatus for canceling crosstalk according to an embodiment of the present invention. Referring to Figure 16, the decoding apparatus includes a bit unpacking unit 960, a down-mix decoder 970, a 3D rendering unit 980, and a crosstalk cancellation unit 990. A detailed description of decoding processes identical to those of the embodiment of Figure 1 will be omitted.
A 3D down-mix signal output by the 3D rendering unit 980 may be reproduced by headphones. However, when the 3D down-mix signal is reproduced by speakers distant from a user, inter-channel crosstalk is likely to occur.
Therefore, the decoding apparatus may include the crosstalk cancellation unit 990, which performs a crosstalk cancellation operation on the 3D down-mix signal.
The decoding apparatus may perform a sound field processing operation.
Sound field information used in the sound field processing operation, i.e., information identifying the space in which the 3D down-mix signal is to be reproduced, may be included in an input bitstream transmitted by an encoding apparatus, or may be selected by the decoding apparatus.
The input bitstream may include reverberation time information. A filter used in the sound field processing operation may be controlled according to the reverberation time information.
The sound field processing operation may be performed differently for an early part and a late reverberation part. For example, the early part may be processed using an FIR filter, and the late reverberation part may be processed using an IIR filter.
More specifically, the sound field processing operation may be performed on the early part by performing a convolution operation in the time domain using the FIR filter, or by performing a multiplication operation in a frequency domain and converting the result of the multiplication into the time domain. The sound field processing operation may be performed on the late reverberation part in the time domain.
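The early/late split of the sound field processing can be sketched with a short FIR convolution for the early reflections and a one-pole IIR recursion for the late reverberant tail. All coefficients here are illustrative assumptions, chosen only to show the structure, not values from the specification.

```python
# Hedged sketch of two-part sound field processing: FIR for the early part,
# a simple one-pole IIR feedback for the late reverberation. Coefficients
# are illustrative.

def fir_early(signal, impulse):
    """Time-domain convolution with a short early-reflection response."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def iir_late(signal, feedback=0.6):
    """One-pole recursion approximating an exponentially decaying tail."""
    out, state = [], 0.0
    for s in signal:
        state = s + feedback * state
        out.append(state)
    return out

x = [1.0, 0.0, 0.0, 0.0]   # unit impulse input
early = fir_early(x, impulse=[0.7, 0.2, 0.1])
late = iir_late(x)
processed = [e + 0.3 * l for e, l in zip(early, late)]  # mix early + late
```

The FIR part could equivalently be computed as a frequency-domain multiplication followed by an inverse transform, as the paragraph above notes; the IIR recursion, being feedback-based, stays in the time domain.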
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage, and a carrier wave (e.g., data transmission through the Internet). The computer-readable recording medium can be distributed over a plurality of computer systems connected to a network so that computer-readable code is written thereto and executed therefrom in a decentralized manner. Functional programs, code, and code segments needed for realizing the present invention can be easily construed by one of ordinary skill in the art.
As described above, according to the present invention, it is possible to efficiently encode multi-channel signals with 3D effects, and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.
Industrial Applicability
Other implementations are within the scope of the following claims. For example, grouping, data coding, and entropy coding according to the present invention can be applied to various applications and various products. A storage medium storing data to which an aspect of the present invention is applied is within the scope of the present invention.

Claims (10)

1. a method for decoded signal, described method comprises:
Receive bit stream, described bit stream comprises that three-dimensional (3D) reduces audio signal, extend information, size information, spatial information, the instruction reduction audio signal be included in described bit stream are the information that non-3D reduces that audio signal or 3D reduce the wave filter that the reduction audio mixing identification information of audio signal and mark are played up for 3D;
Use described reduction audio mixing identification information to the type of the described reduction audio signal identifying described bit stream and comprise;
The size information relevant with the size of described extend information is extracted from described bit stream;
The described extend information in described bit stream is skipped based on described size information;
Audio signal and described spatial information is reduced from 3D described in described bitstream extraction;
Come to reduce audio signal removal 3D effect from described 3D by reducing audio signal execution 3D Rendering operations to described 3D; And
Utilize and remove by described the reduction audio signal and described spatial information generation multi-channel signal that obtain.
2. the method for claim 1, is characterized in that, described extend information comprises at least one in channel expansion information and residual information.
3. the method for claim 1, is characterized in that, described removal 3D effect performs by using the inverse filter reducing the wave filter of audio signal for generating described 3D.
4. a method for decoded signal, described method comprises:
Receive bit stream, described bit stream comprises reduction audio signal, extend information, size information, spatial information, the instruction reduction audio signal be included in described bit stream are the information that non-3D reduces that audio signal or 3D reduce the wave filter that the reduction audio mixing identification information of audio signal and mark are played up for 3D;
Use described reduction audio mixing identification information to the type of the described reduction audio signal identifying described bit stream and comprise;
The size information relevant with the size of described extend information is extracted from described bit stream;
The described extend information in described bit stream is skipped based on described size information;
From reducing audio signal and described spatial information described in described bitstream extraction; And
3D reduction audio signal is generated by performing 3D Rendering operations to described reduction audio signal.
5. method as claimed in claim 4, it is characterized in that, described extend information comprises at least one in channel expansion information and residual information.
6. A device for decoding a signal, the device comprising:
a bit unpacking unit which receives a bitstream, the bitstream comprising a three-dimensional (3D) down-mix signal, extension information, size information, spatial information, down-mix identification information indicating whether the down-mix signal included in the bitstream is a non-3D down-mix signal or a 3D down-mix signal, and information identifying a filter used for 3D rendering; identifies the type of the down-mix signal included in the bitstream using the down-mix identification information; extracts the size information regarding the size of the extension information from the bitstream; skips the extension information in the bitstream based on the size information; and extracts the 3D down-mix signal and the spatial information from the bitstream;
a 3D rendering unit which removes 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal; and
a multi-channel decoder which generates a multi-channel signal using the spatial information and a down-mix signal obtained by the removal.
7. The device as claimed in claim 6, wherein the extension information comprises at least one of channel expansion information and residual information.
8. The device as claimed in claim 6, wherein the 3D rendering unit removes the 3D effects using an inverse filter of the filter used to generate the 3D down-mix signal.
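Claim 8's inverse-filter idea can be illustrated with a minimal frequency-domain sketch: if the 3D down-mix was produced by multiplying the down-mix spectrum by a rendering filter's response (an HRTF-like impulse response), the 3D effect can be removed by dividing by that response. The regularized inversion below is a standard signal-processing technique assumed for illustration; the patent does not specify the inversion method, and the filter `h` is hypothetical.

```python
import numpy as np

def apply_filter(signal, h, n_fft=256):
    """Apply a 3D-rendering filter h (an HRTF-like impulse response)
    by frequency-domain multiplication."""
    H = np.fft.rfft(h, n_fft)
    return np.fft.irfft(np.fft.rfft(signal, n_fft) * H, n_fft)

def remove_filter(signal, h, n_fft=256, eps=1e-8):
    """Undo the filter with a regularized inverse, as in claim 8:
    multiply by conj(H)/(|H|^2 + eps) instead of by H.  The small
    eps guards against division by near-zero spectral values."""
    H = np.fft.rfft(h, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(np.fft.rfft(signal, n_fft) * H_inv, n_fft)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)       # original down-mix frame
h = np.array([1.0, 0.5, 0.25])     # hypothetical rendering filter
y = apply_filter(x, h)             # 3D down-mix signal
x_rec = remove_filter(y, h)[:128]  # 3D effect removed
# x_rec recovers x up to the regularization error
```

The FFT length is chosen larger than signal length plus filter length so the circular convolution matches linear filtering; a real decoder would additionally handle framing and overlap.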
9. A device for decoding a signal, the device comprising:
a bit unpacking unit which receives a bitstream, the bitstream comprising a down-mix signal, extension information, size information, spatial information, down-mix identification information indicating whether the down-mix signal included in the bitstream is a non-3D down-mix signal or a 3D down-mix signal, and information identifying a filter used for 3D rendering; identifies the type of the down-mix signal included in the bitstream using the down-mix identification information; extracts the size information regarding the size of the extension information from the bitstream; skips the extension information in the bitstream based on the size information; and extracts the down-mix signal and the spatial information from the bitstream; and
a 3D rendering unit which generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
10. The device as claimed in claim 9, wherein the extension information comprises at least one of channel expansion information and residual information.
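The multi-channel decoder of claim 6 generates a multi-channel signal from a down-mix signal and spatial information. A minimal sketch of that idea is a one-to-two upmix driven by a channel level difference (CLD), one kind of spatial parameter used in MPEG Surround-style coders. The power-preserving gain formulas below are the standard textbook form, assumed for illustration; they are not taken from the patent.

```python
import numpy as np

def upmix_one_to_two(downmix, cld_db):
    """Split a mono down-mix frame into two channels using a channel
    level difference CLD = 10*log10(P_left / P_right) in dB.  The
    gains are chosen so the two output powers sum to the input power."""
    ratio = 10.0 ** (cld_db / 10.0)          # P_left / P_right
    g_left = np.sqrt(ratio / (1.0 + ratio))
    g_right = np.sqrt(1.0 / (1.0 + ratio))
    return g_left * downmix, g_right * downmix

mono = np.ones(4)
left, right = upmix_one_to_two(mono, cld_db=0.0)  # 0 dB: equal split
# power is preserved: g_left**2 + g_right**2 == 1
```

A full decoder applies such parameters per time-frequency tile and cascades one-to-two boxes to reach 5.1 or more channels, but the per-parameter arithmetic is as simple as shown.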
CN200780004505.3A 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal Active CN101385075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510128054.0A CN104681030B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Applications Claiming Priority (17)

Application Number Priority Date Filing Date Title
US76574706P 2006-02-07 2006-02-07
US60/765,747 2006-02-07
US77147106P 2006-02-09 2006-02-09
US60/771,471 2006-02-09
US77333706P 2006-02-15 2006-02-15
US60/773,337 2006-02-15
US77577506P 2006-02-23 2006-02-23
US60/775,775 2006-02-23
US78175006P 2006-03-14 2006-03-14
US60/781,750 2006-03-14
US78251906P 2006-03-16 2006-03-16
US60/782,519 2006-03-16
US79232906P 2006-04-17 2006-04-17
US60/792,329 2006-04-17
US79365306P 2006-04-21 2006-04-21
US60/793,653 2006-04-21
PCT/KR2007/000675 WO2007091848A1 (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201510128054.0A Division CN104681030B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Publications (2)

Publication Number Publication Date
CN101385075A CN101385075A (en) 2009-03-11
CN101385075B true CN101385075B (en) 2015-04-22

Family

ID=40422032

Family Applications (7)

Application Number Title Priority Date Filing Date
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045157A Active CN101385076B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications After (4)

Application Number Title Priority Date Filing Date
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045157A Active CN101385076B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Country Status (1)

Country Link
CN (7) CN101379555B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2452569T3 (en) * 2009-04-08 2014-04-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device, procedure and computer program for mixing upstream audio signal with downstream mixing using phase value smoothing
JP2011217139A (en) * 2010-03-31 2011-10-27 Sony Corp Signal processing device and method, and program
CN102884789B (en) * 2010-05-11 2017-04-12 瑞典爱立信有限公司 Video signal compression coding
WO2014013294A1 (en) * 2012-07-19 2014-01-23 Nokia Corporation Stereo audio signal encoder
EP2757559A1 (en) * 2013-01-22 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
AU2014244722C1 (en) * 2013-03-29 2017-03-02 Samsung Electronics Co., Ltd. Audio apparatus and audio providing method thereof
US9854377B2 (en) 2013-05-29 2017-12-26 Qualcomm Incorporated Interpolation for decomposed representations of a sound field
EP3062534B1 (en) * 2013-10-22 2021-03-03 Electronics and Telecommunications Research Institute Method for generating filter for audio signal and parameterizing device therefor
US9922656B2 (en) * 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
CN106415712B (en) * 2014-05-30 2019-11-15 高通股份有限公司 Device and method for rendering high-order ambiophony coefficient
US10140996B2 (en) * 2014-10-10 2018-11-27 Qualcomm Incorporated Signaling layers for scalable coding of higher order ambisonic audio data
AU2016312404B2 (en) * 2015-08-25 2020-11-26 Dolby International Ab Audio decoder and decoding method
US10074373B2 (en) * 2015-12-21 2018-09-11 Qualcomm Incorporated Channel adjustment for inter-frame temporal shift variations
CN108039175B (en) * 2018-01-29 2021-03-26 北京百度网讯科技有限公司 Voice recognition method and device and server
CN113035209B (en) * 2021-02-25 2023-07-04 北京达佳互联信息技术有限公司 Three-dimensional audio acquisition method and three-dimensional audio acquisition device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1253464A (en) * 1998-10-15 2000-05-17 三星电子株式会社 3D sound regeneration equipment and method for many listeners
US6118875A (en) * 1994-02-25 2000-09-12 Moeller; Henrik Binaural synthesis, head-related transfer functions, and uses thereof
CN1589468A (en) * 2001-11-17 2005-03-02 汤姆森许可贸易公司 Method and device for determination of the presence of additional coded data in a data frame
EP1617413A2 (en) * 2004-07-14 2006-01-18 Samsung Electronics Co, Ltd Multichannel audio data encoding/decoding method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU751900B2 (en) * 1998-03-25 2002-08-29 Lake Technology Limited Audio signal processing method and apparatus
US6574339B1 (en) * 1998-10-20 2003-06-03 Samsung Electronics Co., Ltd. Three-dimensional sound reproducing apparatus for multiple listeners and method thereof
EP1211857A1 (en) * 2000-12-04 2002-06-05 STMicroelectronics N.V. Process and device of successive value estimations of numerical symbols, in particular for the equalization of a data communication channel of information in mobile telephony
US7583805B2 (en) * 2004-02-12 2009-09-01 Agere Systems Inc. Late reverberation-based synthesis of auditory scenes
US8340302B2 (en) * 2002-04-22 2012-12-25 Koninklijke Philips Electronics N.V. Parametric representation of spatial audio


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. Herre et al. "The Reference Model Architecture for MPEG Spatial Audio Coding." Audio Engineering Society Convention Paper, 2005, pp. 2, 8, 10, 12. *

Also Published As

Publication number Publication date
CN101379555B (en) 2013-03-13
CN101379555A (en) 2009-03-04
CN101385076B (en) 2012-11-28
CN101379552A (en) 2009-03-04
CN101385076A (en) 2009-03-11
CN101379553A (en) 2009-03-04
CN101385077B (en) 2012-04-11
CN101379552B (en) 2013-06-19
CN101385077A (en) 2009-03-11
CN101379553B (en) 2012-02-29
CN101379554A (en) 2009-03-04
CN101385075A (en) 2009-03-11
CN101379554B (en) 2012-09-19

Similar Documents

Publication Publication Date Title
CN101385075B (en) Apparatus and method for encoding/decoding signal
CN104681030A (en) Apparatus and method for encoding/decoding signal
RU2406164C2 (en) Signal coding/decoding device and method
MX2008009565A (en) Apparatus and method for encoding/decoding signal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant