CN101385076B - Apparatus and method for encoding/decoding signal - Google Patents

Apparatus and method for encoding/decoding signal

Info

Publication number: CN101385076B
Application number: CN2007800045157A
Authority: CN (China)
Other versions: CN101385076A (en)
Inventors: 郑亮源, 房熙锡, 吴贤午, 金东秀, 林宰显
Current and original assignee: LG Electronics Inc (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application filed by LG Electronics Inc
Priority claimed from PCT/KR2007/000668 (WO2007091842A1)
Publication of application CN101385076A; application granted and published as CN101385076B
Prior-art keywords: audio signal, down-mix signal, spatial information, channel

Abstract

An encoding method and apparatus and a decoding method and apparatus are provided. The decoding method includes extracting a three-dimensional (3D) down-mix signal and spatial information from an input bitstream, removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal, and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal. Accordingly, it is possible to efficiently encode multi-channel signals with 3D effects and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.

Description

Apparatus and method for encoding/decoding signal
Technical Field
The present invention relates to an encoding/decoding method and an encoding/decoding apparatus, and more particularly, to an encoding/decoding apparatus which enables audio signals to be processed so as to produce three-dimensional (3D) sound, and an encoding/decoding method using the encoding/decoding apparatus.
Background Art
An encoding apparatus down-mixes a multi-channel signal into a signal having fewer channels, and transmits the down-mix signal to a decoding apparatus. The decoding apparatus then restores the multi-channel signal from the down-mix signal, and reproduces the restored multi-channel signal using three or more loudspeakers, for example, 5.1-channel loudspeakers.
A multi-channel signal may also be reproduced by 2-channel loudspeakers such as headphones. In this case, in order to make the user feel as if the sound output by the 2-channel loudspeakers were reproduced from three or more sound sources, it is necessary to develop three-dimensional (3D) processing techniques for encoding or decoding multi-channel signals so that 3D effects can be produced.
Summary of the invention
Technical Problem
The present invention provides an encoding/decoding apparatus and an encoding/decoding method which can reproduce multi-channel signals in various reproduction environments by efficiently processing signals having 3D effects.
Technical Solution
According to an aspect of the present invention, there is provided a decoding method of restoring a multi-channel signal, the decoding method including: extracting a three-dimensional (3D) down-mix signal and spatial information from an input bitstream; removing 3D effects from the 3D down-mix signal by performing a 3D rendering operation on the 3D down-mix signal; and generating a multi-channel signal using the spatial information and a down-mix signal obtained by the removal.
According to another aspect of the present invention, there is provided a decoding method of restoring a multi-channel signal, the decoding method including: extracting a 3D down-mix signal and spatial information from an input bitstream; generating a multi-channel signal using the 3D down-mix signal and the spatial information; and removing 3D effects from the multi-channel signal by performing a 3D rendering operation on the multi-channel signal.
According to another aspect of the present invention, there is provided an encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method including: encoding the multi-channel signal into a down-mix signal having fewer channels; generating spatial information regarding the plurality of channels; generating a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal; and generating a bitstream including the 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided an encoding method of encoding a multi-channel signal having a plurality of channels, the encoding method including: performing a 3D rendering operation on the multi-channel signal; encoding the multi-channel signal obtained by the 3D rendering operation into a 3D down-mix signal having fewer channels; generating spatial information regarding the plurality of channels; and generating a bitstream including the 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided a decoding apparatus for restoring a multi-channel signal, the decoding apparatus including: a bit unpacking unit which extracts an encoded 3D down-mix signal and spatial information from an input bitstream; a down-mix decoder which decodes the encoded 3D down-mix signal; a 3D rendering unit which removes 3D effects from the decoded 3D down-mix signal obtained by the decoding performed by the down-mix decoder, by performing a 3D rendering operation on the decoded 3D down-mix signal; and a multi-channel decoder which generates a multi-channel signal using the spatial information and a down-mix signal obtained by the removal performed by the 3D rendering unit.
According to another aspect of the present invention, there is provided a decoding apparatus for restoring a multi-channel signal, the decoding apparatus including: a bit unpacking unit which extracts an encoded 3D down-mix signal and spatial information from an input bitstream; a down-mix decoder which decodes the encoded 3D down-mix signal; a multi-channel decoder which generates a multi-channel signal using the spatial information and the 3D down-mix signal obtained by the decoding performed by the down-mix decoder; and a 3D rendering unit which removes 3D effects from the multi-channel signal by performing a 3D rendering operation on the multi-channel signal.
According to another aspect of the present invention, there is provided an encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus including: a multi-channel encoder which encodes the multi-channel signal into a down-mix signal having fewer channels and generates spatial information regarding the plurality of channels; a 3D rendering unit which generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal; a down-mix encoder which encodes the 3D down-mix signal; and a bit packing unit which generates a bitstream including the encoded 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided an encoding apparatus for encoding a multi-channel signal having a plurality of channels, the encoding apparatus including: a 3D rendering unit which performs a 3D rendering operation on the multi-channel signal; a multi-channel encoder which encodes the multi-channel signal obtained by the 3D rendering operation into a 3D down-mix signal having fewer channels and generates spatial information regarding the plurality of channels; a down-mix encoder which encodes the 3D down-mix signal; and a bit packing unit which generates a bitstream including the encoded 3D down-mix signal and the spatial information.
According to another aspect of the present invention, there is provided a bitstream including: a data field which includes information regarding a 3D down-mix signal; a filter information field which includes filter information identifying a filter used to generate the 3D down-mix signal; a first header field which includes information indicating whether the filter information field includes the filter information; a second header field which includes information indicating whether the filter information field includes coefficients of the filter or coefficients of an inverse filter of the filter; and a spatial information field which includes spatial information regarding a plurality of channels.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing any of the above-described decoding methods or encoding methods.
Advantageous Effects
According to the present invention, it is possible to efficiently encode multi-channel signals having 3D effects, and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.
Brief Description of the Drawings
Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention;
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention;
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention;
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention;
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention;
Fig. 7 is a block diagram of a three-dimensional (3D) rendering apparatus according to an embodiment of the present invention;
Figs. 8 to 11 illustrate bitstreams according to embodiments of the present invention;
Fig. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal, according to an embodiment of the present invention;
Fig. 13 is a block diagram of an arbitrary down-mix signal compensation/3D rendering unit according to an embodiment of the present invention;
Fig. 14 is a block diagram of a decoding apparatus for processing a compatible down-mix signal, according to an embodiment of the present invention;
Fig. 15 is a block diagram of a compatible down-mix processing/3D rendering unit according to an embodiment of the present invention; and
Fig. 16 is a block diagram of a decoding apparatus for canceling crosstalk, according to an embodiment of the present invention.
Best Mode for Carrying Out the Invention
The present invention will hereinafter be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. Fig. 1 is a block diagram of an encoding/decoding apparatus according to an embodiment of the present invention. Referring to Fig. 1, an encoding unit 100 includes a multi-channel encoder 110, a three-dimensional (3D) rendering unit 120, a down-mix encoder 130, and a bit packing unit 140.
The multi-channel encoder 110 down-mixes a multi-channel signal having a plurality of channels into a down-mix signal, such as a stereo or mono signal, and generates spatial information regarding the channels of the multi-channel signal. The spatial information is needed to restore the multi-channel signal from the down-mix signal.
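For illustration only, the kind of passive down-mix the multi-channel encoder performs can be sketched as below. The 5.1 channel ordering and the -3 dB centre/surround/LFE gains are assumptions made for this sketch, not details specified by the patent.

```python
import numpy as np

# Hypothetical channel order: L, R, C, LFE, Ls, Rs (an assumption for this sketch).
GAIN = 1.0 / np.sqrt(2.0)  # roughly -3 dB, a common choice for centre/surround


def downmix_5_1_to_stereo(x):
    """Down-mix a (6, n_samples) 5.1 signal into a (2, n_samples) stereo signal."""
    L, R, C, LFE, Ls, Rs = x
    left = L + GAIN * (C + Ls) + GAIN * LFE
    right = R + GAIN * (C + Rs) + GAIN * LFE
    return np.stack([left, right])


x = np.random.default_rng(0).standard_normal((6, 1024))
stereo = downmix_5_1_to_stereo(x)
print(stereo.shape)  # (2, 1024)
```

The spatial information the encoder generates alongside the down-mix is what lets a decoder approximately undo this reduction.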
Examples of the spatial information include a channel level difference (CLD), which indicates the difference between the energy levels of a pair of channels; channel prediction coefficients (CPC), which are prediction coefficients used to generate 3-channel signals based on a 2-channel signal; inter-channel correlation (ICC), which indicates the correlation between a pair of channels; and a channel time difference (CTD), which is the time interval between a pair of channels.
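The level-difference and correlation parameters can be illustrated with a short sketch. In practice such parameters are computed per subband and per frame; the broadband, whole-signal version below is a simplification for illustration only.

```python
import numpy as np


def cld_db(ch1, ch2, eps=1e-12):
    """Channel level difference: energy ratio of a channel pair, in dB."""
    return 10.0 * np.log10((np.sum(ch1 ** 2) + eps) / (np.sum(ch2 ** 2) + eps))


def icc(ch1, ch2, eps=1e-12):
    """Inter-channel correlation: normalised zero-lag cross-correlation."""
    return float(np.sum(ch1 * ch2)
                 / (np.sqrt(np.sum(ch1 ** 2) * np.sum(ch2 ** 2)) + eps))


t = np.linspace(0.0, 1.0, 1000)
a = np.sin(2.0 * np.pi * 5.0 * t)
b = 0.5 * a  # same waveform at half the amplitude

print(round(cld_db(a, b), 2))  # 6.02 (amplitude ratio 2 -> energy ratio 4)
print(round(icc(a, b), 3))     # 1.0  (fully correlated)
```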
The 3D rendering unit 120 generates a 3D down-mix signal based on the down-mix signal. The 3D down-mix signal may be a 2-channel signal conveying three or more directions, and can thus be reproduced with 3D effects by 2-channel loudspeakers such as headphones. In other words, the 3D down-mix signal can be reproduced by 2-channel loudspeakers in such a way that the user feels as if the 3D down-mix signal were reproduced from a sound source having three or more channels. The direction of a sound source can be determined based on at least one of the difference between the intensities of the two sounds respectively input to the two ears, the time interval between the two sounds, and the difference between the phases of the two sounds. Thus, the 3D rendering unit 120 can convert the down-mix signal into the 3D down-mix signal based on how humans use their sense of hearing to determine the 3D position of a sound source.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a filter. In this case, filter-related information, for example, filter coefficients, may be input to the 3D rendering unit 120 by an external source. The 3D rendering unit 120 may also use the spatial information provided by the multi-channel encoder 110 to generate the 3D down-mix signal based on the down-mix signal. More specifically, the 3D rendering unit 120 may convert the down-mix signal into the 3D down-mix signal by converting the down-mix signal into an imaginary multi-channel signal using the spatial information, and then filtering the imaginary multi-channel signal.
The 3D rendering unit 120 may generate the 3D down-mix signal by filtering the down-mix signal using a head-related transfer function (HRTF) filter.
An HRTF is a transfer function which describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and which returns values that vary according to the direction and altitude of the sound source. If a signal having no directivity is filtered using an HRTF, the signal can be heard as if it were reproduced from a certain direction.
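Conceptually, HRTF filtering convolves a signal with a left-ear and a right-ear impulse response measured for a given source direction. The sketch below uses made-up toy impulse responses in place of measured HRIRs; the specific coefficient values are assumptions, chosen only to mimic the earlier, louder arrival at the nearer ear.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono (non-directional) signal with a left/right HRIR pair,
    producing a 2-channel signal heard as coming from the direction the
    HRIRs correspond to."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)])


# Toy impulse responses standing in for measured HRIRs (pure assumptions):
# for a source on the listener's left, sound reaches the left ear earlier
# and louder than the right ear.
hrir_l = np.array([1.0, 0.5, 0.25, 0.0])
hrir_r = np.array([0.0, 0.6, 0.3, 0.15])  # one-sample delay, attenuated

mono = np.random.default_rng(1).standard_normal(256)
binaural = render_binaural(mono, hrir_l, hrir_r)
print(binaural.shape)  # (2, 259)
```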
The 3D rendering unit 120 may perform the 3D rendering operation in a frequency domain, for example, a discrete Fourier transform (DFT) domain or a fast Fourier transform (FFT) domain. In this case, the 3D rendering unit 120 may perform a DFT or an FFT before the 3D rendering operation, and may perform an inverse DFT (IDFT) or an inverse FFT (IFFT) after the 3D rendering operation.
The 3D rendering unit 120 may perform the 3D rendering operation in a quadrature mirror filter (QMF)/hybrid domain. In this case, the 3D rendering unit 120 may perform QMF/hybrid analysis and synthesis operations before or after the 3D rendering operation.
The 3D rendering unit 120 may also perform the 3D rendering operation in a time domain. The 3D rendering unit 120 may determine in which domain the 3D rendering operation is to be performed according to the required sound quality and the operational capability of the encoding/decoding apparatus.
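The FFT-domain variant described above amounts to transforming, multiplying by the filter's frequency response, and transforming back. A minimal sketch, with the zero-padding length chosen so that the result matches time-domain linear convolution exactly:

```python
import numpy as np


def fft_domain_filter(signal, fir):
    """Apply an FIR filter in the FFT domain: FFT, multiply by the filter's
    frequency response, then IFFT. Zero-padding both inputs to
    len(signal) + len(fir) - 1 makes the result identical to time-domain
    linear convolution."""
    n = len(signal) + len(fir) - 1
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(fir, n)
    return np.fft.irfft(spectrum, n)


x = np.random.default_rng(2).standard_normal(128)
h = np.array([0.9, 0.3, 0.1])
same = np.allclose(fft_domain_filter(x, h), np.convolve(x, h))
print(same)  # True
```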
The down-mix encoder 130 encodes the down-mix signal output by the multi-channel encoder 110 or the 3D down-mix signal output by the 3D rendering unit 120. The down-mix encoder 130 may perform this encoding using an audio coding method such as an advanced audio coding (AAC) method, an MPEG layer-3 (MP3) method, or a bit-sliced arithmetic coding (BSAC) method.
The down-mix encoder 130 may encode a non-3D down-mix signal or a 3D down-mix signal. In this case, both the encoded non-3D down-mix signal and the encoded 3D down-mix signal may be included in a bitstream to be transmitted.
The bit packing unit 140 generates a bitstream based on the spatial information and either the encoded non-3D down-mix signal or the encoded 3D down-mix signal.
The bitstream generated by the bit packing unit 140 may include spatial information, down-mix identification information indicating whether the down-mix signal included in the bitstream is a non-3D down-mix signal or a 3D down-mix signal, and information identifying the filter used by the 3D rendering unit 120 (for example, HRTF coefficient information).
In other words, the bitstream generated by the bit packing unit 140 may include at least one of a non-3D down-mix signal which has not yet been 3D-processed and an encoder 3D down-mix signal obtained by a 3D processing operation performed by the encoding apparatus, together with down-mix identification information identifying the type of the down-mix signal included in the bitstream.
Which of the non-3D down-mix signal and the encoder 3D down-mix signal is included in the bitstream generated by the bit packing unit 140 may be selected by a user, or may be determined according to the capability of the encoding/decoding apparatus shown in Fig. 1 and the characteristics of the reproduction environment.
The HRTF coefficient information may include the coefficients of the inverse function of the HRTF used by the 3D rendering unit 120. Alternatively, the HRTF coefficient information may include only brief information regarding the coefficients of the HRTF used by the 3D rendering unit 120, for example, envelope information of the HRTF coefficients. If a bitstream including the coefficients of the inverse function of the HRTF is transmitted to a decoding apparatus, the decoding apparatus does not need to perform an HRTF coefficient conversion operation, and the amount of computation of the decoding apparatus can thus be reduced.
The bitstream generated by the bit packing unit 140 may also include information regarding an energy variation caused in a signal by the HRTF-based filtering, that is, information regarding the difference between the energy of the signal to be filtered and the energy of the filtered signal, or the ratio of the energy of the signal to be filtered to the energy of the filtered signal.
The bitstream generated by the bit packing unit 140 may also include information indicating whether it includes HRTF coefficients. If HRTF coefficients are included in the bitstream generated by the bit packing unit 140, the bitstream may also include information indicating whether it includes the coefficients of the HRTF used by the 3D rendering unit 120 or the coefficients of the inverse function of the HRTF.
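The signalling described above (a down-mix identification flag, a flag for the presence of filter information, and a flag distinguishing filter coefficients from inverse-filter coefficients) might be laid out as in the following sketch. The one-byte layout is purely hypothetical and is not the actual bitstream syntax.

```python
import struct

# Hypothetical one-byte header (illustration only; not the actual syntax):
#   bit 0: down-mix identification (0 = non-3D down-mix, 1 = 3D down-mix)
#   bit 1: filter information present in the bitstream?
#   bit 2: transmitted coefficients belong to the inverse filter?


def pack_header(is_3d, has_filter_info, is_inverse):
    flags = (int(is_3d) << 0) | (int(has_filter_info) << 1) | (int(is_inverse) << 2)
    return struct.pack("B", flags)


def parse_header(data):
    (flags,) = struct.unpack("B", data[:1])
    return {"is_3d": bool(flags & 0b001),
            "has_filter_info": bool(flags & 0b010),
            "is_inverse": bool(flags & 0b100)}


hdr = pack_header(is_3d=True, has_filter_info=True, is_inverse=False)
print(parse_header(hdr))  # {'is_3d': True, 'has_filter_info': True, 'is_inverse': False}
```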
Referring to Fig. 1, a first decoding unit 200 includes a bit unpacking unit 210, a down-mix decoder 220, a 3D rendering unit 230, and a multi-channel decoder 240.
The bit unpacking unit 210 receives an input bitstream from the encoding unit 100, and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 220 decodes the encoded down-mix signal. The down-mix decoder 220 may decode the encoded down-mix signal using an audio decoding method such as an AAC method, an MP3 method, or a BSAC method.
As described above, the encoded down-mix signal extracted from the input bitstream may be an encoded non-3D down-mix signal or an encoded encoder 3D down-mix signal. Information indicating which of the two the extracted encoded down-mix signal is may be included in the input bitstream.
If the encoded down-mix signal extracted from the input bitstream is an encoder 3D down-mix signal, the encoded down-mix signal can be readily reproduced after being decoded by the down-mix decoder 220.
On the other hand, if the encoded down-mix signal extracted from the input bitstream is a non-3D down-mix signal, the encoded down-mix signal may be decoded by the down-mix decoder 220, and the down-mix signal obtained by the decoding may be converted into a decoder 3D down-mix signal by a 3D rendering operation performed by a third renderer 233. The decoder 3D down-mix signal can then be readily reproduced.
The 3D rendering unit 230 includes a first renderer 231, a second renderer 232, and the third renderer 233. The first renderer 231 generates a down-mix signal by performing a 3D rendering operation on an encoder 3D down-mix signal provided by the down-mix decoder 220. For example, the first renderer 231 may generate a non-3D down-mix signal by removing 3D effects from the encoder 3D down-mix signal. The 3D effects of the encoder 3D down-mix signal may not be completely removed by the first renderer 231; in this case, the down-mix signal output by the first renderer 231 may retain some 3D effects.
The first renderer 231 may convert the 3D down-mix signal provided by the down-mix decoder 220 into a down-mix signal from which the 3D effects are removed, using an inverse filter of the filter used by the 3D rendering unit 120 of the encoding unit 100. Information regarding the filter used by the 3D rendering unit 120, or regarding the inverse filter of that filter, may be included in the input bitstream.
The filter used by the 3D rendering unit 120 may be an HRTF filter. In this case, the coefficients of the HRTF used by the encoding unit 100, or the coefficients of the inverse function of the HRTF, may be included in the input bitstream. If the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream, the HRTF coefficients may be inversely converted, and the result of the inverse conversion may be used during the 3D rendering operation performed by the first renderer 231. If the coefficients of the inverse function of the HRTF used by the encoding unit 100 are included in the input bitstream, they can be readily used during the 3D rendering operation performed by the first renderer 231 without any inverse conversion operation. In this case, the amount of computation of the first decoding unit 200 can be reduced.
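The inverse conversion mentioned above, which the decoder can skip when inverse-HRTF coefficients are transmitted directly, can be sketched as a regularised frequency-domain inversion. The regularisation constant and the toy HRIR below are assumptions; real HRTF inversion typically needs more care (non-minimum-phase responses, delay handling).

```python
import numpy as np


def inverse_filter(hrir, n, eps=1e-3):
    """Derive inverse-filter coefficients from forward HRIR coefficients by
    regularised inversion in the FFT domain. This stands in for the
    'inverse conversion' a decoder must perform when only forward HRTF
    coefficients are transmitted."""
    H = np.fft.rfft(hrir, n)
    return np.fft.irfft(np.conj(H) / (np.abs(H) ** 2 + eps), n)


hrir = np.array([1.0, 0.4, 0.1])  # toy forward coefficients (assumption)
inv = inverse_filter(hrir, 64)

# Cascading the filter and its inverse should be close to an identity system.
cascade = np.fft.rfft(hrir, 64) * np.fft.rfft(inv, 64)
print(bool(np.max(np.abs(cascade - 1.0)) < 0.01))  # True
```

Transmitting `inv`-style coefficients instead of `hrir` is exactly what saves the decoder this computation.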
The input bitstream may also include filter information (for example, information indicating whether the coefficients of the HRTF used by the encoding unit 100 are included in the input bitstream) and information indicating whether the filter information has been inversely converted.
The multi-channel decoder 240 generates a 3D multi-channel signal having three or more channels based on the down-mix signal from which the 3D effects have been removed and the spatial information extracted from the input bitstream.
The second renderer 232 may generate a 3D down-mix signal having 3D effects by performing a 3D rendering operation on the down-mix signal from which the 3D effects have been removed. In other words, the first renderer 231 removes 3D effects from the encoder 3D down-mix signal provided by the down-mix decoder 220. Thereafter, the second renderer 232 may generate a combined 3D down-mix signal having the 3D effects desired by the first decoding unit 200, by performing a 3D rendering operation, using a filter of the first decoding unit 200, on the down-mix signal obtained by the removal performed by the first renderer 231.
The first decoding unit 200 may include a renderer in which two or more of the first, second, and third renderers 231, 232, and 233 performing the same operations are combined.
The bitstream generated by the encoding unit 100 may also be input to a second decoding unit 300 having a different structure from the first decoding unit 200. The second decoding unit 300 can generate a 3D down-mix signal based on a down-mix signal included in the bitstream input thereto.
More specifically, the second decoding unit 300 includes a bit unpacking unit 310, a down-mix decoder 320, and a 3D rendering unit 330. The bit unpacking unit 310 receives an input bitstream from the encoding unit 100, and extracts an encoded down-mix signal and spatial information from the input bitstream. The down-mix decoder 320 decodes the encoded down-mix signal. The 3D rendering unit 330 performs a 3D rendering operation on the decoded down-mix signal, so that the decoded down-mix signal is converted into a 3D down-mix signal.
Fig. 2 is a block diagram of an encoding apparatus according to an embodiment of the present invention. Referring to Fig. 2, the encoding apparatus includes 3D rendering units 400 and 420 and a multi-channel encoder 410. A detailed description of encoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 2, the 3D rendering units 400 and 420 may be disposed in front of and behind the multi-channel encoder 410, respectively. Thus, a multi-channel signal may be 3D-rendered by the 3D rendering unit 400, and the 3D-rendered multi-channel signal may then be encoded by the multi-channel encoder 410, thereby generating a pre-processed encoder 3D down-mix signal. Alternatively, the multi-channel signal may be down-mixed by the multi-channel encoder 410, and the down-mix signal may then be 3D-rendered by the 3D rendering unit 420, thereby generating a post-processed encoder 3D down-mix signal.
Information indicating whether the multi-channel signal was 3D-rendered before or after being down-mixed may be included in the bitstream to be transmitted.
Both of the 3D rendering units 400 and 420 may be disposed in front of, or behind, the multi-channel encoder 410.
Fig. 3 is a block diagram of a decoding apparatus according to an embodiment of the present invention. Referring to Fig. 3, the decoding apparatus includes 3D rendering units 430 and 450 and a multi-channel decoder 440. A detailed description of decoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 3, the 3D rendering units 430 and 450 may be disposed in front of and behind the multi-channel decoder 440, respectively. The 3D rendering unit 430 may remove 3D effects from an encoder 3D down-mix signal, and input the down-mix signal obtained by the removal to the multi-channel decoder 440. Then, the multi-channel decoder 440 may decode the down-mix signal input thereto, thereby generating a pre-processed 3D multi-channel signal. Alternatively, the multi-channel decoder 440 may restore a multi-channel signal from the encoded 3D down-mix signal, and the 3D rendering unit 450 may remove 3D effects from the restored multi-channel signal, thereby generating a post-processed 3D multi-channel signal.
If the encoder 3D down-mix signal provided by an encoding apparatus was generated by performing a 3D rendering operation followed by a down-mix operation, the encoder 3D down-mix signal may be decoded by performing a multi-channel decoding operation followed by a 3D rendering operation. On the other hand, if the encoder 3D down-mix signal was generated by performing a down-mix operation followed by a 3D rendering operation, the encoder 3D down-mix signal may be decoded by performing a 3D rendering operation followed by a multi-channel decoding operation.
Information indicating whether the encoded 3D down-mix signal was obtained by performing a 3D rendering operation before or after a down-mix operation may be extracted from the bitstream transmitted by the encoding apparatus.
Both of the 3D rendering units 430 and 450 may be disposed in front of, or behind, the multi-channel decoder 440.
Fig. 4 is a block diagram of an encoding apparatus according to another embodiment of the present invention. Referring to Fig. 4, the encoding apparatus includes a multi-channel encoder 500, a 3D rendering unit 510, a down-mix encoder 520, and a bit packing unit 530. A detailed description of encoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 4, the multi-channel encoder 500 generates a down-mix signal and spatial information based on an input multi-channel signal. The 3D rendering unit 510 generates a 3D down-mix signal by performing a 3D rendering operation on the down-mix signal.
Whether to perform a 3D rendering operation on the down-mix signal may be determined by a user's selection, or according to the capability of the encoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix encoder 520 encodes the down-mix signal generated by the multi-channel encoder 500 or the 3D down-mix signal generated by the 3D rendering unit 510.
The bit packing unit 530 generates a bitstream based on the spatial information and either the encoded down-mix signal or the encoded encoder 3D down-mix signal. The bitstream generated by the bit packing unit 530 may include down-mix identification information indicating whether the encoded down-mix signal included in the bitstream is a non-3D down-mix signal without 3D effects or an encoder 3D down-mix signal with 3D effects. More specifically, the down-mix identification information may indicate whether the bitstream generated by the bit packing unit 530 includes a non-3D down-mix signal, an encoder 3D down-mix signal, or both.
Fig. 5 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to Fig. 5, the decoding apparatus includes a bit unpacking unit 540, a down-mix decoder 550, and a 3D rendering unit 560. A detailed description of decoding processes identical to those of the embodiment of Fig. 1 will be omitted.
Referring to Fig. 5, the bit unpacking unit 540 extracts an encoded down-mix signal, spatial information, and down-mix identification information from an input bitstream. The down-mix identification information indicates whether the encoded down-mix signal is an encoded non-3D down-mix signal without 3D effects or an encoded 3D down-mix signal with 3D effects.
If the input bitstream includes both a non-3D down-mix signal and a 3D down-mix signal, only one of them may be extracted from the input bitstream, as determined by a user's selection or according to the capability of the decoding apparatus, the characteristics of the reproduction environment, or the required sound quality.
The down-mix decoder 550 decodes the encoded down-mix signal. If the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is an encoder 3D down-mix signal obtained by a 3D rendering operation, the down-mix signal can be readily reproduced.
On the other hand, if the down-mix signal obtained by the decoding performed by the down-mix decoder 550 is a down-mix signal without 3D effects, the 3D rendering unit 560 may generate a decoder 3D down-mix signal by performing a 3D rendering operation on the down-mix signal obtained by the decoding.
FIG. 6 is a block diagram of a decoding apparatus according to another embodiment of the present invention. Referring to FIG. 6, the decoding apparatus includes a bit unpacking unit 600, a down-mix decoder 610, a first 3D rendering unit 620, a second 3D rendering unit 630, and a filter information storage unit 640. A detailed description of decoding processes identical to those of the embodiment of FIG. 1 will be omitted.
The bit unpacking unit 600 extracts an encoded encoder 3D down-mix signal and spatial information from an input bitstream. The down-mix decoder 610 decodes the encoded encoder 3D down-mix signal.
The first 3D rendering unit 620 removes the 3D effects from the encoder 3D down-mix signal obtained by the decoding performed by the down-mix decoder 610, using an inverse filter of the filter of the encoding apparatus used to perform the 3D rendering operation. The second 3D rendering unit 630 generates a combined 3D down-mix signal with 3D effects by performing a 3D rendering operation, using a filter stored in the decoding apparatus, on the down-mix signal obtained by the removal performed by the first 3D rendering unit 620.
The second 3D rendering unit 630 may perform the 3D rendering operation using a filter whose characteristics differ from those of the filter of the encoding apparatus used to perform the 3D rendering operation. For example, the second 3D rendering unit 630 may perform the 3D rendering operation using an HRTF whose coefficients differ from those of the HRTF used by the encoding apparatus.
The filter information storage unit 640 stores filter information about the filter used to perform the 3D rendering, for example, HRTF coefficient information. The second 3D rendering unit 630 may generate the combined 3D down-mix using the filter information stored in the filter information storage unit 640.
The filter information storage unit 640 may store a plurality of pieces of filter information respectively corresponding to a plurality of filters. In this case, one of the pieces of filter information may be selected by a user's selection, or according to the capability of the decoding apparatus or a desired sound quality.
People of different races may have different ear structures, and thus HRTF coefficients optimized for different individuals may differ from one another. The decoding apparatus shown in FIG. 6 can generate a 3D down-mix signal optimized for the user. In addition, the decoding apparatus shown in FIG. 6 can generate a 3D down-mix signal having the 3D effects of whatever HRTF filter the user desires, regardless of the type of HRTF provided by the supplier of the 3D down-mix signal.
FIG. 7 is a block diagram of a 3D rendering apparatus according to an embodiment of the present invention. Referring to FIG. 7, the 3D rendering apparatus includes first and second domain converting units 700 and 720 and a 3D rendering unit 710. In order to perform a 3D rendering operation in a predetermined domain, the first and second domain converting units 700 and 720 may be disposed before and after the 3D rendering unit 710, respectively.
Referring to FIG. 7, an input down-mix signal may be converted into a frequency-domain down-mix signal by the first domain converting unit 700. More specifically, the first domain converting unit 700 may convert the input down-mix signal into a DFT-domain down-mix signal or an FFT-domain down-mix signal by performing a DFT or an FFT.
The 3D rendering unit 710 generates a multi-channel signal by applying spatial information to the frequency-domain down-mix signal provided by the first domain converting unit 700. Thereafter, the 3D rendering unit 710 generates a 3D down-mix signal by filtering the multi-channel signal.
The 3D down-mix signal generated by the 3D rendering unit 710 is converted into a time-domain 3D down-mix signal by the second domain converting unit 720. More specifically, the second domain converting unit 720 may perform an IDFT or an IFFT on the 3D down-mix signal generated by the 3D rendering unit 710.
During the conversion of the frequency-domain 3D down-mix signal into the time-domain 3D down-mix signal, data loss or data distortion such as aliasing may occur.
In order to generate the multi-channel signal and the 3D down-mix signal in the frequency domain, the spatial information of each parameter band may be mapped to the frequency domain, and a plurality of filter coefficients may be converted into the frequency domain.
The 3D rendering unit 710 may generate the 3D down-mix signal by multiplying together the frequency-domain down-mix signal provided by the first domain converting unit 700, the spatial information, and the filter coefficients.
A time-domain signal obtained by multiplying together a down-mix signal, spatial information, and a plurality of filter coefficients all represented in an M-point frequency domain has M valid signals. To represent the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain, an M-point DFT or an M-point FFT may be performed.
A valid signal is a signal that does not necessarily have a value of 0. For example, a total of x valid signals may be generated by obtaining x signals from an audio signal via sampling. If y of these x valid signals are then zero-padded out, the number of valid signals decreases to (x-y). When a signal with a valid signals and a signal with b valid signals are convolved, a total of (a+b-1) valid signals are obtained.
Multiplying the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain can provide the same effect as convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. A signal with (3*M-2) valid signals can be generated by converting the down-mix signal, the spatial information, and the filter coefficients in the M-point frequency domain to the time domain and convolving the results of the conversion.
Therefore, the number of valid signals in a signal obtained by multiplying the down-mix signal, the spatial information, and the filter coefficients in the frequency domain and converting the result of the multiplication to the time domain may differ from the number of valid signals in a signal obtained by convolving the down-mix signal, the spatial information, and the filter coefficients in the time domain. As a result, aliasing may occur during the conversion of the frequency-domain 3D down-mix signal into a time-domain signal.
In order to prevent aliasing, the sum of the number of valid signals of the down-mix signal in the time domain, the number of valid signals of the spatial information mapped to the frequency domain, and the number of filter coefficients must not be greater than M. The number of valid signals of the spatial information mapped to the frequency domain may be determined by the number of points of the frequency domain. In other words, if spatial information represented for each parameter band is mapped to an N-point frequency domain, the number of valid signals of the spatial information may be N.
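The aliasing condition described above can be illustrated with a short sketch (plain Python; all names and sample values are illustrative, not part of the specification). Multiplying M-point spectra corresponds to M-point circular convolution, which matches ordinary linear convolution only when M is large enough to hold every valid output sample; otherwise the tail wraps around and corrupts the start of the signal.

```python
def linear_conv(a, b):
    # ordinary time-domain convolution: len(a) + len(b) - 1 valid samples
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def circular_conv(a, b, M):
    # M-point circular convolution -- exactly what multiplying two
    # M-point DFT spectra and transforming back to the time domain yields
    out = [0] * M
    for i in range(M):
        for j in range(M):
            av = a[j] if j < len(a) else 0
            bv = b[(i - j) % M] if (i - j) % M < len(b) else 0
            out[i] += av * bv
    return out

down_mix = [1, 2, 3]   # X = 3 valid samples (hypothetical values)
spatial  = [1, 1]      # N = 2 valid samples
coeffs   = [1, -1]     # Y = 2 valid samples

# time-domain reference: X + N + Y - 2 = 5 valid output samples
ref = linear_conv(linear_conv(down_mix, spatial), coeffs)

# M >= X+N+Y-2: the frequency-domain product is alias-free
ok = circular_conv(circular_conv(down_mix, spatial, 5), coeffs, 5)
# M too small: the tail wraps around and corrupts the first samples
bad = circular_conv(circular_conv(down_mix, spatial, 4), coeffs, 4)
```

With M = 5 the circular result equals the time-domain convolution; with M = 4 the last sample folds back onto the first, which is the aliasing the condition M ≥ (X+N+Y-2) prevents.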
Referring to FIG. 7, the first domain converting unit 700 includes a first zero-padding unit 701 and a first frequency-domain converting unit 702. The 3D rendering unit 710 includes a mapping unit 711, a time-domain converting unit 712, a second zero-padding unit 713, a second frequency-domain converting unit 714, a multi-channel signal generation unit 715, a third zero-padding unit 716, a third frequency-domain converting unit 717, and a 3D down-mix signal generation unit 718.
The first zero-padding unit 701 performs a zero-padding operation on a down-mix signal with X samples in the time domain so that the number of samples of the down-mix signal can increase from X to M. The first frequency-domain converting unit 702 converts the zero-padded down-mix signal into an M-point frequency-domain signal. The zero-padded down-mix signal has M samples. Of the M samples of the zero-padded down-mix signal, only X samples are valid signals.
The mapping unit 711 maps the spatial information of each parameter band to an N-point frequency domain. The time-domain converting unit 712 converts the spatial information obtained by the mapping performed by the mapping unit 711 to the time domain. The spatial information obtained by the conversion performed by the time-domain converting unit 712 has N samples.
The second zero-padding unit 713 performs a zero-padding operation on the spatial information with N samples in the time domain so that the number of samples of the spatial information can increase from N to M. The second frequency-domain converting unit 714 converts the zero-padded spatial information into an M-point frequency-domain signal. The zero-padded spatial information has M samples. Of the M samples of the zero-padded spatial information, only N samples are valid signals.
The multi-channel signal generation unit 715 generates a multi-channel signal by multiplying together the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714. The multi-channel signal generated by the multi-channel signal generation unit 715 has M valid signals. On the other hand, a multi-channel signal obtained by convolving, in the time domain, the down-mix signal provided by the first frequency-domain converting unit 702 and the spatial information provided by the second frequency-domain converting unit 714 has (X+N-1) valid signals.
The third zero-padding unit 716 may perform a zero-padding operation on Y filter coefficients represented in the time domain so that the number of samples can increase to M. The third frequency-domain converting unit 717 converts the zero-padded filter coefficients into the M-point frequency domain. The zero-padded filter coefficients have M samples. Of the M samples, only Y samples are valid signals.
The 3D down-mix signal generation unit 718 generates a 3D down-mix signal by multiplying together the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717. The 3D down-mix signal generated by the 3D down-mix signal generation unit 718 has M valid signals. On the other hand, a 3D down-mix signal obtained by convolving, in the time domain, the multi-channel signal generated by the multi-channel signal generation unit 715 and the filter coefficients provided by the third frequency-domain converting unit 717 has (X+N+Y-2) valid signals.
It is possible to prevent aliasing by setting the M-point frequency domain used by the first, second, and third frequency-domain converting units 702, 714, and 717 so as to satisfy the equation M ≥ (X+N+Y-2). In other words, aliasing can be prevented by having the first, second, and third frequency-domain converting units 702, 714, and 717 perform an M-point DFT or an M-point FFT that satisfies the equation M ≥ (X+N+Y-2).
The conversion to the frequency domain may be performed using filter banks other than a DFT filter bank, an FFT filter bank, and a QMF bank. The generation of the 3D down-mix signal may be performed using an HRTF filter.
The number of valid signals of the spatial information may be adjusted using a method other than the methods described above, or using whichever of the described methods is most efficient and requires the least computation.
Aliasing may occur not only when a signal, coefficients, or spatial information is converted from the frequency domain to the time domain or vice versa, but also when it is converted from the QMF domain to a hybrid domain or vice versa. The above-described methods of preventing aliasing may also be used to prevent aliasing from occurring when a signal, coefficients, or spatial information is converted from the QMF domain to the hybrid domain or vice versa.
The spatial information used to generate the multi-channel signal or the 3D down-mix signal may vary. As a result of the variation of the spatial information, discontinuities, which are perceived as noise, may occur in the output signal.
The noise in the output signal may be reduced using a smoothing method, by which rapid variation of the spatial information can be prevented.
For example, when first spatial information applied to a first frame and second spatial information applied to a second frame differ while the first frame and the second frame are adjacent to each other, a discontinuity is highly likely to occur between the first frame and the second frame.
In this case, the second spatial information may be compensated for using the first spatial information, or the first spatial information may be compensated for using the second spatial information, so that the difference between the first spatial information and the second spatial information can be reduced, thereby reducing the noise caused by the discontinuity between the first and second frames. More specifically, at least one of the first spatial information and the second spatial information may be replaced with the mean of the first spatial information and the second spatial information, thereby reducing the noise.
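The mean-replacement option just described can be sketched as follows (Python; the variable names and CLD values are illustrative only, not taken from the specification):

```python
def smooth_adjacent(first, second):
    # Replace both pieces of spatial information (e.g. CLD values in dB)
    # with their mean, removing the jump between adjacent frames or bands.
    mean = 0.5 * (first + second)
    return mean, mean

cld_frame1, cld_frame2 = 12.0, 4.0      # hypothetical CLDs that differ
s1, s2 = smooth_adjacent(cld_frame1, cld_frame2)
gap_before = abs(cld_frame1 - cld_frame2)   # the discontinuity: 8 dB
gap_after = abs(s1 - s2)                    # after smoothing: 0 dB
```

Replacing only one of the two values with the mean (instead of both) would halve the jump rather than eliminate it, which may be preferable when the variation carries genuine spatial-image changes.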
Noise is also likely to occur due to a discontinuity between a pair of adjacent parameter bands. For example, when third spatial information corresponding to a first parameter band and fourth spatial information corresponding to a second parameter band differ while the first and second parameter bands are adjacent to each other, a discontinuity may occur between the first and second parameter bands.
In this case, the fourth spatial information may be compensated for using the third spatial information, or the third spatial information may be compensated for using the fourth spatial information, so that the difference between the third spatial information and the fourth spatial information can be reduced, thereby reducing the noise caused by the discontinuity between the first and second parameter bands. More specifically, at least one of the third spatial information and the fourth spatial information may be replaced with the mean of the third spatial information and the fourth spatial information, thereby reducing the noise.
The noise caused by a discontinuity between a pair of adjacent frames or between a pair of adjacent parameter bands may be reduced using methods other than those described above.
More specifically, each frame may be multiplied by a window such as a Hanning window, and an "overlap-and-add" scheme may be applied to the results of the multiplication, so that the variation between frames can be reduced. Alternatively, an output signal to which a plurality of pieces of spatial information are applied may be smoothed, so that variation between a plurality of frames of the output signal can be prevented.
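The windowed overlap-and-add approach works because 50%-overlapped Hanning windows sum to a constant, so cross-faded frames keep a constant overall gain while any parameter change is spread smoothly across the overlap region. A minimal check of that property (Python; the frame length is arbitrary):

```python
import math

def hann(N):
    # periodic Hanning window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]

N = 8
w = hann(N)
# overlap region between one frame's second half and the next frame's
# first half: the two window contributions sum to exactly 1.0
overlap_sum = [w[n + N // 2] + w[n] for n in range(N // 2)]
```

Because the summed gain is flat, the audible artifact is limited to the gradual cross-fade between the two parameter sets rather than an abrupt switch at the frame boundary.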
The decorrelation between channels in the DFT domain may be adjusted using spatial information such as ICC, as follows.
The degree of decorrelation may be adjusted by multiplying the coefficients of a signal input to a one-to-two (OTT) or two-to-three (TTT) box by a predetermined value. The predetermined value can be defined by the following expression: (A+(1-A*A)^0.5*i), where A indicates the ICC value applied to a predetermined frequency band of the OTT or TTT box, and i indicates the imaginary part. The imaginary part may be positive or negative.
The predetermined value may carry a weighting factor according to the characteristics of the signal, for example, the energy level of the signal, the energy characteristics of each frequency of the signal, or the type of box to which the ICC value A is applied. As a result of introducing the weighting factor, the degree of decorrelation can be further adjusted, and smoothing or interpolation between frames may be applied.
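The multiplier (A+(1-A*A)^0.5*i) has unit magnitude for any ICC value A in [-1, 1], so scaling a frequency-domain coefficient by it rotates the phase (decorrelating the channels) without changing the energy. A small sketch of that property (Python; function name and the sample ICC value are illustrative):

```python
def decorrelation_factor(icc, sign=1.0):
    # The multiplier described above, A + (1 - A*A)**0.5 * i, for an ICC
    # value A; `sign` selects the positive or negative imaginary part.
    return complex(icc, sign * (1.0 - icc * icc) ** 0.5)

f = decorrelation_factor(0.6)   # hypothetical ICC of 0.6 for one band
# |f| == 1: applying f changes phase, not magnitude, of each coefficient
magnitude = abs(f)
```

A weighting factor, as described above, would scale or modify this factor per band according to the signal's energy characteristics; that refinement is omitted here.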
As described above with reference to FIG. 7, a 3D down-mix signal can be generated in the frequency domain by using an HRTF, or a head-related impulse response (HRIR), converted to the frequency domain.
Alternatively, a 3D down-mix signal may be generated by convolving an HRIR with the down-mix signal in the time domain. A 3D down-mix signal generated in the frequency domain may be left in the frequency domain, without performing an inverse domain conversion.
To convolve an HRIR with the down-mix signal in the time domain, a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter may be used.
As described above, an encoding apparatus or a decoding apparatus according to an embodiment of the present invention may generate a 3D down-mix signal using a first method involving the use, in the frequency domain, of an HRTF or an HRIR converted to the frequency domain; a second method involving convolving an HRIR in the time domain; or a combination of the first and second methods.
FIGS. 8 through 11 illustrate bitstreams according to embodiments of the present invention.
Referring to FIG. 8, a bitstream includes: a multi-channel decoding information field containing information needed to generate a multi-channel signal; a 3D rendering information field containing information needed to generate a 3D down-mix signal; and a header field containing header information needed to use the information included in the multi-channel decoding information field and the information included in the 3D rendering information field. The bitstream may include only one or two of the multi-channel decoding information field, the 3D rendering information field, and the header field.
Referring to FIG. 9, a bitstream containing side information necessary for a decoding operation may include: a specific configuration header field containing header information for an entire encoded signal; and a plurality of frame data fields containing side information about a plurality of frames. More specifically, each frame data field may include a frame header field containing header information for the corresponding frame and a frame parameter data field containing the spatial information of the corresponding frame. Alternatively, each frame data field may include only the frame parameter data field.
Each frame parameter data field may include a plurality of modules, each module comprising a flag and parameter data. A module is a data set containing parameter data such as spatial information, and other data, such as down-mix gain and smoothing data, necessary for improving the sound quality of the signal.
Module data may not include any flag if data about a module specified by the frame header field is received without any additional flag, if the information specified by the frame header field is further classified, or if an additional flag and data not specified by the frame header are received together.
Side information about the 3D down-mix signal, for example, HRTF coefficient information, may be included in at least one of the specific configuration header field, the frame header field, and the frame parameter data field.
Referring to FIG. 10, a bitstream may include a plurality of multi-channel decoding information fields containing information necessary to generate a multi-channel signal, and a plurality of 3D rendering information fields containing information necessary to generate a 3D down-mix signal.
Upon receiving the bitstream, a decoding apparatus may perform a decoding operation using either the multi-channel decoding information fields or the 3D rendering information fields, and skip whichever multi-channel decoding information fields and 3D rendering information fields are not used in the decoding operation. In this case, which of the multi-channel decoding information fields and the 3D rendering information fields are to be used to perform the decoding operation may be determined according to the type of signal to be reproduced.
In other words, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields and read the information included in the multi-channel decoding information fields. On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields and read the information included in the 3D rendering information fields.
Some the method for skipping in a plurality of fields in the bit stream is following.
At first, can be included in the bit stream about the big or small field length information of the bit of field.In this case, can skip this field through skipping corresponding to the bit number of field bit size.Can be with of the beginning of field length information setting in field.
The second, can synchronization character be arranged on the terminal or beginning of field.In this case, can skip this field through location positioning field based on synchronization character.
The 3rd, if confirm and fixed the length of field in advance, then can skip this field through skipping corresponding to the data volume of the length of this field.Can be included in fixed field length information in the bit stream or be stored in the decoding device about field length.
The 4th, one of a plurality of fields are skipped in the above-mentioned field skipping method capable of using two kinds or more kinds of combinations.
It is to skip the field information necessary that field such as field length information, synchronization character or fixed field length information is skipped information; Can it be included in one of customized configuration header fields shown in Figure 9, frame header fields and frame parameter data field, maybe can it be included in the field beyond the field shown in Figure 9.
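The first skipping method can be sketched as follows (Python; the one-byte length prefix and the payload values are a hypothetical layout chosen for illustration — the actual bitstream works at bit granularity and its field formats are defined by the syntax tables):

```python
def skip_field_by_length(stream, pos):
    # Hypothetical layout: each field begins with one byte giving the
    # payload size in bytes; the decoder skips the field by advancing
    # the read position past the length byte and the payload.
    field_len = stream[pos]        # field length information
    return pos + 1 + field_len     # new read position, field skipped

# hypothetical bitstream: [len=3][3 payload bytes][len=2][2 payload bytes]
bs = bytes([3, 0xAA, 0xBB, 0xCC, 2, 0x11, 0x22])
p = skip_field_by_length(bs, 0)    # skip the first field entirely
second_len = bs[p]                 # cursor now sits on the next field
```

The same advance-by-length logic applies to the third method (fixed field lengths), except that the length comes from the decoder's stored configuration instead of the stream.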
For example, in order to generate a multi-channel signal, the decoding apparatus may skip the 3D rendering information fields, with reference to field length information, a sync word, or fixed field length information set at the beginning of each 3D rendering information field, and read the information included in the multi-channel decoding information fields.
On the other hand, in order to generate a 3D down-mix signal, the decoding apparatus may skip the multi-channel decoding information fields, with reference to field length information, a sync word, or fixed field length information set at the beginning of each multi-channel decoding information field, and read the information included in the 3D rendering information fields.
The bitstream may include information indicating whether the data included in the bitstream is necessary to generate a multi-channel signal or necessary to generate a 3D down-mix signal.
However, even if the bitstream includes no spatial information such as CLD and includes only the data necessary to generate a 3D down-mix signal (for example, HRTF filter coefficients), a multi-channel signal can still be reproduced by decoding with the data necessary to generate the 3D down-mix signal, without needing spatial information.
For example, stereo parameters, which are spatial information about two channels, are obtained from the down-mix signal. The stereo parameters are then converted into spatial information about a plurality of channels to be reproduced, and a multi-channel signal is generated by applying the spatial information obtained by the conversion to the down-mix signal.
On the other hand, even if the bitstream includes only the data necessary to generate a multi-channel signal, the down-mix signal can be reproduced without an additional decoding operation, or a 3D down-mix signal can be reproduced by performing 3D processing on the down-mix signal using an additional HRTF filter.
If the bitstream includes both the data necessary to generate a multi-channel signal and the data necessary to generate a 3D down-mix signal, the user may be allowed to decide whether to reproduce a multi-channel signal or a 3D down-mix signal.
Methods of skipping data will hereinafter be described in detail with reference to the corresponding syntaxes.
Syntax 1 illustrates a method of decoding an audio signal in units of frames.
[Syntax 1]
SpatialFrame()
{
    FramingInfo();
    bsIndependencyFlag;
    OttData();
    TttData();
    SmgData();
    TempShapeData();
    if (bsArbitraryDownmix) {
        ArbitraryDownmixData();
    }
    if (bsResidualCoding) {
        ResidualData();
    }
}
In Syntax 1, OttData() and TttData() are modules representing the parameters necessary to restore a multi-channel signal from a down-mix signal (such as spatial information including CLD, ICC, and CPC), and SmgData(), TempShapeData(), ArbitraryDownmixData(), and ResidualData() are modules representing information necessary to improve sound quality by correcting signal distortion that may occur during an encoding operation.
For example, if only parameters such as CLD, ICC, or CPC and the information included in the module ArbitraryDownmixData() are used during a decoding operation, then the modules SmgData() and TempShapeData(), which lie between the modules TttData() and ArbitraryDownmixData(), are unnecessary. It is therefore efficient to skip the modules SmgData() and TempShapeData().
A method of skipping modules according to an embodiment of the present invention will hereinafter be described in detail with reference to Syntax 2 below.
[Syntax 2]
...
TttData();
SkipData() {
    bsSkipBits;
}
SmgData();
TempShapeData();
if (bsArbitraryDownmix) {
    ArbitraryDownmixData();
}
...
Referring to Syntax 2, a module SkipData() may be placed before the module(s) to be skipped, and the bit size of the module(s) to be skipped is specified within the module SkipData() as bsSkipBits.
In other words, assuming that the modules SmgData() and TempShapeData() are to be skipped and that the combined bit size of the modules SmgData() and TempShapeData() is 150, the modules SmgData() and TempShapeData() can be skipped by setting bsSkipBits to 150.
A method of skipping modules according to another embodiment of the present invention will hereinafter be described in detail with reference to Syntax 3.
[Syntax 3]
...
TttData();
bsSkipSyncflag;
SmgData();
TempShapeData();
bsSkipSyncword;
if (bsArbitraryDownmix) {
    ArbitraryDownmixData();
}
...
Referring to Syntax 3, unnecessary modules can be skipped by using bsSkipSyncflag and bsSkipSyncword, where bsSkipSyncflag is a flag indicating whether a sync word is used, and bsSkipSyncword is a sync word that may be placed at the end of the module(s) to be skipped.
More specifically, if the flag bsSkipSyncflag is set so that a sync word is used, one or more modules between the flag bsSkipSyncflag and the sync word bsSkipSyncword, namely the modules SmgData() and TempShapeData(), can be skipped.
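The sync-word mechanism can be sketched as follows (Python; the sync-word value and the byte-oriented stream are hypothetical — a real bsSkipSyncword is defined by the syntax and parsed at bit granularity):

```python
SKIP_SYNCWORD = b"\xa5\x5a"   # hypothetical bsSkipSyncword value

def skip_to_syncword(stream, pos):
    # Scan forward from pos to the sync word that marks the end of the
    # skippable modules, and resume reading just past it.
    idx = stream.find(SKIP_SYNCWORD, pos)
    if idx < 0:
        raise ValueError("sync word not found")
    return idx + len(SKIP_SYNCWORD)

# three bytes standing in for the SmgData()/TempShapeData() payload,
# followed by the sync word and the next module's first byte
bs = b"\x01\x02\x03" + SKIP_SYNCWORD + b"\x7f"
p = skip_to_syncword(bs, 0)
```

Unlike the length-based method, this one needs no size to be transmitted, at the cost of scanning the stream and of reserving a bit pattern that must not occur inside the skipped payload.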
Referring to FIG. 11, a bitstream may include: a multi-channel header field containing header information necessary to reproduce a multi-channel signal; a 3D rendering header field containing header information necessary to reproduce a 3D down-mix signal; and a plurality of multi-channel decoding information fields containing the data necessary to reproduce a multi-channel signal.
In order to reproduce a multi-channel signal, the decoding apparatus may skip the 3D rendering header field and read data from the multi-channel header field and the multi-channel decoding information fields.
The method of skipping the 3D rendering header field is the same as the field skipping methods described above with reference to FIG. 10, and thus a detailed description thereof will be omitted.
In order to reproduce a 3D down-mix signal, the decoding apparatus may read data from the multi-channel decoding information fields and the 3D rendering header field. For example, the decoding apparatus may generate a 3D down-mix signal using a down-mix signal included in the multi-channel decoding information fields and HRTF coefficient information included in the 3D rendering header field.
FIG. 12 is a block diagram of an encoding/decoding apparatus for processing an arbitrary down-mix signal according to an embodiment of the present invention. Referring to FIG. 12, an arbitrary down-mix signal is a down-mix signal other than the down-mix signal generated by a multi-channel encoder 801 included in an encoding apparatus 800. A detailed description of processes identical to those of the embodiment of FIG. 1 will be omitted.
Referring to FIG. 12, the encoding apparatus 800 includes the multi-channel encoder 801, a spatial information synthesis unit 802, and a comparison unit 803.
The multi-channel encoder 801 down-mixes an input multi-channel signal into a stereo or mono down-mix signal, and generates basic spatial information necessary to restore a multi-channel signal from the down-mix signal.
The comparison unit 803 compares the down-mix signal with the arbitrary down-mix signal, and generates compensation information based on the result of the comparison. The compensation information is necessary to compensate for the arbitrary down-mix signal so that the arbitrary down-mix signal can be converted to approximate the down-mix signal. A decoding apparatus may compensate for the arbitrary down-mix signal using the compensation information, and restore a multi-channel signal using the compensated arbitrary down-mix signal. The restored multi-channel signal is more similar to the original input multi-channel signal than a multi-channel signal restored from an arbitrary down-mix signal that has not been compensated for.
The compensation information may be the difference between the down-mix signal and the arbitrary down-mix signal. The decoding apparatus may compensate for the arbitrary down-mix signal by adding, to the arbitrary down-mix signal, the difference between the down-mix signal and the arbitrary down-mix signal.
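This difference-based compensation can be sketched as follows (Python; the sample values are hypothetical, and per-band gain handling and quantization, described below, are omitted):

```python
def compensation_info(down_mix, arbitrary_dm):
    # Encoder side (comparison unit): per-sample (or per-band) difference
    # between the encoder's own down-mix and the arbitrary down-mix.
    return [d - a for d, a in zip(down_mix, arbitrary_dm)]

def compensate(arbitrary_dm, comp):
    # Decoder side: add the transmitted difference back so the arbitrary
    # down-mix approximates the encoder's down-mix.
    return [a + c for a, c in zip(arbitrary_dm, comp)]

encoder_dm = [0.5, -0.2, 0.9]   # hypothetical encoder down-mix samples
arbitrary = [0.4, 0.1, 0.7]     # e.g. an artist-supplied down-mix
comp = compensation_info(encoder_dm, arbitrary)
restored = compensate(arbitrary, comp)   # approximates encoder_dm
```

In practice the compensation is transmitted as quantized per-band gains rather than per-sample differences, so the decoder's result approximates, rather than exactly reproduces, the encoder's down-mix.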
The reduction audio signal can be the reduction audio mixing gain of the indication reduction audio signal and the difference of the energy level that reduces audio signal arbitrarily with the difference of reducing audio signal arbitrarily.
Can be directed against each frequency band, each time/time slot and/or each sound channel and confirm the gain of reduction audio mixing.For example, the gain of part reduction audio mixing can be confirmed to each frequency band, and the gain of another part reduction audio mixing can be confirmed to each time slot.
Reducing the audio mixing gain can or be that each frequency band that reduces audio signal optimization is arbitrarily confirmed to each parameter band.Parameter band is the frequency interval that is applied with the spatial information of parameter type.
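A per-parameter-band down-mix gain of the kind described above can be sketched as follows. This is a minimal illustration, not the normative syntax: the band edges and the definition of the gain as the square root of the band-energy ratio between the encoder down-mix and the arbitrary down-mix are assumptions made for the example.

```python
# Illustrative sketch: one down-mix gain per parameter band, computed as the
# ratio of band energies between the encoder down-mix and the arbitrary down-mix.

def band_energies(samples, band_edges):
    """Sum of squared samples in each parameter band (bands given as index edges)."""
    return [sum(x * x for x in samples[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]

def downmix_gains(encoder_dmx, arbitrary_dmx, band_edges, eps=1e-12):
    """One gain per parameter band: sqrt(energy of encoder down-mix / energy of
    arbitrary down-mix), so scaling the arbitrary signal matches energy levels."""
    e_enc = band_energies(encoder_dmx, band_edges)
    e_arb = band_energies(arbitrary_dmx, band_edges)
    return [(e1 / (e2 + eps)) ** 0.5 for e1, e2 in zip(e_enc, e_arb)]
```

A decoder would multiply each band of the arbitrary down-mix by the corresponding gain to approximate the encoder down-mix.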
The difference between the energy levels of the down-mix signal and the arbitrary down-mix signal may be quantized. The resolution of the quantization levels used for quantizing the energy-level difference between the down-mix signal and the arbitrary down-mix signal may be the same as or different from the resolution of the quantization levels used for quantizing a CLD between the down-mix signal and the arbitrary down-mix signal. In addition, the quantization of the energy-level difference may involve using all or some of the quantization levels used for quantizing the CLD between the down-mix signal and the arbitrary down-mix signal.
Since the resolution of the energy-level difference between the down-mix signal and the arbitrary down-mix signal is generally lower than the resolution of the CLD between the two signals, the quantization levels used for the energy-level difference may have small values compared with the quantization levels used for the CLD.
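Reusing a subset of the CLD quantization levels for the energy-level difference can be sketched as follows. The CLD level table here is a placeholder chosen for illustration, not the table from the specification; only the idea of quantizing onto a narrower subset of an existing level table is taken from the text above.

```python
# Illustrative sketch: quantize the energy-level difference (in dB) onto a
# subset of an assumed CLD quantization-level table.

CLD_LEVELS_DB = [-45, -30, -20, -12, -6, -3, 0, 3, 6, 12, 20, 30, 45]  # assumed table
GAIN_LEVELS_DB = CLD_LEVELS_DB[3:10]  # narrower subset, -12 dB .. 12 dB

def quantize_db(value_db, levels):
    """Index of the nearest quantization level."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value_db))
```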
The compensation information necessary for compensating the arbitrary down-mix signal may be extension information including residual information, which specifies components of the input multi-channel signal that cannot be restored using the arbitrary down-mix signal or the down-mix gain. The decoding apparatus may restore those components of the input multi-channel signal using the extension information, and may thus restore a signal hardly distinguishable from the original input multi-channel signal.
Extension information may be generated as follows.
The multi-channel encoder 801 may generate, as first extension information, information regarding components of the input multi-channel signal that are missing from the down-mix signal. The decoding apparatus may restore a signal hardly distinguishable from the original input multi-channel signal by applying the first extension information when generating a multi-channel signal using the down-mix signal and the basic spatial information.
Alternatively, the multi-channel encoder 801 may restore a multi-channel signal using the down-mix signal and the basic spatial information, and may generate the difference between the restored multi-channel signal and the original input multi-channel signal as the first extension information.
The comparison unit 803 may generate, as second extension information, information regarding components of the down-mix signal that are missing from the arbitrary down-mix signal, i.e., components of the down-mix signal that cannot be compensated for using the down-mix gain. The decoding apparatus may restore a signal almost indistinguishable from the down-mix signal using the arbitrary down-mix signal and the second extension information.
Extension information may also be generated using various residual coding methods other than those described above.
Both the down-mix gain and the extension information may be used as compensation information. More specifically, the down-mix gain and the extension information may be obtained for the entire frequency band of the down-mix signal and used together as compensation information. Alternatively, the down-mix gain may be used as compensation information for one part of the frequency band of the down-mix signal, and the extension information may be used as compensation information for another part of the frequency band. For example, the extension information may be used as compensation information for a low-frequency band of the down-mix signal, and the down-mix gain may be used as compensation information for a high-frequency band of the down-mix signal.
Extension information regarding portions of the down-mix signal other than the low-frequency band, such as peaks or notches that considerably affect sound quality, may also be used as compensation information.
The spatial information synthesis unit 802 synthesizes the basic spatial information (e.g., CLD, CPC, ICC, and CTD) and the compensation information, thereby generating spatial information. In other words, the spatial information transmitted to the decoding apparatus may include the basic spatial information, the down-mix gain, and the first and second extension information.
The spatial information may be included in a bitstream together with the arbitrary down-mix signal, and the bitstream may be transmitted to the decoding apparatus.
The extension information and the arbitrary down-mix signal may be encoded using an audio encoding method such as the AAC method, the MP3 method, or the BSAC method. The extension information and the arbitrary down-mix signal may be encoded using the same audio encoding method or different audio encoding methods.
If the extension information and the arbitrary down-mix signal are encoded using the same audio encoding method, the decoding apparatus can decode both the extension information and the arbitrary down-mix signal using a single audio decoding method. In this case, since the arbitrary down-mix signal can always be decoded, the extension information can also always be decoded. However, since the arbitrary down-mix signal is generally input to the decoding apparatus as a pulse code modulation (PCM) signal, the type of audio codec used to encode the arbitrary down-mix signal may not be easily identified, and accordingly, the type of audio codec used to encode the extension information may not be easily identified either.
Therefore, audio codec information regarding the type of audio codec used to encode the arbitrary down-mix signal and the extension information may be inserted into the bitstream.
More specifically, the audio codec information may be inserted into a specific configuration header field of the bitstream. In this case, the decoding apparatus may extract the audio codec information from the specific configuration header field of the bitstream, and may decode the arbitrary down-mix signal and the extension information using the extracted audio codec information.
On the other hand, if the arbitrary down-mix signal and the extension information are encoded using different encoding methods, the extension information may fail to be decoded. In this case, since the end of the extension information cannot be identified, no further decoding operation can be performed.
In order to address this problem, audio codec information regarding the types of audio codecs respectively used to encode the arbitrary down-mix signal and the extension information may be inserted into the specific configuration header field of the bitstream. Then, the decoding apparatus may read the audio codec information from the specific configuration header field of the bitstream, and may decode the extension information using the read information. If the decoding apparatus does not include any decoder capable of decoding the extension information, the decoding of the extension information may not proceed further, and the information immediately following the extension information may be read instead.
The audio codec information regarding the type of audio codec used to encode the extension information may be represented by a syntax element included in the specific configuration header field of the bitstream. For example, the audio codec information may be represented by a 4-bit syntax element bsResidualCodecType, as indicated in Table 1 below.
Table 1
[Table 1 appears as an image (S2007800045157D00231) in the original publication; it lists the values of the 4-bit syntax element bsResidualCodecType and the corresponding audio codec types.]
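Reading a 4-bit syntax element such as bsResidualCodecType from a configuration header can be sketched as follows. Because Table 1 is only available as an image here, the code-to-codec mapping below is a placeholder assumption, not the actual table.

```python
# Illustrative sketch: read the 4-bit bsResidualCodecType syntax element.
# The mapping is assumed for illustration and is NOT Table 1 of the patent.

CODEC_TABLE = {0: "AAC", 1: "MP3", 2: "BSAC"}  # assumed mapping

def read_bits(data, bit_pos, n):
    """Read n bits MSB-first from bytes `data`, starting at absolute bit position."""
    value = 0
    for i in range(n):
        byte, off = divmod(bit_pos + i, 8)
        value = (value << 1) | ((data[byte] >> (7 - off)) & 1)
    return value

def residual_codec_type(data, bit_pos):
    code = read_bits(data, bit_pos, 4)
    return CODEC_TABLE.get(code, "reserved")
```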
The extension information may include not only residual information but also channel extension information. Channel extension information is information necessary for expanding a multi-channel signal, obtained by decoding with spatial information, into a signal with more channels. For example, the channel extension information may be information necessary for expanding a 5.1-channel signal or a 7.1-channel signal into a 9.1-channel signal.
The extension information may be included in a bitstream, and the bitstream may be transmitted to a decoding apparatus. Then, the decoding apparatus may compensate the down-mix signal or expand a multi-channel signal using the extension information. However, the decoding apparatus may skip the extension information rather than extract it from the bitstream. For example, when generating a multi-channel signal using a 3D down-mix signal included in the bitstream, or when generating a 3D down-mix signal using a down-mix signal included in the bitstream, the decoding apparatus may skip the extension information.
The method of skipping extension information included in a bitstream may be the same as one of the field-skipping methods described above with reference to Figure 10.
For example, the extension information may be skipped using at least one of bit size information attached to the beginning of the bitstream including the extension information and indicating the bit size of the extension information, a synchronization word attached to the beginning or end of the field including the extension information, and fixed bit size information indicating a fixed bit size of the extension information. The bit size information, the synchronization word, and the fixed bit size information may all be included in the bitstream. The fixed bit size information may also be stored in the decoding apparatus.
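Two of the skipping strategies above can be sketched as follows; the field layouts (a leading size value, a terminating synchronization word) are illustrative assumptions rather than the normative bitstream syntax.

```python
# Illustrative sketch: skip an extension-information field either via its
# signalled bit size or by scanning for a terminating synchronization word.

def skip_by_size(bit_pos, ext_bit_size):
    """Advance the read position past the extension field using its signalled size."""
    return bit_pos + ext_bit_size

def skip_by_syncword(data, start, syncword):
    """Return the byte index just past the syncword that terminates the field."""
    idx = data.find(syncword, start)
    if idx < 0:
        raise ValueError("syncword not found")
    return idx + len(syncword)
```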
Referring to Figure 12, the decoding unit 810 includes a down-mix compensation unit 811, a 3D rendering unit 815, and a multi-channel decoder 816.
The down-mix compensation unit 811 compensates the arbitrary down-mix signal using the compensation information included in the spatial information, e.g., using the down-mix gain or the extension information.
The 3D rendering unit 815 generates a decoder 3D down-mix signal by performing a 3D rendering operation on the compensated down-mix signal. The multi-channel decoder 816 generates a 3D multi-channel signal using the compensated down-mix signal and the basic spatial information included in the spatial information.
The down-mix compensation unit 811 may compensate the arbitrary down-mix signal in the following manner.
If the compensation information is the down-mix gain, the down-mix compensation unit 811 compensates the energy level of the arbitrary down-mix signal using the down-mix gain, so that the arbitrary down-mix signal can be converted into a signal approximating the down-mix signal.
If the compensation information is the second extension information, the down-mix compensation unit 811 may compensate for components missing from the arbitrary down-mix signal using the second extension information.
The multi-channel decoder 816 may generate a multi-channel signal by sequentially applying a pre-matrix M1, a mix matrix M2, and a post-matrix M3 to a down-mix signal. In this case, the second extension information may be used to compensate the down-mix signal while the mix matrix M2 is applied to the down-mix signal. In other words, the second extension information may be used to compensate a down-mix signal to which the pre-matrix M1 has already been applied.
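The sequential application of M1, M2, and M3 can be sketched as follows. The 2x2 matrices and the vector sizes are toy values chosen for illustration; in practice the matrices vary per parameter band and time slot, and the second extension information would be applied to the M1 output before M2, which is not shown here.

```python
# Illustrative sketch: M3 * (M2 * (M1 * downmix)), the matrix chain of the
# multi-channel decoder described above.

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def apply_matrices(downmix, m1, m2, m3):
    """Sequentially apply pre-matrix M1, mix matrix M2 and post-matrix M3."""
    return mat_vec(m3, mat_vec(m2, mat_vec(m1, downmix)))
```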
As described above, each of a plurality of channels may be selectively compensated by applying the extension information when generating a multi-channel signal. For example, if the extension information is applied to the center channel of the mix matrix M2, the left- and right-channel components of the down-mix signal may be compensated by the extension information. If the extension information is applied to the left channel of the mix matrix M2, the left-channel component of the down-mix signal may be compensated by the extension information.
Both the down-mix gain and the extension information may be used as compensation information. For example, the low-frequency band of the arbitrary down-mix signal may be compensated using the extension information, and the high-frequency band of the arbitrary down-mix signal may be compensated using the down-mix gain. In addition, portions of the arbitrary down-mix signal other than the low-frequency band, such as peaks or notches that considerably affect sound quality, may also be compensated using the extension information. Information regarding the portions to be compensated by the extension information may be included in the bitstream. Information indicating whether a down-mix signal included in the bitstream is an arbitrary down-mix signal, and information indicating whether the bitstream includes compensation information, may also be included in the bitstream.
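The band-split use of both kinds of compensation information can be sketched as follows. The split point and the additive-residual model for the low bands are assumptions made for this example.

```python
# Illustrative sketch: per-band compensation of an arbitrary down-mix --
# residual (extension) information below a split band, down-mix gain above it.

def compensate_bands(arb_bands, residual_bands, gains, split):
    """arb_bands: list of bands, each a list of samples; one gain per band."""
    out = []
    for i, band in enumerate(arb_bands):
        if i < split:  # low bands: add the transmitted residual component
            out.append([a + r for a, r in zip(band, residual_bands[i])])
        else:          # high bands: scale by the down-mix gain
            out.append([gains[i] * a for a in band])
    return out
```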
In order to prevent clipping of the down-mix signal generated by the encoding unit 800, the down-mix signal may be divided by a predetermined gain. The predetermined gain may have a static value or a dynamic value.
The down-mix compensation unit 811 may restore the original down-mix signal by using the predetermined gain to compensate the down-mix signal that was attenuated to prevent clipping.
An arbitrary down-mix signal compensated by the down-mix compensation unit 811 can be readily reproduced. Alternatively, an arbitrary down-mix signal yet to be compensated may be input to the 3D rendering unit 815, and may be converted into a decoder 3D down-mix signal by the 3D rendering unit 815.
Referring to Figure 12, the down-mix compensation unit 811 includes a first domain converter 812, a compensation processor 813, and a second domain converter 814.
The first domain converter 812 converts the domain of the arbitrary down-mix signal into a predetermined domain. The compensation processor 813 compensates the arbitrary down-mix signal in the predetermined domain using the compensation information, e.g., the down-mix gain or the extension information.
The compensation of the arbitrary down-mix signal may be performed in the QMF/hybrid domain. For this purpose, the first domain converter 812 may perform QMF/hybrid analysis on the arbitrary down-mix signal. The first domain converter 812 may also convert the domain of the arbitrary down-mix signal into a domain other than the QMF/hybrid domain, e.g., a frequency domain such as the DFT or FFT domain. The compensation of the arbitrary down-mix signal may likewise be performed in a domain other than the QMF/hybrid domain, e.g., a frequency domain or the time domain.
The second domain converter 814 converts the domain of the compensated arbitrary down-mix signal into the same domain as that of the original arbitrary down-mix signal. More specifically, the second domain converter 814 converts the domain of the compensated arbitrary down-mix signal into the same domain as that of the original arbitrary down-mix signal by inversely performing the domain conversion operation performed by the first domain converter 812.
For example, the second domain converter 814 may convert the compensated arbitrary down-mix signal into a time-domain signal by performing QMF/hybrid synthesis on the compensated arbitrary down-mix signal. Likewise, the second domain converter 814 may perform an IDFT or IFFT on the compensated arbitrary down-mix signal.
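The analysis-compensate-synthesis round trip of the two domain converters can be sketched as follows, with a naive DFT standing in for QMF/hybrid or FFT analysis and a per-bin gain standing in for the compensation information. The bin gains are illustrative.

```python
# Illustrative sketch: first domain converter (DFT analysis), compensation
# processor (per-bin scaling), second domain converter (inverse DFT).
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def compensate_in_frequency_domain(signal, bin_gains):
    spectrum = dft(signal)                                  # first domain converter
    compensated = [g * s for g, s in zip(bin_gains, spectrum)]  # compensation
    return [c.real for c in idft(compensated)]              # second domain converter
```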
Like the 3D rendering unit 710 shown in Figure 7, the 3D rendering unit 815 may perform a 3D rendering operation on the compensated arbitrary down-mix signal in the frequency domain, the QMF/hybrid domain, or the time domain. For this purpose, the 3D rendering unit 815 may include a domain converter (not shown). The domain converter converts the domain of the compensated arbitrary down-mix signal into the domain in which the 3D rendering operation is to be performed, or converts the domain of the signal obtained by the 3D rendering operation.
The domain in which the compensation processor 813 compensates the arbitrary down-mix signal may be the same as or different from the domain in which the 3D rendering unit 815 performs the 3D rendering operation on the compensated arbitrary down-mix signal.
Figure 13 is a block diagram of a down-mix compensation/3D rendering unit 820 according to an embodiment of the present invention. Referring to Figure 13, the down-mix compensation/3D rendering unit 820 includes a first domain converter 821, a second domain converter 822, a compensation/3D rendering processor 823, and a third domain converter 824.
The down-mix compensation/3D rendering unit 820 may perform a compensation operation and a 3D rendering operation on an arbitrary down-mix signal in a single domain, thereby reducing the amount of computation of the decoding apparatus.
More specifically, the first domain converter 821 converts the domain of the arbitrary down-mix signal into a first domain in which the compensation operation and the 3D rendering operation are to be performed. The second domain converter 822 converts the spatial information, which includes basic spatial information necessary for generating a multi-channel signal and compensation information necessary for compensating the arbitrary down-mix signal, so that the spatial information becomes applicable in the first domain. The compensation information may include at least one of a down-mix gain and extension information.
For example, the second domain converter 822 may map compensation information corresponding to a parameter band in the QMF/hybrid domain to the frequency domain, so that the compensation information becomes readily applicable in the frequency domain.
The first domain may be a frequency domain such as the DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those set forth herein.
A time delay may occur during the conversion of the compensation information. To address this problem, the second domain converter 822 may perform a delay compensation operation so that the time delay between the domain of the compensation information and the first domain can be compensated for.
The compensation/3D rendering processor 823 performs a compensation operation on the arbitrary down-mix signal in the first domain using the converted spatial information, and then performs a 3D rendering operation on the signal obtained by the compensation operation. The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation in an order different from that set forth herein.
The compensation/3D rendering processor 823 may perform the compensation operation and the 3D rendering operation on the arbitrary down-mix signal simultaneously. For example, the compensation/3D rendering processor 823 may generate a compensated 3D down-mix signal by performing a 3D rendering operation on the arbitrary down-mix signal in the first domain using a new filter coefficient, which is a combination of the compensation information and an existing filter coefficient typically used in 3D rendering operations.
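Combining the compensation information and the rendering filter into a single new filter coefficient can be sketched as follows. A scalar compensation gain folded into an FIR rendering filter is assumed here; by linearity, filtering with gain * h equals scaling the input by the gain and then filtering with h, so one pass suffices.

```python
# Illustrative sketch: fold a compensation gain into the 3D-rendering filter so
# compensation and rendering happen in a single filtering pass.

def convolve(x, h):
    """Plain time-domain FIR convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def render_with_combined_filter(arb_downmix, render_filter, gain):
    combined = [gain * c for c in render_filter]  # new coefficient = gain x existing
    return convolve(arb_downmix, combined)
```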
The third domain converter 824 converts the domain of the 3D down-mix signal generated by the compensation/3D rendering processor 823 into the frequency domain.
Figure 14 is a block diagram of a decoding apparatus 900 for processing a compatible down-mix signal according to an embodiment of the present invention. Referring to Figure 14, the decoding apparatus 900 includes a first multi-channel decoder 910, a down-mix compatibility processing unit 920, a second multi-channel decoder 930, and a 3D rendering unit 940. A detailed description of decoding processes identical to those of the embodiment of Figure 1 will be omitted.
A compatible down-mix signal is a down-mix signal that can be decoded by two or more multi-channel decoders. In other words, a compatible down-mix signal is a down-mix signal that is initially optimized for a predetermined multi-channel decoder and can later be converted, through a compatibility processing operation, into a signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder.
Referring to Figure 14, assume that the input compatible down-mix signal is optimized for the first multi-channel decoder 910. In order for the second multi-channel decoder 930 to decode the input compatible down-mix signal, the down-mix compatibility processing unit 920 may perform a compatibility processing operation on the input compatible down-mix signal so that it can be converted into a signal optimized for the second multi-channel decoder 930. The first multi-channel decoder 910 generates a first multi-channel signal by decoding the input compatible down-mix signal. The first multi-channel decoder 910 may generate a multi-channel signal using only the input compatible down-mix signal, without requiring spatial information.
The second multi-channel decoder 930 generates a second multi-channel signal using the down-mix signal obtained by the compatibility processing operation performed by the down-mix compatibility processing unit 920. The 3D rendering unit 940 may generate a decoder 3D down-mix signal by performing a 3D rendering operation on the down-mix signal obtained by the compatibility processing operation performed by the down-mix compatibility processing unit 920.
A compatible down-mix signal optimized for a predetermined multi-channel decoder may be converted into a down-mix signal optimized for a multi-channel decoder other than the predetermined multi-channel decoder using compatibility information such as an inverse matrix. For example, when there are first and second multi-channel encoders using different encoding methods and first and second multi-channel decoders using different decoding methods, an encoding apparatus may apply a matrix to a down-mix signal generated by the first multi-channel encoder, thereby generating a compatible down-mix signal optimized for the second multi-channel decoder. Then, a decoding apparatus may apply an inverse matrix to the compatible down-mix signal generated by the encoding apparatus, thereby generating a compatible down-mix signal optimized for the first multi-channel decoder.
Referring to Figure 14, the down-mix compatibility processing unit 920 may perform a compatibility processing operation on the input compatible down-mix signal using an inverse matrix, thereby generating a down-mix signal optimized for the second multi-channel decoder 930.
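The matrix/inverse-matrix pair described above can be sketched as follows for a stereo down-mix; the 2x2 compatibility matrix values are illustrative assumptions.

```python
# Illustrative sketch: the encoder applies a 2x2 compatibility matrix to the
# stereo down-mix; the decoder's compatibility processor applies its inverse.

def apply_2x2(m, ch):
    (a, b), (c, d) = m
    left, right = ch
    return [a * left + b * right, c * left + d * right]

def invert_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]
```

Applying `invert_2x2(m)` to the output of `apply_2x2(m, ...)` recovers the original stereo pair, which is the compatibility processing operation in miniature.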
Information regarding the inverse matrix used by the down-mix compatibility processing unit 920 may be stored in the decoding apparatus 900 in advance, or may be included in a bitstream transmitted by an encoding apparatus. In addition, information indicating whether a down-mix signal included in the incoming bitstream is an arbitrary down-mix signal or a compatible down-mix signal may be included in the incoming bitstream.
Referring to Figure 14, the down-mix compatibility processing unit 920 includes a first domain converter 921, a compatibility processor 922, and a second domain converter 923.
The first domain converter 921 converts the domain of the input compatible down-mix signal into a predetermined domain, and the compatibility processor 922 performs a compatibility processing operation using compatibility information such as an inverse matrix, so that the input compatible down-mix signal in the predetermined domain can be converted into a signal optimized for the second multi-channel decoder 930.
The compatibility processor 922 may perform the compatibility processing operation in the QMF/hybrid domain. For this purpose, the first domain converter 921 may perform QMF/hybrid analysis on the input compatible down-mix signal. Likewise, the first domain converter 921 may convert the domain of the input compatible down-mix signal into a domain other than the QMF/hybrid domain, e.g., a frequency domain such as the DFT or FFT domain, and the compatibility processor 922 may perform the compatibility processing operation in a domain other than the QMF/hybrid domain, such as a frequency domain or the time domain.
The second domain converter 923 converts the domain of the compatible down-mix signal obtained by the compatibility processing operation. More specifically, the second domain converter 923 may convert the domain of the compatible down-mix signal obtained by the compatibility processing operation into the same domain as that of the original input compatible down-mix signal by inversely performing the domain conversion operation performed by the first domain converter 921.
For example, the second domain converter 923 may convert the compatible down-mix signal obtained by the compatibility processing operation into a time-domain signal by performing QMF/hybrid synthesis on it. Alternatively, the second domain converter 923 may perform an IDFT or IFFT on the compatible down-mix signal obtained by the compatibility processing operation.
The 3D rendering unit 940 may perform a 3D rendering operation on the compatible down-mix signal obtained by the compatibility processing operation in the frequency domain, the QMF/hybrid domain, or the time domain. For this purpose, the 3D rendering unit 940 may include a domain converter (not shown). The domain converter converts the domain of the input compatible down-mix signal into the domain in which the 3D rendering operation is to be performed, or converts the domain of the signal obtained by the 3D rendering operation.
The domain in which the compatibility processor 922 performs the compatibility processing operation may be the same as or different from the domain in which the 3D rendering unit 940 performs the 3D rendering operation.
Figure 15 is a block diagram of a down-mix compatibility processing/3D rendering unit 950 according to an embodiment of the present invention. Referring to Figure 15, the down-mix compatibility processing/3D rendering unit 950 includes a first domain converter 951, a second domain converter 952, a compatibility/3D rendering processor 953, and a third domain converter 954.
The down-mix compatibility processing/3D rendering unit 950 performs a compatibility processing operation and a 3D rendering operation in a single domain, thereby reducing the amount of computation of the decoding apparatus.
The first domain converter 951 converts the input compatible down-mix signal into a first domain in which the compatibility processing operation and the 3D rendering operation are to be performed. The second domain converter 952 converts the spatial information and the compatibility information, e.g., an inverse matrix, so that the spatial information and the compatibility information become applicable in the first domain.
For example, the second domain converter 952 may map an inverse matrix corresponding to a parameter band in the QMF/hybrid domain to the frequency domain, so that the inverse matrix can be readily applied in the frequency domain.
The first domain may be a frequency domain such as the DFT or FFT domain, the QMF/hybrid domain, or the time domain. Alternatively, the first domain may be a domain other than those set forth herein.
A time delay may occur during the conversion of the spatial information and the compatibility information.
To address this problem, the second domain converter 952 may perform a delay compensation operation so that the time delay between the domains of the spatial information and the compatibility information and the first domain can be compensated for.
The compatibility/3D rendering processor 953 performs a compatibility processing operation on the input compatible down-mix signal in the first domain using the converted compatibility information, and then performs a 3D rendering operation on the compatible down-mix signal obtained by the compatibility processing operation. The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation in an order different from that set forth herein.
The compatibility/3D rendering processor 953 may perform the compatibility processing operation and the 3D rendering operation on the input compatible down-mix signal simultaneously. For example, the compatibility/3D rendering processor 953 may generate a 3D down-mix signal by performing a 3D rendering operation on the input compatible down-mix signal in the first domain using a new filter coefficient, which is a combination of the compatibility information and an existing filter coefficient typically used in 3D rendering operations.
The third domain converter 954 converts the domain of the 3D down-mix signal generated by the compatibility/3D rendering processor 953 into the frequency domain.
Figure 16 is the block diagram that is used to eliminate the decoding device of crosstalking according to embodiments of the invention.With reference to Figure 16, decoding device comprises bit split cells 960, reduction audio mixing demoder 970,3D rendering unit 980 and cross-talk cancellation unit 990.With the detailed description of omitting the decode procedure identical with the embodiment of Fig. 1.
A 3D downmix signal output by the 3D rendering unit 980 may be reproduced by headphones. However, when the 3D downmix signal is reproduced by speakers located distant from a user, inter-channel crosstalk is likely to occur.
Therefore, the decoding apparatus may include the crosstalk cancellation unit 990, which performs a crosstalk cancellation operation on the 3D downmix signal.
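The patent does not detail the cancellation method. A textbook approach, sketched below under that assumption, inverts the 2x2 acoustic transfer matrix between the two speakers and the two ears at each frequency bin, so that each ear receives only its intended channel; the transfer-matrix values here are hypothetical:

```python
import numpy as np

def crosstalk_canceller(H):
    """Per-frequency-bin inversion of a 2x2 acoustic transfer matrix.

    H has shape (bins, 2, 2): H[k, i, j] is the acoustic path from
    speaker j to ear i at bin k.  Pre-filtering the 3D downmix with
    the inverse matrix cancels the cross paths.  This is a generic
    sketch, not the method claimed in the patent.
    """
    return np.linalg.inv(H)

# Hypothetical, frequency-flat paths: direct gain 1.0, cross gain 0.3.
bins = 4
H = np.broadcast_to(np.array([[1.0, 0.3],
                              [0.3, 1.0]]), (bins, 2, 2)).copy()
C = crosstalk_canceller(H)

# What reaches the ears after the canceller and the acoustic paths.
ears = np.einsum('kij,kjl->kil', H, C)
assert np.allclose(ears, np.eye(2))  # crosstalk removed at every bin
```

In practice the inversion is regularized, since a measured transfer matrix can be nearly singular at some frequencies, but the principle is the same.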
The decoding apparatus may also perform a sound field processing operation.
Sound field information used in the sound field processing operation, that is, information identifying the space in which the 3D downmix signal is to be reproduced, may be included in the input bitstream transmitted by the encoding apparatus, or may be selected by the decoding apparatus.
The input bitstream may include reverberation time information. A filter used in the sound field processing operation may be controlled according to the reverberation time information.
The sound field processing operation may be performed differently for an early part and a late reverberation part. For example, the early part may be processed using an FIR filter, and the late reverberation part may be processed using an IIR filter.
More specifically, the sound field processing operation may be performed on the early part by performing a convolution operation in a time domain using the FIR filter, or by performing a multiplication operation in a frequency domain and converting the result of the multiplication into the time domain. The sound field processing operation may be performed on the late reverberation part in the time domain.
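The two ways of applying the early-part FIR filter produce identical results when the frequency-domain product is computed at the full linear-convolution length. The sketch below demonstrates this, plus a minimal one-pole IIR recursion for the late reverberation; the filter taps and the coefficients `a1` and `g` are hypothetical, since the patent only says the filters are controlled by the reverberation time information:

```python
import numpy as np

def early_fir_time(x, fir):
    """Early part: time-domain convolution with an FIR sound-field filter."""
    return np.convolve(x, fir)

def early_fir_freq(x, fir):
    """Same filtering via multiplication in the frequency domain.

    Zero-padding both transforms to the full linear-convolution length
    makes the FFT's circular convolution match the linear one.
    """
    n = len(x) + len(fir) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(fir, n), n)

def late_iir(x, a1=0.7, g=0.3):
    """Late reverberation: one-pole IIR recursion y[n] = g*x[n] + a1*y[n-1].

    Coefficients are illustrative; a longer reverberation time would
    correspond to a feedback coefficient a1 closer to 1.
    """
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        y[n] = g * x[n] + (a1 * y[n - 1] if n > 0 else 0.0)
    return y

x = np.random.default_rng(1).standard_normal(128)
fir = np.array([0.9, 0.4, 0.2, 0.1])  # hypothetical early-reflection taps
assert np.allclose(early_fir_time(x, fir), early_fir_freq(x, fir))
```

The frequency-domain path is preferred for long FIR filters because the FFT reduces the cost from O(N*M) to O(N log N), while the short IIR recursion stays cheap in the time domain.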
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage, and a carrier wave (e.g., data transmission through the Internet). The computer-readable recording medium can be distributed over a plurality of computer systems connected to a network so that computer-readable code is written thereto and executed therefrom in a decentralized manner. Functional programs, code, and code segments needed for realizing the present invention can easily be construed by one of ordinary skill in the art.
As described above, according to the present invention, it is possible to efficiently encode multi-channel signals with 3D effects, and to adaptively restore and reproduce audio signals with optimum sound quality according to the characteristics of a reproduction environment.
Industrial Applicability
Other implementations are within the scope of the following claims. For example, grouping, data decoding, and entropy decoding according to the present invention can be applied to various applications and various products. A storage medium storing data to which an aspect of the present invention is applied is within the scope of the present invention.

Claims (10)

1. the method for a decoded signal comprises:
From incoming bit stream, extract three-dimensional 3D reduction audio signal, spatial information and filter information;
Come to remove 3D effect from 3D reduction audio signal through using said filter information that said 3D reduction audio signal is carried out the 3D rendering operations, the inverse filter that is used to generate the filter information of said 3D reduction audio signal through use is carried out said 3D rendering operations; And
The reduction audio signal of utilizing said spatial information and having removed said 3D effect generates multi-channel signal.
2. the method for claim 1 is characterized in that, said bit stream also comprises and is used to point out whether said bit stream comprises the information of said filter information.
3. the method for claim 1 is characterized in that, a kind of corresponding among following of said filter information: the header related transfer function HRTF coefficient that is used to generate said three-dimensional 3D reduction audio signal; And the header related transfer function HRTF coefficient that is used for said inverse filter.
4. the method for claim 1; It is characterized in that; Also comprise at least one that extract following information: indicate said incoming bit stream whether to comprise whether information and indication incoming bit stream about the wave filter that is used to generate said 3D reduction audio signal comprise the information about the inverse filter of said wave filter.
5. the method for claim 1 is characterized in that, in one of DFT DFT territory, FFT FFT territory, quadrature mirror filter QMF/ hybrid domain and time domain, carries out said 3D rendering operations.
6. An apparatus for decoding a signal, the apparatus comprising:
a bit unpacking unit which extracts a three-dimensional (3D) downmix signal, spatial information, and filter information from an input bitstream;
a 3D rendering unit which removes 3D effects from the 3D downmix signal by performing a 3D rendering operation on the 3D downmix signal using the filter information, the 3D rendering operation being performed using an inverse filter of the filter used to generate the 3D downmix signal; and
a multi-channel decoder which generates a multi-channel signal using the spatial information and the downmix signal from which the 3D effects have been removed.
7. The apparatus of claim 6, wherein the bitstream further comprises information indicating whether the bitstream includes the filter information.
8. The apparatus of claim 6, wherein the filter information corresponds to one of: a head-related transfer function (HRTF) coefficient used to generate the 3D downmix signal; and an HRTF coefficient for the inverse filter.
9. The apparatus of claim 6, wherein the bit unpacking unit extracts at least one of: information indicating whether the input bitstream includes information about the filter used to generate the 3D downmix signal, and information indicating whether the input bitstream includes information about the inverse filter of the filter.
10. The apparatus of claim 6, wherein the 3D rendering operation is performed in one of a discrete Fourier transform (DFT) domain, a fast Fourier transform (FFT) domain, a quadrature mirror filter (QMF)/hybrid domain, and a time domain.
CN2007800045157A 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal Active CN101385076B (en)

Applications Claiming Priority (17)

Application Number Priority Date Filing Date Title
US76574706P 2006-02-07 2006-02-07
US60/765,747 2006-02-07
US77147106P 2006-02-09 2006-02-09
US60/771,471 2006-02-09
US77333706P 2006-02-15 2006-02-15
US60/773,337 2006-02-15
US77577506P 2006-02-23 2006-02-23
US60/775,775 2006-02-23
US78175006P 2006-03-14 2006-03-14
US60/781,750 2006-03-14
US78251906P 2006-03-16 2006-03-16
US60/782,519 2006-03-16
US79232906P 2006-04-17 2006-04-17
US60/792,329 2006-04-17
US79365306P 2006-04-21 2006-04-21
US60/793,653 2006-04-21
PCT/KR2007/000668 WO2007091842A1 (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Publications (2)

Publication Number Publication Date
CN101385076A CN101385076A (en) 2009-03-11
CN101385076B true CN101385076B (en) 2012-11-28

Family

ID=40422032

Family Applications (7)

Application Number Title Priority Date Filing Date
CN2007800045157A Active CN101385076B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Family Applications After (6)

Application Number Title Priority Date Filing Date
CN2007800045354A Active CN101379553B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004505.3A Active CN101385075B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045087A Active CN101379552B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN200780004527XA Active CN101385077B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045551A Expired - Fee Related CN101379555B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal
CN2007800045458A Active CN101379554B (en) 2006-02-07 2007-02-07 Apparatus and method for encoding/decoding signal

Country Status (1)

Country Link
CN (7) CN101385076B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MY160545A (en) 2009-04-08 2017-03-15 Fraunhofer-Gesellschaft Zur Frderung Der Angewandten Forschung E V Apparatus, method and computer program for upmixing a downmix audio signal using a phase value smoothing
JP2011217139A (en) * 2010-03-31 2011-10-27 Sony Corp Signal processing device and method, and program
US20130128979A1 (en) * 2010-05-11 2013-05-23 Telefonaktiebolaget Lm Ericsson (Publ) Video signal compression coding
CN104641414A (en) * 2012-07-19 2015-05-20 诺基亚公司 Stereo audio signal encoder
EP2757559A1 (en) * 2013-01-22 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
KR101859453B1 (en) 2013-03-29 2018-05-21 삼성전자주식회사 Audio providing apparatus and method thereof
US9769586B2 (en) 2013-05-29 2017-09-19 Qualcomm Incorporated Performing order reduction with respect to higher order ambisonic coefficients
CN108449704B (en) * 2013-10-22 2021-01-01 韩国电子通信研究院 Method for generating a filter for an audio signal and parameterization device therefor
US9922656B2 (en) * 2014-01-30 2018-03-20 Qualcomm Incorporated Transitioning of ambient higher-order ambisonic coefficients
US10770087B2 (en) 2014-05-16 2020-09-08 Qualcomm Incorporated Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals
KR101818877B1 (en) * 2014-05-30 2018-01-15 퀄컴 인코포레이티드 Obtaining sparseness information for higher order ambisonic audio renderers
US10140996B2 (en) * 2014-10-10 2018-11-27 Qualcomm Incorporated Signaling layers for scalable coding of higher order ambisonic audio data
CN111970630B (en) 2015-08-25 2021-11-02 杜比实验室特许公司 Audio decoder and decoding method
US10074373B2 (en) * 2015-12-21 2018-09-11 Qualcomm Incorporated Channel adjustment for inter-frame temporal shift variations
CN108039175B (en) 2018-01-29 2021-03-26 北京百度网讯科技有限公司 Voice recognition method and device and server
CN113035209B (en) * 2021-02-25 2023-07-04 北京达佳互联信息技术有限公司 Three-dimensional audio acquisition method and three-dimensional audio acquisition device

Citations (1)

Publication number Priority date Publication date Assignee Title
US6574339B1 (en) * 1998-10-20 2003-06-03 Samsung Electronics Co., Ltd. Three-dimensional sound reproducing apparatus for multiple listeners and method thereof

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
EP0912077B1 (en) * 1994-02-25 2001-10-31 Henrik Moller Binaural synthesis, head-related transfer functions, and uses therof
DK1072089T3 (en) * 1998-03-25 2011-06-27 Dolby Lab Licensing Corp Method and apparatus for processing audio signals
DE19847689B4 (en) * 1998-10-15 2013-07-11 Samsung Electronics Co., Ltd. Apparatus and method for three-dimensional sound reproduction
EP1211857A1 (en) * 2000-12-04 2002-06-05 STMicroelectronics N.V. Process and device of successive value estimations of numerical symbols, in particular for the equalization of a data communication channel of information in mobile telephony
US7583805B2 (en) * 2004-02-12 2009-09-01 Agere Systems Inc. Late reverberation-based synthesis of auditory scenes
EP1315148A1 (en) * 2001-11-17 2003-05-28 Deutsche Thomson-Brandt Gmbh Determination of the presence of ancillary data in an audio bitstream
ES2300567T3 (en) * 2002-04-22 2008-06-16 Koninklijke Philips Electronics N.V. PARAMETRIC REPRESENTATION OF SPACE AUDIO.
KR100773539B1 (en) * 2004-07-14 2007-11-05 삼성전자주식회사 Multi channel audio data encoding/decoding method and apparatus


Non-Patent Citations (1)

Title
HERRE et al., "The Reference Model Architecture for MPEG Spatial Audio Coding," AES 118th Convention, Barcelona, Spain, 2005 (page 2, column 1, last paragraph; column 2; and FIG. 1). *

Also Published As

Publication number Publication date
CN101379552A (en) 2009-03-04
CN101379552B (en) 2013-06-19
CN101385076A (en) 2009-03-11
CN101379554A (en) 2009-03-04
CN101379553A (en) 2009-03-04
CN101379555A (en) 2009-03-04
CN101385075A (en) 2009-03-11
CN101379555B (en) 2013-03-13
CN101379553B (en) 2012-02-29
CN101385077B (en) 2012-04-11
CN101385075B (en) 2015-04-22
CN101379554B (en) 2012-09-19
CN101385077A (en) 2009-03-11

Similar Documents

Publication Publication Date Title
CN101385076B (en) Apparatus and method for encoding/decoding signal
CN104681030B (en) Apparatus and method for encoding/decoding signal
RU2406164C2 (en) Signal coding/decoding device and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1128810

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1128810

Country of ref document: HK