CN101253809B

CN101253809B - Method and apparatus for encoding and decoding an audio signal

Info

Publication number: CN101253809B
Application number: CN200680031572XA
Authority: CN
Inventors: 房熙锡; 吴贤午; 金东秀; 林宰显; 郑亮源
Original assignee: LG Electronics Inc
Current assignee: LG Electronics Inc
Priority date: 2005-08-30
Filing date: 2006-08-30
Publication date: 2011-12-28
Anticipated expiration: 2026-08-30
Also published as: CN101248484A; CN101253808A; CN101253810A; CN101253551B; CN101253554A; KR20070025905A; CN101253808B; CN101253552B; CN101253553A; CN101253807B; CN101253807A; CN101253810B; CN101253551A; CN101253552A; CN101253806A; CN101253553B; CN101253806B; CN101253554B; CN101248484B; CN101253809A

Abstract

Spatial information associated with an audio signal is encoded into a bitstream, which can be transmitted to a decoder or recorded to a storage media. The bitstream can include different syntax related to time, frequency and spatial domains. In some embodiments, the bitstream includes one or more data structures (e.g., frames) that contain ordered sets of slots for which parameters can be applied. The data structures can be fixed or variable. A data structure type indicator can be inserted in the bitstream to enable a decoder to determine the data structure type and to invoke an appropriate decoding process. The data structure can include position information that can be used by a decoder to identify the correct slot for which a given parameter set is applied. The slot position information can be encoded with either a fixed number of bits or a variable number of bits based on the data structure type as indicated by the data structure type indicator. For variable data structure types, the slot position information can be encoded with a variable number of bits based on the position of the slot in the ordered set of slots.

Description

Method and apparatus for encoding and decoding an audio signal

Technical field

The subject matter of this application relates generally to audio signal processing.

Background technology

Efforts are underway to research and develop new approaches to perceptual coding of multichannel audio, commonly referred to as spatial audio coding (SAC). SAC allows multichannel audio to be transmitted at low bit rates, which makes it suitable for many popular audio applications (for example, Internet media streaming and music downloads).

Rather than coding each audio input channel discretely, SAC captures the spatial image of a multichannel audio signal in a compact set of parameters. These parameters can be transmitted to a decoder, where they are used to synthesize or reconstruct the spatial properties of the audio signal.

In some SAC applications, the spatial parameters are transmitted to the decoder as part of a bitstream. The bitstream includes spatial frames that contain ordered sets of time slots to which sets of spatial parameters can be applied. The bitstream also includes position information that the decoder can use to identify the correct time slot to which a given parameter set is applied.

Some SAC applications make use of conceptual elements in the encoding/decoding path. One element is commonly referred to as One-To-Two (OTT) and another as Two-To-Three (TTT); the two names indicate the number of input and output channels of the corresponding decoder element. The OTT encoder element extracts two spatial parameters and creates a downmix signal and a residual signal. The TTT element downmixes three audio signals into a stereo downmix signal plus a residual signal. These elements can be combined to provide a variety of spatial audio configurations (for example, surround sound).

Some SAC applications can operate in a non-guided mode, in which only a stereo downmix signal is transmitted from the encoder to the decoder, without any spatial parameters. The decoder synthesizes spatial parameters from the downmix signal and uses them to produce a multichannel audio signal.

Summary of the invention

Spatial information associated with an audio signal is encoded into a bitstream, which can be transmitted to a decoder or recorded on a storage medium. The bitstream can include different syntax related to the time, frequency and spatial domains. In some embodiments, the bitstream includes one or more data structures (for example, frames) that contain ordered sets of slots to which parameters can be applied. The data structures can be fixed or variable. A data structure type indicator can be inserted in the bitstream to enable a decoder to determine the data structure type and to invoke an appropriate decoding process. The data structure can include position information that a decoder can use to identify the correct slot to which a given parameter set is applied. The slot position information can be encoded with either a fixed or a variable number of bits, depending on the data structure type indicated by the data structure type indicator. For variable data structure types, the slot position information can be encoded with a variable number of bits based on the position of the slot in the ordered set of slots.

In some embodiments, a method of encoding an audio signal includes: generating a parameter set corresponding to first or second information of the audio signal; and inserting the parameter set and the corresponding first or second information into a bitstream representing the audio signal, wherein the first or second information is represented by a variable number of bits.

In some embodiments, a method of decoding an audio signal includes: determining a parameter set corresponding to first or second information of the audio signal, wherein the parameter set and the corresponding first or second information are included in a bitstream representing the audio signal, and wherein the first or second information is represented in the bitstream by a variable number of bits; and decoding the audio signal based on the parameter set and the corresponding first or second information.

Other embodiments of time slot position coding for multiple frame types are disclosed, directed to systems, methods, apparatuses, data structures and computer-readable media.

It is to be understood that both the foregoing general description and the following detailed description of the embodiments are exemplary and explanatory, and are intended to provide further explanation of the invention as claimed.

Description of drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings:

Fig. 1 is a diagram illustrating the principle of generating spatial information according to an embodiment of the invention;

Fig. 2 is a block diagram of an encoder for encoding an audio signal according to an embodiment of the invention;

Fig. 3 is a block diagram of a decoder for decoding an audio signal according to an embodiment of the invention;

Fig. 4 is a block diagram of a channel converting module included in the upmixing unit of a decoder according to an embodiment of the invention;

Fig. 5 is a diagram explaining a method of configuring a bitstream of an audio signal according to an embodiment of the invention;

Figs. 6A and 6B are, respectively, a diagram and a time/frequency graph explaining the relationship between parameter sets, time slots and parameter bands according to an embodiment of the invention;

Fig. 7A illustrates a syntax for representing configuration information of a spatial information signal according to an embodiment of the invention;

Fig. 7B is a table of the number of parameter bands of a spatial information signal according to an embodiment of the invention;

Fig. 8A illustrates a syntax for representing the number of parameter bands applied to an OTT box with a fixed number of bits according to an embodiment of the invention;

Fig. 8B illustrates a syntax for representing the number of parameter bands applied to an OTT box with a variable number of bits according to an embodiment of the invention;

Fig. 9A illustrates a syntax for representing the number of parameter bands applied to a TTT box with a fixed number of bits according to an embodiment of the invention;

Fig. 9B illustrates a syntax for representing the number of parameter bands applied to a TTT box with a variable number of bits according to an embodiment of the invention;

Fig. 10A illustrates a syntax of spatial extension configuration information for a spatial extension frame according to an embodiment of the invention;

Figs. 10B and 10C illustrate syntaxes of spatial extension configuration information for a residual signal, in the case where a residual signal is included in the spatial extension frame, according to an embodiment of the invention;

Fig. 10D illustrates a syntax of a method of representing the number of parameter bands of a residual signal according to an embodiment of the invention;

Fig. 11A is a block diagram of a decoding apparatus that uses non-guided coding according to an embodiment of the invention;

Fig. 11B is a diagram of a method of representing numbers of parameter bands as a group according to an embodiment of the invention;

Fig. 12 illustrates a syntax of configuration information of a spatial frame according to an embodiment of the invention;

Fig. 13A illustrates a syntax of position information of the time slots to which parameter sets are applied according to an embodiment of the invention;

Fig. 13B illustrates a syntax for representing the position information of the time slots to which parameter sets are applied as an absolute value and a difference value according to an embodiment of the invention;

Fig. 13C is a diagram in which multiple pieces of position information of the time slots to which parameter sets are applied are represented as a group according to an embodiment of the invention;

Fig. 14 is a flowchart of an encoding method according to an embodiment of the invention;

Fig. 15 is a flowchart of a decoding method according to an embodiment of the invention;

Fig. 16 is a block diagram of a device architecture for implementing the encoding and decoding processes described with reference to Figs. 1-15.

Embodiment

Fig. 1 is a diagram illustrating the principle of generating spatial information according to an embodiment of the invention. Perceptual coding schemes for multichannel audio signals are based on the fact that humans perceive audio signals in three-dimensional space. The three-dimensional space of an audio signal can be represented using spatial information, including but not limited to the following known spatial parameters: channel level difference (CLD), inter-channel correlation/coherence (ICC), channel time difference (CTD) and channel prediction coefficients (CPC). The CLD parameter describes the energy (level) difference between two audio channels, the ICC parameter describes the amount of correlation or coherence between two audio channels, and the CTD parameter represents the time difference between two audio channels.

Fig. 1 shows how the CTD and CLD parameters arise. A first direct sound wave 103 from a remote sound source 101 reaches the listener's left ear 107, and a second direct sound wave 102 reaches the listener's right ear 106 after diffracting around the head. Direct sound waves 102 and 103 differ from each other in arrival time and energy level. The CTD and CLD parameters can be generated from the arrival time difference and the energy level difference of sound waves 102 and 103, respectively. In addition, reflected sound waves 104 and 105 arrive at ears 106 and 107, respectively, and have no mutual correlation. The ICC parameter can be generated from the correlation between sound waves 104 and 105.
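The three parameters described above can be illustrated with a minimal sketch. The estimators below (energy ratio in dB for CLD, normalized zero-lag cross-correlation for ICC, and the cross-correlation peak lag for CTD) are common textbook choices assumed here for illustration; the text does not specify how the encoder computes them, and the function names are hypothetical.

```python
import math

def cld_db(ch1, ch2):
    """Channel level difference: energy ratio of two channels, in dB."""
    e1 = sum(x * x for x in ch1)
    e2 = sum(x * x for x in ch2)
    return 10.0 * math.log10(e1 / e2)

def icc(ch1, ch2):
    """Inter-channel correlation: normalized cross-correlation at lag 0."""
    num = sum(a * b for a, b in zip(ch1, ch2))
    den = math.sqrt(sum(a * a for a in ch1) * sum(b * b for b in ch2))
    return num / den

def ctd_samples(ch1, ch2, max_lag):
    """Channel time difference: lag (in samples) maximizing cross-correlation.
    A positive result means ch2 is delayed relative to ch1."""
    def xcorr(lag):
        return sum(ch1[n] * ch2[n + lag]
                   for n in range(len(ch1))
                   if 0 <= n + lag < len(ch2))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

# Toy signals: the right channel is the left one at half amplitude, 2 samples later.
left = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.5, 0.25, -0.25, -0.5, 0.0]

print(round(cld_db(left, right), 2))   # 6.02 (energy ratio of 4)
print(ctd_samples(left, right, 3))     # 2
print(round(icc(left, right), 2))      # -0.4
```

A practical encoder would compute such values per time slot and per parameter band rather than over the whole signal.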

At the encoder, spatial information (e.g., spatial parameters) is extracted from the multichannel audio input signal and a downmix audio signal is generated. The downmix audio signal and the spatial parameters are transmitted to the decoder. The downmix audio signal can have any number of audio channels, including but not limited to a mono signal, a stereo signal or a multichannel audio signal. At the decoder, a multichannel upmix audio signal is created from the downmix audio signal and the spatial parameters.

Fig. 2 is a block diagram of an encoder for encoding an audio signal according to an embodiment of the invention. The encoder includes a downmixing unit 202, a spatial information generating unit 203, a downmix audio signal encoding unit 207 and a multiplexing unit 209. Other encoder configurations are possible. The encoder can be implemented in hardware, software or a combination of both. It can be implemented in an integrated circuit (IC) chip, a chipset, a system on chip (SoC), a digital signal processor, a general-purpose processor, and various digital and analog devices.

Downmixing unit 202 generates downmix audio signal 204 from multichannel audio signal 201. In Fig. 2, x₁, ..., xₙ denote the input audio channels. As mentioned above, downmix audio signal 204 can be a mono signal, a stereo signal or a multichannel audio signal. In the example shown, x'₁, ..., x'ₘ denote the channels of downmix audio signal 204. In some embodiments, the encoder processes an externally provided downmix audio signal 205 (e.g., an artistic downmix) instead of downmix audio signal 204.

Spatial information generating unit 203 extracts spatial information from multichannel audio signal 201. Here, "spatial information" means information about the audio signal channels that is used in the decoder to upmix downmix audio signal 204 into a multichannel audio signal. Downmix audio signal 204 is generated by downmixing the multichannel audio signal. The spatial information is encoded to provide encoded spatial information signal 206.

Downmix audio signal encoding unit 207 generates encoded downmix audio signal 208 by encoding downmix audio signal 204 generated by downmixing unit 202.

Multiplexing unit 209 generates bitstream 210, which includes encoded downmix audio signal 208 and encoded spatial information signal 206. Bitstream 210 can be transmitted to a downstream decoder and/or recorded on a storage medium.

Fig. 3 is a block diagram of a decoder for decoding an encoded audio signal according to an embodiment of the invention. The decoder includes a demultiplexing unit 302, a downmix audio signal decoding unit 305, a spatial information decoding unit 307 and an upmixing unit 309. The decoder can be implemented in hardware, software or a combination of both. It can be implemented in an integrated circuit (IC) chip, a chipset, a system on chip (SoC), a digital signal processor, a general-purpose processor, and various digital and analog devices.

In some embodiments, demultiplexing unit 302 receives bitstream 301 representing an audio signal and then separates encoded downmix audio signal 303 and encoded spatial information signal 304 from bitstream 301. In Fig. 3, x'₁, ..., x'ₘ denote the channels of downmix audio signal 303. Downmix audio signal decoding unit 305 outputs decoded downmix audio signal 306 by decoding encoded downmix audio signal 303. If the decoder cannot output a multichannel audio signal, downmix audio signal decoding unit 305 outputs downmix audio signal 306 directly. In Fig. 3, y'₁, ..., y'ₘ denote the direct output channels of downmix audio signal decoding unit 305.

Spatial information decoding unit 307 extracts the configuration information of the spatial information signal from encoded spatial information signal 304 and then decodes spatial information signal 304 using the extracted configuration information.

Upmixing unit 309 can use the extracted spatial information 308 to upmix downmix audio signal 306 into multichannel audio signal 310. In Fig. 3, y₁, ..., yₙ denote the output channels of upmixing unit 309.

Fig. 4 is a block diagram of a channel converting module that can be included in upmixing unit 309 of the decoder shown in Fig. 3. In some embodiments, upmixing unit 309 can include multiple channel converting modules. A channel converting module is a conceptual device that uses specific information to make the number of output channels differ from the number of input channels.

In some embodiments, a channel converting module can include an OTT (One-To-Two) box, which converts one channel to two channels or two channels to one, and a TTT (Two-To-Three) box, which converts two channels to three or three channels to two. OTT and/or TTT boxes can be arranged in a variety of useful configurations. For example, upmixing unit 309 shown in Fig. 3 can have a 5-1-5, 5-2-5, 7-2-7 or 7-5-7 structure, among others. In the 5-1-5 structure, a one-channel downmix audio signal is generated by downmixing five channels to one, and this downmix audio signal can subsequently be upmixed to five channels. Other structures can be created in the same manner using various combinations of OTT and TTT boxes.

Fig. 4 shows an exemplary 5-2-5 structure of an upmixing unit 400. In the 5-2-5 structure, a downmix audio signal 401 with two channels is input to upmixing unit 400. In the example shown, a left channel (L) and a right channel (R) are provided as inputs to upmixing unit 400. In this embodiment, upmixing unit 400 includes one TTT box 402 and three OTT boxes 406, 407 and 408. The two-channel downmix audio signal 401 is provided as input to TTT box (TTT₀) 402, which processes downmix audio signal 401 and provides three channels 403, 404 and 405 as output. One or more spatial parameters (e.g., CPC, CLD, ICC) can be provided as input to TTT box 402 and used to process downmix audio signal 401, as described below. In some embodiments, a residual signal can optionally be provided as input to TTT box 402. In this case, the CPC can be described as prediction coefficients for generating three channels from two channels.

Channel 403, output by TTT box 402, is provided as input to OTT box 406, which uses one or more spatial parameters to generate two output channels. In the example shown, these two output channels represent the front left (FL) and back left (BL) loudspeaker positions in, for example, a surround sound environment. Channel 404 is provided as input to OTT box 407, which uses one or more spatial parameters to generate two output channels. In the example shown, these two output channels represent the front right (FR) and back right (BR) loudspeaker positions. Channel 405 is provided as input to OTT box 408, which generates two output channels. In the example shown, these two output channels represent the center (C) loudspeaker position and the low-frequency enhancement (LFE) channel. In this case, spatial information (e.g., CLD, ICC) can be provided as input to each OTT box. In some embodiments, residual signals (Res1, Res2) can be provided as input to OTT boxes 406 and 407. In this embodiment, no residual signal is provided as input to OTT box 408, which outputs the center and LFE channels.
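The 5-2-5 tree of Fig. 4 can be sketched as a small data structure, together with a simplified OTT upmix step. This is only an illustration under stated assumptions: the gain formula is a common energy-preserving way to split one channel according to a CLD value and is not taken from the text, and ICC, decorrelation, residual signals and the CPC-based TTT processing are all omitted. The names `TREE_5_2_5`, `output_channels` and `ott_upmix` are hypothetical.

```python
import math

# Tree of Fig. 4: TTT box 402 feeds OTT boxes 406, 407 and 408.
TREE_5_2_5 = ("TTT 402", [
    ("OTT 406", ["FL", "BL"]),
    ("OTT 407", ["FR", "BR"]),
    ("OTT 408", ["C", "LFE"]),
])

def output_channels(node):
    """Collect the loudspeaker labels at the leaves of the tree, in order."""
    if isinstance(node, str):
        return [node]
    _, children = node
    labels = []
    for child in children:
        labels.extend(output_channels(child))
    return labels

def ott_upmix(mono, cld_db):
    """Split one channel into two whose energy ratio equals the CLD (in dB).
    Assumed formula; the gains satisfy g1**2 + g2**2 == 1."""
    ratio = 10.0 ** (cld_db / 10.0)
    g1 = math.sqrt(ratio / (1.0 + ratio))
    g2 = math.sqrt(1.0 / (1.0 + ratio))
    return [g1 * x for x in mono], [g2 * x for x in mono]

print(output_channels(TREE_5_2_5))   # ['FL', 'BL', 'FR', 'BR', 'C', 'LFE']
front, back = ott_upmix([1.0, -2.0, 0.5], cld_db=6.0)
```

Representing the configuration as a tree makes it straightforward to describe the other structures mentioned above (5-1-5, 7-2-7, 7-5-7) by rearranging the same two box types.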

The structure shown in Fig. 4 is one example of a channel converting module structure. Other structures, including various combinations of OTT and TTT boxes, are also possible. Because each channel converting module can operate in the frequency domain, the number of parameter bands applied to each channel converting module can be defined. A parameter band is at least one frequency band to which a parameter is applicable. The number of parameter bands is described with reference to Fig. 6B.

Fig. 5 is a diagram illustrating a method of configuring a bitstream of an audio signal according to an embodiment of the invention. Fig. 5(a) shows a bitstream of an audio signal that includes only a spatial information signal, and Figs. 5(b) and 5(c) show bitstreams of an audio signal that include a downmix audio signal and a spatial information signal.

Referring to Fig. 5(a), the bitstream of an audio signal can include configuration information 501 and a frame 503. Frame 503 can be repeated in the bitstream, and in some embodiments the frame includes a single spatial frame 502 containing spatial audio information.

In some embodiments, configuration information 501 includes information describing the total number of time slots in spatial frame 502, the total number of parameter bands spanning the frequency range of the audio signal, the number of parameter bands in the OTT boxes, the number of parameter bands in the TTT boxes, and the number of parameter bands in the residual signal. Other information can be included in configuration information 501 as needed.

In some embodiments, spatial frame 502 includes one or more spatial parameters (e.g., CLD, ICC), a frame type, the number of parameter sets in the frame, and the time slots to which the parameter sets can be applied. Other information can be included in spatial frame 502 as needed. The meaning and use of the information included in configuration information 501 and spatial frame 502 are explained below with reference to Figs. 6-10.

Referring to Fig. 5(b), the bitstream of an audio signal includes configuration information 504, a downmix audio signal 505 and a spatial frame 506. In this case, one frame 507 can include downmix audio signal 505 and spatial frame 506, and frame 507 can be repeated in the bitstream.

Referring to Fig. 5(c), the bitstream of an audio signal can include a downmix audio signal 508, configuration information 509 and a spatial frame 510. In this case, one frame 511 can include configuration information 509 and spatial frame 510, and frame 511 can be repeated in the bitstream. If configuration information 509 is inserted in each frame 511, a playback device can play back the audio signal from an arbitrary position.

Although Fig. 5(c) shows configuration information 509 inserted in the bitstream frame by frame (per frame 511), it should be apparent that configuration information 509 can instead be inserted once every several frames, either periodically or aperiodically.

Figs. 6A and 6B are diagrams illustrating the relationship between parameter sets, time slots and parameter bands according to an embodiment of the invention. A parameter set is one or more spatial parameters applied to one time slot. The spatial parameters can include spatial information such as CLD, ICC and CPC. A time slot is a time interval of the audio signal to which spatial parameters can be applied. One spatial frame can include one or more time slots.

Referring to Fig. 6A, a number of parameter sets 1, ..., P can be used in a spatial frame, and each parameter set can include one or more data fields 1, ..., Q-1. A parameter set can be applied to the entire frequency range of the audio signal, and each spatial parameter in the parameter set can be applied to one or more portions of the frequency band. For example, if a parameter set includes 20 spatial parameters, the entire frequency band of the audio signal can be divided into 20 zones (hereinafter "parameter bands"), and the 20 spatial parameters of the parameter set are applied to these 20 parameter bands. Parameters can be applied to the parameter bands as needed. For example, spatial parameters can be applied densely to low-frequency parameter bands and sparsely to high-frequency parameter bands.

Referring to Fig. 6B, a time/frequency graph shows the relationship between parameter sets and time slots. In the example shown, three parameter sets (parameter set 1, parameter set 2, parameter set 3) are applied to an ordered set of 12 time slots in a single spatial frame. In this case, the entire frequency range of the audio signal is divided into 9 parameter bands. Thus, the horizontal axis represents the time slot number and the vertical axis represents the parameter band number. Each of the three parameter sets is applied to a specific time slot. For example, the first parameter set (parameter set 1) is applied to time slot #1, the second parameter set (parameter set 2) is applied to time slot #5, and the third parameter set (parameter set 3) is applied to time slot #9. The parameter sets can be applied to the remaining time slots by interpolation and/or by copying the parameter sets to those time slots. In general, the number of parameter sets can be equal to or less than the number of time slots, and the number of parameter bands can be equal to or less than the number of frequency bands of the audio signal. By encoding spatial information for portions of the time-frequency region of the audio signal rather than for the entire region, the amount of spatial information sent from the encoder to the decoder can be reduced. This reduction is feasible because, according to known perceptual audio coding principles, sparse information in the time-frequency domain is often sufficient for human auditory perception.
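The interpolation and copying step for the Fig. 6B example can be sketched as follows. This is a simplified illustration, assuming linear interpolation between the time slots that carry a parameter set and plain copying before the first and after the last such slot; the text does not fix the interpolation rule, and a real decoder may, for instance, also use the last parameter set of the previous frame. Only one parameter band is shown, and the function name is hypothetical.

```python
def expand_parameter_sets(num_slots, param_slots, values):
    """Return one parameter value per time slot.

    param_slots: sorted 0-based indices of the slots that carry a parameter set.
    values: one value per parameter set (a single parameter band, for brevity).
    """
    per_slot = []
    for t in range(num_slots):
        if t <= param_slots[0]:
            per_slot.append(values[0])        # copy the first set backward
        elif t >= param_slots[-1]:
            per_slot.append(values[-1])       # copy the last set forward
        else:
            # Interpolate linearly between the two surrounding parameter sets.
            for k in range(len(param_slots) - 1):
                t0, t1 = param_slots[k], param_slots[k + 1]
                if t0 <= t <= t1:
                    w = (t - t0) / (t1 - t0)
                    per_slot.append((1 - w) * values[k] + w * values[k + 1])
                    break
    return per_slot

# Fig. 6B example: 12 slots, parameter sets at slots #1, #5 and #9 (0-based 0, 4, 8).
print(expand_parameter_sets(12, [0, 4, 8], [0.0, 4.0, 8.0]))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 8.0, 8.0, 8.0]
```

Only three parameter sets are transmitted for the twelve slots, which is the data reduction the paragraph above describes.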

An important feature of the disclosed embodiments is the encoding and decoding, using a fixed or variable number of bits, of the positions of the time slots to which parameter sets are applied. The number of parameter bands can likewise be represented with a fixed or variable number of bits. The variable-bit coding scheme can also be applied to other information used in spatial audio coding, including but not limited to information associated with the time, spatial and/or frequency domains (e.g., the number of frequency subbands output from a filter bank).

Fig. 7A illustrates a syntax representing the configuration information of a spatial information signal according to an embodiment of the invention. The configuration information includes a plurality of fields 701 to 718, each of which can be assigned a number of bits.

The "bsSamplingFrequencyIndex" field 701 indicates the sampling frequency obtained from the sampling process of the audio signal. To represent the sampling frequency, 4 bits are allocated to the "bsSamplingFrequencyIndex" field 701. If the value of the "bsSamplingFrequencyIndex" field 701 is 15, i.e., binary 1111, a "bsSamplingFrequency" field 702 is added to represent the sampling frequency. In this case, 24 bits are allocated to the "bsSamplingFrequency" field 702.
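The escape mechanism for the sampling frequency can be sketched with a small bit reader. The field widths (4 and 24 bits) and the escape value 15 come from the text; the `BitReader` class, the MSB-first bit order and the `index_table` argument are assumptions made for illustration, since the table that maps indices 0 to 14 to actual frequencies is not given here.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object (assumed bit order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def read_sampling_frequency(reader, index_table):
    """Parse fields 701 and, if escaped, 702.

    index_table is a caller-supplied mapping from index to frequency in Hz
    (hypothetical; its contents are not specified in the text).
    """
    index = reader.read(4)            # bsSamplingFrequencyIndex, 4 bits
    if index == 15:                   # binary 1111: explicit frequency follows
        return reader.read(24)        # bsSamplingFrequency, 24 bits
    return index_table[index]

# 1111 followed by the 24-bit value 48000 (0x00BB80), padded to whole bytes.
escaped = BitReader(bytes([0xF0, 0x0B, 0xB8, 0x00]))
print(read_sampling_frequency(escaped, {}))   # 48000
```

The escape value keeps the common case to 4 bits while still allowing an arbitrary sampling frequency to be signalled when needed.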

The "bsFrameLength" field 703 indicates the total number of time slots in a spatial frame (hereinafter "numSlots"); "numSlots" and the "bsFrameLength" field 703 can be related by numSlots = bsFrameLength + 1.

The "bsFreqRes" field 704 indicates the total number of parameter bands spanning the entire frequency domain of the audio signal. The "bsFreqRes" field 704 is explained with reference to Fig. 7B.

The "bsTreeConfig" field 705 indicates information about a tree configuration that includes a plurality of channel converting modules such as those described with reference to Fig. 4. The tree configuration information includes, for example, the types of channel converting modules, the number of channel converting modules, the type of spatial information used in the channel converting modules, and the number of input/output channels of the audio signal.

Depending on the types of channel converting modules or the number of channels, the tree configuration can be one of a 5-1-5, 5-2-5, 7-2-7 or 7-5-7 structure, among others. Fig. 4 shows a tree configuration with the 5-2-5 structure.

The "bsQuantMode" field 706 indicates quantization mode information for the spatial information.

The "bsOneIcc" field 707 indicates whether one ICC parameter subset is used for all OTT boxes. Here, a parameter subset means a parameter set applied to a particular time slot and a particular channel converting module.

The "bsArbitraryDownmix" field 708 indicates whether arbitrary downmix gains are present.

The "bsFixedGainSur" field 709 indicates the gain applied to the surround channels, e.g., LS (left surround) and RS (right surround).

The "bsFixedgainLF" field 710 indicates the gain applied to the LFE channel.

The "bsFixedGainDM" field 711 indicates the gain applied to the downmix audio signal.

The "bsMatrixMode" field 712 indicates whether a matrix-compatible stereo downmix signal is generated by the encoder.

The "bsTempShapeConfig" field 713 indicates the operating mode of temporal shaping in the decoder (e.g., TES (temporal envelope shaping) and/or TP (temporal shaping)).

The "bsDecorrConfig" field 714 indicates the operating mode of the decorrelator of the decoder.

The "bs3DaudioMode" field 715 indicates whether the downmix audio signal is encoded as a 3D signal and whether inverse HRTF (head-related transfer function) processing is used.

After the information of each field has been determined in the encoder or extracted in the decoder, the information on the number of parameter bands applied to the channel converting modules is determined in the encoder or extracted in the decoder. The number of parameter bands applied to the OTT boxes is determined or extracted first (716), followed by the number of parameter bands applied to the TTT boxes (717). The numbers of parameter bands for the OTT and/or TTT boxes are described in detail with reference to Figs. 8A-9B.

If an extension frame is present, the "spatialExtensionConfig" block 718 contains the configuration information for the extension frame. The information included in the "spatialExtensionConfig" block 718 is described with reference to Figs. 10A-10D.

Fig. 7B is a table of the number of parameter bands of a spatial information signal according to an embodiment of the invention. "numBands" indicates the number of parameter bands for the entire frequency domain of the audio signal, and "bsFreqRes" indicates index information for the number of parameter bands. For example, the entire frequency domain of the audio signal can be divided, as needed, into a given number of parameter bands (e.g., 4, 5, 7, 10, 14, 20 or 28).

In some embodiments, one parameter can be applied to each parameter band. For example, if "numBands" is 28, the entire frequency domain of the audio signal is divided into 28 parameter bands and each of 28 parameters can be applied to a respective one of the 28 parameter bands. Likewise, if "numBands" is 4, the entire frequency domain of the given audio signal is divided into 4 parameter bands and each of 4 parameters can be applied to a respective one of the 4 parameter bands. In Fig. 7B, the term "reserved" means that the number of parameter bands for the entire frequency domain of the given audio signal has not yet been determined.
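The lookup from "bsFreqRes" to "numBands", and the application of one parameter per band, can be sketched as follows. The band counts are the ones listed in the text, but the assignment of counts to index values, including which index is "reserved", is an assumption made for illustration; the table of Fig. 7B is authoritative. The bin-to-band mapping is likewise hypothetical.

```python
# Assumed index order (not given in the text): index 0 reserved, then
# decreasing band counts.
NUM_BANDS_BY_FREQ_RES = {0: None, 1: 28, 2: 20, 3: 14, 4: 10, 5: 7, 6: 5, 7: 4}

def num_bands(bs_freq_res):
    """Return numBands for a bsFreqRes index, rejecting the reserved value."""
    count = NUM_BANDS_BY_FREQ_RES[bs_freq_res]
    if count is None:
        raise ValueError("bsFreqRes value is reserved")
    return count

def apply_parameters(params, band_of_bin):
    """Expand one parameter per band to one value per frequency bin.

    band_of_bin[k] is the parameter band index of frequency bin k
    (a hypothetical mapping; bands are narrower at low frequencies).
    """
    return [params[band] for band in band_of_bin]

bands = num_bands(7)                                  # 4 under the assumed order
params = [1.0, 2.0, 3.0, 4.0]                         # one parameter per band
band_of_bin = [0, 1, 2, 2, 3, 3, 3, 3]                # 8 bins mapped to 4 bands
print(apply_parameters(params, band_of_bin))
# [1.0, 2.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0]
```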

Note that the human auditory system is not very sensitive to the number of parameter bands used in the coding scheme. Therefore, using a small number of parameter bands can give the listener a spatial audio effect similar to that obtained with a larger number of parameter bands.

Unlike "numBands", "numSlots", which is represented by the "bsFrameLength" field 703 shown in Fig. 7A, can take any value. However, the values of "numSlots" can be restricted if the number of samples in a spatial frame is to be exactly divisible by "numSlots". Therefore, if the maximum value of "numSlots" that actually needs to be represented is "b", each value of the "bsFrameLength" field 703 can be represented by ceil{log₂(b)} bits, where "ceil(x)" denotes the smallest integer greater than or equal to "x". For example, if a spatial frame contains 72 slots, ceil{log₂(72)} = 7 bits can be allocated to the "bsFrameLength" field 703, and the number of parameter bands applied to a channel conversion module can be determined within "numBands".
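The bit allocation above can be checked with a short calculation. The following Python sketch is illustrative only; the helper name `bits_for_max_value` is hypothetical and is not part of the bitstream syntax. It evaluates ceil{log₂(b)} with integer arithmetic so that no floating-point rounding is involved.

```python
def bits_for_max_value(b: int) -> int:
    """Return ceil(log2(b)) for a positive integer b, using integers only."""
    if b < 1:
        raise ValueError("b must be a positive integer")
    # (b - 1).bit_length() equals ceil(log2(b)) for every b >= 1.
    return (b - 1).bit_length()


# The 72-slot example from the text: ceil(log2(72)) bits for "bsFrameLength".
frame_length_bits = bits_for_max_value(72)
print(frame_length_bits)
```

For 72 slots the result is 7 bits, in agreement with the example in the description.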

Fig. 8A illustrates a syntax for representing the number of parameter bands applied to an OTT frame with a fixed number of bits, according to an embodiment of the invention. Referring to Figs. 7A and 8A, "i" takes values from 0 to numOttBoxes-1, where "numOttBoxes" is the total number of OTT frames. That is, each value of "i" identifies one OTT frame, and the number of parameter bands applied to each OTT frame is represented according to the value of "i". If an OTT frame has the LFE channel mode, the number of parameter bands applied to the LFE channel of that OTT frame (hereinafter "bsOttBands") can be represented with a fixed number of bits. In the example shown in Fig. 8A, 5 bits are allocated to the "bsOttBands" field 801. If an OTT frame does not have the LFE channel mode, the total number of parameter bands (numBands) can be applied to a channel of the OTT frame.

Fig. 8B illustrates a syntax for representing the number of parameter bands applied to an OTT frame with a variable number of bits, according to an embodiment of the invention. Fig. 8B is similar to Fig. 8A, except that the "bsOttBands" field 802 shown in Fig. 8B is represented with a variable number of bits. Specifically, the "bsOttBands" field 802, whose value is equal to or less than "numBands", can be represented with a variable number of bits determined from "numBands".

If "numBands" falls in the range equal to or greater than 2^(n-1) and less than 2^(n), the "bsOttBands" field 802 can be represented by a variable number n of bits.

For example: (a) if "numBands" is 40, the "bsOttBands" field 802 is represented by 6 bits; (b) if "numBands" is 28 or 20, the "bsOttBands" field 802 is represented by 5 bits; (c) if "numBands" is 14 or 10, the "bsOttBands" field 802 is represented by 4 bits; and (d) if "numBands" is 7, 5 or 4, the "bsOttBands" field 802 is represented by 3 bits.

Alternatively, if "numBands" falls in the range greater than 2^(n-1) and equal to or less than 2^(n), the "bsOttBands" field 802 can be represented by a variable number n of bits.

For example: (a) if "numBands" is 40, the "bsOttBands" field 802 is represented by 6 bits; (b) if "numBands" is 28 or 20, the "bsOttBands" field 802 is represented by 5 bits; (c) if "numBands" is 14 or 10, the "bsOttBands" field 802 is represented by 4 bits; (d) if "numBands" is 7 or 5, the "bsOttBands" field 802 is represented by 3 bits; and (e) if "numBands" is 4, the "bsOttBands" field 802 is represented by 2 bits.

The "bsOttBands" field 802 can also be represented by a variable number of bits given by a function that takes "numBands" as its argument and rounds up to the nearest integer (hereinafter the "ceiling (ceil) function").

Specifically, (i) in the case of 0＜bsOttBands≤numBands or 0≤bsOttBands＜numBands, the "bsOttBands" field 802 is represented by ceil(log₂(numBands)) bits, or (ii) in the case of 0≤bsOttBands≤numBands, the "bsOttBands" field 802 can be represented by ceil(log₂(numBands+1)) bits.
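The two range rules and the ceiling rule described above can be compared numerically. The sketch below is a hedged illustration in Python; the function names are hypothetical. It relies on two integer identities: the n satisfying 2^(n-1) ≤ numBands ＜ 2^n equals ceil(log₂(numBands+1)), and the n satisfying 2^(n-1) ＜ numBands ≤ 2^n equals ceil(log₂(numBands)).

```python
def bits_half_open_rule(num_bands: int) -> int:
    """n such that 2^(n-1) <= num_bands < 2^n, i.e. ceil(log2(num_bands + 1))."""
    return num_bands.bit_length()


def bits_closed_rule(num_bands: int) -> int:
    """n such that 2^(n-1) < num_bands <= 2^n, i.e. ceil(log2(num_bands))."""
    return (num_bands - 1).bit_length()


# Print the field width under both rules for the band counts named in the text.
for num_bands in (40, 28, 20, 14, 10, 7, 5, 4):
    print(num_bands, bits_half_open_rule(num_bands), bits_closed_rule(num_bands))
```

Among the listed values, the two rules differ only for numBands = 4, which is an exact power of two: 3 bits under the first rule and 2 bits under the second.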

If a value equal to or less than "numBands" (hereinafter "numberBands") is chosen arbitrarily, the "bsOttBands" field 802 can be represented by a variable number of bits given by the ceiling function applied to "numberBands".

Specifically, (i) in the case of 0＜bsOttBands≤numberBands or 0≤bsOttBands＜numberBands, the "bsOttBands" field 802 is represented by ceil(log₂(numberBands)) bits, or (ii) in the case of 0≤bsOttBands≤numberBands, the "bsOttBands" field 802 can be represented by ceil(log₂(numberBands+1)) bits.

If more than one OTT frame is used, the combination of the "bsOttBands" values can be expressed by Formula 1:

[formula 1]

$$\sum_{i=1}^{N} \mathrm{numBands}^{\,i-1} \cdot \mathrm{bsOttBands}_{i}, \qquad 0 \le \mathrm{bsOttBands}_{i} < \mathrm{numBands}$$

Here, bsOttBands_i denotes the i-th "bsOttBands" value. For example, suppose there are three OTT frames and three corresponding values (N = 3) of the "bsOttBands" field 802. In this example, each of the three values of the "bsOttBands" field 802 applied to the three OTT frames (hereinafter a1, a2 and a3) can be represented by 2 bits, so 6 bits in total are needed to represent a1, a2 and a3. However, if a1, a2 and a3 are represented as a group, 27 (= 3*3*3) cases can occur, which can be represented by 5 bits, saving one bit. If "numBands" is 3 and the group value represented by the 5 bits is 15, the group value can be expressed as 15 = 1*(3^2) + 2*(3^1) + 0*(3^0). The decoder can therefore determine from the group value 15, by inverting Formula 1, that the three values a1, a2 and a3 of the "bsOttBands" field 802 are 1, 2 and 0, respectively.
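The grouping and its inversion can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the codec's actual implementation: the function names `pack_group` and `unpack_group` are hypothetical, and the list order follows the summation index of Formula 1, so the first list element carries the weight numBands^0. Under that convention the worked example reads 1, 2, 0 when the digits are listed from the most significant weight downwards.

```python
def pack_group(values, num_bands):
    """Combine values (each in 0..num_bands-1) into one code per Formula 1."""
    code = 0
    for i, value in enumerate(values):  # i = 0 corresponds to index 1 in Formula 1
        if not 0 <= value < num_bands:
            raise ValueError("value out of range")
        code += (num_bands ** i) * value
    return code


def unpack_group(code, num_bands, count):
    """Invert pack_group by repeated division by num_bands."""
    values = []
    for _ in range(count):
        code, digit = divmod(code, num_bands)
        values.append(digit)
    return values


group_bits = (3 ** 3 - 1).bit_length()  # bits needed for 27 combinations
code = pack_group([0, 2, 1], 3)         # 0*3^0 + 2*3^1 + 1*3^2
digits = unpack_group(code, 3, 3)
print(group_bits, code, digits)
```

Repeated `divmod` recovers one value per step, which is the usual way to invert a fixed-radix positional code.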

In the case of multiple OTT frames, the combination of "bsOttBands" values can be expressed using "numberBands" as one of Formulas 2 to 4 defined below. Since representing "bsOttBands" using "numberBands" is similar to the representation using "numBands" in Formula 1, a detailed explanation is omitted and only the formulas are given:

[formula 2]

$$\sum_{i=1}^{N} (\mathrm{numberBands}+1)^{\,i-1} \cdot \mathrm{bsOttBands}_{i}, \qquad 0 \le \mathrm{bsOttBands}_{i} \le \mathrm{numberBands}$$

[formula 3]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsOttBands}_{i}, \qquad 0 \le \mathrm{bsOttBands}_{i} < \mathrm{numberBands}$$

[formula 4]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsOttBands}_{i}, \qquad 0 < \mathrm{bsOttBands}_{i} \le \mathrm{numberBands}$$
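The three variants differ only in the radix and in the range of each value. The Python sketch below illustrates one way they could be encoded and decoded. The function names are hypothetical, and the decoding procedure for Formula 4 (values in 1..numberBands, read as a bijective numeral) is an assumption, since the description does not spell out how that formula is inverted.

```python
def pack(values, base):
    """Weighted sum of values with weights base^(i-1), as in Formulas 2 to 4."""
    return sum(v * base ** i for i, v in enumerate(values))


def unpack_zero_based(code, base, count):
    """Digits in 0..base-1 (Formula 2 with base = numberBands + 1, or Formula 3)."""
    out = []
    for _ in range(count):
        code, d = divmod(code, base)
        out.append(d)
    return out


def unpack_one_based(code, base, count):
    """Digits in 1..base (Formula 4), decoded as a bijective base-`base` numeral."""
    out = []
    for _ in range(count):
        d = code % base or base  # a remainder of 0 stands for the digit `base`
        out.append(d)
        code = (code - d) // base
    return out


number_bands = 3
code2 = pack([3, 0, 2], number_bands + 1)  # Formula 2: values in 0..3, radix 4
code4 = pack([3, 1, 2], number_bands)      # Formula 4: values in 1..3, radix 3
print(code2, unpack_zero_based(code2, number_bands + 1, 3))
print(code4, unpack_one_based(code4, number_bands, 3))
```

Both decoders return the original value lists, which shows that each variant is uniquely invertible within its stated value range.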

Fig. 9A illustrates a syntax for representing the number of parameter bands applied to a TTT frame with a fixed number of bits, according to an embodiment of the invention. Referring to Figs. 7A and 9A, "i" takes values from 0 to numTttBoxes-1, where "numTttBoxes" is the number of all TTT frames. That is, each value of "i" identifies one TTT frame, and the number of parameter bands applied to each TTT frame is represented according to the value of "i". In some embodiments, the frequency range of a TTT frame can be divided into a low-band range and a high-band range, and different processing can be applied to the two ranges. Other divisions can also be used.

The "bsTttDualMode" field 901 indicates whether a given TTT frame operates in different modes for the low-band range and the high-band range (hereinafter "dual mode"). For example, if the value of the "bsTttDualMode" field 901 is 0, a single mode is used for the entire band range, with no distinction between the low-band range and the high-band range. If the value of the "bsTttDualMode" field 901 is 1, different modes can be used for the low-band range and the high-band range.

The "bsTttModeLow" field 902 indicates the operating mode of a given TTT frame, which can have several operating modes. For example, a TTT frame can have a prediction mode that uses, e.g., CPC and ICC parameters, an energy-based mode that uses, e.g., CLD parameters, and so on. If the TTT frame has the dual mode, additional information may be needed for the high-band range.

The "bsTttModeHigh" field 903 indicates the operating mode of the high-band range when the TTT frame has the dual mode.

The "bsTttBandsLow" field 904 indicates the number of parameter bands applied to the TTT frame.

The "bsTttBandsHigh" field 905 takes the value "numBands".

If the TTT frame has the dual mode, the low-band range can be equal to or greater than zero and less than "bsTttBandsLow", and the high-band range can be equal to or greater than "bsTttBandsLow" and less than "bsTttBandsHigh".
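The dual-mode split described above can be pictured as two half-open index ranges. The following Python sketch is only an illustration: the function name is hypothetical, and treating the ranges as ranges of parameter-band indices is an interpretation of the text rather than something the description states explicitly.

```python
def ttt_dual_mode_ranges(num_bands, bands_low):
    """Split parameter-band indices 0..num_bands-1 into low and high ranges."""
    if not 0 <= bands_low <= num_bands:
        raise ValueError("bands_low must lie in 0..num_bands")
    bands_high = num_bands               # "bsTttBandsHigh" takes the value numBands
    low = range(0, bands_low)            # 0 <= band < bsTttBandsLow
    high = range(bands_low, bands_high)  # bsTttBandsLow <= band < bsTttBandsHigh
    return low, high


low, high = ttt_dual_mode_ranges(num_bands=28, bands_low=10)
print(list(low), list(high))
```

With 28 bands and "bsTttBandsLow" equal to 10, bands 0 to 9 fall in the low range and bands 10 to 27 in the high range, so every band belongs to exactly one range.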

If the TTT frame does not have the dual mode, the number of parameter bands applied to the TTT frame can be equal to or greater than zero and less than "numBands" (907).

The "bsTttBandsLow" field 904 can be represented with a fixed number of bits. For example, as shown in Fig. 9A, 5 bits can be allocated to represent the "bsTttBandsLow" field 904.

Fig. 9B illustrates a syntax for representing the number of parameter bands applied to a TTT frame with a variable number of bits, according to an embodiment of the invention. Fig. 9B is similar to Fig. 9A, except that Fig. 9B represents the "bsTttBandsLow" field 907 with a variable number of bits, whereas Fig. 9A represents the "bsTttBandsLow" field 904 with a fixed number of bits. Specifically, since the "bsTttBandsLow" field 907 has a value equal to or less than "numBands", the "bsTttBandsLow" field 907 can be represented with a variable number of bits determined from "numBands".

Specifically, if "numBands" is equal to or greater than 2^(n-1) and less than 2^n, the "bsTttBandsLow" field 907 can be represented by n bits.

For example: (i) if "numBands" is 40, the "bsTttBandsLow" field 907 is represented by 6 bits; (ii) if "numBands" is 28 or 20, the "bsTttBandsLow" field 907 is represented by 5 bits; (iii) if "numBands" is 14 or 10, the "bsTttBandsLow" field 907 is represented by 4 bits; and (iv) if "numBands" is 7, 5 or 4, the "bsTttBandsLow" field 907 is represented by 3 bits.

Alternatively, if "numBands" falls in the range greater than 2^(n-1) and equal to or less than 2^n, the "bsTttBandsLow" field 907 can be represented by n bits.

For example: (i) if "numBands" is 40, the "bsTttBandsLow" field 907 is represented by 6 bits; (ii) if "numBands" is 28 or 20, the "bsTttBandsLow" field 907 is represented by 5 bits; (iii) if "numBands" is 14 or 10, the "bsTttBandsLow" field 907 is represented by 4 bits; (iv) if "numBands" is 7 or 5, the "bsTttBandsLow" field 907 is represented by 3 bits; and (v) if "numBands" is 4, the "bsTttBandsLow" field 907 is represented by 2 bits.

The "bsTttBandsLow" field 907 can also be represented by a number of bits determined by the ceiling function applied to "numBands".

For example, (i) in the case of 0＜bsTttBandsLow≤numBands or 0≤bsTttBandsLow＜numBands, the "bsTttBandsLow" field 907 is represented by ceil(log₂(numBands)) bits; or (ii) in the case of 0≤bsTttBandsLow≤numBands, the "bsTttBandsLow" field 907 can be represented by ceil(log₂(numBands+1)) bits.

If a value equal to or less than "numBands", namely "numberBands", is chosen arbitrarily, the "bsTttBandsLow" field 907 can be represented with a variable number of bits using "numberBands".

Specifically, (i) in the case of 0＜bsTttBandsLow≤numberBands or 0≤bsTttBandsLow＜numberBands, the "bsTttBandsLow" field 907 is represented by ceil(log₂(numberBands)) bits, or (ii) in the case of 0≤bsTttBandsLow≤numberBands, the "bsTttBandsLow" field 907 can be represented by ceil(log₂(numberBands+1)) bits.

In the case of multiple TTT frames, the combination of "bsTttBandsLow" values can be expressed by Formula 5, defined as follows.

[formula 5]

$$\sum_{i=1}^{N} \mathrm{numBands}^{\,i-1} \cdot \mathrm{bsTttBandsLow}_{i}, \qquad 0 \le \mathrm{bsTttBandsLow}_{i} < \mathrm{numBands}$$

In this case, bsTttBandsLow_i denotes the i-th "bsTttBandsLow" value. Since Formula 5 has the same meaning as Formula 1, a detailed description of Formula 5 is omitted.

In the case of multiple TTT frames, the combination of "bsTttBandsLow" values can be expressed using "numberBands" as one of Formulas 6-8. Since Formulas 6-8 have the same meaning as Formulas 2-4, a detailed description of Formulas 6-8 is omitted.

[formula 6]

$$\sum_{i=1}^{N} (\mathrm{numberBands}+1)^{\,i-1} \cdot \mathrm{bsTttBandsLow}_{i}, \qquad 0 \le \mathrm{bsTttBandsLow}_{i} \le \mathrm{numberBands}$$

[formula 7]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsTttBandsLow}_{i}, \qquad 0 \le \mathrm{bsTttBandsLow}_{i} < \mathrm{numberBands}$$

[formula 8]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsTttBandsLow}_{i}, \qquad 0 < \mathrm{bsTttBandsLow}_{i} \le \mathrm{numberBands}$$

The number of parameter bands applied to a channel conversion module (e.g., an OTT frame and/or a TTT frame) can be expressed as a value obtained by dividing "numBands". In this case, the divided value can be half of "numBands" or the result of dividing "numBands" by a particular value.

Once the numbers of parameter bands applied to the OTT and/or TTT frames have been determined, the parameter sets that can be applied to each OTT frame and/or each TTT frame within the range of those parameter bands can be determined. Each parameter set can be applied to each OTT frame and/or each TTT frame in units of time slots. That is, one parameter set can be applied to one time slot.

As mentioned above, a spatial frame can include a plurality of time slots. If the spatial frame is of the fixed frame type, parameter sets can be applied to equally spaced time slots. If the frame is of the variable frame type, position information is needed for the time slots to which the parameter sets are applied. This is described in detail later with reference to Figs. 13A-13C.

Figure 10A illustrates a syntax for the spatial extension configuration information of a spatial extension frame according to an embodiment of the invention. The spatial extension configuration information can include a "bsSacExtType" field 1001, a "bsSacExtLen" field 1002, a "bsSacExtLenAdd" field 1003, a "bsSacExtLenAddAdd" field 1004 and a "bsFillBits" field 1007. Other fields can also be used.

The "bsSacExtType" field 1001 indicates the data type of the spatial extension frame. For example, the spatial extension frame can be filled with zeros, residual signal data, arbitrary downmix residual signal data or arbitrary tree data.

The "bsSacExtLen" field 1002 indicates the number of bytes of the spatial extension configuration information.

The "bsSacExtLenAdd" field 1003 indicates the additional number of bytes of the spatial extension configuration information when its number of bytes becomes, for example, equal to or greater than 15.

The "bsSacExtLenAddAdd" field 1004 indicates the additional number of bytes of the spatial extension configuration information when its number of bytes becomes, for example, equal to or greater than 270.

After each field has been determined/extracted in the encoder/decoder, configuration information is determined for the data type included in the spatial extension frame (1005).
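The three length fields described above ("bsSacExtLen", "bsSacExtLenAdd" and "bsSacExtLenAddAdd") suggest an escape-style length code. The Python sketch below shows one possible reading and should be treated as an assumption throughout: the field widths (4, 8 and 16 bits) are inferred from the example thresholds 15 and 270 = 15 + 255 and are not stated in the text, the lengths are assumed to be additive, and both function names are hypothetical.

```python
LEN_MAX = 15         # assumed: largest value of a 4-bit "bsSacExtLen"
ADD_MAX = 255        # assumed: largest value of an 8-bit "bsSacExtLenAdd"
ADD_ADD_MAX = 65535  # assumed: largest value of a 16-bit "bsSacExtLenAddAdd"


def split_ext_length(num_bytes):
    """Split a byte count into (bsSacExtLen, bsSacExtLenAdd, bsSacExtLenAddAdd).

    A field that would not be transmitted is returned as None.
    """
    if not 0 <= num_bytes <= LEN_MAX + ADD_MAX + ADD_ADD_MAX:
        raise ValueError("length not representable")
    ext_len = min(num_bytes, LEN_MAX)
    add = min(num_bytes - ext_len, ADD_MAX) if ext_len == LEN_MAX else None
    add_add = num_bytes - LEN_MAX - ADD_MAX if add == ADD_MAX else None
    return ext_len, add, add_add


def join_ext_length(ext_len, add=None, add_add=None):
    """Recover the byte count from the transmitted fields."""
    return ext_len + (add or 0) + (add_add or 0)


print(split_ext_length(10), split_ext_length(15), split_ext_length(300))
```

Under these assumptions the first additional field appears once the length reaches 15 bytes and the second once it reaches 270 bytes, which matches the example thresholds given for fields 1003 and 1004.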

As mentioned above, the spatial extension frame can include residual signal data, arbitrary downmix residual signal data, tree data, and so on.

Then, the number of unused bits in the length of the spatial extension configuration information is calculated (1006).

The "bsFillBits" field 1007 indicates the number of bits of ignorable data used to fill these unused bits.

Figures 10B and 10C illustrate a syntax for the spatial extension configuration information of a residual signal when the residual signal is included in the spatial extension frame, according to an embodiment of the invention.

Referring to Figure 10B, the "bsResidualSamplingFrequencyIndex" field 1008 indicates the sampling frequency of the residual signal.

The "bsResidualFramesPerSpatialFrame" field 1009 indicates the number of residual frames per spatial frame. For example, a spatial frame can include 1, 2, 3 or 4 residual frames.

The "ResidualConfig" block 1010 indicates the number of parameter bands of the residual signal applied to each OTT and/or TTT frame.

Referring to Figure 10C, the "bsResidualPresent" field 1011 indicates whether a residual signal is applied to each OTT and/or TTT frame.

The "bsResidualBands" field 1012 indicates, when a residual signal is present in an OTT and/or TTT frame, the number of parameter bands of the residual signal in that frame. The number of parameter bands of the residual signal can be represented with a fixed number of bits or a variable number of bits. When it is represented with a fixed number of bits, the residual signal can have a value equal to or less than the total number of parameter bands of the audio signal. Therefore, the number of bits needed to represent all possible numbers of parameter bands (e.g., 5 bits in Figure 10C) can be allocated.

Figure 10D illustrates a syntax for representing the number of parameter bands of the residual signal with a variable number of bits, according to an embodiment of the invention. The "bsResidualBands" field 1014 can be represented with a variable number of bits using "numBands". If "numBands" is equal to or greater than 2^(n-1) and less than 2^(n), the "bsResidualBands" field 1014 can be represented by n bits.

For example: (i) if "numBands" is 40, the "bsResidualBands" field 1014 is represented by 6 bits; (ii) if "numBands" is 28 or 20, the "bsResidualBands" field 1014 is represented by 5 bits; (iii) if "numBands" is 14 or 10, the "bsResidualBands" field 1014 is represented by 4 bits; and (iv) if "numBands" is 7, 5 or 4, the "bsResidualBands" field 1014 is represented by 3 bits.

Alternatively, if "numBands" is greater than 2^(n-1) and equal to or less than 2^(n), the number of parameter bands of the residual signal can be represented by n bits.

For example: (i) if "numBands" is 40, the "bsResidualBands" field 1014 is represented by 6 bits; (ii) if "numBands" is 28 or 20, the "bsResidualBands" field 1014 is represented by 5 bits; (iii) if "numBands" is 14 or 10, the "bsResidualBands" field 1014 is represented by 4 bits; (iv) if "numBands" is 7 or 5, the "bsResidualBands" field 1014 is represented by 3 bits; and (v) if "numBands" is 4, the "bsResidualBands" field 1014 is represented by 2 bits.

In addition, the "bsResidualBands" field 1014 can be represented by a number of bits determined by the ceiling function, which takes "numBands" as its argument and rounds up to the nearest integer.

Specifically, (i) in the case of 0＜bsResidualBands≤numBands or 0≤bsResidualBands＜numBands, the "bsResidualBands" field 1014 is represented by ceil{log₂(numBands)} bits, or (ii) in the case of 0≤bsResidualBands≤numBands, the "bsResidualBands" field 1014 can be represented by ceil{log₂(numBands+1)} bits.

In some embodiments, the "bsResidualBands" field 1014 can be represented using a value (numberBands) equal to or less than numBands.

Specifically, (i) in the case of 0＜bsResidualBands≤numberBands or 0≤bsResidualBands＜numberBands, the "bsResidualBands" field 1014 is represented by ceil{log₂(numberBands)} bits, or (ii) in the case of 0≤bsResidualBands≤numberBands, the "bsResidualBands" field 1014 can be represented by ceil{log₂(numberBands+1)} bits.

If there are multiple residual signals (N), the combination of "bsResidualBands" values can be expressed as shown in Formula 9:

[formula 9]

$$\sum_{i=1}^{N} \mathrm{numBands}^{\,i-1} \cdot \mathrm{bsResidualBands}_{i}, \qquad 0 \le \mathrm{bsResidualBands}_{i} < \mathrm{numBands}$$

In this case, bsResidualBands_i denotes the i-th "bsResidualBands" value. Since Formula 9 has the same meaning as Formula 1, a detailed description of Formula 9 is omitted.

If there are multiple residual signals, the combination of "bsResidualBands" values can be expressed using "numberBands" as one of Formulas 10-12. Since representing "bsResidualBands" using "numberBands" is the same as the representation in Formulas 2-4, a detailed description is omitted.

[formula 10]

$$\sum_{i=1}^{N} (\mathrm{numberBands}+1)^{\,i-1} \cdot \mathrm{bsResidualBands}_{i}, \qquad 0 \le \mathrm{bsResidualBands}_{i} \le \mathrm{numberBands}$$

[formula 11]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsResidualBands}_{i}, \qquad 0 \le \mathrm{bsResidualBands}_{i} < \mathrm{numberBands}$$

[formula 12]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsResidualBands}_{i}, \qquad 0 < \mathrm{bsResidualBands}_{i} \le \mathrm{numberBands}$$

The number of parameter bands of the residual signal can be expressed as a value obtained by dividing "numBands". In this case, the divided value can be half of "numBands" or the result of dividing "numBands" by a particular value.

The residual signal can be included in the bitstream of the audio signal together with the down-mix audio signal and the spatial information signal, and this bitstream can be sent to the decoder. The decoder can extract the down-mix audio signal, the spatial information signal and the residual signal from the bitstream.

Then, the down-mix audio signal is upmixed using the spatial information, and the residual signal is applied to the down-mix audio signal during the upmixing. Specifically, the down-mix audio signal is upmixed in a plurality of channel conversion modules using the spatial information, and the residual signal is applied to the channel conversion modules in the process. As mentioned above, a channel conversion module has a number of parameter bands, and parameter sets are applied to the channel conversion module in units of time slots. When the residual signal is applied to a channel conversion module, the residual signal may be needed to update the inter-channel correlation information of the audio signal to which it is applied. The updated inter-channel correlation information is then used in the upmixing process.

Figure 11A is a block diagram of a decoder for non-guided coding according to an embodiment of the invention. Non-guided coding means that no spatial information is included in the bitstream of the audio signal.

In some embodiments, the decoder includes an analysis filterbank 1102, an analysis unit 1104, a spatial synthesis unit 1106 and a synthesis filterbank 1108. Although Figure 11A shows a down-mix audio signal of the stereo signal type, other types of down-mix audio signals can also be used.

In operation, the decoder receives a down-mix audio signal 1101, and the analysis filterbank 1102 transforms the received down-mix audio signal 1101 into a frequency-domain signal 1103. The analysis unit 1104 generates spatial information from the transformed down-mix audio signal 1103. The analysis unit 1104 performs its processing in units of slots and can generate spatial information 1105 for each of a plurality of slots. Here, the slots include time slots.

The spatial information can be generated in two steps. First, multi-channel audio parameters are generated from the down-mix audio signal. Second, the multi-channel audio parameters are converted into spatial information such as spatial parameters. In some embodiments, the multi-channel audio parameters can be generated by a matrix calculation on the down-mix audio signal.

The spatial synthesis unit 1106 generates a multi-channel audio signal 1107 by synthesizing the generated spatial information 1105 with the down-mix audio signal 1103. The generated multi-channel audio signal 1107 passes through the synthesis filterbank 1108 and is transformed into a time-domain audio signal 1109.

Spatial information can be generated at predetermined slot positions. The distance between these positions can be equal (i.e., equidistant). For example, spatial information can be generated every four slots. Spatial information can also be generated at variable slot positions. In this case, the information on the slot positions at which the spatial information is generated can be extracted from the bitstream. The position information can be represented with a variable number of bits. The position information can be expressed as an absolute value and as a difference from the previous slot position information.

When non-guided coding is used, the number of parameter bands for each channel of the audio signal (hereinafter "bsNumguidedBlindBands") can be represented with a fixed number of bits. "bsNumguidedBlindBands" can also be represented with a variable number of bits using "numBands". For example, if "numBands" is equal to or greater than 2^(n-1) and less than 2^(n), "bsNumguidedBlindBands" can be represented by a variable number n of bits.

Specifically: (a) if "numBands" is 40, "bsNumguidedBlindBands" is represented by 6 bits; (b) if "numBands" is 28 or 20, "bsNumguidedBlindBands" is represented by 5 bits; (c) if "numBands" is 14 or 10, "bsNumguidedBlindBands" is represented by 4 bits; and (d) if "numBands" is 7, 5 or 4, "bsNumguidedBlindBands" is represented by 3 bits.

Alternatively, if "numBands" is greater than 2^(n-1) and equal to or less than 2^(n), "bsNumguidedBlindBands" can be represented by a variable number n of bits.

For example: (a) if "numBands" is 40, "bsNumguidedBlindBands" is represented by 6 bits; (b) if "numBands" is 28 or 20, "bsNumguidedBlindBands" is represented by 5 bits; (c) if "numBands" is 14 or 10, "bsNumguidedBlindBands" is represented by 4 bits; (d) if "numBands" is 7 or 5, "bsNumguidedBlindBands" is represented by 3 bits; and (e) if "numBands" is 4, "bsNumguidedBlindBands" is represented by 2 bits.

In addition, "bsNumguidedBlindBands" can be represented with a variable number of bits using the ceil function applied to "numBands".

For example, (i) in the case of 0＜bsNumguidedBlindBands≤numBands or 0≤bsNumguidedBlindBands＜numBands, "bsNumguidedBlindBands" is represented by ceil{log₂(numBands)} bits, or (ii) in the case of 0≤bsNumguidedBlindBands≤numBands, "bsNumguidedBlindBands" can be represented by ceil{log₂(numBands+1)} bits.

If a value equal to or less than "numBands", namely "numberBands", is chosen arbitrarily, "bsNumguidedBlindBands" can be represented as follows.

Specifically, (i) in the case of 0＜bsNumguidedBlindBands≤numberBands or 0≤bsNumguidedBlindBands＜numberBands, "bsNumguidedBlindBands" is represented by ceil{log₂(numberBands)} bits, or (ii) in the case of 0≤bsNumguidedBlindBands≤numberBands, "bsNumguidedBlindBands" can be represented by ceil{log₂(numberBands+1)} bits.

If there are multiple channels (N), the combination of "bsNumguidedBlindBands" values can be expressed as in Formula 13.

[formula 13]

$$\sum_{i=1}^{N} \mathrm{numBands}^{\,i-1} \cdot \mathrm{bsNumGuidedBlindBands}_{i}, \qquad 0 \le \mathrm{bsNumGuidedBlindBands}_{i} < \mathrm{numBands}$$

In this case, "bsNumGuidedBlindBands_i" denotes the i-th "bsNumguidedBlindBands" value. Since Formula 13 has the same meaning as Formula 1, a detailed description of Formula 13 is omitted.

If there are multiple channels, the combination of "bsNumguidedBlindBands" values can be expressed using "numberBands" as one of Formulas 14-16. Since representing "bsNumguidedBlindBands" using "numberBands" is the same as the representation in Formulas 2-4, a detailed description of Formulas 14-16 is omitted.

[formula 14]

$$\sum_{i=1}^{N} (\mathrm{numberBands}+1)^{\,i-1} \cdot \mathrm{bsNumGuidedBlindBands}_{i}, \qquad 0 \le \mathrm{bsNumGuidedBlindBands}_{i} \le \mathrm{numberBands}$$

[formula 15]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsNumGuidedBlindBands}_{i}, \qquad 0 \le \mathrm{bsNumGuidedBlindBands}_{i} < \mathrm{numberBands}$$

[formula 16]

$$\sum_{i=1}^{N} \mathrm{numberBands}^{\,i-1} \cdot \mathrm{bsNumGuidedBlindBands}_{i}, \qquad 0 < \mathrm{bsNumGuidedBlindBands}_{i} \le \mathrm{numberBands}$$

Figure 11B illustrates a method of representing numbers of parameter bands as a group, according to an embodiment of the invention. The number of parameter bands includes the parameter band number information applied to a channel conversion module, the parameter band number information applied to a residual signal, and the parameter band number information for each channel of the audio signal when non-guided coding is used. When there are multiple pieces of parameter band number information, these pieces (e.g., "bsOttBands", "bsTttBands", "bsResidualBand" and/or "bsNumguidedBlindBands") can be represented as one or more groups.

Referring to Figure 11B, if there are (kN+L) pieces of parameter band number information and Q bits are needed to represent each piece, the pieces can be represented as groups as follows. Here, "k" and "N" are arbitrary non-zero integers and "L" is an arbitrary integer satisfying 0≤L＜N.

A grouping method includes the steps of generating k groups by binding N pieces of parameter band number information together in each group, and generating a last group by binding the last L pieces together. Each of the k groups can be represented by M bits and the last group can be represented by p bits. Here, M is preferably less than the N*Q bits used when each piece is represented without grouping, and p is preferably equal to or less than the L*Q bits used when each piece is represented without grouping.

For example, suppose that two pieces of parameter band number information are b1 and b2, respectively. If each of b1 and b2 can take 5 values, 3 bits are needed to represent each of them. Although 3 bits can represent 8 values, only 5 values are actually needed, so each of b1 and b2 carries three redundant values. However, if b1 and b2 are bound together and represented as one group, 5 bits can be used instead of 6 bits (= 3 bits + 3 bits). Specifically, since all combinations of b1 and b2 comprise 25 (= 5*5) cases, the group of b1 and b2 can be represented by 5 bits. Since 5 bits can represent 32 values, the grouped representation carries 7 redundant values. Even so, the redundancy of the grouped representation of b1 and b2 is smaller than the redundancy when each of b1 and b2 is represented by 3 bits. The method of representing multiple pieces of parameter band number information as groups can be implemented in the various ways described below.
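The bit counts in this grouping scheme can be worked out mechanically. The following Python sketch is a hedged illustration: the function names and the returned dictionary layout are hypothetical, and the symbols k, N, L, Q, M and p follow the description of the grouping method above.

```python
def bits_for(count):
    """Bits needed to distinguish `count` equally possible values."""
    return (count - 1).bit_length()


def grouping_cost(num_items, group_size, num_values):
    """Compare the bit cost with and without grouping for k*N + L items."""
    k, last = divmod(num_items, group_size)  # k full groups, L leftover items
    q = bits_for(num_values)                 # Q bits per ungrouped item
    m = bits_for(num_values ** group_size)   # M bits per full group
    p = bits_for(num_values ** last) if last else 0  # p bits for the last group
    return {"ungrouped": num_items * q, "grouped": k * m + p, "M": m, "p": p}


cost = grouping_cost(num_items=2, group_size=2, num_values=5)  # the b1, b2 example
print(cost)
```

For the b1 and b2 example this gives 6 bits without grouping and 5 bits with grouping. With seven 5-valued items in groups of three (k = 2, L = 1), the same calculation gives 17 bits instead of 21.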

If each piece of parameter band number information can take 40 values, the k groups are generated using 2, 3, 4, 5 or 6 as N. The k groups can then be represented with 11, 16, 22, 27 and 32 bits, respectively. Alternatively, the k groups can be represented by combining these cases.

If each piece of parameter band number information can take 28 values, the k groups are generated using 6 as N, and the k groups can be represented with 29 bits.

If each piece of parameter band number information can take 20 values, the k groups are generated using 2, 3, 4, 5, 6 or 7 as N. The k groups can then be represented with 9, 13, 18, 22, 26 and 31 bits, respectively. Alternatively, the k groups can be represented by combining these cases.

If each piece of parameter band number information can take 14 values, the k groups are generated using 6 as N. The k groups can be represented with 23 bits.

If each piece of parameter band number information can take 10 values, the k groups are generated using 2, 3, 4, 5, 6, 7, 8 or 9 as N. The k groups can then be represented with 7, 10, 14, 17, 20, 24, 27 and 30 bits, respectively. Alternatively, the k groups can be represented by combining these cases.

If each piece of parameter band number information can take 7 values, the k groups are generated using 6, 7, 8, 9, 10 or 11 as N. The k groups can then be represented with 17, 20, 23, 26, 29 and 31 bits, respectively. Alternatively, the k groups can be represented by combining these cases.

If each piece of parameter band number information can take, for example, 5 values, the k groups can be generated using 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 as N. The k groups can then be represented with 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28 and 31 bits, respectively. Alternatively, the k groups can be represented by combining these cases.
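The bit counts listed above follow from counting combinations: a group of N items, each with V possible values, has V^N combinations and therefore needs ceil(N*log2(V)) bits. The following Python sketch reproduces the lists; the function name `group_bits` and the `cases` table are illustrative and not taken from the source.

```python
def group_bits(num_values: int, group_size: int) -> int:
    """Bits needed to index every combination of `group_size` items,
    each taking one of `num_values` values (exact integer ceil(log2))."""
    return (num_values ** group_size - 1).bit_length()


# (values per item) -> group sizes N listed in the text.
cases = {
    40: range(2, 7),
    28: [6],
    20: range(2, 8),
    14: [6],
    10: range(2, 10),
    7: range(6, 12),
    5: range(2, 14),
}

for num_values, sizes in cases.items():
    print(num_values, [group_bits(num_values, n) for n in sizes])
```

Using integer arithmetic instead of floating-point logarithms avoids rounding problems when V^N lies close to a power of two.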

In addition, the pieces of parameter band number information can be configured to be represented as groups as described above, or to be represented consecutively by making each piece of parameter band number information an independent bit sequence.

Figure 12 illustrates the syntax of the configuration information representing a spatial frame according to an embodiment of the invention. The spatial frame includes a "FramingInfo" block 1201, a "bsIndependencyFlag" field 1202, an "OttData" block 1203, a "TttData" block 1204, a "SmgData" block 1205 and a "tempShapeData" block 1206.

The "FramingInfo" block 1201 includes information on the number of parameter sets and on the time slot to which each parameter set is applied. The "FramingInfo" block 1201 is described in detail with reference to Figure 13A.

The "bsIndependencyFlag" field 1202 indicates whether the current frame can be decoded without knowledge of the previous frame.

The "OttData" block 1203 includes all spatial parameter information for all OTT boxes.

The "TttData" block 1204 includes all spatial parameter information for all TTT boxes.

The "SmgData" block 1205 includes information on the temporal smoothing applied to the dequantized spatial parameters.

The "tempShapeData" block 1206 includes information on the temporal envelope shaping applied to the decorrelated signal.

Figure 13A illustrates syntax for representing position information of the time slots to which parameter sets are applied, according to an embodiment of the invention. The "bsFramingType" field 1301 indicates whether the spatial frame of the audio signal is of the fixed frame type or the variable frame type. A fixed frame is a frame in which parameter sets are applied to preset time slots; for example, parameter sets are applied to time slots preset at equal intervals. A variable frame is a frame that separately receives position information of the time slots to which the parameter sets are applied.

The "bsNumParamSets" field 1302 indicates the number of parameter sets in a spatial frame (hereinafter "numParamSets"), where "numParamSets" and "bsNumParamSets" are related by "numParamSets = bsNumParamSets + 1".

Since, for example, 3 bits are allocated to the "bsNumParamSets" field 1302 in Figure 13A, a maximum of 8 parameter sets can be provided in one spatial frame. Since the number of allocated bits is not limited, more parameter sets can be provided in one spatial frame.

If the spatial frame is of the fixed frame type, position information of the time slots to which parameter sets are applied can be determined according to a preset rule, and additional position information for those time slots is unnecessary. However, if the spatial frame is of the variable frame type, position information of the time slots to which parameter sets are applied is needed.

The "bsParamSlot" field 1303 indicates position information of the time slot to which a parameter set is applied. The "bsParamSlot" field 1303 can be represented with a variable number of bits using the number of time slots in the spatial frame, "numSlots". Specifically, when "numSlots" is equal to or greater than 2^(n-1) and less than 2^(n), the "bsParamSlot" field 1303 can be represented with n bits.

For example: (i) if "numSlots" is in the range 64 to 127, the "bsParamSlot" field 1303 can be represented with 7 bits; (ii) if "numSlots" is in the range 32 to 63, the field can be represented with 6 bits; (iii) if "numSlots" is in the range 16 to 31, the field can be represented with 5 bits; (iv) if "numSlots" is in the range 8 to 15, the field can be represented with 4 bits; (v) if "numSlots" is in the range 4 to 7, the field can be represented with 3 bits; (vi) if "numSlots" is in the range 2 to 3, the field can be represented with 2 bits; (vii) if "numSlots" is 1, the field can be represented with 1 bit; and (viii) if "numSlots" is 0, the field can be represented with 0 bits.
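The rule "n bits when 2^(n-1) ≤ numSlots < 2^n" is the bit length of the integer "numSlots". A minimal Python sketch follows; the function name `bs_param_slot_bits` is an illustrative choice and does not appear in the source.

```python
def bs_param_slot_bits(num_slots: int) -> int:
    """Number of bits n such that 2**(n-1) <= num_slots < 2**n (0 for num_slots == 0)."""
    if num_slots < 0:
        raise ValueError("num_slots must be non-negative")
    return num_slots.bit_length()


# Reproduce the ranges enumerated in the text.
for num_slots in (0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127):
    print(num_slots, bs_param_slot_bits(num_slots))
```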

If there are a plurality of parameter sets (N), the combination of the "bsParamSlot" values can be represented according to Formula 9.

[Formula 9]

$$\sum_{i=1}^{N} \mathrm{numSlots}^{\,i-1} \cdot \mathrm{bsParamSlot}_{i}, \qquad 0 \le \mathrm{bsParamSlot}_{i} < \mathrm{numSlots}$$

In this case, "bsParamSlot_i" indicates the time slot to which the i-th parameter set is applied. For example, suppose that "numSlots" is 3 and that the "bsParamSlot" field 1303 can take 10 values. In this case, three pieces of information (hereinafter c1, c2 and c3, respectively) are needed for the "bsParamSlot" field 1303. Since 4 bits are needed to represent each of c1, c2 and c3, 12 (= 4*3) bits are needed in total. If c1, c2 and c3 are bound together and represented as one group, 1000 (= 10*10*10) cases can occur, which can be represented with 10 bits, saving 2 bits. If "numSlots" is 3 and the value read as 5 bits is 31, this value can be expressed as 31 = 1*(3^2) + 5*(3^1) + 7*(3^0). A decoder can determine that c1, c2 and c3 are 1, 5 and 7, respectively, by applying the inverse of Formula 9.
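Formula 9 is a base-"numSlots" positional encoding, and its inverse is repeated division with remainder. The sketch below is one way to write both directions in Python. The function names are hypothetical, and the sample input (three slot positions in a frame assumed to have 10 time slots) is chosen here so that the constraint 0 ≤ bsParamSlot_i < numSlots holds; it is not taken from the source.

```python
from typing import List


def pack_param_slots(slots: List[int], num_slots: int) -> int:
    """Combine slot positions into one integer following Formula 9:
    sum over i of num_slots**(i-1) * bsParamSlot_i (first item is least significant)."""
    value = 0
    for index, slot in enumerate(slots):  # index 0 corresponds to i = 1
        if not 0 <= slot < num_slots:
            raise ValueError("each slot must satisfy 0 <= slot < num_slots")
        value += (num_slots ** index) * slot
    return value


def unpack_param_slots(value: int, num_slots: int, count: int) -> List[int]:
    """Inverse of Formula 9: peel off one base-num_slots digit per parameter set."""
    slots = []
    for _ in range(count):
        value, digit = divmod(value, num_slots)
        slots.append(digit)
    return slots


packed = pack_param_slots([1, 5, 7], num_slots=10)
print(packed)                             # 1 + 5*10 + 7*100 = 751
print(unpack_param_slots(packed, 10, 3))  # [1, 5, 7]
```

With 10 time slots and three parameter sets there are 1000 combinations, so the packed value fits in 10 bits, compared with 12 bits for three separate 4-bit fields.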

Figure 13B illustrates syntax for representing the position information of the time slots to which parameter sets are applied as an absolute value and difference values, according to an embodiment of the invention. If the spatial frame is of the variable frame type, the "bsParamSlot" field 1303 of Figure 13A can be represented as an absolute value and difference values by exploiting the fact that the "bsParamSlot" information increases monotonically.

For example: (i) the position of the time slot to which the first parameter set is applied can be generated as an absolute value, i.e., "bsParamSlot[0]"; and (ii) the position of the time slot to which the second or a subsequent parameter set is applied can be generated as a difference value, i.e., the "difference" or "difference - 1" between "bsParamSlot[ps]" and "bsParamSlot[ps-1]" (hereinafter "bsDiffParamSlot[ps]"). Here, "ps" denotes a parameter set.

The "bsParamSlot[0]" field 1304 can be represented with a number of bits (hereinafter "nBitsParamSlot(0)") calculated using "numSlots" and "numParamSets".

The "bsDiffParamSlot[ps]" field 1305 can be represented with a number of bits (hereinafter "nBitsParamSlot[ps]") calculated using "numSlots", "numParamSets" and the position of the time slot to which the previous parameter set is applied, i.e., "bsParamSlot[ps-1]".

Specifically, to represent "bsParamSlot[ps]" with a minimum number of bits, the number of bits representing "bsParamSlot[ps]" can be determined based on the following rules: (i) the "bsParamSlot[ps]" values increase in ascending order (bsParamSlot[ps] > bsParamSlot[ps-1]); (ii) the maximum value of "bsParamSlot[0]" is "numSlots - numParamSets"; and (iii) for 0 < ps < numParamSets, "bsParamSlot[ps]" can only take values between "bsParamSlot[ps-1] + 1" and "numSlots - numParamSets + ps".

For example, if "numSlots" is 10 and "numParamSets" is 3, then, because "bsParamSlot[ps]" increases in ascending order, the maximum value of "bsParamSlot[0]" is 10 - 3 = 7. That is, "bsParamSlot[0]" should be selected from the values 0 to 7. This is because, if "bsParamSlot[0]" had a value greater than 7, there would not be enough time slots left for the remaining parameter sets.

If "bsParamSlot[0]" is 5, the time slot position "bsParamSlot[1]" of the second parameter set should be selected from the values between 5 + 1 = 6 and 10 - 3 + 1 = 8.

If "bsParamSlot[1]" is 7, "bsParamSlot[2]" can be 8 or 9. If "bsParamSlot[1]" is 8, "bsParamSlot[2]" can only be 9.
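The three rules can be checked mechanically. The following Python sketch returns the admissible positions for each parameter set and reproduces the example with 10 time slots and 3 parameter sets; the function name `allowed_slots` is illustrative and not part of the source syntax.

```python
from typing import Optional


def allowed_slots(ps: int, num_slots: int, num_param_sets: int,
                  prev_slot: Optional[int] = None) -> range:
    """Admissible values of bsParamSlot[ps] under rules (i) to (iii)."""
    upper = num_slots - num_param_sets + ps  # inclusive upper bound
    if ps == 0:
        return range(0, upper + 1)
    if prev_slot is None:
        raise ValueError("prev_slot is required when ps > 0")
    return range(prev_slot + 1, upper + 1)   # strictly after the previous slot


print(list(allowed_slots(0, 10, 3)))               # 0 .. 7
print(list(allowed_slots(1, 10, 3, prev_slot=5)))  # 6 .. 8
print(list(allowed_slots(2, 10, 3, prev_slot=7)))  # 8, 9
print(list(allowed_slots(2, 10, 3, prev_slot=8)))  # 9
```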

Therefore, using the above properties, "bsParamSlot[ps]" can be represented with a variable number of bits rather than a fixed number of bits.

When "bsParamSlot[ps]" is configured in the bitstream, if "ps" is 0, "bsParamSlot[0]" can be represented as an absolute value with the number of bits corresponding to "nBitsParamSlot(0)". If "ps" is greater than 0, "bsParamSlot[ps]" can be represented as a difference value with the number of bits corresponding to "nBitsParamSlot(ps)". When "bsParamSlot[ps]" configured in this way is read from the bitstream, Formula 10 can be used to find the bit length of each data item, i.e., "nBitsParamSlot[ps]".

[formula 10]

Specifically, "nBitsParamSlot[ps]" can be found as follows. For the first parameter set, nBitsParamSlot[0] = f_b(numSlots - numParamSets + 1). If 0 < ps < numParamSets, nBitsParamSlot[ps] = f_b(numSlots - numParamSets + ps - bsParamSlot[ps-1]). "nBitsParamSlot[ps]" can also be determined using Formula 11, which extends Formula 10 up to 7 bits.

[formula 11]

An example of the function f_b(x) is explained as follows. If "numSlots" is 15 and "numParamSets" is 3, the function evaluates to nBitsParamSlot[0] = f_b(15 - 3 + 1) = 4 bits.

If "bsParamSlot[0]", represented with 4 bits, is 7, the function evaluates to nBitsParamSlot[1] = f_b(15 - 3 + 1 - 7) = 3 bits. In this case, the "bsDiffParamSlot[1]" field 1305 is represented with 3 bits.

If the value represented with these 3 bits is 3, "bsParamSlot[1]" becomes 7 + 3 = 10. Accordingly, nBitsParamSlot[2] = f_b(15 - 3 + 2 - 10) = 2 bits. In this case, the "bsDiffParamSlot[2]" field 1305 can be represented with 2 bits. If the number of remaining time slots equals the number of remaining parameter sets, 0 bits can be allocated to the "bsDiffParamSlot[ps]" field. In other words, no additional information is needed to represent the positions of the time slots to which these parameter sets are applied.

Therefore, the number of bits for "bsParamSlot[ps]" can be determined variably. In the decoder, the function f_b(x) can be used to determine the number of bits of "bsParamSlot[ps]" to read from the bitstream. In some embodiments, the function f_b(x) can include the function ceil(log₂(x)).
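Taking f_b(x) = ceil(log₂(x)), as the text permits for some embodiments, the bit lengths in the worked example can be reproduced with a few lines of Python. The function names `f_b` and `n_bits_param_slot` mirror the symbols in the text, but the code itself is only a sketch.

```python
from typing import Optional


def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1, computed with integer arithmetic."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return (x - 1).bit_length()


def n_bits_param_slot(ps: int, num_slots: int, num_param_sets: int,
                      prev_slot: Optional[int] = None) -> int:
    """Bit length of bsParamSlot[0] (ps == 0) or bsDiffParamSlot[ps] (ps > 0)."""
    if ps == 0:
        return f_b(num_slots - num_param_sets + 1)
    if prev_slot is None:
        raise ValueError("prev_slot is required when ps > 0")
    return f_b(num_slots - num_param_sets + ps - prev_slot)


# Worked example from the text: numSlots = 15, numParamSets = 3.
print(n_bits_param_slot(0, 15, 3))                # f_b(13) = 4
print(n_bits_param_slot(1, 15, 3, prev_slot=7))   # f_b(6)  = 3
print(n_bits_param_slot(2, 15, 3, prev_slot=10))  # f_b(4)  = 2
print(n_bits_param_slot(2, 15, 3, prev_slot=13))  # f_b(1)  = 0, one slot left for one set
```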

When the decoder reads from the bitstream the "bsParamSlot[ps]" information represented as an absolute value and difference values, it can first read "bsParamSlot[0]" and then read "bsDiffParamSlot[ps]" for 0 < ps < numParamSets. It can then find "bsParamSlot[ps]" over the interval 0 ≤ ps < numParamSets using "bsParamSlot[0]" and "bsDiffParamSlot[ps]". For example, as shown in Figure 13B, "bsParamSlot[ps]" can be found by adding "bsDiffParamSlot[ps] + 1" to "bsParamSlot[ps-1]".
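The read order described above can be sketched as a small decoder loop. This Python example makes several assumptions that are not in the source: the bitstream is modeled as a string of "0" and "1" characters, `BitReader` is a hypothetical helper, f_b(x) is taken to be ceil(log₂(x)), and each stored difference is "difference - 1", so that "bsDiffParamSlot[ps] + 1" is added as in Figure 13B.

```python
from typing import List


def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1."""
    return (x - 1).bit_length()


class BitReader:
    """Hypothetical MSB-first reader over a string of '0'/'1' characters."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        if n == 0:
            return 0  # a zero-width field carries no information
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) < n:
            raise EOFError("bitstream exhausted")
        self.pos += n
        return int(chunk, 2)


def read_param_slots(reader: BitReader, num_slots: int, num_param_sets: int) -> List[int]:
    """Read bsParamSlot[0] as an absolute value, then each bsDiffParamSlot[ps]."""
    slots = [reader.read(f_b(num_slots - num_param_sets + 1))]
    for ps in range(1, num_param_sets):
        n_bits = f_b(num_slots - num_param_sets + ps - slots[ps - 1])
        diff = reader.read(n_bits)
        slots.append(slots[ps - 1] + diff + 1)
    return slots


# 4 bits "0111" (slot 7), 3 bits "010" (diff 2), 2 bits "01" (diff 1).
reader = BitReader("011101001")
print(read_param_slots(reader, num_slots=15, num_param_sets=3))  # [7, 10, 12]
```

Because each field width depends on the previously decoded position, the fields have to be read strictly in order.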

Figure 13C illustrates syntax for representing the position information of the time slots to which parameter sets are applied as a group, according to an embodiment of the invention. When there are a plurality of parameter sets, the "bsParamSlots" 1307 of the parameter sets can be represented as one or more groups.

If the number of "bsParamSlots" 1307 is (kN + L) and Q bits are needed to represent each "bsParamSlots" 1307, the "bsParamSlots" 1307 can be represented as groups as follows. Here, "k" and "N" are arbitrary nonzero integers and "L" is an arbitrary integer satisfying 0 ≤ L < N.

One grouping method can include the following steps: generating k groups by binding together N "bsParamSlots" 1307 each, and generating a last group by binding together the last L "bsParamSlots" 1307. Each of the k groups can be represented with M bits, and the last group can be represented with p bits. In this case, M is preferably smaller than N*Q, the number of bits used when each "bsParamSlots" 1307 is represented without grouping. Likewise, p is preferably equal to or smaller than L*Q, the number of bits used when each "bsParamSlots" 1307 is represented without grouping.
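The split into k groups of N items plus a last group of L items, together with the resulting values of M and p, can be illustrated with a short Python sketch. The function name `grouping_cost`, the dictionary keys, and the sample numbers (7 items with 5 values each, grouped in threes) are illustrative assumptions rather than values from the source.

```python
from typing import Dict


def bits_for(num_values: int, items: int) -> int:
    """Bits needed to index all combinations of `items` values (0 bits for no items)."""
    return (num_values ** items - 1).bit_length()


def grouping_cost(count: int, num_values: int, group_size: int) -> Dict[str, int]:
    """Split `count` = k*N + L items into k groups of N and a last group of L."""
    k, last = divmod(count, group_size)
    q = bits_for(num_values, 1)           # Q: bits per item without grouping
    m = bits_for(num_values, group_size)  # M: bits per full group
    p = bits_for(num_values, last)        # p: bits for the last group
    return {
        "k": k, "L": last, "Q": q, "M": m, "p": p,
        "grouped_total": k * m + p,
        "ungrouped_total": count * q,
    }


print(grouping_cost(count=7, num_values=5, group_size=3))
```

For this input, M = 7 is smaller than N*Q = 9 and p = 3 equals L*Q = 3, which matches the preference stated above.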

For example, suppose that a pair of "bsParamSlots" 1307 corresponding to two parameter sets are d1 and d2, respectively. If each of d1 and d2 can take five values, 3 bits are needed to represent each of them. Although 3 bits can represent 8 values, only 5 values are actually needed, so each of d1 and d2 has three redundant values. However, if d1 and d2 are bound together and represented as one group, 5 bits are used instead of 6 bits (= 3 bits + 3 bits). Specifically, since d1 and d2 have 25 (= 5*5) combinations in total, the group of d1 and d2 can be represented with only 5 bits. Since 5 bits can represent 32 values, the grouped representation has 7 redundant values. This is still less redundancy than when d1 and d2 are each represented with 3 bits.

When configuring this group, the data of the group can be configured using the first value "bsParamSlot[0]" and the difference values between the second or subsequent "bsParamSlot[ps]" values.

When configuring this group, if the number of parameter sets is 1, bits are allocated directly without grouping; if the number of parameter sets is 2 or more, bits can be allocated after grouping is completed.

Figure 14 is a flowchart of an encoding method according to an embodiment of the invention. A method of encoding an audio signal and the operation of an encoder according to the present invention are explained as follows.

First, the total number of time slots (numSlots) in a spatial frame of the audio signal and the total number of parameter bands (numBands) are determined (S1401).

Next, the number of parameter bands to be applied to a channel conversion module (OTT box and/or TTT box) and/or a residual signal is determined (S1402).

If the OTT box has an LFE channel mode, the number of parameter bands applied to the OTT box is determined separately.

If the OTT box does not have an LFE channel mode, "numBands" is used as the number of parameter bands applied to the OTT box.

Next, the type of the spatial frame is determined. Spatial frames can be classified into a fixed frame type and a variable frame type.

If the spatial frame is of the variable frame type (S1403), the number of parameter sets used in one spatial frame is determined (S1406). In this case, parameter sets can be applied to the channel conversion module in units of time slots.

Next, the positions of the time slots to which the parameter sets are applied are determined (S1407). These positions can be represented as an absolute value and difference values. For example, the position of the time slot to which the first parameter set is applied can be represented as an absolute value, and the position of the time slot to which the second or a subsequent parameter set is applied can be represented as a difference from the previous time slot position. In this case, the positions of the time slots to which the parameter sets are applied can be represented with a variable number of bits.

Specifically, the position of the time slot to which the first parameter set is applied can be represented with a number of bits calculated using the total number of time slots and the total number of parameter sets. The position of the time slot to which the second or a subsequent parameter set is applied can be represented with a number of bits calculated using the total number of time slots, the total number of parameter sets, and the position of the time slot to which the previous parameter set is applied.
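The encoder-side step can be sketched as follows. This Python example assumes that f_b(x) = ceil(log₂(x)), that each difference is stored as "difference - 1", and that the output is a string of "0" and "1" characters; the function name `encode_param_slots` is hypothetical and not part of the source.

```python
from typing import List


def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1."""
    return (x - 1).bit_length()


def encode_param_slots(slots: List[int], num_slots: int) -> str:
    """Write the first slot as an absolute value and later slots as (difference - 1),
    each with the variable bit width derived from the remaining choices."""
    num_param_sets = len(slots)
    out = []
    prev = 0
    for ps, slot in enumerate(slots):
        if ps == 0:
            choices = num_slots - num_param_sets + 1
            value = slot
        else:
            choices = num_slots - num_param_sets + ps - prev
            value = slot - prev - 1
        if not 0 <= value < choices:
            raise ValueError(f"slot {slot} is not admissible for parameter set {ps}")
        width = f_b(choices)
        out.append(format(value, f"0{width}b") if width else "")
        prev = slot
    return "".join(out)


print(encode_param_slots([7, 10, 12], num_slots=15))   # 4 + 3 + 2 = 9 bits
print(encode_param_slots([12, 13, 14], num_slots=15))  # only the first field is written
```

In the second call the remaining slots are forced once the first position is known, so the two difference fields take 0 bits.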

If the spatial frame is of the fixed frame type, the number of parameter sets used in one spatial frame is determined (S1404). In this case, the positions of the time slots to which the parameter sets are applied are determined using a preset rule. For example, the position of the time slot to which a parameter set is applied can be determined so that it is equally spaced from the position of the time slot to which the previous parameter set is applied (S1405).
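The text does not spell out the preset rule, so the following Python sketch shows only one possible equal-interval rule. The formula used here, ceil(numSlots*(ps+1)/numParamSets) - 1, and the function name `fixed_frame_slots` are assumptions made for illustration and are not stated in the source.

```python
from typing import List


def fixed_frame_slots(num_slots: int, num_param_sets: int) -> List[int]:
    """Hypothetical preset rule: spread parameter sets evenly, ending on the last slot."""
    if not 0 < num_param_sets <= num_slots:
        raise ValueError("need 0 < num_param_sets <= num_slots")
    slots = []
    for ps in range(num_param_sets):
        # Integer ceiling of num_slots * (ps + 1) / num_param_sets, minus 1.
        slots.append(-(-num_slots * (ps + 1) // num_param_sets) - 1)
    return slots


print(fixed_frame_slots(10, 2))  # [4, 9]
print(fixed_frame_slots(15, 3))  # [4, 9, 14]
```

Since encoder and decoder both derive these positions from "numSlots" and "numParamSets", no slot position bits need to be transmitted for a fixed frame.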

Next, a downmix unit and a spatial information generating unit generate a downmix audio signal and spatial information, respectively, using the values determined above: the total number of time slots, the total number of parameter bands, the number of parameter bands to be applied to the channel conversion unit, the total number of parameter sets in the spatial frame, and the position information of the time slots to which the parameter sets are applied (S1408).

Finally, a multiplexing unit generates a bitstream including the downmix audio signal and the spatial information, and then transfers the generated bitstream to a decoder (S1409).

Figure 15 is a flowchart of a decoding method according to an embodiment of the invention. A method of decoding an audio signal and the operation of a decoder according to the present invention are explained as follows.

First, the decoder receives a bitstream of an audio signal (S1501). A demultiplexing unit separates the downmix audio signal and the spatial information signal from the received bitstream (S1502). Then, a spatial information signal decoding unit extracts, from the configuration information of the spatial information signal, information on the total number of time slots in one spatial frame, the total number of parameter bands, and the number of parameter bands applied to the channel conversion module (S1503).

If the spatial frame is of the variable frame type (S1504), the number of parameter sets in the spatial frame and the position information of the time slots to which the parameter sets are applied are extracted from the spatial frame (S1505). The position information of a time slot can be represented with a fixed or variable number of bits. In this case, the position information of the time slot to which the first parameter set is applied can be represented as an absolute value, and the position information of the time slot to which the second or a subsequent parameter set is applied can be represented as a difference value. The actual position information of the time slot to which the second or a subsequent parameter set is applied can be found by adding the difference value to the position information of the time slot to which the previous parameter set is applied.

Finally, the downmix audio signal is converted into a multi-channel audio signal using the extracted information (S1506).

The disclosed embodiments described above provide several advantages over conventional audio coding schemes.

First, when encoding a multi-channel audio signal, the disclosed embodiments can reduce the amount of transmitted data by representing the positions of the time slots to which parameter sets are applied with a variable number of bits.

Second, the disclosed embodiments can reduce the amount of transmitted data by representing the position of the time slot to which the first parameter set is applied as an absolute value, and by representing the position of the time slot to which the second or a subsequent parameter set is applied as a difference value.

Third, the disclosed embodiments can reduce the amount of transmitted data by representing the number of parameter bands applied to a channel conversion module, such as an OTT box or a TTT box, with a fixed or variable number of bits. In this case, the positions of the time slots to which parameter sets are applied can be represented using the foregoing principle, where the parameter sets can exist within the range of the number of parameter bands.

Figure 16 is a block diagram of an exemplary device architecture 1600 for implementing the audio encoder/decoder described with reference to Figures 1-15. The device architecture 1600 is applicable to a variety of devices, including but not limited to: personal computers, server computers, consumer electronics, mobile phones, personal digital assistants (PDAs), electronic tablets, television systems, television set-top boxes, game consoles, media players, music players, navigation systems, and any other device capable of decoding audio signals. Some of these devices may implement a modified architecture using a combination of hardware and software.

The architecture 1600 includes one or more processors 1602 (e.g., PowerPC, Intel Pentium 4, etc.), one or more display devices 1604 (e.g., CRT, LCD), an audio subsystem 1606 (e.g., audio hardware/software), one or more network interfaces 1608 (e.g., Ethernet, FireWire, USB, etc.), input devices 1610 (e.g., keyboard, mouse, etc.), and one or more computer-readable media 1612 (e.g., RAM, ROM, SDRAM, hard disk, optical disc, flash memory, etc.). These components can exchange communications and data via one or more buses 1614 (e.g., EISA, PCI, PCI Express, etc.).

The term "computer-readable medium" refers to any medium that participates in providing instructions to the processor 1602 for execution, including but not limited to non-volatile media (e.g., optical or magnetic disks), volatile media (e.g., memory) and transmission media. Transmission media include, but are not limited to, coaxial cables, copper wire and optical fiber. Transmission media can also take the form of acoustic, light or radio frequency waves.

The computer-readable medium 1612 further includes an operating system 1616 (e.g., Mac OS, Windows, Linux, etc.), a network communication module 1618, an audio codec 1620 and one or more applications 1622.

The operating system 1616 can be multi-user, multiprocessing, multitasking, multithreading, real-time and the like. The operating system 1616 performs basic tasks, including but not limited to: recognizing input from the input devices 1610; sending output to the display devices 1604 and the audio subsystem 1606; keeping track of files and directories on the computer-readable media 1612 (e.g., memory or a storage device); controlling peripheral devices (e.g., disk drives, printers, etc.); and managing traffic on the one or more buses 1614.

The network communication module 1618 includes various components for establishing and maintaining network connections (e.g., software for implementing communication protocols such as TCP/IP, HTTP, Ethernet, etc.). The network communication module 1618 can include a browser that enables an operator of the device architecture 1600 to search a network (e.g., the Internet) for information (e.g., audio content).

The audio codec 1620 is responsible for implementing all or part of the encoding and/or decoding processes described with reference to Figures 1-15. In some embodiments, the audio codec works together with hardware (e.g., the processor 1602 and the audio subsystem 1606) to process audio signals, including encoding and/or decoding audio signals in accordance with the invention described herein.

The applications 1622 can include any software application related to audio content and/or in which audio content is encoded and/or decoded, including but not limited to media players, music players (e.g., MP3 players), mobile phone applications, PDAs, television systems, set-top boxes, etc. In one embodiment, the audio codec can be used by an application service provider to provide encoding/decoding services over a network (e.g., the Internet).

In the above description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to those skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention.

In particular, those skilled in the art will recognize that other architectures and graphics environments can be used, and that the invention can be implemented using graphics tools and products other than those described above. Specifically, the client/server approach is merely one example of an architecture for providing the dashboard functionality of the invention; those skilled in the art will recognize that other, non-client/server approaches can also be used.

Some portions of this detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.

Industrial applicability

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise or apparent from the discussion, it is appreciated that throughout this description, discussions using terms such as "processing", "computing", "calculating", "determining" or "displaying" refer to the actions and processes of a computer system, or a similar electronic computing device, that manipulates data represented as physical (electronic) quantities within the computer system's registers and memories and transforms it into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The invention also relates to an apparatus for performing the operations described herein. This apparatus can be specially constructed for the required purposes, or it can comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program can be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The algorithms and modules presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems can be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the method steps. The required structure for a variety of these systems will appear from the description. In addition, the invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages can be used to implement the teachings of the invention as described herein. Furthermore, as will be apparent to those skilled in the art, the modules, features, attributes, methodologies and other aspects of the invention can be implemented as software, hardware, firmware or any combination of the three. Wherever a component of the invention is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every other way known now or in the future to those skilled in the art of computer programming. Additionally, the invention is in no way limited to implementation in any specific operating system or environment.

It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the spirit or scope of the invention. Thus, the invention is intended to cover all such modifications and variations of the disclosed embodiments, provided they fall within the scope of the appended claims and their equivalents.

Claims

1. A method of decoding an audio signal, comprising:

receiving an audio signal, the audio signal including a downmix audio signal and spatial information, the spatial information including at least one frame, the at least one frame having at least one time slot and at least one parameter set, and the parameter set including at least one parameter;

extracting slot information in a variable bit length, the slot information indicating the position of a time slot to which a parameter set is applied;

extracting parameter band information in a fixed bit length, the parameter band information indicating the number of parameter bands; and

decoding the audio signal by applying the parameters of the parameter set to the parameter bands of the time slot, based on the parameter band information and the slot information;

wherein the step of extracting the slot information comprises:

extracting the number of time slots and the number of parameter sets from the audio signal to identify the slot information;

determining the bit length of the slot information, wherein the bit length for the first parameter set is variable according to the number of time slots and the number of parameter sets, and the bit length for a subsequent parameter set is variable according to the number of time slots, the number of parameter sets, and the position of the time slot to which the previous parameter set is applied; and

extracting the slot information based on the bit length,

wherein the number of pieces of slot information equals the number of parameter sets.

2. The method of claim 1, wherein the slot information includes an absolute value or a difference value, the absolute value indicating the position of the time slot to which the first parameter set is applied and the difference value indicating the position of the time slot to which a parameter set subsequent to the first parameter set is applied, and

wherein the time slot to which the subsequent parameter set is applied is determined by adding the difference value to the position of the time slot to which the previous parameter set is applied.

3. An apparatus for decoding an audio signal, comprising:

a receiving unit for receiving an audio signal, the audio signal including a downmix audio signal and spatial information, the spatial information including at least one frame, the at least one frame having at least one time slot and at least one parameter set, and the parameter set including at least one parameter;

a spatial information decoding unit that: extracts parameter band information in a fixed bit length, the parameter band information indicating the number of parameter bands; extracts the number of time slots and the number of parameter sets from the audio signal to identify slot information indicating the position of a time slot to which a parameter set is applied; determines the bit length of the slot information, wherein the bit length for the first parameter set is variable according to the number of time slots and the number of parameter sets, and the bit length for a subsequent parameter set is variable according to the number of time slots, the number of parameter sets, and the position of the time slot to which the previous parameter set is applied; and extracts the slot information based on the bit length; and

an upmixing unit that upmixes the downmix audio signal into a multi-channel audio signal using the spatial information.

4. The apparatus of claim 3, wherein the slot information includes an absolute value or a difference value, the absolute value indicating the position of the time slot to which the first parameter set is applied and the difference value indicating the position of the time slot to which a parameter set subsequent to the first parameter set is applied.