CN101542597A - Methods and apparatuses for encoding and decoding object-based audio signals - Google Patents

Methods and apparatuses for encoding and decoding object-based audio signals

Info

Publication number
CN101542597A
CN101542597A CNA2008800003869A CN200880000386A
Authority
CN
China
Prior art keywords
signal
information
side information
audio
audio signal
Prior art date
Legal status
Granted
Application number
CNA2008800003869A
Other languages
Chinese (zh)
Other versions
CN101542597B (en)
Inventor
金东秀
房熙锡
林宰显
尹圣龙
李显国
Current Assignee
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date
Filing date
Publication date
Application filed by LG Electronics Inc
Priority claimed from PCT/KR2008/000883 (WO2008100098A1)
Publication of CN101542597A
Application granted
Publication of CN101542597B
Legal status: Active
Anticipated expiration

Abstract

An audio decoding method and apparatus and an audio encoding method and apparatus which can efficiently process object-based audio signals are provided. The audio decoding method includes receiving a downmix signal, which is obtained by downmixing a plurality of object signals, and object side information, extracting metadata from the object side information, and displaying information regarding the object signals based on the metadata.

Description

Method and apparatus for encoding and decoding object-based audio signals
Technical Field
The present invention relates to an audio encoding method and apparatus and an audio decoding method and apparatus in which object-based audio signals can be efficiently processed by performing encoding and decoding operations.
Background Art
In general, in multi-channel audio encoding and decoding techniques, a number of channel signals of a multi-channel signal are downmixed into fewer channel signals, side information regarding the original channel signals is transmitted, and a multi-channel signal having as many channels as the original multi-channel signal is restored.
Object-based audio encoding and decoding techniques are basically similar to multi-channel audio encoding and decoding techniques in that they downmix several sound sources into fewer sound-source signals and transmit side information regarding the original sound sources. However, in object-based audio encoding and decoding techniques, object signals, which are the basic elements of a channel signal (for example, the sound of a musical instrument or a human voice), are treated in the same manner as channel signals in multi-channel audio encoding and decoding techniques and can thus be encoded and decoded.
In other words, in object-based audio encoding and decoding techniques, each object signal is regarded as an entity to be encoded and decoded. In this regard, object-based audio encoding and decoding techniques differ from multi-channel audio encoding and decoding techniques, in which a multi-channel audio encoding/decoding operation is performed simply based on inter-channel information, regardless of the number of elements in the channel signals to be encoded or decoded.
Summary of the Invention
Technical Problem
The present invention provides an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that the audio signals can be applied to various environments.
Technical Solution
According to an aspect of the present invention, there is provided an audio decoding method including: receiving a downmix signal, which is obtained by downmixing a plurality of object signals, and object-based side information; extracting metadata from the object-based side information; and displaying object-related information regarding the object signals based on the metadata. According to another aspect of the present invention, there is provided an audio encoding method including: generating a downmix signal by downmixing a plurality of object signals; generating object-based side information by extracting object-related information from the object signals; and inserting metadata indicating the object-related information into the object-based side information.
According to another aspect of the present invention, there is provided an audio decoding apparatus including: a demultiplexer configured to extract a downmix signal, which is obtained by downmixing a plurality of object signals, and object-based side information from an input audio signal; a transcoder configured to extract metadata from the object-based side information; and a renderer configured to display object-related information regarding the object signals based on the metadata.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing an audio decoding method, the audio decoding method including: receiving a downmix signal, which is obtained by downmixing a plurality of object signals, and object-based side information; extracting metadata from the object-based side information; and displaying object-related information regarding the object signals based on the metadata.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing an audio encoding method, the audio encoding method including: generating a downmix signal by downmixing a plurality of object signals; and generating object-based side information by extracting object-related information from the object signals, with metadata indicating the object-related information.
Brief Description of the Drawings
FIG. 1 is a block diagram of a typical object-based audio encoding/decoding system;
FIG. 2 is a block diagram of an audio decoding apparatus according to a first embodiment of the present invention;
FIG. 3 is a block diagram of an audio decoding apparatus according to a second embodiment of the present invention;
FIG. 4 is a block diagram of an audio decoding apparatus according to a third embodiment of the present invention;
FIG. 5 is a block diagram of an arbitrary downmix gain (ADG) module that can be used in the audio decoding apparatus illustrated in FIG. 4;
FIG. 6 is a block diagram of an audio decoding apparatus according to a fourth embodiment of the present invention;
FIG. 7 is a block diagram of an audio decoding apparatus according to a fifth embodiment of the present invention;
FIG. 8 is a block diagram of an audio decoding apparatus according to a sixth embodiment of the present invention;
FIG. 9 is a block diagram of an audio decoding apparatus according to a seventh embodiment of the present invention;
FIG. 10 is a block diagram of an audio decoding apparatus according to an eighth embodiment of the present invention;
FIGS. 11 and 12 are block diagrams for explaining the operation of a transcoder;
FIGS. 13 to 16 are block diagrams for explaining the structure of object-based side information;
FIGS. 17 to 22 are block diagrams for explaining the merging of a plurality of pieces of object-based side information into a single piece of side information;
FIGS. 23 to 27 are block diagrams for explaining preprocessing operations; and
FIGS. 28 to 33 are block diagrams for explaining cases in which a plurality of bitstreams to be decoded in an object-based manner are merged into a single bitstream.
Best Mode for Carrying Out the Invention
The present invention will hereinafter be described in detail with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
An audio encoding method and apparatus and an audio decoding method and apparatus according to the present invention may be applied to object-based audio processing operations, but the present invention is not limited thereto. In other words, the audio encoding method and apparatus and the audio decoding method and apparatus may be applied to various signal processing operations other than object-based audio processing operations.
FIG. 1 is a block diagram of a typical object-based audio encoding/decoding system. In general, audio signals input to an object-based audio encoding apparatus do not correspond to the channels of a multi-channel signal but are independent object signals. In this regard, an object-based audio encoding apparatus differs from a multi-channel audio encoding apparatus, to which the channel signals of a multi-channel signal are input.
For example, channel signals such as the front-left channel signal and the front-right channel signal of a 5.1-channel signal are input to a multi-channel audio encoding apparatus, whereas object signals, which are smaller entities than channel signals, such as a human voice or the sound of a musical instrument (for example, the sound of a violin or a piano), may be input to an object-based audio encoding apparatus.
Referring to FIG. 1, the object-based audio encoding/decoding system includes an object-based audio encoding apparatus and an object-based audio decoding apparatus. The object-based audio encoding apparatus includes an object encoder 100, and the object-based audio decoding apparatus includes an object decoder 111 and a mixer/renderer 113.
The object encoder 100 receives N object signals and generates an object-based downmix signal having one or more channels and side information including a number of pieces of information extracted from the N object signals, such as energy difference information, phase difference information, and correlation information. The side information and the object-based downmix signal are merged into a single bitstream, and the bitstream is transmitted to the object-based decoding apparatus.
The side information may include a flag indicating whether to perform channel-based audio coding/decoding or object-based audio coding/decoding, and it may thus be determined, according to the flag of the side information, whether to perform channel-based audio coding/decoding or object-based audio coding/decoding. The side information may also include energy information regarding the object signals, grouping information, silent-period information, downmix gain information, and delay information.
The side information and the object-based downmix signal may be incorporated into a single bitstream, and the bitstream may be transmitted to the object-based audio decoding apparatus.
The object decoder 111 receives the object-based downmix signal and the side information from the object-based audio encoding apparatus, and restores object signals having properties similar to those of the N object signals based on the object-based downmix signal and the side information. The object signals generated by the object decoder 111 have not yet been allocated to any position in a multi-channel space. Thus, the mixer/renderer 113 allocates each of the object signals generated by the object decoder 111 to a predetermined position in the multi-channel space and determines the levels of the object signals, so that the object signals can be reproduced from the respective positions designated by the mixer/renderer 113 with the respective levels determined by the mixer/renderer 113. Control information regarding each of the object signals generated by the object decoder 111 may vary over time, and thus, the spatial positions and the levels of the object signals generated by the object decoder 111 may vary according to the control information.
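A minimal Python sketch of the idea described above, using made-up object signals and an assumed per-frame energy-ratio side information (the actual encoder's side information and framing are not specified here):

    import numpy as np

    # Hypothetical sketch of the FIG. 1 idea: N object signals are downmixed to one
    # mono channel, and simple per-object side information (here, relative energies
    # per analysis frame) is kept so a decoder can later approximate the objects.
    def encode_objects(objects, frame_len=1024):
        """objects: list of equal-length 1-D numpy arrays (the N object signals)."""
        downmix = np.sum(objects, axis=0)                    # object-based downmix signal
        n_frames = len(downmix) // frame_len
        side_info = np.zeros((len(objects), n_frames))
        for f in range(n_frames):
            sl = slice(f * frame_len, (f + 1) * frame_len)
            energies = np.array([np.sum(obj[sl] ** 2) for obj in objects])
            side_info[:, f] = energies / (np.sum(energies) + 1e-12)  # relative energies
        return downmix, side_info

    # A decoder in the spirit of the object decoder 111 could then re-weight the
    # downmix frame by frame to approximate each object before mixing/rendering.
    rng = np.random.default_rng(0)
    objs = [rng.standard_normal(4096) * g for g in (1.0, 0.5, 0.25)]
    dmx, info = encode_objects(objs)
    print(dmx.shape, info.shape)   # (4096,) (3, 4)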
FIG. 2 is a block diagram of an audio decoding apparatus 120 according to a first embodiment of the present invention. Referring to FIG. 2, the audio decoding apparatus 120 may perform adaptive decoding by analyzing control information.
Referring to FIG. 2, the audio decoding apparatus 120 includes an object decoder 121, a mixer/renderer 123, and a parameter converter 125. The audio decoding apparatus 120 may also include a demultiplexer (not shown), which extracts a downmix signal and side information from an input bitstream, and the demultiplexer may be included in all audio decoding apparatuses according to other embodiments of the present invention.
The object decoder 121 generates a number of object signals based on the downmix signal and modified side information provided by the parameter converter 125. The mixer/renderer 123 allocates each of the object signals generated by the object decoder 121 to a predetermined position in a multi-channel space according to control information, and determines the levels of the object signals generated by the object decoder 121. The parameter converter 125 generates the modified side information by merging the side information and the control information. Then, the parameter converter 125 transmits the modified side information to the object decoder 121.
The object decoder 121 may perform adaptive decoding by analyzing the control information in the modified side information.
For example, if the control information indicates that a first object signal and a second object signal are allocated to the same position in the multi-channel space and have the same level, a typical audio decoding apparatus would decode the first and second object signals separately and then arrange them in the multi-channel space through a mixing/rendering operation.
The object decoder 121 of the audio decoding apparatus 120, on the other hand, learns from the control information in the modified side information that the first and second object signals are allocated to the same position in the multi-channel space and have the same level, as if they were a single sound source. Accordingly, the object decoder 121 decodes the first and second object signals as a single sound source without decoding them separately. As a result, the complexity of decoding decreases. In addition, because the number of sound sources to be processed decreases, the complexity of mixing/rendering also decreases.
The audio decoding apparatus 120 can be effectively used when the number of object signals is greater than the number of output channels, because then a number of object signals are likely to be allocated to the same spatial position.
Alternatively, the audio decoding apparatus 120 may be used when the first object signal and the second object signal are allocated to the same position in the multi-channel space but have different levels. In this case, the audio decoding apparatus 120 decodes the first and second object signals as a single signal, instead of decoding them separately, and transmits the decoded first and second object signals to the mixer/renderer 123. More specifically, the object decoder 121 may obtain information regarding the difference between the levels of the first and second object signals from the control information in the modified side information, and may decode the first and second object signals based on the obtained information. As a result, even if the first and second object signals have different levels, they can be decoded as if they were a single sound source.
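The following small check illustrates why this works: panning two objects to the same position and summing gives exactly the same channel output as panning their sum once, so one decode/render pass suffices (the gains and signals below are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    obj1, obj2 = rng.standard_normal(1000), 0.5 * rng.standard_normal(1000)
    pan_gains = np.array([0.8, 0.2, 0.0, 0.0, 0.0])       # same position for both objects

    separate = np.outer(pan_gains, obj1) + np.outer(pan_gains, obj2)   # two renders
    combined = np.outer(pan_gains, obj1 + obj2)                        # one render
    print(np.allclose(separate, combined))    # True: one decode/render suffices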
Still alternatively, the object decoder 121 may adjust the levels of the object signals generated by the object decoder 121 according to the control information, and may then decode the level-adjusted object signals. Accordingly, the mixer/renderer 123 does not need to adjust the levels of the decoded object signals provided by the object decoder 121, but simply arranges the decoded object signals provided by the object decoder 121 in the multi-channel space. In short, since the object decoder 121 adjusts the levels of the object signals generated by the object decoder 121 according to the control information, the mixer/renderer 123 can readily arrange the object signals generated by the object decoder 121 in the multi-channel space without additionally adjusting their levels. Therefore, the complexity of mixing/rendering can be reduced.
According to the embodiment of FIG. 2, the object decoder of the audio decoding apparatus 120 can adaptively perform a decoding operation through the analysis of the control information, and can thereby reduce the complexity of decoding and the complexity of mixing/rendering. A combination of the above-described methods performed by the audio decoding apparatus 120 may also be used.
FIG. 3 is a block diagram of an audio decoding apparatus 130 according to a second embodiment of the present invention. Referring to FIG. 3, the audio decoding apparatus 130 includes an object decoder 131 and a mixer/renderer 133. The audio decoding apparatus 130 is characterized in that it provides side information not only to the object decoder 131 but also to the mixer/renderer 133.
The audio decoding apparatus 130 can effectively perform a decoding operation even when there is an object signal corresponding to a silent period. For example, second through fourth object signals may correspond to a period during which a musical instrument is played, while a first object signal may correspond to a mute period during which only background music is played, that is, a silent period during which an accompaniment may be played. In this case, information indicating which of a number of object signals corresponds to a silent period may be included in side information, and the side information may be provided to the mixer/renderer 133 as well as to the object decoder 131.
The object decoder 131 can minimize the complexity of decoding by not decoding an object signal corresponding to a silent period. The object decoder 131 sets an object signal corresponding to a silent period to a value of 0 and transmits the level of the object signal to the mixer/renderer 133. In general, object signals having a value of 0 are treated the same as object signals having a non-zero value, and are thus subjected to a mixing/rendering operation.
On the other hand, the audio decoding apparatus 130 transmits side information, which includes information indicating which of a number of object signals corresponds to a silent period, to the mixer/renderer 133, and can thereby prevent an object signal corresponding to a silent period from being subjected to the mixing/rendering operation performed by the mixer/renderer 133. Therefore, the audio decoding apparatus 130 can prevent an unnecessary increase in the complexity of mixing/rendering.
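A rough sketch of how a mixer/renderer could use such a flag to skip silent objects entirely (the object names, flags, and gains below are hypothetical):

    import numpy as np

    # Side information flags which objects are in a silent period, so the
    # mixer/renderer can skip them instead of mixing zero-valued signals.
    def mix(objects, silent_flags, gains):
        """objects: dict name -> signal; gains: dict name -> per-channel gain array."""
        n_ch = len(next(iter(gains.values())))
        out = np.zeros((n_ch, len(next(iter(objects.values())))))
        for name, sig in objects.items():
            if silent_flags.get(name, False):
                continue                      # silent-period object: no mixing work at all
            out += np.outer(gains[name], sig)
        return out

    objs = {"vocal": np.zeros(512), "piano": np.ones(512)}
    flags = {"vocal": True, "piano": False}
    g = {"vocal": np.array([0.5, 0.5]), "piano": np.array([1.0, 0.0])}
    print(mix(objs, flags, g).shape)   # (2, 512)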
FIG. 4 is a block diagram of an audio decoding apparatus 140 according to a third embodiment of the present invention. Referring to FIG. 4, the audio decoding apparatus 140 uses a multi-channel decoder 141, instead of an object decoder and a mixer/renderer, and appropriately arranges a number of object signals in a multi-channel space after decoding the object signals.
More specifically, the audio decoding apparatus 140 includes the multi-channel decoder 141 and a parameter converter 145. The multi-channel decoder 141 generates a multi-channel signal, in which the object signals have already been arranged in the multi-channel space, based on a downmix signal and spatial parameter information, which is channel-based parameter information provided by the parameter converter 145. The parameter converter 145 analyzes side information and control information transmitted by an audio encoding apparatus (not shown), and generates the spatial parameter information based on the result of the analysis. More specifically, the parameter converter 145 generates the spatial parameter information by combining the side information and the control information, which includes playback setup information and mixing information. That is, the parameter converter 145 converts the combination of the side information and the control information into spatial data corresponding to a One-To-Two (OTT) box or a Two-To-Three (TTT) box.
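A rough sketch of the kind of conversion involved, reduced to a single left/right split: per-object energies (as could come from object-based side information) and per-object target channel gains (as could come from control information) are combined into one channel level difference (CLD) value of the sort an OTT box consumes; the real OTT/TTT tree and parameter bands are omitted here:

    import numpy as np

    def object_params_to_cld(object_energies, left_gains, right_gains):
        # Energy reaching each side is the gain-weighted sum of object energies;
        # the CLD is their ratio expressed in dB.
        left_energy = np.sum(object_energies * left_gains ** 2)
        right_energy = np.sum(object_energies * right_gains ** 2)
        return 10.0 * np.log10((left_energy + 1e-12) / (right_energy + 1e-12))

    energies = np.array([1.0, 0.25, 0.5])          # from side information
    gl = np.array([1.0, 0.2, 0.7])                 # desired left-channel gains
    gr = np.array([0.0, 1.0, 0.7])                 # desired right-channel gains
    print(round(object_params_to_cld(energies, gl, gr), 2), "dB")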
The audio decoding apparatus 140 can perform a multi-channel decoding operation in which an object-based decoding operation and a mixing/rendering operation are incorporated, and can thereby skip the decoding of each object signal. Therefore, the complexity of decoding and/or mixing/rendering can be reduced.
For example, when there are 10 object signals and a multi-channel signal obtained based on the 10 object signals is to be reproduced by a 5.1-channel loudspeaker system, a typical object-based audio decoding apparatus generates decoded signals respectively corresponding to the 10 object signals based on a downmix signal and side information, and then generates a 5.1-channel signal by appropriately arranging the 10 object signals in a multi-channel space so that the object signals become suitable for a 5.1-channel speaker environment. However, it is inefficient to generate the 10 object signals during the generation of the 5.1-channel signal, and this problem becomes more serious as the difference between the number of object signals and the number of channels of the multi-channel signal to be generated increases.
On the other hand, according to the embodiment of FIG. 4, the audio decoding apparatus 140 generates spatial parameter information suitable for a 5.1-channel signal based on the side information and the control information, and provides the spatial parameter information and the downmix signal to the multi-channel decoder 141. Then, the multi-channel decoder 141 generates a 5.1-channel signal based on the spatial parameter information and the downmix signal. In other words, when the number of channels to be output is 5.1, the audio decoding apparatus 140 can readily generate a 5.1-channel signal based on the downmix signal without the need to generate 10 object signals, and is thus more efficient than a conventional audio decoding apparatus in terms of complexity.
The audio decoding apparatus 140 is deemed efficient when the amount of computation required to calculate spatial parameter information corresponding to each OTT box and each TTT box through the analysis of the side information and the control information transmitted by the audio encoding apparatus is less than the amount of computation required for a mixing/rendering operation performed after the decoding of each object signal.
The audio decoding apparatus 140 may be obtained simply by adding a module for generating spatial parameter information through the analysis of side information and control information to a typical multi-channel audio decoding apparatus, and compatibility with typical multi-channel audio decoding apparatuses can therefore be maintained. Also, the audio decoding apparatus 140 can improve sound quality by using existing tools of a typical multi-channel decoder, such as an envelope shaper, a sub-band temporal processing (STP) tool, and a decorrelator. Given all this, it can be concluded that all the advantages of typical multi-channel audio decoding methods can readily be applied to object-based audio decoding methods.
The spatial parameter information transmitted to the multi-channel decoder 141 by the parameter converter 145 may be compressed so as to be suitable for transmission. Alternatively, the spatial parameter information may have the same format as data transmitted by a typical multi-channel encoding apparatus. That is, the spatial parameter information may be subjected to a Huffman decoding operation or a pilot decoding operation, and may then be transmitted to each module as uncompressed spatial cue data. The former is suitable for transmitting the spatial parameter information to a multi-channel audio decoding apparatus at a remote location, and the latter is convenient because the multi-channel audio decoding apparatus does not need to convert compressed spatial cue data into uncompressed spatial cue data that can readily be used in a decoding operation.
The configuration of spatial parameter information based on the analysis of side information and control information may cause a delay. In order to compensate for such a delay, an additional buffer may be provided for a downmix signal so that the delay between the downmix signal and a bitstream can be compensated for. Alternatively, an additional buffer may be provided for spatial parameter information obtained from control information so that the delay between the spatial parameter information and a bitstream can be compensated for. These methods, however, are inconvenient because of the requirement to provide an additional buffer. Alternatively, side information may be transmitted ahead of a downmix signal in consideration of the possibility of a delay occurring between the downmix signal and spatial parameter information. In this case, spatial parameter information obtained by combining the side information and control information can readily be used without needing to be adjusted.
If a plurality of object signals of a downmix signal have different levels, an arbitrary downmix gain (ADG) module, which can directly compensate the downmix signal, may determine the relative levels of the object signals, and each of the object signals may be allocated to a predetermined position in a multi-channel space by using spatial cue data such as channel level difference (CLD) information, inter-channel correlation (ICC) information, and channel prediction coefficient (CPC) information.
For example, if control information indicates that a predetermined object signal is to be allocated to a predetermined position in a multi-channel space and has a higher level than other object signals, a typical multi-channel decoder may calculate the difference between the channel energies of a downmix signal and divide the downmix signal into a number of output channels based on the result of the calculation. However, a typical multi-channel decoder cannot increase or reduce the volume of a certain sound in the downmix signal. In other words, a typical multi-channel decoder simply distributes the downmix signal to a number of output channels and cannot increase or reduce the volume of a sound in the downmix signal.
It is relatively easy to allocate each of a number of object signals of a downmix signal generated by an object encoder to a predetermined position in a multi-channel space according to control information. However, special techniques are required to increase or reduce the amplitude of a predetermined object signal. In other words, if a downmix signal generated by an object encoder is used as it is, it is difficult to reduce the amplitude of each object signal of the downmix signal.
Therefore, according to an embodiment of the present invention, the relative amplitudes of object signals may be varied according to control information by using an ADG module 147 illustrated in FIG. 5. The ADG module 147 may be installed in the multi-channel decoder 141 or may be separate from the multi-channel decoder 141.
If the relative amplitudes of the object signals of a downmix signal are appropriately adjusted by using the ADG module 147, object decoding can be performed by using a typical multi-channel decoder. If a downmix signal generated by an object encoder is a mono or stereo signal or a multi-channel signal with three or more channels, the downmix signal may be processed by the ADG module 147. If a downmix signal generated by an object encoder has two or more channels and a predetermined object signal that needs to be adjusted by the ADG module 147 exists in only one of the channels of the downmix signal, the ADG module 147 may be applied only to the channel including the predetermined object signal, instead of being applied to all the channels of the downmix signal. A downmix signal processed by the ADG module 147 in the above-described manner can readily be processed by a typical multi-channel decoder without the need to modify the structure of the multi-channel decoder.
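A minimal sketch of the ADG idea, assuming the ADG amounts are given as per-channel, per-frame gains in dB (band-wise gains would work in the same way):

    import numpy as np

    # The downmix is rescaled before it reaches the multi-channel decoder, so an
    # object's relative amplitude can be raised or lowered even though the decoder
    # itself only redistributes energy between channels.
    def apply_adg(downmix, adg_gains_db):
        """downmix: (channels, samples); adg_gains_db: (channels, frames)."""
        n_ch, n_samples = downmix.shape
        frame_len = n_samples // adg_gains_db.shape[1]
        out = downmix.copy()
        for f in range(adg_gains_db.shape[1]):
            sl = slice(f * frame_len, (f + 1) * frame_len)
            out[:, sl] *= 10 ** (adg_gains_db[:, f:f + 1] / 20.0)
        return out

    dmx = np.ones((2, 2048))
    gains = np.array([[0.0, 0.0], [6.0, -6.0]])    # boost then cut only the second channel
    print(apply_adg(dmx, gains)[:, 0], apply_adg(dmx, gains)[:, -1])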
Even when a final output signal is not a multi-channel signal that can be reproduced by multi-channel loudspeakers but is a binaural signal, the ADG module 147 may be used to adjust the relative amplitudes of the object signals of the final output signal.
As an alternative to the use of the ADG module 147, gain information specifying a gain value to be applied to each object signal may be included in control information during the generation of a number of object signals. For this, the structure of a typical multi-channel decoder may need to be modified. Even though it requires a modification of the structure of an existing multi-channel decoder, this method is convenient in terms of reducing the complexity of decoding, because a gain value is applied to each object signal during a decoding operation without the need to calculate ADG values and to compensate each object signal.
The ADG module 147 may be used not only to adjust the levels of object signals but also to modify the spectrum information of a predetermined object signal. More specifically, the ADG module 147 may be used not only to increase or lower the level of a predetermined object signal but also to modify the spectrum information of the predetermined object signal, for example, to amplify a high-pitch or low-pitch portion of the predetermined object signal. It is not possible to modify spectrum information without using the ADG module 147.
FIG. 6 is a block diagram of an audio decoding apparatus 150 according to a fourth embodiment of the present invention. Referring to FIG. 6, the audio decoding apparatus 150 includes a multi-channel binaural decoder 151, a first parameter converter 157, and a second parameter converter 159.
The second parameter converter 159 analyzes side information and control information, which are provided by an audio encoding apparatus, and configures spatial parameter information based on the result of the analysis. The first parameter converter 157 configures virtual three-dimensional (3D) parameter information, which can be used by the multi-channel binaural decoder 151, by adding three-dimensional (3D) information, such as a head-related transfer function (HRTF) parameter, to the spatial parameter information. The multi-channel binaural decoder 151 generates a binaural signal by applying the binaural parameter information to a downmix signal.
The first parameter converter 157 and the second parameter converter 159 may be replaced by a single module, i.e., a parameter conversion module 155, which receives the side information, the control information, and the 3D information and configures the binaural parameter information based on the side information, the control information, and the HRTF parameters.
In general, in order to generate a binaural signal for the playback, with headphones, of a downmix signal including 10 object signals, an object decoder must generate 10 decoded signals respectively corresponding to the 10 object signals based on the downmix signal and side information. Thereafter, a mixer/renderer allocates each of the 10 object signals to a predetermined position in a multi-channel space with reference to control information so as to suit a 5-channel speaker environment. Thereafter, the mixer/renderer generates a 5-channel signal that can be reproduced by 5-channel speakers. Thereafter, the mixer/renderer applies 3D information to the 5-channel signal, thereby generating a 2-channel signal. In short, this typical audio decoding method includes reproducing 10 object signals, converting the 10 object signals into a 5-channel signal, and generating a 2-channel signal based on the 5-channel signal, and is thus evidently inefficient.
On the other hand, the audio decoding apparatus 150 can readily generate a binaural signal that can be reproduced using headphones based on object signals. In addition, the audio decoding apparatus 150 configures spatial parameter information through the analysis of side information and control information, and can thus generate a binaural signal by using a typical multi-channel binaural decoder. Moreover, the audio decoding apparatus 150 can still use a typical multi-channel binaural decoder even when it is equipped with an incorporated parameter converter that receives side information, control information, and HRTF parameters and configures binaural parameter information based on the side information, the control information, and the HRTF parameters.
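A hedged sketch of the final binaural rendering step only: each channel of a multi-channel signal is convolved with a pair of head-related impulse responses and summed into two ear signals; the HRIRs below are random placeholders standing in for values that would come from an HRTF database:

    import numpy as np

    def binauralize(channels, hrirs_left, hrirs_right):
        n_samples = channels.shape[1]
        left = np.zeros(n_samples)
        right = np.zeros(n_samples)
        for ch, hl, hr in zip(channels, hrirs_left, hrirs_right):
            left += np.convolve(ch, hl)[:n_samples]     # one HRIR pair per channel direction
            right += np.convolve(ch, hr)[:n_samples]
        return np.stack([left, right])

    rng = np.random.default_rng(2)
    five_ch = rng.standard_normal((5, 4800))
    hl = rng.standard_normal((5, 128)) * 0.1
    hr = rng.standard_normal((5, 128)) * 0.1
    print(binauralize(five_ch, hl, hr).shape)   # (2, 4800)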
FIG. 7 is a block diagram of an audio decoding apparatus 160 according to a fifth embodiment of the present invention. Referring to FIG. 7, the audio decoding apparatus 160 includes a preprocessor 161, a multi-channel decoder 163, and a parameter converter 165.
The parameter converter 165 generates spatial parameter information, which can be used by the multi-channel decoder 163, and parameter information, which can be used by the preprocessor 161. The preprocessor 161 performs a preprocessing operation on a downmix signal, and transmits the downmix signal resulting from the preprocessing operation to the multi-channel decoder 163. The multi-channel decoder 163 performs a decoding operation on the downmix signal transmitted by the preprocessor 161, thereby outputting a stereo signal, a binaural stereo signal, or a multi-channel signal. Examples of the preprocessing operation performed by the preprocessor 161 include the modification or conversion of a downmix signal in a time domain or a frequency domain by means of filtering.
If a downmix signal input to the audio decoding apparatus 160 is a stereo signal, the downmix signal may need to be subjected to downmix preprocessing performed by the preprocessor 161 before being input to the multi-channel decoder 163, because the multi-channel decoder 163 cannot, through decoding, map an object signal corresponding to the left channel of the stereo downmix signal to a right channel of a multi-channel signal. Therefore, in order to shift an object signal belonging to the left channel of the stereo downmix signal to a right channel, the stereo downmix signal may need to be preprocessed by the preprocessor 161, and the preprocessed downmix signal may then be input to the multi-channel decoder 163.
The preprocessing of a stereo downmix signal may be performed based on preprocessing information obtained from side information and from control information.
FIG. 8 is a block diagram of an audio decoding apparatus 170 according to a sixth embodiment of the present invention. Referring to FIG. 8, the audio decoding apparatus 170 includes a multi-channel decoder 171, a postprocessor 173, and a parameter converter 175.
The parameter converter 175 generates spatial parameter information, which can be used by the multi-channel decoder 171, and parameter information, which can be used by the postprocessor 173. The postprocessor 173 performs a postprocessing operation on a signal output by the multi-channel decoder 171. Examples of the signal output by the multi-channel decoder 171 include a stereo signal, a binaural stereo signal, and a multi-channel signal.
Examples of the postprocessing operation performed by the postprocessor 173 include the modification or conversion of each channel or all channels of an output signal. For example, if side information includes fundamental frequency information regarding a predetermined object signal, the postprocessor 173 may remove harmonic components from the predetermined object signal with reference to the fundamental frequency information. A multi-channel audio decoding method may not be efficient enough for a karaoke system. However, if fundamental frequency information regarding vocal object signals is included in side information and harmonic components of the vocal object signals are removed during a postprocessing operation, a high-performance karaoke system can be realized by using the embodiment of FIG. 8. The embodiment of FIG. 8 may also be applied to object signals other than vocal object signals. For example, the sound of a predetermined musical instrument may be removed by using the embodiment of FIG. 8. Also, predetermined harmonic components may be amplified by using fundamental frequency information regarding object signals. In short, postprocessing parameters enable the application of various effects that cannot be performed by the multi-channel decoder 171, such as the insertion of a reverberation effect, the addition of noise, and the amplification of a low-pitch portion.
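A rough single-FFT sketch of the karaoke-style postprocessing just described: narrow bands around an assumed fundamental frequency and its harmonics are attenuated, leaving mostly the accompaniment (a real postprocessor would work frame by frame with proper filters; the signals and frequencies below are invented):

    import numpy as np

    def remove_harmonics(signal, f0, fs, width_hz=20.0, max_harmonic=20):
        spectrum = np.fft.rfft(signal)
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
        for k in range(1, max_harmonic + 1):
            mask = np.abs(freqs - k * f0) < width_hz / 2.0   # notch around each harmonic
            spectrum[mask] = 0.0
        return np.fft.irfft(spectrum, n=len(signal))

    fs = 16000
    t = np.arange(fs) / fs
    voice_like = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
    accompaniment = np.sin(2 * np.pi * 330 * t)
    cleaned = remove_harmonics(voice_like + accompaniment, f0=220.0, fs=fs)
    print(np.max(np.abs(cleaned - accompaniment)) < 0.1)   # mostly accompaniment remains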
The postprocessor 173 may directly apply an additional effect to a downmix signal, or may add a downmix signal to which an effect has been applied to the output of the multi-channel decoder 171. The postprocessor 173 may change the spectrum of an object or modify a downmix signal whenever necessary. If it is not appropriate to directly perform an effect processing operation, such as reverberation, on a downmix signal and to transmit the signal obtained by the effect processing operation to the multi-channel decoder 171, the postprocessor 173 may add the signal obtained by the effect processing operation to the output of the multi-channel decoder 171, instead of directly performing effect processing on the downmix signal and transmitting the result of the effect processing to the multi-channel decoder 171.
FIG. 9 is a block diagram of an audio decoding apparatus 180 according to a seventh embodiment of the present invention. Referring to FIG. 9, the audio decoding apparatus 180 includes a preprocessor 181, a multi-channel decoder 183, a postprocessor 185, and a parameter converter 187.
The description of the preprocessor 161 directly applies to the preprocessor 181. The postprocessor 185 may be used to add the output of the preprocessor 181 and the output of the multi-channel decoder 183 together, and thereby provide a final signal. In this case, the postprocessor 185 simply serves as an adder for adding the two signals. An effect parameter may be provided to whichever of the preprocessor 181 and the postprocessor 185 applies the effect. In addition, the addition of a signal obtained by applying an effect to a downmix signal to the output of the multi-channel decoder 183 and the application of the effect to the output of the multi-channel decoder 183 may be performed at the same time.
The preprocessors 161 and 181 of FIGS. 7 and 9 may perform rendering on a downmix signal according to control information provided by a user. In addition, the preprocessors 161 and 181 of FIGS. 7 and 9 may increase or reduce the levels of object signals and alter the spectra of object signals. In this case, the preprocessors 161 and 181 of FIGS. 7 and 9 may perform the functions of an ADG module.
The rendering of an object signal according to direction information of the object signal, the adjustment of the level of the object signal, and the alteration of the spectrum of the object signal may be performed at the same time. In addition, some of the rendering of object signals according to object signal direction information, the adjustment of the levels of object signals, and the alteration of the spectra of object signals may be performed by using the preprocessor 161 or 181, and whichever of these operations is not performed by the preprocessor 161 or 181 may be performed by using an ADG module. For example, it is inefficient to alter the spectrum of an object signal by using an ADG module that uses quantization level intervals and parameter band intervals. In this case, the preprocessor 161 or 181 may be used to finely alter the spectrum of the object signal on a frequency-by-frequency basis, and the ADG module may be used to adjust the level of the object signal.
FIG. 10 is a block diagram of an audio decoding apparatus 200 according to an eighth embodiment of the present invention. Referring to FIG. 10, the audio decoding apparatus 200 includes a rendering matrix generator 201, a transcoder 203, a multi-channel decoder 205, a preprocessor 207, an effect processor 208, and an adder 209.
The rendering matrix generator 201 generates a rendering matrix, which represents object position information regarding the positions of object signals and playback configuration information regarding the levels of the object signals, and provides the rendering matrix to the transcoder 203. The rendering matrix generator 201 generates 3D information, such as an HRTF coefficient, based on the object position information. An HRTF is a transfer function that describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and returns a value that varies according to the altitude and direction of the sound source. If a signal with no directivity is filtered using an HRTF, the signal sounds as if it were being reproduced from a certain direction.
The object position information and the playback configuration information received by the rendering matrix generator 201 may vary over time and may be provided by an end user.
The transcoder 203 generates channel-based side information based on object-based side information, the rendering matrix, and the 3D information, and provides the multi-channel decoder 205 with the channel-based side information and the 3D information that the multi-channel decoder 205 requires. That is, the transcoder 203 transmits, to the multi-channel decoder 205, channel-based side information regarding M channels, which is obtained from object-based parameter information regarding N object signals, and 3D information of each of the N object signals.
The multi-channel decoder 205 generates a multi-channel audio signal based on a downmix signal and the channel-based side information provided by the transcoder 203, and performs 3D rendering on the multi-channel audio signal according to the 3D information, thereby generating a 3D multi-channel signal. The rendering matrix generator 201 may include a 3D information database (not shown).
If an input downmix signal needs to be preprocessed before being input to the multi-channel decoder 205, the transcoder 203 transmits information regarding the preprocessing to the preprocessor 207. The object-based side information includes information regarding all the object signals, and the rendering matrix includes the object position information and the playback configuration information. The transcoder 203 generates channel-based side information based on the object-based side information and the rendering matrix, and then generates the channel-based side information necessary for mixing and reproducing the object signals according to channel information. Thereafter, the transcoder 203 transmits the channel-based side information to the multi-channel decoder 205.
The channel-based side information and the 3D information provided by the transcoder 203 may include frame indexes. Thus, the multi-channel decoder 205 can synchronize the channel-based side information and the 3D information by using the frame indexes, and can apply the 3D information only to certain frames of a bitstream. In addition, even if the 3D information is updated, the channel-based side information and the updated 3D information can easily be synchronized by using the frame indexes. That is, the frame indexes may be included in the channel-based side information and the 3D information, respectively, in order for the multi-channel decoder 205 to synchronize the channel-based side information and the 3D information.
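A small sketch of the frame-index idea, with invented frame indexes and parameter payloads: the decoder pairs each frame of channel-based side information with the most recent applicable 3D information:

    # Channel-based side information and 3D information each carry a frame index,
    # so the decoder can line them up even if the 3D information is updated less
    # often or arrives separately.
    channel_side_info = {0: "CLD/ICC for frame 0", 1: "CLD/ICC for frame 1",
                         2: "CLD/ICC for frame 2", 3: "CLD/ICC for frame 3"}
    three_d_info = {0: "HRTF set A", 2: "HRTF set B"}    # updated only at frames 0 and 2

    def synced_parameters(frame):
        side = channel_side_info[frame]
        # use the most recent 3D information whose frame index is not in the future
        applicable = max(k for k in three_d_info if k <= frame)
        return side, three_d_info[applicable]

    for f in range(4):
        print(f, synced_parameters(f))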
The preprocessor 207 may perform preprocessing on an input downmix signal, if necessary, before the input downmix signal is input to the multi-channel decoder 205. As described above, if the input downmix signal is a stereo signal and an object signal belonging to a left channel needs to be reproduced from a right channel, the downmix signal may need to be subjected to the preprocessing performed by the preprocessor 207 before being input to the multi-channel decoder 205, because the multi-channel decoder 205 cannot shift an object signal from one channel to another channel. The transcoder 203 may provide the preprocessor 207 with information necessary for preprocessing the input downmix signal. A downmix signal obtained by the preprocessing performed by the preprocessor 207 may be transmitted to the multi-channel decoder 205.
The effect processor 208 and the adder 209 may directly apply an additional effect to a downmix signal, or may add a downmix signal to which an effect has been applied to the output of the multi-channel decoder 205. The effect processor 208 may change the spectrum of an object or modify a downmix signal whenever necessary. If it is not appropriate to directly perform an effect processing operation, such as reverberation, on a downmix signal and to transmit the signal obtained by the effect processing operation to the multi-channel decoder 205, the effect processor 208 may simply add the signal obtained by the effect processing operation to the output of the multi-channel decoder 205, instead of directly performing effect processing on the downmix signal and transmitting the result of the effect processing to the multi-channel decoder 205.
The rendering matrix generated by the rendering matrix generator 201 will hereinafter be described in detail.
A rendering matrix is a matrix that represents the positions and the playback configuration of object signals. That is, if there are N object signals and M channels, a rendering matrix can indicate, in various manners, how the N object signals are mapped to the M channels.
More specifically, when N object signals are mapped to M channels, an N*M rendering matrix may be established. In this case, the rendering matrix includes N rows, which respectively represent the N object signals, and M columns, which respectively represent the M channels. Each of the M coefficients in each of the N rows may be a real number or an integer indicating the ratio of the part of an object signal allocated to a corresponding channel to the whole object signal.
More specifically, the M coefficients in each of the N rows of the N*M rendering matrix may be real numbers. Then, if the sum of the M coefficients in a row of the N*M rendering matrix is equal to a predetermined reference value, for example, 1, it can be determined that the level of the corresponding object signal has not changed. If the sum of the M coefficients is less than 1, it can be determined that the level of the object signal has decreased. If the sum of the M coefficients is greater than 1, it can be determined that the level of the object signal has increased. The predetermined reference value may be a value other than 1. The amount by which the level of an object signal can vary may be limited to a range of 12 dB. For example, if the predetermined reference value is 1 and the sum of the M coefficients is 1.5, it can be determined that the level of the object signal has increased by 12 dB. If the predetermined reference value is 1 and the sum of the M coefficients is 0.5, it can be determined that the level of the object signal has decreased by 12 dB. If the predetermined reference value is 1 and the sum of the M coefficients is between 0.5 and 1.5, it can be determined that the level of the object signal has varied by a predetermined amount between -12 dB and +12 dB, and the predetermined amount may be determined linearly from the sum of the M coefficients.
The M coefficients in each of the N rows of the N*M rendering matrix may alternatively be integers. Then, if the sum of the M coefficients in a row of the N*M rendering matrix is equal to a predetermined reference value, for example, 10, 20, 30, or 100, it can be determined that the level of the corresponding object signal has not changed. If the sum of the M coefficients is less than the predetermined reference value, it can be determined that the level of the object signal has decreased. If the sum of the M coefficients is greater than the predetermined reference value, it can be determined that the level of the object signal has increased. The amount by which the level of an object signal can vary may be limited to a range of, for example, 12 dB. The difference between the sum of the M coefficients and the predetermined reference value represents the amount (unit: dB) by which the level of the object signal has varied. For example, if the sum of the M coefficients exceeds the predetermined reference value by 1, it can be determined that the level of the object signal has increased by 2 dB. Thus, if the predetermined reference value is 20 and the sum of the M coefficients is 23, it can be determined that the level of the object signal has increased by 6 dB. If the predetermined reference value is 20 and the sum of the M coefficients is 15, it can be determined that the level of the object signal has decreased by 10 dB.
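The two conventions can be summarized by the following small helpers (the reference values and the 2 dB-per-unit rule are taken from the description above; clamping the real-valued case to the 12 dB range is an assumption):

    def level_change_real(row, reference=1.0):
        s = sum(row)
        s = min(max(s, reference - 0.5), reference + 0.5)   # assumed clamp to the +/-12 dB range
        return (s - reference) * 24.0                        # 0.5..1.5 -> -12..+12 dB

    def level_change_integer(row, reference=10):
        return (sum(row) - reference) * 2.0                  # 2 dB per unit of difference

    print(level_change_real([0.5, 0.5, 0.5]))     # sum 1.5 -> +12 dB
    print(level_change_integer([2, 4, 3, 1, 2]))  # sum 12, reference 10 -> +4 dB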
For example, if there are 6 object signals and 5 channels (namely front-left (FL), front-right (FR), center (C), rear-left (RL), and rear-right (RR) channels), a 6*5 rendering matrix may be established, having 6 rows respectively corresponding to the 6 object signals and 5 columns respectively corresponding to the 5 channels. The coefficients of the 6*5 rendering matrix are integers indicating the ratios at which each of the 6 object signals is distributed among the 5 channels. The 6*5 rendering matrix may have a reference value of 10. Then, if the sum of the 5 coefficients in any one of the 6 rows of the 6*5 rendering matrix is equal to 10, it can be determined that the level of the corresponding object signal has not changed. The difference between the sum of the 5 coefficients in any one of the 6 rows of the 6*5 rendering matrix and the reference value represents the amount by which the level of the corresponding object signal has changed. For example, if the sum of the 5 coefficients in one of the 6 rows of the 6*5 rendering matrix differs from the reference value by 1, it can be determined that the level of the corresponding object signal has changed by 2 dB. The 6*5 rendering matrix may be represented by Equation (1):
[Equation 1]
3  1  2  2  2
2  4  3  1  2
0  0 12  0  0
7  0  0  0  0
2  2  2  2  2
2  1  1  2  1
Referring to the 6*5 rendering matrix of Equation (1), the first row corresponds to a first object signal and represents the ratios at which the first object signal is distributed among the FL, FR, C, RL, and RR channels. Since the first coefficient of the first row has the greatest integer value, 3, and the sum of the coefficients of the first row is 10, it can be determined that the first object signal is mainly allocated to the FL channel and that the level of the first object signal has not changed. Since the second coefficient of the second row, which corresponds to a second object signal, has the greatest integer value, 4, and the sum of the coefficients of the second row is 12, it can be determined that the second object signal is mainly allocated to the FR channel and that the level of the second object signal has increased by 4 dB. Since the third coefficient of the third row, which corresponds to a third object signal, has the greatest integer value, 12, and the sum of the coefficients of the third row is 12, it can be determined that the third object signal is allocated only to the C channel and that the level of the third object signal has increased by 4 dB. Since all the coefficients of the fifth row, which corresponds to a fifth object signal, have the same integer value, 2, and the sum of the coefficients of the fifth row is 10, it can be determined that the fifth object signal is allocated evenly among the FL, FR, C, RL, and RR channels and that the level of the fifth object signal has not changed.
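Reading Equation (1) under the integer convention with reference value 10 can be mechanized as follows (the 2 dB-per-unit rule comes from the description above):

    import numpy as np

    R = np.array([[3, 1, 2, 2, 2],
                  [2, 4, 3, 1, 2],
                  [0, 0, 12, 0, 0],
                  [7, 0, 0, 0, 0],
                  [2, 2, 2, 2, 2],
                  [2, 1, 1, 2, 1]])
    channels = ["FL", "FR", "C", "RL", "RR"]

    for i, row in enumerate(R, start=1):
        dominant = channels[int(np.argmax(row))]
        change_db = 2.0 * (row.sum() - 10)                  # 2 dB per unit above/below 10
        print(f"object {i}: mainly {dominant}, level change {change_db:+.0f} dB")
    # object 1: mainly FL, +0 dB; object 2: mainly FR, +4 dB; object 3: only C, +4 dB;
    # object 5: even spread, +0 dB -- matching the description above.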
Alternatively, when N object signals are mapped to M channels, an N*(M+1) rendering matrix may be established. An N*(M+1) rendering matrix is very similar to an N*M rendering matrix. More specifically, in an N*(M+1) rendering matrix, as in an N*M rendering matrix, the first through M-th coefficients in each of the N rows represent the ratios at which the corresponding object signal is distributed among the FL, FR, C, RL, and RR channels. However, an N*(M+1) rendering matrix differs from an N*M rendering matrix in that it has an additional column (i.e., an (M+1)-th column) for indicating the levels of the object signals.
An N*(M+1) rendering matrix differs from an N*M rendering matrix in that it indicates how an object signal is distributed among M channels and whether the level of the object signal has changed, separately. Thus, by using an N*(M+1) rendering matrix, information regarding a change, if any, in the level of an object signal can be readily obtained without additional computation. Since an N*(M+1) rendering matrix is almost the same as an N*M rendering matrix, an N*(M+1) rendering matrix can easily be converted into an N*M rendering matrix, and vice versa, without additional computation.
Still alternatively, when N object signals are mapped to M channels, an N*2 rendering matrix may be established. The first column of the N*2 rendering matrix indicates the angular positions of the object signals, and the second column indicates a possible variation in the level of each object signal. The N*2 rendering matrix may indicate the angular positions of the object signals at regular intervals of 1 or 3 degrees within a range of 0 to 360 degrees. An object signal that is distributed evenly in all directions may be represented by a predetermined value, rather than by an angle.
The N*2 rendering matrix may be converted into an N*3 rendering matrix, which can indicate not only the 2D directions but also the 3D directions of the object signals. More specifically, the second column of the N*3 rendering matrix may be used to indicate the 3D directions of the object signals. The third column of the N*3 rendering matrix indicates a possible variation in the level of each object signal using the same method as used in an N*M rendering matrix. If the final playback mode of the object decoder is binaural stereo, the rendering matrix generator 201 may transmit 3D information indicating the position of each object signal or an index corresponding to the 3D information. In the latter case, the transcoder 203 may need to obtain the 3D information corresponding to the index transmitted by the rendering matrix generator 201. In addition, if 3D information indicating the position of each object signal is received from the rendering matrix generator 201, the transcoder 203 may calculate 3D information that can be used by the multi-channel decoder 205 based on the received 3D information, the rendering matrix, and the object-based side information.
The rendering matrix and the 3D information may adaptively vary in real time according to modifications made by an end user to the object position information and the playback configuration information. Therefore, information regarding whether the rendering matrix and the 3D information have been updated, and any updates to the rendering matrix and the 3D information, may be transmitted to the transcoder 203 at regular intervals of time, for example, every 0.5 second. Then, if updates in the rendering matrix and the 3D information are detected, the transcoder 203 may perform linear conversion on the received updates and the existing rendering matrix and 3D information, on the assumption that the rendering matrix and the 3D information vary linearly over time.
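A sketch of this assumption in code: intermediate frames between two updates use a linear blend of the previous and the newly received rendering matrix (the update interval and the matrix values below are invented):

    import numpy as np

    def interpolated_matrix(old, new, frames_between_updates, frame_idx):
        alpha = frame_idx / float(frames_between_updates)
        return (1.0 - alpha) * old + alpha * new     # linear variation over time

    old_R = np.array([[10.0, 0.0], [0.0, 10.0]])
    new_R = np.array([[5.0, 5.0], [0.0, 12.0]])
    for f in range(0, 5):
        print(f, interpolated_matrix(old_R, new_R, frames_between_updates=4, frame_idx=f))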
When playing up matrix and 3D information and be transferred to code converter 203, if object location information and reproduce configuration information and do not revised by the terminal user, matrix is played up in indication and 3D information not have the information of change can be transferred to code converter 203.On the other hand, when playing up matrix and 3D information and be transferred to code converter 203, if object location information and reproduce configuration information and revised by the terminal user, the indication in playing up matrix and 3D information are played up matrix and 3D information and have been changed with updated information and can be transferred to code converter 203.More specifically, the renewal and the renewal in the 3D information of playing up in the matrix can be transferred to code converter 203 respectively.Optionally, the renewal and/or the renewal in the 3D information of playing up in the matrix can jointly be represented by a predetermined typical value.Then, this predetermined typical value can be with this predetermined typical value of indication corresponding to playing up the renewal in the matrix or being transferred to code converter 203 corresponding to the updated information in the 3D information.By such mode, its easily information code converter 203 play up matrix and whether 3D information has renewal.
A rendering matrix similar to the N×M rendering matrix of Equation (1) may include an additional column for indicating the 3D direction information of the object signals. In this case, the additional column may indicate the 3D direction information of the object signals as angles in the range of -90 to +90 degrees. The additional column may be provided not only to an N×M rendering matrix but also to an N×(M+1) rendering matrix and to an N×2 rendering matrix. The 3D direction information of the object signals is not needed in a normal decoding mode of the multi-channel decoder; it is, however, needed in a binaural mode of the multi-channel decoder. The 3D direction information of the object signals may be transmitted together with the rendering matrix, or alternatively together with the 3D information. During a binaural-mode decoding operation, the 3D direction information of the object signals does not affect the channel-based side information, but does affect the 3D information.
Information regarding the spatial positions and the levels of the object signals may be provided as a rendering matrix. Alternatively, the information regarding the spatial positions and the levels of the object signals may be represented as modifications of the spectra of the object signals, for example, an enhancement of the bass or treble portion of an object signal. In this case, information regarding the modifications of the spectra of the object signals may be transmitted as level variations in each parameter band used by a multi-channel codec. If the end user controls the modifications of the spectra of the object signals, the information regarding the modifications of the spectra of the object signals may be transmitted as a spectrum matrix separate from the rendering matrix. The spectrum matrix has as many rows as there are object signals and as many columns as there are parameters, and each coefficient of the spectrum matrix indicates spectrum modification information for the corresponding parameter band.
The operation of the transcoder 203 will hereinafter be described in detail. The transcoder 203 generates channel-based side information for the multi-channel decoder 205 based on the object-based side information, the rendering matrix information and the 3D information, and transmits the channel-based side information to the multi-channel decoder 205. In addition, the transcoder 203 generates 3D information for the multi-channel decoder 205 and transmits the 3D information to the multi-channel decoder 205. If the input downmix signal needs to be preprocessed before being input to the multi-channel decoder 205, the transcoder 203 may transmit information regarding the input downmix signal.
The transcoder 203 may receive object-based side information indicating how a plurality of object signals are included in the input downmix signal. The object-based side information may indicate how the object signals are included in the input downmix signal by using OTT boxes and TTT boxes and by using CLD, ICC and CPC information. The object-based side information may provide descriptions of any of the various methods that an object encoder can perform to indicate information regarding each of the object signals and how the object signals are included in the side information.
In the case of a TTT box of a multi-channel codec, L, C and R signals may be downmixed into, or upmixed from, L and R signals. In this case, the C signal may share parts of the L and R signals. This, however, rarely holds in the downmixing or upmixing of object signals. Therefore, OTT boxes are more widely used to perform upmixing or downmixing for object coding/decoding. Even if the C signal contains an independent signal component rather than parts of the L and R signals, a TTT box may still be used to perform upmixing or downmixing for object coding/decoding.
For example, as shown in Figure 11, if there are six object signals, the six object signals may be converted into a downmix signal by OTT boxes, and information regarding each of the object signals may be obtained using the OTT boxes.
Referring to Figure 11, the six object signals may be represented by one downmix signal and by the information (for example, CLD and ICC information) provided collectively by five OTT boxes 211, 213, 215, 217 and 219. The structure shown in Figure 11 may be varied in many ways. That is, referring to Figure 11, the first OTT box 211 may receive any two of the six object signals, and the manner in which the OTT boxes 211, 213, 215, 217 and 219 are hierarchically connected may be varied freely. Therefore, the side information may include hierarchical structure information indicating how the OTT boxes 211, 213, 215, 217 and 219 are hierarchically connected, and input position information indicating to which OTT box each object signal is input. If the OTT boxes 211, 213, 215, 217 and 219 form an arbitrary tree structure, a method used by a multi-channel codec to represent an arbitrary tree structure may be used to indicate the hierarchical structure information. In addition, the input position information may be indicated in various manners.
The side information may also include information regarding the mute period of each object signal. In this case, the tree structure of the OTT boxes 211, 213, 215, 217 and 219 may vary adaptively over time. For example, referring to Figure 11, when the first object signal OBJECT1 is muted, the information regarding the first OTT box 211 is unnecessary, and only the second object signal OBJECT2 is input to the fourth OTT box 217. Then, the tree structure of the OTT boxes 211, 213, 215, 217 and 219 changes accordingly, and information regarding the variation of the tree structure of the OTT boxes 211, 213, 215, 217 and 219 may be included in the side information.
If a predetermined object signal is muted, information indicating that the OTT box corresponding to the predetermined object signal is not in use, and information indicating that no cues from that OTT box are available, may be provided. In this way, the size of the side information can be reduced by not including in the side information any information regarding OTT or TTT boxes that are not in use. Even if the tree structure of a plurality of OTT or TTT boxes is modified, it is possible to easily determine which OTT or TTT boxes are switched on or off based on information indicating which object signals are muted. Therefore, there is no need to frequently transmit information regarding modifications to the possible tree structure of the OTT or TTT boxes. Instead, information indicating which object signal is muted may be transmitted. Then, a decoder can easily determine which part of the tree structure of the OTT or TTT boxes needs to be modified. Therefore, the size of the information that needs to be transmitted to the decoder can be minimized, and cues regarding the object signals can easily be transmitted to the decoder.
Figure 12 is a block diagram for explaining how a plurality of object signals are included in a downmix signal. The embodiment of Figure 11 adopts the OTT box structure of a multi-channel codec as it is. In the embodiment of Figure 12, however, a variation of the OTT box structure of a multi-channel codec is used. That is, referring to Figure 12, a plurality of object signals are input to each box, and only one downmix signal is eventually generated. Referring to Figure 12, information regarding each of the plurality of object signals may be represented by the ratio of the energy level of each object signal to the total energy level of all the object signals. However, as the number of object signals increases, the ratio of the energy level of each object signal to the total energy level of the object signals decreases. In order to address this, the object signal having the highest energy level in a predetermined parameter band (hereinafter referred to as the highest-energy object signal) is searched for among the plurality of object signals, and the ratios of the energy levels of the other object signals (hereinafter referred to as non-highest-energy object signals) to the energy level of the highest-energy object signal are provided as the information regarding each of the object signals. In this case, once information indicating the highest-energy object signal and the absolute value of its energy level is given, the energy levels of the non-highest-energy object signals can easily be determined.
The energy level of the highest-energy object signal is necessary when a plurality of bitstreams are merged into a single bitstream, as performed in a multipoint control unit (MCU). In most other cases, however, the energy level of the highest-energy object signal is unnecessary, because the absolute value of the energy level of the highest-energy object signal can easily be obtained from the energy of the corresponding parameter band and the ratios of the energy levels of the non-highest-energy object signals to the energy level of the highest-energy object signal, as described below.
For example, assume that there are four object signals A, B, C and D belonging to a predetermined parameter band and that object signal A is the highest-energy object signal. Then, the energy E_P of the predetermined parameter band and the absolute value E_A of the energy level of object signal A satisfy Equation (2):
[formula 2]
$E_P = E_A + (a+b+c)\,E_A$
$E_A = \dfrac{E_P}{1+a+b+c}$
where a, b and c indicate the ratios of the energy levels of object signals B, C and D, respectively, to the energy level of object signal A. Referring to Equation (2), the absolute value E_A of the energy level of object signal A can be calculated from the ratios a, b and c and the energy E_P of the predetermined parameter band. Therefore, unless a plurality of bitstreams need to be merged into a single bitstream using an MCU, the absolute value E_A of the energy level of object signal A does not need to be included in the bitstream. Information indicating whether the absolute value E_A of the energy level of object signal A is included in the bitstream may be carried in the header of the bitstream, thereby reducing the size of the bitstream.
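Purely as an illustration of Equation (2), the following Python sketch (variable names are chosen here for clarity and are not part of the embodiment) recovers the absolute energy of the highest-energy object signal and of the remaining object signals from the parameter-band energy and the transmitted ratios:

def recover_object_energies(band_energy, ratios):
    # band_energy: energy E_P of the parameter band
    # ratios: ratios a, b, c, ... of the non-highest-energy objects
    #         to the highest-energy object, per Equation (2)
    e_max = band_energy / (1.0 + sum(ratios))      # E_A = E_P / (1 + a + b + c)
    others = [r * e_max for r in ratios]           # energies of the other objects
    return e_max, others

e_a, rest = recover_object_energies(100.0, [0.5, 0.3, 0.2])
print(e_a, rest)   # 50.0 [25.0, 15.0, 10.0]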
On the other hand, if a plurality of bitstreams need to be merged into a single bitstream using an MCU, the energy level of the highest-energy object signal becomes necessary. In this case, the sum of the energy levels calculated from the ratios of the energy levels of the non-highest-energy object signals to the energy level of the highest-energy object signal may differ from the energy level of the downmix signal obtained by downmixing all the object signals. For example, when the energy level of the downmix signal is 100, the sum of the calculated energy levels may be 98 or 103, due to, for example, errors introduced during quantization and dequantization operations. In order to address this, the difference between the energy level of the downmix signal and the sum of the calculated energy levels may be compensated for approximately by multiplying each of the calculated energy levels by a predetermined coefficient. If the energy level of the downmix signal is X and the sum of the calculated energy levels is Y, each of the calculated energy levels may be multiplied by X/Y. If the difference between the energy level of the downmix signal and the sum of the calculated energy levels is not compensated for, the quantization errors may spread over parameter bands and frames, thereby causing signal distortion.
Therefore, information indicating which of the plurality of object signals has the greatest absolute energy value in a predetermined parameter band is necessary. Such information may be represented by a number of bits. The number of bits necessary for indicating which of the object signals has the greatest absolute energy value in the predetermined parameter band varies according to the number of object signals: as the number of object signals increases, the necessary number of bits increases, and as the number of object signals decreases, the necessary number of bits decreases. A predetermined number of bits may be allocated in advance for indicating which of the object signals has the greatest absolute energy value in the predetermined parameter band. Alternatively, the number of bits necessary for indicating which of the object signals has the greatest absolute energy value in the predetermined parameter band may be determined based on certain information.
The size of the information indicating which of the plurality of object signals has the greatest absolute energy value in each parameter band can be reduced by using the same methods used to reduce the size of the CLD, ICC and CPC information for the OTT and/or TTT boxes of a multi-channel codec, for example, a time-differential method, a frequency-differential method or a pilot coding method.
An optimized Huffman table may be used for indicating which of the object signals has the greatest absolute energy value in each parameter band. In this case, information indicating in what order the ratios of the energy levels of the object signals to the energy level of the object signal having the greatest absolute energy value are provided may be needed. For example, if there are five object signals (i.e., first through fifth object signals) and the third object signal is the highest-energy object signal, information regarding the third object signal may be provided. Then, the ratios of the energy levels of the first, second, fourth and fifth object signals to the energy level of the third object signal may be provided in various manners, as will be described below in further detail.
The ratios of the energy levels of the first, second, fourth and fifth object signals to the energy level of the third object signal may be provided in that order. Alternatively, the ratios of the energy levels of the fourth, fifth, first and second object signals to the energy level of the third object signal may be provided sequentially in a circular manner. Then, information indicating the order in which the ratios of the energy levels of the first, second, fourth and fifth object signals to the energy level of the third object signal are provided may be included in a file header, or may be transmitted at intervals of a number of frames. A multi-channel codec determines CLD and ICC information according to the serial numbers of its OTT boxes. Likewise, information indicating how each object signal is mapped to the bitstream is also necessary.
In the case of a multi-channel codec, information regarding the signal corresponding to each channel can be identified from the serial numbers of the OTT or TTT boxes. According to an object-based audio coding method, if there are N object signals, the N object signals may need to be appropriately numbered. However, it is sometimes necessary for an end user to control the N object signals using an object decoder. In this case, the end user may need not only the serial numbers of the N object signals but also descriptions of the N object signals, for example, a description indicating that the first object signal corresponds to a female voice and that the second object signal corresponds to a piano sound. The descriptions of the N object signals may be included in the header of the bitstream as metadata and transmitted along with the bitstream. More specifically, the descriptions of the N object signals may be provided as text, or by using a code table or codewords.
Information regarding the correlations between the object signals is also sometimes necessary. For this purpose, the correlation between the highest-energy object signal and each of the other, non-highest-energy object signals may be calculated. In this case, a single correlation value may be assigned to all the object signals, comparably to the use of a single ICC value for all OTT boxes.
If an object signal is a stereo signal, the ratio of the left-channel energy to the right-channel energy of the object signal and ICC information are necessary. The ratio of the left-channel energy to the right-channel energy of the object signal can be calculated using the absolute values of the energy levels of the highest-energy object signal and the ratios of the energy levels of the other, non-highest-energy object signals to the energy level of the highest-energy object signal, in the same way in which the energy levels of a plurality of object signals are calculated from their ratios to the energy level of the highest-energy object signal. For example, if the absolute values of the left- and right-channel energy levels of the highest-energy object signal are A and B, respectively, and the ratio of the left-channel energy level of a non-highest-energy object signal to A and the ratio of the right-channel energy level of the non-highest-energy object signal to B are x and y, respectively, the left- and right-channel energy levels of the non-highest-energy object signal can be calculated as A*x and B*y. In this way, the ratio of the left-channel energy to the right-channel energy of a stereo object signal can be calculated.
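As a small numerical sketch of the computation just described (the function and variable names are illustrative only), the left- and right-channel energies of a non-highest-energy stereo object signal can be recovered as follows:

def stereo_object_energies(max_left, max_right, ratio_left, ratio_right):
    # max_left, max_right: absolute left/right energy levels A and B of the
    #                      highest-energy object signal
    # ratio_left, ratio_right: ratios x and y of the non-highest-energy
    #                          object's channel energies to A and B
    left = max_left * ratio_left       # A * x
    right = max_right * ratio_right    # B * y
    return left, right, left / right   # left/right energy ratio of the object

print(stereo_object_energies(80.0, 60.0, 0.25, 0.5))   # (20.0, 30.0, 0.666...)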
The absolute value of the energy level of the highest-energy object signal and the ratios of the energy levels of the other, non-highest-energy object signals to the energy level of the highest-energy object signal may also be used when the object signals are mono signals, the downmix signal obtained from the mono object signals is a stereo signal, and the mono object signals are included in both channels of the stereo downmix signal. In this case, the energy of the portion of each mono object signal included in the left channel of the stereo downmix signal, the energy of the portion of the mono object signal included in the right channel of the stereo downmix signal, and correlation information are necessary, and they apply directly to stereo object signals as well. If a mono object signal is included in both the L and R channels of a stereo downmix signal, the L- and R-channel components of the mono object signal may differ only in level, and the mono object signal may then have a correlation value of 1 over the entire parameter bands. In this case, in order to reduce the amount of data, information indicating that the mono object signal has a correlation value of 1 over the entire parameter bands may additionally be provided. Then, there is no need to indicate a correlation value of 1 for each parameter band; instead, the correlation value of 1 can be indicated once for the entire parameter bands.
Clipping may occur when a downmix signal is generated by adding a plurality of object signals together. In order to address this, the downmix signal may be multiplied by a predefined gain so that the maximum level of the downmix signal does not exceed a clipping threshold. The predefined gain may vary over time. Therefore, information regarding the predefined gain is necessary. If the downmix signal is a stereo signal, different gain values may be provided for the L and R channels of the downmix signal in order to prevent clipping. In order to reduce the amount of data transmission, the different gain values may not be transmitted separately. Instead, the sum of the different gain values and the ratio between the different gain values may be transmitted. Then, compared to transmitting the different gain values separately, the dynamic range can be lowered and the amount of data transmission can be reduced.
In order to further reduce the amount of data transmission, a bit indicating whether clipping has occurred during the generation of the downmix signal by adding the plurality of object signals together may be provided. Then, gain values are transmitted only when it is determined that clipping has occurred. Such clipping information is necessary for preventing clipping when summing a plurality of downmix signals in order to merge a plurality of bitstreams. In order to prevent clipping, the sum of the plurality of downmix signals may be multiplied by the inverse of the predefined gain value.
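A minimal sketch of this clipping-prevention idea follows (Python, with illustrative names; the threshold handling and the shape of the gain are assumptions, not the exact embodiment): the encoder scales the summed downmix so its peak stays below a clipping threshold, transmits the gain and a one-bit flag, and the decoder applies the inverse gain.

def prevent_clipping(downmix, threshold=1.0):
    # downmix: list of PCM samples obtained by summing the object signals
    peak = max(abs(s) for s in downmix)
    gain = min(1.0, threshold / peak) if peak > 0 else 1.0
    clipped = peak > threshold                      # 1-bit flag: was scaling needed?
    scaled = [s * gain for s in downmix]
    return scaled, gain, clipped

def restore_level(scaled, gain):
    # Decoder side: multiply by the inverse of the transmitted gain.
    return [s / gain for s in scaled]

mix, g, flag = prevent_clipping([0.4, -1.6, 0.9])
print(g, flag, restore_level(mix, g))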
Figures 13 through 16 are block diagrams for explaining various methods of configuring object-based side information. The embodiments of Figures 13 through 16 can be applied not only to mono or stereo object signals but also to multi-channel object signals.
Referring to Figure 13, a multi-channel object signal (OBJECT A (channel 1) through OBJECT A (channel n)) is input to an object encoder 221. The object encoder 221 then generates a downmix signal and side information from the multi-channel object signal (OBJECT A (channel 1) through OBJECT A (channel n)). An object encoder 223 receives a plurality of object signals OBJECT1 through OBJECTn and the downmix signal generated by the object encoder 221, and generates another downmix signal and other side information from the object signals OBJECT1 through OBJECTn and the received downmix signal. A multiplexer 225 combines the side information generated by the object encoder 221 and the side information generated by the object encoder 223.
Referring to Figure 14, an object encoder 233 generates a first bitstream from a multi-channel object signal (OBJECT A (channel 1) through OBJECT A (channel n)), and an object encoder 231 generates a second bitstream from a plurality of non-multi-channel object signals OBJECT1 through OBJECTn. Then, an object encoder 235 merges the first and second bitstreams into a single bitstream using a method almost the same as that used for merging a plurality of bitstreams into a single bitstream with the aid of an MCU.
Referring to Figure 15, a multi-channel encoder 241 generates a downmix signal and channel-based side information from a multi-channel object signal (OBJECT A (channel 1) through OBJECT A (channel n)). An object encoder 243 receives the downmix signal generated by the multi-channel encoder 241 and a plurality of non-multi-channel object signals OBJECT1 through OBJECTn, and generates an object bitstream and side information from the received downmix signal and the object signals OBJECT1 through OBJECTn. A multiplexer 245 combines the channel-based side information generated by the multi-channel encoder 241 and the side information generated by the object encoder 243, and outputs the result of the combination.
Referring to Figure 16, a multi-channel encoder 253 generates a downmix signal and channel-based side information from a multi-channel object signal (OBJECT A (channel 1) through OBJECT A (channel n)). An object encoder 251 generates a downmix signal and side information from a plurality of non-multi-channel object signals OBJECT1 through OBJECTn. An object encoder 255 receives the downmix signal generated by the multi-channel encoder 253 and the downmix signal generated by the object encoder 251, and combines the received downmix signals. A multiplexer 257 combines the side information generated by the object encoder 251 and the channel-based side information generated by the multi-channel encoder 253, and outputs the result of the combination.
When object-based audio coding is used in teleconferencing, it is sometimes necessary to merge a plurality of object bitstreams into a single bitstream. The merging of a plurality of object bitstreams into a single bitstream will hereinafter be described in detail.
Figure 17 is a block diagram for explaining the merging of two object bitstreams. Referring to Figure 17, when two object bitstreams are merged into a single object bitstream, side information such as CLD and ICC information present in each of the two object bitstreams needs to be modified. The two object bitstreams can be merged into a single object bitstream simply by using an additional OTT box, i.e., an eleventh OTT box, and by using the side information, such as CLD and ICC information, provided by the eleventh OTT box.
In order to merge the two object bitstreams into a single object bitstream, the tree configuration information of each of the two object bitstreams must be incorporated into the tree configuration information of the merged bitstream. For this purpose, any additional configuration information generated by the merging of the two object bitstreams may be modified, the index numbers of the OTT boxes used to generate the two object bitstreams may be renumbered, and only a small amount of additional processing is required, namely the computation performed by the eleventh OTT box and the downmixing of the two downmix signals of the two object bitstreams. In this way, the two object bitstreams can easily be merged into a single object bitstream without the need to modify the information regarding each of the plurality of object signals, thereby providing a simple method of generating a single bitstream from two bitstreams.
Referring to Figure 17, the eleventh OTT box is optional. In this case, the two downmix signals of the two object bitstreams may be kept and used as a two-channel downmix signal. Then, the two object bitstreams can be merged into a single object bitstream without additional computation.
Figure 18 is a block diagram for explaining the merging of two or more independent object bitstreams into a single object bitstream having a stereo downmix signal. Referring to Figure 18, if the two or more independent object bitstreams have different numbers of parameter bands, parameter-band mapping may be performed on the object bitstreams so that the number of parameter bands of the object bitstream having fewer parameter bands is increased to match the number of parameter bands of the other object bitstream.
More specifically, the parameter-band mapping may be performed using a predetermined mapping table. In this case, a simple linear formula may be used for the parameter-band mapping.
If there are overlapping parameter bands, the parameter values may be mixed appropriately in consideration of the amounts by which the overlapping parameter bands overlap each other. In situations where low complexity is the priority, parameter-band mapping may instead be performed on the two object bitstreams so that the number of parameter bands of whichever of the two object bitstreams has more parameter bands is reduced to match the number of parameter bands of the other object bitstream.
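The following Python sketch illustrates one plausible form of such parameter-band mapping (a simple linear interpolation of per-band parameter values; the exact mapping table of the embodiment is not specified here, so this is only an assumed realization):

def map_parameter_bands(values, target_count):
    # values: per-band parameter values of the bitstream with fewer bands
    # target_count: number of bands of the other bitstream
    src = len(values)
    mapped = []
    for t in range(target_count):
        # Linear position of the target band in source-band coordinates
        pos = t * (src - 1) / (target_count - 1) if target_count > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, src - 1)
        frac = pos - lo
        mapped.append((1.0 - frac) * values[lo] + frac * values[hi])
    return mapped

print(map_parameter_bands([3.0, 6.0, 9.0], 5))   # [3.0, 4.5, 6.0, 7.5, 9.0]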
In the embodiments of Figures 17 and 18, two or more independent object bitstreams can be merged into a single merged object bitstream without the need to recalculate the existing parameters of the independent object bitstreams. However, in the case of merging a plurality of downmix signals, parameters regarding the merged downmix signal may need to be calculated again through QMF/hybrid analysis. Since this calculation requires a large amount of computation, it compromises the benefits of the embodiments of Figures 17 and 18. Therefore, a method needs to be devised for extracting parameters, even when the downmix signals are downmixed together, without requiring QMF/hybrid analysis or synthesis. For this purpose, energy information regarding the energy of each parameter band of each downmix signal may be included in the corresponding object bitstream. Then, when the downmix signals are downmixed together, information such as CLD information can easily be calculated from such energy information, without QMF/hybrid analysis or synthesis. The energy information may represent the highest energy level of each parameter band, or the absolute value of the energy level of the highest-energy object signal of each parameter band. The amount of computation can be reduced further by using ICC values obtained from the time domain for the entire parameter bands.
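A minimal sketch of that idea (Python; the band-energy layout and the CLD sign convention are assumptions made here for illustration) computes a CLD-like inter-downmix level difference per parameter band directly from the stored band energies, without any QMF/hybrid analysis:

import math

def cld_from_band_energies(energies_a, energies_b, eps=1e-12):
    # energies_a, energies_b: per-parameter-band energies stored in the two
    # object bitstreams for their respective downmix signals
    return [10.0 * math.log10((ea + eps) / (eb + eps))
            for ea, eb in zip(energies_a, energies_b)]

print(cld_from_band_energies([4.0, 1.0, 9.0], [1.0, 1.0, 3.0]))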
Clipping may occur during the downmixing of a plurality of downmix signals. In order to address this, the levels of the downmix signals may be reduced. If the levels of the downmix signals are reduced, level information regarding the reduced levels of the downmix signals may need to be included in the object bitstream. The level information for preventing clipping may be applied to each frame of an object bitstream, or only to frames in which clipping occurs. The levels of the original downmix signals can be calculated during a decoding operation by inversely applying the level information used for preventing clipping. The level information for preventing clipping can be calculated in the time domain and thus does not require QMF/hybrid synthesis or analysis. The merging of a plurality of object signals into a single object bitstream may be performed using the structure shown in Figure 12, and this will hereinafter be described in detail with reference to Figure 19.
Figure 19 is a block diagram for explaining the merging of two independent object bitstreams into a single object bitstream. Referring to Figure 19, a first box 261 generates a first object bitstream, and a second box 263 generates a second object bitstream. Then, a third box 265 generates a third object bitstream by merging the first and second bitstreams. In this case, if the first and second object bitstreams include information regarding the absolute value of the energy level of the highest-energy object signal of each parameter band, the ratios of the energy levels of the other, non-highest-energy object signals to the energy level of the highest-energy object signal, and gain information regarding the gain values by which the downmix signals from the first and second boxes 261 and 263 are multiplied, the third box 265 can generate the third object bitstream simply by combining the first and second bitstreams, without additional parameter calculation or extraction.
The third box 265 receives a plurality of downmix signals DOWNMIX_A and DOWNMIX_B. The third box 265 converts the downmix signals DOWNMIX_A and DOWNMIX_B into PCM signals and adds the PCM signals together, thereby generating a single downmix signal. During this operation, however, clipping may occur. In order to address this, the downmix signals DOWNMIX_A and DOWNMIX_B may be multiplied by a predefined gain value. Information regarding the predefined gain value may be included in the third object bitstream and transmitted along with the third object bitstream.
The merging of a plurality of object bitstreams into a single object bitstream will hereinafter be described in further detail. Referring to Figure 19, the side information SIDE_INFO_A may include information indicating which of the plurality of object signals OBJECT1 through OBJECTn is the highest-energy object signal, and the ratios of the energy levels of the other, non-highest-energy object signals to the energy level of the highest-energy object signal. Likewise, the side information SIDE_INFO_B may include the same kind of information as SIDE_INFO_A, namely information indicating which of the plurality of object signals OBJECT1 through OBJECTn is the highest-energy object signal, and the ratios of the energy levels of the other, non-highest-energy object signals to the energy level of the highest-energy object signal.
As shown in Figure 20, SIDE_INFO_A and SIDE_INFO_B may be included in parallel in one bitstream. In this case, a bit indicating whether more than one bitstream is present in parallel may additionally be provided.
Referring to Figure 20, in order to indicate whether a predetermined bitstream is a merged bitstream containing more than one bitstream, information indicating that the predetermined bitstream is a merged bitstream and information regarding the number of bitstreams it contains may be included in the predetermined bitstream. In addition, information regarding the start position of each bitstream within the predetermined bitstream may be provided in the header of the predetermined bitstream, followed by the bitstreams themselves. In this case, a decoder can determine whether the predetermined bitstream is a merged bitstream containing more than one bitstream by analyzing the information placed in the header of the predetermined bitstream. This bitstream-merging method requires no additional processing other than adding a few identifiers to the bitstream. However, such identifiers need to be provided at intervals of a number of frames, and this bitstream-merging method requires the decoder to determine, for every bitstream it receives, whether the received bitstream is a merged bitstream.
As an alternative to the above bitstream-merging method, a plurality of bitstreams may be merged into a single bitstream in such a manner that a decoder cannot recognize whether the bitstreams have been merged. This will hereinafter be described in detail with reference to Figure 21.
Referring to Figure 21, the energy level of the highest-energy object signal represented by SIDE_INFO_A and the energy level of the highest-energy object signal represented by SIDE_INFO_B are compared. Then, whichever of the two object signals has the higher energy level is determined to be the highest-energy object signal of the merged bitstream. For example, if the energy level of the highest-energy object signal represented by SIDE_INFO_A is higher than the energy level of the highest-energy object signal represented by SIDE_INFO_B, the highest-energy object signal represented by SIDE_INFO_A becomes the highest-energy object signal of the merged bitstream. Then, the energy-ratio information of SIDE_INFO_A can be used in the merged bitstream as it is, whereas the energy-ratio information of SIDE_INFO_B is multiplied by the ratio between the energy levels of the highest-energy object signals of A and B.
Then, the energy-ratio information of whichever of SIDE_INFO_A and SIDE_INFO_B includes the information regarding the highest-energy object signal of the merged bitstream, together with the rescaled energy-ratio information of the other of SIDE_INFO_A and SIDE_INFO_B, can be used in the merged bitstream. This method involves recalculating the energy-ratio information of SIDE_INFO_B; however, this recalculation is not complicated. In this method, a decoder cannot determine whether a received bitstream is a merged bitstream containing more than one bitstream, and a typical decoding method can therefore be used.
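A compact Python sketch of this merging rule follows (the data layout is invented here for illustration only: each side information is modelled as a maximum energy plus the ratios of its remaining objects to that maximum):

def merge_side_info(max_a, ratios_a, max_b, ratios_b):
    # max_a, max_b: energy levels of the highest-energy objects of bitstreams A and B
    # ratios_a, ratios_b: energy ratios of their other objects to their own maximum
    if max_a >= max_b:
        scale = max_b / max_a          # rescale B's ratios to the new reference
        merged = list(ratios_a) + [scale] + [r * scale for r in ratios_b]
        return max_a, merged
    scale = max_a / max_b              # otherwise rescale A's ratios instead
    merged = [r * scale for r in ratios_a] + [scale] + list(ratios_b)
    return max_b, merged

print(merge_side_info(10.0, [0.5, 0.2], 4.0, [0.75]))
# (10.0, [0.5, 0.2, 0.4, 0.3])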
Two object bitstreams containing stereo downmix signals can also easily be merged into a single object bitstream, without recalculating the information regarding the object signals, by using a method almost the same as that used for merging object bitstreams containing mono downmix signals. In an object bitstream, information regarding the tree structure used for downmixing the object signals is followed by the object signal information obtained from each branch (i.e., each box) of the tree structure.
The object bitstream has been described above on the assumption that a particular object is allocated only to either the left channel or the right channel of a stereo downmix signal. In general, however, object signals are distributed over both channels of a stereo downmix signal. Therefore, how to generate an object bitstream based on object signals distributed over both channels of a stereo downmix signal will hereinafter be described in detail.
Figure 22 is a block diagram for explaining a method of generating a stereo downmix signal by mixing a plurality of object signals and, more particularly, a method of downmixing four object signals OBJECT1 through OBJECT4 into L and R stereo signals. For example, the first object signal OBJECT1 is distributed between the L and R channels at a ratio of a:b, as shown in Equation (3):
[formula 3]
$Eng_{Obj1}^{L} = \dfrac{a}{a+b}\,Eng_{Obj1}$
$Eng_{Obj1}^{R} = \dfrac{b}{a+b}\,Eng_{Obj1}$
If an object signal is distributed between the L and R channels of a stereo downmix signal, channel distribution ratio information regarding the ratio (a:b) at which the object signal is distributed between the L and R channels may additionally be necessary. Then, information regarding the object signal, such as CLD and ICC information, may be calculated by performing downmixing with OTT boxes for each of the L and R channels of the stereo downmix signal, as will hereinafter be described in detail with reference to Figure 23.
Referring to Figure 23, once CLD and ICC information is obtained from the plurality of OTT boxes during the downmixing operation and the channel distribution ratio information of each of the plurality of object signals is provided, a multi-channel bitstream can be calculated that varies adaptively according to any modification made by an end user to the object position information and the reproduction configuration information. In addition, if the stereo downmix signal needs to be processed through downmix preprocessing, information regarding how the stereo downmix signal is to be processed during the downmix preprocessing can be obtained and transmitted to a preprocessor. That is, without the channel distribution ratio information of each of the plurality of object signals, there is no way to calculate a multi-channel bitstream or to obtain the information necessary for the operation of the preprocessor. The channel distribution ratio information of an object signal may be represented as a ratio of two integers or as a scalar (in dB).
As described above, if an object signal is distributed between the two channels of a stereo downmix signal, the channel distribution ratio information of the object signal may be necessary. The channel distribution ratio information may be a fixed value indicating the ratio at which the object signal is distributed between the two channels of the stereo downmix signal. Alternatively, the channel distribution ratio information of an object signal may vary from one frequency band of the object signal to another, particularly when the channel distribution ratio information is used as ICC information. If the stereo downmix signal is obtained through a complicated downmixing operation, for example, if an object signal belongs to both channels of the stereo downmix signal and is downmixed with ICC information that varies from one frequency band of the object signal to another, a detailed description of how the object signal has been downmixed may additionally be necessary in order to decode the final rendered object signal. This embodiment can be applied to all the possible object structures described above.
Preprocessing will hereinafter be described in detail with reference to Figures 24 through 27. If a downmix signal input to an object decoder is a stereo signal, the input downmix signal needs to be preprocessed before being input to the multi-channel decoder of the object decoder, because the multi-channel decoder cannot map a signal belonging to the left channel of the input downmix signal to the right channel.
Therefore, in order for an end user to be able to shift the position of an object signal belonging to the left channel of the input downmix signal to the right channel, the input downmix signal needs to be preprocessed, and the preprocessed downmix signal may then be input to the multi-channel decoder.
The preprocessing of a stereo downmix signal may be performed by obtaining preprocessing information from the object bitstream and from the rendering matrix, and by appropriately processing the stereo downmix signal according to the preprocessing information, as will hereinafter be described in detail.
Figure 24 is a block diagram for explaining how a stereo downmix signal is configured based on four object signals OBJECT1 through OBJECT4. Referring to Figure 24, the first object signal OBJECT1 is distributed between the L and R channels at a ratio of a:b, the second object signal OBJECT2 is distributed between the L and R channels at a ratio of c:d, the third object signal OBJECT3 is allocated only to the L channel, and the fourth object signal OBJECT4 is allocated only to the R channel. Information such as CLD and ICC can be generated by passing each of the first through fourth object signals OBJECT1 through OBJECT4 through a number of OTT boxes, and a downmix signal can be generated based on the generated information.
Assume that an end user obtains a rendering matrix by appropriately setting the positions and levels of the first through fourth object signals OBJECT1 through OBJECT4, and that there are five channels. The rendering matrix may be represented by Equation (4):
[formula 4]
$\begin{bmatrix} 30 & 10 & 20 & 30 & 10 \\ 10 & 30 & 20 & 10 & 30 \\ 22 & 22 & 22 & 22 & 22 \\ 21 & 21 & 31 & 11 & 11 \end{bmatrix}$
Referring to Equation (4), when the sum of the five coefficients of one of the four rows equals a predefined reference value, i.e., 100, it is determined that the level of the corresponding object signal has not changed. The difference between the sum of the five coefficients of a row and the predefined reference value is the amount (in dB) by which the level of the corresponding object signal has changed. The first, second, third, fourth and fifth columns of the rendering matrix of Equation (4) represent the FL, FR, C, RL and RR channels, respectively.
The first row of the rendering matrix of Equation (4) corresponds to the first object signal OBJECT1 and has a total of five coefficients, namely 30, 10, 20, 30 and 10. Since the sum of the five coefficients of the first row is 100, it is determined that the level of the first object signal OBJECT1 has not changed and that only its spatial position has changed. Even though the five coefficients of the first row represent different channel directions, they can be roughly classified into two channels: the L and R channels. Then, the ratio at which the first object signal OBJECT1 is distributed between the L and R channels can be calculated as 70% (= 30 + 30 + 20×0.5) : 30% (= 10 + 10 + 20×0.5). Therefore, the rendering matrix of Equation (4) indicates that the level of the first object signal OBJECT1 has not changed and that the first object signal OBJECT1 is distributed between the L and R channels at a ratio of 70%:30%. If the sum of the five coefficients of any row of the rendering matrix of Equation (4) is less than or greater than 100, it is determined that the level of the corresponding object signal has changed, and the corresponding object signal may then be processed through preprocessing, or converted into ADG and transmitted.
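As a small illustration of reading a rendering-matrix row in this way (Python; the channel ordering FL, FR, C, RL, RR and the reference value of 100 are taken from the description above, everything else is illustrative):

def analyse_row(row, reference=100.0):
    # row: [FL, FR, C, RL, RR] coefficients of one object signal
    fl, fr, c, rl, rr = row
    level_change_db = sum(row) - reference        # deviation from the reference value
    left = fl + rl + 0.5 * c                      # contribution to the L side
    right = fr + rr + 0.5 * c                     # contribution to the R side
    return level_change_db, left, right

print(analyse_row([30, 10, 20, 30, 10]))   # (0.0, 70.0, 30.0)
print(analyse_row([21, 21, 31, 11, 11]))   # (-5.0, 47.5, 47.5)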
In order to preprocess a downmix signal, the distribution ratios of the downmix signal may be calculated for each parameter band, the parameters of the parameter bands being extracted from a signal obtained by subjecting the downmix signal to a QMF/hybrid transform, and the downmix signal may then be redistributed over the parameter bands according to the settings of the rendering matrix. Various methods of redistributing a downmix signal over the parameter bands will hereinafter be described in detail.
In a first redistribution method, the L- and R-channel downmix signals are decoded separately using their respective side information (for example, CLD and ICC information) and using a method almost the same as that used by a multi-channel codec. Then, the object signals distributed over the L- and R-channel downmix signals are restored. In order to reduce the amount of computation, the L- and R-channel downmix signals may be decoded using only the CLD information. The ratio at which each restored object signal is distributed between the L- and R-channel downmix signals can be determined based on the side information.
Each of the restored object signals may then be redistributed between the L- and R-channel downmix signals according to the rendering matrix. Then, the redistributed object signals are downmixed channel by channel using OTT boxes, thereby completing the channel-based preprocessing. In short, the first redistribution method adopts the same approach as a multi-channel codec. However, the first redistribution method requires, for each channel, as many decoding processes as there are object signals, and additionally requires a redistribution process and a channel-based downmixing process.
In a second redistribution method, unlike in the first redistribution method, the object signals are not restored from the L- and R-channel downmix signals. Instead, each of the L- and R-channel downmix signals is divided into two parts: as shown in Figure 25, one part L_L or R_R that remains in the corresponding channel, and another part L_R or R_L that is redistributed. Referring to Figure 25, L_L indicates the part of the L-channel downmix signal that should remain in the L channel, and L_R indicates the part of the L-channel downmix signal that should be added to the R channel. Likewise, R_R indicates the part of the R-channel downmix signal that should remain in the R channel, and R_L indicates the part of the R-channel downmix signal that should be added to the L channel. Each of the L- and R-channel downmix signals may be divided into the two parts (L_L and L_R, or R_R and R_L) according to the ratio at which each object signal is distributed between the L- and R-channel downmix signals, as defined by Equation (2), and the ratio at which each object signal should be distributed between the preprocessed L and R channels, as defined by Equation (3). Therefore, how the L- and R-channel downmix signals should be redistributed between the preprocessed L and R channels can be determined by comparing, for each object signal, the ratio at which the object signal is distributed between the L- and R-channel downmix signals with the ratio at which the object signal should be distributed between the preprocessed L and R channels.
The division of the L-channel signal into the signals L_L and L_R according to a predefined energy ratio has been described above. Once the L-channel signal is divided into the signals L_L and L_R, the ICC between the signals L_L and L_R needs to be determined. The ICC between the signals L_L and L_R can easily be determined based on the ICC information regarding the object signals, that is, based on the ratio at which each object signal is distributed between the signals L_L and L_R.
The second downmix redistribution method will hereinafter be described in further detail. Assume that L- and R-channel downmix signals L and R are obtained by the method illustrated in Figure 24, and that the first, second, third and fourth object signals OBJECT1, OBJECT2, OBJECT3 and OBJECT4 are distributed between the L- and R-channel downmix signals L and R at ratios of 1:2, 2:3, 1:0 and 0:1, respectively. A plurality of object signals may be downmixed by a number of OTT boxes, and information such as CLD and ICC information may be obtained from the downmixing of the object signals.
An example of a rendering matrix established for the first through fourth object signals OBJECT1 through OBJECT4 is represented by Equation (4). The rendering matrix includes the position information of the first through fourth object signals OBJECT1 through OBJECT4. Then, preprocessed L- and R-channel downmix signals L and R can be obtained by performing preprocessing using the rendering matrix. How the rendering matrix is established and interpreted has been described above with reference to Equation (3).
The ratio at which each of the first through fourth object signals OBJECT1 through OBJECT4 is distributed between the preprocessed L- and R-channel downmix signals L and R can be calculated using Equation (5):
[formula 5]
Object 1: $Eng_{Obj1}^{L'} = 30 + 30 + 20\times 0.5 = 70, \quad Eng_{Obj1}^{R'} = 10 + 10 + 20\times 0.5 = 30$
$Eng_{Obj1}^{L'} : Eng_{Obj1}^{R'} = 70 : 30$
Object 2: $Eng_{Obj2}^{L'} = 10 + 10 + 20\times 0.5 = 30, \quad Eng_{Obj2}^{R'} = 30 + 30 + 20\times 0.5 = 70$
$Eng_{Obj2}^{L'} : Eng_{Obj2}^{R'} = 30 : 70$
Object 3: $Eng_{Obj3}^{L'} = 22 + 22 + 22\times 0.5 = 55, \quad Eng_{Obj3}^{R'} = 22 + 22 + 22\times 0.5 = 55$
$Eng_{Obj3}^{L'} : Eng_{Obj3}^{R'} = 55 : 55$
Object 4: $Eng_{Obj4}^{L'} = 21 + 11 + 31\times 0.5 = 47.5, \quad Eng_{Obj4}^{R'} = 21 + 11 + 31\times 0.5 = 47.5$
$Eng_{Obj4}^{L'} : Eng_{Obj4}^{R'} = 47.5 : 47.5$
The ratio at which each of the first through fourth object signals OBJECT1 through OBJECT4 is distributed between the L- and R-channel downmix signals L and R can be calculated using Equation (6):
[formula 6]
Object 1: $Eng_{Obj1}^{L} : Eng_{Obj1}^{R} = 1 : 2$
Object 2: $Eng_{Obj2}^{L} : Eng_{Obj2}^{R} = 2 : 3$
Object 3: $Eng_{Obj3}^{L} : Eng_{Obj3}^{R} = 1 : 0$
Object 4: $Eng_{Obj4}^{L} : Eng_{Obj4}^{R} = 0 : 1$
Referring to Equation (5), the sum of the portion of the third object signal OBJECT3 distributed to the preprocessed L-channel downmix signal and the portion of the third object signal OBJECT3 distributed to the preprocessed R-channel downmix signal is 110, and it is therefore determined that the level of the third object signal OBJECT3 has increased by 10. On the other hand, the sum of the portion of the fourth object signal OBJECT4 distributed to the preprocessed L-channel downmix signal and the portion of the fourth object signal OBJECT4 distributed to the preprocessed R-channel downmix signal is 95, and it is therefore determined that the level of the fourth object signal OBJECT4 has decreased by 5. If the rendering matrix for the first through fourth object signals OBJECT1 through OBJECT4 has a reference value of 100, and the deviation of the sum of the coefficients of each row of the rendering matrix from the reference value of 100 represents the amount (in dB) by which the level of the corresponding object signal has changed, it can be determined that the level of the third object signal OBJECT3 has increased by 10 dB and that the level of the fourth object signal OBJECT4 has decreased by 5 dB.
Equations (5) and (6) can be rearranged into Equation (7):
[formula 7]
Object 1: $Eng_{Obj1}^{L} : Eng_{Obj1}^{R} = 33.3 : 66.7 \;\rightarrow\; Eng_{Obj1}^{L'} : Eng_{Obj1}^{R'} = 70 : 30$
Object 2: $Eng_{Obj2}^{L} : Eng_{Obj2}^{R} = 40 : 60 \;\rightarrow\; Eng_{Obj2}^{L'} : Eng_{Obj2}^{R'} = 30 : 70$
Object 3: $Eng_{Obj3}^{L} : Eng_{Obj3}^{R} = 100 : 0 \;\rightarrow\; Eng_{Obj3}^{L'} : Eng_{Obj3}^{R'} = 50 : 50$
Object 4: $Eng_{Obj4}^{L} : Eng_{Obj4}^{R} = 0 : 100 \;\rightarrow\; Eng_{Obj4}^{L'} : Eng_{Obj4}^{R'} = 50 : 50$
Equation (7) shows, for each of the first through fourth object signals OBJECT1 through OBJECT4, the ratio at which the object signal is distributed between the L- and R-channel downmix signals before the preprocessing and the ratio at which the object signal is distributed between the preprocessed L- and R-channel downmix signals after the preprocessing. Therefore, by using Equation (7), it can easily be determined how much of each of the first through fourth object signals OBJECT1 through OBJECT4 should be redistributed by the preprocessing. For example, referring to Equation (7), the ratio at which the second object signal OBJECT2 is distributed between the L- and R-channel downmix signals changes from 40:60 to 30:70, and it can therefore be determined that one quarter (25%) of the portion of the second object signal OBJECT2 previously allocated to the L-channel downmix signal needs to be shifted to the R-channel downmix signal. This becomes more apparent with reference to Equation (8):
[formula 8]
Object 1: 55% of the portion of OBJECT1 previously allocated to R needs to be shifted to L
Object 2: 25% of the portion of OBJECT2 previously allocated to L needs to be shifted to R
Object 3: 50% of the portion of OBJECT3 previously allocated to L needs to be shifted to R
Object 4: 50% of the portion of OBJECT4 previously allocated to R needs to be shifted to L
By using Equation (8), the signals L_L, L_R, R_L and R_R of Figure 25 can be represented by Equation (9):
[formula 9]
$Eng_{L\_L} = Eng_{Obj1}^{L} + 0.75\,Eng_{Obj2}^{L} + 0.5\,Eng_{Obj3}$
$Eng_{L\_R} = 0.25\,Eng_{Obj2}^{L} + 0.5\,Eng_{Obj3}$
$Eng_{R\_L} = 0.55\,Eng_{Obj1}^{R} + 0.5\,Eng_{Obj4}$
$Eng_{R\_R} = 0.45\,Eng_{Obj1}^{R} + Eng_{Obj2}^{R} + 0.5\,Eng_{Obj4}$
The value of each object signal in Equation (9) can be represented, by using dequantized CLD information provided by the OTT boxes, in terms of the ratio at which the corresponding object signal is distributed between the L and R channels, as shown in Equation (10):
[formula 10]
$Eng_{Obj1}^{L} = \dfrac{10^{CLD_2/10}}{1+10^{CLD_2/10}}\cdot\dfrac{10^{CLD_1/10}}{1+10^{CLD_1/10}}\,Eng_{L}, \quad Eng_{Obj2}^{L} = \dfrac{10^{CLD_2/10}}{1+10^{CLD_2/10}}\cdot\dfrac{1}{1+10^{CLD_1/10}}\,Eng_{L}$
$Eng_{Obj1}^{R} = \dfrac{10^{CLD_4/10}}{1+10^{CLD_4/10}}\cdot\dfrac{10^{CLD_3/10}}{1+10^{CLD_3/10}}\,Eng_{R}, \quad Eng_{Obj2}^{R} = \dfrac{10^{CLD_4/10}}{1+10^{CLD_4/10}}\cdot\dfrac{1}{1+10^{CLD_3/10}}\,Eng_{R}$
$Eng_{Obj3} = \dfrac{1}{1+10^{CLD_2/10}}\,Eng_{L}, \quad Eng_{Obj4} = \dfrac{1}{1+10^{CLD_4/10}}\,Eng_{R}$
The CLD used in each of the parsing blocks of Figure 25 can be determined using Equation (11):
[formula 11]
$CLD_{pars1} = 10\log_{10}\!\left(\dfrac{Eng_{L\_L}+\epsilon}{Eng_{L\_R}+\epsilon}\right)$, where $\epsilon$ is a constant for avoiding division by zero, for example, a value 96 dB below the maximum signal level.
$CLD_{pars2} = 10\log_{10}\!\left(\dfrac{Eng_{R\_L}+\epsilon}{Eng_{R\_R}+\epsilon}\right)$
In this manner, the CLD and ICC information used by a parsing block to generate the signals L_L and L_R can be determined from the L-channel downmix signal, and the CLD and ICC information used by a parsing block to generate the signals R_L and R_R can be determined from the R-channel downmix signal. As shown in Figure 25, once the signals L_L, L_R, R_L and R_R are obtained, they may be added together crosswise (R_L to L_L, and L_R to R_R), thereby obtaining a preprocessed stereo downmix signal. If the final channels are stereo channels, the L- and R-channel downmix signals obtained by the preprocessing may be output. In this case, any variation in the level of each object signal still needs to be adjusted. For this purpose, a predetermined module that performs the function of an ADG module may additionally be provided. The information for adjusting the level of each object signal may be calculated using the same method used for calculating ADG information, which will be described later in further detail. Alternatively, the levels of the object signals may be adjusted during the preprocessing operation, in which case the adjustment of the level of each object signal may be performed using the same method used for processing ADG. In the embodiment of Figure 25, alternatively, as shown in Figure 26, a decorrelation operation may be performed by a decorrelator and a mixer, instead of by the parsing blocks PARSING 1 and PARSING 2, in order to adjust the correlation between the signals L and R obtained by the mixing. Referring to Figure 26, Pre_L and Pre_R indicate L- and R-channel signals obtained through level adjustment. One of the signals Pre_L and Pre_R is input to the decorrelator and then subjected to a mixing operation performed by the mixer, thereby obtaining a correlation-adjusted signal.
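The following Python sketch ties Equations (8), (9) and (11) together for the numerical example above (it operates on per-band energies only; the ε handling and the names are illustrative assumptions, not the exact embodiment):

import math

def parsing_energies(l_obj1, l_obj2, obj3, r_obj1, r_obj2, obj4):
    # Per-object energies in the L- and R-channel downmix signals (Equation (10)),
    # redistributed according to the shift percentages of Equation (8).
    l_l = l_obj1 + 0.75 * l_obj2 + 0.5 * obj3
    l_r = 0.25 * l_obj2 + 0.5 * obj3
    r_l = 0.55 * r_obj1 + 0.5 * obj4
    r_r = 0.45 * r_obj1 + r_obj2 + 0.5 * obj4
    return l_l, l_r, r_l, r_r

def parsing_clds(l_l, l_r, r_l, r_r, eps=1e-10):
    # Equation (11): CLDs driving the two parsing blocks of Figure 25.
    cld1 = 10.0 * math.log10((l_l + eps) / (l_r + eps))
    cld2 = 10.0 * math.log10((r_l + eps) / (r_r + eps))
    return cld1, cld2

e = parsing_energies(10.0, 8.0, 30.0, 20.0, 12.0, 28.0)
print(e, parsing_clds(*e))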
The preprocessed stereo downmix signal can be input to a multichannel decoder. In order to provide multichannel output that is consistent with the object positions set by the end user and with the playback configuration information, not only the preprocessed downmix signal but also channel-based side information for performing the multichannel decoding is needed. How the channel-based side information is obtained is described in detail below by revisiting the example above. Based on formula (5), the preprocessed downmix signals L and R input to the multichannel decoder can be expressed by formula (12):
[formula 12]
Eng_{L'} = Eng_{L\_L} + Eng_{R\_L} = 0.7 \cdot Eng_{Obj1} + 0.3 \cdot Eng_{Obj2} + 0.5 \cdot Eng_{Obj3} + 0.5 \cdot Eng_{Obj4}
Eng_{R'} = Eng_{L\_R} + Eng_{R\_R} = 0.3 \cdot Eng_{Obj1} + 0.7 \cdot Eng_{Obj2} + 0.5 \cdot Eng_{Obj3} + 0.5 \cdot Eng_{Obj4}
The ratio in which each of the first through fourth object signals object1 through object4 is distributed among the FL, RL, C, FR and RR channels can be determined by formula (13):
[formula 13]
Eng_{FL} = 0.3 \cdot Eng_{Obj1} + 0.1 \cdot Eng_{Obj2} + 0.2 \cdot Eng_{Obj3} + 0.21 \cdot \frac{100}{95} \cdot Eng_{Obj4}
Eng_{RL} = 0.3 \cdot Eng_{Obj1} + 0.1 \cdot Eng_{Obj2} + 0.2 \cdot Eng_{Obj3} + 0.11 \cdot \frac{100}{95} \cdot Eng_{Obj4}
Eng_{C} = 0.2 \cdot Eng_{Obj1} + 0.2 \cdot Eng_{Obj2} + 0.2 \cdot Eng_{Obj3} + 0.31 \cdot \frac{100}{95} \cdot Eng_{Obj4}
Eng_{FR} = 0.1 \cdot Eng_{Obj1} + 0.3 \cdot Eng_{Obj2} + 0.2 \cdot Eng_{Obj3} + 0.21 \cdot \frac{100}{95} \cdot Eng_{Obj4}
Eng_{RR} = 0.1 \cdot Eng_{Obj1} + 0.3 \cdot Eng_{Obj2} + 0.2 \cdot Eng_{Obj3} + 0.11 \cdot \frac{100}{95} \cdot Eng_{Obj4}
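The mapping of formula (13) from per-object energies to playback-channel energies amounts to a weighted sum over a rendering matrix. The following sketch uses assumed matrix entries and object energies; normalizing each object's row by its sum plays the role of the 100/95 factor in the example above.

channels = ["FL", "RL", "C", "FR", "RR"]
render = {   # fraction of each object sent to each channel (assumed)
    "obj1": [0.3, 0.3, 0.2, 0.1, 0.1],
    "obj2": [0.1, 0.1, 0.2, 0.3, 0.3],
    "obj3": [0.2, 0.2, 0.2, 0.2, 0.2],
    "obj4": [0.21, 0.11, 0.31, 0.21, 0.11],
}
obj_energy = {"obj1": 1.0, "obj2": 1.0, "obj3": 1.0, "obj4": 1.0}

channel_energy = {ch: 0.0 for ch in channels}
for obj, gains in render.items():
    total = sum(gains)   # 0.95 for obj4, hence the 100/95 correction above
    for ch, g in zip(channels, gains):
        channel_energy[ch] += (g / total) * obj_energy[obj]
print(channel_energy)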
As shown in Figure 27, the preprocessed downmix signals L and R can be expanded to 5.1 channels by MPS. Referring to Figure 27, the parameter TTT0 of the TTT box and the parameters OTTA, OTTB and OTTC of the OTT boxes need to be calculated per parameter band, although the parameter bands are not shown for convenience.
The TTT box TTT0 can be used in two different modes: an energy-based mode and a prediction mode. When used in the energy-based mode, the TTT box TTT0 requires two pieces of CLD information. When used in the prediction mode, it requires two pieces of CPC information and one piece of ICC information.
To calculate the CLD information for the energy-based mode, the energy ratios of the signals L″, R″ and C of Figure 27 can be computed using formulas (6), (10) and (13). The energy level of the signal L″ can be calculated by formula (14):
[formula 14]
Eng_{L''} = Eng_{FL} + Eng_{RL} = 0.6 \cdot Eng_{Obj1} + 0.2 \cdot Eng_{Obj2} + 0.4 \cdot Eng_{Obj3} + 0.32 \cdot \frac{100}{95} \cdot Eng_{Obj4}
 = 0.6 \cdot \frac{1}{3} \cdot \frac{10^{CLD_2/10}}{1+10^{CLD_2/10}} \cdot \frac{10^{CLD_1/10}}{1+10^{CLD_1/10}} \cdot Eng_{L}
 + 0.2 \cdot \frac{2}{5} \cdot \frac{10^{CLD_2/10}}{1+10^{CLD_2/10}} \cdot \frac{1}{1+10^{CLD_1/10}} \cdot Eng_{L}
 + 0.4 \cdot \frac{1}{1+10^{CLD_2/10}} \cdot Eng_{L}
 + 0.32 \cdot \frac{100}{95} \cdot \frac{1}{1+10^{CLD_4/10}} \cdot Eng_{R}
Formula (14) can likewise be used to calculate the energy level of R″ or C. Thereafter, the CLD information for the TTT box TTT0 can be calculated from the energy levels of the signals L″, R″ and C, as shown in formula (15):
[formula 15]
TTT_{CLD_1} = 10 \log_{10} \frac{Eng_{L''} + Eng_{R''}}{Eng_{C''}}
TTT_{CLD_2} = 10 \log_{10} \frac{Eng_{C''}}{Eng_{R''}}
Formula (14) can be established from formula (10). Although formula (10) only defines how to calculate energy values for the L channel, it can equally be used to calculate energy values for the R channel. In this way, the CLD and ICC values of the third and fourth OTT boxes can be calculated from the CLD and ICC values of the first and second OTT boxes. This, however, cannot be applied to every tree structure used to decode the object signals, but only to particular tree structures. The information contained in the object bitstream can be transmitted to every OTT box. Alternatively, the information contained in the object bitstream can be transmitted to only some of the OTT boxes, and the information for the OTT boxes that do not receive it can be obtained by calculation.
The parameters of the OTT boxes OTTA, OTTB and OTTC, such as CLD and ICC information, can be calculated using the method described above. These multichannel parameters can then be input to the multichannel decoder and used in multichannel decoding, thereby obtaining a multichannel signal that is rendered appropriately according to the object position information desired by the end user and the playback configuration information.
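A rough sketch of deriving the energy-mode parameters of the TTT box and two of the OTT boxes from the channel energies, following the reconstruction of formulas (14) and (15). The channel energy values and the assignment of channels to the OTT boxes of Figure 27 are assumptions made for illustration only.

import math

def cld_db(a, b, eps=1e-10):
    return 10.0 * math.log10((a + eps) / (b + eps))

eng = {"FL": 0.9, "RL": 0.8, "C": 0.95, "FR": 0.9, "RR": 0.8}  # assumed

eng_L2 = eng["FL"] + eng["RL"]   # Eng_L'' feeding the OTT box that yields FL/RL
eng_R2 = eng["FR"] + eng["RR"]   # Eng_R'' feeding the OTT box that yields FR/RR
eng_C2 = eng["C"]                # Eng_C''

ttt_cld1 = cld_db(eng_L2 + eng_R2, eng_C2)   # cf. formula (15)
ttt_cld2 = cld_db(eng_C2, eng_R2)
ott_a_cld = cld_db(eng["FL"], eng["RL"])
ott_b_cld = cld_db(eng["FR"], eng["RR"])
print(ttt_cld1, ttt_cld2, ott_a_cld, ott_b_cld)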
If the levels of the object signals have not been adjusted by the preprocessing, the multichannel parameters may include ADG parameters. The calculation of the ADG parameters is described in detail below by revisiting the example above.
When the rendering matrix is established such that the level of the third object signal is to be increased by 10 dB and the level of the fourth object signal is to be reduced by 5 dB, the level of the third-object-signal component in L′ increases by 10 dB and the level of the fourth-object-signal component in L′ decreases by 5 dB. The ratio Ratio_ADG,L′ between the energy level before and after the adjustment of the levels of the third and fourth object signals can be calculated using formula (16):
[formula 16]
(Formula (16) — the ratio Ratio_ADG,L′ of the energy of L′ after the level adjustment to its energy before the adjustment — appears only as an image in the original publication.)
The ratio Ratio_ADG,L′ can be determined by substituting formula (10) into formula (16). The ratio Ratio_ADG,R′ for the R channel can also be calculated using formula (16). Each of Ratio_ADG,L′ and Ratio_ADG,R′ represents the change in the energy of the corresponding parameter band caused by the adjustment of the object signal levels. The ADG values ADG(L′) and ADG(R′) can then be calculated from Ratio_ADG,L′ and Ratio_ADG,R′, as shown in formula (17):
[formula 17]
ADG(L') = 10 \log_{10}(Ratio_{ADG,L'})
ADG(R') = 10 \log_{10}(Ratio_{ADG,R'})
Once the ADG parameters ADG(L′) and ADG(R′) have been determined, they can be quantized using an ADG quantization table, and the quantized ADG values can be transmitted. If the ADG values ADG(L′) and ADG(R′) need to be adjusted more finely than the quantization allows, the adjustment can be performed by a preprocessor rather than by the MPS decoder.
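The ADG computation of formulas (16) and (17) reduces to the ratio of a parameter band's energy after and before the requested level changes, expressed in dB. A minimal sketch, with assumed per-object band energies and gain changes:

import math

def adg_db(band_obj_energy, gain_change_db):
    """band_obj_energy: per-object energy of this channel in one parameter band.
    gain_change_db: per-object level change requested by the rendering matrix."""
    before = sum(band_obj_energy.values())
    after = sum(e * 10 ** (gain_change_db.get(o, 0.0) / 10.0)
                for o, e in band_obj_energy.items())
    return 10.0 * math.log10(after / before)

energies_L = {"obj1": 0.7, "obj2": 0.3, "obj3": 0.5, "obj4": 0.5}  # assumed
changes = {"obj3": +10.0, "obj4": -5.0}   # +10 dB on obj3, -5 dB on obj4
print(adg_db(energies_L, changes))        # ADG(L') in dB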
The number and spacing of the parameter bands used to represent the object signals in the object bitstream may differ from the number and spacing of the parameter bands used in the multichannel decoder. In this case, the parameter bands of the object bitstream can be mapped linearly onto the parameter bands of the multichannel decoder. More specifically, if a particular parameter band of the object bitstream spans two parameter bands of the multichannel decoder, linear mapping can be performed so that this parameter band of the object bitstream is divided according to the proportions in which it is allocated to the two parameter bands of the multichannel decoder. On the other hand, if more than one parameter band of the object bitstream falls within a particular parameter band of the multichannel decoder, the parameter values of the object bitstream can be averaged. Alternatively, the parameter band mapping can be performed using the parameter band mapping table of an existing multichannel standard.
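The linear parameter-band mapping just described can be sketched as an overlap-weighted redistribution of per-band values: each destination band takes an overlap-weighted average of the source bands it covers. The band edges (in spectral bin indices) and parameter values below are assumptions, not the band tables of any particular standard.

def map_bands(values, src_edges, dst_edges):
    out = []
    for d0, d1 in zip(dst_edges[:-1], dst_edges[1:]):
        acc, weight = 0.0, 0.0
        for v, (s0, s1) in zip(values, zip(src_edges[:-1], src_edges[1:])):
            overlap = max(0, min(d1, s1) - max(d0, s0))  # bins shared by the two bands
            acc += v * overlap
            weight += overlap
        out.append(acc / weight if weight else 0.0)
    return out

obj_bands = [0, 4, 12, 28, 71]        # object-bitstream band edges (assumed)
mps_bands = [0, 2, 6, 14, 30, 71]     # multichannel-decoder band edges (assumed)
print(map_bands([1.0, 0.5, 0.25, 0.1], obj_bands, mps_bands))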
When object encoding and decoding is used for teleconferencing, the voices of different talkers correspond to the object signals. The object decoder outputs the voice corresponding to each object signal to a particular loudspeaker. However, when more than one talker speaks at the same time, it is difficult for the object decoder to assign the talkers' voices to different loudspeakers appropriately through decoding, and the rendering of the talkers' voices may cause audio distortion and degraded sound quality. To overcome this problem, information indicating whether more than one talker is speaking at the same time can be included in the bitstream. Then, if it is determined from this information that more than one talker is speaking at the same time, the channel-based bitstream can be modified so that a barely-decoded signal, almost identical to the downmix signal, is output to each loudspeaker.
For example, assume that there are three talkers a, b and c, and that the voices of these three talkers need to be decoded and output to loudspeakers A, B and C respectively. When the three talkers a, b and c speak at the same time, their voices are all contained in the downmix signal, which is obtained by downmixing the object signals respectively representing the voices of the three talkers. In this case, information regarding the portions of the downmix signal that respectively correspond to the voices of the three talkers can be configured into a multichannel bitstream. The downmix signal can then be decoded using a typical object decoding method so that the voices of the three talkers are output to loudspeakers A, B and C respectively. However, each of the outputs of loudspeakers A, B and C may be distorted and may have lower intelligibility than the original downmix signal. In addition, the voices of the three talkers may not be completely separated from one another. To overcome this problem, information indicating that the three talkers are speaking at the same time may be included in the bitstream. A transcoder can then generate a multichannel bitstream so that the downmix signal, obtained by downmixing the object signals corresponding to the voices of the three talkers, is output to each of loudspeakers A, B and C. In this way, signal distortion can be prevented.
In practice, it is difficult to separate the individual voices when more than one talker speaks at the same time. Therefore, the sound quality may be higher when the downmix signal is output as it is than when the downmix signal is rendered so that the voices of the different talkers are separated from one another and output to different loudspeakers. To this end, the transcoder can generate a multichannel bitstream so that the downmix signal obtained from more than one talker speaking simultaneously is output to all loudspeakers, or so that this downmix signal is amplified and then output to the loudspeakers.
To indicate whether the downmix signal of an object bitstream originates from more than one talker speaking at the same time, the object encoder may, as described above, suitably modify the object bitstream instead of providing additional information. In this case, the object decoder can perform a typical decoding operation on the object bitstream so that the downmix signal is output to the loudspeakers as it is, or so that the downmix signal is amplified, without being amplified to the extent of causing distortion, and then output to the loudspeakers.
The 3D information, such as an HRTF, provided to the multichannel decoder is described in detail below.
When the object decoder operates in a binaural mode, the multichannel decoder within the object decoder also operates in the binaural mode. The end user can transmit 3D information, such as HRTFs, optimized according to the spatial positions of the object signals, to the multichannel decoder.
More specifically, when there are two object signals, object1 and object2, and these two object signals are placed at positions 1 and 2 respectively, a rendering matrix generator or a transcoder may have 3D information indicating the positions of the object signals object1 and object2. If the rendering matrix generator has the 3D information indicating the positions of the object signals object1 and object2, the rendering matrix generator can transmit this 3D information to the transcoder. On the other hand, if the transcoder has the 3D information indicating the positions of the object signals object1 and object2, the rendering matrix generator may transmit only index information corresponding to the 3D information to the transcoder.
In this case, a binaural signal can be generated according to the 3D information specifying positions 1 and 2, as shown in formula (18):
[formula 18]
L = Obj1 * HRTF_{L,Pos1} + Obj2 * HRTF_{L,Pos2}
R = Obj1 * HRTF_{R,Pos1} + Obj2 * HRTF_{R,Pos2}
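Formula (18) is simply a sum of HRTF-filtered object signals. The following toy sketch uses made-up two-tap impulse responses in place of measured HRTFs; the object signals are random noise for illustration.

import numpy as np

def binaural(objects, hrirs):
    """objects: dict name -> mono signal; hrirs: dict name -> (left_ir, right_ir)."""
    n = max(len(sig) + len(hrirs[name][0]) - 1 for name, sig in objects.items())
    L, R = np.zeros(n), np.zeros(n)
    for name, sig in objects.items():
        hl, hr = hrirs[name]
        L[:len(sig) + len(hl) - 1] += np.convolve(sig, hl)   # filter and sum to left ear
        R[:len(sig) + len(hr) - 1] += np.convolve(sig, hr)   # filter and sum to right ear
    return L, R

obj1, obj2 = np.random.randn(1000), np.random.randn(1000)
hrirs = {"obj1": ([0.9, 0.1], [0.3, 0.1]),   # position 1 (assumed)
         "obj2": ([0.3, 0.1], [0.9, 0.1])}   # position 2 (assumed)
L, R = binaural({"obj1": obj1, "obj2": obj2}, hrirs)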
Assuming that the sound is reproduced using a 5.1-channel loudspeaker system, a multichannel binaural decoder obtains binaural sound by performing decoding, and this binaural sound can be expressed by formula (19):
[formula 19]
L = FL * HRTF_{L,FL} + C * HRTF_{L,C} + FR * HRTF_{L,FR} + RL * HRTF_{L,RL} + RR * HRTF_{L,RR}
R = FL * HRTF_{R,FL} + C * HRTF_{R,C} + FR * HRTF_{R,FR} + RL * HRTF_{R,RL} + RR * HRTF_{R,RR}
The L-channel component of the object signal object1 can be expressed by formula (20):
[formula 20]
L_{Obj1} = Obj1 * HRTF_{L,Pos1}
L_{Obj1} = FL_{Obj1} * HRTF_{L,FL} + C_{Obj1} * HRTF_{L,C} + FR_{Obj1} * HRTF_{L,FR} + RL_{Obj1} * HRTF_{L,RL} + RR_{Obj1} * HRTF_{L,RR}
The R-channel component of the object signal object1 and the L- and R-channel components of the object signal object2 can also be defined using formula (20).
For example, if the ratios of the energy levels of the object signals object1 and object2 to the sum of their energy levels are a and b respectively, the ratio of the portion of object1 allocated to the FL channel to the whole of object1 is c, and the ratio of the portion of object2 allocated to the FL channel to the whole of object2 is d, then object1 and object2 are allocated to the FL channel in the ratio ac:bd. In this case, the HRTF of the FL channel can be determined as shown in formula (21):
[formula 21]
HRTF_{FL,L} = \frac{ac}{ac+bd} \cdot HRTF_{L,Pos1} + \frac{bd}{ac+bd} \cdot HRTF_{L,Pos2}
HRTF_{FL,R} = \frac{ac}{ac+bd} \cdot HRTF_{R,Pos1} + \frac{bd}{ac+bd} \cdot HRTF_{R,Pos2}
In this way, the 3D information to be used by the multichannel binaural decoder can be obtained. Because this 3D information indicates the actual positions of the object signals more precisely, binaural decoding performed with it can reproduce a binaural signal more vividly than multichannel decoding performed using 3D information corresponding to the positions of the five loudspeakers.
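Formula (21) forms the HRTF of a playback channel as a combination of the objects' positional HRTFs, weighted by each object's energy contribution to that channel. A small sketch, with illustrative weights and toy frequency responses; none of the values below come from the text.

def combine_hrtfs(contributions, hrtfs):
    """contributions: dict obj -> energy contributed to the channel (e.g. a*c, b*d).
    hrtfs: dict obj -> positional HRTF of that object, one value per frequency bin."""
    total = sum(contributions.values())
    n = len(next(iter(hrtfs.values())))
    combined = [0.0] * n
    for obj, w in contributions.items():
        for k in range(n):
            combined[k] += (w / total) * hrtfs[obj][k]
    return combined

contrib_FL = {"obj1": 0.5 * 0.3, "obj2": 0.5 * 0.1}          # a*c and b*d (assumed)
hrtf_L = {"obj1": [1.0, 0.8, 0.5], "obj2": [0.6, 0.9, 0.7]}  # toy magnitude responses
print(combine_hrtfs(contrib_FL, hrtf_L))                     # HRTF_{FL,L}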
As described above, the 3D information used by the multichannel binaural decoder can be calculated from the 3D information indicating the spatial positions of the object signals and from energy ratio information. Alternatively, when the 3D information indicating the spatial positions of the object signals is combined with the ICC information of the object signals, the 3D information used by the multichannel binaural decoder can be generated by appropriately performing decorrelation.
Effect processing can be performed as part of the preprocessing. Alternatively, an effect-processing structure can simply be added to the output of the multichannel decoder. In the former case, in order to perform effect processing on an object signal, extraction of that object signal may be required in addition to the division of the L-channel signal into L_L and L_R and the division of the R-channel signal into R_R and R_L.
More specifically, the object signal can first be extracted from the L- and R-channel signals. Then the L-channel signal can be divided into L_L and L_R, and the R-channel signal can be divided into R_R and R_L. Effect processing is performed on the extracted object signal. The effect-processed object signal can then be divided into L- and R-channel components according to the rendering matrix. Thereafter, the L-channel component of the effect-processed object signal can be added to L_L and R_L, and the R-channel component of the effect-processed object signal can be added to R_R and L_R.
Alternatively, the preprocessed L- and R-channel signals L and R can be generated first. Thereafter, the object signal can be extracted from the preprocessed L- and R-channel signals L and R. Effect processing can then be performed on the extracted object signal, and the result of the effect processing can be added back to the preprocessed L- and R-channel signals.
The spectrum of an object signal can be modified by effect processing. For example, the level of the treble portion or the bass portion of an object signal may be selectively raised. To this end, only the spectral portion corresponding to the treble portion or the bass portion of the object signal may be modified. In this case, the object-related information contained in the object bitstream needs to be modified accordingly. For example, if the level of the bass portion of a particular object signal is raised, the energy of the bass portion of that object signal also increases. The energy information contained in the object bitstream then no longer accurately represents the energy of that object signal. To overcome this problem, the energy information contained in the object bitstream can be modified directly according to the change in the energy of the object signal. Alternatively, spectral-change information provided by the transcoder can be applied to the formation of the multichannel bitstream so that the energy change of the object signal is reflected in the multichannel bitstream.
Figures 28 through 33 are block diagrams for explaining how a plurality of pieces of object-based side information and a plurality of downmix signals are combined into a single piece of side information and a single downmix signal. In the case of teleconferencing, it is sometimes necessary to combine a plurality of pieces of object-based side information and a plurality of downmix signals into a single piece of side information and a single downmix signal, and a number of factors need to be considered in this case.
Figure 28 is a block diagram of an encoded object bitstream. Referring to Figure 28, the encoded object bitstream comprises a downmix signal and side information, and the downmix signal and the side information are synchronized with each other. The encoded object bitstream can therefore be easily decoded without additional factors needing to be considered. However, when a plurality of bitstreams are merged into a single bitstream, it must be ensured that the downmix signal of the single bitstream and the side information of the single bitstream are synchronized.
Figure 29 is a block diagram for explaining the merging of a plurality of encoded object bitstreams BS1 and BS2. Referring to Figure 29, reference numerals 1, 2 and 3 indicate frame numbers. In order to merge a plurality of downmix signals into a single downmix signal, the downmix signals are converted into pulse code modulation (PCM) signals, the PCM signals are downmixed in the time domain, and the downmixed PCM signal is converted into a compression codec format. As shown in Figure 29(b), a delay d may be generated during these operations. Therefore, when a bitstream to be decoded is obtained by merging a plurality of bitstreams, it must be ensured that the downmix signal of the bitstream to be decoded is fully synchronized with the side information of the bitstream to be decoded.
If the delay between the downmix signal and the side information of a bitstream is given, the bitstream can be compensated by a predetermined amount corresponding to this delay. The delay between the downmix signal and the side information of a bitstream may vary with the type of compression codec used to generate the downmix signal. Therefore, a bit indicating any possible delay between the downmix signal and the side information of a bitstream can be included in the side information.
Figure 30 illustrates the case in which two bitstreams BS1 and BS2 are merged into a single bitstream when the downmix signals of the bitstreams BS1 and BS2 have been generated by different codec types, or when the configuration of the side information of the bitstream BS2 differs from the configuration of the side information of the bitstream BS1. Referring to Figure 30, in this case the bitstreams BS1 and BS2 can be regarded as having different signal delays d1 and d2, which result from converting the downmix signals into time-domain signals and converting the time-domain signals with a single compression codec. If the bitstreams BS1 and BS2 are simply added together without the different signal delays being taken into account, the downmix signal of the bitstream BS1 may be offset from the downmix signal of the bitstream BS2, and the side information of the bitstream BS1 may be offset from the side information of the bitstream BS2. To overcome this problem, the downmix signal of the bitstream BS1, which has the delay d1, can be further delayed so that it is synchronized with the downmix signal of the bitstream BS2, which has the delay d2. The bitstreams BS1 and BS2 can then be merged using the same method as in the embodiment of Figure 30. If more than two bitstreams are to be merged, the bitstream having the largest delay is taken as the reference bitstream, and the other bitstreams are further delayed so that they are synchronized with the reference bitstream. A bit indicating the delay between the downmix signal and the side information can be included in the object bitstream.
A bit indicating whether a signal delay exists in a bitstream can be provided in the bitstream. Only when this bit indicates that a signal delay exists is information specifying the signal delay additionally provided. In this way, the amount of information required to indicate any possible signal delay in a bitstream can be minimized.
Figure 32 is a block diagram for explaining how to compensate one of two bitstreams BS1 and BS2 having different signal delays for the difference between the signal delays, and more specifically how to compensate the bitstream BS2, which has a larger signal delay than the bitstream BS1. Referring to Figure 32, the first through third frames of the side information of the bitstream BS1 can all be used as they are. On the other hand, the first through third frames of the side information of the bitstream BS2 cannot be used as they are, because they are not respectively synchronized with the first through third frames of the side information of the bitstream BS1. For example, the second frame of the side information of the bitstream BS1 corresponds not only to part of the first frame of the side information of the bitstream BS2 but also to part of the second frame of the side information of the bitstream BS2. The proportion of the part of the second frame of the side information of the bitstream BS2 that corresponds to the second frame of the side information of the bitstream BS1 to the whole of that second frame, and the proportion of the part of the first frame of the side information of the bitstream BS2 that corresponds to the second frame of the side information of the bitstream BS1 to the whole of that first frame, can be calculated, and the first and second frames of the side information of the bitstream BS2 can be averaged or interpolated according to the result of this calculation. In this way, as shown in Figure 32(b), the first through third frames of the side information of the bitstream BS2 can be synchronized with the first through third frames of the side information of the bitstream BS1, respectively. The side information of the bitstream BS1 and the side information of the bitstream BS2 can then be merged using the method of the embodiment of Figure 29. The downmix signals of the bitstreams BS1 and BS2 can be merged into a single downmix signal without delay compensation. In this case, delay information corresponding to the signal delay d1 can be stored in the merged bitstream obtained by merging the bitstreams BS1 and BS2.
Figure 33 is a block diagram for explaining how to compensate whichever of two bitstreams having different signal delays has the smaller signal delay. Referring to Figure 33, the first through third frames of the side information of the bitstream BS2 can all be used as they are. On the other hand, the first through third frames of the side information of the bitstream BS1 cannot be used as they are, because they are not respectively synchronized with the first through third frames of the side information of the bitstream BS2. For example, the first frame of the side information of the bitstream BS2 corresponds not only to part of the first frame of the side information of the bitstream BS1 but also to part of the second frame of the side information of the bitstream BS1. The proportion of the part of the first frame of the side information of the bitstream BS1 that corresponds to the first frame of the side information of the bitstream BS2 to the whole of that first frame, and the proportion of the part of the second frame of the side information of the bitstream BS1 that corresponds to the first frame of the side information of the bitstream BS2 to the whole of that second frame, can be calculated, and the first and second frames of the side information of the bitstream BS1 can be averaged or interpolated according to the result of this calculation. In this way, as shown in Figure 33(b), the first through third frames of the side information of the bitstream BS1 can be synchronized with the first through third frames of the side information of the bitstream BS2, respectively. The side information of the bitstream BS1 and the side information of the bitstream BS2 can then be merged using the method of the embodiment of Figure 29. The downmix signals of the bitstreams BS1 and BS2 can be merged into a single downmix signal without delay compensation, even though the downmix signals have different signal delays. In this case, delay information corresponding to the signal delay d2 can be stored in the merged bitstream obtained by merging the bitstreams BS1 and BS2.
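The frame alignment described for Figures 32 and 33 can be sketched as a fractional-frame interpolation of the side-information parameters: each aligned frame is an overlap-weighted blend of the two source frames it straddles. The frame length, delay and parameter values below are illustrative assumptions, and which neighbouring frame is blended in depends on which bitstream carries the larger delay.

def align_side_info(frames, delay, frame_len):
    """frames: list of per-frame parameter vectors; delay: offset in samples."""
    shift = delay / frame_len
    k = int(shift)            # whole-frame part of the offset
    frac = shift - k          # fractional-frame part of the offset
    aligned = []
    for i in range(len(frames) - k - 1):
        a, b = frames[i + k], frames[i + k + 1]
        aligned.append([(1 - frac) * x + frac * y for x, y in zip(a, b)])
    return aligned

bs2_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # CLD-like parameters per frame (assumed)
print(align_side_info(bs2_frames, delay=512, frame_len=2048))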
If a plurality of encoded object bitstreams are merged into a single bitstream, the downmix signals of the encoded object bitstreams need to be merged into a single downmix signal. In order to merge a plurality of downmix signals corresponding to different compression codecs into a single downmix signal, the downmix signals can be converted into PCM signals or into frequency-domain signals, and the PCM signals or the frequency-domain signals can be added together in the corresponding domain. The result of the addition can then be converted with a predetermined compression codec. Various signal delays may occur depending on whether the downmix signals are added together during the PCM operation or in the frequency domain, and depending on the type of compression codec. Because a decoder cannot easily recognize these various signal delays from the bitstream to be decoded, delay information specifying the signal delays needs to be included in the bitstream. This delay information may represent the number of delay samples in the PCM signal or the number of delay samples in the frequency domain.
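Merging downmix signals in the PCM domain, as just described, can be sketched as delay compensation followed by a time-domain mix. Decoding to PCM and re-encoding with the compression codec are omitted here, and the delay values are illustrative assumptions.

import numpy as np

def merge_downmixes(pcm1, pcm2, delay1, delay2):
    """Time-align two PCM downmixes by padding the less-delayed one, then mix."""
    d = delay2 - delay1
    if d > 0:
        pcm1 = np.concatenate([np.zeros(d), pcm1])    # give pcm1 the extra delay
    elif d < 0:
        pcm2 = np.concatenate([np.zeros(-d), pcm2])
    n = max(len(pcm1), len(pcm2))
    pcm1 = np.pad(pcm1, (0, n - len(pcm1)))
    pcm2 = np.pad(pcm2, (0, n - len(pcm2)))
    return pcm1 + pcm2   # this sum would then be fed to the compression codec

merged = merge_downmixes(np.random.randn(4096), np.random.randn(4096), 0, 576)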
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage, and a carrier wave (e.g., data transmission through the Internet). The computer-readable recording medium can be distributed over a plurality of computer systems connected to a network so that computer-readable code is written thereto and executed therefrom in a decentralized manner. Functional programs, code and code segments needed for realizing the present invention can be easily construed by one of ordinary skill in the art.
As described above, according to the present invention, the audio image of each object signal can be localized thanks to the benefits of object-based audio encoding and decoding methods. In this way, more realistic sound can be provided when object signals are reproduced. In addition, the present invention can be applied to interactive games, and can thus provide the user with a more realistic virtual-reality experience.
While the present invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.

Claims (17)

1. A method of decoding an audio signal, the method comprising:
receiving object flag information indicating whether object description information, which comprises number information and at least one piece of text information, is transmitted; and
obtaining the object description information by recursively extracting the pieces of text information corresponding to the number information, the number information indicating a number of texts.
2. The method of claim 1, wherein a text of the object description information is expressed in units of bytes.
3. An audio decoding method comprising:
receiving a downmix signal and object-based side information, the downmix signal being obtained by downmixing object signals;
extracting metadata from the object-based side information; and
displaying object-related information regarding the object signals based on the metadata.
4. The audio decoding method of claim 3, wherein the object-related information comprises at least one of a number of a corresponding object signal and a description of the object signal.
5. The audio decoding method of claim 3, wherein the metadata is included in a header of the object-based side information.
6. The audio decoding method of claim 3, further comprising:
generating channel-based side information based on the object-based side information and rendering control information, the channel-based side information comprising a gain parameter for controlling a gain of the downmix signal.
7. The audio decoding method of claim 6, further comprising:
generating a multichannel audio signal based on the channel-based side information and the downmix signal.
8. The audio decoding method of claim 3, further comprising:
calculating the gain parameter based on the rendering control information, object level information and downmix gain information, the object level information being extracted from the object-based side information.
9. An audio encoding method comprising:
generating a downmix signal by downmixing object signals;
generating object-based side information by extracting object-related information from the object signals; and
inserting metadata for rendering the object-related information into the object-based side information.
10. The audio encoding method of claim 9, further comprising:
generating a bitstream by combining the downmix signal and the object-based side information into which the metadata has been inserted.
11. An audio decoding apparatus comprising:
a demultiplexer configured to extract a downmix signal and object-based side information from an input audio signal, the downmix signal being obtained by downmixing object signals;
a transcoder configured to extract metadata from the object-based side information; and
a renderer configured to display object-related information regarding the object signals based on the metadata.
12. The audio decoding apparatus of claim 11, wherein the renderer provides rendering control information to the transcoder, and the transcoder generates channel-based side information based on the object-based side information and the rendering control information, the channel-based side information comprising a gain parameter for controlling a gain of the downmix signal.
13. The audio decoding apparatus of claim 12, further comprising:
a multichannel decoder which generates a multichannel audio signal based on the channel-based side information and the downmix signal.
14. A processor-readable recording medium having recorded thereon a program for executing the method of claim 1 in a processor.
15. A computer-readable recording medium having recorded thereon a computer program for executing an audio decoding method, the audio decoding method comprising:
receiving a downmix signal and object-based side information, the downmix signal being obtained by downmixing object signals;
extracting metadata from the object-based side information; and
displaying object-related information regarding the object signals based on the metadata.
16. The computer-readable recording medium of claim 15, wherein the audio decoding method further comprises:
generating channel-based side information based on the object-based side information and rendering control information, the channel-based side information comprising a gain parameter for controlling a gain of the downmix signal; and
generating a multichannel audio signal based on the channel-based side information and the downmix signal.
17. A computer-readable recording medium having recorded thereon a computer program for executing an audio encoding method, the audio encoding method comprising:
generating a downmix signal by downmixing object signals;
generating object-based side information by extracting object-related information from the object signals; and
inserting metadata into the object-based side information, the metadata representing the object-related information.
CN2008800003869A 2007-02-14 2008-02-14 Methods and apparatuses for encoding and decoding object-based audio signals Active CN101542597B (en)

Applications Claiming Priority (15)

Application Number Priority Date Filing Date Title
US90108907P 2007-02-14 2007-02-14
US60/901,089 2007-02-14
US90164207P 2007-02-16 2007-02-16
US60/901,642 2007-02-16
US90381807P 2007-02-28 2007-02-28
US60/903,818 2007-02-28
US90768907P 2007-04-13 2007-04-13
US60/907,689 2007-04-13
US92402707P 2007-04-27 2007-04-27
US60/924,027 2007-04-27
US94762007P 2007-07-02 2007-07-02
US60/947,620 2007-07-02
US94837307P 2007-07-06 2007-07-06
US60/948,373 2007-07-06
PCT/KR2008/000883 WO2008100098A1 (en) 2007-02-14 2008-02-14 Methods and apparatuses for encoding and decoding object-based audio signals

Publications (2)

Publication Number Publication Date
CN101542597A true CN101542597A (en) 2009-09-23
CN101542597B CN101542597B (en) 2013-02-27

Family

ID=41124181

Family Applications (3)

Application Number Title Priority Date Filing Date
CN200880000382.0A Active CN101542595B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based sound signal of Code And Decode
CN2008800003869A Active CN101542597B (en) 2007-02-14 2008-02-14 Methods and apparatuses for encoding and decoding object-based audio signals
CN200880000383.5A Active CN101542596B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based audio signal of Code And Decode

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200880000382.0A Active CN101542595B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based sound signal of Code And Decode

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN200880000383.5A Active CN101542596B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based audio signal of Code And Decode

Country Status (1)

Country Link
CN (3) CN101542595B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105474309A (en) * 2013-07-22 2016-04-06 弗朗霍夫应用科学研究促进协会 Apparatus and method for efficient object metadata coding
CN106465028A (en) * 2014-06-06 2017-02-22 索尼公司 Audio signal processing apparatus and method, encoding apparatus and method, and program
CN106796809A (en) * 2014-10-03 2017-05-31 杜比国际公司 The intellectual access of personalized audio
CN107731239A (en) * 2013-04-03 2018-02-23 杜比实验室特许公司 For generating and interactively rendering the method and system of object-based audio
CN108806705A (en) * 2018-06-19 2018-11-13 合肥凌极西雅电子科技有限公司 Audio-frequency processing method and processing system
US10249311B2 (en) 2013-07-22 2019-04-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
CN110447071A (en) * 2017-03-28 2019-11-12 索尼公司 Information processing unit, information processing method and program
US10701504B2 (en) 2013-07-22 2020-06-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
CN110447071B (en) * 2017-03-28 2024-04-26 索尼公司 Information processing apparatus, information processing method, and removable medium recording program

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR112013016438B1 (en) 2010-12-29 2021-08-17 Samsung Electronics Co., Ltd ENCODING METHOD, DECODING METHOD, AND NON TRANSIENT COMPUTER-READABLE RECORDING MEDIA
EP2560161A1 (en) * 2011-08-17 2013-02-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Optimal mixing matrices and usage of decorrelators in spatial audio processing
JP6141978B2 (en) 2012-08-03 2017-06-07 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Decoder and method for multi-instance spatial acoustic object coding employing parametric concept for multi-channel downmix / upmix configuration
MX350687B (en) 2012-08-10 2017-09-13 Fraunhofer Ges Forschung Apparatus and methods for adapting audio information in spatial audio object coding.
EP2830046A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decoding an encoded audio signal to obtain modified output signals
WO2015038522A1 (en) * 2013-09-12 2015-03-19 Dolby Laboratories Licensing Corporation Loudness adjustment for downmixed audio content
CN105659320B (en) * 2013-10-21 2019-07-12 杜比国际公司 Audio coder and decoder
CN107886962B (en) * 2017-11-17 2020-10-02 南京理工大学 High-security steganography method for IP voice
CN111654745B (en) * 2020-06-08 2022-10-14 海信视像科技股份有限公司 Multi-channel signal processing method and display device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007004828A2 (en) * 2005-06-30 2007-01-11 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1691348A1 (en) * 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Parametric joint-coding of audio sources
CN101506875B (en) * 2006-07-07 2012-12-19 弗劳恩霍夫应用研究促进协会 Apparatus and method for combining multiple parametrically coded audio sources
MX2009003564A (en) * 2006-10-16 2009-05-28 Fraunhofer Ges Forschung Apparatus and method for multi -channel parameter transformation.

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007004828A2 (en) * 2005-06-30 2007-01-11 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ISO/IEC: "《ISO/IEC JTC1/SC29/WG11 MPEG2006/N8639》", 31 October 2006 *
ISO/IEC: "《ISO/IEC JTC1/SC29/WG11 MPEG2007/N8853》", 19 January 2007 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270713B2 (en) 2013-04-03 2022-03-08 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11769514B2 (en) 2013-04-03 2023-09-26 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
CN107731239B (en) * 2013-04-03 2022-04-15 杜比实验室特许公司 Method and system for generating and interactively rendering object-based audio
CN107731239A (en) * 2013-04-03 2018-02-23 杜比实验室特许公司 For generating and interactively rendering the method and system of object-based audio
US10659900B2 (en) 2013-07-22 2020-05-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US11463831B2 (en) 2013-07-22 2022-10-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
US10249311B2 (en) 2013-07-22 2019-04-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
US10277998B2 (en) 2013-07-22 2019-04-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US11910176B2 (en) 2013-07-22 2024-02-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
CN105474309B (en) * 2013-07-22 2019-08-23 弗朗霍夫应用科学研究促进协会 The device and method of high efficiency object metadata coding
US11337019B2 (en) 2013-07-22 2022-05-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US11330386B2 (en) 2013-07-22 2022-05-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
CN105474309A (en) * 2013-07-22 2016-04-06 弗朗霍夫应用科学研究促进协会 Apparatus and method for efficient object metadata coding
US10701504B2 (en) 2013-07-22 2020-06-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
US10715943B2 (en) 2013-07-22 2020-07-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
US11227616B2 (en) 2013-07-22 2022-01-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
CN106465028B (en) * 2014-06-06 2019-02-15 索尼公司 Audio signal processor and method, code device and method and program
CN106465028A (en) * 2014-06-06 2017-02-22 索尼公司 Audio signal processing apparatus and method, encoding apparatus and method, and program
US11437048B2 (en) 2014-10-03 2022-09-06 Dolby International Ab Methods, apparatus and system for rendering an audio program
CN106796809A (en) * 2014-10-03 2017-05-31 杜比国际公司 The intellectual access of personalized audio
US10650833B2 (en) 2014-10-03 2020-05-12 Dolby International Ab Methods, apparatus and system for rendering an audio program
CN106796809B (en) * 2014-10-03 2019-08-09 杜比国际公司 The intellectual access of personalized audio
US11948585B2 (en) 2014-10-03 2024-04-02 Dolby International Ab Methods, apparatus and system for rendering an audio program
CN110447071A (en) * 2017-03-28 2019-11-12 索尼公司 Information processing unit, information processing method and program
CN110447071B (en) * 2017-03-28 2024-04-26 索尼公司 Information processing apparatus, information processing method, and removable medium recording program
CN108806705A (en) * 2018-06-19 2018-11-13 合肥凌极西雅电子科技有限公司 Audio-frequency processing method and processing system

Also Published As

Publication number Publication date
CN101542595B (en) 2016-04-13
CN101542596A (en) 2009-09-23
CN101542595A (en) 2009-09-23
CN101542596B (en) 2016-05-18
CN101542597B (en) 2013-02-27

Similar Documents

Publication Publication Date Title
CN101542597B (en) Methods and apparatuses for encoding and decoding object-based audio signals
US9449601B2 (en) Methods and apparatuses for encoding and decoding object-based audio signals
CN101479786B (en) Method for encoding and decoding object-based audio signal and apparatus thereof
RU2406166C2 (en) Coding and decoding methods and devices based on objects of oriented audio signals
CN101553866B (en) A method and an apparatus for processing an audio signal
CN101506875A (en) Apparatus and method for combining multiple parametrically coded audio sources

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant