CN101542597B - Methods and apparatuses for encoding and decoding object-based audio signals - Google Patents


Info

Publication number
CN101542597B
Authority
CN
China
Prior art keywords: signal, information, audio signal, object signal, bit stream
Legal status: Active
Application number
CN2008800003869A
Other languages: Chinese (zh)
Other versions: CN101542597A (en)
Inventor
金东秀
房熙锡
林宰显
尹圣龙
李显国
Current Assignee: LG Electronics Inc
Original Assignee
LG Electronics Inc
Application filed by LG Electronics Inc
Priority claimed from PCT/KR2008/000883 (WO2008100098A1)
Publication of CN101542597A
Application granted
Publication of CN101542597B

Landscapes

  • Stereophonic System (AREA)

Abstract

An audio decoding method and apparatus and an audio encoding method and apparatus which can efficiently process object-based audio signals are provided. The audio decoding method includes receiving a downmix signal, which is obtained by downmixing a plurality of object signals, and object-based side information, extracting metadata from the object-based side information, and displaying information regarding the object signals based on the metadata.

Description

Method and apparatus for encoding and decoding object-based audio signals
Technical field
The present invention relates to an audio encoding method and apparatus and an audio decoding method and apparatus in which object-based audio signals can be efficiently processed through encoding and decoding operations.
Background art
In general, in multi-channel audio encoding and decoding techniques, a number of channel signals of a multi-channel signal are downmixed into fewer channel signals, side information regarding the original channel signals is transmitted, and a multi-channel signal having as many channels as the original multi-channel signal is reconstructed.
Object-based audio encoding and decoding techniques are basically similar to multi-channel audio encoding and decoding techniques in that they downmix a number of sound sources into fewer sound-source signals and transmit side information regarding the original sound sources. However, in object-based audio encoding and decoding techniques, object signals, which are the basic elements of a channel signal (for example, the sound of a musical instrument or a human voice), are treated in the same manner as channel signals are treated in multi-channel audio encoding and decoding techniques, and can thus be encoded and decoded.
In other words, in object-based audio encoding and decoding techniques, each object signal is regarded as an entity to be encoded or decoded. In this respect, object-based audio encoding and decoding techniques differ from multi-channel audio encoding and decoding techniques, in which a multi-channel encoding/decoding operation is performed simply on the basis of inter-channel information, regardless of how many elements make up each channel signal to be encoded or decoded.
Summary of the invention
Technical problem
The present invention provides an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that the audio signals can be applied to various environments.
Technical solution
According to an aspect of the present invention, there is provided an audio decoding method including: receiving a downmix signal and object-based side information, the downmix signal obtained by downmixing a plurality of object signals; extracting metadata from the object-based side information; and displaying object-related information regarding the object signals based on the metadata. According to another aspect of the present invention, there is provided an audio encoding method including: generating a downmix signal by downmixing a plurality of object signals; generating object-based side information by extracting object-related information from the object signals; and inserting metadata indicating the object-related information into the object-based side information.
According to another aspect of the present invention, there is provided an audio decoding apparatus including: a demultiplexer configured to extract a downmix signal and object-based side information from an input audio signal, the downmix signal obtained by downmixing a plurality of object signals; a transcoder configured to extract metadata from the object-based side information; and a renderer configured to display object-related information regarding the object signals based on the metadata.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing an audio decoding method, the audio decoding method including: receiving a downmix signal and object-based side information, the downmix signal obtained by downmixing a plurality of object signals; extracting metadata from the object-based side information; and displaying object-related information regarding the object signals based on the metadata.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a computer program for executing an audio encoding method, the audio encoding method including: generating a downmix signal by downmixing a plurality of object signals; and generating object-based side information by extracting object-related information from the object signals and inserting metadata indicating the object-related information into the object-based side information.
Description of drawings
Fig. 1 is a block diagram of a typical object-based audio encoding/decoding system;
Fig. 2 is a block diagram of an audio decoding apparatus according to a first embodiment of the present invention;
Fig. 3 is a block diagram of an audio decoding apparatus according to a second embodiment of the present invention;
Fig. 4 is a block diagram of an audio decoding apparatus according to a third embodiment of the present invention;
Fig. 5 is a block diagram of an arbitrary downmix gain (ADG) module that can be used in the audio decoding apparatus shown in Fig. 4;
Fig. 6 is a block diagram of an audio decoding apparatus according to a fourth embodiment of the present invention;
Fig. 7 is a block diagram of an audio decoding apparatus according to a fifth embodiment of the present invention;
Fig. 8 is a block diagram of an audio decoding apparatus according to a sixth embodiment of the present invention;
Fig. 9 is a block diagram of an audio decoding apparatus according to a seventh embodiment of the present invention;
Fig. 10 is a block diagram of an audio decoding apparatus according to an eighth embodiment of the present invention;
Figs. 11 and 12 are block diagrams for explaining the operation of a transcoder;
Figs. 13 to 16 are block diagrams for explaining the structure of object-based side information;
Figs. 17 to 22 are block diagrams for explaining the merging of a plurality of pieces of object-based side information into a single piece of side information;
Figs. 23 to 27 are block diagrams for explaining preprocessing operations; and
Figs. 28 to 33 are block diagrams for explaining cases in which a plurality of bitstreams to be used in object-based signal decoding are merged into a single bitstream.
Best mode for carrying out the invention
The present invention will hereinafter be described in detail with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
An audio encoding method and apparatus and an audio decoding method and apparatus according to the present invention may be applied to object-based audio processing operations, but the present invention is not restricted thereto. In other words, the audio encoding method and apparatus and the audio decoding method and apparatus may also be applied to various signal processing operations other than object-based audio processing operations.
Fig. 1 is a block diagram of a typical object-based audio encoding/decoding system. In general, audio signals input to an object-based audio encoding apparatus do not correspond to the channels of a multi-channel signal; rather, they are independent object signals. In this respect, an object-based audio encoding apparatus differs from a multi-channel audio encoding apparatus, to which the channel signals of a multi-channel signal are input.
For example, channel signals such as the front-left channel signal and the front-right channel signal of a 5.1-channel signal are input to a multi-channel audio encoder, whereas object signals, which are smaller entities than channel signals, such as a human voice or the sound of a musical instrument (for example, the sound of a violin or a piano), may be input to an object-based audio encoding apparatus.
Referring to Fig. 1, the object-based audio encoding/decoding system includes an object-based audio encoding apparatus and an object-based audio decoding apparatus. The object-based audio encoding apparatus includes an object encoder 100, and the object-based audio decoding apparatus includes an object decoder 111 and a mixer/renderer 113.
The object encoder 100 receives N object signals and generates an object-based downmix signal with one or more channels, together with side information including a number of pieces of information extracted from the N object signals, such as energy difference information, phase difference information, and correlation information. The side information and the object-based downmix signal are incorporated into a single bitstream, and the bitstream is transmitted to the object-based decoding apparatus.
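The patent itself contains no source code; purely as an informal illustration of the encoder behaviour described above, the following Python sketch downmixes N object signals into a mono downmix and gathers a crude per-object energy measure as side information. All function and variable names here are hypothetical.

```python
import numpy as np

def encode_objects(objects):
    """Toy object encoder: downmix N object signals and gather crude side info.

    objects: list of 1-D numpy arrays, one per object signal (same length).
    Returns (downmix, side_info), where side_info holds per-object energies,
    a stand-in for the energy-difference/correlation parameters described above.
    """
    objects = np.stack(objects)              # shape (N, samples)
    downmix = objects.sum(axis=0)            # mono downmix of all objects
    energies = (objects ** 2).mean(axis=1)   # per-object mean energy
    side_info = {"object_energies": energies.tolist()}
    return downmix, side_info
```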
The side information may include a flag indicating whether channel-based audio coding/decoding or object-based audio coding/decoding is to be performed, so that it can be determined, based on the flag of the side information, whether to perform channel-based audio coding/decoding or object-based audio coding/decoding. The side information may also include energy information, grouping information, silent period information, downmix gain information, and delay information regarding the object signals.
The side information and the object-based downmix signal may be incorporated into a single bitstream, and the bitstream may be transmitted to the object-based audio decoding apparatus.
The object decoder 111 receives the object-based downmix signal and the side information from the object-based audio encoding apparatus, and restores object signals having properties similar to those of the N object signals based on the object-based downmix signal and the side information. The object signals generated by the object decoder 111 have not yet been allocated to any position in a multi-channel space. Thus, the mixer/renderer 113 allocates each of the object signals generated by the object decoder 111 to a predetermined position in the multi-channel space and determines the levels of the object signals, so that the object signals can be reproduced from the respective positions designated by the mixer/renderer 113 with the respective levels determined by the mixer/renderer 113. Control information regarding each of the object signals generated by the object decoder 111 may vary over time, and thus the spatial positions and the levels of the object signals generated by the object decoder 111 may vary according to the control information.
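Again as an informal sketch rather than the patent's own method, the mixing/rendering step can be pictured as applying per-object, per-channel gains derived from the control information; the names below are hypothetical.

```python
import numpy as np

def mix_render(object_signals, gains):
    """Toy mixer/renderer: place decoded objects in a multi-channel space.

    object_signals: array of shape (N, samples), the decoded object signals.
    gains: array of shape (N, M); gains[n, m] is the portion of object n
           assigned to output channel m (this also sets the object's level).
    Returns an (M, samples) multi-channel signal.
    """
    return gains.T @ object_signals   # each output channel is a weighted sum of objects
```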
Fig. 2 is a block diagram of an audio decoding apparatus 120 according to a first embodiment of the present invention. Referring to Fig. 2, the audio decoding apparatus 120 can perform adaptive decoding by analyzing control information.
Referring to Fig. 2, the audio decoding apparatus 120 includes an object decoder 121, a mixer/renderer 123, and a parameter converter 125. The audio decoding apparatus 120 may also include a demultiplexer (not shown) that extracts a downmix signal and side information from an input bitstream; such a demultiplexer applies to all audio decoding apparatuses according to the other embodiments of the present invention.
The object decoder 121 generates a number of object signals based on the downmix signal and modified side information provided by the parameter converter 125. The mixer/renderer 123 allocates each of the object signals generated by the object decoder 121 to a predetermined position in a multi-channel space according to control information and determines the levels of the object signals generated by the object decoder 121. The parameter converter 125 generates the modified side information by combining the side information and the control information. Then, the parameter converter 125 transmits the modified side information to the object decoder 121.
The object decoder 121 can perform adaptive decoding by analyzing the control information contained in the modified side information.
For example, if the control information indicates that a first object signal and a second object signal are allocated to the same position in the multi-channel space and have the same level, a typical audio decoding apparatus would decode the first and second object signals separately and then arrange them in the multi-channel space through a mixing/rendering operation.
On the other hand, the object decoder 121 of the audio decoding apparatus 120 learns, from the control information in the modified side information, that the first and second object signals are allocated to the same position in the multi-channel space and have the same level, as if they were a single sound source. Accordingly, the object decoder 121 decodes the first and second object signals by treating them as a single sound source, without decoding them separately. As a result, the complexity of decoding decreases. In addition, since the number of sound sources to be processed decreases, the complexity of mixing/rendering also decreases.
The audio decoding apparatus 120 can be effectively used when the number of object signals is greater than the number of output channels, because a plurality of object signals are then likely to be allocated to the same spatial position.
Alternatively, the audio decoding apparatus 120 may be used when the first object signal and the second object signal are allocated to the same position in the multi-channel space but have different levels. In this case, the audio decoding apparatus 120 decodes the first and second object signals by treating them as a single signal, instead of decoding them separately, and transmits the decoded first and second object signals to the mixer/renderer 123. More specifically, the object decoder 121 may obtain information regarding the difference between the levels of the first and second object signals from the control information in the modified side information, and may decode the first and second object signals based on the obtained information. As a result, even if the first and second object signals have different levels, they can be decoded as if they were a single sound source.
Still alternatively, the object decoder 121 may adjust the levels of the object signals generated by the object decoder 121 according to the control information. Then, the object decoder 121 may decode the level-adjusted object signals. Accordingly, the mixer/renderer 123 does not need to adjust the levels of the decoded object signals provided by the object decoder 121, but simply arranges the decoded object signals provided by the object decoder 121 in the multi-channel space. In short, since the object decoder 121 adjusts the levels of the object signals generated by the object decoder 121 according to the control information, the mixer/renderer 123 can readily arrange the object signals generated by the object decoder 121 in the multi-channel space without additionally adjusting their levels. Therefore, the complexity of mixing/rendering can be reduced.
According to the embodiment of Fig. 2, the object decoder of the audio decoding apparatus 120 can adaptively perform a decoding operation through the analysis of the control information, thereby reducing the complexity of decoding and the complexity of mixing/rendering. A combination of the methods performed by the audio decoding apparatus 120 may also be used.
Fig. 3 is a block diagram of an audio decoding apparatus 130 according to a second embodiment of the present invention. Referring to Fig. 3, the audio decoding apparatus 130 includes an object decoder 131 and a mixer/renderer 133. The audio decoding apparatus 130 is characterized in that side information is provided not only to the object decoder 131 but also to the mixer/renderer 133.
The audio decoding apparatus 130 can effectively perform a decoding operation even when there is an object signal corresponding to a silent period. For example, the second through fourth object signals may correspond to a period during which musical instruments are being played, while the first object signal may correspond to a mute period during which only background music is played, that is, a silent period during which only the accompaniment is heard. In this case, information indicating which of a plurality of object signals corresponds to a silent period may be included in the side information, and the side information may be provided to the mixer/renderer 133 as well as to the object decoder 131.
The object decoder 131 can minimize the complexity of decoding by not decoding an object signal corresponding to a silent period. The object decoder 131 sets such an object signal to a value of 0 and transmits the level of the object signal to the mixer/renderer 133. In general, object signals having a value of 0 are treated the same as object signals having a non-zero value, and are thus subjected to a mixing/rendering operation.
On the other hand, the audio decoding apparatus 130 transmits the side information, which includes information indicating which of a plurality of object signals corresponds to a silent period, to the mixer/renderer 133, and can thereby prevent an object signal corresponding to a silent period from being subjected to the mixing/rendering operation performed by the mixer/renderer 133. Therefore, the audio decoding apparatus 130 can prevent an unnecessary increase in the complexity of mixing/rendering.
Fig. 4 is a block diagram of an audio decoding apparatus 140 according to a third embodiment of the present invention. Referring to Fig. 4, the audio decoding apparatus 140 uses a multi-channel decoder 141, instead of an object decoder and a mixer/renderer, and decodes a number of object signals after they have been appropriately arranged in a multi-channel space.
More specifically, the audio decoding apparatus 140 includes the multi-channel decoder 141 and a parameter converter 145. The multi-channel decoder 141 generates a multi-channel signal, in which the object signals have already been arranged in the multi-channel space, based on a downmix signal and spatial parameter information, which is channel-based parameter information provided by the parameter converter 145. The parameter converter 145 analyzes side information and control information transmitted by an audio encoding apparatus (not shown), and generates the spatial parameter information based on the result of the analysis. More specifically, the parameter converter 145 generates the spatial parameter information by combining the side information and the control information, which includes playback configuration information and mixing information. That is, the parameter converter 145 converts the combination of the side information and the control information into spatial data corresponding to a one-to-two (OTT) box or a two-to-three (TTT) box.
The audio decoding apparatus 140 can perform a multi-channel decoding operation into which an object-based decoding operation and a mixing/rendering operation are merged, and can thereby skip the decoding of each individual object signal. Therefore, the complexity of decoding and/or mixing/rendering can be reduced.
For example, when there are 10 object signals and a multi-channel signal obtained from the 10 object signals is to be reproduced by a 5.1-channel speaker system, a typical object-based audio decoding apparatus generates decoded signals respectively corresponding to the 10 object signals based on a downmix signal and side information, and then generates a 5.1-channel signal by appropriately arranging the 10 object signals in a multi-channel space so that the object signals become suitable for a 5.1-channel speaker environment. However, it is inefficient to generate 10 object signals on the way to generating a 5.1-channel signal, and this problem becomes more severe as the difference between the number of object signals and the number of channels of the multi-channel signal to be generated increases.
On the other hand, according to the embodiment of Fig. 4, the audio decoding apparatus 140 generates spatial parameter information suitable for a 5.1-channel signal based on the side information and the control information, and provides the spatial parameter information and the downmix signal to the multi-channel decoder 141. Then, the multi-channel decoder 141 generates a 5.1-channel signal based on the spatial parameter information and the downmix signal. In other words, when the number of channels to be output is 5.1, the audio decoding apparatus 140 can readily generate a 5.1-channel signal from the downmix signal without the need to generate 10 object signals, and is therefore more efficient than a conventional audio decoding apparatus in terms of complexity.
The audio decoding apparatus 140 is deemed efficient when the amount of computation required to calculate spatial parameter information corresponding to each OTT box and each TTT box, by analyzing the side information and the control information transmitted by the audio encoding apparatus, is less than the amount of computation required for a mixing/rendering operation performed after the decoding of each object signal.
The audio decoding apparatus 140 can easily be obtained by adding a module for generating spatial parameter information through the analysis of side information and control information to a typical multi-channel audio decoding apparatus, and can therefore maintain compatibility with typical multi-channel audio decoding apparatuses. Also, the audio decoding apparatus 140 can improve sound quality using existing tools of a typical multi-channel decoding apparatus, such as an envelope shaper, a subband temporal processing (STP) tool, and a decorrelator. Given all this, it can be concluded that all the advantages of typical multi-channel audio decoding methods can readily be applied to object-based audio decoding methods.
The spatial parameter information transmitted to the multi-channel decoder 141 by the parameter converter 145 may be compressed so as to be suitable for transmission. Alternatively, the spatial parameter information may have the same format as data transmitted by a typical multi-channel encoding apparatus. That is, the spatial parameter information may be subjected to a Huffman decoding operation or a pilot decoding operation and may thus be transmitted to each module as uncompressed spatial cue data. The former is suitable for transmitting the spatial parameter information to a multi-channel audio decoding apparatus at a remote location; the latter is also convenient because there is no need for the multi-channel audio decoding apparatus to convert compressed spatial cue data into uncompressed spatial cue data that can readily be used in a decoding operation.
The configuration of the spatial parameter information based on the analysis of the side information and the control information may cause a delay. In order to compensate for such a delay, an additional buffer may be provided for the downmix signal, so that the delay between the downmix signal and the bitstream can be compensated for. Alternatively, an additional buffer may be provided for the spatial parameter information obtained from the control information, so that the delay between the spatial parameter information and the bitstream can be compensated for. These methods, however, are inconvenient because of the requirement to provide an additional buffer. Alternatively, the side information may be transmitted ahead of the downmix signal in consideration of a possible delay between the downmix signal and the spatial parameter information. In this case, the spatial parameter information obtained by combining the side information and the control information does not need to be adjusted, but can readily be used.
If a plurality of object signals of a downmix signal have different levels, an arbitrary downmix gain (ADG) module, which can directly compensate the downmix signal, can determine the relative levels of the object signals, and each of the object signals can then be allocated to a predetermined position in the multi-channel space using spatial cue data such as channel level difference (CLD) information, inter-channel correlation (ICC) information, and channel prediction coefficient (CPC) information.
For example, if control information indicates that a predetermined object signal is to be allocated to a predetermined position in the multi-channel space and has a level higher than those of the other object signals, a typical multi-channel decoder can calculate the difference between the channel energies of the downmix signal and divide the downmix signal into a number of output channels based on the result of the calculation. However, a typical multi-channel decoder cannot increase or reduce the volume of a particular sound in the downmix signal. In other words, a typical multi-channel decoder simply distributes the downmix signal to a number of output channels, and cannot increase or reduce the volume of a sound in the downmix signal.
It is relatively easy to allocate each of a number of object signals of a downmix signal generated by an object encoder to a predetermined position in the multi-channel space according to control information. However, special techniques are required to increase or reduce the amplitude of a predetermined object signal. In other words, if a downmix signal generated by an object encoder is used as it is, it is difficult to reduce the amplitude of each object signal of the downmix signal.
Therefore, according to an embodiment of the present invention, the relative amplitudes of object signals may be varied according to control information using an ADG module 147 illustrated in Fig. 5. The ADG module 147 may be installed in the multi-channel decoder 141 or may be separate from the multi-channel decoder 141.
If the relative amplitudes of the object signals of the downmix signal are appropriately adjusted using the ADG module 147, object decoding can be performed with a typical multi-channel decoder. If a downmix signal generated by an object encoder is a mono or stereo signal or a multi-channel signal with three or more channels, the downmix signal can be processed by the ADG module 147. If a downmix signal generated by an object encoder has two or more channels and a predetermined object signal that needs to be adjusted by the ADG module 147 exists in only one of the channels of the downmix signal, the ADG module 147 may be applied only to the channel containing the predetermined object signal, instead of being applied to all the channels of the downmix signal. A downmix signal processed by the ADG module 147 in the above-described manner can readily be processed with a typical multi-channel decoder without modifying the structure of the multi-channel decoder.
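As a rough, non-authoritative sketch of the kind of amplitude compensation an ADG stage performs (the text later notes that an ADG module works with parameter band intervals; the single broadband gain and the names used here are simplifying assumptions for illustration):

```python
import numpy as np

def apply_adg(downmix, object_energies, target_gains_db):
    """Toy ADG stage: rescale a mono downmix so object levels approach targets.

    downmix: 1-D numpy array, the mono downmix signal.
    object_energies: per-object energies in the downmix (from side information).
    target_gains_db: desired per-object level changes in dB (from control info).
    Returns the gain-compensated downmix; a real ADG would work per parameter band.
    """
    lin = 10.0 ** (np.asarray(target_gains_db) / 20.0)
    e = np.asarray(object_energies)
    # Overall energy after per-object scaling, relative to the original energy.
    adg = np.sqrt(np.sum(e * lin ** 2) / np.sum(e))
    return downmix * adg
```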
Even when the signal to be finally output is not a multi-channel signal that can be reproduced by multi-channel speakers but a binaural signal, the ADG module 147 may be used to adjust the relative amplitudes of the object signals of the final output signal.
As an alternative to the use of the ADG module 147, gain information specifying a gain value to be applied to each object signal may be included in the control information during the generation of a number of object signals. For this purpose, the structure of a typical multi-channel decoder may need to be modified. Even though it requires a modification of the structure of an existing multi-channel decoder, this method is convenient in terms of reducing the complexity of decoding, since a gain value is applied to each object signal during a decoding operation without the need to calculate ADG values and to compensate each object signal.
The ADG module 147 may be used not only to adjust the levels of object signals but also to modify the spectrum information of a particular object signal. More specifically, the ADG module 147 may be used not only to increase or reduce the level of a particular object signal but also to modify the spectrum information of the particular object signal, for example, to amplify the treble or bass portion of the particular object signal. It is not possible to modify spectrum information without using the ADG module 147.
Fig. 6 is a block diagram of an audio decoding apparatus 150 according to a fourth embodiment of the present invention. Referring to Fig. 6, the audio decoding apparatus 150 includes a multi-channel binaural decoder 151, a first parameter converter 157, and a second parameter converter 159.
The second parameter converter 159 analyzes side information and control information provided by an audio encoding apparatus, and configures spatial parameter information based on the result of the analysis. The first parameter converter 157 configures binaural parameter information, i.e., virtual three-dimensional (3D) parameter information which can be used by the multi-channel binaural decoder 151, by adding three-dimensional (3D) information, such as head-related transfer function (HRTF) parameters, to the spatial parameter information. The multi-channel binaural decoder 151 generates a binaural signal by applying the binaural parameter information to a downmix signal.
The first parameter converter 157 and the second parameter converter 159 may be replaced by a single module, namely a parameter conversion module 155, which receives the side information, the control information, and 3D information such as HRTF parameters, and configures the binaural parameter information based on the side information, the control information, and the HRTF parameters.
Conventionally, in order to generate a binaural signal for reproducing, with headphones, a downmix signal containing 10 object signals, an object decoder must generate 10 decoded signals respectively corresponding to the 10 object signals based on the downmix signal and side information. Thereafter, a mixer/renderer allocates each of the 10 object signals to a predetermined position in a multi-channel space, with reference to control information, so as to suit a 5-channel speaker environment. Thereafter, the mixer/renderer generates a 5-channel signal that can be reproduced with 5-channel speakers. Thereafter, the mixer/renderer applies 3D information to the 5-channel signal, thereby generating a 2-channel signal. In short, this conventional audio decoding method includes reproducing 10 object signals, converting the 10 object signals into a 5-channel signal, and generating a 2-channel signal based on the 5-channel signal, and is therefore clearly inefficient.
On the other hand, the audio decoding apparatus 150 can readily generate, from object signals, a binaural signal that can be reproduced with headphones. In addition, the audio decoding apparatus 150 configures spatial parameter information through the analysis of side information and control information, and can therefore generate a binaural signal using a typical multi-channel binaural decoder. Moreover, the audio decoding apparatus 150 can still use a typical multi-channel binaural decoder even when it is equipped with an integrated parameter converter that receives the side information, the control information, and HRTF parameters and configures binaural parameter information based on the side information, the control information, and the HRTF parameters.
Fig. 7 is a block diagram of an audio decoding apparatus 160 according to a fifth embodiment of the present invention. Referring to Fig. 7, the audio decoding apparatus 160 includes a preprocessor 161, a multi-channel decoder 163, and a parameter converter 165.
The parameter converter 165 generates spatial parameter information, which can be used by the multi-channel decoder 163, and parameter information, which can be used by the preprocessor 161. The preprocessor 161 performs a preprocessing operation on a downmix signal, and transmits the downmix signal resulting from the preprocessing operation to the multi-channel decoder 163. The multi-channel decoder 163 performs a decoding operation on the downmix signal transmitted by the preprocessor 161, thereby outputting a stereo signal, a binaural stereo signal, or a multi-channel signal. Examples of the preprocessing operation performed by the preprocessor 161 include modifying or converting the downmix signal in the time domain or the frequency domain by means of filtering.
If a downmix signal input to the audio decoding apparatus 160 is a stereo signal, the downmix signal may be subjected to downmix preprocessing, performed by the preprocessor 161, before being input to the multi-channel decoder 163, because the multi-channel decoder 163 cannot, through decoding, map an object signal belonging to the left channel of the stereo downmix signal to the right channel of a multi-channel signal. Therefore, in order for an object signal belonging to the left channel of the stereo downmix signal to be shifted toward the right channel, the stereo downmix signal may need to be preprocessed by the preprocessor 161, and the preprocessed downmix signal may then be input to the multi-channel decoder 163.
The preprocessing of the stereo downmix signal may be performed based on preprocessing information obtained from the side information and from the control information.
Fig. 8 is a block diagram of an audio decoding apparatus 170 according to a sixth embodiment of the present invention. Referring to Fig. 8, the audio decoding apparatus 170 includes a multi-channel decoder 171, a post-processor 173, and a parameter converter 175.
The parameter converter 175 generates spatial parameter information, which can be used by the multi-channel decoder 171, and parameter information, which can be used by the post-processor 173. The post-processor 173 performs a post-processing operation on a signal output by the multi-channel decoder 171. Examples of the signal output by the multi-channel decoder 171 include a stereo signal, a binaural stereo signal, and a multi-channel signal.
Examples of the post-processing operation performed by the post-processor 173 include modifying and converting each channel or all channels of the output signal. For example, if the side information includes fundamental frequency information regarding a predetermined object signal, the post-processor 173 may remove harmonic components from the predetermined object signal with reference to the fundamental frequency information. A multi-channel audio decoding method may not be efficient enough to be used in a karaoke system. However, if fundamental frequency information regarding vocal object signals is included in the side information and the harmonic components of the vocal object signals are removed during a post-processing operation, a high-performance karaoke system can be realized using the embodiment of Fig. 8. The embodiment of Fig. 8 may also be applied to object signals other than vocal object signals. For example, the sound of a predetermined musical instrument may be removed using the embodiment of Fig. 8. Also, predetermined harmonic components of object signals may be amplified using the fundamental frequency information regarding the object signals and the embodiment of Fig. 8. In short, post-processing parameters enable the application of various effects, such as the insertion of a reverberation effect, the addition of noise, and the amplification of a bass portion, that cannot be performed by the multi-channel decoder 171.
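Purely as an informal illustration of the karaoke-style post-processing described above, and not the patent's own algorithm, the sketch below attenuates the fundamental and harmonics of a vocal object with a cascade of notch filters; the helper name and filter design are assumptions.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def remove_harmonics(signal, f0, sample_rate, num_harmonics=10, q=30.0):
    """Toy karaoke-style post-processing: notch out f0 and its harmonics.

    signal: 1-D numpy array (one output channel of the multi-channel decoder).
    f0: fundamental frequency of the vocal object, taken from side information.
    """
    out = signal.copy()
    for k in range(1, num_harmonics + 1):
        freq = k * f0
        if freq >= sample_rate / 2:          # stay below the Nyquist frequency
            break
        b, a = iirnotch(freq, q, fs=sample_rate)
        out = lfilter(b, a, out)             # cascade one notch per harmonic
    return out
```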
The post-processor 173 may directly apply an additional effect to the downmix signal, or may add a downmix signal to which an effect has been applied to the output of the multi-channel decoder 171. The post-processor 173 may change the spectrum of an object or modify the downmix signal whenever necessary. If it is not appropriate to directly perform an effect processing operation, such as reverberation, on the downmix signal and to transmit the signal obtained by the effect processing operation to the multi-channel decoder 171, the post-processor 173 may simply add the signal obtained by the effect processing operation to the output of the multi-channel decoder 171, instead of directly performing the effect processing on the downmix signal and transmitting the result of the effect processing to the multi-channel decoder 171.
Fig. 9 is a block diagram of an audio decoding apparatus 180 according to a seventh embodiment of the present invention. Referring to Fig. 9, the audio decoding apparatus 180 includes a preprocessor 181, a multi-channel decoder 183, a post-processor 185, and a parameter converter 187.
The description of the preprocessor 161 directly applies to the preprocessor 181. The post-processor 185 may be used to add the output of the preprocessor 181 and the output of the multi-channel decoder 183 together, thereby providing a final signal. In this case, the post-processor 185 simply serves as an adder for adding the signals. An effect parameter may be provided to whichever of the preprocessor 181 and the post-processor 185 applies the effect. In addition, the addition of a signal, obtained by applying an effect to the downmix signal, to the output of the multi-channel decoder 183, and the application of an effect to the output of the multi-channel decoder 183, may be performed at the same time.
The preprocessors 161 and 181 of Figs. 7 and 9 may perform rendering on the downmix signal according to control information provided by a user. In addition, the preprocessors 161 and 181 of Figs. 7 and 9 may increase or reduce the levels of object signals and alter the spectra of object signals. In this case, the preprocessors 161 and 181 of Figs. 7 and 9 may perform the functions of an ADG module.
The rendering of an object signal according to object signal direction information, the adjustment of the object signal level, and the alteration of the object signal spectrum may be performed at the same time. In addition, some of the rendering of object signals according to object signal direction information, the adjustment of object signal levels, and the alteration of object signal spectra may be performed using the preprocessor 161 or 181, and whatever is not performed by the preprocessor 161 or 181 may be performed using an ADG module. For example, it is inefficient to alter the spectrum of an object signal using an ADG module, because the ADG module uses quantization level intervals and parameter band intervals. In this case, the spectrum of the object signal may be altered precisely, frequency by frequency, using the preprocessor 161 or 181, while the level of the object signal is adjusted using the ADG module.
Fig. 10 is a block diagram of an audio decoding apparatus 200 according to an eighth embodiment of the present invention. Referring to Fig. 10, the audio decoding apparatus 200 includes a rendering matrix generator 201, a transcoder 203, a multi-channel decoder 205, a preprocessor 207, a surround processor 208, and an adder 209.
The rendering matrix generator 201 generates a rendering matrix, which represents object position information regarding the positions of the object signals and playback configuration information regarding the levels of the object signals, and provides the rendering matrix to the transcoder 203. The rendering matrix generator 201 also generates 3D information, such as HRTF coefficients, based on the object position information. An HRTF is a transfer function which describes the transmission of sound waves between a sound source at an arbitrary position and the eardrum, and which returns a value that varies according to the elevation and direction of the sound source. If a signal having no directivity is filtered with an HRTF, the signal sounds as if it were being reproduced from the corresponding direction.
The object position information and the playback configuration information received by the rendering matrix generator 201 may vary over time and may be provided by an end user.
The transcoder 203 generates channel-based side information based on object-based side information, the rendering matrix, and the 3D information, and provides the multi-channel decoder 205 with the channel-based side information and the 3D information needed by the multi-channel decoder 205. That is, the transcoder 203 transmits, to the multi-channel decoder 205, channel-based side information regarding M channels, which is obtained from object-based parameter information regarding N object signals, along with the 3D information of each of the N object signals.
The multi-channel decoder 205 generates a multi-channel audio signal based on a downmix signal and the channel-based side information provided by the transcoder 203, and performs 3D rendering on the multi-channel audio signal according to the 3D information, thereby generating a 3D multi-channel signal. The rendering matrix generator 201 may include a 3D information database (not shown).
If the downmix signal needs to be preprocessed before being input to the multi-channel decoder 205, the transcoder 203 transmits information regarding the preprocessing to the preprocessor 207. The object-based side information includes information regarding all the object signals, and the rendering matrix includes the object position information and the playback configuration information. The transcoder 203 generates channel-based side information based on the object-based side information and the rendering matrix, and then generates the channel-based side information necessary for mixing and reproducing the object signals in accordance with the channel information. Thereafter, the transcoder 203 transmits the channel-based side information to the multi-channel decoder 205.
The channel-based side information and the 3D information provided by the transcoder 203 may include frame indexes. Thus, the multi-channel decoder 205 can synchronize the channel-based side information and the 3D information using the frame indexes, and can therefore apply the 3D information only to particular frames of the bitstream. In addition, even if the 3D information is updated, the channel-based side information and the updated 3D information can easily be synchronized using the frame indexes. That is, the frame indexes may be included in the channel-based side information and the 3D information, respectively, so that the multi-channel decoder 205 can synchronize the channel-based side information and the 3D information.
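An informal sketch, not taken from the patent, of how frame indexes might be used to pair the channel-based side information with the matching piece of 3D information; the data layout shown is hypothetical.

```python
def synchronize_by_frame_index(channel_side_info, three_d_info):
    """Toy frame-index synchronization.

    channel_side_info: list of dicts like {"frame": 17, "params": ...}.
    three_d_info:      list of dicts like {"frame": 17, "hrtf": ...}.
    Returns a dict mapping a frame index to the (params, hrtf) pair for that
    frame; frames with no 3D entry simply skip 3D rendering.
    """
    hrtf_by_frame = {item["frame"]: item["hrtf"] for item in three_d_info}
    paired = {}
    for info in channel_side_info:
        frame = info["frame"]
        paired[frame] = (info["params"], hrtf_by_frame.get(frame))  # None -> no 3D update
    return paired
```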
If necessary, the preprocessor 207 may perform preprocessing on an input downmix signal before the downmix signal is input to the multi-channel decoder 205. As described above, if the input downmix signal is a stereo signal and an object signal belonging to the left channel needs to be reproduced from the right channel, the downmix signal must be preprocessed by the preprocessor 207 before being input to the multi-channel decoder 205, because the multi-channel decoder 205 cannot shift an object signal from one channel to another. The transcoder 203 may provide the preprocessor 207 with information necessary for preprocessing the input downmix signal. A downmix signal obtained by the preprocessing performed by the preprocessor 207 may then be transmitted to the multi-channel decoder 205.
The surround processor 208 and the adder 209 may directly apply an additional effect to the downmix signal, or may add a downmix signal to which an effect has been applied to the output of the multi-channel decoder 205. The surround processor 208 may change the spectrum of an object or modify the downmix signal whenever necessary. If it is inappropriate to directly perform an effect processing operation, such as reverberation, on the downmix signal and to transmit the signal obtained by the effect processing operation to the multi-channel decoder 205, the surround processor 208 may simply add the signal obtained by the effect processing operation to the output of the multi-channel decoder 205, instead of directly performing the effect processing on the downmix signal and transmitting the result to the multi-channel decoder 205.
The rendering matrix generated by the rendering matrix generator 201 will hereinafter be described in detail.
A rendering matrix is a matrix that represents the positions and the playback configuration of object signals. That is, if there are N object signals and M channels, a rendering matrix can indicate, in various ways, how the N object signals are mapped to the M channels.
More specifically, when N object signals are mapped to M channels, an N*M rendering matrix may be established. In this case, the rendering matrix includes N rows, which respectively represent the N object signals, and M columns, which respectively represent the M channels. Each of the M coefficients in each of the N rows may be a real number or an integer representing the ratio of the portion of the corresponding object signal allocated to the corresponding channel to the whole of the object signal.
More specifically, the M coefficients in each of the N rows of the N*M rendering matrix may be real numbers. Then, if the sum of the M coefficients in a row of the N*M rendering matrix is equal to a predetermined reference value, for example 1, it may be determined that the level of the corresponding object signal has not changed. If the sum of the M coefficients is less than 1, it may be determined that the level of the object signal has decreased. If the sum of the M coefficients is greater than 1, it may be determined that the level of the object signal has increased. The predetermined reference value may be a value other than 1. The amount by which the level of an object signal may vary may be limited to a range of 12 dB. For example, if the predetermined reference value is 1 and the sum of the M coefficients is 1.5, it may be determined that the level of the object signal has increased by 12 dB. If the predetermined reference value is 1 and the sum of the M coefficients is 0.5, it may be determined that the level of the object signal has decreased by 12 dB. If the predetermined reference value is 1 and the sum of the M coefficients is between 0.5 and 1.5, it may be determined that the level of the object signal has varied by a predetermined amount between -12 dB and +12 dB, the predetermined amount being determined linearly from the sum of the M coefficients.
The M coefficients in each of the N rows of the N*M rendering matrix may instead be integers. Then, if the sum of the M coefficients in a row of the N*M rendering matrix is equal to a predetermined reference value, for example 10, 20, 30, or 100, it may be determined that the level of the corresponding object signal has not changed. If the sum of the M coefficients is less than the predetermined reference value, it may be determined that the level of the object signal has decreased. If the sum of the M coefficients is greater than the predetermined reference value, it may be determined that the level of the object signal has increased. The amount by which the level of an object signal may vary may be limited to a range of, for example, 12 dB. The amount of difference between the sum of the M coefficients and the predetermined reference value represents the amount (in dB) by which the level of the object signal has varied. For example, if the sum of the M coefficients exceeds the predetermined reference value by 1, it may be determined that the level of the object signal has increased by 2 dB. Therefore, if the predetermined reference value is 20 and the sum of the M coefficients is 23, it may be determined that the level of the object signal has increased by 6 dB. If the predetermined reference value is 20 and the sum of the M coefficients is 15, it may be determined that the level of the object signal has decreased by 10 dB.
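The two level-change conventions just described can be written out as small helpers; this is an informal sketch, with the function names and the ±12 dB clamping behaviour assumed for illustration.

```python
def level_change_db_real(row, reference=1.0, max_db=12.0):
    """Real-valued convention: a row sum between 0.5*ref and 1.5*ref maps
    linearly to a level change between -12 dB and +12 dB."""
    ratio = sum(row) / reference
    db = (ratio - 1.0) * (max_db / 0.5)      # 0.5 -> -12 dB, 1.0 -> 0 dB, 1.5 -> +12 dB
    return max(-max_db, min(max_db, db))

def level_change_db_integer(row, reference=20, db_per_unit=2.0):
    """Integer convention: each unit of difference between the row sum and the
    reference value corresponds to 2 dB of level change."""
    return (sum(row) - reference) * db_per_unit
```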
For example, if there are 6 object signals and 5 channels (namely front-left (FL), front-right (FR), center (C), rear-left (RL), and rear-right (RR) channels), a 6*5 rendering matrix may be established, having 6 rows respectively corresponding to the 6 object signals and 5 columns respectively corresponding to the 5 channels. The coefficients of the 6*5 rendering matrix may be integers indicating the ratios at which each of the 6 object signals is distributed among the 5 channels. The 6*5 rendering matrix may have a reference value of 10. Then, if the sum of the 5 coefficients in any one of the 6 rows of the 6*5 rendering matrix is equal to 10, it may be determined that the level of the corresponding object signal has not changed. The amount of difference between the sum of the 5 coefficients in any one of the 6 rows of the 6*5 rendering matrix and the reference value represents the amount by which the level of the corresponding object signal has varied. For example, if the difference between the sum of the 5 coefficients in any one of the 6 rows of the 6*5 rendering matrix and the reference value is 1, it may be determined that the level of the corresponding object signal has varied by 2 dB. The 6*5 rendering matrix may be represented by Equation (1):
[Equation 1]
3  1  2  2  2
2  4  3  1  2
0  0 12  0  0
7  0  0  0  0
2  2  2  2  2
2  1  1  2  1
Referring to the 6*5 rendering matrix of Equation (1), the first row corresponds to the first object signal and represents the ratios at which the first object signal is allocated to the FL, FR, C, RL, and RR channels. Since the first coefficient of the first row has the greatest integer value, 3, and the sum of the coefficients of the first row is 10, it may be determined that the first object signal is mainly allocated to the FL channel and that the level of the first object signal has not changed. Since the second coefficient of the second row, which corresponds to the second object signal, has the greatest integer value, 4, and the sum of the coefficients of the second row is 12, it may be determined that the second object signal is mainly allocated to the FR channel and that the level of the second object signal has increased by 4 dB. Since the third coefficient of the third row, which corresponds to the third object signal, has the greatest integer value, 12, and the sum of the coefficients of the third row is 12, it may be determined that the third object signal is allocated only to the C channel and that the level of the third object signal has increased by 4 dB. Since all the coefficients of the fifth row, which corresponds to the fifth object signal, have the same integer value, 2, and the sum of the coefficients of the fifth row is 10, it may be determined that the fifth object signal is evenly distributed among the FL, FR, C, RL, and RR channels and that the level of the fifth object signal has not changed.
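A short usage sketch, not from the patent, applying the integer convention to the rows of Equation (1); it reuses the hypothetical level_change_db_integer helper from the sketch above.

```python
RENDERING_MATRIX = [            # Equation (1): rows = objects, columns = FL FR C RL RR
    [3, 1, 2, 2, 2],
    [2, 4, 3, 1, 2],
    [0, 0, 12, 0, 0],
    [7, 0, 0, 0, 0],
    [2, 2, 2, 2, 2],
    [2, 1, 1, 2, 1],
]
CHANNELS = ["FL", "FR", "C", "RL", "RR"]

for n, row in enumerate(RENDERING_MATRIX, start=1):
    main_channel = CHANNELS[row.index(max(row))]
    db = level_change_db_integer(row, reference=10)   # reference value 10, 2 dB per unit
    print(f"object {n}: mainly {main_channel}, level change {db:+.0f} dB")
# e.g. object 2: mainly FR, level change +4 dB; object 3: mainly C, +4 dB
```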
Alternatively, when N object signals are mapped to M channels, an N*(M+1) rendering matrix may be established. An N*(M+1) rendering matrix is very similar to an N*M rendering matrix. More specifically, in an N*(M+1) rendering matrix, as in an N*M rendering matrix, the first through M-th coefficients in each of the N rows represent the ratios at which the corresponding object signal is allocated to the FL, FR, C, RL, and RR channels. However, an N*(M+1) rendering matrix differs from an N*M rendering matrix in that it has an additional column (namely the (M+1)-th column) for indicating the levels of the object signals.
An N*(M+1) rendering matrix thus indicates how an object signal is distributed among the M channels and, separately, whether the level of the object signal has varied. Accordingly, by using an N*(M+1) rendering matrix, information regarding a variation in the level of any object signal can easily be obtained without additional computation. Since an N*(M+1) rendering matrix is almost the same as an N*M rendering matrix, an N*(M+1) rendering matrix can easily be converted into an N*M rendering matrix, and vice versa, without additional computation.
Still alternatively, when N object signals are mapped to M channels, an N*2 rendering matrix may be established. The first column of the N*2 rendering matrix indicates the angular positions of the object signals, and the second column indicates a possible variation in the level of each of the object signals. The N*2 rendering matrix may represent the angular position of each object signal at regular intervals of 1 or 3 degrees within the range of 0 to 360 degrees. An object signal that is evenly distributed in all directions may be represented by a predetermined value, rather than by an angle.
The N*2 rendering matrix may be converted into an N*3 rendering matrix that can indicate not only the 2D directions but also the 3D directions of the object signals. More specifically, the second column of the N*3 rendering matrix may be used to indicate the 3D directions of the object signals. The third column of the N*3 rendering matrix indicates a possible variation in the level of each object signal, using the same method as used in an N*M rendering matrix. If the final playback mode of the object decoder is binaural stereo, the rendering matrix generator 201 may transmit 3D information indicating the position of each object signal, or an index corresponding to the 3D information. In the latter case, the transcoder 203 may need to obtain the 3D information corresponding to the index transmitted by the rendering matrix generator 201. In addition, if the 3D information indicating the position of each object signal is received from the rendering matrix generator 201, the transcoder 203 may calculate 3D information that can be used by the multi-channel decoder 205, based on the received 3D information, the rendering matrix, and the object-based side information.
The rendering matrix and the 3D information may adaptively vary in real time according to modifications made by an end user to the object position information and the playback configuration information. Therefore, information regarding whether the rendering matrix and the 3D information have been updated, and the updates themselves, may be transmitted to the transcoder 203 at regular intervals of time, for example every 0.5 seconds. Then, if updates in the rendering matrix and the 3D information are detected, the transcoder 203 may perform linear conversion on the received updates and the existing rendering matrix and existing 3D information, on the assumption that the rendering matrix and the 3D information vary linearly over time.
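As an informal illustration of the linear-variation assumption mentioned above (not the patent's own procedure), the following sketch linearly interpolates rendering-matrix coefficients between two update instants; all names are hypothetical.

```python
def interpolate_rendering_matrix(old_matrix, new_matrix, t, t_old, t_new):
    """Linearly interpolate rendering-matrix coefficients at time t,
    assuming the matrix varies linearly between updates at t_old and t_new."""
    alpha = (t - t_old) / (t_new - t_old)
    alpha = min(max(alpha, 0.0), 1.0)            # clamp to the update interval
    return [
        [(1.0 - alpha) * o + alpha * n for o, n in zip(old_row, new_row)]
        for old_row, new_row in zip(old_matrix, new_matrix)
    ]
```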
When playing up matrix and 3D information and be transferred to code converter 203, if object location information and reproduce configuration information and do not revised by the terminal user, matrix is played up in indication and 3D information not have the information of change can be transferred to code converter 203.On the other hand, when playing up matrix and 3D information and be transferred to code converter 203, matrix and 3D information have changed and the information upgraded can be transferred to code converter 203 if object location information and reproduce configuration information and revised by the terminal user, the indication in playing up matrix and 3D information are played up.Renewal and the renewal in the 3D information of playing up in the matrix more specifically, can be transferred to respectively code converter 203.Renewal and/or the renewal in the 3D information of optionally, playing up in the matrix can jointly be represented by a predetermined typical value.Then, this predetermined typical value can be with this predetermined typical value of indication corresponding to playing up the renewal in the matrix or being transferred to code converter 203 corresponding to the information of the renewal in the 3D information.By such mode, its easily information code converter 203 play up matrix and whether 3D information has renewal.
Play up matrix class like the N*M represented with formula (1) plays up matrix and can comprise extra row, comes the 3D directional information of indicated object signal.In this case, these extra row can be in-90 degree 3D directional information of indicated object signal to the angular ranges of+90 degree.These extra row not only can be provided for the N*M matrix, can also be provided for N* (M+1) and play up matrix and N*2 matrix.The 3D directional information of object signal is not to use in the normal decoder pattern of multi-channel decoder.Yet the 3D directional information of object signal must be used in the ears pattern of multi-channel decoder.The 3D directional information of this object signal can be transmitted with playing up matrix.Optionally, the 3D directional information of object signal can be transmitted with 3D information.In ears mode decoding operating period, the 3D directional information of this object signal does not affect the side information based on sound channel, but affects 3D information.
Information regarding the spatial positions and the levels of the object signals may be provided as a rendering matrix. Alternatively, such information may be represented as modifications of the spectra of the object signals, for example, a boost of the bass portion or of the treble portion of an object signal. In this case, the information regarding the spectrum modification of the object signals may be transmitted as level changes in each of the parameter bands used by the multi-channel codec. If the end user controls the spectrum modification of the object signals, the information regarding the spectrum modification may be transmitted as a spectrum matrix separate from the rendering matrix. The spectrum matrix has as many rows as there are object signals and as many columns as there are parameters, and each coefficient of the spectrum matrix indicates the level adjustment of the corresponding object signal in the corresponding parameter band.
Below will describe the operation of code converter 203 thereafter.This code converter 203 according to object-based side information, to play up matrix information and 3D information be the side information that multi-channel decoder 205 generates based on sound channel, and the side information that will be somebody's turn to do based on sound channel is transferred to multi-channel decoder 205.In addition, this code converter 203 is that multi-channel decoder 205 generates the 3D information, and with this 3D communication to multi-channel decoder 205.If the reduction audio signal of input is in that to be input to multi-channel decoder 205 front needs pretreated, this code converter 203 can transmit the information of reducing audio signal about this input.
This code converter 203 can receive object-based side information, and this object-based side information indicates a plurality of object signal are how to be included in the reduction audio signal of input.Object-based side information can be by using OTT box and TTT box, and by using CLD, and ICC and CPC information indicate a plurality of object signal are how to be included in the reduction audio signal of input.This object-based side information can provide the explanation of several different methods with each the information of indication about a plurality of object signal, and can how to be included in the side information by the denoted object signal, and these methods can be carried out by object encoder.
In the case of a TTT box of the multi-channel codec, L, C and R signals can be downmixed to, or upmixed from, L and R signals. In this case, the C signal may share parts of the L and R signals. This, however, is rarely the case when downmixing or upmixing object signals. Therefore, OTT boxes are more widely used to perform the upmixing and downmixing for object encoding/decoding. Nevertheless, if the C signal contains an independent signal component rather than parts of the L and R signals, a TTT box may still be used to perform the upmixing or downmixing for object encoding/decoding.
For instance, as shown in figure 11, if 6 object signal are arranged, these 6 object signal can be converted to the reduction audio signal by the OTT box, can obtain information about each object signal with the OTT box.
Referring to Figure 11,6 object signal can represent by a reduction audio signal with by the integrally provided information (for example, CLD and ICC information) of 5 OTT boxes 211,213,215,217 and 219.Structure shown in Figure 11 can change in every way.That is to say that referring to Figure 11, an OTT box 211 can receive two in 6 object signal.In addition, OTT box 211,213,215,217 and 219 classification method of attachment can arbitrarily change.Therefore, side information can comprise indication OTT box 211,213,215,217 and be connected the graded-structure information how classification connects, and indicate each object signal to be input to the input position information of which OTT box.If OTT box 211,213,215,217 and 219 forms any tree structure, the method for employed this any tree structure of expression of multichannel codec can be used to indicate this graded-structure information.In addition, this input position information can be indicated by variety of way.
Side information can also comprise the information about the quiet phase of each object signal.In this case, OTT box 211,213,215,217 and 219 in time adaptive change of tree structure.For instance, referring to Figure 11, (OBJECT1) is quiet when the first object signal, and be dispensable about the information of an OTT box 211, and only have second object signal (OBJECT2) to be input in the 4th OTT box 217.Then, correspondingly change the tree structure of OTT box 211,213,215,217 and 219.Then, the information about the variation of the tree structure of OTT box 211,213,215,217 and 219 can be included in the side information.
If a predetermined object signal is silent, information indicating that the OTT box corresponding to the predetermined object signal is not used, and that no cues from that OTT box are available, may be provided. In this case, the size of the side information can be reduced by not including in the side information any information regarding OTT or TTT boxes that are not used. Even if the tree structure formed by a plurality of OTT or TTT boxes is modified, which OTT or TTT boxes are turned on or off can easily be determined from information indicating which object signals are silent. Therefore, it is not necessary to frequently transmit information regarding possible modifications of the tree structure of the OTT or TTT boxes. Instead, information indicating which object signal is silent may be transmitted. Then, the decoder can easily determine which part of the tree structure of the OTT or TTT boxes needs to be modified. Therefore, the size of the information that needs to be transmitted to the decoder can be minimized, and cues regarding the object signals can easily be transmitted to the decoder.
Figure 12 is for explaining how a plurality of object signal are included in the block diagram of reduction audio signal.In the embodiment of Figure 11, it has adopted a kind of OTT box structure of multichannel codec.Yet, in the embodiment of Figure 12, used a kind of distortion of OTT box structure of multichannel codec.That is to say that referring to Figure 12, a plurality of object signal are imported in each box, and only generate at last a reduction audio signal.Referring to Figure 12, about a plurality of object signal each information can by each object signal recently the representing of total energy magnitude of energy level (energy level) and object signal.Yet along with the increase of object signal quantity, the energy level of each object signal has reduced with the ratio of the total energy magnitude of object signal.In order to overcome this problem, an object signal (hereinafter referred to as the highest energy object signal) that in the preset parameter band, has the highest energy level in a plurality of object signal of search, and provide the ratio of energy level with the energy level of highest energy object signal of other object signal (hereinafter referred to as non-highest energy object signal), with as the information about each object signal.In this case, in case the information of the absolute value of the energy level of given indication highest energy object signal and highest energy object signal just can easily be determined the energy level of the object signal of other non-highest energy.
The energy level of the highest-energy object signal is necessary when a multipoint control unit (MCU) merges a plurality of bitstreams into a single bitstream. In most other cases, however, the energy level of the highest-energy object signal is unnecessary, because the absolute value of the energy level of the highest-energy object signal can easily be obtained from the ratios of the energy levels of the other, non-highest-energy object signals to the energy level of the highest-energy object signal, together with the energy of the parameter band, as shown in formula (2) below.
For example, assume that four object signals A, B, C and D belong to a predetermined parameter band and that object signal A is the highest-energy object signal. Then, the energy $E_P$ of the predetermined parameter band and the absolute value $E_A$ of the energy level of object signal A satisfy formula (2):
[formula 2]
$E_P = E_A + (a + b + c)\,E_A$
$E_A = \dfrac{E_P}{1 + a + b + c}$
where a, b and c denote the ratios of the energy levels of object signals B, C and D, respectively, to the energy level of object signal A. Referring to formula (2), the absolute value $E_A$ of the energy level of object signal A can be calculated from the ratios a, b and c and from the energy $E_P$ of the predetermined parameter band. Therefore, unless a plurality of bitstreams need to be merged into a single bitstream by an MCU, the absolute value $E_A$ of the energy level of object signal A does not need to be included in the bitstream. Information indicating whether the absolute value $E_A$ of the energy level of object signal A is included in the bitstream may be carried in the header of the bitstream, thereby reducing the size of the bitstream.
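A minimal Python sketch of this bookkeeping (illustrative only; the variable and function names are assumptions and are not part of any bitstream syntax):

```python
def absolute_energies(e_band, ratios):
    """Given the energy E_P of a parameter band and the ratios of the
    non-highest-energy objects to the highest-energy object (formula (2)),
    return (E_max, [absolute energies of the other objects])."""
    e_max = e_band / (1.0 + sum(ratios))   # E_A = E_P / (1 + a + b + c)
    return e_max, [r * e_max for r in ratios]

# Example: band energy 120, objects B, C, D at 50%, 30%, 20% of object A's energy
e_a, others = absolute_energies(120.0, [0.5, 0.3, 0.2])
print(e_a, others)   # 60.0, [30.0, 18.0, 12.0]
```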
On the other hand, if a plurality of bitstreams need to be merged into a single bitstream by an MCU, the energy level of the highest-energy object signal is necessary. In this case, the sum of the energy levels calculated from the ratios of the energy levels of the non-highest-energy object signals to the energy level of the highest-energy object signal may differ from the energy level of the downmix signal obtained by downmixing all of the object signals. For example, when the energy level of the downmix signal is 100, the sum of the calculated energy levels may be 98 or 103 because of, for example, errors introduced during quantization and dequantization. In order to compensate for this, the difference between the energy level of the downmix signal and the sum of the calculated energy levels may be approximately compensated for by multiplying each of the calculated energy levels by a predetermined coefficient. If the energy level of the downmix signal is X and the sum of the calculated energy levels is Y, each of the calculated energy levels may be multiplied by X/Y. If the difference between the energy level of the downmix signal and the sum of the calculated energy levels is not compensated for, the quantization errors may spread over parameter bands and frames, thereby causing signal distortion.
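An illustrative sketch of the X/Y correction described above (names are assumptions for illustration):

```python
def compensate(downmix_energy, calc_energies):
    """Scale the per-object energies so that their sum matches the measured
    energy X of the downmix signal (the X/Y correction)."""
    y = sum(calc_energies)
    scale = downmix_energy / y if y > 0 else 1.0
    return [e * scale for e in calc_energies]

print(compensate(100.0, [40.0, 35.0, 23.0]))  # calculated sum 98 -> rescaled to sum to 100
```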
Therefore, in predetermined parameter band, the information of indicating in a plurality of object signal which to have maximum energy absolute value is essential.This information can be represented by a plurality of bits.Being used to indicate in a plurality of object signal which in the preset parameter band has the necessary bit number of ceiling capacity absolute value and changes according to the quantity of object signal.Along with the increase of object signal quantity, in the preset parameter band, be used to indicate a plurality of object signal in which have the necessary bit number of ceiling capacity absolute value and also increase.On the other hand, along with the minimizing of object signal quantity, in the preset parameter band, be used to indicate in a plurality of object signal which and have the necessary bit number of ceiling capacity absolute value and also reduce.Predetermined bit number may be divided to be equipped with indication in a plurality of object signal which when the preset parameter band increases and to have the ceiling capacity absolute value in advance.Optionally, can be identified in a plurality of object signal of preset parameter band indicating which according to specific information and have the necessary bit number of ceiling capacity absolute value.
By using for the OTT and/or the employed CLD of TTT box that reduce at the multichannel codec, the identical method of the size of ICC and CPC information, the large I that in a plurality of object signal of each parameter band indicating which has the information of ceiling capacity absolute value is reduced, for example, by service time difference method, frequency differential method or pilot tone decoding method.
For which of indicating a plurality of object signal in each parameter band has the ceiling capacity absolute value, can use the huffman table of optimization.The energy level that in this case, may need the denoted object signal in what order with the information of the ratio of the energy level of the object signal with highest energy absolute value.For instance, if 5 object signal (namely the first to the 5th object signal) are arranged, and the 3rd object signal is the highest energy object signal, and the information about the 3rd object signal may be provided.Then, can provide in every way first, second, the ratio of the energy level of the 4th and the 5th object signal and the energy level of the 3rd object signal, below these modes will be described in further detail.
Can sequentially provide first, second, the ratio of the energy level of the 4th and the 5th object signal and the energy level of the 3rd object signal.The ratio of energy level with the energy level of the 3rd object signal of the 4th, the 5th, first and second object signal optionally, sequentially is provided in the mode of circulation.Then, the indication that provides first, second, the information of the energy level of the 4th and the 5th object signal and the order of the ratio of the energy level of the 3rd object signal can be included in top of file or can be sent out in the interim of a plurality of frames.The multichannel codec can be determined CLD and ICC information according to the serial number of OTT box.Same, the information how indication is mapped to each object signal in the bit stream also is essential.
In the situation of multichannel codec, can be represented by the serial number of OTT or TTT box about the information corresponding to each sound channel.According to a kind of object-based audio coding method, if N object signal arranged, this N object signal may need by proper number.Yet for the terminal user, controlling N object signal with object decoder is essential sometimes.In this case, the terminal user may not only need the serial number of N object signal, also need to for the explanation of this N object signal, for example indicate the first object signal corresponding to female voice, and the second object signal is corresponding to the explanation of piano sound.These explanations of N object signal can be used as in the head that metadata is included in bit stream, and then along with this bit stream is transmitted together.More specifically, these explanations of N object signal can text mode be provided, or by providing with code table or code word.
Information about the correlativity between the object signal also is essential sometimes.For this reason, the correlativity between highest energy object signal and other the non-highest energy object signal can be calculated.In this case, an independent relevance values can be assigned to all object signal, just as use an ICC value in all OTT boxes.
If an object signal is a stereo signal, ICC information as well as the left-channel and right-channel energies of the object signal are necessary. The left-channel and right-channel energies of an object signal can be calculated from the absolute values of the energy levels of the highest-energy object signal and the energy-level ratios of the other, non-highest-energy object signals, using the same method as that used to calculate the energy levels of a plurality of object signals from their ratios to the energy level of the highest-energy object signal. For example, if the absolute values of the left- and right-channel energy levels of the highest-energy object signal are A and B, respectively, and the ratio of the left-channel energy level of a non-highest-energy object signal to A is x and the ratio of its right-channel energy level to B is y, the left- and right-channel energy levels of the non-highest-energy object signal can be calculated as A*x and B*y. In this manner, the ratio of the left-channel energy to the right-channel energy of a stereo object signal can be calculated.
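An illustrative sketch of this per-channel reconstruction (the helper name and data layout are assumptions made for illustration):

```python
def stereo_object_energies(a_left, b_right, ratios_lr):
    """a_left, b_right: absolute L/R energies of the highest-energy object.
    ratios_lr: list of (x, y) pairs, i.e. each non-highest-energy object's
    L energy relative to a_left and R energy relative to b_right.
    Returns per-object (L, R) energies, highest-energy object first."""
    objects = [(a_left, b_right)]
    objects += [(a_left * x, b_right * y) for x, y in ratios_lr]
    return objects

for l, r in stereo_object_energies(80.0, 60.0, [(0.5, 0.25), (0.1, 0.4)]):
    print(l, r, l / r)    # L energy, R energy, and L/R ratio of each object
```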
The absolute value of the energy level of the highest-energy object signal and the energy-level ratios of the other, non-highest-energy object signals are also used when the object signals are mono signals, the downmix signal obtained from the mono object signals is a stereo signal, and each mono object signal is included in both channels of the stereo downmix signal. In this case, the energy of the part of each mono object signal included in the left channel of the stereo downmix signal, the energy of the part of the mono object signal included in the right channel of the stereo downmix signal, and correlation information are necessary, and this applies directly to stereo object signals. If a mono object signal is included in both the L and R channels of a stereo downmix signal, the L- and R-channel components of the mono object signal may differ only in level, and the mono object signal may have a correlation value of 1 over the whole parameter band. In this case, in order to reduce the amount of data, information indicating that the mono object signal has a correlation value of 1 over the whole parameter band may additionally be provided. Then, there is no need to indicate a correlation value of 1 for each parameter band; instead, a correlation value of 1 is indicated once for the whole parameter band.
Clipping may occur when a plurality of object signals are added together to generate a downmix signal. In order to address this, the downmix signal may be multiplied by a predefined gain so that the maximum level of the downmix signal does not exceed a clipping threshold. The predefined gain may vary over time. Therefore, information regarding the predefined gain is necessary. If the downmix signal is a stereo signal, different gain values may be provided for the L and R channels of the downmix signal in order to prevent clipping. In order to reduce the amount of transmitted data, the different gain values may not be transmitted separately. Instead, the sum of the different gain values and the ratio between the different gain values may be transmitted. Then, compared with transmitting the gain values separately, the dynamic range can be lowered and the amount of transmitted data can be reduced.
In order to further reduce the amount of transmitted data, a bit indicating whether clipping occurs during the summation of the plurality of object signals into the downmix signal may be provided. Then, a gain value is transmitted only when it is determined that clipping occurs. Such clipping information is also necessary for preventing clipping when the downmix signals of a plurality of bitstreams are summed in order to merge the bitstreams. In order to prevent clipping, the sum of the downmix signals may be multiplied by the inverse of the predefined gain value.
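A simplified sketch of this clipping-prevention logic (the threshold value, names, and the use of per-sample lists rather than real audio buffers are assumptions for illustration):

```python
def downmix_with_clipping_gain(objects, threshold=1.0):
    """Sum object signals into a downmix; if the peak would exceed the clipping
    threshold, apply (and report) a gain that keeps the peak below the threshold.
    Returns (downmix_samples, clipping_flag, gain)."""
    mix = [sum(samples) for samples in zip(*objects)]
    peak = max(abs(s) for s in mix)
    if peak <= threshold:
        return mix, False, 1.0          # 1-bit flag: no clipping, so no gain is transmitted
    gain = threshold / peak
    return [s * gain for s in mix], True, gain

obj1 = [0.6, -0.4, 0.9]
obj2 = [0.5, 0.3, 0.4]
mix, clipped, g = downmix_with_clipping_gain([obj1, obj2])
print(clipped, g, mix)
```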
Figure 13 to 16 is the block diagrams for the whole bag of tricks of explaining the object-based side information of configuration.The embodiment of Figure 13 to 16 not only can be applied to monophone or stereo object signal, also can be applied to the multichannel object signal.
Referring to Figure 13, multichannel object signal (object A (sound channel 1) is to object A (sound channel n)) is imported in the object encoder 221.Then, this object encoder 221 generates reduction audio signal and side information according to multichannel object signal (object A (sound channel 1) is to object A (sound channel n)).Object encoder 223 receives a plurality of object signal objects 1 to object n and the reduction audio signal that generated by object encoder 221, and generates another reduction audio signal and another side information according to object signal object 1 to object N and the reduction audio signal that receives.Multiplexer 225 will be combined by the side information of object encoder 221 generations and the side information that is generated by object encoder 223.
Referring to Figure 14, object encoder 233 generates the first bit stream according to multichannel object signal (object A (sound channel 1) is to object A (sound channel n)).Then, object encoder 231 generates the second bit stream according to a plurality of non-multichannel object signal objects 1 to object n.Then, object encoder 235 merges to an individual bit stream by using for the almost identical method that under helping at MCU a plurality of bit streams is merged to an individual bit stream with the first and second bit streams.
Referring to Figure 15, multi-channel encoder 241 generates the reduction audio signal according to multichannel object signal (object A (sound channel 1) is to object A (sound channel n)) and based on the side information of sound channel.Object encoder 243 receives the reduction audio signal that generated by multi-channel encoder 241 and a plurality of non-multichannel object signal object 1 to object n, and generates an object bit stream and side information according to the reduction audio signal that receives and object signal object 1 to object n.Multiplexer 245 will be by combining based on the side information of sound channel and the side information that is generated by object encoder 243 that multi-channel encoder 241 generates, and the result that merges of output.
Referring to Figure 16, multi-channel encoder 253 generates the reduction audio signal according to multichannel object signal (object A (sound channel 1) is to object A (sound channel n)) and based on the side information of sound channel.Object encoder 251 generates reduction audio signal and side information according to a plurality of non-multichannel object signal objects 1 to object n.Object encoder 255 receives the reduction audio signal that is generated by multi-channel encoder 253 and the reduction audio signal that is generated by object encoder 251, and the reduction audio signal that will receive combines.Multiplexer 257 will be combined by the side information of object encoder 251 generations and the side information that is generated based on sound channel by multi-channel encoder 253, and the result of output merging.
In teleconference, use in the situation of object-based audio coding, sometimes a plurality of object bit streams must be merged into an independent bit stream.Below will describe in detail a plurality of object bit streams will be merged into an independent bit stream.
Figure 17 is the block diagram that merges two object bit streams for explaining.Referring to Figure 17, when two object bit streams are merged into an independent object bit stream, be present in respectively two side informations in the object bit stream, for example CLD and ICC informational needs are modified.Can be simply by using extra OTT box, namely the 11 OTT box, and the side information of use such as the CLD that is provided by the 11 OTT box and ICC information is merged into an independent object bit stream with two object bit streams.
The tree structure information of each of these two object bit streams must merge in the tree structure information after the merging, two object bit streams are merged into an independent object bit stream.For this reason, merging any extra configuration information that generates by two object bit streams can be modified, the numeral index that is used for the OTT box of two object bit streams of generation also will be modified, and only carry out a small amount of extra processing, the computing of for example being carried out by the 11 OTT box, and the reduction audio mixing of two reduction audio signal of two object bit streams.In this way, two object bit streams can easily be merged into an independent object bit stream, and do not need to revise information about each of a plurality of object signal, therefore, provide a kind of method that simply two bit streams is generated a bit stream.
Referring to Figure 17, the 11 OTT box is optional.In this case, two of two object bit streams reduction audio signal can be taken as two down-mix audio signal and use.Then, two object bit streams can be merged into an independent object bit stream, and need not extra calculating.
Figure 18 be for explain with two or more independently the object bit stream be merged into the block diagram of an independent object bit stream with stereo reduction audio signal.Referring to Figure 18, if two or more independently the object bit stream have different parameter band quantity, can be for the mapping of object bit stream execution parameter band, the parameter band quantity that has like this an object bit stream of less parameters band rises to identical with the parameter band quantity of another object bit stream.
More specifically, can come the mapping of execution parameter band with predetermined mapping table.In this case, can come the mapping of execution parameter band with simple linear formula.
If overlapping parameter band is arranged, consider amount that overlapping parameter band overlaps each other and hybrid parameter value suitably.Paying the utmost attention in this situation of complexity, can be for the mapping of two object bit stream execution parameter bands, so the parameter band quantity that has than a bit stream of multiparameter band in two object bit streams reduces to the same with the parameter band quantity of another object bit stream.
In the embodiments of Figures 17 and 18, two or more independent object bitstreams can be merged into a single merged object bitstream without recalculating the parameters already present in the independent object bitstreams. However, when a plurality of downmix signals are merged, the parameters regarding the merged downmix signal may have to be recalculated through QMF/hybrid analysis. This recalculation requires a large amount of computation and thereby compromises the benefit of the embodiments of Figures 17 and 18. Therefore, a method is needed for extracting parameters without QMF/hybrid analysis or synthesis even when the downmix signals are downmixed together. For this purpose, energy information regarding each parameter band of each downmix signal may be included in the object bitstream. Then, when the downmix signals are downmixed together, information such as CLD information can easily be calculated from the energy information without QMF/hybrid analysis or synthesis. The energy information may represent the highest energy level of each parameter band, or the absolute value of the energy level of the highest-energy object signal of each parameter band. The amount of computation can be further reduced by using ICC values obtained from the time domain for the whole parameter band.
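Under the assumption that each bitstream carries per-parameter-band energies of its downmix signal, the following sketch shows how a CLD-style parameter between two downmix signals could be derived directly from those energies, without any QMF/hybrid analysis (names and the 10·log10 convention are assumptions for illustration):

```python
import math

def cld_between_downmixes(band_energies_a, band_energies_b, eps=1e-12):
    """Per-band CLD-like value (in dB) between two downmix signals, computed from
    the per-band energy information carried in each object bitstream."""
    return [10.0 * math.log10((ea + eps) / (eb + eps))
            for ea, eb in zip(band_energies_a, band_energies_b)]

print(cld_between_downmixes([4.0, 1.0, 0.25], [1.0, 1.0, 1.0]))
# [~6.02, 0.0, ~-6.02] dB per parameter band
```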
Clipping may occur when a plurality of downmix signals are downmixed together. In order to overcome this, the level of the resulting downmix signal can be reduced. If the level of the downmix signal has been reduced, level information regarding the reduced level of the downmix signal may need to be included in the object bitstream. The level information for preventing clipping can be applied to each frame of the object bitstream, or only to the frames in which clipping occurs. The level of the original downmix signal can be calculated by inversely applying the level information used for preventing clipping during the decoding operation. The level information for preventing clipping can be calculated in the time domain and therefore does not require QMF/hybrid synthesis or analysis. The merging of a plurality of object signals into a single object bitstream can be performed using the structure shown in Figure 12, and this operation will hereinafter be described in detail with reference to Figure 19.
Figure 19 be for explain with two independently the object bit stream be merged into the block diagram of an independent object bit stream.Referring to Figure 19, the first box 261 generates the first object bit stream, and the second box 263 generates the second object bit stream.Then, the 3rd box 265 generates the 3rd object bit stream by merging the first and second bit streams.In this case, if the first and second object bit streams comprise the information about the absolute value of the energy level of the highest energy object signal of each parameter band, ratio with energy level with the energy level of highest energy object signal of other non-highest energy object signal, and about the gain information of yield value, this yield value will multiply each other with the reduction audio signal that comes from the first and second boxes 261 and 263, the 3rd box 265 can generate the 3rd object bit stream by the first and second bit streams are combined, and does not need extra calculation of parameter or extraction.
The 3rd box 265 receives a plurality of reduction audio signal DOWNMIX_A and DOWNMIX_B.The 3rd box 265 will reduce audio signal DOWNMIX_A and DOWNMIX_B is converted to the PCM signal, thereby and these PCM signals are added together generate independent reduction audio signal.In this operating period, yet, slicing may occur.In order to overcome this problem, reduction audio signal DOWNMIX_A and DOWNMIX_B can be multiplied by a predefined yield value.Information about this predefined yield value can be included in the 3rd object bit stream, and transmits with the 3rd object bit stream.
Below will describe in further detail a plurality of object bit streams will be merged into an independent object bit stream.Referring to Figure 19, side information A can comprise which is the information of highest energy object signal to object n about a plurality of object signal objects 1, and the ratio of the energy level of other non-highest energy object signal and the energy level of highest energy object signal.Same, side information B can comprise the information the same with side information A, it comprises which is the information of highest energy object signal to object n about a plurality of object signal objects 1, and the ratio of the energy level of other non-highest energy object signal and the energy level of highest energy object signal.
As shown in figure 20, SIDE_INFO_A and SIDE_INFO_B can be included in the bit stream concurrently.In this case, can additionally provide a bit to be used to indicate the bit stream that exists whether concurrently more than.
Referring to Figure 20, for whether indicating predetermined bit stream comprises more than the bit stream after the merging of one bit stream, indicating predetermined bit stream is the information of the bit stream after merging, and will be included in the predetermined bit stream about the information of bit stream quantity.And the information that is included in any original position about bit stream in the predetermined bit stream can be provided in the head of predetermined bit stream, and thereafter then more than one bit stream.In this case, demoder can determine whether this predetermined bit stream comprises more than the bit stream after the merging of one bit stream by the information that analysis is arranged in the head of predetermined bit stream.Such bit stream merging method is except increasing the minority identifier to not needing extra processing the bit stream.Yet these identifiers need to be provided in the interim of a plurality of frames, and the bit stream of such bit stream merging method each bit stream of needing demoder to go to determine that this demoder receives after whether merging.
As the replacement of described bit stream merging method, can be by so that demoder can not identify the mode whether a plurality of bit streams be merged into individual bit stream a plurality of bit streams not being merged into a bit stream.Describe this mode in detail hereinafter with reference to Figure 21.
Referring to Figure 21, the energy level of the energy level of the highest energy object signal that is relatively represented by SIDE_INFO_A and the highest energy object signal that represented by SIDE_INFO_B.Then, the highest energy object signal that has the bit stream after more the object signal of high energy level is confirmed as merging in these two object signal.For instance, if the energy level of the highest energy object signal that is represented by SIDE_INFO_A is higher than the energy level of the highest energy object signal that is represented by SIDE_INFO_B, the highest energy object signal that is then represented by SIDE_INFO_A is exactly the highest energy object signal of the bit stream after merging.Then, the bit stream after the energy Ratios information of SIDE_INFO_A can be used to merge, and the energy Ratios information of SIDE_INFO_B can be multiplied by the ratio of the energy level of the highest energy object signal among A and the B.
Then, SIDE_INFO_A and SIDE_INFO_B one of them comprise energy Ratios information about the information of the highest energy object signal of the bit stream after merging, with the energy Ratios information of the highest energy object signal that is represented by SIDE_INFO_A, and can be used to bit stream after this merging by the highest energy object signal that SIDE_INFO_B represents.The method comprises the again calculating to the energy Ratios information of SIDE_INFO_B.Yet, to the again calculating of the energy Ratios information of SIDE_INFO_B relatively and uncomplicated.In the method, demoder possibly can't determine whether received bit stream comprises more than the bit stream after the merging of a bit stream, and can use typical demoder method.
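An illustrative sketch of this rescaling of energy-ratio information when two side informations are merged (the function name, the list-based layout, and the treatment of the smaller stream's maximum as one additional ratio entry are assumptions made for illustration):

```python
def merge_energy_ratios(emax_a, ratios_a, emax_b, ratios_b):
    """emax_a / emax_b: absolute energy of the highest-energy object of SIDE_INFO_A / B.
    ratios_a / ratios_b: energy ratios of the other objects to their own stream's maximum.
    Returns (new absolute maximum, ratios of all remaining objects to that new maximum)."""
    if emax_a >= emax_b:
        big, big_ratios, small, small_ratios = emax_a, ratios_a, emax_b, ratios_b
    else:
        big, big_ratios, small, small_ratios = emax_b, ratios_b, emax_a, ratios_a
    scale = small / big                      # rescale the other stream's ratios
    merged = list(big_ratios) + [scale] + [r * scale for r in small_ratios]
    return big, merged

print(merge_energy_ratios(10.0, [0.5, 0.2], 4.0, [0.75]))
# (10.0, [0.5, 0.2, 0.4, 0.3]) -- stream B's own maximum becomes a ratio of 0.4
```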
Merge the almost identical method of the employed method of bit stream that comprises monophone reduction audio signal by using, two object bit streams that comprise stereo reduction audio signal can easily be merged into an independent object bit stream, and do not need the again calculating about the information of object signal.In an object bit stream, there is the information about tree structure, the object signal information that obtains each branch (namely each box) from tree structure is being followed in reduction audio mixing object signal back.
Below described the object bit stream, supposed that this specific object only is assigned to L channel or the R channel of stereo reduction audio signal.Yet object signal normally is assigned to two sound channels of stereo reduction audio signal.Therefore, how below will to describe in detail according to the object bit stream of two sound channels distributing to stereo reduction audio signal and the formation object bit stream.
Figure 22 is a block diagram for explaining a method of generating a stereo downmix signal by mixing a plurality of object signals, and more specifically, a method of downmixing four object signals OBJECT1 through OBJECT4 into L and R stereo signals. For example, the first object signal OBJECT1 is distributed between the L and R channels at a ratio of a:b, as shown in formula (3):
[formula 3]
$Eng_{Obj1}^{L} = \dfrac{a}{a+b}\,Eng_{Obj1}$
$Eng_{Obj1}^{R} = \dfrac{b}{a+b}\,Eng_{Obj1}$
If an object signal is distributed between the L and R channels of a stereo downmix signal, channel distribution ratio information indicating the ratio (a:b) in which the object signal is distributed between the L and R channels may additionally be necessary. Then, information regarding the object signals, such as CLD and ICC information, can be calculated by performing downmixing with OTT boxes for each of the L and R channels of the stereo downmix signal; this operation will hereinafter be described in detail with reference to Figure 23.
Referring to Figure 23, once CLD and ICC information is obtained from a plurality of OTT boxes during the downmixing operation and the channel distribution ratio information of each of the object signals is provided, a multi-channel bitstream can be calculated that varies adaptively according to any modifications the end user makes to the object position information and the playback configuration information. In addition, if the stereo downmix signal needs to be processed by downmix preprocessing, information regarding how the downmix signal should be processed during downmix preprocessing can be obtained and transmitted to the preprocessor. That is, if the channel distribution ratio information of each of the object signals is not provided, there is no way to calculate the multi-channel bitstream or to obtain the information necessary for the operation of the preprocessor. The channel distribution ratio information of an object signal may be represented as a ratio of two integers or as a scalar value (in dB).
As mentioned above, if an object signal is assigned between two sound channels of stereo reduction audio signal, may need the channel allocation percent information of object signal.This channel allocation percent information may be the value of fixing, and it indicates this object signal to be assigned to ratio between two sound channels of stereo reduction audio signal.Optionally, the channel allocation percent information of object signal can change to another frequency band from a frequency band of object signal, especially when with this channel allocation percent information during as ICC information.If the reduction audio mixing by complexity operates to obtain stereo reduction audio signal, if for example object signal belongs to two sound channels of stereo reduction audio signal, and reduce audio mixing this object signal from a frequency band of object signal to another frequency band by changing ICC information, the detailed description that need to reduce to this object signal audio mixing that may be extra is with the object signal of decoding final rendering.This embodiment can be applied to all possible object structure described above.
After this, describe pre-service in detail below with reference to Figure 24 to 27.If the reduction audio signal that is input in the object decoder is stereophonic signal, before being input to the multi-channel decoder of object decoder, the reduction audio signal of this input needs pretreated, because multi-channel decoder can not will belong to the signal map of L channel of reduction audio signal of input to R channel.
Therefore, in order to make the terminal user will belong to the position movement of object signal of L channel of reduction audio signal of input to R channel, the reduction audio signal of this input needs pretreated, and pretreated reduction audio signal can be input to multi-channel decoder.
Can from play up matrix, obtain the pre-service that pretreatment information is carried out stereo reduction audio signal by neutralizing from the object bit stream, and suitably process stereo reduction audio signal according to pretreatment information, below will describe this operation in detail.
Figure 24 is the block diagram that how to dispose stereo reduction audio signal to object 4 according to 4 object signal objects 1 for explaining.Referring to Figure 24, the first object signal object 1 is assigned to L and R sound channel with ratio a:b, second object signal object 2 is assigned to L and R sound channel with ratio c:d, and the 3rd object signal object 3 only is assigned to the L sound channel, and the 4th object signal object 4 only is assigned to the R sound channel.Can generate information such as CLD and ICC by between a plurality of OTT, transmitting first to fourth object signal object 1 to each of object 4, and can generate the reduction audio signal according to the information that generates.
Suppose that the terminal user obtains to play up matrix by first to fourth object signal object 1 to position and the level of object 4 suitably is set, and 5 sound channels are arranged.This is played up matrix and can be represented by formula (4):
[formula 4]
$\begin{bmatrix} 30 & 10 & 20 & 30 & 10 \\ 10 & 30 & 20 & 10 & 30 \\ 22 & 22 & 22 & 22 & 22 \\ 21 & 21 & 31 & 11 & 11 \end{bmatrix}$
Referring to formula (4), when the sum of the five coefficients in one of the four rows equals a predefined reference value, here 100, it is determined that the level of the corresponding object signal does not change. The amount by which the sum of the five coefficients in a row differs from the predefined reference value is the amount (in dB) by which the level of the corresponding object signal changes. The first, second, third, fourth and fifth columns of the rendering matrix of formula (4) represent the FL, FR, C, RL and RR channels, respectively.
The first row of the rendering matrix of formula (4) corresponds to the first object signal OBJECT1 and has a total of five coefficients, namely 30, 10, 20, 30 and 10. Since the five coefficients of the first row sum to 100, it is determined that the level of the first object signal OBJECT1 does not change and that only the spatial position of the first object signal OBJECT1 changes. Even though the five coefficients of the first row represent different channel directions, they can be roughly classified into two channels, the L and R channels. Then, the ratio in which the first object signal OBJECT1 is distributed between the L and R channels can be calculated as 70% (= 30 + 30 + 20 × 0.5) : 30% (= 10 + 10 + 20 × 0.5). Therefore, the rendering matrix of formula (4) indicates that the level of the first object signal OBJECT1 does not change and that the first object signal OBJECT1 is distributed between the L and R channels at a ratio of 70%:30%. If the sum of the five coefficients in any row of the rendering matrix of formula (4) is less than or greater than 100, it is determined that the level of the corresponding object signal changes; in that case, the corresponding object signal may be processed by preprocessing, or the level change may be converted into ADG and transmitted.
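The following short sketch illustrates this reading of one rendering-matrix row (a non-normative example; the reference value of 100 and the halving of the C coefficient follow the description above, while the names are assumptions):

```python
REF = 100.0   # reference row sum of the rendering matrix in formula (4)

def analyze_row(row):
    """row = [FL, FR, C, RL, RR] coefficients of one object in the rendering matrix.
    Returns (left_share, right_share, level_change) where level_change is the
    deviation from REF, read as a dB offset."""
    fl, fr, c, rl, rr = row
    left = fl + rl + 0.5 * c          # the C contribution is split equally between L and R
    right = fr + rr + 0.5 * c
    level_change = sum(row) - REF
    return left, right, level_change

for row in [[30, 10, 20, 30, 10], [22, 22, 22, 22, 22], [21, 21, 31, 11, 11]]:
    print(analyze_row(row))   # (70, 30, 0), (55, 55, +10), (47.5, 47.5, -5)
```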
For pre-service reduction audio signal, can calculate the allocation proportion of this reduction audio signal between the parameter band, parameter in the parameter band is to extract from the signal that obtains by the reduction audio signal is carried out the QMF/ mixing transformation, and this reduction audio signal can according to play up matrix arrange be redistributed between the parameter band.Below will describe the various audio signal of will reducing in detail and be redistributed to method in the parameter band.
In the first reassignment method, use respectively the side information (for example CLD and ICC information) of L-and R-down-mix audio signal and use and the almost identical method of the employed method of multichannel codec decode respectively L-and R-down-mix audio signal.Then, recovery is assigned to the object signal in L-and the R-down-mix audio signal.In order to reduce calculated amount, can be only by CLD information decode L-and R-down-mix audio signal.Can determine that the object signal of each recovery is assigned to the ratio between L-and the R-down-mix audio signal according to side information.
Object signal after each recovers can be assigned between L-and the R-down-mix audio signal according to playing up matrix.Then, use OTT the object signal of having reallocated to be reduced audio mixing based on sound channel to sound channel ground, thereby finish this pre-service.In brief, the first reassignment method adopts and the employed identical method of multichannel codec.Yet the first reassignment method needs to carry out with the as many decoding of object signal for each sound channel to be processed, and needs reallocation to process and based on the reduction stereo process of sound channel.
In the second reallocation method, unlike in the first reallocation method, the object signals are not restored from the L and R downmix signals. Instead, each of the L and R downmix signals is divided into two parts, as shown in Figure 25: a part (L_L or R_R) that remains in the corresponding channel, and a part (L_R or R_L) that is reallocated to the other channel. Referring to Figure 25, L_L denotes the part of the L downmix signal that should remain in the L channel, and L_R denotes the part of the L downmix signal that should be added to the R channel. Likewise, R_R denotes the part of the R downmix signal that should remain in the R channel, and R_L denotes the part of the R downmix signal that should be added to the L channel. Each of the L and R downmix signals may be divided into two parts (L_L and L_R, or R_R and R_L) according to the ratio in which each object signal is distributed between the L and R downmix signals, defined for each object signal in the manner of formula (2), and the ratio in which each object signal should be distributed between the preprocessed L and R channels, defined for each object signal in the manner of formula (3). Therefore, how the L and R downmix signals should be reallocated between the preprocessed L and R channels can be determined by comparing, for each object signal, the ratio in which it is distributed between the L and R downmix signals with the ratio in which it should be distributed between the preprocessed L and R channels.
Below described according to predefined energy and recently the L-sound channel signal has been divided into signal L_L and L_R.In case the L-sound channel signal is divided into signal L_L and L_R, then need to determine the ICC between signal L_L and L_R.Can be according to about the ICC message of object signal and easily determine ICC between signal L_L and L_R.That is to say, can distribute to ratio between signal L_L and the L_R according to each object signal and determine ICC between signal L_L and L_R.
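A simplified, energy-domain sketch of this second reallocation method for one parameter band, assuming that for each object its share of the L and R downmix signals and the fraction of each share that must move to the other preprocessed channel are already known (the names are illustrative; the example fractions anticipate the worked example of formula (8) below):

```python
def split_downmix(objects):
    """objects: list of dicts with the per-band energy of an object in the L and R
    downmix ('eng_L', 'eng_R') and the fraction of each part that must be moved to
    the other preprocessed channel ('move_L_to_R', 'move_R_to_L').
    Returns the energies of the four parts L_L, L_R, R_L, R_R of Figure 25."""
    l_l = l_r = r_l = r_r = 0.0
    for o in objects:
        l_l += o['eng_L'] * (1.0 - o['move_L_to_R'])
        l_r += o['eng_L'] * o['move_L_to_R']
        r_r += o['eng_R'] * (1.0 - o['move_R_to_L'])
        r_l += o['eng_R'] * o['move_R_to_L']
    return l_l, l_r, r_l, r_r

# The four objects of the example: 55% of OBJECT1's R part moves to L, 25% of
# OBJECT2's L part moves to R, 50% of OBJECT3 (all in L) moves to R, 50% of
# OBJECT4 (all in R) moves to L.
objs = [dict(eng_L=1.0, eng_R=2.0, move_L_to_R=0.00, move_R_to_L=0.55),
        dict(eng_L=2.0, eng_R=3.0, move_L_to_R=0.25, move_R_to_L=0.00),
        dict(eng_L=1.0, eng_R=0.0, move_L_to_R=0.50, move_R_to_L=0.00),
        dict(eng_L=0.0, eng_R=1.0, move_L_to_R=0.00, move_R_to_L=0.50)]
print(split_downmix(objs))
```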
The second reduction audio mixing reassignment method below will be described in further detail.Suppose that L-and R-down-mix audio signal L and R are obtained by as shown in figure 24 method, and the first, second, third and the 4th object signal object 1 (OBJECT1), object 2 (OBJECT2), object 3 (OBJECT3) and object 4 (OBJECT4) are respectively with 1:2,2:3, the ratio of 1:0 and 0:1 is assigned between L-and R-down-mix audio signal L and the R.A plurality of object signal can be reduced audio mixing by a plurality of OTT boxes, and can be from the reduction audio mixing of object signal acquired information, for example CLD and ICC information.
Be that the example playing up matrix set up of first to fourth object signal object 1 to object 4 is represented by formula (4).This is played up matrix and comprises that first to fourth object signal object 1 is to the positional information of object 4.Then, can be by carrying out pre-service and obtain pretreated L-and R-down-mix audio signal L and R with playing up matrix.Below reference formula (3) has been described and how to be set up and explain that this plays up matrix.
The ratio in which each of the first through fourth object signals OBJECT1 through OBJECT4 is distributed between the preprocessed L and R downmix signals L' and R' can be calculated by formula (5):
[formula 5]
Object 1: $Eng_{Obj1}^{L'} = 30 + 30 + 20 \times 0.5 = 70$, $\;Eng_{Obj1}^{R'} = 10 + 10 + 20 \times 0.5 = 30$, $\;Eng_{Obj1}^{L'} : Eng_{Obj1}^{R'} = 70 : 30$
Object 2: $Eng_{Obj2}^{L'} = 10 + 10 + 20 \times 0.5 = 30$, $\;Eng_{Obj2}^{R'} = 30 + 30 + 20 \times 0.5 = 70$, $\;Eng_{Obj2}^{L'} : Eng_{Obj2}^{R'} = 30 : 70$
Object 3: $Eng_{Obj3}^{L'} = 22 + 22 + 22 \times 0.5 = 55$, $\;Eng_{Obj3}^{R'} = 22 + 22 + 22 \times 0.5 = 55$, $\;Eng_{Obj3}^{L'} : Eng_{Obj3}^{R'} = 55 : 55$
Object 4: $Eng_{Obj4}^{L'} = 21 + 11 + 31 \times 0.5 = 47.5$, $\;Eng_{Obj4}^{R'} = 21 + 11 + 31 \times 0.5 = 47.5$, $\;Eng_{Obj4}^{L'} : Eng_{Obj4}^{R'} = 47.5 : 47.5$
The ratio in which each of the first through fourth object signals OBJECT1 through OBJECT4 is distributed between the L and R downmix signals L and R can be expressed by formula (6):
[formula 6]
Object 1: $Eng_{Obj1}^{L} : Eng_{Obj1}^{R} = 1 : 2$
Object 2: $Eng_{Obj2}^{L} : Eng_{Obj2}^{R} = 2 : 3$
Object 3: $Eng_{Obj3}^{L} : Eng_{Obj3}^{R} = 1 : 0$
Object 4: $Eng_{Obj4}^{L} : Eng_{Obj4}^{R} = 0 : 1$
Referring to formula (5), the part sum of part and the 3rd object signal object 3 that is assigned to pretreated R-down-mix audio signal that is assigned to the 3rd object signal object 3 of pretreated L-down-mix audio signal is 110, and then its level of determining the 3rd object signal object 3 has increased by 10.On the other hand, the part sum of part and the 4th object signal object 4 that is assigned to pretreated R-down-mix audio signal of distributing to the 4th object signal object 4 of pretreated L-down-mix audio signal L is 95, and then its level of determining the 4th object signal object 4 has reduced 5.If have reference value 100 for first to fourth object signal object 1 to the matrix of playing up of object 4, and this plays up coefficient sum in every delegation of matrix and the measures of dispersion of reference value 100 represents the amount (unit: dB) that the level of corresponding object signal changes, its level that can determine the 3rd object signal object 3 has increased 10dB, and the level of the 4th object signal object 4 has reduced 5dB.
Formulas (5) and (6) can be rearranged into formula (7):
[formula 7]
Object 1: $Eng_{Obj1}^{L} : Eng_{Obj1}^{R} = 33.3 : 66.7$, $\;Eng_{Obj1}^{L'} : Eng_{Obj1}^{R'} = 70 : 30$
Object 2: $Eng_{Obj2}^{L} : Eng_{Obj2}^{R} = 40 : 60$, $\;Eng_{Obj2}^{L'} : Eng_{Obj2}^{R'} = 30 : 70$
Object 3: $Eng_{Obj3}^{L} : Eng_{Obj3}^{R} = 100 : 0$, $\;Eng_{Obj3}^{L'} : Eng_{Obj3}^{R'} = 50 : 50$
Object 4: $Eng_{Obj4}^{L} : Eng_{Obj4}^{R} = 0 : 100$, $\;Eng_{Obj4}^{L'} : Eng_{Obj4}^{R'} = 50 : 50$
Formula (7) shows, for each of the first through fourth object signals OBJECT1 through OBJECT4, the ratio in which the object signal is distributed between the L and R downmix signals before the preprocessing and the ratio in which the object signal should be distributed between the preprocessed L and R downmix signals. Therefore, by using formula (7), it can easily be determined how much of each of the first through fourth object signals OBJECT1 through OBJECT4 should be reallocated by the preprocessing. For example, referring to formula (7), the ratio in which the second object signal OBJECT2 is distributed between the L and R downmix signals changes from 40:60 to 30:70, and it can therefore be determined that one quarter (25%) of the portion of the second object signal OBJECT2 previously allocated to the L downmix signal needs to be shifted to the R downmix signal. This is summarized in formula (8):
[formula 8]
Object 1: 55% of the portion of OBJECT1 previously allocated to R needs to be shifted to L
Object 2: 25% of the portion of OBJECT2 previously allocated to L needs to be shifted to R
Object 3: 50% of the portion of OBJECT3 previously allocated to L needs to be shifted to R
Object 4: 50% of the portion of OBJECT4 previously allocated to R needs to be shifted to L
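The shift fractions of formula (8) follow directly from the before/after ratios of formula (7); an illustrative sketch of that computation (names are assumptions):

```python
def reallocation(before_lr, after_lr):
    """before_lr, after_lr: (L, R) shares of one object before preprocessing
    (formula (6)) and after preprocessing (formula (5)), in any consistent units.
    Returns (direction, fraction of that portion to move), as in formula (8)."""
    bl, br = before_lr
    al, ar = after_lr
    b_left = bl / (bl + br)              # normalized L share before preprocessing
    a_left = al / (al + ar)              # normalized L share after preprocessing
    if a_left >= b_left:                 # L share must grow: move part of the R portion to L
        moved = (a_left - b_left) / (1.0 - b_left) if b_left < 1.0 else 0.0
        return 'R->L', moved
    moved = (b_left - a_left) / b_left   # L share must shrink: move part of the L portion to R
    return 'L->R', moved

pairs = [((1, 2), (70, 30)), ((2, 3), (30, 70)), ((1, 0), (55, 55)), ((0, 1), (47.5, 47.5))]
for before, after in pairs:
    print(reallocation(before, after))   # ('R->L', 0.55), ('L->R', 0.25), ('L->R', 0.5), ('R->L', 0.5)
```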
By using formula (8), the signals L_L, L_R, R_L and R_R of Figure 25 can be represented by formula (9):
[formula 9]
$Eng_{L\_L} = Eng_{Obj1}^{L} + 0.75\,Eng_{Obj2}^{L} + 0.5\,Eng_{Obj3}$
$Eng_{L\_R} = 0.25\,Eng_{Obj2}^{L} + 0.5\,Eng_{Obj3}$
$Eng_{R\_L} = 0.55\,Eng_{Obj1}^{R} + 0.5\,Eng_{Obj4}$
$Eng_{R\_R} = 0.45\,Eng_{Obj1}^{R} + Eng_{Obj2}^{R} + 0.5\,Eng_{Obj4}$
The value of each object-signal term in formula (9) can be expressed as the ratio in which the corresponding object signal is distributed between the L and R channels, obtained by dequantizing the CLD information provided by the OTT boxes, as shown in formula (10):
[formula 10]
$Eng_{Obj1}^{L} = \dfrac{10^{CLD_2/10}}{1+10^{CLD_2/10}} \cdot \dfrac{10^{CLD_1/10}}{1+10^{CLD_1/10}} \cdot Eng_L, \quad Eng_{Obj2}^{L} = \dfrac{10^{CLD_2/10}}{1+10^{CLD_2/10}} \cdot \dfrac{1}{1+10^{CLD_1/10}} \cdot Eng_L$
$Eng_{Obj1}^{R} = \dfrac{10^{CLD_4/10}}{1+10^{CLD_4/10}} \cdot \dfrac{10^{CLD_3/10}}{1+10^{CLD_3/10}} \cdot Eng_R, \quad Eng_{Obj2}^{R} = \dfrac{10^{CLD_4/10}}{1+10^{CLD_4/10}} \cdot \dfrac{1}{1+10^{CLD_3/10}} \cdot Eng_R$
$Eng_{Obj3} = \dfrac{1}{1+10^{CLD_2/10}} \cdot Eng_L, \quad Eng_{Obj4} = \dfrac{1}{1+10^{CLD_4/10}} \cdot Eng_R$
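A short sketch of the left-channel part of formula (10), under the assumption (implied by the formula's structure) that CLD2 splits the L downmix into (OBJECT1 + OBJECT2) versus OBJECT3 and CLD1 splits that result into OBJECT1 versus OBJECT2; the function names are illustrative only:

```python
import math

def cld_ratio(cld_db):
    """Fraction of energy assigned to the 'first' output of an OTT box for a given CLD (dB)."""
    lin = 10.0 ** (cld_db / 10.0)
    return lin / (1.0 + lin)

def object_energies_left(eng_l, cld1, cld2):
    """Per-object energies contained in the L downmix according to formula (10)."""
    e12 = cld_ratio(cld2) * eng_l                 # energy of OBJECT1 + OBJECT2 in L
    return {'obj1_L': cld_ratio(cld1) * e12,
            'obj2_L': (1.0 - cld_ratio(cld1)) * e12,
            'obj3':   (1.0 - cld_ratio(cld2)) * eng_l}

print(object_energies_left(eng_l=3.0, cld1=-3.0, cld2=4.77))
```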
The CLD used in each of the parsing blocks of Figure 25 can be determined using formula (11):
[formula 11]
$CLD_{pars1} = 10\log_{10}\left(\dfrac{Eng_{L\_L}+\epsilon}{Eng_{L\_R}+\epsilon}\right)$
where $\epsilon$ is a constant for avoiding division by zero, for example, a value 96 dB below the maximum signal level.
$CLD_{pars2} = 10\log_{10}\left(\dfrac{Eng_{R\_L}+\epsilon}{Eng_{R\_R}+\epsilon}\right)$
In this manner, the CLD and ICC information used by the parsing blocks to generate the signals L_L and L_R from the L downmix signal, and the CLD and ICC information used to generate the signals R_L and R_R from the R downmix signal, can be determined. Once the signals L_L, L_R, R_L and R_R have been obtained, as shown in Figure 25, L_L and R_L are added to form the preprocessed L channel and L_R and R_R are added to form the preprocessed R channel, thereby obtaining the preprocessed stereo downmix signal. If the final channel configuration is stereo, the L and R downmix signals obtained by the preprocessing may be output. In this case, any change in the level of each object signal still needs to be adjusted. For this purpose, a predetermined module that performs the ADG function may additionally be provided. The information for adjusting the level of each object signal can be calculated by the same method as that used to calculate ADG information, which will be described later in further detail. Alternatively, the level of each object signal may be adjusted during the preprocessing operation; in this case, the level adjustment can be performed using the same method as that used for processing ADG. As an alternative to the embodiment of Figure 25, as shown in Figure 26, a decorrelation operation may be performed by a decorrelator and a mixer, rather than by the parsing blocks PARSING 1 and PARSING 2, in order to adjust the correlation between the signals L and R obtained by the mixing. Referring to Figure 26, Pre_L and Pre_R denote the L- and R-channel signals obtained by the level adjustment. One of the signals Pre_L and Pre_R is input to the decorrelator and then subjected to the mixing operation performed by the mixer, thereby obtaining a correlation-adjusted signal.
Pretreated stereo reduction audio signal can be input to multi-channel decoder.For provide with by the set object's position signal of terminal user with reproduce mutually compatible multichannel output of configuration information, not only need pretreated reduction audio signal, also need for the side information based on sound channel of carrying out multi-channel decoding.Below will describe the side information that how to obtain based on sound channel in detail by again explaining described example.Defined pretreated reduction audio signal L and the R that inputs to multi-channel decoder can be represented by formula (12) according to formula (5):
[formula 12]
Eng_L′ = Eng_L_L + Eng_R_L
       = 0.7·Eng_Obj1 + 0.3·Eng_Obj2 + 0.5·Eng_Obj3 + 0.5·Eng_Obj4
Eng_R′ = Eng_L_R + Eng_R_R
       = 0.3·Eng_Obj1 + 0.7·Eng_Obj2 + 0.5·Eng_Obj3 + 0.5·Eng_Obj4
The ratios in which each of the first through fourth object signals OBJECT1 through OBJECT4 is distributed among the FL, RL, C, FR, and RR channels can be determined by formula (13):
[formula 13]
Eng_FL = 0.3·Eng_Obj1 + 0.1·Eng_Obj2 + 0.2·Eng_Obj3 + 0.21·(100/95)·Eng_Obj4
Eng_RL = 0.3·Eng_Obj1 + 0.1·Eng_Obj2 + 0.2·Eng_Obj3 + 0.11·(100/95)·Eng_Obj4
Eng_C  = 0.2·Eng_Obj1 + 0.2·Eng_Obj2 + 0.2·Eng_Obj3 + 0.31·(100/95)·Eng_Obj4
Eng_FR = 0.1·Eng_Obj1 + 0.3·Eng_Obj2 + 0.2·Eng_Obj3 + 0.21·(100/95)·Eng_Obj4
Eng_RR = 0.1·Eng_Obj1 + 0.3·Eng_Obj2 + 0.2·Eng_Obj3 + 0.11·(100/95)·Eng_Obj4
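The channel energies of formula (13) are simply weighted sums of the object energies, with the weights taken from the rendering matrix. The sketch below makes that computation explicit; the function name, the dictionary layout, and the example weights are illustrative assumptions.

```python
def channel_energies(obj_energies, weights):
    # obj_energies: per-object energies [Eng_Obj1, ..., Eng_ObjN]
    # weights: mapping from output channel to its per-object weights
    return {ch: sum(w * e for w, e in zip(ws, obj_energies))
            for ch, ws in weights.items()}

# Example weights for the FL channel of formula (13), keeping the 100/95 factor as given:
# weights = {"FL": [0.3, 0.1, 0.2, 0.21 * 100 / 95], ...}
```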
As shown in Figure 27, the preprocessed downmix signals L and R can be expanded to 5.1 channels by MPS. Referring to Figure 27, the parameter TTT0 of the TTT box and the parameters OTTA, OTTB, and OTTC of the OTT boxes need to be calculated on a parameter-band basis, even though the parameter bands are not shown for convenience.
The TTT box TTT0 can be used in two different modes: an energy-based mode and a prediction mode. When used in the energy-based mode, the TTT box TTT0 needs two pieces of CLD information. When used in the prediction mode, the TTT box TTT0 needs two pieces of CPC information and one piece of ICC information.
In order to calculate the CLD information for the energy-based mode, the energy ratios of the signals L″, R″, and C in Figure 27 can be calculated using formulas (6), (10), and (13). The energy level of the signal L″ can be calculated by formula (14):
[formula 14]
Eng_L″ = Eng_FL + Eng_RL = 0.6·Eng_Obj1 + 0.2·Eng_Obj2 + 0.4·Eng_Obj3 + 0.32·(100/95)·Eng_Obj4
       = 0.6·(1/3)·[10^(CLD_2/10) / (1 + 10^(CLD_2/10))]·[10^(CLD_1/10) / (1 + 10^(CLD_1/10))]·Eng_L
       + 0.2·(2/5)·[10^(CLD_2/10) / (1 + 10^(CLD_2/10))]·[1 / (1 + 10^(CLD_1/10))]·Eng_L
       + 0.4·[1 / (1 + 10^(CLD_2/10))]·Eng_L
       + 0.32·(100/95)·[1 / (1 + 10^(CLD_4/10))]·Eng_R
Formula (14) can also be used to calculate the energy level of R″ or C. Thereafter, CLD information for the TTT box TTT0 can be calculated from the energy levels of the signals L″, R″, and C, as shown in formula (15):
[formula 15]
CLD_TTT1 = 10·log10((Eng_L″ + Eng_R″) / Eng_C″)
CLD_TTT2 = 10·log10(Eng_C″ / Eng_R″)
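A sketch of formula (15), computing the two TTT-box CLDs from the energies of L″, R″, and C. The function and argument names are assumptions, and the second ratio follows the text exactly as printed above.

```python
import math

def ttt_clds(eng_l2, eng_r2, eng_c2):
    # Formula (15): CLD parameters for the TTT box TTT0 in the energy-based mode.
    cld_ttt1 = 10.0 * math.log10((eng_l2 + eng_r2) / eng_c2)
    cld_ttt2 = 10.0 * math.log10(eng_c2 / eng_r2)
    return cld_ttt1, cld_ttt2
```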
Formula (14) can be established based on formula (10). Even though formula (10) only defines how to calculate energy values for the L channel, energy values for the R channel can also be calculated using formula (10). In this manner, the CLD and ICC values of the third and fourth OTT boxes can be calculated from the CLD and ICC values of the first and second OTT boxes. This, however, may not be applicable to all tree structures used to decode object signals, but only to certain tree structures. The information contained in the object bitstream can be transmitted to every OTT box. Alternatively, the information contained in the object bitstream may be transmitted only to some OTT boxes, and information indicating the OTT boxes that have not received information may be obtained through calculation.
Parameters for the OTT boxes OTTA, OTTB, and OTTC, such as CLD and ICC information, can be calculated by the method described above. These multi-channel parameters can then be input to a multi-channel decoder and subjected to multi-channel decoding, thereby obtaining a multi-channel signal appropriately rendered according to the object position information and playback configuration information desired by the end user.
If the levels of the object signals have not been adjusted by the preprocessing, the multi-channel parameters may include ADG parameters. The calculation of the ADG parameters is described below in detail by revisiting the above example.
When a rendering matrix is established such that the level of the third object signal is to be increased by 10 dB and the level of the fourth object signal is to be decreased by 5 dB, the level of the third object signal component in L′ needs to be increased by 10 dB and the level of the fourth object signal component in L′ needs to be decreased by 5 dB. The ratio Ratio_ADG,L′ between the energy level before the adjustment and the energy level after the adjustment of the third and fourth object signals can be calculated using formula (16):
[formula 16]
[Formula (16) is reproduced only as an image in the original publication.]
The ratio Ratio_ADG,L′ can be determined by substituting formula (10) into formula (16). The ratio Ratio_ADG,R′ for the R channel can also be calculated using formula (16). Each of Ratio_ADG,L′ and Ratio_ADG,R′ represents the change in the energy of the corresponding parameter band caused by the adjustment of the object signal levels. Then, the ADG values ADG(L′) and ADG(R′) can be calculated from Ratio_ADG,L′ and Ratio_ADG,R′, as shown in formula (17):
[formula 17]
ADG(L′) = 10·log10(Ratio_ADG,L′)
ADG(R′) = 10·log10(Ratio_ADG,R′)
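Since formula (16) appears only as an image here, the sketch below shows one plausible reading of the Ratio_ADG computation: the channel energy after applying the per-object level adjustments divided by the channel energy before them, followed by the conversion to dB of formula (17). The per-object weights and gains in the usage comment come from the example above; the reading of formula (16), the function name, and the argument layout are assumptions.

```python
import math

def adg_db(obj_energies, channel_weights, gains_db):
    # Assumed reading of formula (16): Ratio_ADG is the channel energy after
    # applying the per-object gains divided by the energy before the adjustment.
    before = sum(w * e for w, e in zip(channel_weights, obj_energies))
    after = sum(w * e * 10.0 ** (g / 10.0)
                for w, e, g in zip(channel_weights, obj_energies, gains_db))
    # Formula (17): convert the ratio to an ADG value in dB.
    return 10.0 * math.log10(after / before)

# Example above: L' weights from formula (12), Obj3 raised by 10 dB, Obj4 lowered by 5 dB.
# adg_l = adg_db([e1, e2, e3, e4], [0.7, 0.3, 0.5, 0.5], [0.0, 0.0, 10.0, -5.0])
```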
Once the ADG parameters ADG(L′) and ADG(R′) are determined, they can be quantized using an ADG quantization table, and the quantized ADG values can be transmitted. If the ADG values ADG(L′) and ADG(R′) need to be adjusted more precisely than the quantization allows, the adjustment may be performed by a preprocessor rather than by an MPS decoder.
The number and the intervals of the parameter bands used to represent the object signals in the object bitstream may differ from the number and the intervals of the parameter bands used by the multi-channel decoder. In this case, the parameter bands of the object bitstream can be mapped linearly onto the parameter bands of the multi-channel decoder. More specifically, if a particular parameter band of the object bitstream spans two parameter bands of the multi-channel decoder, a linear mapping may be performed so that the particular parameter band of the object bitstream is divided according to the ratio in which it is distributed between the two parameter bands of the multi-channel decoder. On the other hand, if more than one parameter band of the object bitstream is included within a particular parameter band of the multi-channel decoder, the parameter values of the object bitstream can be averaged. Alternatively, the parameter-band mapping may be performed using the parameter-band mapping table of an existing multi-channel standard.
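The linear band mapping just described can be sketched as follows. The snippet assumes that each band is described by its boundary indices on a common frequency grid, which is an illustrative choice rather than anything mandated by the text; splitting across decoder bands and averaging within a decoder band both reduce to an overlap-weighted average.

```python
def map_parameter_values(obj_bounds, dec_bounds, obj_values):
    # obj_bounds / dec_bounds: band boundaries on a common frequency grid,
    # e.g. [0, 4, 8, 16, ...]; obj_values: one parameter value per object band.
    # Each decoder band receives the overlap-weighted average of the object-band
    # values that fall into it.
    mapped = []
    for d in range(len(dec_bounds) - 1):
        d_lo, d_hi = dec_bounds[d], dec_bounds[d + 1]
        num, den = 0.0, 0.0
        for o in range(len(obj_bounds) - 1):
            o_lo, o_hi = obj_bounds[o], obj_bounds[o + 1]
            overlap = max(0, min(d_hi, o_hi) - max(d_lo, o_lo))
            num += overlap * obj_values[o]
            den += overlap
        mapped.append(num / den if den else 0.0)
    return mapped
```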
When object encoding and decoding are used for teleconferencing, the voices of different people correspond to object signals. An object decoder outputs the voices corresponding to the object signals to particular speakers, respectively. However, when more than one person speaks at the same time, it is difficult for the object decoder to properly assign the voices to different loudspeakers through decoding, and the rendering of the voices may cause signal distortion and deteriorate the sound quality. To address this problem, information indicating whether more than one person is speaking at the same time may be included in a bitstream. Then, if it is determined based on this information that more than one person is speaking at the same time, the channel-based bitstream can be modified so that a barely-decoded signal, which is almost identical to the downmix signal, is output to each loudspeaker.
For example, assume that there are three people a, b, and c, and that the voices of a, b, and c need to be decoded and output to loudspeakers A, B, and C, respectively. When a, b, and c speak at the same time, their voices may all be included in a downmix signal obtained by downmixing the object signals respectively representing the voices of a, b, and c. In this case, information regarding the portions of the downmix signal corresponding respectively to the voices of a, b, and c may be configured into a multi-channel bitstream. Then, the downmix signal may be decoded with a typical object decoding method so that the voices of a, b, and c are output to loudspeakers A, B, and C, respectively. However, the output of each of loudspeakers A, B, and C may be distorted and may have lower intelligibility than the original downmix signal. In addition, the voices of a, b, and c may not be completely separated from one another. To address this problem, information indicating that a, b, and c are speaking at the same time may be included in the bitstream. Then, a transcoder may generate a multi-channel bitstream so that the downmix signal, obtained by downmixing the object signals corresponding to the voices of a, b, and c, is output to each of loudspeakers A, B, and C. In this manner, signal distortion can be prevented.
In fact, when more than one person speaks at the same time, it is difficult to separate each person's voice. Therefore, the sound quality may be higher when the downmix signal is output as it is than when the downmix signal is rendered so that the voices of the different people are separated from one another and output to different loudspeakers. For this purpose, the transcoder may generate a multi-channel bitstream so that the downmix signal obtained when more than one person speaks at the same time is output to all loudspeakers, or so that the downmix signal is amplified and then output to the loudspeakers.
In order to indicate whether the downmix signal of an object bitstream originates from more than one person speaking at the same time, an object encoder may appropriately modify the object bitstream, as described above, instead of providing additional information. In this case, an object decoder may perform a typical decoding operation on the object bitstream so that the downmix signal is output to the loudspeakers as it is, or so that the downmix signal is amplified, but not to the extent of causing distortion, and then output to the loudspeakers.
The 3D information, such as an HRTF, that is provided to a multi-channel decoder is described below in detail.
When an object decoder operates in a binaural mode, the multi-channel decoder within the object decoder also operates in the binaural mode. The end user may transmit 3D information, such as HRTFs, optimized according to the spatial positions of the object signals, to the multi-channel decoder.
More specifically, when there are two object signals, OBJECT1 and OBJECT2, and these two object signals OBJECT1 and OBJECT2 are placed at position 1 and position 2, respectively, a rendering matrix generator or a transcoder may have 3D information indicating the positions of the object signals OBJECT1 and OBJECT2. If the rendering matrix generator has the 3D information indicating the positions of OBJECT1 and OBJECT2, the rendering matrix generator may transmit this 3D information to the transcoder. On the other hand, if the transcoder has the 3D information indicating the positions of OBJECT1 and OBJECT2, the rendering matrix generator may transmit only index information corresponding to the 3D information to the transcoder.
In this case, a binaural signal can be generated based on the 3D information specifying positions 1 and 2, as shown in formula (18):
[formula 18]
L = Obj1 * HRTF_L,Pos1 + Obj2 * HRTF_L,Pos2
R = Obj1 * HRTF_R,Pos1 + Obj2 * HRTF_R,Pos2
Assuming that sound is to be reproduced with a 5.1-channel loudspeaker system, a multi-channel binaural decoder obtains binaural sound by performing decoding, and this binaural sound can be represented by formula (19):
[formula 19]
L = FL * HRTF_L,FL + C * HRTF_L,C + FR * HRTF_L,FR + RL * HRTF_L,RL + RR * HRTF_L,RR
R = FL * HRTF_R,FL + C * HRTF_R,C + FR * HRTF_R,FR + RL * HRTF_R,RL + RR * HRTF_R,RR
The L-channel component of the object signal OBJECT1 can be represented by formula (20):
[formula 20]
L_Obj1 = Obj1 * HRTF_L,Pos1
L_Obj1 = FL_Obj1 * HRTF_L,FL + C_Obj1 * HRTF_L,C + FR_Obj1 * HRTF_L,FR + RL_Obj1 * HRTF_L,RL + RR_Obj1 * HRTF_L,RR
The R-channel component of the object signal OBJECT1 and the L- and R-channel components of the object signal OBJECT2 can also be defined using formula (20).
For example, if the ratios of the energy levels of the object signals OBJECT1 and OBJECT2 to the sum of their energy levels are a and b, respectively, the ratio of the portion of OBJECT1 distributed to the FL channel to the whole of OBJECT1 is c, and the ratio of the portion of OBJECT2 distributed to the FL channel to the whole of OBJECT2 is d, then OBJECT1 and OBJECT2 are distributed to the FL channel in the ratio ac:bd. In this case, the HRTF of the FL channel can be determined as shown in formula (21):
[formula 21]
HRTF_FL,L = [ac / (ac + bd)]·HRTF_L,Pos1 + [bd / (ac + bd)]·HRTF_L,Pos2
HRTF_FL,R = [ac / (ac + bd)]·HRTF_R,Pos1 + [bd / (ac + bd)]·HRTF_R,Pos2
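A sketch of formula (21): the HRTF attached to a playback channel is the energy-weighted blend of the position HRTFs of the objects routed to that channel. Treating the HRTFs as sampled impulse responses in numpy arrays, and the function name itself, are illustrative assumptions.

```python
import numpy as np

def channel_hrtf(weight1, weight2, hrtf_pos1, hrtf_pos2):
    # Formula (21) with weight1 = a*c and weight2 = b*d: blend the two
    # position HRTFs in proportion to the object energy routed to the channel.
    total = weight1 + weight2
    return (weight1 / total) * np.asarray(hrtf_pos1) + \
           (weight2 / total) * np.asarray(hrtf_pos2)

# hrtf_fl_left  = channel_hrtf(a * c, b * d, hrtf_L_pos1, hrtf_L_pos2)
# hrtf_fl_right = channel_hrtf(a * c, b * d, hrtf_R_pos1, hrtf_R_pos2)
```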
In this manner, the 3D information to be used in the multi-channel binaural decoder can be obtained. Since this 3D information indicates the actual positions of the object signals more precisely, binaural decoding using this 3D information can reproduce a binaural signal more vividly than multi-channel decoding performed with 3D information that merely corresponds to the positions of five loudspeakers.
As described above, the 3D information used in the multi-channel binaural decoder can be calculated based on the 3D information indicating the spatial positions of the object signals and on energy ratio information. Alternatively, when the 3D information indicating the spatial positions of the object signals is combined with the ICC information of the object signals, the 3D information used in the multi-channel binaural decoder can be generated by appropriately performing decorrelation.
Effect processing can be performed as part of the preprocessing. Alternatively, the result of effect processing may simply be added to the output of the multi-channel decoder. In the former case, in order to perform effect processing on an object signal, the extraction of the object signal may need to be carried out in addition to the division of the L-channel signal into L_L and L_R and the division of the R-channel signal into R_R and R_L.
More specifically, an object signal may first be extracted from the L- and R-channel signals. Then, the L-channel signal may be divided into L_L and L_R, and the R-channel signal may be divided into R_R and R_L. Effect processing may be performed on the extracted object signal. Then, the effect-processed object signal may be divided into L- and R-channel components according to the rendering matrix. Thereafter, the L-channel component of the effect-processed object signal may be added to L_L and R_L, and the R-channel component of the effect-processed object signal may be added to R_R and L_R.
Alternatively, the preprocessed L- and R-channel signals L and R may be generated first. Thereafter, an object signal may be extracted from the preprocessed L- and R-channel signals L and R. Then, effect processing may be performed on the object signal, and the result of the effect processing may be added back to the preprocessed L- and R-channel signals.
The spectrum of an object signal can be modified by effect processing. For example, the level of a high-pitch portion or a bass portion of an object signal may be selectively raised. For this purpose, only the spectral portion corresponding to the high-pitch portion or the bass portion of the object signal may be modified. In this case, the object-related information included in the object bitstream needs to be modified accordingly. For example, if the level of the bass portion of a particular object signal is raised, the energy of the bass portion of that object signal also increases. The energy information included in the object bitstream then no longer accurately represents the energy of that object signal. To address this problem, the energy information included in the object bitstream may be directly modified according to the change in the energy of the object signal. Alternatively, spectrum change information provided by the transcoder may be applied when the multi-channel bitstream is formed, so that the energy change of the object signal is reflected in the multi-channel bitstream.
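One way to keep the object energy information consistent with such a spectral modification is sketched below, under the assumption that the side information stores one energy value per parameter band and that the effect amounts to a simple per-band gain; the names and the example gains are illustrative only.

```python
def update_energy_info(band_energies, band_gains_db):
    # Scale each parameter band's stored energy by the power gain applied to
    # that band during effect processing, so the side information keeps
    # matching the modified object signal.
    return [e * 10.0 ** (g / 10.0)
            for e, g in zip(band_energies, band_gains_db)]

# e.g. raise the two lowest (bass) bands by 6 dB and leave the rest unchanged:
# new_energies = update_energy_info(energies, [6.0, 6.0] + [0.0] * (len(energies) - 2))
```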
Figures 28 through 33 are block diagrams for explaining how a plurality of pieces of object-based side information and a plurality of downmix signals are combined into a single piece of side information and a single downmix signal. In the case of teleconferencing, a plurality of pieces of object-based side information and a plurality of downmix signals sometimes need to be combined into a single piece of side information and a single downmix signal, and in this case a number of factors need to be considered.
Figure 28 is a block diagram of an encoded object bitstream. Referring to Figure 28, the encoded object bitstream includes a downmix signal and side information. The downmix signal and the side information are synchronized with each other. Therefore, the encoded object bitstream can easily be decoded without additional factors having to be considered. However, when a plurality of bitstreams are combined into a single bitstream, it must be ensured that the downmix signal of the single bitstream and the side information of the single bitstream are synchronized.
Figure 29 is a block diagram for explaining the combination of a plurality of encoded object bitstreams BS1 and BS2. Referring to Figure 29, reference numerals 1, 2, and 3 indicate frame numbers. In order to combine a plurality of downmix signals into a single downmix signal, the downmix signals are converted into pulse-code-modulation (PCM) signals, the PCM signals are downmixed in the time domain, and the downmixed PCM signal is converted into a compression codec format. As shown in Figure 29(b), a delay d may be generated during these operations. Therefore, when the bitstream to be decoded is obtained by combining a plurality of bitstreams, it must be ensured that the downmix signal of the decoded bitstream and the side information of the decoded bitstream are fully synchronized.
If the delay between the downmix signal and the side information of a bitstream is given, the bitstream can be compensated for by a predetermined amount corresponding to the delay. The delay between the downmix signal and the side information of a bitstream may vary according to the type of compression codec used to generate the downmix signal. Therefore, a bit indicating any possible delay between the downmix signal and the side information of a bitstream may be included in the side information.
Figure 30 illustrates the combination of two bitstreams BS1 and BS2 into a single bitstream when the downmix signals of the bitstreams BS1 and BS2 have been generated by different types of codec, or when the configuration of the side information of the bitstream BS2 differs from the configuration of the side information of the bitstream BS1. Referring to Figure 30, in such a case it can be determined that the bitstreams BS1 and BS2 have different signal delays d1 and d2, which result from converting the downmix signals into time-domain signals and converting the time-domain signals with a single compression codec. In this case, if the bitstreams BS1 and BS2 are simply added together without the different signal delays being taken into account, the downmix signal of BS1 may be misaligned with the downmix signal of BS2, and the side information of BS1 may be misaligned with the side information of BS2. To address this problem, the downmix signal of the bitstream BS1, which has the delay d1, may be further delayed so as to be synchronized with the downmix signal of the bitstream BS2, which has the delay d2. Then, the bitstreams BS1 and BS2 can be combined using the same method as in the embodiment of Figure 30. If there is more than one bitstream to be combined, the bitstream having the greatest delay may be used as a reference bitstream, and the other bitstreams may then be further delayed so as to be synchronized with the reference bitstream. A bit indicating the delay between a downmix signal and side information may be included in the object bitstream.
A bit indicating whether a signal delay exists in a bitstream may be provided. Only when this bit indicates that a signal delay exists in the bitstream may information specifying the signal delay be additionally provided. In this manner, the amount of information required to indicate any possible signal delay in a bitstream can be minimized.
Figure 32 is a block diagram for explaining how to compensate for one of two bitstreams BS1 and BS2 having different signal delays, and more specifically, how to compensate for the bitstream BS2, which has a greater signal delay than the bitstream BS1. Referring to Figure 32, the first through third frames of the side information of the bitstream BS1 can all be used as they are. On the other hand, the first through third frames of the side information of the bitstream BS2 cannot be used as they are, because they are not respectively synchronized with the first through third frames of the side information of the bitstream BS1. For example, the second frame of the side information of BS1 corresponds not only to part of the first frame of the side information of BS2 but also to part of the second frame of the side information of BS2. The ratio of the portion of the second frame of the side information of BS2 corresponding to the second frame of the side information of BS1 to the whole second frame of the side information of BS2, and the ratio of the portion of the first frame of the side information of BS2 corresponding to the second frame of the side information of BS1 to the whole first frame of the side information of BS2, can be calculated, and the first and second frames of the side information of BS2 can be averaged or interpolated based on the results of this calculation. In this manner, as shown in Figure 32(b), the first through third frames of the side information of BS2 can be respectively synchronized with the first through third frames of the side information of BS1. Then, the side information of BS1 and the side information of BS2 can be combined using the method of the embodiment of Figure 29. The downmix signals of the bitstreams BS1 and BS2 can be combined into a single downmix signal without delay compensation. In this case, delay information corresponding to the signal delay d1 may be stored in the combined bitstream obtained by combining the bitstreams BS1 and BS2.
Figure 33 is a block diagram for explaining how to compensate for whichever of two bitstreams having different signal delays has the smaller signal delay. Referring to Figure 33, the first through third frames of the side information of the bitstream BS2 can all be used as they are. On the other hand, the first through third frames of the side information of the bitstream BS1 cannot be used as they are, because they are not respectively synchronized with the first through third frames of the side information of the bitstream BS2. For example, the first frame of the side information of BS2 corresponds not only to part of the first frame of the side information of BS1 but also to part of the second frame of the side information of BS1. The ratio of the portion of the first frame of the side information of BS1 corresponding to the first frame of the side information of BS2 to the whole first frame of the side information of BS1, and the ratio of the portion of the second frame of the side information of BS1 corresponding to the first frame of the side information of BS2 to the whole second frame of the side information of BS1, can be calculated, and the first and second frames of the side information of BS1 can be averaged or interpolated based on the results of this calculation. In this manner, as shown in Figure 33(b), the first through third frames of the side information of BS1 can be respectively synchronized with the first through third frames of the side information of BS2. Then, the side information of BS1 and the side information of BS2 can be combined using the method of the embodiment of Figure 29. The downmix signals of the bitstreams BS1 and BS2 can be combined into a single downmix signal without delay compensation, even though the downmix signals have different signal delays. In this case, delay information corresponding to the signal delay d2 may be stored in the combined bitstream obtained by combining the bitstreams BS1 and BS2.
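The frame re-alignment of Figures 32 and 33 can be sketched as an overlap-weighted interpolation of neighbouring side-information frames. Modelling each frame's parameters as a plain numeric vector, and deriving the overlap ratio directly from the signal delay, are assumptions made only for illustration.

```python
def realign_frames(frames, delay, frame_len):
    # frames: list of per-frame parameter vectors of the delayed bitstream.
    # delay: signal delay in samples; frame_len: frame length in samples.
    # Each output frame is the average of the two input frames it overlaps,
    # weighted by the overlap ratios described in the text.
    shift = delay % frame_len
    w_prev = shift / frame_len          # portion taken from the earlier frame
    w_curr = 1.0 - w_prev               # portion taken from the later frame
    out = []
    for i in range(len(frames)):
        prev = frames[i - 1] if i > 0 else frames[0]
        out.append([w_prev * p + w_curr * c for p, c in zip(prev, frames[i])])
    return out
```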
If a plurality of encoded object bitstreams are combined into a single bitstream, the downmix signals of the encoded object bitstreams need to be combined into a single downmix signal. In order to combine a plurality of downmix signals corresponding to different compression codecs into a single downmix signal, the downmix signals may be converted into PCM signals or frequency-domain signals, and the PCM signals or the frequency-domain signals may be added up in the corresponding domain. Thereafter, the result of the addition may be converted with a predetermined compression codec. Various signal delays may occur depending on whether the downmix signals are added up during a PCM operation or in the frequency domain, and depending on the type of compression codec. Since a decoder cannot easily recognize the various signal delays from the bitstream to be decoded, delay information specifying the various signal delays needs to be included in the bitstream. Such delay information may represent the number of delayed samples in a PCM signal or the number of delayed samples in the frequency domain.
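A minimal sketch of merging two downmix signals in the PCM domain while recording the alignment delay for the merged bitstream; decoding from and re-encoding to actual compression codecs is outside its scope, and numpy is used purely for convenience.

```python
import numpy as np

def merge_downmixes_pcm(pcm1, delay1, pcm2, delay2):
    # Align both PCM downmixes to the larger codec delay, add them up in the
    # time domain, and report the delay to be written into the merged bitstream.
    ref_delay = max(delay1, delay2)
    a = np.pad(np.asarray(pcm1, dtype=float), (ref_delay - delay1, 0))
    b = np.pad(np.asarray(pcm2, dtype=float), (ref_delay - delay2, 0))
    n = max(len(a), len(b))
    merged = np.pad(a, (0, n - len(a))) + np.pad(b, (0, n - len(b)))
    return merged, ref_delay
```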
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage, and a carrier wave (for example, data transmission through the Internet). The computer-readable recording medium can be distributed over a plurality of computer systems connected to a network so that computer-readable code is written thereto and executed therefrom in a decentralized manner. Functional programs, code, and code segments for realizing the present invention can be easily constructed by those of ordinary skill in the art.
As described above, according to the present invention, which benefits from object-based audio encoding and decoding, the sound image can be localized for each object signal. Thus, more realistic sound can be provided during the reproduction of object signals. In addition, the present invention can be applied to interactive games, and can therefore provide users with a more realistic virtual-reality experience.
While the present invention has been particularly shown and described with reference to preferred embodiments thereof, it will be apparent to those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention as defined by the claims.

Claims (4)

1. An audio decoding method, comprising:
receiving a downmix signal, object-based side information, and control information, the downmix signal being obtained by downmixing a plurality of object signals, and the control information controlling positions and levels of the object signals included in the downmix signal;
extracting metadata from the object-based side information, the metadata indicating a description of the object signals;
generating a rendering parameter and a spatial parameter by using the control information and the object-based side information, the rendering parameter being used to preprocess the downmix signal, and the spatial parameter being used to generate a multi-channel audio signal;
preprocessing the object signals in the downmix signal by applying the rendering parameter to the downmix signal, so as to generate a preprocessed downmix signal; and
generating the multi-channel audio signal by decoding the preprocessed downmix signal using the spatial parameter,
wherein the metadata is provided in a text format.
2. The audio decoding method of claim 1, wherein the metadata is included in a header of a bitstream comprising the object-based side information.
3. An audio decoding apparatus, comprising:
a demultiplexer configured to extract a downmix signal, object-based side information, and control information from an input audio signal, the downmix signal being obtained by downmixing a plurality of object signals, and the control information controlling positions and levels of the object signals included in the downmix signal;
a parameter converter configured to extract metadata from the object-based side information, the metadata indicating a description of the object signals, and configured to generate a rendering parameter and a spatial parameter by using the control information and the object-based side information, the rendering parameter being used to preprocess the downmix signal, and the spatial parameter being used to generate a multi-channel audio signal;
a preprocessor configured to preprocess the object signals in the downmix signal by applying the rendering parameter to the downmix signal, so as to generate a preprocessed downmix signal; and
a multi-channel decoder configured to generate the multi-channel audio signal by decoding the preprocessed downmix signal using the spatial parameter,
wherein the metadata is provided in a text format.
4. The apparatus of claim 3, wherein the metadata is included in a header of a bitstream comprising the object-based side information.
CN2008800003869A 2007-02-14 2008-02-14 Methods and apparatuses for encoding and decoding object-based audio signals Active CN101542597B (en)

Applications Claiming Priority (15)

Application Number Priority Date Filing Date Title
US90108907P 2007-02-14 2007-02-14
US60/901,089 2007-02-14
US90164207P 2007-02-16 2007-02-16
US60/901,642 2007-02-16
US90381807P 2007-02-28 2007-02-28
US60/903,818 2007-02-28
US90768907P 2007-04-13 2007-04-13
US60/907,689 2007-04-13
US92402707P 2007-04-27 2007-04-27
US60/924,027 2007-04-27
US94762007P 2007-07-02 2007-07-02
US60/947,620 2007-07-02
US94837307P 2007-07-06 2007-07-06
US60/948,373 2007-07-06
PCT/KR2008/000883 WO2008100098A1 (en) 2007-02-14 2008-02-14 Methods and apparatuses for encoding and decoding object-based audio signals

Publications (2)

Publication Number Publication Date
CN101542597A CN101542597A (en) 2009-09-23
CN101542597B true CN101542597B (en) 2013-02-27

Family

ID=41124181

Family Applications (3)

Application Number Title Priority Date Filing Date
CN200880000383.5A Active CN101542596B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based audio signal of Code And Decode
CN200880000382.0A Active CN101542595B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based sound signal of Code And Decode
CN2008800003869A Active CN101542597B (en) 2007-02-14 2008-02-14 Methods and apparatuses for encoding and decoding object-based audio signals

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN200880000383.5A Active CN101542596B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based audio signal of Code And Decode
CN200880000382.0A Active CN101542595B (en) 2007-02-14 2008-02-14 For the method and apparatus of the object-based sound signal of Code And Decode

Country Status (1)

Country Link
CN (3) CN101542596B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2011350143B9 (en) * 2010-12-29 2015-05-14 Samsung Electronics Co., Ltd. Apparatus and method for encoding/decoding for high-frequency bandwidth extension
EP2560161A1 (en) 2011-08-17 2013-02-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Optimal mixing matrices and usage of decorrelators in spatial audio processing
AU2013298462B2 (en) * 2012-08-03 2016-10-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Decoder and method for multi-instance spatial-audio-object-coding employing a parametric concept for multichannel downmix/upmix cases
ES2595220T3 (en) * 2012-08-10 2016-12-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and methods for adapting audio information to spatial audio object encoding
TWI530941B (en) 2013-04-03 2016-04-21 杜比實驗室特許公司 Methods and systems for interactive rendering of object based audio
EP2830050A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for enhanced spatial audio object coding
EP2830046A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decoding an encoded audio signal to obtain modified output signals
EP2830045A1 (en) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concept for audio encoding and decoding for audio channels and audio objects
EP2830049A1 (en) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for efficient object metadata coding
CN105531759B (en) 2013-09-12 2019-11-26 杜比实验室特许公司 Loudness for lower mixed audio content adjusts
EP3074970B1 (en) * 2013-10-21 2018-02-21 Dolby International AB Audio encoder and decoder
CN106465028B (en) * 2014-06-06 2019-02-15 索尼公司 Audio signal processor and method, code device and method and program
EP4216217A1 (en) 2014-10-03 2023-07-26 Dolby International AB Smart access to personalized audio
JP7230799B2 (en) * 2017-03-28 2023-03-01 ソニーグループ株式会社 Information processing device, information processing method, and program
CN107886962B (en) * 2017-11-17 2020-10-02 南京理工大学 High-security steganography method for IP voice
CN108806705A (en) * 2018-06-19 2018-11-13 合肥凌极西雅电子科技有限公司 Audio-frequency processing method and processing system
CN111654745B (en) * 2020-06-08 2022-10-14 海信视像科技股份有限公司 Multi-channel signal processing method and display device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1691348A1 (en) * 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Parametric joint-coding of audio sources

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007004828A2 (en) * 2005-06-30 2007-01-11 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
CN101506875B (en) * 2006-07-07 2012-12-19 弗劳恩霍夫应用研究促进协会 Apparatus and method for combining multiple parametrically coded audio sources
BRPI0715312B1 (en) * 2006-10-16 2021-05-04 Koninklijke Philips Electrnics N. V. APPARATUS AND METHOD FOR TRANSFORMING MULTICHANNEL PARAMETERS

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1691348A1 (en) * 2005-02-14 2006-08-16 Ecole Polytechnique Federale De Lausanne Parametric joint-coding of audio sources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ISO/IEC.Concepts of Object-Oriented Spatial Audio Coding.《ISO/IEC JTC1/SC29/WG11 N8329》.2006, *

Also Published As

Publication number Publication date
CN101542595B (en) 2016-04-13
CN101542597A (en) 2009-09-23
CN101542595A (en) 2009-09-23
CN101542596A (en) 2009-09-23
CN101542596B (en) 2016-05-18

Similar Documents

Publication Publication Date Title
CN101542597B (en) Methods and apparatuses for encoding and decoding object-based audio signals
CN101479785B (en) Method for encoding and decoding object-based audio signal and apparatus thereof
US9449601B2 (en) Methods and apparatuses for encoding and decoding object-based audio signals
RU2406166C2 (en) Coding and decoding methods and devices based on objects of oriented audio signals
CN101506875B (en) Apparatus and method for combining multiple parametrically coded audio sources

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant