CN101479785B - Method for encoding and decoding object-based audio signal and apparatus thereof - Google Patents


Info

Publication number
CN101479785B
Authority
CN, China
Legal status
Active
Application number
CN2007800238696A
Other languages
Chinese (zh)
Other versions
CN101479785A (en)
Inventors
尹圣龙, 房熙锡, 李顯国, 金东秀, 林宰顯
Current assignee
LG Electronics Inc
Original assignee
LG Electronics Inc
Application events
Application filed by LG Electronics Inc
Priority claimed from PCT/KR2007/004797 (WO2008039039A1)
Publication of CN101479785A
Application granted
Publication of CN101479785B


Abstract

Provided are an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that the sound image of each object audio signal can be localized at any desired position. The audio decoding method includes: generating a third downmix signal by combining a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal; generating third object-based side information by combining first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal; converting the third object-based side information into channel-based side information; and generating a multichannel audio signal using the third downmix signal and the channel-based side information.

Description

Method and apparatus for encoding and decoding object-based audio signals
Technical field
The present invention relates to an audio encoding method and apparatus and an audio decoding method and apparatus in which the sound image of each object audio signal can be localized at any desired position.
Background art
In general, in multichannel audio coding and decoding, the channel signals of a multichannel signal are downmixed into a smaller number of channel signals, side information about the original channel signals is transmitted, and a multichannel signal having as many channels as the original is then restored from the downmix and the side information.
Object-based audio coding and decoding is basically similar to multichannel audio coding and decoding in that a plurality of sound sources are downmixed into fewer sound-source signals and side information about the original sound sources is transmitted. In object-based audio coding and decoding, however, object signals, which are the basic elements of a channel signal (for example, the sound of a musical instrument or a human voice), are treated in the same way as channel signals are treated in multichannel audio coding and decoding, and can therefore be encoded and decoded individually.
In other words, in object-based audio coding and decoding, each object signal is regarded as an entity to be encoded or decoded. In this respect, object-based audio coding and decoding differs from multichannel audio coding and decoding, in which channels are simply encoded and decoded based on inter-channel information, regardless of how many elements make up each channel signal being encoded or decoded.
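The downmix-plus-side-information principle can be illustrated with a minimal sketch for the simplest case: two channels, one frequency band, and a single channel level difference (CLD) as the only side information. The function names are hypothetical, and practical codecs compute such parameters per subband and add decorrelation, which is omitted here.

```python
import math

def downmix_with_cld(left, right, eps=1e-12):
    """Downmix one stereo frame to mono and measure its channel level difference (dB)."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    cld_db = 10 * math.log10((e_left + eps) / (e_right + eps))
    return mono, cld_db

def upmix_from_cld(mono, cld_db):
    """Rebuild two channels from the mono downmix and the transmitted CLD.

    Gains satisfy g_l / g_r = sqrt(E_l / E_r) and g_l + g_r = 2, so
    re-downmixing the output reproduces the mono signal. Reconstruction is
    exact only when the original channels are in-phase scaled copies.
    """
    amp_ratio = math.sqrt(10 ** (cld_db / 10))
    g_r = 2 / (1 + amp_ratio)
    g_l = amp_ratio * g_r
    return [g_l * m for m in mono], [g_r * m for m in mono]
```

Only one number per band travels alongside the downmix, which is why the transmitted data is much smaller than the original channels.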
Summary of the invention
Technical problem
The present invention provides an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that the sound image of each object audio signal can be localized at any desired position.
Technical solution
According to an aspect of the present invention, there is provided an audio decoding method including: generating a third downmix signal by combining a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal; generating third object-based side information by combining first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal; converting the third object-based side information into channel-based side information; and generating a multichannel audio signal using the third downmix signal and the channel-based side information.
According to another aspect of the present invention, there is provided an audio decoding apparatus including: a multipoint control unit (MCU) combiner which generates a third downmix signal by combining a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal, and generates third object-based side information by combining first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal; a transcoder which converts the third object-based side information into channel-based side information; and a multichannel decoder which generates a multichannel audio signal using the third downmix signal and the channel-based side information.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a program for executing an audio decoding method, the method including: generating a third downmix signal by combining a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal; generating third object-based side information by combining first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal; converting the third object-based side information into channel-based side information; and generating a multichannel audio signal using the third downmix signal and the channel-based side information.
Advantageous effects
The present invention provides an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that the sound image of each object audio signal can be localized at any desired position.
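The combining step performed by the MCU combiner can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each bitstream is assumed to carry a mono downmix and single-band side information made of linear object level differences (OLDs, each object's energy relative to the loudest object) plus that loudest object's absolute energy. The class and function names are hypothetical, and objects are assumed to be mutually uncorrelated and downmixed with unit gain, so energies add.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ObjectSideInfo:
    # Hypothetical single-band side info: linear OLD per object, relative
    # to the loudest object, plus that loudest object's absolute energy.
    old: List[float]
    max_energy: float

def combine_downmix(dmx1: List[float], dmx2: List[float]) -> List[float]:
    """Third downmix: sample-wise sum of two time-aligned mono downmixes."""
    if len(dmx1) != len(dmx2):
        raise ValueError("downmix frames must have equal length")
    return [a + b for a, b in zip(dmx1, dmx2)]

def combine_side_info(si1: ObjectSideInfo, si2: ObjectSideInfo) -> ObjectSideInfo:
    """Third side info: restore absolute energies, merge, re-normalize."""
    energies = [o * si1.max_energy for o in si1.old]
    energies += [o * si2.max_energy for o in si2.old]
    new_max = max(energies)
    return ObjectSideInfo(old=[e / new_max for e in energies], max_energy=new_max)
```

Relative levels from two independent encoders are not directly comparable, which is why the sketch goes back through absolute energies before re-normalizing.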
Brief description of the drawings
The present invention will become more fully understood from the detailed description given below and the accompanying drawings, which are given by way of illustration only and thus do not limit the present invention, and wherein:
Fig. 1 is a block diagram of a typical object-based audio encoding/decoding system;
Fig. 2 is a block diagram of an audio decoding apparatus according to a first embodiment of the present invention;
Fig. 3 is a block diagram of an audio decoding apparatus according to a second embodiment of the present invention;
Fig. 4 is a diagram for explaining the mutually independent effects of amplitude difference and time difference on sound-image localization;
Fig. 5 is a graph of the correspondence between the amplitude differences and time differences required to localize a sound image at a predetermined position;
Fig. 6 is a table showing the format of control data including harmonic information;
Fig. 7 is a block diagram of an audio decoding apparatus according to a third embodiment of the present invention;
Fig. 8 is a block diagram of an artistic downmix gains (ADG) module that can be applied to the audio decoding apparatus illustrated in Fig. 7;
Fig. 9 is a block diagram of an audio decoding apparatus according to a fourth embodiment of the present invention;
Fig. 10 is a block diagram of an audio decoding apparatus according to a fifth embodiment of the present invention;
Fig. 11 is a block diagram of an audio decoding apparatus according to a sixth embodiment of the present invention;
Fig. 12 is a block diagram of an audio decoding apparatus according to a seventh embodiment of the present invention;
Fig. 13 is a block diagram of an audio decoding apparatus according to an eighth embodiment of the present invention;
Fig. 14 is a block diagram for explaining the application of three-dimensional (3D) information to a frame by the audio decoding apparatus illustrated in Fig. 13;
Fig. 15 is a block diagram of an audio decoding apparatus according to a ninth embodiment of the present invention;
Fig. 16 is a block diagram of an audio decoding apparatus according to a tenth embodiment of the present invention;
Figs. 17 through 19 are block diagrams for explaining audio decoding methods according to embodiments of the present invention; and
Fig. 20 is a block diagram of an audio encoding apparatus according to an embodiment of the present invention.
Best mode for carrying out the invention
The present invention will now be described in detail with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
The audio encoding method and apparatus and the audio decoding method and apparatus according to the present invention can be applied to object-based audio processing operations, but the present invention is not limited thereto. In other words, the audio encoding method and apparatus and the audio decoding method and apparatus can also be applied to various signal processing operations other than object-based audio processing operations.
Fig. 1 is a block diagram of a typical object-based audio encoding/decoding system. In general, the audio signals input to an object-based audio encoding apparatus do not correspond to the channels of a multichannel signal; they are independent object signals. In this respect, an object-based audio encoding apparatus differs from a multichannel audio encoding apparatus, to which the channel signals of a multichannel signal are input.
For example, channel signals such as the front-left and front-right channel signals of a 5.1-channel signal may be input to a multichannel audio encoding apparatus, whereas object audio signals, which are smaller entities than channel signals, such as a human voice or the sound of a musical instrument (for example, a violin or a piano), may be input to an object-based audio encoding apparatus.
Referring to Fig. 1, the object-based audio encoding/decoding system includes an object-based audio encoding apparatus and an object-based audio decoding apparatus. The object-based audio encoding apparatus includes an object encoder 100, and the object-based audio decoding apparatus includes an object decoder 111 and a renderer 113.
The object encoder 100 receives N object audio signals and generates an object-based downmix signal having one or more channels, together with side information that includes a number of pieces of information extracted from the N object audio signals, such as energy differences, phase differences, and correlation values. The side information and the object-based downmix signal are merged into a single bitstream, which is transmitted to the object-based audio decoding apparatus.
The side information may include a flag indicating whether channel-based or object-based audio encoding/decoding is to be performed, so that which of the two is performed can be determined from the flag. The side information may also include envelope information, grouping information, silent-period information, and delay information regarding the object signals. In addition, the side information may include object level difference information, inter-object cross-correlation information, downmix gain information, downmix channel level difference information, and absolute object energy information.
The object decoder 111 receives the object-based downmix signal and the side information from the object-based audio encoding apparatus and, based on them, restores object signals having attributes similar to those of the N object audio signals. The object signals generated by the object decoder 111 have not yet been assigned to any position in a multichannel space. Therefore, the renderer 113 assigns each of the object signals generated by the object decoder 111 to a predetermined position in the multichannel space and determines the levels of the object signals, so that each object signal is reproduced at the position designated by the renderer 113 and at the level determined by the renderer 113. The control information regarding each of the object signals generated by the object decoder 111 may vary over time, and the levels and spatial positions of those object signals may therefore change according to the control information.
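A minimal sketch of what such an object encoder might compute for one frame and one frequency band is shown below. It assumes a mono downmix formed by plain summation and uses energies relative to the loudest object as the "energy difference" side information, plus a normalized inter-object correlation per object pair; phase differences are omitted. The function name and dictionary keys are hypothetical.

```python
import math
from itertools import combinations

def encode_objects(objects):
    """Hypothetical single-band object encoder returning (downmix, side_info)."""
    n = len(objects[0])
    # Mono downmix: plain sum of all object signals.
    downmix = [sum(obj[i] for obj in objects) for i in range(n)]
    energies = [sum(x * x for x in obj) for obj in objects]
    max_e = max(energies)
    old = [e / max_e for e in energies]  # linear object level differences
    ioc = {}
    for a, b in combinations(range(len(objects)), 2):
        cross = sum(x * y for x, y in zip(objects[a], objects[b]))
        denom = math.sqrt(energies[a] * energies[b])
        ioc[(a, b)] = cross / denom if denom > 0 else 0.0
    return downmix, {"old": old, "max_energy": max_e, "ioc": ioc}
```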
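The renderer's job of placing each decoded object at a position and level can be expressed as a rendering matrix applied to the object signals. In the sketch below, a "position" is assumed to be a per-output-channel gain vector and a "level" a linear scale factor; both formats are hypothetical stand-ins for the control information.

```python
def render(objects, positions, levels):
    """Mix decoded objects into output channels.

    positions[k]: per-channel gain vector placing object k (hypothetical format).
    levels[k]: linear level of object k taken from the control information.
    """
    num_ch = len(positions[0])
    n = len(objects[0])
    out = [[0.0] * n for _ in range(num_ch)]
    for obj, pos, lvl in zip(objects, positions, levels):
        for ch in range(num_ch):
            g = pos[ch] * lvl
            if g == 0.0:
                continue  # object not present in this channel
            for i in range(n):
                out[ch][i] += g * obj[i]
    return out
```

When the control information changes over time, the same function would simply be called with updated positions and levels for each frame.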
Fig. 2 is a block diagram of an audio decoding apparatus 120 according to a first embodiment of the present invention. Referring to Fig. 2, the audio decoding apparatus 120 includes an object decoder 121, a renderer 123, and a parameter converter 125. The audio decoding apparatus 120 may also include a demultiplexer (not shown) that extracts a downmix signal and side information from an input bitstream; such a demultiplexer applies to the audio decoding apparatuses according to the other embodiments of the present invention as well.
The object decoder 121 generates a number of object signals based on the downmix signal and modified side information provided by the parameter converter 125. The renderer 123 assigns each of the object signals generated by the object decoder 121 to a predetermined position in a multichannel space and determines the levels of those object signals according to control information. The parameter converter 125 generates the modified side information by combining the side information and the control information, and then transmits the modified side information to the object decoder 121.
The object decoder 121 can perform adaptive decoding by analyzing the control information contained in the modified side information.
For example, if the control information indicates that a first object signal and a second object signal are assigned to the same position in the multichannel space and have the same level, a typical audio decoding apparatus would decode the first and second object signals separately and then arrange them in the multichannel space through a mixing/rendering operation.
On the other hand, the object decoder 121 of the audio decoding apparatus 120 learns from the control information in the modified side information that the first and second object signals are assigned to the same position in the multichannel space and have the same level, as if they were a single sound source. Accordingly, the object decoder 121 decodes the first and second object signals by treating them as a single sound source instead of decoding them separately. In this way, decoding complexity is reduced. In addition, because the number of sound sources to be processed is reduced, mixing/rendering complexity is also reduced.
The audio decoding apparatus 120 can be used effectively when the number of object signals is greater than the number of output channels, because in that case a plurality of object signals are likely to be assigned to the same spatial position.
Alternatively, the audio decoding apparatus 120 can be used when the first and second object signals are assigned to the same position in the multichannel space but have different levels. In this case, the audio decoding apparatus 120 decodes the first and second object signals by treating them as one signal, rather than decoding them separately, and transmits the decoded first and second object signals to the renderer 123. More specifically, the object decoder 121 obtains information on the difference between the levels of the first and second object signals from the control information in the modified side information and decodes the first and second object signals based on the obtained information. As a result, even if the first and second object signals have different levels, they can be decoded as if they were a single sound source.
Alternatively, the object decoder 121 may adjust the levels of the object signals it generates according to the control information and then decode the level-adjusted object signals. Thus, the renderer 123 does not need to adjust the levels of the decoded object signals provided by the object decoder 121; it simply arranges them in the multichannel space. In short, because the object decoder 121 has already adjusted the levels of the object signals according to the control information, the renderer 123 can readily arrange those object signals in the multichannel space without further level adjustment. Therefore, mixing/rendering complexity can be reduced.
According to the embodiment of Fig. 2, the object decoder 121 of the audio decoding apparatus 120 can perform decoding adaptively by analyzing the control information, thereby reducing both decoding complexity and mixing/rendering complexity. A combination of the above-described methods performed by the audio decoding apparatus 120 may also be used.
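The grouping decision described above can be sketched as follows: objects whose control information assigns them the same position are merged into one source, with each object's level applied before the merge so that differing levels are preserved. The sketch assumes mutually uncorrelated objects (so weighted energies add) and uses a hypothetical tuple-of-gains position format; it is an illustration, not the decoder 121 itself.

```python
def group_colocated(positions, levels, energies):
    """Merge objects that share a rendering position into one source.

    Returns a list of (position, member_indices, merged_energy), in order of
    first appearance. merged_energy = sum(level_k**2 * energy_k) over members,
    which assumes the member objects are mutually uncorrelated.
    """
    groups = {}
    for k, pos in enumerate(positions):
        groups.setdefault(tuple(pos), []).append(k)
    merged = []
    for pos, members in groups.items():
        e = sum((levels[k] ** 2) * energies[k] for k in members)
        merged.append((pos, members, e))
    return merged
```

Three objects at two distinct positions leave only two sources to decode and render, which is the complexity saving the text describes.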
Fig. 3 is a block diagram of an audio decoding apparatus 130 according to a second embodiment of the present invention. Referring to Fig. 3, the audio decoding apparatus 130 includes an object decoder 131 and a renderer 133. The audio decoding apparatus 130 is characterized in that it provides side information not only to the object decoder 131 but also to the renderer 133.
The audio decoding apparatus 130 can perform decoding efficiently even when some object signals correspond to a silent period. For example, second through fourth object signals may correspond to a music play period during which instruments are played, and a first object signal may correspond to a silent period while the accompaniment is played. In this case, information indicating which of the object signals correspond to a silent period may be included in the side information, and the side information may be provided to the renderer 133 as well as to the object decoder 131.
The object decoder 131 can minimize decoding complexity by not decoding object signals that correspond to a silent period. The object decoder 131 sets such an object signal to a value of 0 and transmits the level of that object signal to the renderer 133. In general, object signals having a value of 0 are treated the same as object signals having non-zero values and are thus subjected to the mixing/rendering operation.
On the other hand, the audio decoding apparatus 130 transmits side information, including information indicating which of the object signals correspond to a silent period, to the renderer 133, and can thus prevent the object signals corresponding to a silent period from being subjected to the mixing/rendering operation performed by the renderer 133. Therefore, the audio decoding apparatus 130 can prevent an unnecessary increase in mixing/rendering complexity.
The renderer 133 can use mixing parameter information included in the control information to localize the sound image of each object signal in a stereo scene. The mixing parameter information may include amplitude information only, or both amplitude information and time information. The mixing parameter information affects not only the localization of stereo sound images but also the user's psychoacoustic perception of spatial sound quality.
For example, comparing two sound images at the same position, one generated by time panning and the other by amplitude panning, both reproduced through 2-channel stereo loudspeakers, shows that amplitude panning can localize a sound image precisely, whereas time panning can produce natural sound with a rich sense of spatial depth. Thus, if the renderer 133 uses only amplitude panning to arrange object signals in the multichannel space, it can localize each sound image precisely but cannot provide the sense of depth that time panning provides. Depending on the type of sound source, users may sometimes prefer precise localization of sound images to a sense of depth, or vice versa.
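The benefit of passing silent-period flags to the renderer can be shown with a small sketch: flagged objects are skipped entirely instead of being mixed in as all-zero signals. The per-object boolean flag list and the gain-vector format are hypothetical representations of the side information and control information.

```python
def render_active(objects, gains, silent):
    """Render only objects that are not flagged as silent.

    gains[k]: per-channel gain list for object k (hypothetical format).
    silent[k]: True if the side information marks object k as silent.
    Returns (output_channels, number_of_objects_actually_mixed).
    """
    num_ch = len(gains[0])
    n = len(objects[0])
    out = [[0.0] * n for _ in range(num_ch)]
    mixed = 0
    for obj, g, is_silent in zip(objects, gains, silent):
        if is_silent:
            continue  # neither decoded nor mixed; it would add only zeros
        mixed += 1
        for ch in range(num_ch):
            for i in range(n):
                out[ch][i] += g[ch] * obj[i]
    return out, mixed
```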
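The two panning styles can be contrasted in a short sketch that places one mono object in a stereo pair. The constant-power sine/cosine law used for amplitude panning is a common textbook choice assumed here, not a law specified by this document; time panning keeps both levels equal and delays one channel by a whole number of samples.

```python
import math

def amplitude_pan(x, p):
    """Constant-power amplitude panning: p=0 is full left, p=1 full right."""
    gl = math.cos(p * math.pi / 2)
    gr = math.sin(p * math.pi / 2)
    return [gl * s for s in x], [gr * s for s in x]

def time_pan(x, delay_samples):
    """Time panning with equal levels. A positive delay makes the right
    channel lag, pulling the image toward the left loudspeaker; a negative
    delay does the opposite. Both outputs are padded to the same length."""
    d = abs(delay_samples)
    lead = list(x) + [0.0] * d
    lag = [0.0] * d + list(x)
    return (lead, lag) if delay_samples >= 0 else (lag, lead)
```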
Figs. 4(a) and 4(b) explain the effects of intensity difference (amplitude difference) and time difference on sound-image localization when signals are reproduced through 2-channel stereo loudspeakers. Referring to Figs. 4(a) and 4(b), a sound image can be localized at a predetermined angle by an amplitude difference and by a time difference, independently of each other. For example, a sound image can be localized at an angle of 20° using an amplitude difference of about 8 dB, or using a time difference of about 0.5 ms, which is equivalent to the 8 dB amplitude difference. Therefore, even if only amplitude differences are provided as mixing parameter information, sounds with various attributes can be obtained by converting the amplitude differences into time differences that are equivalent to them in terms of sound-image localization.
Fig. 5 shows the functional relationship between the amplitude differences and time differences required to localize a sound image at angles of 10°, 20°, and 30°. The function of Fig. 5 can be obtained from Figs. 4(a) and 4(b). Referring to Fig. 5, various combinations of amplitude difference and time difference can be used to localize a sound image at a predetermined position. For example, suppose that an amplitude difference of 8 dB is provided as mixing parameter information to localize a sound image at 20°. According to the function of Fig. 5, the sound image can also be localized at 20° using a combination of an amplitude difference of 3 dB and a time difference of 0.3 ms. In this case, time difference information is provided as mixing parameter information in addition to amplitude difference information, which enhances the sense of space.
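A rough check of the example above can be made by assuming a linear time-intensity trading ratio read off the two 20° operating points quoted from Fig. 4 (8 dB alone, or 0.5 ms alone). The actual curves of Fig. 5 need not be linear; this is only an approximation, but it reproduces the 3 dB plus 0.3 ms combination to within 0.2 dB.

```latex
\begin{aligned}
r &= \frac{0.5\ \text{ms}}{8\ \text{dB}} = 0.0625\ \text{ms/dB} \\
\Delta L_{\mathrm{eq}} &= \Delta L + \frac{\Delta t}{r} \\
\Delta L_{\mathrm{eq}}(3\ \text{dB},\ 0.3\ \text{ms}) &= 3 + \frac{0.3}{0.0625} = 3 + 4.8 = 7.8\ \text{dB} \approx 8\ \text{dB}
\end{aligned}
```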
Therefore, to generate sound with the attributes the user desires during the mixing/rendering operation, the mixing parameter information can be converted appropriately so that whichever of amplitude panning and time panning suits the user can be performed. That is, if the mixing parameter information includes only amplitude difference information but the user desires sound with a rich sense of spatial depth, the amplitude difference information can be converted, with reference to psychoacoustic data, into equivalent time difference information. Alternatively, if the user desires both precise localization and a rich sense of spatial depth, the amplitude difference information can be converted into a combination of amplitude difference information and time difference information that is equivalent to the original amplitude difference information.
Alternatively, if the mixing parameter information includes only time difference information but the user desires precise sound-image localization, the time difference information can be converted into equivalent amplitude difference information, or into a combination of amplitude difference information and time difference information that satisfies the user's preference by improving both localization accuracy and the sense of space.
Still alternatively, if the mixing parameter information includes both amplitude difference information and time difference information and the user prefers precise sound-image localization, the combination can be converted into amplitude difference information equivalent to the combination of the original amplitude difference information and time difference information. On the other hand, if the mixing parameter information includes both and the user desires an enhanced sense of space, the combination can be converted into time difference information equivalent to the combination of the original amplitude difference information and time difference information.
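The conversions described above can be sketched as one function that pools the existing cues into an equivalent amplitude difference and then re-splits it according to the user's preference. The linear trading ratio (8 dB taken as equivalent to 0.5 ms, from the Fig. 4 example) and the single preference weight are assumptions for illustration; a real renderer would consult the psychoacoustic curves of Fig. 5 instead.

```python
# Assumed linear trading: 8 dB is taken as equivalent to 0.5 ms (Fig. 4 example).
TRADING_MS_PER_DB = 0.5 / 8.0

def convert_mixing_params(amp_db, time_ms, depth_preference):
    """Re-split a localization cue between amplitude and time difference.

    depth_preference = 0.0 -> amplitude difference only (precise localization)
    depth_preference = 1.0 -> time difference only (more spatial depth)
    Values in between return a combination with the same equivalent cue.
    """
    if not 0.0 <= depth_preference <= 1.0:
        raise ValueError("depth_preference must be in [0, 1]")
    total_db = amp_db + time_ms / TRADING_MS_PER_DB  # equivalent amplitude cue
    new_amp_db = (1.0 - depth_preference) * total_db
    new_time_ms = depth_preference * total_db * TRADING_MS_PER_DB
    return new_amp_db, new_time_ms
```

For example, amplitude-only parameters of 8 dB with a full depth preference become a 0.5 ms time difference, while a 3 dB plus 0.3 ms combination with a full localization preference becomes about 7.8 dB.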
Referring to Fig. 6, the control information may include mixing/rendering information and harmonic information regarding one or more object signals. The harmonic information may include at least one of pitch information, fundamental frequency information, and dominant frequency band information regarding one or more object signals, as well as a description of the energy and spectrum of each subband of each object signal.
Because a renderer that operates in units of subbands has insufficient frequency resolution, the harmonic information can be used to process object signals during the rendering operation.
If the harmonic information includes pitch information regarding one or more object signals, the gain of each object signal can be adjusted by attenuating or boosting predetermined frequency regions using a comb filter or an inverse comb filter. For example, if one of the object signals is a vocal signal, the object signals can be used for karaoke by attenuating only the vocal signal. Alternatively, if the harmonic information includes dominant frequency information regarding one or more object signals, the dominant frequency region can be attenuated or boosted. Still alternatively, if the harmonic information includes spectrum information regarding one or more object signals, the gain of each object signal can be controlled by attenuation or boosting that is not restricted by subband boundaries.
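Pitch-driven attenuation or boosting can be sketched with a simple feed-forward comb filter whose delay equals one pitch period. With H(z) = 1 - g z^(-D) the response has notches at multiples of fs/D, which lie on the harmonics of f0 when D = fs/f0; H(z) = 1 + g z^(-D) places peaks there instead. Which of these forms corresponds to the document's "comb" and "inverse comb" filters is an assumption, and rounding D to an integer number of samples is a simplification.

```python
import math

def comb_filter(x, fs, f0, g=1.0, boost=False):
    """Feed-forward comb filter tuned to pitch f0 (Hz) at sample rate fs.

    boost=False: y[n] = x[n] - g*x[n-D]  (notches at multiples of fs/D)
    boost=True:  y[n] = x[n] + g*x[n-D]  (peaks at the same frequencies)
    D is fs/f0 rounded to a whole number of samples.
    """
    d = max(1, round(fs / f0))
    sign = 1.0 if boost else -1.0
    return [x[n] + sign * g * (x[n - d] if n >= d else 0.0) for n in range(len(x))]
```

Applied to a vocal object with a known pitch, the attenuating form suppresses its harmonic series while leaving frequencies between the harmonics largely intact.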
Fig. 7 is a block diagram of an audio decoding apparatus 140 according to another embodiment of the present invention. Referring to Fig. 7, the audio decoding apparatus 140 uses a multichannel decoder 141 instead of an object decoder and a renderer, and decodes a number of object signals after they have been appropriately arranged in the multichannel space.
More specifically, the audio decoding apparatus 140 includes the multichannel decoder 141 and a parameter converter 145. The multichannel decoder 141 generates a multichannel signal, whose object signals have already been arranged in the multichannel space, based on a downmix signal and spatial parameter information, which is channel-based side information provided by the parameter converter 145. The parameter converter 145 analyzes side information and control information transmitted by an audio encoding apparatus (not shown) and generates the spatial parameter information based on the results of the analysis. More specifically, the parameter converter 145 generates the spatial parameter information by combining the side information and the control information, which includes playback setup information and mixing information. That is, the parameter converter 145 converts the combination of the side information and the control information into spatial data corresponding to one-to-two (OTT) boxes or two-to-three (TTT) boxes.
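One step of such a parameter conversion can be sketched for a single OTT box and a single band. The sketch assumes that the object-based side information yields a power per object, that the control information yields a rendering matrix of per-channel gains, and that objects are mutually uncorrelated. It then computes the channel level difference (CLD) and inter-channel correlation (ICC) that the OTT box would need for one channel pair. Mapping a full 5.1 layout onto a tree of OTT and TTT boxes is not shown, and the function name is hypothetical.

```python
import math

def ott_params(object_powers, render_matrix, ch_a, ch_b, eps=1e-12):
    """Hypothetical single-band conversion for one OTT box.

    object_powers[k]: power of object k (from object-based side information).
    render_matrix[ch][k]: gain of object k in output channel ch (control info).
    Assumes mutually uncorrelated objects. Returns (cld_db, icc).
    """
    def power(ch):
        return sum((render_matrix[ch][k] ** 2) * p for k, p in enumerate(object_powers))

    pa, pb = power(ch_a), power(ch_b)
    cross = sum(render_matrix[ch_a][k] * render_matrix[ch_b][k] * p
                for k, p in enumerate(object_powers))
    cld_db = 10 * math.log10((pa + eps) / (pb + eps))
    icc = cross / math.sqrt((pa + eps) * (pb + eps))
    return cld_db, icc
```

Because only object powers and gains are combined, this work scales with the number of parameters rather than with the number of audio samples, which is where the complexity saving discussed below comes from.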
Audio decoding apparatus 140 can be carried out multi-channel decoding operation, wherein object-based decode operation and audio mixing/play up operation to be merged, and can skip decoding to each object signal.Therefore, can reduce the complexity of decoding and/or audio mixing/play up.
For instance, when the multi-channel signal that uses 5.1 channel loudspeaker playback systems to reproduce 10 object signal and obtain according to these 10 object signal, typical object-based audio decoding apparatus becomes to correspond respectively to the decoded signal of these 10 object signal next life according to reduction audio signal and side information, and by these 10 object signal proper arrangements are generated 5.1 sound channel signals in the multichannel space, then these object signal become and are suitable for 5.1 channel loudspeaker environment.Yet during 5.1 sound channel signals generated, the efficient that generates 10 object signal was very low, and the difference of this problem between the number of channels of the quantity of object signal and the multi-channel signal that will generate becomes more serious when increasing.
On the other hand, according to embodiment shown in Figure 7, audio decoding apparatus 140 generates the spatial parameter information that is suitable for 5.1 sound channel signals according to side information and control information, and spatial parameter information and reduction audio signal are offered multi-channel decoder 141.Then, multi-channel decoder 141 generates 5.1 sound channel signals according to spatial parameter information and reduction audio signal.In other words, when the number of channels that will export is 5.1 sound channels, audio decoding apparatus 140 can be easy to generate 5.1 sound channel signals according to the reduction audio signal, and do not need to generate 10 object signal, then this audio decoding apparatus with respect to common audio decoding apparatus more efficient aspect the complexity.
When calculating the calculated amount required corresponding to the spatial parameter information of each OTT box and TTT box when carrying out audio mixing/play up operate required calculated amount after each object signal decoding by analyzing the side information that come by the audio coding apparatus transmission and control information, this audio decoding apparatus 140 is more effective.
The audio decoding apparatus 140 can be obtained by adding, to a typical multi-channel audio decoding apparatus, a module that generates spatial parameter information by analyzing side information and control information. It therefore remains compatible with typical multi-channel audio decoding apparatuses. The audio decoding apparatus 140 can also improve sound quality by using the existing tools of a typical multi-channel decoder, such as the envelope shaper, the subband temporal processing (STP) tool, and the decorrelator. It follows that the advantages of typical multi-channel audio decoding methods carry over readily to object-based audio decoding methods.
The spatial parameter information that the parametric converter 145 passes to the multi-channel decoder 141 may be compressed so that it is suitable for transmission. Alternatively, it may have the same format as the data transmitted by a typical multi-channel encoding apparatus. That is, the spatial parameter information may already have undergone a Huffman decoding or pilot decoding operation, so that it is delivered to each module as uncompressed spatial cue data. The former option is suitable for transmitting spatial parameter information to a remotely located multi-channel audio decoding apparatus. The latter is convenient because the multi-channel audio decoding apparatus does not need to convert compressed spatial cue data into the uncompressed spatial cue data that are readily used in a decoding operation.
Configuring spatial parameter information by analyzing side information and control information may introduce a delay between the downmix signal and the spatial parameter information. To prevent this, an additional buffer may be provided for either the downmix signal or the spatial parameter information so that the two can be synchronized. This approach is inconvenient, however, because it requires an extra buffer. Alternatively, the side information may be transmitted ahead of the downmix signal to account for the delay that may arise between the downmix signal and the spatial parameter information. In this case, the spatial parameter information obtained by combining the side information and the control information can be used directly, without further adjustment.
If the object signals of a downmix signal have different levels, an arbitrary downmix gain (ADG) module, which can directly compensate the downmix signal, can determine the relative levels of the object signals. Each object signal can then be allocated to a predetermined position in the multi-channel space using spatial cue data such as channel level difference (CLD) information, inter-channel correlation (ICC) information, and channel prediction coefficient (CPC) information.
For example, suppose the control information indicates that a predetermined object signal is to be allocated to a predetermined position in the multi-channel space and is to be louder than the other object signals. A typical multi-channel decoder can calculate the differences between the channel energies of the downmix signal and, based on the result, divide the downmix signal into a number of output channels. However, it cannot increase or decrease the volume of a particular sound within the downmix signal. In other words, a typical multi-channel decoder merely distributes the downmix signal among the output channels without changing the volume of any sound in it.
Allocating each of the object signals generated by an object encoder to a predetermined position in the multi-channel space according to control information is relatively simple. Increasing or decreasing the amplitude of a predetermined object signal, however, requires a special technique. In other words, when the downmix signal generated by an object encoder is used as is, it is difficult to change the amplitude of the individual object signals within it.
Therefore, according to an embodiment of the invention, the relative amplitudes of the object signals can be changed according to control information by using the ADG module 147 shown in Figure 8. Specifically, the ADG module 147 can increase or decrease the amplitude of any of the object signals in the downmix signal transmitted by the object encoder. The downmix signal compensated by the ADG module 147 can then undergo multi-channel decoding.
If the ADG module 147 appropriately adjusts the relative amplitudes of the object signals in the downmix signal, object decoding can be performed with a typical multi-channel decoder. The ADG module 147 can process a downmix signal generated by the object encoder whether it is a mono signal, a stereo signal, or a multi-channel signal with three or more channels. If the downmix signal has two or more channels and the predetermined object signal to be adjusted exists in only one of them, the ADG module 147 can be applied only to the channel containing that object signal rather than to all channels of the downmix signal. A downmix signal processed by the ADG module 147 in this way can be handled by a typical multi-channel decoder without modifying the decoder's structure.
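The patent does not give a formula for the ADG. The sketch below therefore assumes a simple energy-matching rule: one gain per band and per downmix channel, chosen so that the band's energy matches what it would be if each object were scaled by its requested gain. Because a single gain scales every object sharing the band, this only approximates a per-object change and is exact only when the target object dominates the band. The sketch also applies the gain only to the listed channels, which mirrors the channel-selective use described above. All names are hypothetical.

```python
import math

def adg_gain(object_powers, object_gains, eps=1e-12):
    """Hypothetical energy-matching ADG for one band of one downmix channel.

    object_powers -- power of each object in this band and channel
    object_gains  -- amplitude change requested for each object (control info)
    Assumes uncorrelated objects, so band energy is the sum of object powers.
    """
    target = sum(a * a * p for a, p in zip(object_gains, object_powers))
    current = sum(object_powers)
    return math.sqrt((target + eps) / (current + eps))

def apply_adg(downmix, gains_by_channel):
    """Scale only the channels listed in gains_by_channel; others pass unchanged.

    downmix          -- list of channels, each a list of samples for one band
    gains_by_channel -- {channel_index: gain}
    """
    return [[gains_by_channel.get(ch, 1.0) * x for x in samples]
            for ch, samples in enumerate(downmix)]


# Double the amplitude of object 0, which is present only in channel 0.
g = adg_gain([1.0, 1.0], [2.0, 1.0])          # sqrt(2.5)
compensated = apply_adg([[1.0, -1.0], [0.5, 0.5]], {0: g})
```

Because the output is still an ordinary downmix, it can be passed to an unmodified multi-channel decoder, which is the compatibility property the text emphasizes.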
Even when the final output signal is a binaural signal rather than a multi-channel signal for multi-channel loudspeakers, the ADG module 147 can be used to adjust the relative amplitudes of the object signals in the final output.
As an alternative to the ADG module 147, the control information may include gain information specifying a gain value to be applied to each object signal when the object signals are generated. This may require modifying the structure of a typical multi-channel decoder. Even so, the method is attractive for reducing decoding complexity: the gain values are applied to each object signal during the decoding operation, so there is no need to calculate an ADG and compensate each object signal.
Figure 9 is a block diagram of an audio decoding apparatus 150 according to a fourth embodiment of the invention. Referring to Figure 9, the audio decoding apparatus 150 is characterized by generating a binaural signal.
Specifically, the audio decoding apparatus 150 includes a multi-channel binaural decoder 151, a first parametric converter 157, and a second parametric converter 159.
The second parametric converter 159 analyzes the side information and control information provided by the audio encoding apparatus and configures spatial parameter information based on the analysis. The first parametric converter 157 configures binaural parameter information that the multi-channel binaural decoder 151 can use, by adding three-dimensional (3D) information, such as head-related transfer function (HRTF) parameters, to the spatial parameter information. The multi-channel binaural decoder 151 generates a virtual 3D signal by applying this binaural (virtual 3D) parameter information to the downmix signal.
The first parametric converter 157 and the second parametric converter 159 can be replaced by a single module, the parameter conversion module 155, which receives the side information, the control information, and the HRTF parameters and configures binaural parameter information from them.
Conventionally, generating a binaural signal for headphone playback of a downmix signal containing 10 object signals proceeds as follows. First, an object decoder generates 10 decoded signals, one for each object signal, from the downmix signal and the side information. Next, a renderer refers to the control information and allocates each of the 10 object signals to a predetermined position in the multi-channel space to suit a 5-channel loudspeaker environment. The renderer then generates a 5-channel signal that can be reproduced by 5-channel loudspeakers. Finally, the renderer applies HRTF parameters to the 5-channel signal to generate a 2-channel signal. In short, this conventional method reconstructs 10 object signals, converts them into a 5-channel signal, and derives a 2-channel signal from that, which is clearly inefficient.
The audio decoding apparatus 150, on the other hand, can readily generate a binaural signal for headphone playback from the object audio signals. It configures spatial parameter information by analyzing the side information and control information, and uses a typical multi-channel binaural decoder to generate the binaural signal. Moreover, even when the audio decoding apparatus 150 is equipped with an integrated parametric converter, which receives the side information, control information, and HRTF parameters and configures binaural parameter information from them, it can still use a typical multi-channel binaural decoder.
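The efficiency argument can be illustrated with a simplified, frequency-independent model. Rendering is represented as a 5x10 gain matrix `R` and the HRTF stage as a 2x5 gain matrix `H`. Real HRTFs are frequency-dependent filters, so read these as gains for a single band. Because `(H R) s = H (R s)`, the two parameter stages can be merged once into a 2x10 matrix, and the 5-channel intermediate signal is never needed. The apparatus goes further still, applying merged parameters to the downmix rather than to decoded objects. The sketch shows only the parameter-merging step, and every matrix value is made up for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical single-band gains: 10 objects -> 5 channels -> 2 ears.
R = [[1.0 if obj % 5 == ch else 0.0 for obj in range(10)] for ch in range(5)]
H = [[0.9, 0.6, 0.3, 0.7, 0.4],
     [0.3, 0.6, 0.9, 0.4, 0.7]]
objects = [[float(i + 1)] for i in range(10)]   # one sample per object

# Conventional path: render to 5 channels first, then apply the HRTF gains.
two_step = matmul(H, matmul(R, objects))
# Merged path: combine the parameter stages once, then apply them in one pass.
merged_params = matmul(H, R)                     # 2 x 10
merged = matmul(merged_params, objects)
```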
Figure 10 is a block diagram of an audio decoding apparatus 160 according to a fifth embodiment of the invention. Referring to Figure 10, the audio decoding apparatus 160 includes a downmix processor 161, a multi-channel decoder 163, and a parametric converter 165. The downmix processor 161 and the parametric converter 165 can be replaced by a single module 167.
The parametric converter 165 generates spatial parameter information for use by the multi-channel decoder 163 and parameter information for use by the downmix processor 161. The downmix processor 161 preprocesses the downmix signal and passes the preprocessed downmix signal to the multi-channel decoder 163. The multi-channel decoder 163 decodes the downmix signal received from the downmix processor 161 and outputs a stereo signal, a binaural stereo signal, or a multi-channel signal. Examples of the preprocessing performed by the downmix processor 161 include modifying or converting the downmix signal in the time domain or the frequency domain by filtering.
If the downmix signal input to the audio decoding apparatus 160 is a stereo signal, it may need to be preprocessed by the downmix processor 161 before being input to the multi-channel decoder 163. The reason is that the multi-channel decoder 163 cannot map a component of the downmix signal belonging to the left channel (one of the multiple channels) onto the right channel (another of the multiple channels). Therefore, to move an object signal classified into the left channel toward the right channel, the downmix signal input to the audio decoding apparatus 160 can be preprocessed by the downmix processor 161, and the preprocessed downmix signal can then be input to the multi-channel decoder 163.
The preprocessing of a stereo downmix signal can be performed based on preprocessing information obtained from the side information and the control information.
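One simple form such preprocessing could take is a per-band 2x2 matrix that moves part of the left-channel content into the right channel before multi-channel decoding. The sketch below assumes a power-preserving split controlled by a single fraction `f`, which would be derived from the preprocessing information. The function name and the particular matrix are illustrative assumptions, not the patent's specified operation.

```python
import math

def shift_left_to_right(left, right, fraction):
    """Hypothetical 2x2 preprocessing for one band of a stereo downmix.

    Moves `fraction` (0..1) of the left-channel power into the right channel:
        L' = sqrt(1 - f) * L
        R' = R + sqrt(f) * L
    The moved component keeps its energy; the result is exact only if the left
    component belongs entirely to the object being moved.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    keep, move = math.sqrt(1.0 - fraction), math.sqrt(fraction)
    new_left = [keep * l for l in left]
    new_right = [r + move * l for l, r in zip(left, right)]
    return new_left, new_right


# Pan half of the left-channel power of this band over to the right channel.
l, r = shift_left_to_right([1.0, 2.0], [0.0, 0.0], 0.5)
```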
Figure 11 is a block diagram of an audio decoding apparatus 170 according to a sixth embodiment of the invention. Referring to Figure 11, the audio decoding apparatus 170 includes a multi-channel decoder 171, a channel processor 173, and a parametric converter 175.
The parametric converter 175 generates spatial parameter information for use by the multi-channel decoder 171 and parameter information for use by the channel processor 173. The channel processor 173 post-processes the signal output by the multi-channel decoder 171, which may be a stereo signal, a binaural stereo signal, or a multi-channel signal.
Examples of the post-processing performed by the channel processor 173 include modifying or converting individual channels, or all channels, of the output signal. For example, if the side information includes fundamental frequency information about a predetermined object signal, the channel processor 173 can use it to remove harmonic components from that object signal. Multi-channel audio decoding methods may not be efficient enough for a karaoke system on their own. However, if the side information includes fundamental frequency information about vocal object signals and the harmonic components of those signals are removed during post-processing, the embodiment of Figure 11 can realize a high-performance karaoke system. The embodiment of Figure 11 can also be applied to object signals other than vocals: for example, it can remove the sound of a predetermined musical instrument, or use fundamental frequency information about an object signal to amplify predetermined harmonic components.
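As a rough sketch of harmonic removal guided by a transmitted fundamental frequency, the code below transforms a block with a naive DFT. It zeroes the bins nearest to `f0, 2*f0, ...` together with their mirror-image bins, then transforms back. This crude bin-notch approach is a stand-in chosen for illustration. The patent does not specify how the channel processor removes harmonics, and a practical system would use windowed, overlapping frames and smoother notches. The function names are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def remove_harmonics(x, fs, f0, num_harmonics):
    """Zero the DFT bins nearest to f0, 2*f0, ... and their conjugate bins."""
    n = len(x)
    spectrum = dft(x)
    bin_hz = fs / n
    for h in range(1, num_harmonics + 1):
        freq = h * f0
        if freq >= fs / 2:          # harmonics at or above Nyquist are ignored
            break
        k = round(freq / bin_hz)
        if k == 0:                  # never remove the DC bin
            continue
        spectrum[k] = 0
        spectrum[n - k] = 0         # mirror bin, keeps the output real
    return idft(spectrum)
```

With `f0` taken from the side information, the same loop could scale the bins instead of zeroing them to amplify selected harmonics, which is the other use mentioned above.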
The channel processor 173 can apply additional effect processing to the downmix signal. Alternatively, it can add a signal produced by such effect processing to the signal output by the multi-channel decoder 171. The channel processor 173 can also change the spectrum of an object or modify the downmix signal whenever necessary. If it is not appropriate to apply an effect such as reverberation directly to the downmix signal and pass the result to the multi-channel decoder 171, the channel processor 173 can instead add the effect-processed signal to the output of the multi-channel decoder 171.
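The "add the effect to the decoder output" option can be sketched with a single feedback comb filter, a basic reverberation building block. The filter processes an effect-input signal, and its output is mixed into every decoded channel rather than being fed back through the multi-channel decoder. The comb filter is one possible effect, chosen here for illustration, and the parameter names (`delay`, `feedback`, `wet`) are hypothetical.

```python
def feedback_comb(x, delay, feedback):
    """y[n] = x[n] + feedback * y[n - delay], a basic reverberation building block."""
    y = list(x)
    for n in range(delay, len(x)):
        y[n] = x[n] + feedback * y[n - delay]
    return y

def add_effect_to_output(decoded_channels, effect_input, delay, feedback, wet):
    """Mix a reverberated copy of effect_input into every decoded output channel,
    instead of feeding a reverberated downmix into the multi-channel decoder."""
    wet_signal = feedback_comb(effect_input, delay, feedback)
    return [[s + wet * w for s, w in zip(ch, wet_signal)] for ch in decoded_channels]
```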
The audio decoding apparatus 170 can be designed to include a downmix processor in addition to the channel processor 173. In this case, the downmix processor is placed before the multi-channel decoder 171, and the channel processor 173 is placed after it.
Figure 12 is a block diagram of an audio decoding apparatus 210 according to a seventh embodiment of the invention. Referring to Figure 12, the audio decoding apparatus 210 uses a multi-channel decoder 213 instead of an object decoder.
Specifically, the audio decoding apparatus 210 includes the multi-channel decoder 213, a transcoder 215, a renderer 217, and a 3D information database 219.
The renderer 217 determines the 3D positions of the object signals based on the 3D information that corresponds to index data included in the control information. The transcoder 215 generates channel-based side information by synthesizing position information about the object audio signals to which the renderer 217 has applied 3D information. The multi-channel decoder 213 outputs a 3D signal by applying the channel-based side information to the downmix signal.
A head-related transfer function (HRTF) can be used as the 3D information. An HRTF is a transfer function that describes how sound waves travel from a sound source at an arbitrary position to the ear; its value varies with the direction and elevation of the source. If a signal with no directivity is filtered with an HRTF, it is perceived as coming from a particular direction.
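In the time domain, filtering with an HRTF amounts to convolving the signal with a left/right pair of head-related impulse responses (HRIRs). The sketch below uses a direct-form convolution and a toy HRIR pair for a source on the listener's left: the right ear receives the sound later and quieter. The HRIR values are made up for illustration. Real HRIRs are measured and would be looked up, for example, from something like the 3D information database described here.

```python
def convolve(x, h):
    """Direct-form linear convolution (len(x) + len(h) - 1 output samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binauralize(mono, hrir_left, hrir_right):
    """Filter a non-directional mono signal with a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs: direct path to the left ear, delayed and attenuated path to the right.
left, right = binauralize([1.0, 0.0], [1.0], [0.0, 0.0, 0.5])
```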
When an input bitstream is received, the audio decoding apparatus 210 uses a demultiplexer (not shown) to extract an object-based downmix signal and object-based parameter information from it. The renderer 217 then extracts, from the control information, the index data used to determine the positions of the object audio signals, and retrieves the 3D information corresponding to the extracted index data from the 3D information database 219.
Specifically, the mixing parameter information included in the control information used by the audio decoding apparatus 210 may contain not only level information but also the index data needed to search for 3D information. The mixing parameter information may also contain time information about the inter-channel time difference, position information, and one or more parameters obtained by appropriately combining the level information and the time information.
The position of an object audio signal can first be determined from default mixing parameter information and later changed by applying 3D information that corresponds to the position the user wants. Alternatively, if the user wants to apply a 3D effect to only some object audio signals, the level information and time information of the remaining object audio signals, to which no 3D effect is to be applied, can be used as the mixing parameter information.
The transcoder 215 generates channel-based side information for M channels by synthesizing two inputs: the object-based parameter information for the N object signals transmitted by the audio encoding apparatus, and the position information of the object signals to which the renderer 217 has applied 3D information such as HRTFs.
The multi-channel decoder 213 generates an audio signal from the downmix signal and the channel-based side information provided by the transcoder 215. It generates a 3D multi-channel signal by performing a 3D rendering operation with the 3D information contained in the channel-based side information.
Figure 13 is a block diagram of an audio decoding apparatus 220 according to an eighth embodiment of the invention. Referring to Figure 13, the audio decoding apparatus 220 differs from the audio decoding apparatus 210 of Figure 12 in that its transcoder 225 sends the channel-based side information and the 3D information to the multi-channel decoder 223 separately. In other words, the transcoder 225 of the audio decoding apparatus 220 derives channel-based side information for M channels from the object-based parameter information for N object signals. It then sends that channel-based side information, together with the 3D information applied to each of the N object signals, to the multi-channel decoder 223. The transcoder 215 of the audio decoding apparatus 210, by contrast, sends channel-based side information that already contains the 3D information to the multi-channel decoder 213.
Referring to Figure 14, the channel-based side information and the 3D information may each carry frame indexes. The multi-channel decoder 223 can therefore synchronize the two by referring to their frame indexes, and apply each piece of 3D information to the bitstream frame it corresponds to. For example, 3D information with index 2 can be applied from the beginning of frame 2.
Because both the channel-based side information and the 3D information carry frame indexes, the decoder can reliably determine the temporal position in the channel-based side information at which the 3D information is to be applied, even if the 3D information is updated over time. In other words, the transcoder 225 inserts frame indexes into both the 3D information and the channel-based side information, so that the multi-channel decoder 223 can synchronize them easily.
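The frame-index synchronization can be sketched as follows. Each frame of channel-based side information is paired with the most recent 3D-information update whose frame index is less than or equal to that frame's index, so an update tagged with index 2 takes effect from the start of frame 2, as in the example above. The data layout (tuples and a dict keyed by frame index) is an assumption made for illustration.

```python
def attach_3d_info(side_info_frames, info_3d_updates):
    """Pair each frame of channel-based side information with the 3D information
    in force for that frame.

    side_info_frames -- list of (frame_index, side_info)
    info_3d_updates  -- {frame_index: info_3d}; an update applies from the start
                        of its frame until the next update
    """
    update_frames = sorted(info_3d_updates)
    current, u = None, 0
    paired = []
    for frame_index, side_info in sorted(side_info_frames, key=lambda f: f[0]):
        # Advance to the latest update whose index is <= this frame's index.
        while u < len(update_frames) and update_frames[u] <= frame_index:
            current = info_3d_updates[update_frames[u]]
            u += 1
        paired.append((frame_index, side_info, current))
    return paired
```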
Figure 15 is a block diagram of an audio decoding apparatus 230 according to a ninth embodiment of the invention. Referring to Figure 15, the audio decoding apparatus 230 differs from the audio decoding apparatus 220 of Figure 13 in that it further includes a downmix processor 231.
Specifically, the audio decoding apparatus 230 includes a transcoder 235, a renderer 237, a 3D information database 238, a multi-channel decoder 233, and the downmix processor 231. The transcoder 235, the renderer 237, the 3D information database 238, and the multi-channel decoder 233 are identical to their counterparts in Figure 13. The downmix processor 231 preprocesses a stereo downmix signal to adjust positions. The 3D information database 238 may be merged into the renderer 237. A module for applying a desired effect to the downmix signal may also be provided in the audio decoding apparatus 230. The downmix processor 231, the transcoder 235, the renderer 237, and the 3D information database 238 can be replaced by a single module 239.
Figure 16 is a block diagram of an audio decoding apparatus 240 according to a tenth embodiment of the invention. Referring to Figure 16, the audio decoding apparatus 240 differs from the audio decoding apparatus 230 of Figure 15 in that it includes a multipoint control unit (MCU) combiner 241.
Like the audio decoding apparatus 230, the audio decoding apparatus 240 includes a downmix processor 243, a multi-channel decoder 244, a transcoder 245, a renderer 247, and a 3D information database 249. The MCU combiner 241 combines several bitstreams obtained by object-based encoding into a single bitstream. For example, when a first bitstream for a first audio signal and a second bitstream for a second audio signal are input, the MCU combiner 241 extracts a first downmix signal from the first bitstream and a second downmix signal from the second bitstream, and combines them into a third downmix signal. The MCU combiner 241 likewise extracts first object-based side information from the first bitstream and second object-based side information from the second bitstream, and combines them into third object-based side information. Finally, the MCU combiner 241 generates and outputs a bitstream that combines the third downmix signal and the third object-based side information.
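A minimal sketch of the combination step follows, under stated assumptions. The downmixes are already decoded to PCM, and the side information is reduced to one power value per object. The third downmix is formed as the sample-wise sum, and the third side information as the concatenation of the two object lists, since every object from either stream is now present in the combined downmix. The `ObjectBitstream` type and the function are hypothetical. A real combiner would also have to handle gain scaling, clipping, and any re-normalization of relative level parameters, all of which are omitted here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ObjectBitstream:
    downmix: List[float]          # decoded (PCM) downmix samples
    object_powers: List[float]    # per-object power, one parameter band for brevity

def combine_bitstreams(first, second):
    """Hypothetical MCU-style combination of two object-based streams."""
    n = max(len(first.downmix), len(second.downmix))
    a = first.downmix + [0.0] * (n - len(first.downmix))    # zero-pad the shorter one
    b = second.downmix + [0.0] * (n - len(second.downmix))
    third_downmix = [x + y for x, y in zip(a, b)]
    third_side_info = first.object_powers + second.object_powers
    return ObjectBitstream(third_downmix, third_side_info)
```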
Therefore, according to the tenth embodiment of the invention, signals sent by two or more communication parties can be processed efficiently, compared with encoding or decoding each object signal individually.
To merge several downmix signals, each extracted from a different bitstream and associated with a different compression codec, into a single downmix signal, the MCU combiner 241 may need to proceed in three steps. First, it converts each downmix signal into a pulse code modulation (PCM) signal or into a signal in a predetermined frequency domain, depending on the codec type. Second, it combines the PCM signals or the converted signals. Third, it converts the combined signal with a predetermined compression codec. A delay may arise in this process, depending on whether the downmix signals are merged as PCM signals or as signals in the predetermined frequency domain. The decoder may not be able to estimate this delay correctly, so the delay may need to be included in the bitstream and transmitted with it. The delay may be expressed as a number of delay samples in the PCM signal or as a number of delay samples in the predetermined frequency domain.
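On the decoder side, a signalled delay expressed as a number of PCM samples could be compensated by dropping that many leading samples and zero-padding the end so the block length is unchanged. This is only one possible handling, and the patent does not prescribe it. How the delay field is carried in the bitstream is not specified here, and the function below is hypothetical.

```python
def compensate_delay(samples, delay):
    """Drop `delay` leading samples (as signalled in the bitstream) and pad the
    end with zeros so the output keeps its original length."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    delay = min(delay, len(samples))
    return samples[delay:] + [0.0] * delay
```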
An object-based audio encoding/decoding operation sometimes has to handle far more input signals than a typical multi-channel encoding/decoding operation, such as 5.1-channel or 7.1-channel coding. Object-based audio encoding/decoding methods therefore require higher bitrates than typical channel-based methods. However, because they process object signals, which are finer units than channel signals, object-based methods make it possible to generate dynamic output signals.
An audio encoding method according to an embodiment of the invention is described in detail below with reference to Figures 17 to 20.
In an object-based audio encoding method, an object signal can be defined to represent an individual sound, such as a human voice or the sound of a musical instrument. Alternatively, sounds with similar characteristics can be grouped together and defined as a single object signal. Examples include the sounds of stringed instruments (such as violin, viola, and cello), sounds in the same frequency band, or sounds classified into the same category according to the direction and angle of their sources. Object signals can also be defined using a combination of these approaches.
A number of object signals can be transmitted as a downmix signal and side information. When the information to be transmitted is created, the energy or power of the downmix signal, or of each object signal in it, is first calculated in order to detect the envelope of the downmix signal. The results can be used to transmit the object signals or the downmix signal, or to calculate the level ratios between the object signals.
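The energy calculation and level ratios can be sketched per frame as follows. The sketch computes the energy of each non-overlapping frame of each object, then expresses every object's energy in dB relative to the loudest object in that frame. Normalizing to the loudest object is an assumption made for illustration; the patent only speaks of level ratios. A real encoder would also work per frequency band.

```python
import math

def frame_energies(signal, frame_len):
    """Energy of each non-overlapping frame (a trailing partial frame is kept)."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal), frame_len)]

def level_ratios_db(object_signals, frame_len, eps=1e-12):
    """Per-frame level of each object relative to the loudest object, in dB."""
    energies = [frame_energies(s, frame_len) for s in object_signals]
    ratios = []
    for frame in zip(*energies):          # one tuple of object energies per frame
        ref = max(frame)
        ratios.append([10.0 * math.log10((e + eps) / (ref + eps)) for e in frame])
    return ratios
```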
A linear predictive coding (LPC) algorithm can be used to lower the bitrate. Specifically, signal analysis produces a set of LPC coefficients that represent the envelope of the signal, and these coefficients are transmitted instead of the envelope information itself. This is efficient in terms of bitrate. However, because the LPC coefficients are likely to deviate from the actual envelope of the signal, the method requires additional processing such as error correction. In short, transmitting the envelope information of a signal guarantees high sound quality at the cost of a considerably larger amount of transmitted information. Using LPC coefficients reduces the amount of information to be transmitted, but it requires additional processing such as error correction and degrades sound quality.
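The patent does not say how the LPC coefficients are derived. A common choice, shown here as an illustration, is the autocorrelation method combined with the Levinson-Durbin recursion. The recursion returns the coefficients of the inverse filter `A(z) = 1 + a1 z^-1 + ... + ap z^-p` and the residual prediction-error power. The spectral envelope is then proportional to `err / |A(e^jw)|^2`.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """LPC coefficients a[0..order] (a[0] = 1) of A(z) from autocorrelation r.

    Returns (a, err), where err is the final prediction-error power.
    """
    if r[0] <= 0:
        raise ValueError("zero-energy frame")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# An AR(1)-like autocorrelation r = [1, 0.5, 0.25] yields A(z) = 1 - 0.5 z^-1.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
```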
According to an embodiment of the invention, these methods can be combined. In other words, the envelope of a signal can be represented by its energy or power, by an index value, or by another value corresponding to its energy or power, such as LPC coefficients.
Envelope information about a signal can be obtained per time segment or per frequency band. Specifically, referring to Figure 17, it can be obtained per frame. Alternatively, if the signal is represented in a band structure by a filter bank such as a quadrature mirror filter (QMF) bank, the envelope information can be obtained per frequency subband, per group of subbands, per subband partition (a unit finer than a subband), or per group of subband partitions. A combination of the frame-based, subband-based, and subband-partition-based methods also falls within the scope of the invention.
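Obtaining envelope information per group of subbands can be sketched as summing per-subband energies between group boundaries. The boundaries used here are arbitrary examples. An actual grouping of QMF subbands would follow whatever band partition the codec defines.

```python
def group_subband_energies(subband_energies, group_edges):
    """Sum per-subband energies of one frame into groups.

    subband_energies -- energy of each filter-bank subband in the frame
    group_edges      -- ascending boundaries; [0, 2, 5, 8] gives the groups
                        [0, 2), [2, 5) and [5, 8)
    """
    return [sum(subband_energies[lo:hi]) for lo, hi in zip(group_edges, group_edges[1:])]
```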
Alternatively, on the assumption that the low-frequency components of a signal carry more information than its high-frequency components, the envelope information of the low-frequency components can be transmitted as is. The envelope of the high-frequency components can instead be represented by LPC coefficients or other values, which are transmitted in its place. However, low-frequency components do not necessarily carry more information than high-frequency components, so this method must be applied flexibly according to the actual situation.
According to an embodiment of the invention, envelope information or index data can be transmitted for the portion of a signal that appears dominant on the time/frequency axis (hereinafter, the dominant portion). Alternatively, values representing the energy or power of the dominant portion (for example, LPC coefficients) can be transmitted, while no such values are transmitted for the non-dominant portion. Alternatively, envelope information or index data can be transmitted for the dominant portion, together with values representing the energy or power of the non-dominant portion. Alternatively, only information about the dominant portion can be transmitted, and the non-dominant portion can be estimated from it. A combination of these methods can also be used.
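To decide which parts of a frame count as dominant, an encoder might check, for each time/frequency tile, whether one object's share of the total tile energy exceeds a threshold. The 0.5 threshold and the tile layout are assumptions for illustration; the patent does not define a dominance criterion.

```python
def dominant_tiles(object_energies, threshold=0.5):
    """Mark time/frequency tiles in which a single object dominates.

    object_energies -- object_energies[obj][tile] for one frame
    Returns, per tile, the index of the dominant object, or None when no
    object's share of the tile energy exceeds `threshold`.
    """
    result = []
    for tile in zip(*object_energies):
        total = sum(tile)
        best = max(range(len(tile)), key=lambda o: tile[o])
        result.append(best if total > 0 and tile[best] / total > threshold else None)
    return result
```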
For example, referring to Figure 18, if a signal is divided into dominant periods and non-dominant periods, information about the signal can be transmitted in four different ways, labeled (a) through (d).
When a number of object signals are transmitted as a combination of a downmix signal and side information, the decoding operation must divide the downmix signal into a number of elements, for example according to the level ratios of the object signals. A decorrelation operation must then also be performed to guarantee the independence of these elements.
Object signals, which are the coding units of object-based coding methods, are more independent of one another than channel signals, which are the coding units of multi-channel coding methods. A channel signal contains a number of object signals and therefore needs to be decorrelated. Object signals, by contrast, are independent of one another, so channel separation can be performed easily by exploiting the characteristics of the object signals, without a decorrelation operation.
Specifically, referring to Figure 19, object signals A, B, and C take turns being dominant along the frequency axis. In this case, there is no need to divide the downmix signal into several signals according to the level ratios of object signals A, B, and C, nor to perform decorrelation. Instead, either information about the dominant periods of A, B, and C is transmitted, or a gain value is applied to each frequency component of each object signal, so that decorrelation can be skipped. This reduces the amount of computation as well as the bitrate that the side information required for decorrelation would otherwise consume.
In brief, decorrelation ensures independence among the signals obtained by dividing a downmix signal according to the ratio of the object signals it contains. To skip decorrelation, information about the frequency region containing each object signal may be transmitted as side information. Alternatively, different gain values may be applied to dominant periods and non-dominant periods, so that each object signal is emphasized in its dominant periods and attenuated in its non-dominant periods. In this case, the side information may mainly carry information about the dominant periods. Still alternatively, only information about the dominant periods may be transmitted as side information, with no information about the non-dominant periods. Still alternatively, a combination of these alternatives to decorrelation may be used.
The above alternatives to decorrelation may be applied to all object signals or only to object signals that have clearly identifiable dominant periods. Likewise, they may be applied on a frame-by-frame basis.
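The sketch below shows one way to decide, frame by frame, whether the dominance-based alternative applies. The criterion is an assumption made for illustration: the strongest object must exceed the runner-up by a factor `ratio` in every band. The patent itself only says that the alternatives may be limited to signals with clearly identifiable dominant periods and may be applied per frame.

```python
def has_clear_dominant(frame, ratio=4.0):
    """Return True if, in every band, the strongest object's energy is at
    least `ratio` times that of the second-strongest object.

    frame[obj][band] holds band energies (hypothetical layout).
    """
    num_bands = len(frame[0])
    for band in range(num_bands):
        energies = sorted((obj[band] for obj in frame), reverse=True)
        if len(energies) > 1 and energies[0] < ratio * energies[1]:
            return False
    return True


def choose_mode_per_frame(frames, ratio=4.0):
    """Select 'dominance' (skip decorrelation) or 'decorrelation' per frame."""
    return ["dominance" if has_clear_dominant(f, ratio) else "decorrelation"
            for f in frames]


frames = [
    [[8.0, 0.5], [1.0, 6.0]],  # each band has one clearly dominant object
    [[2.0, 1.0], [1.5, 1.0]],  # comparable levels: fall back to decorrelation
]
print(choose_mode_per_frame(frames))  # ['dominance', 'decorrelation']
```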
The encoding of object audio signals using a residual signal will now be described in detail.
In general, an object-based audio encoding/decoding method encodes a plurality of object signals and transmits the result as a combination of a downmix signal and side information. The decoder then restores the object signals from the downmix signal according to the side information. The restored object signals are mixed appropriately, for example according to control information at the request of a user, to generate the final channel signals. Object-based audio encoding/decoding methods generally aim to let a mixer change the output channel signals freely according to control information. However, they may also be used to generate channel outputs in a predefined manner, regardless of control information.
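The mixing step described above can be pictured as a rendering matrix applied to the restored object signals. For simplicity, the sketch below works directly on time-domain samples; practical codecs typically mix per subband. The matrix layout `rendering_matrix[ch][obj]` is a hypothetical convention.

```python
def render(objects, rendering_matrix):
    """Mix restored object signals into output channels.

    objects[obj][n]: sample n of restored object obj.
    rendering_matrix[ch][obj]: gain of object obj in output channel ch,
    derived from control information (hypothetical layout).
    """
    num_samples = len(objects[0])
    return [[sum(gains[obj] * objects[obj][n] for obj in range(len(objects)))
             for n in range(num_samples)]
            for gains in rendering_matrix]


# Two objects rendered to two channels: object 0 only in channel 0,
# object 1 split equally between both channels.
out = render([[1.0, 2.0], [4.0, 8.0]], [[1.0, 0.5], [0.0, 0.5]])
print(out)  # [[3.0, 6.0], [2.0, 4.0]]
```

When the output is predefined, as in the following paragraph, the matrix would be fixed by mixing parameters carried in the side information instead of being set by user control.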
To this end, the side information may include two kinds of information: what is needed to obtain the object signals from the downmix signal, and the mixing parameter information needed to generate the channel signals. The final channel output signals can then be generated without the aid of a mixer. In this case, a residual coding/decoding algorithm may be used to improve sound quality.
A typical residual coding/decoding method encodes/decodes a signal and also encodes/decodes the residual signal, that is, the error between the original signal and the encoded/decoded signal. During decoding, the encoded signal is decoded and its error relative to the original signal is compensated for. The result is a signal as close to the original as possible. In most cases, the error between the decoded signal and the original signal is small. The additional information required for residual coding/decoding can therefore be kept small.
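The sketch below illustrates the residual principle. A coarse uniform quantizer stands in for the core codec, and a finer quantizer codes the residual. The step sizes and function names are arbitrary assumptions chosen for the example.

```python
def quantize(values, step):
    """Uniform scalar quantizer returning integer indices."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    return [i * step for i in indices]


def residual_encode(signal, core_step=0.5, residual_step=0.0625):
    """Code the signal coarsely, then code the remaining error (the residual)."""
    core = quantize(signal, core_step)
    decoded = dequantize(core, core_step)
    residual = [s - d for s, d in zip(signal, decoded)]
    return core, quantize(residual, residual_step)


def residual_decode(core, residual, core_step=0.5, residual_step=0.0625):
    """Decode the core signal and compensate for its error with the residual."""
    return [c + r for c, r in zip(dequantize(core, core_step),
                                  dequantize(residual, residual_step))]


signal = [0.3, -1.2, 2.05]
core, res = residual_encode(signal)
core_only = dequantize(core, 0.5)
with_residual = residual_decode(core, res)
print(max(abs(s - d) for s, d in zip(signal, core_only)))      # core-only error
print(max(abs(s - d) for s, d in zip(signal, with_residual)))  # smaller error
```

The residual values are small, so the residual quantizer produces small indices. This reflects the text's point that the extra information needed for residual coding stays modest.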
If the final channel outputs of the decoder are fixed in advance, the side information may carry residual coding information in addition to the mixing parameter information needed to generate the final channel signals. In this case, sound quality can be improved.
Figure 20 is a block diagram of an audio encoding apparatus 310 according to an embodiment of the present invention. Referring to Figure 20, the audio encoding apparatus 310 is characterized by its use of a residual signal.
Specifically, the audio encoding apparatus 310 includes an encoder 311, a decoder 313, a first mixer 315, a second mixer 319, an adder 317, and a bitstream generator 321.
The first mixer 315 performs a mixing operation on the original signal. The second mixer 319 performs a mixing operation on the signal obtained by encoding and then decoding the original signal. The adder 317 calculates the residual signal between the signal output by the first mixer 315 and the signal output by the second mixer 319. The bitstream generator 321 adds the residual signal to the side information and transmits the result. In this manner, sound quality can be improved.
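The signal flow of Figure 20 can be sketched as follows, using hypothetical helpers. A coarse quantizer stands in for encoder 311 followed by decoder 313, and one gain matrix stands in for both mixers 315 and 319. Only the structure follows the description: the residual is the mix of the original minus the mix of the coded/decoded signal. The numbers are illustrative.

```python
def mix(signals, matrix):
    """Mixer: out[ch][n] = sum_k matrix[ch][k] * signals[k][n]."""
    num_samples = len(signals[0])
    return [[sum(row[k] * signals[k][n] for k in range(len(signals)))
             for n in range(num_samples)]
            for row in matrix]


def codec_roundtrip(signals, step=0.5):
    """Stand-in for encoder 311 followed by decoder 313 (coarse quantization)."""
    return [[round(x / step) * step for x in sig] for sig in signals]


def residual_side_info(objects, mix_matrix, step=0.5):
    """Residual between mixed original and mixed coded/decoded objects.

    The bitstream generator (321) would append the returned residual to
    the side information.
    """
    reference = mix(objects, mix_matrix)                      # first mixer 315
    approx = mix(codec_roundtrip(objects, step), mix_matrix)  # second mixer 319
    return [[r - a for r, a in zip(ref_ch, app_ch)]           # adder 317
            for ref_ch, app_ch in zip(reference, approx)]


objects = [[0.3, 1.1], [0.6, -0.2]]
matrix = [[1.0, 1.0], [1.0, -1.0]]
residual = residual_side_info(objects, matrix)
# Decoder with fixed outputs: mixed decoded signal + residual ~= mixed original.
restored = [[a + r for a, r in zip(a_ch, r_ch)]
            for a_ch, r_ch in zip(mix(codec_roundtrip(objects), matrix), residual)]
print(restored)
```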
The calculating of residual signals can be applied to all parts of signal, or only is applied to the low frequency part of signal.Optionally, the calculating of residual signals can be comprised based on frame in the main signal frequency-domain of frame by variable only being applied to.Still optionally, can use the combination of said method.
Side information that includes residual signal information is larger than side information that does not. The residual calculation may therefore be applied only to those portions of a signal that directly affect sound quality, which prevents an excessive increase in bitrate.
The present invention can be realized as computer-readable code written on a computer-readable recording medium. The computer-readable recording medium may be any type of recording device in which data is stored in a computer-readable manner. Examples include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and a carrier wave (e.g., data transmission through the Internet). The computer-readable recording medium can also be distributed over a plurality of computer systems connected to a network, so that computer-readable code is written to it and executed from it in a decentralized manner. Functional programs, code, and code segments for realizing the present invention can easily be construed by one of ordinary skill in the art.
Industrial Applicability
As described above, the present invention takes advantage of object-based audio encoding and decoding methods so that the sound image of each object audio signal can be localized. This makes more realistic sound possible when object audio signals are reproduced. In addition, the present invention can be applied to interactive entertainment and can thus give users a more realistic virtual reality experience.
While the present invention has been particularly shown and described with reference to preferred embodiments thereof, those of ordinary skill in the art will understand that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

Claims (17)

1. An audio decoding method, comprising:
generating a third downmix signal by combining at least a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal;
generating third object-based side information by combining at least first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal;
receiving control information;
generating parameter information based on the third object-based side information and the control information;
converting the third object-based side information into channel-based side information by applying the control information to the third object-based side information;
processing the third downmix signal into a processed downmix signal by applying the parameter information to the third downmix signal; and
generating a multi-channel audio signal using the channel-based side information and the processed downmix signal,
wherein each of the first, second, and third object-based side information comprises object level difference information, inter-object cross-correlation information, downmix gain information, downmix channel level difference information, and absolute object energy information, and
wherein the object-based side information comprises envelope information or index data corresponding to a portion of a dominant object signal, and a value representing linear predictive coding coefficients corresponding to a portion of a non-dominant object signal, the dominant object signal and the non-dominant object signal being included in the object signals of the third downmix signal.
2. The audio decoding method of claim 1, further comprising generating a multi-channel audio signal to which a virtual three-dimensional (3D) effect is applied, by applying 3D information to the multi-channel audio signal.
3. The audio decoding method of claim 2, wherein the channel-based side information comprises the 3D information.
4. The audio decoding method of claim 2, wherein the 3D information comprises information for synchronization with the channel-based side information.
5. The audio decoding method of claim 2, wherein the 3D information is selected from a 3D information database based on the control information, the 3D information database storing a plurality of pieces of 3D information.
6. The audio decoding method of claim 2, wherein the 3D information comprises a head-related transfer function (HRTF).
7. The audio decoding method of claim 1, further comprising modifying channel signals of the third downmix signal if the third downmix signal is a stereo downmix signal.
8. The audio decoding method of claim 1, further comprising applying a desired effect to the multi-channel audio signal.
9. An audio decoding apparatus, comprising:
a multipoint control unit combiner which generates a third downmix signal by combining at least a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal, and generates third object-based side information by combining at least first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal;
a parameter converter which generates parameter information based on the third object-based side information and control information, and converts the third object-based side information into channel-based side information by applying the control information to the third object-based side information;
a downmix processor which processes the third downmix signal into a processed downmix signal by applying the parameter information to the third downmix signal; and
a multi-channel decoder which generates a multi-channel audio signal using the channel-based side information and the processed downmix signal,
wherein each of the first, second, and third object-based side information comprises object level difference information, inter-object cross-correlation information, downmix gain information, downmix channel level difference information, and absolute object energy information, and
wherein the object-based side information comprises envelope information or index data corresponding to a portion of a dominant object signal, and a value representing linear predictive coding coefficients corresponding to a portion of a non-dominant object signal, the dominant object signal and the non-dominant object signal being included in the object signals of the third downmix signal.
10. The audio decoding apparatus of claim 9, wherein the multi-channel decoder generates a multi-channel audio signal to which a virtual 3D effect is applied, by applying 3D information to the multi-channel audio signal.
11. The audio decoding apparatus of claim 10, wherein the channel-based side information comprises the 3D information.
12. The audio decoding apparatus of claim 10, wherein the 3D information comprises information for synchronization with the channel-based side information.
13. The audio decoding apparatus of claim 12, wherein the 3D information is selected from a 3D information database based on the control information.
14. The audio decoding apparatus of claim 13, wherein the 3D information database stores a plurality of pieces of 3D information.
15. The audio decoding apparatus of claim 10, wherein the 3D information comprises an HRTF.
16. The audio decoding apparatus of claim 9, wherein, if the third downmix signal is a stereo downmix signal, channel signals of the third downmix signal are modified.
17. The audio decoding apparatus of claim 9, further comprising a channel processor which applies a desired effect to the multi-channel audio signal.
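Claims 1 and 9 combine two object-based streams into a third. The sketch below shows one possible way to merge the downmix signals and two of the listed parameters. It rests on explicit assumptions. The downmixes are taken to be time-aligned mono signals of equal length, and both streams are assumed to use the same band layout. The object level differences (OLDs) are assumed to be normalized to the loudest object in each band, with the absolute object energy carrying that maximum, similar to the MPEG SAOC parameterization. The sketch does not handle inter-object cross-correlation, downmix gain, or downmix channel level difference information.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectSideInfo:
    old: List[List[float]]   # old[obj][band]: energy relative to loudest object in band
    abs_energy: List[float]  # abs_energy[band]: energy of the loudest object in band


def combine_downmix(dmx1, dmx2):
    """Sample-wise sum of two time-aligned mono downmix signals."""
    return [x1 + x2 for x1, x2 in zip(dmx1, dmx2)]


def combine_side_info(s1, s2):
    """Merge two object-based side-info sets into a third set."""
    num_bands = len(s1.abs_energy)
    # Convert relative levels back to absolute energies for every object.
    energies = [[level * s.abs_energy[b] for b, level in enumerate(obj)]
                for s in (s1, s2) for obj in s.old]
    # The new reference is the loudest object per band across both streams.
    new_abs = [max(e[b] for e in energies) for b in range(num_bands)]
    new_old = [[e[b] / new_abs[b] if new_abs[b] > 0 else 0.0
                for b in range(num_bands)]
               for e in energies]
    return ObjectSideInfo(old=new_old, abs_energy=new_abs)


s1 = ObjectSideInfo(old=[[1.0, 0.5], [0.25, 1.0]], abs_energy=[8.0, 2.0])
s2 = ObjectSideInfo(old=[[1.0, 1.0]], abs_energy=[4.0, 16.0])
s3 = combine_side_info(s1, s2)
print(s3.abs_energy)  # [8.0, 16.0]
print(s3.old)         # [[1.0, 0.0625], [0.25, 0.125], [0.5, 1.0]]
```

The example shows why absolute object energy information helps when combining streams. The OLDs of different streams are measured against different references, so they can only be merged after being converted back to absolute energies.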
CN2007800238696A 2006-09-29 2007-10-01 Method for encoding and decoding object-based audio signal and apparatus thereof Active CN101479785B (en)

Applications Claiming Priority (15)

Application Number Priority Date Filing Date Title
US84829306P 2006-09-29 2006-09-29
US60/848,293 2006-09-29
US82980006P 2006-10-17 2006-10-17
US60/829,800 2006-10-17
US86330306P 2006-10-27 2006-10-27
US60/863,303 2006-10-27
US86082306P 2006-11-24 2006-11-24
US60/860,823 2006-11-24
US88071407P 2007-01-17 2007-01-17
US60/880,714 2007-01-17
US88094207P 2007-01-18 2007-01-18
US60/880,942 2007-01-18
US94837307P 2007-07-06 2007-07-06
US60/948,373 2007-07-06
PCT/KR2007/004797 WO2008039039A1 (en) 2006-09-29 2007-10-01 Methods and apparatuses for encoding and decoding object-based audio signals

Publications (2)

Publication Number Publication Date
CN101479785A CN101479785A (en) 2009-07-08
CN101479785B true CN101479785B (en) 2013-08-07

Family

ID=40839594

Family Applications (4)

Application Number Title Priority Date Filing Date
CN2007800241203A Active CN101484935B (en) 2006-09-29 2007-10-01 Methods and apparatuses for encoding and decoding object-based audio signals
CN2007800238696A Active CN101479785B (en) 2006-09-29 2007-10-01 Method for encoding and decoding object-based audio signal and apparatus thereof
CN2007800242526A Active CN101479787B (en) 2006-09-29 2007-10-01 Method for encoding and decoding object-based audio signal and apparatus thereof
CN2007800242333A Active CN101479786B (en) 2006-09-29 2007-10-01 Method for encoding and decoding object-based audio signal and apparatus thereof

Country Status (1)

Country Link
CN (4) CN101484935B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101710113B1 (en) * 2009-10-23 2017-02-27 삼성전자주식회사 Apparatus and method for encoding/decoding using phase information and residual signal
CN105047206B (en) 2010-01-06 2018-04-27 Lg电子株式会社 Handle the device and method thereof of audio signal
EP2751803B1 (en) * 2011-11-01 2015-09-16 Koninklijke Philips N.V. Audio object encoding and decoding
US9552818B2 (en) * 2012-06-14 2017-01-24 Dolby International Ab Smooth configuration switching for multichannel audio rendering based on a variable number of received channels
RU2649944C2 (en) 2012-07-02 2018-04-05 Сони Корпорейшн Decoding device, decoding method, coding device, coding method and program
RU2652468C2 (en) 2012-07-02 2018-04-26 Сони Корпорейшн Decoding device, decoding method, encoding device, encoding method and program
KR20150032651A (en) * 2012-07-02 2015-03-27 소니 주식회사 Decoding device and method, encoding device and method, and program
TWI517142B (en) 2012-07-02 2016-01-11 Sony Corp Audio decoding apparatus and method, audio coding apparatus and method, and program
US9479886B2 (en) * 2012-07-20 2016-10-25 Qualcomm Incorporated Scalable downmix design with feedback for object-based surround codec
US9761229B2 (en) 2012-07-20 2017-09-12 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for audio object clustering
AU2013301864B2 (en) * 2012-08-10 2016-04-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and methods for adapting audio information in spatial audio object coding
CA3013766C (en) * 2013-01-29 2020-11-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Decoder for generating a frequency enhanced audio signal, method of decoding, encoder for generating an encoded signal and method of encoding using compact selection side information
TWI530941B (en) 2013-04-03 2016-04-21 杜比實驗室特許公司 Methods and systems for interactive rendering of object based audio
JP6388939B2 (en) 2013-07-31 2018-09-12 ドルビー ラボラトリーズ ライセンシング コーポレイション Handling spatially spread or large audio objects
JP6321181B2 (en) 2013-09-12 2018-05-09 ドルビー ラボラトリーズ ライセンシング コーポレイション System side of audio codec
US9911423B2 (en) * 2014-01-13 2018-03-06 Nokia Technologies Oy Multi-channel audio signal classifier
PT3136384T (en) * 2014-04-25 2019-04-22 Ntt Docomo Inc Linear prediction coefficient conversion device and linear prediction coefficient conversion method
CN104036788B (en) * 2014-05-29 2016-10-05 北京音之邦文化科技有限公司 The acoustic fidelity identification method of audio file and device
US20160104263A1 (en) * 2014-10-09 2016-04-14 Media Tek Inc. Method And Apparatus Of Latency Profiling Mechanism
RU2678136C1 (en) * 2015-02-02 2019-01-23 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Device and method for processing encoded audio signal
EP3258467B1 (en) * 2015-02-10 2019-09-18 Sony Corporation Transmission and reception of audio streams
BR122022019910B1 (en) * 2015-06-24 2024-03-12 Sony Corporation AUDIO PROCESSING APPARATUS AND METHOD, AND COMPUTER READABLE NON-TRAINER STORAGE MEDIUM
WO2018086947A1 (en) * 2016-11-08 2018-05-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding a multichannel signal using a side gain and a residual gain
GB201808897D0 (en) * 2018-05-31 2018-07-18 Nokia Technologies Oy Spatial audio parameters
CN111292725B (en) * 2020-02-28 2022-11-25 北京声智科技有限公司 Voice decoding method and device
CN112351379B (en) * 2020-10-28 2021-07-30 歌尔光学科技有限公司 Control method of audio component and intelligent head-mounted device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN1503572A (en) * 2002-11-21 2004-06-09 Progressive to lossless embedded audio coder (PLEAC) with multiple factorization reversible transform
CN1647155A (en) * 2002-04-22 2005-07-27 皇家飞利浦电子股份有限公司 Parametric representation of spatial audio

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR100682904B1 (en) * 2004-12-01 2007-02-15 삼성전자주식회사 Apparatus and method for processing multichannel audio signal using space information

Non-Patent Citations (3)

Title
ITU-T. Concepts of Object-Oriented Spatial Audio Coding. 2006. *
J. Breebaart et al. MPEG Spatial Audio Coding/MPEG Surround: Overview and Current Status. AES 119th Convention, 2005, pp. 1-17. *
Lars Villemoes et al. MPEG Surround: The Forthcoming ISO Standard for Spatial Audio Coding. AES 28th International Conference, 2006. *

Also Published As

Publication number Publication date
CN101479786B (en) 2012-10-17
CN101479787A (en) 2009-07-08
CN101479786A (en) 2009-07-08
CN101479785A (en) 2009-07-08
CN101484935A (en) 2009-07-15
CN101479787B (en) 2012-12-26
CN101484935B (en) 2013-07-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant