CN101263742B - Audio coding

Info

Publication number: CN101263742B
Application number: CN200680033690.4A
Authority: CN
Inventors: D·J·布里巴特
Original assignee: Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2005-09-13
Filing date: 2006-08-31
Publication date: 2014-12-17
Anticipated expiration: 2026-08-31
Also published as: RU2008114359A; US8654983B2; TW200721111A; JP2012181556A; EP1927266B1; EP1927266A1; KR101512995B1; JP2009508157A; JP5698189B2; WO2007031896A1; TWI415111B; JP5587551B2; BRPI0615899B1; KR20150008932A; KR20080047446A; CN101263742A; BRPI0615899A2; US20080205658A1; RU2419249C2; KR101562379B1

Abstract

A spatial decoder unit (23) is arranged for transforming one or more audio channels (s; l, r) into a pair of binaural output channels (lb, rb). The device comprises a parameter conversion unit (234) for converting the spatial parameters (sp) into binaural parameters (bp) containing binaural information. The device additionally comprises a spatial synthesis unit (232) for transforming the audio channels (L, R) into a pair of binaural signals (Lb, Rb) while using the binaural parameters (bp). The spatial synthesis unit (232) preferably operates in a transform domain, such as the QMF domain.

Description

Audio coding

The present invention relates to audio coding. More particularly, it relates to a device and a method for converting an audio input signal into a binaural output signal, where the input signal comprises at least one audio channel and parameters representing additional channels.

Recording and reproducing binaural audio signals, that is, audio signals containing the directional information to which the human ear is sensitive, is well known. Binaural recordings are usually made with two microphones mounted in a dummy human head, so that the recorded sound corresponds to the sound captured by the human ear and includes any influence of the shape of the head and ears. Binaural recordings differ from stereo (that is, stereophonic) recordings in that reproducing a binaural recording requires headphones, whereas a stereo recording is made for reproduction by loudspeakers. While a binaural recording allows all spatial information to be reproduced using only two channels, a stereo recording does not provide the same spatial perception.

Conventional two-channel (stereophonic) or multi-channel (for example 5.1) recordings can be transformed into binaural recordings by convolving each conventional signal with a set of perceptual transfer functions. These perceptual transfer functions model the influence of the human head, and possibly other objects, on the signal. A well-known type of perceptual transfer function is the Head-Related Transfer Function (HRTF). An alternative type is the Binaural Room Impulse Response (BRIR), which also takes into account reflections from the walls, ceiling and floor of a room.

In the case of multi-channel signals, transforming the signals into binaural recording signals with a set of perceptual functions typically means convolving the perceptual functions with the signals of all channels. Because a typical convolution is computationally demanding, the signals and the HRTFs are usually transformed to the frequency (Fourier) domain, where the convolution is replaced by a multiplication that is computationally far less demanding.
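The equivalence between time-domain convolution and frequency-domain multiplication can be illustrated with a minimal sketch. The random arrays below are stand-ins for one audio channel and one HRTF impulse response; their lengths and the use of NumPy are assumptions made for illustration only and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)  # hypothetical stand-in for one audio channel
hrir = rng.standard_normal(32)     # hypothetical stand-in for an HRTF impulse response

# Direct time-domain (linear) convolution.
direct = np.convolve(signal, hrir)

# Frequency domain: zero-pad both inputs to the full output length,
# multiply the spectra, and transform back to the time domain.
n = len(signal) + len(hrir) - 1
via_fft = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(hrir, n), n)

# Largest difference between the two results (rounding error only).
max_err = np.max(np.abs(direct - via_fft))
```

Zero-padding to the full output length is what makes the circular convolution implied by the FFT equal to the linear convolution.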

It is also well known to reduce the number of audio channels to be transmitted or stored by representing the original number of channels by a smaller number of channels plus parameters indicating the relationships between the original channels. A set of stereo signals may thus be represented by a single (mono) channel plus a number of associated spatial parameters, while a set of 5.1 signals may be represented by two channels and a set of associated spatial parameters, or even by a single channel plus the associated spatial parameters. This "downmixing" of multiple audio channels in a spatial encoder, and the corresponding "upmixing" of the audio signals in a spatial decoder, is typically carried out in a transform domain or subband domain, for example the QMF (Quadrature Mirror Filter) domain.

Patent Cooperation Treaty application WO2004/028204 discloses a system that generates binaural signals using head-related transfer functions. The document "The Reference Model Architecture for MPEG Spatial Audio Coding", Herre et al., Audio Engineering Society Convention Paper, New York, 28 May 2005, XP009059973, discloses the MPEG reference model architecture. The document "Synthetic Ambience in Parametric Stereo Coding", Engdegard et al., Preprints of Papers Presented at the AES Convention, 8 May 2004, pages 1-12, XP008048096, discloses an example of parametric stereo coding. The document "MP3 Surround: Efficient and Compatible Coding of Multi-Channel Audio", Herre et al., Audio Engineering Society Convention Preprint, 8 May 2004, XP002338414, discloses an example of MP3 Surround coding.

When downmixed input channels are to be converted into binaural output channels, prior-art methods first upmix the input channels with a spatial decoder, producing upmixed intermediate channels, and then convert these upmixed intermediate channels into binaural channels. This procedure typically produces five or six intermediate channels, which must then be reduced to two binaural channels. First expanding and then reducing the number of channels is clearly not efficient and increases the computational complexity. In addition, reducing five or six intermediate channels intended for multi-channel loudspeaker reproduction to only two channels intended for binaural reproduction inevitably introduces artifacts and therefore reduces the sound quality.

The QMF domain referred to above is similar, but not identical, to the frequency (Fourier) domain. If a spatial decoder is to produce binaural output signals, the downmixed audio signals must first be transformed to the QMF domain for upmixing, then inverse QMF transformed to produce time-domain intermediate signals, subsequently transformed to the frequency domain for multiplication with the (Fourier-transformed) HRTFs, and finally inverse transformed to produce time-domain output signals. Because several transforms must be carried out in succession, this procedure is not efficient.

The amount of computation involved in this prior-art approach makes it very difficult to design a hand-held consumer device, such as a portable MP3 player, capable of producing binaural output signals from downmixed audio signals. Even if such a device could be built, its battery life would be very short because of the required computational load.

It is an object of the present invention to overcome these and other problems of the prior art and to provide a spatial decoder unit capable of producing a pair of binaural output channels from a set of downmixed audio channels represented by an audio input channel and an associated set of spatial parameters, which decoder has an improved efficiency.

Accordingly, the present invention provides a spatial decoder unit for producing a pair of binaural output channels using spatial parameters and a single audio input channel, the device comprising: a parameter conversion unit for converting the spatial parameters into binaural parameters using parameterized perceptual transfer functions, the binaural parameters depending on both the spatial parameters and the parameterized perceptual transfer functions; a single transform unit for transforming the single audio input channel into a transformed audio channel; a decorrelation unit for decorrelating the transformed audio channel so as to generate a transformed decorrelated signal; a spatial synthesis unit for synthesizing a pair of transformed binaural channels by applying the binaural parameters to the transformed audio channel (S) and the transformed decorrelated signal; and a pair of inverse transform units for inverse transforming the transformed binaural channels into the pair of binaural output channels.

By converting the spatial parameters into binaural parameters, the spatial synthesis unit can synthesize a pair of binaural channels directly, without requiring an additional binaural synthesis unit. Since no superfluous intermediate signals are produced, the computational requirements are reduced, and the introduction of artifacts is substantially eliminated.

In the spatial decoder unit of the present invention, the synthesis of the binaural channels can be carried out in a transform domain, such as the QMF domain, without the additional steps of transforming to the frequency domain and subsequently inverse transforming to the time domain. Since two transform steps can be omitted, both the amount of computation and the memory requirements are significantly reduced. The spatial decoder unit of the present invention can therefore be implemented relatively easily in a portable consumer device.

Furthermore, in the spatial decoder unit of the present invention, the binaural channels are produced directly from the downmix channels, each binaural channel comprising binaural signals for binaural reproduction using headphones or a similar device. The parameter conversion unit derives the binaural parameters used for producing the binaural channels from the spatial (that is, upmix) parameters. This derivation involves parameterized perceptual transfer functions, such as HRTFs (Head-Related Transfer Functions) and/or Binaural Room Impulse Responses (BRIRs). According to the present invention, the processing of the perceptual transfer functions is therefore carried out in the parameter domain, whereas in the prior art it was carried out in the time domain or the frequency domain. Because the resolution in the parameter domain is typically lower than the resolution in the time or frequency domain, this further reduces the computational complexity.

Preferably, the parameter conversion unit is arranged to combine, in the parameter domain, all perceptual transfer function contributions that the input (downmix) audio channels would make to the binaural channels in order to determine the binaural parameters. In other words, the spatial parameters and the parameterized perceptual transfer functions are combined in such a way that the combined parameters yield a binaural output signal having statistical properties similar to those of the binaural output signal obtained in the prior-art method involving upmixed intermediate signals.

In a preferred embodiment, the spatial decoder unit of the present invention further comprises one or more transform units for transforming the audio input channels into transformed audio input channels, and a pair of inverse transform units for inverse transforming the synthesized binaural channels into the pair of binaural output channels, the spatial synthesis unit being arranged to operate in a transform domain or subband domain, preferably the QMF domain.

The spatial decoder unit of the present invention may comprise two transform units, the parameter conversion unit being arranged to use perceptual transfer function parameters involving only three channels, two of which incorporate the contributions of composite front and rear channels. In this embodiment, the parameter conversion unit may be arranged to process channel level (for example CLD), channel coherence (for example ICC), channel prediction (for example CPC) and/or phase (for example IPD) parameters.

Alternatively, the spatial decoder unit of the present invention may comprise only a single transform unit and further comprise a decorrelation unit for decorrelating the transformed single channel output by the single transform unit. In this embodiment, the parameter conversion unit may be arranged to process channel level (for example CLD), channel coherence (for example ICC) and/or phase (for example IPD) parameters.

The spatial decoder unit of the present invention may additionally comprise a stereo reverberation unit. This stereo reverberation unit may be arranged to operate in the time domain or in a transform or subband (for example QMF) domain.

The present invention also provides a spatial decoder device for producing a pair of binaural output channels from an input bitstream, the device comprising a demultiplexer unit for demultiplexing the input bitstream into a single downmix channel and signal parameters, a downmix decoder unit for decoding the single downmix channel, and a spatial decoder unit for producing a pair of binaural output channels using the spatial parameters and the single downmix channel.

In addition, the present invention provides a consumer device and an audio system comprising a spatial decoder unit and/or a spatial decoder device as defined above. The present invention further provides a method of producing a pair of binaural output channels using spatial parameters and a single audio input channel, the method comprising the steps of: converting the spatial parameters into binaural parameters using parameterized perceptual transfer functions, the binaural parameters depending on both the spatial parameters and the parameterized perceptual transfer functions; transforming the single audio input channel into a transformed audio channel; decorrelating the transformed audio channel so as to generate a transformed decorrelated signal; synthesizing a pair of transformed binaural channels by applying the binaural parameters to the transformed audio channel (S) and the transformed decorrelated signal; and inverse transforming the transformed binaural channels into the pair of binaural output channels. Further aspects of the method according to the invention will become apparent from the description below.

The present invention additionally provides a computer program product for carrying out the method as defined above. The computer program product may comprise a set of computer-executable instructions stored on a data carrier, such as a CD or DVD. The set of computer-executable instructions, which allows a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.

The present invention is explained further below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:

Fig. 1 schematically shows the application of head-related transfer functions according to the prior art.

Fig. 2 schematically shows a spatial audio encoder device according to the prior art.

Fig. 3 schematically shows a spatial audio decoder device according to the prior art, coupled to a binaural synthesis device.

Fig. 4 schematically shows a spatial audio decoder unit according to the prior art.

Fig. 5 schematically shows an example of a spatial audio decoder unit.

Fig. 6 schematically shows a spatial audio decoder device according to the present invention.

Fig. 7 schematically shows the spatial audio decoder unit of Fig. 6, provided with a transform-domain reverberation unit.

Fig. 8 schematically shows the spatial audio decoder unit of Fig. 6, provided with a time-domain reverberation unit.

Fig. 9 schematically shows a consumer device provided with a spatial audio decoder device according to the present invention.

Fig. 1 schematically shows the application of perceptual transfer functions, such as head-related transfer functions, according to the prior art. The binaural synthesis device 3 shown comprises six HRTF units 31, each containing the transfer function for a specific combination of an input channel and an output channel. In the example shown, there are three audio input channels ch1, ch2 and ch3, which may correspond to the channels l (left), c (center) and r (right). The first channel ch1 is fed to two HRTF units 31 containing HRTF(1, L) and HRTF(1, R) respectively. In this example, HRTF(1, L) is the head-related transfer function that determines the contribution of the first channel to the left binaural signal.

Those skilled in the art will know that HRTFs may be determined by making both regular (stereo) recordings and binaural recordings and deriving a transfer function that represents the shaping of the binaural recording relative to the regular recording. Binaural recordings are made with two microphones mounted in a dummy human head, so that the recorded sound corresponds to the sound captured by the human ear and includes any influence due to the presence of the head and the shape of the ears, and even the hair and shoulders.

If the HRTF processing takes place in the time domain, the HRTFs are convolved with the (time-domain) audio signals of the channels. Typically, however, the HRTFs are transformed to the frequency domain, and the resulting transfer functions are then multiplied with the spectra of the audio signals (the Fourier transform units and inverse Fourier transform units are not shown in Fig. 1). Suitable overlap-add (OLA) techniques involving overlapping time frames may be used to accommodate HRTFs that are longer than the Fast Fourier Transform (FFT) frames.

After HRTF processing by the appropriate HRTF units 31, the resulting left and right signals are summed by the respective adders 32 to yield the (time-domain) left binaural signal lb and right binaural signal rb.
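The structure of Fig. 1 (one HRTF unit per input/output combination, followed by one adder per ear) can be sketched in the time domain as follows. The function name `binaural_synthesis`, the dictionary layout and the impulse-response values are hypothetical choices for illustration; the patent does not prescribe an implementation.

```python
import numpy as np

def binaural_synthesis(channels, hrirs):
    """Filter each input channel with its left/right impulse response and sum per ear.

    channels: dict mapping a channel name to a 1-D sequence of samples.
    hrirs:    dict mapping (channel name, "L" or "R") to a 1-D impulse response.
    Returns the pair (lb, rb) of time-domain binaural signals.
    """
    n_sig = max(len(x) for x in channels.values())
    n_ir = max(len(h) for h in hrirs.values())
    lb = np.zeros(n_sig + n_ir - 1)
    rb = np.zeros(n_sig + n_ir - 1)
    for name, x in channels.items():
        yl = np.convolve(x, hrirs[(name, "L")])  # role of HRTF(name, L)
        yr = np.convolve(x, hrirs[(name, "R")])  # role of HRTF(name, R)
        lb[:len(yl)] += yl                       # adder for the left ear
        rb[:len(yr)] += yr                       # adder for the right ear
    return lb, rb

# Two toy channels and made-up impulse responses.
channels = {"ch1": [1.0, 0.0, 0.0], "ch2": [0.0, 1.0, 0.0]}
hrirs = {
    ("ch1", "L"): [1.0, 0.5], ("ch1", "R"): [0.2],
    ("ch2", "L"): [0.3],      ("ch2", "R"): [1.0, 0.5],
}
lb, rb = binaural_synthesis(channels, hrirs)
```

With N input channels this requires 2N filtering operations, which is the cost that grows when five or six upmixed channels must be processed.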

The exemplary prior-art binaural synthesis device 3 of Fig. 1 has three input channels. Present-day audio systems often have five or six channels, as is the case in so-called 5.1 systems. However, to reduce the amount of data to be transmitted and/or stored, the multiple audio channels are typically reduced ("downmixed") to one or two channels. A number of signal parameters indicating the properties and mutual relationships of the original channels allows the one or two channels to be expanded ("upmixed") to the original number of channels. An exemplary spatial encoder device 1 according to the prior art is schematically illustrated in Fig. 2.

The spatial encoder device 1 comprises a spatial encoding (SE) unit 11, a downmix encoding (DE) unit 12 and a multiplexer (Mux) 13. The spatial encoding unit 11 receives five audio input channels lf (left front), lr (left rear), rf (right front), rr (right rear) and c (center). The spatial encoding unit 11 downmixes the five input channels to produce two channels l (left) and r (right), as well as signal parameters sp (note that the spatial encoding unit 11 may produce a single channel instead of the two channels l and r). In the embodiment shown, in which five channels are downmixed to two channels (a so-called 5-2-5 configuration), the signal parameters sp may for example comprise:

Note that the "lfe" channel is an optional low-frequency (subwoofer) channel, and that the "rear" channels are also known as "surround" channels.

The two downmix channels l and r produced by the spatial encoding unit 11 are fed to the downmix encoding (DE) unit 12, which typically applies a type of coding aimed at reducing the amount of data. The downmix channels l and r thus encoded and the signal parameters sp are multiplexed by the multiplexer unit 13 to produce an output bitstream bs.

In an alternative embodiment (not shown), five (or six) channels are downmixed to a single (mono) channel (a so-called 5-1-5 configuration), and the signal parameters sp may for example comprise:

In this alternative embodiment too, the encoded downmix channel s and the signal parameters sp are multiplexed by the multiplexer unit 13 to produce an output bitstream bs.

If this bitstream bs were to be used to produce a pair of binaural channels, the prior-art approach would first upmix the two downmix channels l and r (or, alternatively, the single downmix channel) to produce the five or six original channels, and then convert these five or six original channels into two binaural channels. An example of this prior-art approach is illustrated in Fig. 3.

The spatial decoder device 2' according to the prior art comprises a demultiplexer (Demux) unit 21', a downmix decoding unit 22' and a spatial decoder unit 23'. A binaural synthesis device 3 is coupled to the spatial decoder unit 23' of the spatial decoder device 2'.

The demultiplexer unit 21' receives a bitstream bs, which may be identical to the bitstream bs of Fig. 2, and outputs signal parameters sp and two encoded downmix channels. The signal parameters sp are sent to the spatial decoder unit 23', while the encoded downmix channels are first decoded by the downmix decoding unit 22' to produce the decoded downmix channels l and r. The spatial decoder unit 23' essentially performs the inverse operations of the spatial encoding unit 11 in Fig. 2 and outputs five audio channels. These five audio channels are fed to the binaural synthesis device 3, which may have a structure similar to the device 3 of Fig. 1 but with additional HRTF units 31 to accommodate all five channels. As in the example of Fig. 1, the binaural synthesis device 3 outputs two binaural channels lb (left binaural) and rb (right binaural).

An exemplary structure of the prior-art spatial decoder unit 23' is shown in Fig. 4. The unit 23' of Fig. 4 comprises a two-to-three upmix unit 230', three spatial synthesis (SS) units 232' and three decorrelation (D) units 239'. The two-to-three upmix unit 230' receives the downmix channels l and r and the signal parameters sp, and produces three channels l, r and ce. Each of these channels is fed to a decorrelation unit 239', which produces a decorrelated version of the respective channel. Each of the channels l, r and ce, its respective decorrelated version and the associated signal parameters sp are fed to a respective spatial synthesis (or upmix) unit 232'. The spatial synthesis unit 232' receiving the channel l, for example, outputs the output channels lf (left front) and lr (left rear). The spatial synthesis units 232' typically perform a matrix multiplication, the parameters of the matrix being determined by the signal parameters sp.

Note that six output channels are produced in the example of Fig. 4. In some embodiments, the third decorrelation unit 239' and the third spatial synthesis unit 232' may be omitted, so that only five output channels are produced. In all embodiments, however, the prior-art spatial decoder unit 23' produces more than two output channels. Note further that, for clarity of illustration, any (QMF) transform units and inverse (QMF) transform units have been omitted from the merely illustrative example of Fig. 4. In actual embodiments, the spatial decoding would be carried out in a transform domain, such as the QMF domain.

The configuration of Fig. 3 is not efficient. The spatial decoder device 2' converts two downmix channels (l and r) into five upmixed (intermediate) channels, which the binaural synthesis device 3 then reduces to two binaural channels. In addition, the upmixing in the spatial decoder unit 23' is typically carried out in a subband domain, such as the QMF (Quadrature Mirror Filter) domain, whereas the binaural synthesis device 3 typically processes signals in the frequency (that is, Fourier transform) domain. Because these two domains are not identical, the spatial decoder device 2' first transforms the signals of the downmix channels to the QMF domain, processes the transformed signals, and then transforms the upmixed signals back to the time domain. Subsequently, the binaural synthesis device 3 transforms all (five in this example) of these upmixed signals to the frequency domain, processes the transformed signals, and then transforms the binaural signals back to the time domain. The computational effort involved is clearly considerable, and more efficient signal processing is desired, in particular when this processing is to be carried out in a hand-held device.

The present invention provides far more efficient processing by integrating the binaural synthesis device into the spatial decoder device and effectively carrying out the binaural processing in the parameter domain. An example of a spatial decoder unit is schematically illustrated in Fig. 5, and a combined spatial and binaural decoder device according to the present invention (referred to as a spatial decoder device for brevity) is illustrated in Fig. 6.

The spatial decoder unit 23 shown in Fig. 5 comprises transform units 231, a spatial synthesis (SS) unit 232, inverse transform units 233, a parameter conversion (PC) unit 234 and a memory (Mem) unit 235. In the example of Fig. 5, the spatial decoder unit 23 comprises two transform units 231, but in other examples only a single transform unit 231 (as in Fig. 6) or more than two transform units 231 may be present, depending on the number of downmix channels.

The transform units 231 each receive one of the downmix channels l and r (see also Fig. 3). Each transform unit 231 is arranged to transform the respective channel (signal) to a suitable transform or subband domain, in the present example the QMF domain. The QMF-transformed channels L and R are fed to the spatial synthesis unit 232, which preferably performs a matrix operation on the signals of the channels L and R to produce the transform-domain binaural channels Lb and Rb. The inverse transform units 233 carry out an inverse transform, in the present example an inverse QMF transform, to produce the binaural time-domain channels lb and rb.

The spatial synthesis unit 232 may be similar or identical to the prior-art spatial synthesis units 232' of Fig. 4. However, the parameters used by this unit differ from those used in the prior art. More particularly, the parameter conversion unit 234 converts the conventional spatial parameters sp into binaural parameters bp using the HRTF parameters hp stored in the memory unit 235. These HRTF parameters hp may comprise:

an average level per frequency band for the left transfer function, as a function of azimuth (angle in the horizontal plane), elevation (angle in the vertical plane) and distance,

an average level per frequency band for the right transfer function, as a function of azimuth, elevation and distance, and

an average phase or time difference per frequency band, as a function of azimuth, elevation and distance.

In addition, the following parameters may be included:

a coherence measure of the left and right transfer functions per HRTF frequency band, as a function of azimuth, elevation and distance, and/or

absolute phase and/or time parameters for the left and right transfer functions, as a function of azimuth, elevation and distance.

The actual HRTF parameters used may depend on the particular embodiment.

The spatial synthesis unit 232 may determine the binaural channels Lb and Rb using the following formula:

$$\begin{bmatrix} L_b[k,m] \\ R_b[k,m] \end{bmatrix} = H_k \begin{bmatrix} L[k,m] \\ R[k,m] \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}_k \begin{bmatrix} L[k,m] \\ R[k,m] \end{bmatrix} \tag{1}$$

where the index k denotes the QMF hybrid (frequency) band index and the index m denotes the QMF slot (time) index. The parameters $h_{ij}$ of the matrix $H_k$ are determined by the binaural parameters (bp in Fig. 5). As indicated by the index k, the matrix $H_k$ may depend on the QMF hybrid band. In one example, the parameter conversion unit (234 in Fig. 5) produces the binaural parameters, which the spatial synthesis unit 232 then converts into the matrix parameters $h_{ij}$. In another example, the matrix parameters $h_{ij}$ are identical to the binaural parameters produced by the parameter conversion unit (234 in Fig. 5) and can be applied directly by the spatial synthesis unit 232 without conversion.
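Equation (1) amounts to one 2 × 2 matrix multiplication per hybrid band, applied to every time slot of that band. A minimal sketch, assuming the transform-domain channels are held as complex arrays of shape (bands, slots) and the matrices as an array of shape (bands, 2, 2); the function name and the array layout are illustrative assumptions, not part of the patent:

```python
import numpy as np

def synthesize_binaural(L, R, H):
    """Apply equation (1): a 2x2 matrix H[k] per band k to the pair (L[k, m], R[k, m]).

    L, R: complex arrays of shape (K, M), K bands by M time slots.
    H:    array of shape (K, 2, 2) holding the parameters h_ij for each band.
    Returns (Lb, Rb), each of shape (K, M).
    """
    X = np.stack([L, R], axis=1)         # shape (K, 2, M)
    Y = np.einsum("kij,kjm->kim", H, X)  # per band: (2x2) times (2xM)
    return Y[:, 0, :], Y[:, 1, :]

# Toy input: band 0 uses the identity matrix, band 1 swaps the two channels.
L = np.arange(6, dtype=complex).reshape(2, 3)
R = L + 10
H = np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]).astype(complex)
Lb, Rb = synthesize_binaural(L, R, H)
```

The cost per band and slot is four complex multiplications, independent of how many loudspeaker channels the spatial parameters describe.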

In the case of two downmix channels (5-2-5 configuration), the parameters $h_{ij}$ of the matrix $H_k$ may be determined as follows. In the prior-art spatial decoder unit of Fig. 4, the 2-to-3 decoder unit 230' converts the two (input) downmix channels l and r into three (output) channels l, r and ce. (Note that the output channels l and r will generally not be identical to the input channels l and r; for this reason, the input channels are denoted $l_0$ and $r_0$ in the following discussion.)

According to one example, the parameter conversion unit (234 in Figs. 5 and 6) is arranged to use perceptual transfer function parameters in which only the contributions of three channels (for example l, r and c) are taken into account, two of these three channels (for example l and r) comprising the composite respective front (lf, rf) and rear (lr, rr) channels. That is, the respective front and rear channels are grouped together to improve efficiency.

The operation of the two-to-three upmix unit 230' can be described by the following matrix operation:

$$\begin{bmatrix} l \\ r \\ c \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \\ m_{31} & m_{32} \end{bmatrix} \begin{bmatrix} l_0 \\ r_0 \end{bmatrix} \tag{2}$$

The matrix entries $m_{ij}$ depend on the spatial parameters. The relation between the spatial parameters and the matrix entries is the same as in a 5.1 MPEG Surround decoder. For each of the three resulting signals l, r and c, the effect of the perceptual transfer function (in this example HRTF) parameters corresponding to the desired (perceived) position of these sound sources is determined. For the center channel (c), the spatial parameters of the sound source position can be applied directly, resulting in two output signals for the center, $l_b(c)$ and $r_b(c)$:

$$\begin{bmatrix} l_b(c) \\ r_b(c) \end{bmatrix} = \begin{bmatrix} H_l(c) \\ H_r(c) \end{bmatrix} c = \begin{bmatrix} P_l(c)\, e^{+j\phi(c)/2} \\ P_r(c)\, e^{-j\phi(c)/2} \end{bmatrix} c \tag{3}$$

As can be seen from equation (3), the HRTF parameter processing consists of multiplying the signal by the average power levels $P_l$ and $P_r$ corresponding to the sound source position of the center channel, while the phase difference is distributed symmetrically over the two ears. This process is carried out independently for each QMF band, using the mapping from HRTF parameters to the QMF filter bank on the one hand, and the mapping from spatial parameters to the QMF bands on the other hand.
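For a single band, equation (3) can be sketched as follows. The function name and the numeric values are hypothetical; in practice the levels and the phase difference would come from the stored HRTF parameters for the center position.

```python
import cmath

def center_hrtf_gains(p_l, p_r, phi):
    """Return (H_l(c), H_r(c)) of equation (3) for one band.

    p_l, p_r: average levels of the left and right transfer functions.
    phi:      average phase difference in radians, split evenly over both ears.
    """
    h_l = p_l * cmath.exp(+1j * phi / 2)
    h_r = p_r * cmath.exp(-1j * phi / 2)
    return h_l, h_r

# Made-up parameter values for one band.
h_l, h_r = center_hrtf_gains(0.8, 0.6, 0.4)
c = 1.0 + 0.5j                  # one transform-domain sample of the center channel
lb_c, rb_c = h_l * c, h_r * c   # contributions of c to the two binaural channels
```

The magnitudes of the two gains equal the levels, and the phase of `h_l` minus the phase of `h_r` equals `phi`, which is the symmetric split described above.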

For the left (l) channel, the HRTF parameters of the left-front and left-rear channels are combined into a single contribution using the weights $w_{lf}$ and $w_{lr}$. The resulting composite parameters simulate the effect of both the front and the rear channels in a statistical sense. The following equations are used to generate the binaural output pair $(l_b, r_b)$ for the left channel:

$$\begin{bmatrix} l_b(l) \\ r_b(l) \end{bmatrix} = \begin{bmatrix} H_l(l) \\ H_r(l) \end{bmatrix} l \tag{4}$$

where

$$H_l(l) = \sqrt{w_{lf}^2\, P_l^2(lf) + w_{lr}^2\, P_l^2(lr)} \tag{5}$$

With

The weights $w_{lf}$ and $w_{lr}$ depend on the CLD parameter of the 1-to-2 unit for lf and lr:

$$w_{lf}^2 = \frac{10^{\mathrm{CLD}_l/10}}{1 + 10^{\mathrm{CLD}_l/10}} \tag{7}$$

$$w_{lr}^2 = \frac{1}{1 + 10^{\mathrm{CLD}_l/10}} \tag{8}$$
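Equations (7) and (8) convert a level difference in decibels into two squared weights that sum to one, and equation (5) then combines the front and rear levels. A minimal sketch with hypothetical function names and made-up input values:

```python
import math

def cld_weights(cld_db):
    """Squared front/rear weights of equations (7) and (8) for a CLD value in dB."""
    ratio = 10.0 ** (cld_db / 10.0)  # front-to-rear power ratio
    w_front_sq = ratio / (1.0 + ratio)
    w_rear_sq = 1.0 / (1.0 + ratio)
    return w_front_sq, w_rear_sq

def combined_level(w_front_sq, w_rear_sq, p_front, p_rear):
    """Composite level of equation (5) from the front and rear HRTF levels."""
    return math.sqrt(w_front_sq * p_front ** 2 + w_rear_sq * p_rear ** 2)

# A CLD of 0 dB means equal front and rear power, so both squared weights are 0.5.
w_lf_sq, w_lr_sq = cld_weights(0.0)

# Made-up HRTF levels P_l(lf) = 1.0 and P_l(lr) = 0.5.
h_l_left = combined_level(w_lf_sq, w_lr_sq, 1.0, 0.5)
```

Because the two squared weights always sum to one, the composite level lies between the front and rear levels and moves towards the front level as the CLD grows.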

In a similar manner, the binaural output for the right channel is obtained according to:

$$\begin{bmatrix} l_b(r) \\ r_b(r) \end{bmatrix} = \begin{bmatrix} H_l(r) \\ H_r(r) \end{bmatrix} r \tag{9}$$

where

$$H_r(r) = \sqrt{w_{rf}^2\, P_r^2(rf) + w_{rr}^2\, P_r^2(rr)} \tag{11}$$

$$w_{rf}^2 = \frac{10^{\mathrm{CLD}_r/10}}{1 + 10^{\mathrm{CLD}_r/10}} \tag{12}$$

$$w_{rr}^2 = \frac{1}{1 + 10^{\mathrm{CLD}_r/10}} \tag{13}$$

Note that in both cases the phase modification term is applied to the contralateral ear. Furthermore, because the human auditory system is largely insensitive to binaural phase at frequencies above approximately 2 kHz, the phase modification term needs to be applied only in the low-frequency region. For the remainder of the frequency range, real-valued processing therefore suffices (assuming real-valued m_ij).

Note further that the equations above assume incoherent addition of the (HRTF-)filtered signals of lf and lr. One possible extension is to include the transmitted inter-channel coherence (ICC) parameters of lf and lr (and of rf and rr) in the equations as well, in order to account for front/rear correlation.

All processing steps described above can be combined in the parameter domain, resulting in a single 2 × 2 matrix in the signal domain:

\begin{bmatrix} l_{b} \\ r_{b} \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} l_{0} \\ r_{0} \end{bmatrix} \qquad (14)

where

h_{11} = m_{11} H_{l}(l) + m_{21} H_{l}(r) + m_{31} H_{l}(c) \qquad (15a)

h_{12} = m_{12} H_{l}(l) + m_{22} H_{l}(r) + m_{32} H_{l}(c) \qquad (15b)

h_{21} = m_{11} H_{r}(l) + m_{21} H_{r}(r) + m_{31} H_{r}(c) \qquad (15c)

h_{22} = m_{12} H_{r}(l) + m_{22} H_{r}(r) + m_{32} H_{r}(c) \qquad (15d)
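The combination in equations (14) and (15a) to (15d) can be checked numerically. The Python sketch below assumes that the upmix matrix is stored as three rows (for l, r and c) of two entries each, and that the HRTF transfer values are stored as three-element lists in the same order; this layout, the function name and all numbers are assumptions for illustration.

```python
def combine_matrix(m, h_left, h_right):
    """Fold the 3x2 upmix matrix m and the per-channel HRTF values into a 2x2 matrix.

    m[k][i]: entry m_(k+1)(i+1), rows ordered l, r, c (assumed layout).
    h_left[k], h_right[k]: H_l and H_r for channel k, same order (assumed layout).
    """
    h11 = sum(m[k][0] * h_left[k] for k in range(3))   # equation (15a)
    h12 = sum(m[k][1] * h_left[k] for k in range(3))   # equation (15b)
    h21 = sum(m[k][0] * h_right[k] for k in range(3))  # equation (15c)
    h22 = sum(m[k][1] * h_right[k] for k in range(3))  # equation (15d)
    return [[h11, h12], [h21, h22]]


# Illustrative values only.
m = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
h_left = [1.0, 0.4 - 0.2j, 0.7 + 0.1j]
h_right = [0.4 - 0.2j, 1.0, 0.7 - 0.1j]
h = combine_matrix(m, h_left, h_right)

# Applying the 2x2 matrix to the stereo downmix (l0, r0), as in equation (14).
l0, r0 = 0.3 + 0.1j, -0.2 + 0.5j
l_b = h[0][0] * l0 + h[0][1] * r0
r_b = h[1][0] * l0 + h[1][1] * r0
```

The result is the same as first upmixing to l, r and c and then applying the HRTF values per channel, but only one 2 × 2 matrix multiplication per sub-band sample is needed.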

The present invention essentially processes the binaural (that is, HRTF) information in the parameter domain, instead of in the frequency domain or the time domain as in the prior art. In this way, significant computational savings can be achieved.

The spatial decoder device 2 according to the present invention, shown merely by way of non-limiting example in Fig. 6, comprises a demultiplexer (Demux) unit 21, a downmix decoding unit 22 and a spatial/binaural decoder unit 23. The demultiplexer unit 21 and the downmix decoding unit 22 may be similar to the prior-art demultiplexer unit 21' and downmix decoding unit 22' shown in Fig. 3. The spatial decoder unit 23 of Fig. 6 is identical to the spatial decoder unit 23 of Fig. 5, except for the number of downmix channels and associated transform units. As the spatial decoder device of Fig. 6 is arranged for a single downmix channel s, only a single transform unit 231 is provided, while a decorrelation (D) unit 239 has been added for producing a decorrelated version D of the (transform-domain) downmix signal S. As the signal parameters sp associated with a single downmix channel s typically differ from those associated with two downmix channels, the binaural parameters bp produced by the parameter conversion unit 234 will typically differ from those in the embodiment of Fig. 5.

In the configuration of Fig. 6, the input of the binaural decoder comprises a mono input signal s accompanied by spatial parameters sp. The binaural synthesis unit generates a stereo output signal whose statistical properties approximate those that would result from HRTF processing of the original 5.1 input, which can be described by:

l_{b} = H_{l}(lf) \otimes lf + H_{l}(rf) \otimes rf + H_{l}(lr) \otimes lr + H_{l}(rr) \otimes rr + H_{l}(c) \otimes c \qquad (16)

r_{b} = H_{r}(lf) \otimes lf + H_{r}(rf) \otimes rf + H_{r}(lr) \otimes lr + H_{r}(rr) \otimes rr + H_{r}(c) \otimes c \qquad (17)

Given the spatial parameters that describe the statistical properties and interrelations of the channels lf, rf, lr, rr and c, and the parameters of the HRTF impulse responses, it is also possible to estimate the statistical properties (that is, an approximation of the binaural parameters) of the binaural output pair l_b, r_b. More specifically, the average energy (of each channel), the average phase difference and the coherence can be estimated and subsequently reinstated by means of decorrelation and matrixing of the mono input signal.

The binaural parameters comprise a (relative) level change for each of the two binaural output channels (and hence define a channel level difference parameter), an (average) phase difference and a coherence estimate (per transform-domain time/frequency tile).

As a first step, the relative powers (with respect to the power of the mono input signal) of the five (or six) channels of the (5.1) signal are calculated using the transmitted CLD parameters. The relative power of the left-front channel is given by:

\sigma_{lf}^{2} = r_{1}(CLD_{fs}) \, r_{1}(CLD_{fc}) \, r_{1}(CLD_{f}) \qquad (18)

where

r_{1}(CLD) = \frac{10^{CLD/10}}{1 + 10^{CLD/10}} \qquad (19)

and

r_{2}(CLD) = \frac{1}{1 + 10^{CLD/10}} \qquad (20)

Similarly, the relative powers of the other channels are given by:

\sigma_{rf}^{2} = r_{1}(CLD_{fs}) \, r_{1}(CLD_{fc}) \, r_{2}(CLD_{f}) \qquad (21a)

\sigma_{c}^{2} = r_{1}(CLD_{fs}) \, r_{2}(CLD_{fc}) \qquad (21b)

\sigma_{ls}^{2} = r_{2}(CLD_{fs}) \, r_{1}(CLD_{s}) \qquad (21c)

\sigma_{rs}^{2} = r_{2}(CLD_{fs}) \, r_{2}(CLD_{s}) \qquad (21d)
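Equations (18) to (21d) can be sketched in Python as follows. The function names, the dictionary keys and the example CLD values (in dB) are assumptions made for this illustration.

```python
def r1(cld_db):
    """Equation (19): power fraction assigned to the first output of a split."""
    ratio = 10.0 ** (cld_db / 10.0)
    return ratio / (1.0 + ratio)


def r2(cld_db):
    """Equation (20): power fraction assigned to the second output of a split."""
    return 1.0 / (1.0 + 10.0 ** (cld_db / 10.0))


def relative_powers(cld_fs, cld_fc, cld_f, cld_s):
    """Relative channel powers of equations (18) and (21a) to (21d)."""
    return {
        "lf": r1(cld_fs) * r1(cld_fc) * r1(cld_f),
        "rf": r1(cld_fs) * r1(cld_fc) * r2(cld_f),
        "c": r1(cld_fs) * r2(cld_fc),
        "ls": r2(cld_fs) * r1(cld_s),
        "rs": r2(cld_fs) * r2(cld_s),
    }


# Illustrative CLD values in dB.
powers = relative_powers(cld_fs=3.0, cld_fc=1.5, cld_f=0.0, cld_s=-2.0)
```

Since r1 and r2 sum to one for any CLD value, the five relative powers also sum to one, which is consistent with their definition relative to the mono input signal.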

Then, the expected value of the relative power σ_L² of the left binaural output channel (relative to the mono input channel), the expected value of the relative power σ_R² of the right binaural output channel, and the expected value of the cross product L_B R_B^* can be calculated. The coherence of the binaural output (ICC_B) is then given by:

ICC_{B} = \frac{| \langle L_{B} R_{B}^{*} \rangle |}{\sigma_{L} \sigma_{R}} \qquad (22)

and the average phase angle (IPD_B) is given by:

IPD_{B} = \arg(\langle L_{B} R_{B}^{*} \rangle) \qquad (23)

The channel level difference (CLD_B) of the binaural output is given by:

CLD_{B} = 10 \log_{10} \left( \frac{\sigma_{L}^{2}}{\sigma_{R}^{2}} \right) \qquad (24)

Finally, the overall (linear) gain g_B of the binaural output compared with the mono input is given by:

g_{B} = \sqrt{\sigma_{L}^{2} + \sigma_{R}^{2}} \qquad (25)
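Equations (22) to (25) map three estimated statistics onto four binaural parameters. The following Python sketch assumes that the two relative powers and the complex expected cross product have already been estimated; the function name, argument names and input values are hypothetical.

```python
import cmath
import math


def binaural_parameters(sigma_l_sq, sigma_r_sq, cross):
    """Return (ICC_B, IPD_B, CLD_B, g_B) from equations (22) to (25).

    sigma_l_sq, sigma_r_sq: relative powers of the left and right outputs.
    cross: expected value of L_B times the complex conjugate of R_B.
    """
    icc_b = abs(cross) / math.sqrt(sigma_l_sq * sigma_r_sq)  # equation (22)
    ipd_b = cmath.phase(cross)                               # equation (23), radians
    cld_b = 10.0 * math.log10(sigma_l_sq / sigma_r_sq)       # equation (24), dB
    g_b = math.sqrt(sigma_l_sq + sigma_r_sq)                 # equation (25)
    return icc_b, ipd_b, cld_b, g_b


# Illustrative statistics only.
icc_b, ipd_b, cld_b, g_b = binaural_parameters(0.5, 0.5, 0.25j)
```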

The matrix coefficients required to reinstate the IPD_B, CLD_B, ICC_B and g_B parameters in the binaural matrix are simply those of a conventional parametric stereo decoder, extended with the overall gain g_B:

h_{11} = g_{B} c_{L} \cos(\alpha + \beta) \exp(j \, IPD_{B} / 2) \qquad (26a)

h_{12} = g_{B} c_{L} \sin(\alpha + \beta) \exp(j \, IPD_{B} / 2) \qquad (26b)

h_{21} = g_{B} c_{R} \cos(-\alpha + \beta) \exp(-j \, IPD_{B} / 2) \qquad (26c)

h_{22} = g_{B} c_{R} \sin(-\alpha + \beta) \exp(-j \, IPD_{B} / 2) \qquad (26d)

where

\alpha = 0.5 \arccos(ICC_{B}) \qquad (27)

\beta = \arctan \left( \frac{c_{R} - c_{L}}{c_{R} + c_{L}} \tan(\alpha) \right) \qquad (28)

c_{L} = \sqrt{\frac{10^{CLD_{B}/10}}{1 + 10^{CLD_{B}/10}}} \qquad (29)

c_{R} = \sqrt{\frac{1}{1 + 10^{CLD_{B}/10}}} \qquad (30)
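Equations (26a) to (30) can be sketched in Python as shown below. The sketch assumes that IPD_B is given in radians and CLD_B in dB, and that the matrix is applied to the transformed mono signal S and its decorrelated version D; the function name and the example values are hypothetical.

```python
import cmath
import math


def binaural_matrix(icc_b, ipd_b, cld_b, g_b):
    """Return the 2x2 matrix [[h11, h12], [h21, h22]] of equations (26a) to (26d)."""
    ratio = 10.0 ** (cld_b / 10.0)
    c_l = math.sqrt(ratio / (1.0 + ratio))                      # equation (29)
    c_r = math.sqrt(1.0 / (1.0 + ratio))                        # equation (30)
    alpha = 0.5 * math.acos(max(-1.0, min(1.0, icc_b)))         # equation (27)
    beta = math.atan((c_r - c_l) / (c_r + c_l) * math.tan(alpha))  # equation (28)
    rot = cmath.exp(1j * ipd_b / 2)  # half of the phase difference per output
    h11 = g_b * c_l * math.cos(alpha + beta) * rot
    h12 = g_b * c_l * math.sin(alpha + beta) * rot
    h21 = g_b * c_r * math.cos(-alpha + beta) * rot.conjugate()
    h22 = g_b * c_r * math.sin(-alpha + beta) * rot.conjugate()
    return [[h11, h12], [h21, h22]]


# Illustrative parameter values only.
h = binaural_matrix(icc_b=0.6, ipd_b=0.5, cld_b=3.0, g_b=1.2)
```

If S and D are assumed to have equal power and to be mutually uncorrelated, the outputs h11·S + h12·D and h21·S + h22·D have a coherence, phase difference, level difference and overall gain equal to the four input parameters, which is the sense in which the matrix reinstates them.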

A further embodiment of the spatial decoder unit of the present invention may comprise a reverberation unit. It has been found that adding reverberation improves the perceived distance when binaural sound is produced. For this reason, the spatial decoder unit 23 of Fig. 7 is provided with a stereo reverberation unit 237 connected in parallel with the spatial synthesis unit 232. The stereo reverberation unit 237 of Fig. 7 receives the single downmix signal S in the QMF transform domain and outputs two reverberation signals, which are added to the transform-domain binaural signals (channels Lb and Rb in Fig. 6) by the adder unit 238. The combined signals are then inversely transformed by the inverse transform units 233 before being output.

In the embodiment of Fig. 8, the stereo reverberation unit 237 is arranged for producing reverberation in the time domain and receives the time-domain single downmix signal s. The stereo reverberation unit 237 outputs time-domain reverberation signals, which are added to the time-domain signals of the binaural channels lb and rb by the adder unit 238. Either embodiment provides suitable reverberation.

The present invention additionally provides a consumer device, such as a hand-held consumer device, and an audio system comprising a spatial decoder unit or spatial decoder device as defined above. The hand-held consumer device may be constituted by an MP3 player or a similar device. A consumer device is schematically illustrated in Fig. 9. The consumer device 50 is shown to comprise a spatial decoder device 2 according to the present invention (see Fig. 6).

The present invention is based on the insight that the computational complexity of a combined spatial decoder device and binaural synthesis device can be significantly reduced by modifying the spatial parameters in accordance with the binaural information. This allows the spatial decoder device to carry out spatial decoding and perceptual transfer function processing efficiently in the same signal processing operation, while avoiding the introduction of any artifacts.

It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words "comprise(s)" and "comprising" are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.

It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.

Claims

1. A spatial decoder unit (23) for producing a pair of binaural output channels (lb, rb) using spatial parameters (sp) and a single audio input channel (s), the device comprising:

- a parameter conversion unit (234) for converting the spatial parameters (sp) into binaural parameters (bp) using parameterized perceptual transfer functions (hp), the binaural parameters depending on both the spatial parameters and the parameterized perceptual transfer functions;

- a single transform unit (231) for transforming the single audio input channel (s) into a transformed audio channel (S);

- a decorrelation unit (239) for decorrelating the transformed audio channel (S) so as to generate a transformed decorrelated signal (D);

- a spatial synthesis unit (232) for synthesizing a pair of transformed binaural channels (Lb, Rb) by applying the binaural parameters (bp) to the transformed audio channel (S) and the transformed decorrelated signal (D); and

- a pair of inverse transform units (233) for inversely transforming the transformed binaural channels (Lb, Rb) into the pair of binaural output channels (lb, rb).

2. The spatial decoder unit according to claim 1, wherein the parameter conversion unit (234) is arranged for combining in the parameter domain, in order to determine the binaural parameters, all perceptual transfer function contributions that the audio input channel would make to the binaural channels.

3. The spatial decoder unit according to claim 1, wherein the parameter conversion unit (234) is arranged for processing channel level (CLD), channel coherence (ICC) and/or phase (IPD) parameters.

4. The spatial decoder unit according to claim 1, further comprising a stereo reverberation unit (237) arranged for operating in the time domain.

5. The spatial decoder unit according to claim 1, further comprising a stereo reverberation unit (237) arranged for operating in a transform domain or sub-band domain.

6. The spatial decoder unit according to claim 5, wherein the stereo reverberation unit (237) is arranged for operating in the QMF domain.

7. A spatial decoder device (2) for producing a pair of binaural output channels (lb, rb) from an input bitstream (bs), the device comprising:

- a demultiplexer unit (21) for demultiplexing the input bitstream into a single downmix channel and signal parameters (sp),

- a downmix decoder unit (22) for decoding the single downmix channel (s), and

- a spatial decoder unit (23) according to claim 1.

8. The device according to claim 7, wherein the spatial decoder unit (23) comprises a reverberation unit (237).

9. An audio system, comprising a spatial decoder unit (23) according to claim 1 and/or a spatial decoder device (2) according to claim 7.

10. A consumer device, comprising a spatial decoder unit (23) according to claim 1 and/or a spatial decoder device (2) according to claim 7.

11. A method of producing a pair of binaural output channels (lb, rb) using spatial parameters (sp) and a single audio input channel (s), the method comprising the steps of:

- converting the spatial parameters (sp) into binaural parameters (bp) using parameterized perceptual transfer functions (hp), the binaural parameters depending on both the spatial parameters and the parameterized perceptual transfer functions;

- transforming the single audio input channel (s) into a transformed audio channel (S);

- decorrelating the transformed audio channel (S) so as to generate a transformed decorrelated signal (D);

- synthesizing a pair of transformed binaural channels (Lb, Rb) by applying the binaural parameters (bp) to the transformed audio channel (S) and the transformed decorrelated signal (D); and

- inversely transforming the transformed binaural channels (Lb, Rb) into the pair of binaural output channels (lb, rb).