CN102157152A

CN102157152A - Method for coding stereo and device thereof

Info

Publication number: CN102157152A
Application number: CN2010101138059A
Authority: CN
Inventors: 吴文海; 苗磊; 郎玥; 张琦
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2010-02-12
Filing date: 2010-02-12
Publication date: 2011-08-17
Anticipated expiration: 2030-02-12
Also published as: US9105265B2; WO2011097915A1; US20120300945A1; CN102157152B

Abstract

The embodiment of the invention relates to a method for coding stereo. The method comprises the steps of: converting a stereo left channel signal and a stereo right channel signal on a time domain to a frequency domain so as to form into a left channel signal and a right channel signal on the frequency domain; performing the down mixing on the left channel signal and the right channel signal on the frequency domain to generate a single channel down mixing signal; transmitting the bit of the coded and quantified down mixing signal; extracting space parameters of the left channel signal and the right channel signal on the frequency domain; estimating the group delay and the group phase between a stereo left channel and a stereo right channel by the left channel signal and the right channel signal on the frequency domain; and coding and quantifying the group delay, the group phase and the space parameters to obtain the high-quality stereo coding property under low code rate.

Description

The method of stereo coding, device

Technical field

The embodiment of the invention relates to the multimedia field, relates in particular to a kind of stereo treatment technology, is specially method, the device of stereo coding.

Background technology

Existing stereo encoding method, intensity stereo is arranged, BCC (Binaual Cure Coding) and PS (Parametric-Stereo coding) coding method, normal conditions, adopt intensity coding need extract energy between left and right acoustic channels than ILD (InterChannel Level Difference) parameter, the ILD parameter is encoded as side information, and preferentially be sent to decoding end to help to recover stereophonic signal.ILD is a ubiquity and the characteristics of signals parameter that reflects acoustic field signal, ILD can embody preferably to the sound field energy, yet the stereo sound field that often has spatial context and left and right directions, only adopt and transmit the requirement that the stereosonic mode of ILD recovery reduction can not satisfy the recovery original stereo signal, so proposed to transmit more multiparameter with the scheme of better recovery stereophonic signal, except extracting the most basic ILD parameter, also propose to transmit the phase differential (IPD:InterChannel Phase Difference) of left and right acoustic channels and the simple crosscorrelation ICC parameter of left and right acoustic channels, sometimes also can comprise the L channel and following phase differential (OPD) parameter of mixed signal, the parameter of these reaction stereophonic signal spatial contexts and left and right directions sound field information and ILD parameter are encoded as side information jointly and send to decoding end with the reduction stereophonic signal.

Encoder bit rate is one of important evaluation factor of multimedia signal encoding performance, employing to low code check is the common target of pursuing of industry, existing stereo coding technology transmits LPD when transmitting ILD, ICC and OPD parameter certainly will need to improve encoder bit rate, because LPD, ICC and OPD parameter all are the local characteristics parameters of signal, the branch that is used to the to react stereophonic signal breath of taking a message, the LPD of encoded stereo signal, ICC and OPD parameter, each the branch band coding LPD that needs stereophonic signal, ICC and OPD parameter, each of stereophonic signal is divided band, each divides band IPD coding to need a plurality of bits, each divides band ICC coding to need a plurality of bits, the rest may be inferred, then the stereo coding parameter needs a large amount of bit numbers could strengthen the information of sound field, require the next part that can only strengthen to divide band at low code check, do not reach the effect of reduction true to nature, cause between the stereo information that recovers under the low code check and the original input signal bigger gap being arranged, from auditory effect, can bring extremely uncomfortable auditory perception to the listener.

Summary of the invention

The embodiment of the invention provides a kind of stereo encoding method, device and system, strengthens sound field information under the low code check, promotes code efficiency.

The embodiment of the invention provides a kind of method of stereo coding, and described method comprises:

Conversion time domain stereo left channel signal and right-channel signals form left channel signals and right-channel signals on the frequency domain to frequency domain; Left channel signals on the frequency domain and right-channel signals are mixed signal through mixing down to generate under the monophony, transmit the bit after described mixed signal down carries out coded quantization; The spatial parameter of left channel signals and right-channel signals on the extraction frequency domain; Utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels; Described group delay of quantization encoding and faciation position and described spatial parameter.

The embodiment of the invention provides a kind of method of estimating stereophonic signal, and described method comprises:

Determine cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain; Described cross correlation function to weighting carries out pre-service; Estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

The embodiment of the invention provides a kind of device of estimating stereophonic signal, and described device comprises:

The weighting cross-correlation unit is used for definite cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain; Pretreatment unit is used for the described cross correlation function of weighting is carried out pre-service; Estimation unit, estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

The embodiment of the invention provides a kind of equipment of stereophonic signal coding, and described equipment comprises:

Converting means is used for conversion time domain stereo left channel signal and right-channel signals and forms left channel signals and right-channel signals on the frequency domain to frequency domain; Under load in mixture and put, the left channel signals and the right-channel signals that are used on the frequency domain are mixed signal through mixing down to generate under the monophony; The parameter extraction device is used to extract the spatial parameter of left channel signals and right-channel signals on the frequency domain; Estimate the stereophonic signal device, be used to utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels; Code device is used for described group delay of quantization encoding and faciation position, mixes signal under described spatial parameter and the described monophony.

The embodiment of the invention provides a kind of system of stereophonic signal coding, and described system comprises:

Equipment, receiving equipment and the transfer equipment of stereophonic signal coding as mentioned above, receiving equipment is used to receive stereo input signal and is used for stereo coding equipment; Transfer equipment 52 is used to transmit the result of described stereo coding equipment 51.

Therefore, by introducing the embodiment of the invention, group delay and faciation position are estimated and are applied to stereo coding that feasible azimuth information method of estimation by the overall situation can obtain sound field information more accurately under low code check, strengthen sound field effect, promoted code efficiency greatly.

Description of drawings

In order to be illustrated more clearly in the embodiment of the invention or technical scheme of the prior art, to do to introduce simply to the accompanying drawing of required use in embodiment or the description of the Prior Art below, apparently, accompanying drawing in describing below only is some embodiments of the present invention, for those of ordinary skills, under the prerequisite of not paying creative work, can also obtain other accompanying drawing according to these accompanying drawings.

Fig. 1 is that a stereo encoding method is implemented synoptic diagram;

Fig. 2 is that another stereo encoding method is implemented synoptic diagram;

Fig. 3 is that another stereo encoding method is implemented synoptic diagram;

Fig. 4 a is that another stereo encoding method is implemented synoptic diagram;

Fig. 4 b is another stereo encoding method embodiment synoptic diagram;

Fig. 5 is that another stereo encoding method is implemented synoptic diagram;

Fig. 6 is that an estimation stereophonic signal device is implemented synoptic diagram;

Fig. 7 is that another estimation stereophonic signal device is implemented synoptic diagram;

Fig. 8 is that another estimation stereophonic signal device is implemented synoptic diagram;

Fig. 9 is that another estimation stereophonic signal device is implemented synoptic diagram;

Figure 10 is that another estimation stereophonic signal device is implemented synoptic diagram;

Figure 11 is that a stereophonic signal encoding device is implemented synoptic diagram;

Figure 12 is that a stereophonic signal coded system is implemented synoptic diagram;

Embodiment

Below in conjunction with the accompanying drawing in the embodiment of the invention, the technical scheme in the embodiment of the invention is clearly and completely described, obviously, described embodiment only is the present invention's part embodiment, rather than whole embodiment.Based on the embodiment among the present invention, those of ordinary skills belong to the scope of protection of the invention not making the every other embodiment that is obtained under the creative work prerequisite.

Embodiment one:

Fig. 1 is the synoptic diagram that a stereo encoding method is implemented, and comprising:

Step 101: conversion time domain stereo left channel signal and right-channel signals form left channel signals and right-channel signals on the frequency domain to frequency domain.

Step 102: L channel frequency-region signal on the frequency domain and R channel frequency-region signal mix signal (DMX) through mixing down to generate under the monophony, transmit the bit after the DMX signal carries out coded quantization, and the spatial parameter of left channel signals and right-channel signals on the frequency domain that extracts is carried out quantization encoding.

Spatial parameter is the parameter of representative stereophonic signal spatial character, as the ILD parameter.

Step 103: utilize left channel signals and the group delay between right-channel signals (Group Delay) and faciation position (Group Phase) on the left and right sound track signals estimation frequency domain on the frequency domain.

Group delay reflects the overall azimuth information of the time delays of the envelope between the stereo left and right acoustic channels, and the global information of the similarity of the waveform of stereo left and right acoustic channels behind time unifying is reflected in the faciation position.

Step 104: group delay that the described estimation of quantization encoding obtains and faciation position.

Group delay and faciation position form the content of waiting to transmit the side information code stream through quantization encoding.

In the method for embodiment of the invention stereo coding, when extracting stereophonic signal spatial character parameter, estimate group delay and faciation position, estimation obtains group delay and the faciation position is applied in the stereo coding, make spatial parameter and the overall situation the effective combination of azimuth information, azimuth information method of estimation by the overall situation can obtain sound field information more accurately under low code check, strengthen sound field effect, promoted code efficiency greatly.

Embodiment two:

Fig. 2 is the synoptic diagram of another stereo encoding method embodiment, comprising:

Step 201, conversion time domain stereo left channel signal and right-channel signals are formed on stereo left channel signal X on the frequency domain to frequency domain ₁(k) and right-channel signals X ₂(k), wherein k is the index value of the Frequency point of frequency signal.

Step 202 is descended to mix operation to left channel signals on the frequency domain and right-channel signals, and mixed signal and transmission under the coded quantization, and encoded stereo spatial parameter quantize to form side information and also transmit, and can comprise the steps:

Step 2021, left channel signals on the frequency domain and right-channel signals are descended to mix, and generate and mix signal DMX under the monophony after synthesizing.

Step 2022 is mixed signal DMX under the coded quantization monophony, and transmits the information that quantizes.

Step 2023, the left channel signals on the extraction frequency domain and the ILD parameter of right-channel signals.

Step 2024 is carried out quantization encoding to described ILD parameter and is formed side information and transmission.

2021,2022 steps and 2023,2024 steps are independent of each other mutually, can independently carry out, and the side information that the former forms can carry out multiplexing back with the side information that the latter forms and transmit.

In another embodiment, mix signal under the monophony that obtains and can carry out the time-domain signal that frequency-time domain transformation obtains mixing under the monophony signal DMX again through mixing down, the bit that the time-domain signal that mixes signal DMX under the monophony is carried out behind the coded quantization transmits.

Step 203 is estimated group delay and faciation position between the left and right sound track signals on the frequency domain.

Utilize left and right sound track signals on the frequency domain to estimate that group delay and faciation position between left and right sound track signals comprise the cross correlation function of determining about stereo left and right acoustic channels frequency-region signal, estimate to obtain the group delay and the faciation position of stereophonic signal according to the signal of cross correlation function, as shown in Figure 3, specifically can comprise the steps:

Step 2031 is determined about the cross correlation function between stereo left and right sound track signals on the frequency domain.

The cross correlation function of stereo left and right acoustic channels frequency-region signal can be the cross correlation function of weighting, in determining the process of cross correlation function, the cross correlation function of estimating group delay and faciation position is weighted operation and compares that to make the stereophonic signal coding result incline to more stable with other operations, the cross correlation function of weighting is the weighting of product of the conjugation of L channel frequency-region signal and R channel frequency-region signal, and the value of the cross correlation function of described weighting on half frequency of the length N of stereophonic signal time-frequency conversion is 0.The form of the cross correlation function of stereo left and right acoustic channels frequency-region signal can followingly be represented:

C_{r} (k) = \begin{matrix} W (k) X_{1} (k) {X^{*}}_{2} (k) & 0 \leq k \leq N / 2 \\ 0 & k > N / 2 \end{matrix},

Wherein w (k) represents weighting function, X ^* ₂(k) expression X ₂(k) conjugate function perhaps also can be expressed as: C _r(k)=X ₁(k) X ^* ₂(k) 0≤k≤N/2+1.In the form of another cross correlation function, in conjunction with different weighted type, the cross correlation function of stereo left and right acoustic channels frequency-region signal can followingly be represented:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2 {* X}_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) Dui Ying amplitude.The cross correlation function of weighting is at frequency 0, and frequency N/2 goes up and be that the inverse of left and right sound track signals amplitude product on corresponding frequency, the cross correlation function of weighting are left and right sound track signals 2 times of the inverse of amplitude product on other frequencies.In other enforcement, the cross correlation function of the weighting of stereo left and right acoustic channels frequency-region signal can also be expressed as other form, for example:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

To this, present embodiment does not limit, and any distortion of above-mentioned each formula is all within protection domain.

Step 2032 is carried out inverse time conversion frequently to the cross correlation function about the weighting of stereo left and right acoustic channels frequency-region signal and is obtained cross correlation function time-domain signal C _r(n), the cross correlation function time-domain signal is the signal of plural number herein.

Step 2033, estimation obtains the group delay and the faciation position of stereophonic signal according to the cross correlation function time-domain signal.

In another embodiment, can be directly estimate to obtain the group delay and the faciation position of stereophonic signal about the cross correlation function between stereo left and right sound track signals on the frequency domain according to what step 2031 was determined.

In step 2033, can directly estimate to obtain the group delay and the faciation position of stereophonic signal according to the cross correlation function time-domain signal; Also can carry out some Signal Pretreatment, estimate the group delay and the faciation position of stereophonic signal based on pretreated signal the cross correlation function time-domain signal.

If the cross correlation function time-domain signal is carried out some Signal Pretreatment, estimate that based on pretreated signal the group delay and the faciation position of stereophonic signal can comprise:

1) the cross correlation function time-domain signal is carried out normalized or smoothing processing;

Wherein the cross correlation function time-domain signal being carried out smoothing processing can followingly carry out:

C _ravg(n)＝α*C _ravg(n)+β*C _r(n)

Wherein, α and β are the constants of weighting, 0≤α≤1, β=1-α, in present embodiment, before estimating group delay and faciation position, carry out smoothly waiting pre-service to make that the group delay that estimates is better stable to the cross correlation function time-domain signal between the left and right acoustic channels that obtains.

2) the cross correlation function time-domain signal is carried out further carrying out smoothing processing after the normalized;

3) absolute value to the cross correlation function time-domain signal carries out normalized or smoothing processing;

Wherein the absolute value of cross correlation function time-domain signal being carried out smoothing processing can followingly carry out:

C _{ravg_abs}(n)＝α*C _ravg(n)+β*|C _r(n)|，

4) the cross correlation function time-domain signal carries out normalized absolute value signal afterwards and further carries out smoothing processing.

Understandable, before the group delay and faciation position of estimating stereophonic signal, can also comprise other processing for the pre-treatment of cross correlation function time-domain signal, auto-correlation processing etc. for example, this moment, the pre-service to the cross correlation function time-domain signal also comprised auto-correlation or/and smoothing processing etc.

Pre-treatment in conjunction with above-mentioned cross correlation function time-domain signal, estimate the group delay and the identical estimation mode of faciation position employing of stereophonic signal in the step 2033, also can estimate respectively, concrete, can adopt the embodiment of following estimation faciation position and group delay at least:

Step 2033 embodiment one, shown in Fig. 4 a:

Estimate to obtain group delay according to the cross correlation function time-domain signal or based on the index of the value correspondence of amplitude maximum in the cross correlation function time-domain signal after handling, obtain the phase angle of the cross correlation function correspondence of group delay correspondence, estimate to obtain the faciation position, comprise the steps:

The relation of the index of judging the value correspondence of amplitude maximum in the time-domain signal cross correlation function and the symmetric interval relevant with transform length N, in one embodiment, if the index of the value correspondence of amplitude maximum is smaller or equal to N/2 in the time-domain signal cross correlation function, group delay equals the index of the value correspondence of amplitude maximum in this time-domain signal cross correlation function so, if the index of the value correspondence of amplitude maximum is greater than N/2 in the related function, group delay deducts transform length N for this index so, can be with [0, N/2] and (N/2, N] regard first symmetric interval and second symmetric interval relevant as with stereophonic signal time-frequency conversion length N, in another kind is implemented, the scope of judging can be [0, m] and (N-m, N] first symmetric interval and second symmetric interval, wherein m is less than N/2, the index of the value correspondence of amplitude maximum and the relevant information of m compare in the time-domain signal cross correlation function, the index of the value correspondence of amplitude maximum is positioned at interval [0 in time domain signal cross correlation function, m], then group delay equals the index of the value correspondence of amplitude maximum in this time-domain signal cross correlation function, the index of the value correspondence of amplitude maximum is positioned at interval (N-m in time domain signal cross correlation function, N], then group delay deducts transform length N for this index.But in actual applications, judge can be amplitude maximum in the time-domain signal cross correlation function the value correspondence index close on value, under the condition that does not influence subjective effect or qualification according to demand can suitably select to be slightly smaller than the index of value correspondence of amplitude maximum as Rule of judgment, as the index of second largest value correspondence of amplitude or and the index that differs in the value correspondence of fixing or preset range of amplitude maximal value all be suitable for, index with the value correspondence of amplitude maximum in the time-domain signal cross correlation function is an example, and a kind of concrete form embodies as follows:

Arg max|C wherein _Ravg(n) | be C _Ravg(n) index of the value correspondence of amplitude maximum in, present embodiment is protected the various distortion of above-mentioned form equally.

According to the phase angle of the time-domain signal cross correlation function correspondence of group delay correspondence, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle of corresponding cross correlation value correspondence estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is exactly d _gThe phase angle of corresponding cross correlation value correspondence on the+N index, specifically can adopt any distortion of following a kind of form or this form to embody:

∠ C wherein _Ravg(d _g) be time-domain signal cross-correlation function value C _Ravg(d _g) phase angle, ∠ C _Ravg(d _g+ N) be time-domain signal cross-correlation function value C _Ravg(d _g+ N) phase angle.

Step 2033 embodiment two, shown in Fig. 4 b:

To described cross correlation function, or, extract its phase place based on the described cross correlation function after handling

Function ∠ C wherein _r(k is used to extract plural C _r(k) phase angle is asked for the average α of phase differential in frequency of low strap ₁Determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information, same, obtain faciation position information according to the phase place of the current frequency of described cross correlation function and the difference of frequency index and phase differential average product, specifically can adopt following mode:

α_{1} = E {\hat{Φ} (k + 1) - \hat{Φ} (k)}, k < Max;

d_{g} = \frac{a_{1} N}{2 * π * Fs};

θ_{g} = E {\hat{Φ} (k) - a_{1} * k}, k < Max

Wherein

The average of representing phase differential, the frequency of Fs for adopting, Max prevents the phase place rotation for calculating the upper limit of ending of group delay and faciation position.

Step 204: described group delay of quantization encoding and faciation position form side information and transmit.

Pre-if in the scope group delay is carried out scalar quantization at random, this scope is the positive negative value [Max of symmetry, Max] or the usable levels under the condition at random, adopt long time transmission or employing differential coding to handle to the group delay after the scalar quantization and obtain side information, the span of faciation position is usually [0,2*PI] in the scope, be specifically as follows [0,2*PI), also can be (PI, PI] scope in scalar quantization and coding are carried out in the faciation position, the side information that the group delay behind the quantization encoding and faciation position are formed carries out multiplexing formation encoding code stream, is sent to the stereophonic signal recovery device.

In the method for embodiment of the invention stereo coding, utilize the left and right sound track signals on the frequency domain to estimate that group delay and the faciation position that can embody signal overall situation azimuth information between the stereophonic signal left and right acoustic channels make that the azimuth information of sound field is effectively strengthened, the estimation of stereophonic signal spatial character parameter and group delay and faciation position combined be applied in the little stereo coding of demand bit rate, make spatial information and the overall situation the effective combination of azimuth information, obtain sound field information more accurately, strengthen sound field effect, promoted code efficiency greatly.

Embodiment three

The synoptic diagram that Fig. 5 implements for another stereo encoding method comprises:

On the enforcement basis of embodiment one and embodiment two, stereo coding also comprises respectively:

Step 105/205: estimate to obtain stereo parameter IPD according to described faciation position and group delay information, quantize described IPD parameter and transmission.

When quantizing IPD, estimate with group delay (Group Delay) and faciation position (Group Phase)

And carry out difference processing with original IPD (k), and the IPD of difference is carried out quantization encoding, can followingly represent:

\overset{&OverBar;}{IPD (k)} = \frac{- 2 π d_{g} * k}{N} + θ_{g}, 1 \leq k \leq N / 2 - 1

And to IPD _Diff(k) quantize, the bit after the quantification is delivered to decoding end, in another embodiment, also can directly quantize IPD, and bit stream is high slightly, quantizes more accurate.

In the present embodiment, estimate that stereo parameter IPD and coded quantization can promote code efficiency under the situation that has high code rate to use, strengthen sound field effect.

Embodiment four,

Fig. 6 is that the device 04 of an estimation stereophonic signal is implemented synoptic diagram, comprising:

Weighting cross-correlation unit 41 is used for definite cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain.

Weighting cross-correlation unit 41 receives stereo left and right sound track signals on the frequency domain, the stereo left and right sound track signals of frequency domain is handled the cross correlation function that obtains about the weighting between the stereo left and right sound track signals of frequency domain.

Pretreatment unit 42 is used for the described cross correlation function of weighting is carried out pre-service.

Pretreatment unit 42 receives the described cross correlation function of the weighting that obtains according to weighting cross-correlation unit 41, and the described cross correlation function of weighting is carried out pre-service, obtains the pre-service result, promptly pretreated cross correlation function time-domain signal.

Estimation unit 43 is estimated group delay and faciation position between stereo left and right sound track signals according to the pre-service result.

Estimation unit 43 receives the pre-service result of pretreatment unit 42, obtain pretreated cross correlation function time-domain signal, the information of extracting described cross correlation function time-domain signal in addition judgement or comparison or calculating operation estimates to obtain group delay and faciation position between stereo left and right sound track signals.

Among this another embodiment, the device 04 of estimating stereophonic signal can also comprise frequency-time domain transformation unit 44, be used to receive the output of weighting cross-correlation unit 41, described cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain is carried out inverse time conversion frequently obtain the cross correlation function time-domain signal, and send described cross correlation function time-domain signal to described pretreatment unit 42.

By introducing the embodiment of the invention, group delay and faciation position are estimated and are applied to stereo coding, feasible azimuth information method of estimation by the overall situation can obtain sound field information more accurately under low code check, strengthened sound field effect, promotes code efficiency greatly.

Embodiment five,

Fig. 7 implements synoptic diagram for another device 04 of estimating stereophonic signal, comprising:

Weighting cross-correlation unit 41 receives stereo left and right sound track signals on the frequency domain, the stereo left and right sound track signals of frequency domain is handled the cross correlation function that obtains about the weighting between the stereo left and right sound track signals of frequency domain.The cross correlation function of stereo left and right acoustic channels frequency-region signal can be the cross correlation function of weighting, make coding result more stable, the cross correlation function of weighting is the weighting of product of the conjugation of L channel frequency-region signal and R channel frequency-region signal, and the value of the cross correlation function of described weighting on half frequency of the length N of stereophonic signal time-frequency conversion is 0.The form of the cross correlation function of the weighting of stereo left and right acoustic channels frequency-region signal can followingly be represented:

C_{r} (k) = \begin{matrix} W (k) X_{1} (k) {X^{*}}_{2} (k) & 0 \leq k \leq N / 2 \\ 0 & k > N / 2 \end{matrix},

Wherein w (k) represents weighting function, X ^* ₂(k) expression X ₂(k) conjugate function perhaps also can be expressed as: C _r(k)=X ₁(k) X ^* ₂(k) 0≤k≤N/2+1.In the form of the cross correlation function of another weighting, in conjunction with different weighted type, the cross correlation function of the weighting of stereo left and right acoustic channels frequency-region signal can followingly be represented:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) Dui Ying amplitude.The cross correlation function of weighting is at frequency 0, and frequency N/2 goes up and be that the inverse of left and right sound track signals amplitude product on corresponding frequency, the cross correlation function of weighting are left and right sound track signals 2 times of the inverse of amplitude product on other frequencies.

Perhaps also can adopt following form with and the distortion:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix} .

Frequency-time domain transformation unit 44, what receive that weighting cross-correlation unit 41 determines determines cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain, the cross correlation function about the weighting of stereo left and right acoustic channels frequency-region signal is carried out inverse time conversion frequently obtain cross correlation function time-domain signal C _r(n), the cross correlation function time-domain signal is the signal of plural number herein.

Pretreatment unit 42 receives frequency-time domain transformation according to the described cross correlation function time-domain signal that described cross correlation function obtains, and described cross correlation function is carried out pre-service, obtains the pre-service result, promptly passes through pretreated cross correlation function time-domain signal.

Pretreatment unit 42 can comprise in the following unit one or more according to different demands: normalization unit 421, pretreatment unit 422 and absolute value element 423.

1) the 421 pairs of cross correlation function time-domain signals in normalization unit carry out normalized or 422 pairs of cross correlation function time-domain signals of pretreatment unit and carry out pre-service and handle.

Wherein the cross correlation function time-domain signal being carried out pre-service handles and can followingly carry out: C _Ravg(n)=α * C _Ravg(n)+β * C _r(n)

Wherein, α and β are the constants of weighting, 0≤α≤1, β=1-α, in present embodiment, before estimating group delay and faciation position, the cross correlation function of the weighting between the left and right acoustic channels that obtains carried out pre-service such as pre-service and make that the group delay that estimates is better stable.

2) the 421 pairs of cross correlation function time-domain signals in normalization unit carry out after the normalized, and pretreatment unit 422 further carries out the pre-service processing to the result of normalization unit 421.

3) absolute value element 423 obtains the absolute value information of cross correlation function time-domain signal, the 421 pairs of described absolute value information in normalization unit carry out normalized or 422 pairs of described absolute value information of pretreatment unit are carried out the pre-service processing, perhaps carry out normalization earlier and carry out the pre-service processing again.

Wherein the absolute value of cross correlation function time-domain signal is carried out pre-service and handles and can followingly carry out,

C _{ravg_abs}(n)＝α*C _ravg(n)+β*|C _r(n)|。

4) the cross correlation function time-domain signal carries out normalized absolute value signal afterwards and further carries out the pre-service processing.

Pretreatment unit 42 is before the group delay and faciation position of estimating stereophonic signal, the processing unit that can also comprise other for the pre-treatment of cross correlation function time-domain signal, auto-correlation unit 424 etc. for example, the pre-service of 42 pairs of cross correlation function time-domain signals of pretreatment unit this moment also comprises auto-correlation or/and pre-service processing etc.

In another embodiment, described estimation stereophonic signal device 04 also can not comprise pretreatment unit, the result of frequency-time domain transformation unit 44 is directly sent in the following estimation unit 43 of described estimation stereophonic signal device 4, estimation unit 43 is used for estimating to obtain group delay according to the cross correlation function time-domain signal of weighting or based on the index of the value correspondence of the cross correlation function time-domain signal amplitude maximum of the weighting after handling, obtain the phase angle of the time-domain signal cross correlation function correspondence of group delay correspondence, estimate to obtain the faciation position.

Estimation unit 43, estimate group delay and faciation position between stereo left and right sound track signals according to the output of the output of pretreatment unit 42 or frequency-time domain transformation unit 44, as shown in Figure 8, estimation unit 43 further comprises: 431 judging units, receive the cross correlation function time frequency signal of pretreatment unit 42 or 44 outputs of frequency-time domain transformation unit, the relation of the index of judging the value correspondence of amplitude maximum in the time-domain signal cross correlation function and the symmetric interval relevant with transform length N, judged result is sent to group delay unit 432, the group delay that excites group delay unit 432 to estimate between the stereophonic signal left and right acoustic channels, in one embodiment, if the result of judging unit 431 is that the index of value correspondence of amplitude maximum in the time-domain signal cross correlation function is smaller or equal to N/2, group delay unit 432 estimates that group delay equals the index of the value correspondence of amplitude maximum in this time-domain signal cross correlation function, if the result of judging unit 431 is that the index of value correspondence of amplitude maximum in the related function is greater than N/2, group delay unit 432 estimates that group delay deducts transform length N for this index, can be with [0, N/2] and (N/2, N] regard first symmetric interval and second symmetric interval relevant as with stereophonic signal time-frequency conversion length N, in another kind is implemented, the scope of judging can be [0, m] and (N-m, N] first symmetric interval and second symmetric interval, wherein m is less than N/2, the index of the value correspondence of amplitude maximum and the relevant information of m compare in the time-domain signal cross correlation function, the index of the value correspondence of amplitude maximum is positioned at interval [0 in time domain signal cross correlation function, m], then group delay equals the index of the value correspondence of amplitude maximum in this time-domain signal cross correlation function, the index of the value correspondence of amplitude maximum is positioned at interval (N-m in time domain signal cross correlation function, N], then group delay deducts transform length N for this index.But in actual applications, judge can be amplitude maximum in the time-domain signal cross correlation function the value correspondence index close on value, under the condition that does not influence subjective effect or qualification according to demand can suitably select to be slightly smaller than the index of value correspondence of amplitude maximum as Rule of judgment, as the index of second largest value correspondence of amplitude or and the index that differs in the value correspondence of fixing or preset range of amplitude maximal value all be suitable for, comprise any distortion of following a kind of form or this form:

Arg max|C wherein _Ravg(n) | be C _Ravg(n) index of the value correspondence of amplitude maximum in.Faciation bit location 433 receives group delay unit 432 results, according to the phase angle of the group delay time-domain signal cross correlation function correspondence of estimating to obtain, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle of corresponding cross correlation value correspondence estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is exactly d _gThe phase angle of corresponding cross correlation value correspondence on the+N index, specifically can adopt any distortion of following a kind of form or this form to embody:

Among another embodiment, the device 04 of described estimation stereophonic signal also comprises parameter characteristic unit 45, and as shown in Figure 9, stereo parameter IPD is estimated to obtain according to described faciation position and group delay information in the parameter characteristic unit.

Embodiment six,

Figure 10 implements synoptic diagram for another device 04 ' of estimating stereophonic signal, with embodiment five different being, the cross correlation function of the weighting of the stereo left and right acoustic channels frequency-region signal that the weighting cross-correlation unit is determined in the present embodiment sends pretreatment unit 42 or estimation unit 43 to, estimation unit 43 extracts the phase place of cross correlation function, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information, obtain faciation position information according to the phase place of the current frequency of cross correlation function and the difference of frequency index and phase differential average product.

Estimation unit 43 is estimated group delay and faciation position between stereo left and right sound track signals according to the output of the output of pretreatment unit 42 or weighting cross-correlation unit 41, estimation unit 43 further comprises: the 430 pairs of cross correlation functions in phase extraction unit, or, extract its phase place based on the cross correlation function after handling Function ∠ C wherein _r(k is used to extract plural C _r(k) phase angle, group delay unit 432 ' are asked for the average α of phase differential in frequency of low strap ₁Faciation position unit 433 ' determines group delay according to the product of phase differential and transform length and the ratio relation of frequency information, same, obtain faciation position information according to the phase place of the current frequency of cross correlation function and the difference of frequency index and phase differential average product, specifically can adopt following mode:

α_{1} = E {\hat{Φ} (k + 1) - \hat{Φ} (k)}, k < Max

d_{g} = - \frac{a_{1} N}{2 * π * Fs}

θ_{g} = E {\hat{Φ} (k) - a_{1} * k}, k < Max

Wherein

In the equipment of embodiment of the invention stereo coding, utilize the left and right sound track signals on the frequency domain to estimate that group delay and the faciation position that can embody signal overall situation azimuth information between the stereophonic signal left and right acoustic channels make that the azimuth information of sound field is effectively strengthened, the estimation of stereophonic signal spatial character parameter and group delay and faciation position combined be applied in the little stereo coding of demand bit rate, make spatial information and the overall situation the effective combination of azimuth information, obtain sound field information more accurately, strengthen sound field effect, promoted code efficiency greatly.

Embodiment seven,

Figure 11 is that the equipment 51 of stereophonic signal coding is implemented synoptic diagram, comprising:

Converting means 01 is used for conversion time domain stereo left channel signal and right-channel signals and forms left channel signals and right-channel signals on the frequency domain to frequency domain;

Under load in mixture and put 02, the left channel signals and the right-channel signals that are used on the frequency domain are mixed signal through mixing down to generate under the monophony;

Parameter extraction device 03 is used to extract the spatial parameter of left channel signals and right-channel signals on the frequency domain;

Estimate stereophonic signal device 04, be used to utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels;

Code device 05 is used for described group delay of quantization encoding and faciation position, mixes signal under described spatial parameter and the described monophony.

Estimate that wherein stereo 04 is applicable to the foregoing description four-embodiment six, estimate that stereophonic signal device 04 receives through left channel signals and right-channel signals on the frequency domain that obtains behind the converting means 01, utilize left and right sound track signals on the described frequency domain to take embodiment arbitrary among embodiment four-embodiment six to estimate to obtain group delay and faciation position between stereo left and right acoustic channels, and group delay and the faciation position that obtains be sent to code device 05, equally, code device 05 is also received the spatial parameter of left channel signals and right-channel signals on the frequency domain that extracts to parameter extraction device 03,05 pair of information that receives of code device is carried out quantization encoding and is formed side information, and code device 05 also quantizes the bit after described mixed signal down carries out coded quantization.Described code device 05 can be as a whole, be used to receive different information and carry out quantization encoding, also can be separated into a plurality of code devices and handle the different information that receives, as first code device 501 with under load in mixture and put 02 and be connected, be used for mixed information under the quantization encoding, second code device 502 is connected with the parameter extraction device, be used for the described spatial parameter of quantization encoding, the 3rd code device 503, be used for and estimate that the stereophonic signal device is connected, and is used for described group delay of quantization encoding and faciation position.In another embodiment, comprise parameter characteristic unit 45 if estimate stereophonic signal device 04, described code device can also comprise that the 4th code device is used for quantization encoding IPD.When quantizing IPD, estimate with group delay (Group Delay) and faciation position (Group Phase)

\overset{&OverBar;}{IPD (k)} = \frac{- 2 π d_{g} * k}{N} + θ_{g}, 1 \leq k \leq N / 2 - 1

And to IPD _Diff(k) bit after quantizing to obtain quantizing in another embodiment, also can directly quantize IPD, and bit stream is high slightly, quantizes more accurate.

The equipment 51 of described stereo coding can carry out the equipment of encoding process for stereophonic encoder or other according to different demands to stereo multi-channel signal.

Embodiment eight

Figure 12 is that the system 666 of stereophonic signal coding implements synoptic diagram, also comprises on the basis of stereophonic signal encoding device 51 as described in embodiment seven:

Receiving equipment 50 receives stereo input signal and is used for stereophonic signal encoding device 51; Transfer equipment 52 is used to transmit the result of described stereo coding equipment 51, and transfer equipment 52 sends to decoding end with the result of stereo coding equipment and is used for decoding generally speaking.

One of ordinary skill in the art will appreciate that all or part of flow process that realizes in the foregoing description method, be to instruct relevant hardware to finish by computer program, described program can be stored in the computer read/write memory medium, this program can comprise the flow process as the embodiment of above-mentioned each side method when carrying out.Wherein, described storage medium can be magnetic disc, CD, read-only storage memory body (Read-Only Memory, ROM) or at random store memory body (Random AccessMemory, RAM) etc.

It should be noted that at last: above embodiment only in order to the explanation embodiment of the invention technical scheme but not limit it, although the embodiment of the invention is had been described in detail with reference to preferred embodiment, those of ordinary skill in the art is to be understood that: it still can make amendment or be equal to replacement the technical scheme of the embodiment of the invention, and these modifications or be equal to replacement and also can not make amended technical scheme break away from the spirit and scope of embodiment of the invention technical scheme.

Claims

1. the method for a stereo coding is characterized in that, described method comprises:

Conversion time domain stereo left channel signal and right-channel signals form left channel signals and right-channel signals on the frequency domain to frequency domain;

Left channel signals on the frequency domain and right-channel signals are mixed signal through mixing down to generate under the monophony, transmit the bit after described mixed signal down carries out coded quantization;

The spatial parameter of left channel signals and right-channel signals on the extraction frequency domain;

Utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels;

Described group delay of quantization encoding and faciation position and described spatial parameter.

2. the method for claim 1, it is characterized in that: describedly utilize left and right sound track signals on the frequency domain to estimate to comprise before group delay between stereo left and right acoustic channels and the faciation position to determine about the cross correlation function between stereo left and right sound track signals on the frequency domain, described cross correlation function comprises the simple crosscorrelation of the weighting of the product of the conjugation of left channel signals and right-channel signals on the frequency domain.

3. method as claimed in claim 2 is characterized in that: the simple crosscorrelation of the weighting between the stereo left and right sound track signals of frequency domain can be expressed as:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Or

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) Dui Ying amplitude.The described cross correlation function of weighting is at frequency 0, and the value on the frequency N/2 is the inverse of left and right sound track signals amplitude product on corresponding frequency, and the described cross correlation function of weighting is left and right sound track signals on other frequencies 2 times of the inverse of amplitude product.

4. method as claimed in claim 3 is characterized in that: described method comprises that also described cross correlation function is carried out inverse time conversion frequently obtains the cross correlation function time-domain signal,

Or described cross correlation function is carried out inverse time conversion frequently obtain the cross correlation function time-domain signal, described time-domain signal is carried out pre-service.

5. method as claimed in claim 4 is characterized in that: according to the cross correlation function time-domain signal, describedly utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels to comprise:

Estimate to obtain group delay according to the cross correlation function time-domain signal or based on the index of the value correspondence of amplitude maximum in the cross correlation function time-domain signal after handling, obtain the phase angle of the cross correlation function correspondence of group delay correspondence, estimate to obtain the faciation position.

6. method as claimed in claim 3 is characterized in that: according to described cross correlation function, describedly utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels to comprise:

Extract the phase place of described cross correlation function, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information;

Obtain faciation position information according to the phase place of the current frequency of cross correlation function of weighting and the difference of frequency index and phase differential average product.

7. as claim 5 or 6 described methods, it is characterized in that, described method also comprises according to described faciation position and group delay estimates to obtain the stereo branch breath of taking a message, the described branch of the quantization encoding breath of taking a message, the described branch breath of taking a message comprises: the phase differential parameter between left and right acoustic channels, simple crosscorrelation parameter and/or L channel and the phase differential parameter of mixed signal down.

8. a method of estimating stereophonic signal is characterized in that, described method comprises:

Determine cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain;

Described weighting cross correlation function is carried out pre-service;

Estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

9. method as claimed in claim 8 is characterized in that: the cross correlation function of the weighting of the stereo left and right sound track signals of frequency domain can be expressed as:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Or

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) Dui Ying amplitude.The cross correlation function of described weighting is at frequency 0, and the value on the frequency N/2 is the inverse of left and right sound track signals amplitude product on corresponding frequency, and the cross correlation function of described weighting is left and right sound track signals on other frequencies 2 times of the inverse of amplitude product.

10. method as claimed in claim 9 is characterized in that, described method also comprises: the cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain is carried out inverse time conversion frequently obtain the cross correlation function time-domain signal.

11. method as claimed in claim 10 is characterized in that, described cross correlation function time-domain signal is carried out pre-service comprise the cross correlation function time-domain signal is carried out normalized and smoothing processing that wherein said smoothing processing comprises:

C _ravg(n)＝α*C _ravg(n)+β*C _r(n)，

Perhaps the absolute value signal of described cross correlation function time-domain signal is carried out normalized and smoothing processing, wherein said smoothing processing comprises:

C _{ravg_abs}(n)＝α*C _ravg(n)+β*|C _r(n)|。

12. method as claimed in claim 11 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain stereophonic signal comprise:

The relation of the index of judging the value correspondence of amplitude maximum in the time-domain signal cross correlation function and the symmetric interval relevant with stereophonic signal time-frequency conversion length N, if the index of the value correspondence of amplitude maximum is positioned at first symmetric interval in the time-domain signal cross correlation function, group delay equals the index of the value correspondence of amplitude maximum in this time-domain signal cross correlation function so, if the index of the value correspondence of amplitude maximum is positioned at second symmetric interval in the related function, group delay deducts N for this index;

According to the phase angle of the cross correlation function correspondence of group delay correspondence, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle of corresponding cross correlation value correspondence estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is d _gThe phase angle of corresponding cross correlation value correspondence on the+N index.

13. method as claimed in claim 12 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain stereophonic signal comprise:

Group delay

d_{g} = \begin{matrix} \arg \max | C_{ravg} (n) | & \arg \max | C_{ravg} (n) | \leq N / 2 \\ \arg \max | C_{ravg} (n) | - N & \arg \max | C_{ravg} (n) | > N / 2 \end{matrix},

The faciation position

θ_{g} = \begin{matrix} &angle; C_{ravg} (d_{g}) & d_{g} &GreaterEqual; 0 \\ &angle; C_{ravg} (d_{g} + N) & d_{g} < 0 \end{matrix},

Wherein, N is the length of stereophonic signal time-frequency conversion, argmax|C _Ravg(n) | be C _Ravg(n) index of the value correspondence of amplitude maximum in, ∠ C _Ravg(d _g) be cross-correlation function value C _Ravg(d _g) phase angle, ∠ C _Ravg(d _g+ N) be cross-correlation function value C _Ravg(d _g+ N) phase angle.

14. method as claimed in claim 8 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain between stereo left and right sound track signals comprise:

To described cross correlation function, or, extract it based on the cross correlation function after handling

Phase place

Function ∠ C wherein _r(k is used to extract plural C _r(k) phase angle;

In frequency of low strap, ask for the average α of phase differential ₁, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information, obtain the faciation position according to the phase place of the current frequency of described cross correlation function and the difference of frequency index and phase differential average product.

15. method as claimed in claim 14 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain between stereo left and right sound track signals comprise:

α_{1} = E {\hat{Φ} (k + 1) - \hat{Φ} (k)}, k < Max;

d_{g} = - \frac{a_{1} N}{2 * π * Fs};

θ_{g} = E {\hat{Φ} (k) - a_{1} * k}, k < Max,

Wherein

16. a device of estimating stereophonic signal is characterized in that, described device comprises:

The weighting cross-correlation unit is used for definite cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain;

Pretreatment unit is used for the cross correlation function of described weighting is carried out pre-service;

Estimation unit, estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

17. device as claimed in claim 16 is characterized in that, described device also comprises:

The frequency-time domain transformation unit carries out inverse time conversion frequently to the cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain and obtains the cross correlation function time-domain signal.

18. device as claimed in claim 17 is characterized in that, describedly estimates to obtain the group delay of stereophonic signal and the estimation unit of faciation position comprises according to the pre-service result:

Judging unit is used for judging the index of value correspondence of time-domain signal cross correlation function amplitude maximum and the relation of the symmetric interval relevant with stereophonic signal time-frequency conversion length N;

The group delay unit, if the index of the value correspondence of amplitude maximum is positioned at first symmetric interval in the time-domain signal cross correlation function, group delay equals the index of the value correspondence of amplitude maximum in this time-domain signal cross correlation function so, if the index of the value correspondence of amplitude maximum is positioned at second symmetric interval in the related function, group delay deducts N for this index;

The faciation bit location is used for the phase angle according to the cross correlation function correspondence of group delay correspondence, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle of corresponding cross correlation value correspondence estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is d _gThe phase angle of corresponding cross correlation value correspondence on the+N index.

19. device as claimed in claim 16 is characterized in that, describedly estimates to obtain group delay between stereo left and right sound track signals and the estimation unit of faciation position comprises according to the pre-service result:

The phase extraction unit is used for described cross correlation function, or based on the cross correlation function after handling, extracts its phase place

Function ∠ C wherein _r(k is used to extract plural C _r(k) phase angle;

The group delay unit is used for asking for the average α of phase differential in frequency of low strap ₁, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information;

Faciation position unit is used for obtaining faciation position information according to the phase place of the current frequency of described cross correlation function and the difference of frequency index and phase differential average product.

20. device as claimed in claim 16 is characterized in that, described device also comprises the parameter characteristic unit, is used for estimating to obtain stereo parameter IPD according to described faciation position and group delay.

21. the equipment of a stereophonic signal coding is characterized in that described equipment comprises:

Converting means is used for conversion time domain stereo left channel signal and right-channel signals and forms left channel signals and right-channel signals on the frequency domain to frequency domain;

Under load in mixture and put, the left channel signals and the right-channel signals that are used on the frequency domain are mixed signal through mixing down to generate under the monophony;

The parameter extraction device is used to extract the spatial parameter of left channel signals and right-channel signals on the frequency domain;

Estimate the stereophonic signal device, be used to utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels;

Code device is used for described group delay of quantization encoding and faciation position, mixes signal under described spatial parameter and the described monophony.

22. equipment as claimed in claim 21, it is characterized in that: described estimation stereophonic signal device utilizes left and right sound track signals on the frequency domain to estimate also to comprise before group delay between stereo left and right acoustic channels and the faciation position to determine about the cross correlation function between stereo left and right sound track signals on the frequency domain, and described cross correlation function comprises the simple crosscorrelation of the weighting of the product of the conjugation of left channel signals and right-channel signals on the frequency domain.

23. as claim 20 or 22 described equipment, it is characterized in that: the cross correlation function about the weighting between stereo left and right sound track signals on the frequency domain that described estimation stereophonic signal device is determined can be expressed as:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Or

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2 * X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

24. device as claimed in claim 23 is characterized in that: described estimation stereophonic signal device comprises the frequency-time domain transformation unit, is used for described cross correlation function is carried out the time-domain signal that inverse time conversion frequently obtains cross correlation function.

25. device as claimed in claim 24, it is characterized in that: described estimation stereophonic signal device comprises estimation unit, be used for estimating to obtain group delay according to the cross correlation function time-domain signal or based on the index of the value correspondence of the cross correlation function time-domain signal amplitude maximum after handling, obtain the phase angle of the cross correlation function correspondence of group delay correspondence, estimate to obtain the faciation position.

26. device as claimed in claim 24, it is characterized in that: described estimation stereophonic signal device comprises estimation unit, be used to extract the phase place of described cross correlation function, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information; Obtain faciation position information according to the phase place of the current frequency of cross correlation function and the difference of frequency index and phase differential average product.

27. the system of a stereo coding is characterized in that, described system comprises that receiving equipment is used to receive stereo input signal and is used for stereo coding equipment as the arbitrary described stereo coding equipment of claim 21-26, receiving equipment and transfer equipment; Transfer equipment 52 is used to transmit the result of described stereo coding equipment 51.