CN103366748A

CN103366748A - Stereo coding method and device

Info

Publication number: CN103366748A
Application number: CN2013102709304A
Authority: CN
Inventors: 吴文海; 苗磊; 郎玥; 张琦
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2010-02-12
Filing date: 2010-02-12
Publication date: 2013-10-23

Abstract

The embodiment of the invention relates to a stereo coding method, which comprises the following steps of transforming a left channel signal and a right channel signal of stereo in a time domain into a frequency domain to form a left channel signal and a right channel signal in the frequency domain; performing down-mixing on the left channel signal and the right channel signal in the frequency domain to generate a single-channel down-mixed signal, and transmitting bits of the coded and quantized down-mixed signal; extracting spatial parameters of the left channel signal and the right channel signal in the frequency domain; estimating a group delay and a group phase between the left and right channels of the stereo by utilizing the left channel signal and the right channel signal in the frequency domain; and quantitatively coding the group delay, the group phase and the spatial parameters to achieve high stereo coding performance under a low code rate.

Description

The method of stereo coding, device

Technical field

The embodiment of the invention relates to MultiMedia Field, relates in particular to a kind of stereo treatment technology, is specially method, the device of stereo coding.

Background technology

Existing stereo encoding method, intensity stereo is arranged, BCC (Binaual Cure Coding) and PS (Parametric-Stereo coding) coding method, normal conditions, the employing intensity coding need to extract the energy Ratios ILD(InterChannel Level Difference between left and right acoustic channels) parameter, the ILD parameter is encoded as side information, and preferentially be sent to decoding end to help to recover stereophonic signal.ILD is a ubiquity and the characteristics of signals parameter that reflects acoustic field signal, ILD can embody preferably to the sound field energy, yet the stereo sound field that often has spatial context and left and right directions, only adopt and transmit the requirement that the stereosonic mode of ILD recovery reduction can not satisfy the recovery original stereo signal, so proposed to transmit more multiparameter with the scheme of better recovery stereophonic signal, except extracting the most basic ILD parameter, also propose to transmit the phase differential (IPD:InterChannel Phase Difference) of left and right acoustic channels and the simple crosscorrelation ICC parameter of left and right acoustic channels, phase differential (OPD) parameter that sometimes also can comprise L channel and lower mixed signal is jointly encoded the parameter of these reaction stereophonic signal spatial contexts and left and right directions sound field information and ILD parameter and is sent to decoding end with the reduction stereophonic signal as side information.

Encoder bit rate is one of important evaluation factor of multimedia signal encoding performance, employing to low code check is the common target of pursuing of industry, existing stereo coding technology transmits LPD when transmitting ILD, ICC and OPD parameter certainly will need to improve encoder bit rate, because LPD, ICC and OPD parameter all are the local characteristics parameters of signal, minute information that is used for the reaction stereophonic signal, the LPD of encoded stereo signal, ICC and OPD parameter, each minute band coding LPD that needs stereophonic signal, ICC and OPD parameter, each of stereophonic signal minute band, each minute band IPD coding needs a plurality of bits, each minute band ICC coding needs a plurality of bits, the rest may be inferred, then the stereo coding parameter needs a large amount of bit numbers could strengthen the information of sound field, require the next part that can only strengthen to divide band at low code check, do not reach the effect of reduction true to nature, cause between the stereo information that recovers under the low code check and the original input signal larger gap being arranged, from auditory effect, can bring extremely uncomfortable auditory perception to the listener.

Summary of the invention

The embodiment of the invention provides a kind of stereo encoding method, device and system, strengthens sound field information under the low code check, promotes code efficiency.

The embodiment of the invention provides a kind of method of stereo coding, and described method comprises:

Conversion time domain stereo left channel signal and right-channel signals form left channel signals and right-channel signals on the frequency domain to frequency domain; Left channel signals on the frequency domain and right-channel signals transmit the bit after described lower mixed signal carries out coded quantization through mixed signal under the lower mixed generation monophony; The spatial parameter of left channel signals and right-channel signals on the extraction frequency domain; Utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels; The described group delay of quantization encoding and faciation position and described spatial parameter.

The embodiment of the invention provides a kind of method of estimating stereophonic signal, and described method comprises:

Determine the cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain; Described cross correlation function to weighting carries out pre-service; Estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

The embodiment of the invention provides a kind of device of estimating stereophonic signal, and described device comprises:

The weighting cross-correlation unit is used for definite cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain; Pretreatment unit is used for the described cross correlation function of weighting is carried out pre-service; Estimation unit, estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

The embodiment of the invention provides a kind of equipment of coding of stereo signals, and described equipment comprises:

Converting means is used for conversion time domain stereo left channel signal and right-channel signals and forms left channel signals and right-channel signals on the frequency domain to frequency domain; Lower mixing device is for mixed signal under the left channel signals on the frequency domain and the lower mixed generation monophony of right-channel signals process; The parameter extraction device is for the spatial parameter of left channel signals and right-channel signals on the extraction frequency domain; Estimate the stereophonic signal device, be used for utilizing left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels; Code device is used for the described group delay of quantization encoding and faciation position, mixed signal under described spatial parameter and the described monophony.

The embodiment of the invention provides a kind of system of coding of stereo signals, and described system comprises:

The as mentioned above equipment of coding of stereo signals, receiving equipment and transfer equipment, receiving equipment are used for receiving stereo input signal and are used for stereo coding equipment; Transfer equipment 52 is for the result who transmits described stereo coding equipment 51.

Therefore, by introducing the embodiment of the invention, group delay and faciation position are estimated and are applied to stereo coding, so that can obtain more accurately sound field information by the azimuth information method of estimation of the overall situation under low code check, strengthen sound field effect, promoted greatly code efficiency.

Description of drawings

In order to be illustrated more clearly in the embodiment of the invention or technical scheme of the prior art, the below will do to introduce simply to the accompanying drawing of required use in embodiment or the description of the Prior Art, apparently, accompanying drawing in the following describes only is some embodiments of the present invention, for those of ordinary skills, under the prerequisite of not paying creative work, can also obtain according to these accompanying drawings other accompanying drawing.

Fig. 1 is that a stereo encoding method is implemented schematic diagram;

Fig. 2 is that another stereo encoding method is implemented schematic diagram;

Fig. 3 is that another stereo encoding method is implemented schematic diagram;

Fig. 4 a is that another stereo encoding method is implemented schematic diagram;

Fig. 4 b is another stereo encoding method embodiment schematic diagram;

Fig. 5 is that another stereo encoding method is implemented schematic diagram;

Fig. 6 is that an estimation stereophonic signal device is implemented schematic diagram;

Fig. 7 is that another estimation stereophonic signal device is implemented schematic diagram;

Fig. 8 is that another estimation stereophonic signal device is implemented schematic diagram;

Fig. 9 is that another estimation stereophonic signal device is implemented schematic diagram;

Figure 10 is that another estimation stereophonic signal device is implemented schematic diagram;

Figure 11 is that a coding of stereo signals equipment is implemented schematic diagram;

Figure 12 is a coding of stereo signals System Implementation schematic diagram;

Embodiment

Below in conjunction with the accompanying drawing in the embodiment of the invention, the technical scheme in the embodiment of the invention is clearly and completely described, obviously, described embodiment only is the present invention's part embodiment, rather than whole embodiment.Based on the embodiment among the present invention, those of ordinary skills belong to the scope of protection of the invention not making the every other embodiment that obtains under the creative work prerequisite.

Embodiment one:

Fig. 1 is the schematic diagram that a stereo encoding method is implemented, and comprising:

Step 101: conversion time domain stereo left channel signal and right-channel signals form left channel signals and right-channel signals on the frequency domain to frequency domain.

Step 102: the L channel frequency-region signal on the frequency domain and R channel frequency-region signal are through mixed signal (DMX) under the lower mixed generation monophony, transmit the bit after the DMX signal carries out coded quantization, and the spatial parameter of left channel signals and right-channel signals on the frequency domain that extracts is carried out quantization encoding.

Spatial parameter is the parameter of representative stereophonic signal spatial character, such as the ILD parameter.

Step 103: utilize left channel signals and the group delay between right-channel signals (Group Delay) and faciation position (Group Phase) on the left and right sound track signals estimation frequency domain on the frequency domain.

Group delay reflects the overall azimuth information of the time delays of the envelope between the stereo left and right acoustic channels, and the global information of the similarity of the waveform of stereo left and right acoustic channels behind time unifying is reflected in the faciation position.

Step 104: the group delay that the described estimation of quantization encoding obtains and faciation position.

Group delay and faciation position form the content of side information code stream to be transmitted through quantization encoding.

In the method for embodiment of the invention stereo coding, when extracting stereophonic signal spatial character parameter, estimate group delay and faciation position, estimation obtains group delay and the faciation position is applied in the stereo coding, so that the effective combination of azimuth information of spatial parameter and the overall situation, azimuth information method of estimation by the overall situation can obtain more accurately sound field information under low code check, strengthen sound field effect, promoted greatly code efficiency.

Embodiment two:

Fig. 2 is the schematic diagram of another stereo encoding method embodiment, comprising:

Step 201, conversion time domain stereo left channel signal and right-channel signals are formed on stereo left channel signal X on the frequency domain to frequency domain ₁(k) and right-channel signals X ₂(k), wherein k is the index value of the Frequency point of frequency signal.

Step 202 is carried out lower mixed operation to the left channel signals on the frequency domain and right-channel signals, mixed signal and transmission under the coded quantization, and encoded stereo spatial parameter, and quantification forms side information and transmits, and can comprise the steps:

Step 2021, the left channel signals on the frequency domain and right-channel signals are carried out lower mixed, generate mixed signal DMX under the monophony after synthesizing.

Step 2022, mixed signal DMX under the coded quantization monophony, and transmit the information that quantizes.

Step 2023, the left channel signals on the extraction frequency domain and the ILD parameter of right-channel signals.

Step 2024 is carried out quantization encoding to described ILD parameter and is formed side information and transmission.

2021,2022 steps and 2023,2024 steps are independent of each other mutually, can independently carry out, and the side information that the former forms can carry out multiplexing rear transmission with the side information that the latter forms.

In another embodiment, can carry out the time-domain signal that frequency-time domain transformation obtains mixed signal DMX under the monophony through mixed signal under the lower mixed monophony that obtains, the bit that the time-domain signal of mixed signal DMX under the monophony is carried out behind the coded quantization transmits again.

Step 203 is estimated group delay and faciation position between the left and right sound track signals on the frequency domain.

Utilize left and right sound track signals on the frequency domain to estimate that group delay and faciation position between left and right sound track signals comprise the cross correlation function of determining about stereo left and right acoustic channels frequency-region signal, obtain group delay and the faciation position of stereophonic signal according to the Signal estimation of cross correlation function, as shown in Figure 3, specifically can comprise the steps:

Step 2031 is determined about the cross correlation function between stereo left and right sound track signals on the frequency domain.

The cross correlation function of stereo left and right acoustic channels frequency-region signal can be the cross correlation function of weighting, in the process of determining cross correlation function the cross correlation function of estimating group delay and faciation position being weighted operation compares with other operations so that the coding of stereo signals result inclines to stable more, the cross correlation function of weighting is the weighting of product of the conjugation of L channel frequency-region signal and R channel frequency-region signal, and the value of the cross correlation function of described weighting on half frequency of the length N of stereophonic signal time-frequency conversion is 0.The form of the cross correlation function of stereo left and right acoustic channels frequency-region signal can followingly represent:

C_{r} (k) = \begin{matrix} W (k) X_{1} (k) {X^{*}}_{2} (k) & 0 \leq k \leq N / 2 \\ 0 & k > N / 2 \end{matrix},

W(k wherein) expression weighting function, X ^* ₂(k) expression X ₂(k) conjugate function perhaps also can be expressed as: C _r(k)=X ₁(k) X ^* ₂(k) 0≤k≤N/2+1.In the form of another cross correlation function, in conjunction with different weighted type, the cross correlation function of stereo left and right acoustic channels frequency-region signal can followingly represent:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) corresponding amplitude.The cross correlation function of weighting is at frequency 0, and frequency N/2 is upper to be the inverse of left and right sound track signals amplitude product on corresponding frequency, and the cross correlation function of weighting is left and right sound track signals 2 times of the inverse of amplitude product at other frequencies.In other enforcement, the cross correlation function of the weighting of stereo left and right acoustic channels frequency-region signal can also be expressed as other form, for example:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

To this, the present embodiment does not limit, and the random variation of above-mentioned each formula is all within protection domain.

Step 2032 is carried out frequently conversion of inverse time to the cross correlation function about the weighting of stereo left and right acoustic channels frequency-region signal and is obtained cross correlation function time-domain signal C _r(n), the cross correlation function time-domain signal is the signal of plural number herein.

Step 2033, estimation obtains group delay and the faciation position of stereophonic signal according to the cross correlation function time-domain signal.

In another embodiment, can be directly estimate to obtain group delay and the faciation position of stereophonic signal according to what step 2031 was determined about the cross correlation function between stereo left and right sound track signals on the frequency domain.

In step 2033, can directly estimate to obtain group delay and the faciation position of stereophonic signal according to the cross correlation function time-domain signal; Also can carry out some Signal Pretreatment to the cross correlation function time-domain signal, based on group delay and the faciation position of pretreated Signal estimation stereophonic signal.

If the cross correlation function time-domain signal is carried out some Signal Pretreatment, can comprise based on group delay and the faciation position of pretreated Signal estimation stereophonic signal:

1) the cross correlation function time-domain signal is carried out normalized or smoothing processing;

Wherein the cross correlation function time-domain signal being carried out smoothing processing can followingly carry out:

C _ravg(n)=α*C _ravg(n)+β*C _r(n)

Wherein, α and β are the constants of weighting, 0≤α≤1, β=1-α, in the present embodiment, before estimating group delay and faciation position, the cross correlation function time-domain signal between the left and right acoustic channels that obtains is carried out smoothly waiting pre-service so that the group delay that estimates is better stable.

2) the cross correlation function time-domain signal is carried out further carrying out smoothing processing after the normalized;

3) absolute value of cross correlation function time-domain signal carried out normalized or smoothing processing;

Wherein the absolute value of cross correlation function time-domain signal being carried out smoothing processing can followingly carry out:

C _{ravg_abs}(n)=α*C _ravg(n)+β*|C _r(n)|，

4) the cross correlation function time-domain signal carries out normalized absolute value signal afterwards and further carries out smoothing processing.

Understandable, before the group delay and faciation position of estimating stereophonic signal, can also comprise other processing for the pre-treatment of cross correlation function time-domain signal, such as auto-correlation processing etc., this moment is to also the comprising auto-correlation or/and smoothing processing etc. the pre-service of cross correlation function time-domain signal.

Pre-treatment in conjunction with above-mentioned cross correlation function time-domain signal, estimate group delay and the identical estimation mode of faciation position employing of stereophonic signal in the step 2033, also can estimate respectively, concrete, can adopt at least the embodiment of following estimation faciation position and group delay:

Step 2033 embodiment one, shown in Fig. 4 a:

Estimate to obtain group delay according to the cross correlation function time-domain signal or based on the index corresponding to value of amplitude maximum in the cross correlation function time-domain signal after processing, obtain phase angle corresponding to cross correlation function corresponding to group delay, estimate to obtain the faciation position, comprise the steps:

Judge the relation of index corresponding to the value of amplitude maximum in the time-domain signal cross correlation function and the symmetric interval relevant with transform length N, in one embodiment, if index corresponding to the value of amplitude maximum is less than or equal to N/2 in the time-domain signal cross correlation function, group delay equals the index corresponding to value of amplitude maximum in this time-domain signal cross correlation function so, if index corresponding to the value of amplitude maximum is greater than N/2 in the related function, group delay deducts transform length N for this index so, can be with [0, N/2] and (N/2, N] regard first symmetric interval and second symmetric interval relevant with stereophonic signal time-frequency conversion length N as, in another kind is implemented, the scope of judging can be [0, m] and (N-m, N] the first symmetric interval and the second symmetric interval, wherein m is less than N/2, index corresponding to the value of amplitude maximum and the relevant information of m compare in the time-domain signal cross correlation function, the index corresponding to value of amplitude maximum is positioned at interval [0 in time domain Signal cross correlation function, m], then group delay equals the index corresponding to value of amplitude maximum in this time-domain signal cross correlation function, the index corresponding to value of amplitude maximum is positioned at interval (N-m in time domain Signal cross correlation function, N], then group delay deducts transform length N for this index.But in actual applications, what judge can be the value of closing on of index corresponding to the value of amplitude maximum in the time-domain signal cross correlation function, under the condition that does not affect subjective effect or restriction according to demand can suitably select to be slightly smaller than index corresponding to the value of amplitude maximum as Rule of judgment, as index corresponding to second largest value of amplitude or and index corresponding to value that differ in fixing or preset range of amplitude maximal value all be suitable for, index corresponding to the value of amplitude maximum is as example in the time-domain signal cross correlation function, and a kind of concrete form embodies as follows:

d_{g} = \begin{matrix} \arg \max | C_{ravg} (n) | & \arg \max | C_{ravg} (n) | \leq N / 2 \\ \arg \max | C_{ravg} (n) | - N & \arg \max | C_{ravg} (n) | > N / 2 \end{matrix},

Argmax|C wherein _Ravg(n) | be C _Ravg(n) index corresponding to value of amplitude maximum in, the present embodiment is protected the various distortion of above-mentioned form equally.

Phase angle corresponding to time-domain signal cross correlation function corresponding according to group delay is as group delay d _gMore than or equal to zero, by determining d _gThe phase angle that corresponding cross correlation value is corresponding estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is exactly d _gThe phase angle corresponding to cross correlation value of correspondence on the+N index, specifically can adopt the random variation of following a kind of form or this form to embody:

θ_{g} = \begin{matrix} &angle; C_{ravg} (d_{g}) d_{g} &GreaterEqual; 0 \\ &angle; C_{ravg} (d_{g} + N) d_{g} < 0 \end{matrix},

∠ C wherein _Ravg(d _g) be time-domain signal cross-correlation function value C _Ravg(d _g) phase angle, ∠ C _Ravg(d _g+ N) be time-domain signal cross-correlation function value C _Ravg(d _g+ N) phase angle.

Step 2033 embodiment two, shown in Fig. 4 b:

To described cross correlation function, or based on the described cross correlation function after processing, extract its phase place

Function ∠ C wherein _r(k is used for extracting plural C _r(k) phase angle is asked for the average α of phase differential in frequency of low strap ₁Determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information, same, obtain faciation position information according to the phase place of the current frequency of described cross correlation function and the difference of frequency index and phase differential average product, specifically can adopt following mode:

α_{1} = \begin{matrix} E {\hat{Φ} (k + 1) - \hat{Φ} (k)} & k < Max \end{matrix};

d_{g} = - \frac{a_{1} N}{2^{*} π^{*} Fs};

θ_{g} = \begin{matrix} E {\hat{Φ} (k) - {a_{1}}^{*} k} & k < {Max}_{10} \end{matrix}

Wherein

The average of expression phase differential, the frequency of Fs for adopting, Max prevents phase rotating for calculating the cut-off upper limit of group delay and faciation position.

Step 204: the described group delay of quantization encoding and faciation position form side information and transmit.

Pre-if in the random scope group delay is carried out scalar quantization, this scope is symmetrical positive negative value [Max, Max] or Stochastic Conditions under usable levels, group delay after the scalar quantization is adopted long time tranfer or adopts the differential coding processing to obtain side information, the span of faciation position is usually [0,2*PI] in the scope, be specifically as follows [0,2*PI), also can be (PI, PI] scope in scalar quantization and coding are carried out in the faciation position, the side information that the group delay behind the quantization encoding and faciation position are formed carries out multiplexing formation encoding code stream, is sent to the stereophonic signal recovery device.

In the method for embodiment of the invention stereo coding, utilize left and right sound track signals on the frequency domain to estimate the group delay that can embody signal overall situation azimuth information between the stereophonic signal left and right acoustic channels and faciation position so that the azimuth information of sound field is effectively strengthened, the estimation of stereophonic signal spatial character parameter and group delay and faciation position combined be applied in the little stereo coding of demand bit rate, so that the effective combination of azimuth information of spatial information and the overall situation, obtain more accurately sound field information, strengthen sound field effect, promoted greatly code efficiency.

Embodiment three

The schematic diagram that Fig. 5 implements for another stereo encoding method comprises:

On the enforcement basis of embodiment one and embodiment two, stereo coding also comprises respectively:

Step 105/205: obtain stereo parameter IPD according to described faciation position and group delay information estimator, quantize described IPD parameter and transmission.

When quantizing IPD, estimate with group delay (Group Delay) and faciation position (Group Phase)

And carry out difference processing with original IPD (k), and the IPD of difference is carried out quantization encoding, can followingly represent:

\overset{&OverBar;}{IPD (k)} = \frac{- 2 π {d_{g}}^{*} k}{N} + θ_{g},

1≤k≤N/2-1

And to IPD _Diff(k) quantize, the bit after the quantification is delivered to decoding end, in another embodiment, also can directly quantize IPD, and bit stream is slightly high, quantizes more accurate.

In the present embodiment, estimate that stereo parameter IPD and coded quantization can promote code efficiency in the situation that there is high code rate to use, and strengthen sound field effect.

Embodiment four,

Fig. 6 is that the device 04 of an estimation stereophonic signal is implemented schematic diagram, comprising:

Weighting cross-correlation unit 41 is used for definite cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain.

Weighting cross-correlation unit 41 receives stereo left and right sound track signals on the frequency domain, the stereo left and right sound track signals of frequency domain is processed the cross correlation function that obtains about the weighting between the stereo left and right sound track signals of frequency domain.

Pretreatment unit 42 is used for the described cross correlation function of weighting is carried out pre-service.

Pretreatment unit 42 receives the described cross correlation function of the weighting that obtains according to weighting cross-correlation unit 41, and the described cross correlation function of weighting is carried out pre-service, obtains the pre-service result, i.e. pretreated cross correlation function time-domain signal.

Estimation unit 43 is estimated group delay and faciation position between stereo left and right sound track signals according to the pre-service result.

Estimation unit 43 receives the pre-service result of pretreatment unit 42, obtain pretreated cross correlation function time-domain signal, the information of extracting described cross correlation function time-domain signal judged relatively or calculating operation estimate to obtain group delay and faciation position between stereo left and right sound track signals.

Among this another embodiment, the device 04 of estimating stereophonic signal can also comprise frequency-time domain transformation unit 44, be used for receiving the output of weighting cross-correlation unit 41, described cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain is carried out frequently conversion of inverse time obtain the cross correlation function time-domain signal, and send described cross correlation function time-domain signal to described pretreatment unit 42.

By introducing the embodiment of the invention, group delay and faciation position are estimated and are applied to stereo coding, so that under low code check, can obtain more accurately sound field information by the azimuth information method of estimation of the overall situation, strengthened sound field effect, promote greatly code efficiency.

Embodiment five,

Fig. 7 implements schematic diagram for another device 04 of estimating stereophonic signal, comprising:

Weighting cross-correlation unit 41 receives stereo left and right sound track signals on the frequency domain, the stereo left and right sound track signals of frequency domain is processed the cross correlation function that obtains about the weighting between the stereo left and right sound track signals of frequency domain.The cross correlation function of stereo left and right acoustic channels frequency-region signal can be the cross correlation function of weighting, so that coding result is more stable, the cross correlation function of weighting is the weighting of product of the conjugation of L channel frequency-region signal and R channel frequency-region signal, and the value of the cross correlation function of described weighting on half frequency of the length N of stereophonic signal time-frequency conversion is 0.The form of the cross correlation function of the weighting of stereo left and right acoustic channels frequency-region signal can followingly represent:

C_{r} (k) = \begin{matrix} W (k) X_{1} (k) {X^{*}}_{2} (k) & 0 \leq k \leq N / 2 \\ 0 & k > N / 2 \end{matrix},

W(k wherein) expression weighting function, X ^* ₂(k) expression X ₂(k) conjugate function perhaps also can be expressed as: C _r(k)=X ₁(k) X ^* ₂(k) 0≤k≤N/2+1.In the form of the cross correlation function of another weighting, in conjunction with different weighted type, the cross correlation function of the weighting of stereo left and right acoustic channels frequency-region signal can followingly represent:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) corresponding amplitude.The cross correlation function of weighting is at frequency 0, and frequency N/2 is upper to be the inverse of left and right sound track signals amplitude product on corresponding frequency, and the cross correlation function of weighting is left and right sound track signals 2 times of the inverse of amplitude product at other frequencies.

Perhaps also can adopt following form with and the distortion:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix} .

Frequency-time domain transformation unit 44, what receive that weighting cross-correlation unit 41 determines determines cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain, the cross correlation function about the weighting of stereo left and right acoustic channels frequency-region signal is carried out frequently conversion of inverse time obtain cross correlation function time-domain signal C _r(n), the cross correlation function time-domain signal is the signal of plural number herein.

Pretreatment unit 42 receives frequency-time domain transformation according to the described cross correlation function time-domain signal that described cross correlation function obtains, and described cross correlation function is carried out pre-service, obtains the pre-service result, namely passes through pretreated cross correlation function time-domain signal.

Pretreatment unit 42 can comprise in the following unit one or more according to different demands: normalization unit 421, pretreatment unit 422 and absolute value element 423.

1) the 421 pairs of cross correlation function time-domain signals in normalization unit carry out normalized or 422 pairs of cross correlation function time-domain signals of pretreatment unit and carry out pre-service and process.

Wherein the cross correlation function time-domain signal being carried out pre-service processes and can followingly carry out: C _Ravg(n)=α * C _Ravg(n)+β * C _r(n)

Wherein, α and β are the constants of weighting, 0≤α≤1, β=1-α, in the present embodiment, before estimating group delay and faciation position, the cross correlation function of the weighting between the left and right acoustic channels that obtains carried out the pre-service such as pre-service so that the group delay that estimates is better stable.

2) the 421 pairs of cross correlation function time-domain signals in normalization unit carry out after the normalized, and pretreatment unit 422 further carries out the pre-service processing to the result of normalization unit 421.

3) absolute value element 423 obtains the absolute value information of cross correlation function time-domain signal, the 421 pairs of described absolute value information in normalization unit carry out normalized or 422 pairs of described absolute value information of pretreatment unit are carried out the pre-service processing, perhaps carry out first normalization and carry out the pre-service processing again.

Wherein the absolute value of cross correlation function time-domain signal is carried out pre-service and processes and can followingly carry out,

C _{ravg_abs}(n)=α*C _ravg(n)+β*|C _r(n)|。

4) the cross correlation function time-domain signal carries out normalized absolute value signal afterwards and further carries out the pre-service processing.

Pretreatment unit 42 is before the group delay and faciation position of estimating stereophonic signal, the processing unit that can also comprise other for the pre-treatment of cross correlation function time-domain signal, such as auto-correlation unit 424 etc., the pre-service of 42 pairs of cross correlation function time-domain signals of pretreatment unit this moment also comprises auto-correlation or/and pre-service processing etc.

In another embodiment, described estimation stereophonic signal device 04 also can not comprise pretreatment unit, the result of frequency-time domain transformation unit 44 is directly sent in the following estimation unit 43 of described estimation stereophonic signal device 4, estimation unit 43 is used for estimating to obtain group delay according to the cross correlation function time-domain signal of weighting or based on the index corresponding to value of the cross correlation function time-domain signal amplitude maximum of the weighting after processing, obtain phase angle corresponding to time-domain signal cross correlation function corresponding to group delay, estimate to obtain the faciation position.

Estimation unit 43, according to group delay and the faciation position between the stereo left and right sound track signals of output estimation of the output of pretreatment unit 42 or frequency-time domain transformation unit 44, as shown in Figure 8, estimation unit 43 further comprises: 431 judging units, receive the cross correlation function time frequency signal of pretreatment unit 42 or 44 outputs of frequency-time domain transformation unit, judge the relation of index corresponding to the value of amplitude maximum in the time-domain signal cross correlation function and the symmetric interval relevant with transform length N, judged result is sent to group delay unit 432, the group delay that excites group delay unit 432 to estimate between the stereophonic signal left and right acoustic channels, in one embodiment, if the result of judging unit 431 is that index corresponding to the value of amplitude maximum in the time-domain signal cross correlation function is less than or equal to N/2, group delay unit 432 estimates that group delay equals the index corresponding to value of amplitude maximum in this time-domain signal cross correlation function, if the result of judging unit 431 is that index corresponding to the value of amplitude maximum in the related function is greater than N/2, group delay unit 432 estimates that group delay deducts transform length N for this index, can be with [0, N/2] and (N/2, N] regard first symmetric interval and second symmetric interval relevant with stereophonic signal time-frequency conversion length N as, in another kind is implemented, the scope of judging can be [0, m] and (N-m, N] the first symmetric interval and the second symmetric interval, wherein m is less than N/2, index corresponding to the value of amplitude maximum and the relevant information of m compare in the time-domain signal cross correlation function, the index corresponding to value of amplitude maximum is positioned at interval [0 in time domain Signal cross correlation function, m], then group delay equals the index corresponding to value of amplitude maximum in this time-domain signal cross correlation function, the index corresponding to value of amplitude maximum is positioned at interval (N-m in time domain Signal cross correlation function, N], then group delay deducts transform length N for this index.But in actual applications, what judge can be the value of closing on of index corresponding to the value of amplitude maximum in the time-domain signal cross correlation function, under the condition that does not affect subjective effect or restriction according to demand can suitably select to be slightly smaller than index corresponding to the value of amplitude maximum as Rule of judgment, as index corresponding to second largest value of amplitude or and index corresponding to value that differ in fixing or preset range of amplitude maximal value all be suitable for, comprise following a kind of form or the random variation of this form:

d_{g} = \begin{matrix} \arg \max | C_{ravg} (n) | & \arg \max | C_{ravg} (n) | \leq N / 2 \\ \arg \max | C_{ravg} (n) | - N & \arg \max | C_{ravg} (n) | > N / 2 \end{matrix},

Argmax|C wherein _Ravg(n) | be C _Ravg(n) index corresponding to value of amplitude maximum in.Faciation bit location 433 receives group delay unit 432 results, according to phase angle corresponding to group delay time-domain signal cross correlation function of estimating to obtain, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle that corresponding cross correlation value is corresponding estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is exactly d _gThe phase angle corresponding to cross correlation value of correspondence on the+N index, specifically can adopt the random variation of following a kind of form or this form to embody:

θ_{g} = \begin{matrix} &angle; C_{ravg} (d_{g}) d_{g} &GreaterEqual; 0 \\ &angle; C_{ravg} (d_{g} + N) d_{g} < 0 \end{matrix},

Among another embodiment, the device 04 of described estimation stereophonic signal also comprises parameter characteristic unit 45, and as shown in Figure 9, the parameter characteristic unit obtains stereo parameter IPD according to described faciation position and group delay information estimator.

Embodiment six,

Figure 10 implements schematic diagram for another device 04 ' of estimating stereophonic signal, from embodiment five different being, the cross correlation function of the weighting of the stereo left and right acoustic channels frequency-region signal that the weighting cross-correlation unit is determined in the present embodiment sends pretreatment unit 42 or estimation unit 43 to, estimation unit 43 extracts the phase place of cross correlation function, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information, obtain faciation position information according to the phase place of the current frequency of cross correlation function and the difference of frequency index and phase differential average product.

Estimation unit 43 is according to group delay and faciation position between the stereo left and right sound track signals of output estimation of the output of pretreatment unit 42 or weighting cross-correlation unit 41, estimation unit 43 further comprises: the 430 pairs of cross correlation functions in phase extraction unit, or based on the cross correlation function after processing, extract its phase place

Function ∠ C wherein _r(k is used for extracting plural C _r(k) phase angle, group delay unit 432 ' are asked for the average α of phase differential in frequency of low strap ₁Faciation position unit 433 ' determines group delay according to the product of phase differential and transform length and the ratio relation of frequency information, same, obtain faciation position information according to the phase place of the current frequency of cross correlation function and the difference of frequency index and phase differential average product, specifically can adopt following mode:

α_{1} = \begin{matrix} E {\hat{Φ} (k + 1) - \hat{Φ} (k)} & k < Max \end{matrix}

d_{g} = - \frac{a_{1} N}{2^{*} π^{*} Fs}

θ_{g} = \begin{matrix} E {\hat{Φ} (k) - {a_{1}}^{*} k} & k < Max \end{matrix}

Wherein

In the equipment of embodiment of the invention stereo coding, utilize left and right sound track signals on the frequency domain to estimate the group delay that can embody signal overall situation azimuth information between the stereophonic signal left and right acoustic channels and faciation position so that the azimuth information of sound field is effectively strengthened, the estimation of stereophonic signal spatial character parameter and group delay and faciation position combined be applied in the little stereo coding of demand bit rate, so that the effective combination of azimuth information of spatial information and the overall situation, obtain more accurately sound field information, strengthen sound field effect, promoted greatly code efficiency.

Embodiment seven,

Figure 11 is that the equipment 51 of a coding of stereo signals is implemented schematic diagram, comprising:

Converting means 01 is used for conversion time domain stereo left channel signal and right-channel signals and forms left channel signals and right-channel signals on the frequency domain to frequency domain;

Lower mixing device 02 is for mixed signal under the left channel signals on the frequency domain and the lower mixed generation monophony of right-channel signals process;

Parameter extraction device 03 is for the spatial parameter of left channel signals and right-channel signals on the extraction frequency domain;

Estimate stereophonic signal device 04, be used for utilizing left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels;

Code device 05 is used for the described group delay of quantization encoding and faciation position, mixed signal under described spatial parameter and the described monophony.

Estimate that wherein stereo 04 is applicable to above-described embodiment four-embodiment six, estimate that stereophonic signal device 04 receives through left channel signals and right-channel signals on the frequency domain that obtains behind the converting means 01, utilize left and right sound track signals on the described frequency domain to take embodiment arbitrary among embodiment four-embodiment six to estimate to obtain group delay and faciation position between stereo left and right acoustic channels, and group delay and the faciation position that obtains be sent to code device 05, equally, code device 05 is also received the spatial parameter of left channel signals and right-channel signals on the frequency domain that extracts to parameter extraction device 03,05 pair of information that receives of code device is carried out quantization encoding and is formed side information, and code device 05 also quantizes the bit after described lower mixed signal carries out coded quantization.Described code device 05 can be as a whole, be used for receiving different information and carry out quantization encoding, also can be separated into a plurality of code devices and process the different information that receives, connect with lower mixing device 02 such as the first code device 501, be used for mixed information under the quantization encoding, the second code device 502 is connected with the parameter extraction device, be used for the described spatial parameter of quantization encoding, the 3rd code device 503, be used for and be connected that the stereophonic signal device connects, for the described group delay of quantization encoding and faciation position.In another embodiment, comprise parameter characteristic unit 45 if estimate stereophonic signal device 04, described code device can also comprise that the 4th code device is used for quantization encoding IPD.When quantizing IPD, estimate with group delay (Group Delay) and faciation position (Group Phase)

\overset{&OverBar;}{IPD (k)} = \frac{- 2 π {d_{g}}^{*} k}{N} + θ_{g},

1≤k≤N/2-1

And to IPD _Diff(k) bit after quantizing to obtain quantizing in another embodiment, also can directly quantize IPD, and bit stream is slightly high, quantizes more accurate.

The equipment 51 of described stereo coding can be stereophonic encoder or other equipment that stereo multi-channel signal is encoded and processed according to different demands.

Embodiment eight

Figure 12 is that the system 666 of a coding of stereo signals implements schematic diagram, also comprises on the basis of coding of stereo signals equipment 51 as described in embodiment seven:

Receiving equipment 50 receives stereo input signal and is used for coding of stereo signals equipment 51; Transfer equipment 52, for the result who transmits described stereo coding equipment 51, transfer equipment 52 sends to decoding end for decoding with the result of stereo coding equipment generally speaking.

One of ordinary skill in the art will appreciate that all or part of flow process that realizes in above-described embodiment method, to come the relevant hardware of instruction to finish by computer program, described program can be stored in the computer read/write memory medium, this program can comprise the flow process such as the embodiment of above-mentioned each side method when carrying out.Wherein, described storage medium can be magnetic disc, CD, read-only store-memory body (Read-Only Memory, ROM) or random store-memory body (Random Access Memory, RAM) etc.

It should be noted that at last: above embodiment is only in order to technical scheme that the embodiment of the invention is described but not limit it, although with reference to preferred embodiment the embodiment of the invention is had been described in detail, those of ordinary skill in the art is to be understood that: it still can be made amendment or be equal to replacement the technical scheme of the embodiment of the invention, and these modifications or be equal to replacement and also can not make amended technical scheme break away from the spirit and scope of embodiment of the invention technical scheme.

Claims

1. the method for a stereo coding is characterized in that, described method comprises:

Conversion time domain stereo left channel signal and right-channel signals form left channel signals and right-channel signals on the frequency domain to frequency domain;

Left channel signals on the frequency domain and right-channel signals transmit the bit after described lower mixed signal carries out coded quantization through mixed signal under the lower mixed generation monophony;

The spatial parameter of left channel signals and right-channel signals on the extraction frequency domain;

Utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels;

The described group delay of quantization encoding and faciation position and described spatial parameter.

2. the method for claim 1, it is characterized in that: describedly utilize left and right sound track signals on the frequency domain to estimate to comprise before group delay between stereo left and right acoustic channels and the faciation position to determine about the cross correlation function between stereo left and right sound track signals on the frequency domain, described cross correlation function comprises the simple crosscorrelation of the weighting of the product of the conjugation of left channel signals and right-channel signals on the frequency domain.

3. method as claimed in claim 2, it is characterized in that: the simple crosscorrelation of the weighting between the stereo left and right sound track signals of frequency domain can be expressed as:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Or

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) corresponding amplitude, X ₁(k) be stereo left channel signal on the frequency domain, X ₂(k) be stereo right-channel signals on the frequency domain; The described cross correlation function of weighting is at frequency 0, and the value on the frequency N/2 is the inverse of left and right sound track signals amplitude product on corresponding frequency, and the described cross correlation function of weighting is at 2 times of the inverse of other frequencies amplitude product that is left and right sound track signals.

4. method as claimed in claim 3 is characterized in that: described method comprises that also described cross correlation function is carried out frequently conversion of inverse time obtains the cross correlation function time-domain signal,

Or described cross correlation function is carried out frequently conversion of inverse time obtain the cross correlation function time-domain signal, described time-domain signal is carried out pre-service.

5. method as claimed in claim 4 is characterized in that: according to the cross correlation function time-domain signal, describedly utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels to comprise:

Estimate to obtain group delay according to the cross correlation function time-domain signal or based on the index corresponding to value of amplitude maximum in the cross correlation function time-domain signal after processing, obtain phase angle corresponding to cross correlation function corresponding to group delay, estimate to obtain the faciation position.

6. method as claimed in claim 3 is characterized in that: according to described cross correlation function, describedly utilize left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels to comprise:

Extract the phase place of described cross correlation function, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information;

Obtain faciation position information according to the phase place of the current frequency of cross correlation function of weighting and the difference of frequency index and phase differential average product.

7. such as claim 5 or 6 described methods, it is characterized in that, described method also comprises according to described faciation position and group delay estimates to obtain stereo minute information, described minute information of quantization encoding, information comprised in described minute: the phase differential parameter between left and right acoustic channels, the phase differential parameter of simple crosscorrelation parameter and/or L channel and lower mixed signal.

8. a method of estimating stereophonic signal is characterized in that, described method comprises:

Determine the cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain;

Described weighting cross correlation function is carried out pre-service;

Estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

9. method as claimed in claim 8, it is characterized in that: the cross correlation function of the weighting of the stereo left and right sound track signals of frequency domain can be expressed as:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

Or

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

Wherein, N is the length of stereophonic signal time-frequency conversion, | X ₁(k) | and | X ₂(k) | be X ₁(k) and X ₂(k) corresponding amplitude, X ₁(k) be stereo left channel signal on the frequency domain, X ₂(k) be stereo right-channel signals on the frequency domain; The cross correlation function of described weighting is at frequency 0, and the value on the frequency N/2 is the inverse of left and right sound track signals amplitude product on corresponding frequency, and the cross correlation function of described weighting is 2 times of inverse of the amplitude product of left and right sound track signals at other frequencies.

10. method as claimed in claim 9 is characterized in that, described method also comprises: the cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain is carried out frequently conversion of inverse time obtain the cross correlation function time-domain signal.

11. method as claimed in claim 10 is characterized in that, described cross correlation function time-domain signal is carried out pre-service comprise the cross correlation function time-domain signal is carried out cross correlation function time-domain signal C after normalized and smoothing processing obtain processing _Ravg(n), wherein said smoothing processing comprises:

C _ravg(n)=α*C _ravg(n)+β*C _r(n)，

Perhaps the absolute value signal of described cross correlation function time-domain signal is carried out cross correlation function time-domain signal C after normalized and smoothing processing obtain processing _{Ravg_abs}(n), wherein said smoothing processing comprises:

C _{ravg_abs}(n)=α*C _ravg(n)+β*|C _r(n)|；

α and β are the constants of weighting, 0≤α≤1, β=1-α, C _r(n) be the cross correlation function time-domain signal.

12. method as claimed in claim 11 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain stereophonic signal comprise:

Judge the relation of index corresponding to the value of amplitude maximum in the cross correlation function time-domain signal and the symmetric interval relevant with stereophonic signal time-frequency conversion length N, if index corresponding to the value of amplitude maximum is positioned at the first symmetric interval [0 in the cross correlation function time-domain signal, m], group delay equals the index corresponding to value of amplitude maximum in this cross correlation function time-domain signal so, if index corresponding to the value of amplitude maximum is positioned at the second symmetric interval (N-m in the related function, N], group delay deducts N for this index; M is less than or equal to N/2;

According to phase angle corresponding to cross correlation function corresponding to group delay, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle that corresponding cross correlation value is corresponding estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is d _gThe phase angle corresponding to cross correlation value of correspondence on the+N index.

13. method as claimed in claim 12 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain stereophonic signal comprise:

Group delay

d_{g} = \begin{matrix} \arg \max | C_{ravg} (n) | & \arg \max | C_{ravg} (n) | \leq N / 2 \\ \arg \max | C_{ravg} (n) | - N & \arg \max | C_{ravg} (n) | > N / 2 \end{matrix},

The faciation position

θ_{g} = \begin{matrix} &angle; C_{ravg} (d_{g}) d_{g} &GreaterEqual; 0 \\ &angle; C_{ravg} (d_{g} + N) d_{g} < 0 \end{matrix},

Wherein, N is the length of stereophonic signal time-frequency conversion, argmax|C _Ravg(n) | be C _Ravg(n) index corresponding to value of amplitude maximum in, ∠ C _Ravg(d _g) be cross-correlation function value C _Ravg(d _g) phase angle, ∠ C _Ravg(d _g+ N) be cross-correlation function value C _Ravg(d _g+ N) phase angle.

14. method as claimed in claim 8 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain between stereo left and right sound track signals comprise:

To described cross correlation function, or based on the cross correlation function after processing, extract it

Phase place

, function ∠ C wherein _r(k is used for extracting plural C _r(k) phase angle;

In frequency of low strap, ask for the average α of phase differential ₁, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information, obtain the faciation position according to the phase place of the current frequency of described cross correlation function and the difference of frequency index and phase differential average product.

15. method as claimed in claim 14 is characterized in that, estimates that according to the pre-service result group delay and the faciation position that obtain between stereo left and right sound track signals comprise:

α_{1} = \begin{matrix} E {\hat{Φ} (k + 1) - \hat{Φ} (k)} & k < Max \end{matrix};

d_{g} = - \frac{a_{1} N}{2^{*} π^{*} Fs};

θ_{g} = \begin{matrix} E {\hat{Φ} (k) - {a_{1}}^{*} k} & k < Max, \end{matrix}

Wherein

The average of expression phase differential, the frequency of Fs for adopting, Max prevents phase rotating, d for calculating the cut-off upper limit of group delay and faciation position _gBe group delay, θ _gBe the faciation position, N is the length of stereophonic signal time-frequency conversion.

16. a device of estimating stereophonic signal is characterized in that, described device comprises:

The weighting cross-correlation unit is used for definite cross correlation function about the weighting between the stereo left and right sound track signals of frequency domain;

Pretreatment unit is used for the cross correlation function of described weighting is carried out pre-service;

Estimation unit, estimation obtains group delay and the faciation position between stereo left and right sound track signals according to the pre-service result.

17. device as claimed in claim 16 is characterized in that, described device also comprises:

The frequency-time domain transformation unit carries out frequently conversion of inverse time to the cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain and obtains the cross correlation function time-domain signal.

18. device as claimed in claim 17 is characterized in that, describedly estimates to obtain group delay between stereo left and right sound track signals and the estimation unit of faciation position comprises according to the pre-service result:

Judging unit be used for to be judged the relation of index corresponding to the value of cross correlation function time-domain signal amplitude maximum and the symmetric interval relevant with stereophonic signal time-frequency conversion length N;

The group delay unit, if index corresponding to the value of amplitude maximum is positioned at the first symmetric interval [0 in the cross correlation function time-domain signal, m], group delay equals the index corresponding to value of amplitude maximum in this cross correlation function time-domain signal so, if index corresponding to the value of amplitude maximum is positioned at the second symmetric interval (N-m in the related function, N], group delay deducts N for this index; M is less than or equal to N/2;

The faciation bit location is used for according to phase angle corresponding to cross correlation function corresponding to group delay, as group delay d _gMore than or equal to zero, by determining d _gThe phase angle that corresponding cross correlation value is corresponding estimates to obtain the faciation position; Work as d _gLess than zero the time, the faciation position is d _gThe phase angle corresponding to cross correlation value of correspondence on the+N index.

19. device as claimed in claim 16 is characterized in that, describedly estimates to obtain group delay between stereo left and right sound track signals and the estimation unit of faciation position comprises according to the pre-service result:

The phase extraction unit is used for described cross correlation function, or based on the cross correlation function after processing, extracts its phase place

, function ∠ C wherein _r(k) be used for extracting plural C _r(k) phase angle;

The group delay unit is for the average α that asks for phase differential in frequency of low strap ₁, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information;

The faciation bit location is used for obtaining faciation position information according to the phase place of the current frequency of described cross correlation function and the difference of frequency index and phase differential average product.

20. device as claimed in claim 16 is characterized in that, described device also comprises the parameter characteristic unit, is used for estimating to obtain stereo parameter IPD according to described faciation position and group delay.

21. the equipment of a coding of stereo signals is characterized in that, described equipment comprises:

Converting means is used for conversion time domain stereo left channel signal and right-channel signals and forms left channel signals and right-channel signals on the frequency domain to frequency domain;

Lower mixing device is for mixed signal under the left channel signals on the frequency domain and the lower mixed generation monophony of right-channel signals process;

The parameter extraction device is for the spatial parameter of left channel signals and right-channel signals on the extraction frequency domain;

Estimate the stereophonic signal device, be used for utilizing left and right sound track signals on the frequency domain to estimate group delay and faciation position between stereo left and right acoustic channels;

Code device is used for the described group delay of quantization encoding and faciation position, mixed signal under described spatial parameter and the described monophony.

22. equipment as claimed in claim 21, it is characterized in that: described estimation stereophonic signal device utilizes left and right sound track signals on the frequency domain to estimate also to comprise before group delay between stereo left and right acoustic channels and the faciation position to determine about the cross correlation function between stereo left and right sound track signals on the frequency domain, and described cross correlation function comprises the simple crosscorrelation of the weighting of the product of the conjugation of left channel signals and right-channel signals on the frequency domain.

23. such as claim 20 or 22 described equipment, it is characterized in that: the cross correlation function about the weighting between stereo left and right sound track signals on the frequency domain that described estimation stereophonic signal device is determined can be expressed as:

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / | X_{1} (k) | | X_{2} (k) | & k = N / 2 \\ 0 & k > N / 2 \end{matrix},

Or

C_{r} (k) = \begin{matrix} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = 0 \\ 2^{*} X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & 1 \leq k \leq N / 2 - 1 \\ X_{1} (k) {X_{2}}^{*} (k) / \sqrt{X_{1} {(k)}^{2} + X_{2} {(k)}^{2}} & k = N / 2 \\ 0 & k > N / 2 \end{matrix}

24. device as claimed in claim 23 is characterized in that: described estimation stereophonic signal device comprises the frequency-time domain transformation unit, is used for described cross correlation function is carried out the time-domain signal that frequently conversion of inverse time obtains cross correlation function.

25. device as claimed in claim 24, it is characterized in that: described estimation stereophonic signal device comprises estimation unit, be used for estimating to obtain group delay according to the cross correlation function time-domain signal or based on index corresponding to the value of the cross correlation function time-domain signal amplitude maximum after processing, obtain phase angle corresponding to cross correlation function corresponding to group delay, estimate to obtain the faciation position.

26. device as claimed in claim 24, it is characterized in that: described estimation stereophonic signal device comprises estimation unit, be used for extracting the phase place of described cross correlation function, determine group delay according to the product of phase differential and transform length and the ratio relation of frequency information; Obtain faciation position information according to the phase place of the current frequency of cross correlation function and the difference of frequency index and phase differential average product.

27. the system of a stereo coding, it is characterized in that, described system comprises such as the arbitrary described stereo coding equipment of claim 21-26, receiving equipment and transfer equipment, and receiving equipment is used for receiving stereo input signal and is used for stereo coding equipment; Transfer equipment 52 is for the result who transmits described stereo coding equipment 51.