CN102157152A - Method for coding stereo and device thereof - Google Patents

Method for coding stereo and device thereof

Info

Publication number
CN102157152A
Authority
CN
China
Prior art keywords
cross correlation
correlation function
frequency
group delay
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010101138059A
Other languages
Chinese (zh)
Other versions
CN102157152B (en)
Inventor
吴文海
苗磊
郎玥
张琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201010113805.9A priority Critical patent/CN102157152B/en
Priority to PCT/CN2010/079410 priority patent/WO2011097915A1/en
Publication of CN102157152A publication Critical patent/CN102157152A/en
Priority to US13/567,982 priority patent/US9105265B2/en
Application granted granted Critical
Publication of CN102157152B publication Critical patent/CN102157152B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients

Abstract

An embodiment of the invention relates to a stereo coding method. The method comprises: transforming a time-domain stereo left-channel signal and right-channel signal to the frequency domain to form frequency-domain left- and right-channel signals; downmixing the frequency-domain left- and right-channel signals to generate a mono downmix signal; transmitting the bits of the coded and quantized downmix signal; extracting spatial parameters of the frequency-domain left- and right-channel signals; estimating the group delay and group phase between the stereo left and right channels from the frequency-domain left- and right-channel signals; and coding and quantizing the group delay, the group phase, and the spatial parameters, thereby obtaining high-quality stereo coding performance at a low bit rate.

Description

Stereo coding method and device
Technical field
Embodiments of the invention relate to the multimedia field, and in particular to stereo signal processing technology, specifically a stereo coding method and device.
Background technology
Existing stereo coding methods include intensity stereo, BCC (Binaural Cue Coding), and PS (Parametric Stereo) coding. Intensity coding normally extracts the inter-channel level difference, the ILD (Inter-Channel Level Difference) parameter, between the left and right channels; the ILD parameter is encoded as side information and sent to the decoder to help recover the stereo signal. ILD is a widely applicable signal parameter that reflects sound-field characteristics and represents the sound-field energy well. However, a stereo sound field often has spatial depth and a left-right direction, and transmitting only ILD cannot satisfy the requirement of recovering the original stereo signal. Schemes have therefore been proposed that transmit more parameters to better recover the stereo signal: besides the basic ILD parameter, they transmit the inter-channel phase difference (IPD: Inter-Channel Phase Difference) and the inter-channel cross-correlation (ICC) parameter of the left and right channels, and sometimes also the phase difference (OPD) between the left channel and the downmix signal. These parameters, which reflect the spatial depth and left-right sound-field information of the stereo signal, are encoded together with the ILD parameter as side information and sent to the decoder to reconstruct the stereo signal.
Bit rate is one of the important factors in evaluating multimedia coding performance, and a low bit rate is a common goal pursued by the industry. Existing stereo coding techniques that transmit IPD, ICC, and OPD in addition to ILD inevitably raise the bit rate, because IPD, ICC, and OPD are local characteristic parameters of the signal, used to reflect the sub-band information of the stereo signal. Coding the IPD, ICC, and OPD parameters of a stereo signal requires coding them for every sub-band, and each sub-band IPD or ICC code needs multiple bits, and so on; the stereo coding parameters therefore need a large number of bits to enhance the sound-field information. At a low bit rate only part of the sub-bands can be enhanced, so a faithful reconstruction is not achieved; there is a large gap between the stereo information recovered at a low bit rate and the original input signal, and in terms of auditory effect this brings an extremely uncomfortable listening experience.
Summary of the invention
Embodiments of the invention provide a stereo coding method, device, and system that enhance the sound-field information at a low bit rate and improve coding efficiency.
An embodiment of the invention provides a stereo coding method, comprising:
transforming a time-domain stereo left-channel signal and right-channel signal to the frequency domain to form frequency-domain left- and right-channel signals; downmixing the frequency-domain left- and right-channel signals to generate a mono downmix signal, and transmitting the bits of the coded and quantized downmix signal; extracting spatial parameters of the frequency-domain left- and right-channel signals; estimating the group delay and group phase between the stereo left and right channels using the frequency-domain left- and right-channel signals; and quantizing and coding the group delay, the group phase, and the spatial parameters.
An embodiment of the invention provides a method of estimating a stereo signal, comprising:
determining a weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals; preprocessing the weighted cross-correlation function; and estimating the group delay and group phase between the stereo left- and right-channel signals from the preprocessing result.
An embodiment of the invention provides a device for estimating a stereo signal, comprising:
a weighted cross-correlation unit for determining a weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals; a preprocessing unit for preprocessing the weighted cross-correlation function; and an estimation unit for estimating the group delay and group phase between the stereo left- and right-channel signals from the preprocessing result.
An embodiment of the invention provides a stereo signal coding apparatus, comprising:
a transform device for transforming the time-domain stereo left- and right-channel signals to the frequency domain to form frequency-domain left- and right-channel signals; a downmix device for downmixing the frequency-domain left- and right-channel signals to generate a mono downmix signal; a parameter extraction device for extracting the spatial parameters of the frequency-domain left- and right-channel signals; a stereo-signal estimation device for estimating the group delay and group phase between the stereo left and right channels using the frequency-domain left- and right-channel signals; and a coding device for quantizing and coding the group delay, the group phase, the spatial parameters, and the mono downmix signal.
An embodiment of the invention provides a stereo signal coding system, comprising:
the stereo signal coding apparatus 51 described above, a receiving device, and a transfer device; the receiving device receives the stereo input signal and supplies it to the stereo coding apparatus, and the transfer device 52 transmits the output of the stereo coding apparatus 51.
Therefore, by introducing embodiments of the invention, the group delay and group phase are estimated and applied to stereo coding, so that this global direction-information estimation method can obtain more accurate sound-field information at a low bit rate, enhancing the sound-field effect and greatly improving coding efficiency.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a stereo encoding method embodiment;
Fig. 2 is a schematic diagram of another stereo encoding method embodiment;
Fig. 3 is a schematic diagram of another stereo encoding method embodiment;
Fig. 4a is a schematic diagram of another stereo encoding method embodiment;
Fig. 4b is a schematic diagram of another stereo encoding method embodiment;
Fig. 5 is a schematic diagram of another stereo encoding method embodiment;
Fig. 6 is a schematic diagram of a stereo signal estimation device embodiment;
Fig. 7 is a schematic diagram of another stereo signal estimation device embodiment;
Fig. 8 is a schematic diagram of another stereo signal estimation device embodiment;
Fig. 9 is a schematic diagram of another stereo signal estimation device embodiment;
Fig. 10 is a schematic diagram of another stereo signal estimation device embodiment;
Fig. 11 is a schematic diagram of a stereo signal encoding apparatus embodiment;
Fig. 12 is a schematic diagram of a stereo signal coding system embodiment.
Embodiment
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
Embodiment one:
Fig. 1 is a schematic diagram of a stereo encoding method embodiment, comprising:
Step 101: transform the time-domain stereo left-channel and right-channel signals to the frequency domain to form frequency-domain left- and right-channel signals.
Step 102: downmix the frequency-domain left- and right-channel signals to generate a mono downmix signal (DMX), transmit the bits of the coded and quantized DMX signal, and quantize and code the extracted spatial parameters of the frequency-domain left- and right-channel signals.
A spatial parameter is a parameter representing the spatial characteristics of the stereo signal, such as the ILD parameter.
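The text names the ILD parameter but gives no formula for it. A common per-band definition — sketched here in Python as an assumption, with a hypothetical band partition `bands` — is the ratio of left- and right-channel band energies in dB:

```python
import math

def ild_db(X1, X2, bands):
    """Per-band inter-channel level difference in dB (assumed definition).

    X1, X2: lists of complex frequency-domain bins (left/right channel).
    bands: list of (start, stop) bin ranges; a small constant avoids log(0).
    """
    out = []
    for b0, b1 in bands:
        e_left = sum(abs(X1[k]) ** 2 for k in range(b0, b1))
        e_right = sum(abs(X2[k]) ** 2 for k in range(b0, b1))
        out.append(10.0 * math.log10((e_left + 1e-12) / (e_right + 1e-12)))
    return out
```

A left channel at twice the right channel's amplitude gives roughly +6 dB per band.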
Step 103: use the frequency-domain left- and right-channel signals to estimate the group delay (Group Delay) and group phase (Group Phase) between the left- and right-channel signals.
The group delay reflects the global direction information of the envelope time delay between the stereo left and right channels; the group phase reflects the global similarity of the waveforms of the stereo left and right channels after time alignment.
Step 104: quantize and code the group delay and group phase obtained by the estimation.
After quantization and coding, the group delay and group phase form part of the side-information bitstream to be transmitted.
In the stereo coding method of this embodiment of the invention, the group delay and group phase are estimated while the spatial characteristic parameters of the stereo signal are extracted, and the estimated group delay and group phase are applied in the stereo coding, so that the spatial parameters are effectively combined with the global direction information. This global direction-information estimation method obtains more accurate sound-field information at a low bit rate, enhances the sound-field effect, and greatly improves coding efficiency.
Embodiment two:
Fig. 2 is a schematic diagram of another stereo encoding method embodiment, comprising:
Step 201: transform the time-domain stereo left- and right-channel signals to the frequency domain to form the frequency-domain stereo left-channel signal X1(k) and right-channel signal X2(k), where k is the frequency-bin index of the frequency-domain signal.
Step 202: downmix the frequency-domain left- and right-channel signals, code and quantize the downmix signal and transmit it, and quantize and code the stereo spatial parameters to form side information and transmit it; this can comprise the following steps:
Step 2021: downmix the frequency-domain left- and right-channel signals and synthesize them to generate the mono downmix signal DMX.
Step 2022: code and quantize the mono downmix signal DMX and transmit the quantized information.
Step 2023: extract the ILD parameter of the frequency-domain left- and right-channel signals.
Step 2024: quantize and code the ILD parameter to form side information and transmit it.
Steps 2021-2022 and steps 2023-2024 are mutually independent and can be performed separately; the side information formed by the former can be multiplexed with the side information formed by the latter before transmission.
In another embodiment, the mono downmix signal obtained by downmixing can further undergo a frequency-to-time transform to obtain the time-domain mono downmix signal DMX, and the bits obtained by coding and quantizing the time-domain DMX signal are transmitted.
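The embodiment does not fix a downmix rule. A minimal frequency-domain sketch, assuming a plain average of the two channels (one common choice, not necessarily the patent's), is:

```python
def downmix_mono(X1, X2):
    """Mono downmix DMX(k) = (X1(k) + X2(k)) / 2 in the frequency domain.

    A simple average is assumed here; the text only states that the left and
    right frequency-domain signals are mixed down into one channel.
    """
    return [(left + right) / 2 for left, right in zip(X1, X2)]
```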
Step 203: estimate the group delay and group phase between the frequency-domain left- and right-channel signals.
Using the frequency-domain left- and right-channel signals to estimate the group delay and group phase between them comprises determining a cross-correlation function of the frequency-domain stereo left- and right-channel signals and estimating the group delay and group phase of the stereo signal from the cross-correlation signal, as shown in Fig. 3. This can specifically comprise the following steps:
Step 2031: determine the cross-correlation function between the frequency-domain stereo left- and right-channel signals.
The cross-correlation function of the frequency-domain stereo left- and right-channel signals can be a weighted cross-correlation function. When determining the cross-correlation function, weighting the cross-correlation function used to estimate the group delay and group phase makes the stereo coding result more stable than other operations. The weighted cross-correlation function is a weighting of the product of the left-channel frequency-domain signal and the conjugate of the right-channel frequency-domain signal, and its value is 0 above half the length N of the stereo signal's time-frequency transform. The cross-correlation function of the stereo left- and right-channel frequency-domain signals can be written as:

C_r(k) = W(k)·X1(k)·X2*(k),  0 ≤ k ≤ N/2
C_r(k) = 0,                  k > N/2

where W(k) is the weighting function and X2*(k) is the conjugate of X2(k); it can also be expressed as C_r(k) = X1(k)·X2*(k), 0 ≤ k ≤ N/2+1. In another form, combined with a different weighting, the cross-correlation function of the stereo left- and right-channel frequency-domain signals can be written as:

C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),    k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),  1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),    k = N/2
C_r(k) = 0,                                   k > N/2

where N is the length of the stereo signal's time-frequency transform and |X1(k)| and |X2(k)| are the magnitudes of X1(k) and X2(k). The weight of the cross-correlation function at frequencies 0 and N/2 is the reciprocal of the product of the left- and right-channel magnitudes at the corresponding frequency; at the other frequencies it is twice that reciprocal. In other implementations, the weighted cross-correlation function of the stereo left- and right-channel frequency-domain signals can take still other forms, for example:

C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),    k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),  1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),    k = N/2
C_r(k) = 0,                                       k > N/2

This embodiment imposes no limitation here; any variation of the above formulas falls within the protection scope.
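The second (magnitude-normalized) weighting can be transcribed directly. This is a hedged Python sketch of that form, with a zero-magnitude guard added that the formulas themselves do not mention:

```python
def weighted_cross_correlation(X1, X2, N):
    """Weighted cross-spectrum C_r(k), per the magnitude-normalized form.

    Bins 0 and N/2 get weight 1, interior bins weight 2, and bins above
    N/2 are zero; each term is normalized by |X1(k)| * |X2(k)|.
    """
    Cr = [0j] * N
    for k in range(N // 2 + 1):
        mag = abs(X1[k]) * abs(X2[k])
        if mag == 0.0:
            continue  # added guard: leave silent bins at zero
        weight = 1.0 if k in (0, N // 2) else 2.0
        Cr[k] = weight * X1[k] * X2[k].conjugate() / mag
    return Cr
```

With unit-magnitude inputs, each kept bin carries only the inter-channel phase (times its weight), which is exactly what the later group-delay estimation consumes.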
Step 2032: apply an inverse frequency-to-time transform to the weighted cross-correlation function of the frequency-domain stereo left- and right-channel signals to obtain the time-domain cross-correlation signal C_r(n); here the time-domain cross-correlation signal is complex-valued.
Step 2033: estimate the group delay and group phase of the stereo signal from the time-domain cross-correlation signal.
In another embodiment, the group delay and group phase of the stereo signal can be estimated directly from the frequency-domain cross-correlation function determined in step 2031.
In step 2033, the group delay and group phase can be estimated directly from the time-domain cross-correlation signal, or the time-domain cross-correlation signal can first undergo some signal preprocessing, with the group delay and group phase then estimated from the preprocessed signal.
If the time-domain cross-correlation signal is preprocessed and the group delay and group phase are estimated from the preprocessed signal, the preprocessing can comprise:
1) normalizing or smoothing the time-domain cross-correlation signal;
where smoothing the time-domain cross-correlation signal can be performed as:

C_ravg(n) = α·C_ravg(n) + β·C_r(n)

where α and β are weighting constants, 0 ≤ α ≤ 1 and β = 1 − α. In this embodiment, preprocessing such as smoothing the inter-channel time-domain cross-correlation signal before estimating the group delay and group phase makes the estimated group delay more stable.
2) normalizing the time-domain cross-correlation signal and then smoothing it;
3) normalizing or smoothing the absolute value of the time-domain cross-correlation signal;
where smoothing the absolute value of the time-domain cross-correlation signal can be performed as:

C_ravg_abs(n) = α·C_ravg(n) + β·|C_r(n)|;

4) normalizing the time-domain cross-correlation signal and then smoothing the resulting absolute-value signal.
Understandably, before estimating the group delay and group phase of the stereo signal, the preprocessing of the time-domain cross-correlation signal can also include other processing, e.g., autocorrelation processing; in that case the preprocessing of the time-domain cross-correlation signal also comprises autocorrelation and/or smoothing, etc.
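The recursive smoothing above is a one-pole average across frames. A minimal sketch (the in-place update of C_ravg is written functionally here):

```python
def smooth_correlation(c_avg, c_new, alpha):
    """One update of C_ravg(n) = alpha*C_ravg(n) + beta*C_r(n), beta = 1 - alpha.

    c_avg: running (complex) average; c_new: current-frame correlation.
    alpha in [0, 1] controls how much history is retained.
    """
    beta = 1.0 - alpha
    return [alpha * a + beta * c for a, c in zip(c_avg, c_new)]
```

alpha = 1 freezes the average, alpha = 0 tracks the current frame exactly; values in between trade responsiveness for stability of the later delay estimate.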
Combined with the above preprocessing of the time-domain cross-correlation signal, step 2033 can estimate the group delay and group phase of the stereo signal in the same estimation manner, or estimate them separately. Specifically, at least the following embodiments of estimating the group phase and group delay can be adopted:
Embodiment one of step 2033, shown in Fig. 4a:
Estimate the group delay from the index corresponding to the maximum-magnitude value in the time-domain cross-correlation signal (or in the processed one), then obtain the phase angle of the cross-correlation value corresponding to the group delay to estimate the group phase, comprising the following steps:
Determine the relation between the index of the maximum-magnitude value of the time-domain cross-correlation signal and the symmetric intervals related to the transform length N. In one embodiment, if the index of the maximum-magnitude value is smaller than or equal to N/2, the group delay equals that index; if the index is greater than N/2, the group delay is that index minus the transform length N. Here [0, N/2] and (N/2, N] can be regarded as the first and second symmetric intervals related to the time-frequency transform length N of the stereo signal. In another implementation, the judged ranges can be the first symmetric interval [0, m] and the second symmetric interval (N − m, N], where m is smaller than N/2: the index of the maximum-magnitude value is compared with m, and if it lies in [0, m] the group delay equals that index, while if it lies in (N − m, N] the group delay is that index minus the transform length N. In practical applications, a value close to the index of the maximum-magnitude value can also be judged; under conditions that do not affect the subjective effect, or as limited by requirements, an index whose value is slightly smaller than the maximum magnitude can suitably be chosen as the criterion. For instance, the index of the second-largest magnitude, or an index whose value differs from the maximum magnitude by a fixed or preset range, is also applicable. Taking the index of the maximum-magnitude value of the time-domain cross-correlation signal as an example, one concrete form is:

d_g = argmax_n |C_ravg(n)|,      if argmax_n |C_ravg(n)| ≤ N/2
d_g = argmax_n |C_ravg(n)| − N,  otherwise

where argmax_n |C_ravg(n)| is the index of the maximum-magnitude value of C_ravg(n); this embodiment equally protects the various variations of the above form.
Then estimate the group phase from the phase angle of the time-domain cross-correlation value corresponding to the group delay: when the group delay d_g is greater than or equal to zero, the group phase is estimated from the phase angle of the cross-correlation value at index d_g; when d_g is less than zero, the group phase is the phase angle of the cross-correlation value at index d_g + N. This can specifically take the following form, or any variation of it:

θ_g = ∠C_ravg(d_g),      d_g ≥ 0
θ_g = ∠C_ravg(d_g + N),  d_g < 0

where ∠C_ravg(d_g) is the phase angle of the time-domain cross-correlation value C_ravg(d_g), and ∠C_ravg(d_g + N) is the phase angle of C_ravg(d_g + N).
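Embodiment one of step 2033 — peak index, then phase angle — can be sketched as follows, using the [0, N/2] / (N/2, N] split; `cmath.phase` plays the role of the ∠ operator:

```python
import cmath

def estimate_group_delay_phase(c_avg, N):
    """Estimate (d_g, theta_g) from the complex time-domain correlation.

    d_g is the index of the maximum-magnitude sample, mapped to a negative
    delay when it falls in the second symmetric interval (N/2, N]; theta_g
    is the phase angle of the correlation value at that delay.
    """
    peak = max(range(N), key=lambda n: abs(c_avg[n]))
    d_g = peak if peak <= N // 2 else peak - N
    theta_g = cmath.phase(c_avg[d_g if d_g >= 0 else d_g + N])
    return d_g, theta_g
```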
Embodiment two of step 2033, shown in Fig. 4b:
From the cross-correlation function, or from the processed cross-correlation function, extract its phase:

Φ̂(k) = ∠C_r(k)

where ∠C_r(k) extracts the phase angle of the complex value C_r(k). Compute the average α1 of the inter-bin phase differences in the low band, and determine the group delay from the product of the phase difference and the transform length in relation to the frequency information; likewise, obtain the group phase from the average difference between the phase of each bin of the cross-correlation function and the product of the bin index and the average phase difference. Specifically, the following can be adopted:

α1 = E{Φ̂(k+1) − Φ̂(k)},  k < Max;
d_g = α1·N / (2·π·Fs);
θ_g = E{Φ̂(k) − α1·k},  k < Max

where E{·} denotes the average of the phase differences, Fs is the sampling frequency, and Max is the upper cutoff limit for computing the group delay and group phase, which prevents phase wrapping.
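The phase-slope estimation of embodiment two can be sketched as below. The reconstructed formula d_g = α1·N/(2π·Fs) is taken at face value (with Fs the sampling frequency, so d_g comes out in seconds); the garbled original may instead intend a delay in samples, so treat the scaling as an assumption:

```python
import cmath, math

def estimate_from_phase_slope(Cr, N, Fs, k_max):
    """Embodiment-two estimation from the cross-spectrum phase Phi(k).

    alpha1: average inter-bin phase difference over k < k_max (the 'Max'
    cutoff that avoids phase wrapping); theta_g: average intercept.
    """
    phi = [cmath.phase(Cr[k]) for k in range(k_max + 1)]
    alpha1 = sum(phi[k + 1] - phi[k] for k in range(k_max)) / k_max
    d_g = alpha1 * N / (2.0 * math.pi * Fs)
    theta_g = sum(phi[k] - alpha1 * k for k in range(k_max)) / k_max
    return alpha1, d_g, theta_g
```

On a synthetic cross-spectrum with linear phase 0.2 − 0.1·k the slope and intercept are recovered exactly, as long as the phases in the judged band stay inside (−π, π].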
Step 204: quantize and code the group delay and group phase to form side information and transmit it.
The group delay is scalar-quantized within a preset range; this range is the symmetric positive-and-negative interval [−Max, Max], or the usable values under the given conditions. The scalar-quantized group delay is transmitted over a long period, or processed with differential coding, to obtain the side information. The value range of the group phase normally lies within [0, 2·PI], specifically [0, 2·PI); the group phase can also be scalar-quantized and coded in the range (−PI, PI]. The side information formed by the quantized and coded group delay and group phase is multiplexed into the coded bitstream and sent to the stereo signal recovery device.
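A hedged sketch of the scalar quantization step — a plain uniform quantizer over the symmetric delay range and the [0, 2π) phase range; the step sizes and bit widths below are illustrative, not from the text:

```python
import math

def quantize_uniform(x, lo, hi, bits):
    """Map x in [lo, hi] to an integer index with 2**bits levels."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)  # clamp out-of-range values
    return round((x - lo) / (hi - lo) * levels)

def dequantize_uniform(idx, lo, hi, bits):
    """Inverse mapping back to a reconstruction value."""
    levels = (1 << bits) - 1
    return lo + (idx / levels) * (hi - lo)

# group delay in [-Max, Max] samples, group phase in [0, 2*pi) (assumed widths)
d_max = 20
q_delay = quantize_uniform(-7, -d_max, d_max, 6)
q_phase = quantize_uniform(1.9, 0.0, 2.0 * math.pi, 5)
```

The reconstruction error of a uniform quantizer is bounded by one step, i.e. (hi − lo)/(2**bits − 1).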
In the method for embodiment of the invention stereo coding, utilize the left and right sound track signals on the frequency domain to estimate that group delay and the faciation position that can embody signal overall situation azimuth information between the stereophonic signal left and right acoustic channels make that the azimuth information of sound field is effectively strengthened, the estimation of stereophonic signal spatial character parameter and group delay and faciation position combined be applied in the little stereo coding of demand bit rate, make spatial information and the overall situation the effective combination of azimuth information, obtain sound field information more accurately, strengthen sound field effect, promoted code efficiency greatly.
Embodiment three
Fig. 5 is a schematic diagram of another stereo encoding method embodiment, comprising:
On the basis of embodiment one and embodiment two, the stereo coding further comprises:
Step 105/205: estimate the stereo parameter IPD from the group phase and group delay information, then quantize the IPD parameter and transmit it.
When quantizing the IPD, the group delay (Group Delay) and group phase (Group Phase) are used to estimate the predicted IPD:

IPD̄(k) = −2·π·d_g·k/N + θ_g,  1 ≤ k ≤ N/2 − 1

which is differenced with the original IPD(k):

IPD_diff(k) = IPD(k) − IPD̄(k)

and the differential IPD_diff(k) is quantized and coded; the quantized bits are delivered to the decoder. In another embodiment, the IPD can also be quantized directly; the bitstream is slightly higher, but the quantization is more accurate.
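The differential-IPD step can be sketched as follows; the wrap of the residual into [−π, π] via `math.remainder` is an added detail the text does not spell out:

```python
import cmath, math

def ipd_residual(X1, X2, d_g, theta_g, N):
    """IPD_diff(k) = IPD(k) - IPDbar(k) for bins 1 <= k <= N/2 - 1,
    with the prediction IPDbar(k) = -2*pi*d_g*k/N + theta_g."""
    residual = []
    for k in range(1, N // 2):
        ipd = cmath.phase(X1[k] * X2[k].conjugate())
        predicted = -2.0 * math.pi * d_g * k / N + theta_g
        residual.append(math.remainder(ipd - predicted, 2.0 * math.pi))
    return residual
```

When the inter-channel phase is exactly the linear model, the residual is near zero — which is why differencing against the group-delay/group-phase prediction cheapens the IPD coding.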
In this embodiment, estimating the stereo parameter IPD and coding and quantizing it can improve coding efficiency when a high bit rate is available, enhancing the sound-field effect.
Embodiment four
Fig. 6 is a schematic diagram of an embodiment of a device 04 for estimating a stereo signal, comprising:
a weighted cross-correlation unit 41 for determining the weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals.
The weighted cross-correlation unit 41 receives the frequency-domain stereo left- and right-channel signals and processes them to obtain the weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals.
A preprocessing unit 42 for preprocessing the weighted cross-correlation function.
The preprocessing unit 42 receives the weighted cross-correlation function obtained by the weighted cross-correlation unit 41 and preprocesses it to obtain the preprocessing result, i.e., the preprocessed time-domain cross-correlation signal.
An estimation unit 43 for estimating the group delay and group phase between the stereo left- and right-channel signals from the preprocessing result.
The estimation unit 43 receives the preprocessing result of the preprocessing unit 42, obtains the preprocessed time-domain cross-correlation signal, and extracts information from it with judging, comparing, or calculating operations to estimate the group delay and group phase between the stereo left- and right-channel signals.
In another embodiment, the device 04 for estimating a stereo signal can further comprise a frequency-to-time transform unit 44, which receives the output of the weighted cross-correlation unit 41, applies an inverse frequency-to-time transform to the weighted cross-correlation function of the frequency-domain stereo left- and right-channel signals to obtain the time-domain cross-correlation signal, and sends the time-domain cross-correlation signal to the preprocessing unit 42.
By introducing this embodiment of the invention, the group delay and group phase are estimated and applied to stereo coding, so that this global direction-information estimation method can obtain more accurate sound-field information at a low bit rate, enhancing the sound-field effect and greatly improving coding efficiency.
Embodiment five
Fig. 7 is a schematic diagram of another implementation of a device 04 for estimating a stereo signal, comprising:
The weighted cross-correlation unit 41 receives the frequency-domain stereo left- and right-channel signals and processes them to obtain the weighted cross-correlation function between them. The cross-correlation function of the stereo left- and right-channel frequency-domain signals may be weighted, which makes the coding result more stable. The weighted cross-correlation function is a weighting of the product of the left-channel frequency-domain signal and the conjugate of the right-channel frequency-domain signal, and its value is 0 above half the length N of the time-frequency transform of the stereo signal. The weighted cross-correlation function of the stereo left- and right-channel frequency-domain signals can be expressed as:
C_r(k) = W(k)·X1(k)·X2*(k),    0 ≤ k ≤ N/2,
C_r(k) = 0,                    k > N/2,
where W(k) is the weighting function and X2*(k) is the conjugate of X2(k). Alternatively, it can also be expressed as C_r(k) = X1(k)·X2*(k), 0 ≤ k ≤ N/2. In another form of the weighted cross-correlation function, combined with a different weighting type, the weighted cross-correlation function of the stereo left- and right-channel frequency-domain signals can be expressed as:
C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),      k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),    1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),      k = N/2
C_r(k) = 0,                                     k > N/2,
where N is the length of the time-frequency transform of the stereo signal, and |X1(k)| and |X2(k)| are the amplitudes of X1(k) and X2(k). At frequencies 0 and N/2, the weighted cross-correlation function is weighted by the reciprocal of the product of the left- and right-channel amplitudes at the corresponding frequency; at the other frequencies it is weighted by 2 times that reciprocal.
Alternatively, the following form, or a variant of it, may be adopted:

C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),      k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),    1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),      k = N/2
C_r(k) = 0,                                         k > N/2.
The frequency-to-time transform unit 44 receives the weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals determined by the weighted cross-correlation unit 41, and applies an inverse frequency-to-time transform to it to obtain the time-domain cross-correlation signal C_r(n); here the time-domain cross-correlation signal is complex-valued.
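The weighting and inverse frequency-to-time transform described above can be sketched as follows, using the amplitude-product weighting of the first alternative form. The eps guard against division by zero and the use of numpy's one-sided rfft bin layout are assumptions of this sketch.

```python
import numpy as np

def weighted_xcorr_time(X1, X2, eps=1e-12):
    """Weighted cross-spectrum C_r(k), normalized by |X1(k)||X2(k)|,
    doubled on the interior bins 1..N/2-1 and zero above N/2, then
    inverse-transformed to the complex time-domain signal C_r(n)."""
    nbins = len(X1)                 # N/2 + 1 one-sided bins
    N = 2 * (nbins - 1)             # transform length
    C = np.zeros(N, dtype=complex)
    w = 1.0 / (np.abs(X1) * np.abs(X2) + eps)   # reciprocal amplitude product
    C[:nbins] = w * X1 * np.conj(X2)
    C[1:nbins - 1] *= 2.0           # weight interior bins by 2x the reciprocal
    return np.fft.ifft(C)           # complex time-domain cross-correlation
```

For a right channel that lags the left by d samples, the magnitude of C_r(n) peaks at index N − d; as the later estimation step describes, an index above N/2 corresponds to the negative lag index − N.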
The preprocessing unit 42 receives the time-domain cross-correlation signal obtained by the frequency-to-time transform unit 44 from the cross-correlation function, and preprocesses it, yielding the preprocessing result, i.e., the preprocessed time-domain cross-correlation signal.
Depending on requirements, the preprocessing unit 42 may comprise one or more of the following units: a normalization unit 421, a smoothing unit 422, and an absolute-value unit 423.
1) The normalization unit 421 normalizes the time-domain cross-correlation signal, or the smoothing unit 422 smooths it.
The smoothing of the time-domain cross-correlation signal may be carried out as:

C_ravg(n) = α·C_ravg(n) + β·C_r(n),

where α and β are weighting constants, 0 ≤ α ≤ 1 and β = 1 − α. In the present embodiment, preprocessing such as smoothing the weighted inter-channel cross-correlation function before estimating the group delay and group phase makes the estimated group delay more stable.
2) After the normalization unit 421 normalizes the time-domain cross-correlation signal, the smoothing unit 422 further smooths the result of the normalization unit 421.
3) The absolute-value unit 423 obtains the absolute value of the time-domain cross-correlation signal; the normalization unit 421 normalizes this absolute value, or the smoothing unit 422 smooths it, or normalization is performed first and smoothing afterwards.
The smoothing of the absolute value of the time-domain cross-correlation signal may be carried out as:

C_ravg_abs(n) = α·C_ravg(n) + β·|C_r(n)|.

4) The absolute-value signal obtained after normalizing the time-domain cross-correlation signal is further smoothed.
Before the group delay and group phase of the stereo signal are estimated, the preprocessing unit 42 may also comprise other units for preprocessing the time-domain cross-correlation signal, for example an auto-correlation unit 424; in that case the preprocessing performed by the preprocessing unit 42 also includes auto-correlation and/or smoothing.
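The normalization-plus-smoothing options above can be sketched as follows. The choice of peak normalization and the default α are illustrative assumptions; the text fixes only 0 ≤ α ≤ 1 and β = 1 − α.

```python
import numpy as np

def preprocess(c_prev, c_curr, alpha=0.75, use_abs=False):
    """Normalize the current cross-correlation time-domain signal and
    smooth it against the running average from previous frames:
        C_ravg(n) = alpha * C_ravg(n) + beta * C_r(n),  beta = 1 - alpha."""
    beta = 1.0 - alpha
    c = np.abs(c_curr) if use_abs else c_curr    # optional |C_r(n)| branch
    peak = np.max(np.abs(c))
    if peak > 0:
        c = c / peak                             # normalization (assumed form)
    return alpha * c_prev + beta * c             # smoothed C_ravg(n)
```

The returned array is fed back in as c_prev on the next frame, giving the exponential inter-frame smoothing that stabilizes the group-delay estimate.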
In another embodiment, the stereo-signal estimation device 04 may omit the preprocessing unit, and the result of the frequency-to-time transform unit 44 is sent directly to the estimation unit 43 of the device. The estimation unit 43 is configured to estimate the group delay from the weighted time-domain cross-correlation signal, or from the index of the maximum-amplitude value of the processed weighted time-domain cross-correlation signal, and then to obtain the phase angle of the cross-correlation value corresponding to the group delay, from which the group phase is estimated.
The estimation unit 43 estimates the group delay and group phase between the stereo left- and right-channel signals according to the output of the preprocessing unit 42 or of the frequency-to-time transform unit 44. As shown in Fig. 8, the estimation unit 43 further comprises a judging unit 431, which receives the time-domain cross-correlation signal output by the preprocessing unit 42 or the frequency-to-time transform unit 44, judges in which of the symmetric intervals related to the transform length N the index of the maximum-amplitude value of the time-domain cross-correlation signal lies, and sends the judgment result to the group delay unit 432, triggering it to estimate the group delay between the left and right channels of the stereo signal. In one embodiment, if the judging unit 431 finds that the index of the maximum-amplitude value is less than or equal to N/2, the group delay unit 432 estimates the group delay to equal that index; if the index is greater than N/2, the group delay unit 432 estimates the group delay as that index minus the transform length N. Here [0, N/2] and (N/2, N] may be regarded as the first and second symmetric intervals related to the time-frequency transform length N of the stereo signal. In another implementation, the judged ranges may be the first symmetric interval [0, m] and the second symmetric interval (N−m, N], where m is less than N/2: the index of the maximum-amplitude value of the time-domain cross-correlation signal is compared against m; if the index lies in [0, m], the group delay equals that index; if it lies in (N−m, N], the group delay is that index minus the transform length N. In practical applications, the judgment may also use a value close to the index of the maximum-amplitude value; where this does not impair the subjective effect, or within limits set according to demand, an index corresponding to a slightly smaller amplitude may be selected as the judgment condition — for example the index of the second-largest amplitude, or an index whose amplitude differs from the maximum by a fixed or preset amount. The estimation comprises the following form, or any variant of this form:
d_g = arg max|C_ravg(n)|,        arg max|C_ravg(n)| ≤ N/2
d_g = arg max|C_ravg(n)| − N,    arg max|C_ravg(n)| > N/2,

where arg max|C_ravg(n)| is the index of the maximum-amplitude value of C_ravg(n). The group phase unit 433 receives the result of the group delay unit 432 and obtains the phase angle of the cross-correlation value corresponding to the estimated group delay: when the group delay d_g is greater than or equal to zero, the group phase is estimated as the phase angle of the cross-correlation value corresponding to d_g; when d_g is less than zero, the group phase is the phase angle of the cross-correlation value at index d_g + N. Specifically, the following form, or any variant of it, may be adopted:

θ_g = ∠C_ravg(d_g),        d_g ≥ 0
θ_g = ∠C_ravg(d_g + N),    d_g < 0,

where ∠C_ravg(d_g) is the phase angle of the time-domain cross-correlation value C_ravg(d_g), and ∠C_ravg(d_g + N) is the phase angle of C_ravg(d_g + N).
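The argmax-based estimation above, with the symmetric intervals [0, N/2] and (N/2, N], can be sketched as follows; the function name is an assumption of this sketch.

```python
import numpy as np

def estimate_group_delay_phase(c_ravg):
    """Group delay from the index of the maximum-amplitude value of the
    time-domain cross-correlation; group phase from the phase angle of
    the cross-correlation value corresponding to that group delay."""
    N = len(c_ravg)
    idx = int(np.argmax(np.abs(c_ravg)))
    d_g = idx if idx <= N // 2 else idx - N       # map (N/2, N] to negative lags
    theta_g = np.angle(c_ravg[d_g if d_g >= 0 else d_g + N])
    return d_g, theta_g
```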
In another embodiment, the stereo-signal estimation device 04 further comprises a parameter characterization unit 45; as shown in Fig. 9, the parameter characterization unit estimates the stereo parameter IPD from the group phase and group delay information.
By the embodiment of the invention, the group delay and group phase are estimated and applied to stereo coding, so that this global azimuth-information estimation method obtains more accurate sound-field information at a low code rate, enhances the sound-field effect, and greatly improves coding efficiency.
Embodiment six
Figure 10 is a schematic diagram of another implementation of a device 04' for estimating a stereo signal. The difference from embodiment five is that, in the present embodiment, the weighted cross-correlation function of the stereo left- and right-channel frequency-domain signals determined by the weighted cross-correlation unit is sent to the preprocessing unit 42 or to the estimation unit 43; the estimation unit 43 extracts the phase of the cross-correlation function, determines the group delay from the ratio of the product of the phase difference and the transform length to the frequency information, and obtains the group phase information from the difference between the phase of the cross-correlation function at the current frequency and the product of the frequency index and the average phase difference.
The estimation unit 43 estimates the group delay and group phase between the stereo left- and right-channel signals according to the output of the preprocessing unit 42 or of the weighted cross-correlation unit 41. The estimation unit 43 further comprises: a phase extraction unit 430, which extracts the phase of the cross-correlation function, or of the cross-correlation function after processing, as Φ̂(k) = ∠C_r(k), where the function ∠C_r(k) extracts the phase angle of the complex value C_r(k); a group delay unit 432', which computes the average α1 of the phase difference over the low-band frequencies and determines the group delay from the ratio of the product of the phase difference and the transform length to the frequency information; and a group phase unit 433', which obtains the group phase information from the difference between the phase of the cross-correlation function at the current frequency and the product of the frequency index and the average phase difference. Specifically, the following may be adopted:
α1 = E{Φ̂(k+1) − Φ̂(k)},    k < Max,

d_g = −α1·N / (2π·Fs),

θ_g = E{Φ̂(k) − α1·k},    k < Max,
where E{·} denotes the average of the phase differences, Fs is the sampling frequency, and Max is the upper cut-off limit for the group delay and group phase calculation, used to prevent phase rotation.
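The phase-slope estimation above can be sketched as follows. It follows the document's formulas directly, so with the Fs factor the group delay comes out as a time rather than a sample count; and, as the text notes, the slope average only avoids phase wrapping when Max (written k_max below) is chosen small enough. The function name and arguments are assumptions of this sketch.

```python
import numpy as np

def estimate_from_phase_slope(C_r, N, Fs, k_max):
    """Estimate the group delay from the average slope of the cross-spectrum
    phase over the low band (k < k_max), and the group phase as the mean
    residual after removing that slope."""
    phi = np.angle(C_r[:k_max])              # Phi_hat(k) = angle of C_r(k)
    alpha1 = np.mean(np.diff(phi))           # E{Phi(k+1) - Phi(k)}
    d_g = -alpha1 * N / (2 * np.pi * Fs)     # document's form, with the Fs factor
    k = np.arange(k_max)
    theta_g = np.mean(phi - alpha1 * k)      # E{Phi(k) - alpha1 * k}
    return alpha1, d_g, theta_g
```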
In the stereo coding apparatus of the embodiment of the invention, the frequency-domain left- and right-channel signals are used to estimate the group delay and group phase between the left and right channels of the stereo signal, which embody the global azimuth information of the signal, so that the azimuth information of the sound field is effectively enhanced. Combining the estimation of the spatial characteristic parameters of the stereo signal with the group delay and group phase, and applying the combination to stereo coding with a small bit-rate demand, effectively merges the spatial information with the global azimuth information, yields more accurate sound-field information, enhances the sound-field effect, and greatly improves coding efficiency.
Embodiment seven
Figure 11 is a schematic diagram of an implementation of an apparatus 51 for encoding a stereo signal, comprising:
A transform device 01, configured to transform the time-domain stereo left-channel and right-channel signals to the frequency domain to form frequency-domain left-channel and right-channel signals;
A down-mixing device 02, configured to down-mix the frequency-domain left-channel and right-channel signals to generate a mono down-mix signal;
A parameter extraction device 03, configured to extract the spatial parameters of the frequency-domain left-channel and right-channel signals;
A stereo-signal estimation device 04, configured to use the frequency-domain left- and right-channel signals to estimate the group delay and group phase between the stereo left and right channels;
An encoding device 05, configured to quantize and encode the group delay and group phase, the spatial parameters, and the mono down-mix signal.
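The down-mixing device 02 is not given a formula in this embodiment; the plain average down-mix below is one common choice and is purely an illustrative assumption, not the patent's method.

```python
import numpy as np

def downmix(X1, X2):
    """Mono down-mix of the frequency-domain left/right channel spectra.
    The averaging formula here is an assumed, commonly used choice."""
    return 0.5 * (X1 + X2)
```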
The stereo-signal estimation device 04 here is applicable as in embodiments four to six: it receives the frequency-domain left-channel and right-channel signals obtained through the transform device 01, uses them to estimate the group delay and group phase between the stereo left and right channels by any of the implementations of embodiments four to six, and sends the obtained group delay and group phase to the encoding device 05. Likewise, the encoding device 05 receives the spatial parameters of the frequency-domain left-channel and right-channel signals extracted by the parameter extraction device 03; it quantizes and encodes the received information to form side information, and also quantizes the bits obtained by encoding and quantizing the down-mix signal. The encoding device 05 may be a single whole, configured to receive the different pieces of information and quantize and encode them, or it may be separated into a plurality of encoding devices that handle the different received information: for example, a first encoding device 501 connected to the down-mixing device 02 and configured to quantize and encode the down-mix information; a second encoding device 502 connected to the parameter extraction device and configured to quantize and encode the spatial parameters; and a third encoding device 503 connected to the stereo-signal estimation device and configured to quantize and encode the group delay and group phase. In another embodiment, if the stereo-signal estimation device 04 comprises the parameter characterization unit 45, the encoding device may further comprise a fourth encoding device configured to quantize and encode the IPD. When quantizing the IPD, an estimated IPD, IPD_est(k), is derived from the group delay (Group Delay) and group phase (Group Phase) and differenced against the original IPD(k); the differential IPD is then quantized and encoded, which can be expressed as:

IPD_est(k) = −2π·d_g·k/N + θ_g,    1 ≤ k ≤ N/2−1,

IPD_diff(k) = IPD(k) − IPD_est(k).

IPD_diff(k) is quantized to obtain the quantized bits. In another embodiment, the IPD may instead be quantized directly; the bit stream is slightly larger, but the quantization is more accurate.
Depending on requirements, the stereo coding apparatus 51 may be a stereo encoder or other equipment that encodes a stereo multi-channel signal.
Embodiment eight
Figure 12 is a schematic diagram of an implementation of a system 666 for encoding a stereo signal, which, on the basis of the stereo coding apparatus 51 described in embodiment seven, further comprises:
A receiving device 50, which receives the stereo input signal for the stereo coding apparatus 51; and a transfer device 52, configured to transmit the result of the stereo coding apparatus 51. In general, the transfer device 52 sends the result of the stereo coding apparatus to the decoding end for decoding.
One of ordinary skill in the art will appreciate that all or part of the flows in the methods of the above embodiments may be realized by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when executed may comprise the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
It should be noted that at last: above embodiment only in order to the explanation embodiment of the invention technical scheme but not limit it, although the embodiment of the invention is had been described in detail with reference to preferred embodiment, those of ordinary skill in the art is to be understood that: it still can make amendment or be equal to replacement the technical scheme of the embodiment of the invention, and these modifications or be equal to replacement and also can not make amended technical scheme break away from the spirit and scope of embodiment of the invention technical scheme.

Claims (27)

1. A stereo coding method, characterized in that the method comprises:
transforming time-domain stereo left-channel and right-channel signals to the frequency domain to form frequency-domain left-channel and right-channel signals;
down-mixing the frequency-domain left-channel and right-channel signals to generate a mono down-mix signal, and transmitting the bits obtained by encoding and quantizing the down-mix signal;
extracting spatial parameters of the frequency-domain left-channel and right-channel signals;
using the frequency-domain left- and right-channel signals to estimate the group delay and group phase between the stereo left and right channels; and
quantizing and encoding the group delay and group phase and the spatial parameters.
2. the method for claim 1, it is characterized in that: describedly utilize left and right sound track signals on the frequency domain to estimate to comprise before group delay between stereo left and right acoustic channels and the faciation position to determine about the cross correlation function between stereo left and right sound track signals on the frequency domain, described cross correlation function comprises the simple crosscorrelation of the weighting of the product of the conjugation of left channel signals and right-channel signals on the frequency domain.
3. The method as claimed in claim 2, characterized in that the weighted cross-correlation between the frequency-domain stereo left- and right-channel signals can be expressed as:

C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),      k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),    1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),      k = N/2
C_r(k) = 0,                                     k > N/2, or

C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),      k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),    1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),      k = N/2
C_r(k) = 0,                                         k > N/2,

where N is the length of the time-frequency transform of the stereo signal, and |X1(k)| and |X2(k)| are the amplitudes of X1(k) and X2(k); the weighted cross-correlation function is weighted at frequencies 0 and N/2 by the reciprocal of the product of the left- and right-channel amplitudes at the corresponding frequency, and at the other frequencies by 2 times that reciprocal.
4. The method as claimed in claim 3, characterized in that the method further comprises applying an inverse frequency-to-time transform to the cross-correlation function to obtain a time-domain cross-correlation signal,
or applying an inverse frequency-to-time transform to the cross-correlation function to obtain a time-domain cross-correlation signal and preprocessing the time-domain signal.
5. The method as claimed in claim 4, characterized in that, according to the time-domain cross-correlation signal, using the frequency-domain left- and right-channel signals to estimate the group delay and group phase between the stereo left and right channels comprises:
estimating the group delay from the time-domain cross-correlation signal, or from the index of the maximum-amplitude value of the processed time-domain cross-correlation signal, and obtaining the phase angle of the cross-correlation value corresponding to the group delay, from which the group phase is estimated.
6. The method as claimed in claim 3, characterized in that, according to the cross-correlation function, using the frequency-domain left- and right-channel signals to estimate the group delay and group phase between the stereo left and right channels comprises:
extracting the phase of the cross-correlation function and determining the group delay from the ratio of the product of the phase difference and the transform length to the frequency information; and
obtaining the group phase information from the difference between the phase of the weighted cross-correlation function at the current frequency and the product of the frequency index and the average phase difference.
7. The method as claimed in claim 5 or 6, characterized in that the method further comprises estimating stereo side information from the group phase and group delay and quantizing and encoding the side information, the side information comprising: the inter-channel phase difference parameter, the cross-correlation parameter, and/or the phase difference parameter between the left channel and the down-mix signal.
8. A method for estimating a stereo signal, characterized in that the method comprises:
determining a weighted cross-correlation function between frequency-domain stereo left- and right-channel signals;
preprocessing the weighted cross-correlation function; and
estimating the group delay and group phase between the stereo left- and right-channel signals according to the preprocessing result.
9. The method as claimed in claim 8, characterized in that the weighted cross-correlation function of the frequency-domain stereo left- and right-channel signals can be expressed as:

C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),      k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),    1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|·|X2(k)|),      k = N/2
C_r(k) = 0,                                     k > N/2, or

C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),      k = 0
C_r(k) = 2·X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),    1 ≤ k ≤ N/2−1
C_r(k) = X1(k)·X2*(k) / (|X1(k)|² + |X2(k)|²),      k = N/2
C_r(k) = 0,                                         k > N/2,

where N is the length of the time-frequency transform of the stereo signal, and |X1(k)| and |X2(k)| are the amplitudes of X1(k) and X2(k); the weighted cross-correlation function is weighted at frequencies 0 and N/2 by the reciprocal of the product of the left- and right-channel amplitudes at the corresponding frequency, and at the other frequencies by 2 times that reciprocal.
10. The method as claimed in claim 9, characterized in that the method further comprises: applying an inverse frequency-to-time transform to the weighted cross-correlation function of the frequency-domain stereo left- and right-channel signals to obtain a time-domain cross-correlation signal.
11. The method as claimed in claim 10, characterized in that preprocessing the time-domain cross-correlation signal comprises normalizing and smoothing the time-domain cross-correlation signal, the smoothing comprising:

C_ravg(n) = α·C_ravg(n) + β·C_r(n),

or normalizing and smoothing the absolute-value signal of the time-domain cross-correlation signal, the smoothing comprising:

C_ravg_abs(n) = α·C_ravg(n) + β·|C_r(n)|.
12. The method as claimed in claim 11, characterized in that estimating the group delay and group phase of the stereo signal according to the preprocessing result comprises:
judging in which of the symmetric intervals related to the time-frequency transform length N of the stereo signal the index of the maximum-amplitude value of the time-domain cross-correlation signal lies: if the index lies in the first symmetric interval, the group delay equals that index; if it lies in the second symmetric interval, the group delay is that index minus N; and
from the phase angle of the cross-correlation value corresponding to the group delay: when the group delay d_g is greater than or equal to zero, estimating the group phase as the phase angle of the cross-correlation value corresponding to d_g; when d_g is less than zero, the group phase is the phase angle of the cross-correlation value at index d_g + N.
13. The method as claimed in claim 12, characterized in that estimating the group delay and group phase of the stereo signal according to the preprocessing result comprises:

group delay d_g = arg max|C_ravg(n)|,        arg max|C_ravg(n)| ≤ N/2
           d_g = arg max|C_ravg(n)| − N,    arg max|C_ravg(n)| > N/2,

group phase θ_g = ∠C_ravg(d_g),        d_g ≥ 0
           θ_g = ∠C_ravg(d_g + N),    d_g < 0,

where N is the length of the time-frequency transform of the stereo signal, arg max|C_ravg(n)| is the index of the maximum-amplitude value of C_ravg(n), ∠C_ravg(d_g) is the phase angle of the cross-correlation value C_ravg(d_g), and ∠C_ravg(d_g + N) is the phase angle of C_ravg(d_g + N).
14. The method as claimed in claim 8, characterized in that estimating the group delay and group phase between the stereo left- and right-channel signals according to the preprocessing result comprises:
extracting, from the cross-correlation function or from the cross-correlation function after processing, its phase Φ̂(k) = ∠C_r(k), where the function ∠C_r(k) extracts the phase angle of the complex value C_r(k); and
computing the average α1 of the phase difference over the low-band frequencies, determining the group delay from the ratio of the product of the phase difference and the transform length to the frequency information, and obtaining the group phase from the difference between the phase of the cross-correlation function at the current frequency and the product of the frequency index and the average phase difference.
15. The method according to claim 14, wherein estimating the group delay and the phase delay between the stereo left- and right-channel signals from the pre-processing result comprises:
α_1 = E{Φ̂(k+1) − Φ̂(k)}, k < Max;
d_g = −α_1 · N / (2π · Fs);
θ_g = E{Φ̂(k) − α_1 · k}, k < Max,
where α_1 represents the mean of the phase differences, Fs is the sampling frequency, and Max is the upper cut-off bin for computing the group delay and the phase delay, chosen to avoid phase wrapping.
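The phase-slope estimate of claims 14 and 15 can be sketched as follows (illustrative only; `k_max` stands in for Max, and the group delay comes out in seconds because of the division by the sampling frequency Fs):

```python
import numpy as np

def estimate_delay_phase_slope(Cr, N, Fs, k_max):
    """Group delay and phase delay per the formulas of claim 15.
    Cr: weighted frequency-domain cross-correlation; k_max: low-band
    cut-off bin (Max), kept small enough that the phase does not wrap."""
    phi = np.angle(Cr[:k_max + 1])           # phase of each low-band bin
    alpha1 = np.mean(np.diff(phi))           # mean inter-bin phase difference
    d_g = -alpha1 * N / (2 * np.pi * Fs)     # group delay, in seconds
    k = np.arange(k_max)
    theta_g = np.mean(phi[:k_max] - alpha1 * k)   # phase delay
    return d_g, theta_g
```

In practice the cut-off `k_max` must satisfy the no-wrap condition implied by Max; with a larger delay or higher bins, `np.unwrap` would be needed before differencing.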
16. A device for estimating a stereo signal, wherein the device comprises:
a weighted cross-correlation unit, configured to determine a weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals;
a pre-processing unit, configured to pre-process the weighted cross-correlation function;
an estimation unit, configured to estimate the group delay and the phase delay between the stereo left- and right-channel signals from the pre-processing result.
17. device as claimed in claim 16 is characterized in that, described device also comprises:
The frequency-time domain transformation unit carries out inverse time conversion frequently to the cross correlation function about the weighting of the stereo left and right sound track signals of frequency domain and obtains the cross correlation function time-domain signal.
18. The device according to claim 17, wherein the estimation unit that estimates the group delay and the phase delay of the stereo signal from the pre-processing result comprises:
a judging unit, configured to determine the relation between the index corresponding to the maximum-amplitude value of the time-domain cross-correlation signal and the symmetric intervals related to the time-frequency transform length N of the stereo signal;
a group delay unit, configured such that if the index corresponding to the maximum-amplitude value lies in the first symmetric interval, the group delay equals that index, and if the index lies in the second symmetric interval, the group delay is that index minus N;
a phase delay unit, configured to determine the phase delay from the phase angle of the cross-correlation value corresponding to the group delay: when the group delay d_g is greater than or equal to zero, the phase delay is estimated as the phase angle of the cross-correlation value at index d_g; when d_g is less than zero, the phase delay is the phase angle of the cross-correlation value at index d_g + N.
19. The device according to claim 16, wherein the estimation unit that estimates the group delay and the phase delay between the stereo left- and right-channel signals from the pre-processing result comprises:
a phase extraction unit, configured to extract, from the cross-correlation function or from the processed cross-correlation function, its phase
Φ̂(k) = ∠C_r(k),
where the function ∠C_r(k) extracts the phase angle of the complex value C_r(k);
a group delay unit, configured to compute the mean α_1 of the phase differences over the low-frequency bins and determine the group delay from the ratio of the product of the mean phase difference and the transform length to the frequency information;
a phase delay unit, configured to obtain the phase delay from the difference between the phase of the current bin of the cross-correlation function and the product of the bin index and the mean phase difference.
20. The device according to claim 16, wherein the device further comprises a parameter characterization unit, configured to estimate the stereo parameter IPD from the phase delay and the group delay.
21. A stereo signal encoding apparatus, wherein the apparatus comprises:
a transform device, configured to transform the time-domain stereo left-channel and right-channel signals into the frequency domain to form frequency-domain left-channel and right-channel signals;
a downmix device, configured to downmix the frequency-domain left-channel and right-channel signals to generate a mono downmix signal;
a parameter extraction device, configured to extract the spatial parameters of the frequency-domain left-channel and right-channel signals;
a stereo signal estimation device, configured to estimate the group delay and the phase delay between the stereo left and right channels using the frequency-domain left- and right-channel signals;
an encoding device, configured to quantize and encode the group delay, the phase delay, the spatial parameters and the mono downmix signal.
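An end-to-end sketch of the claim-21 encoder flow (illustrative only: the FFT, the simple average downmix, and the time-domain delay estimate stand in for the unspecified transform, downmix, and estimation devices, and quantization is omitted):

```python
import numpy as np

def encode_frame(left, right):
    """Illustrative encoder flow: time-frequency transform, mono downmix,
    weighted cross-correlation (first normalization of claim 23), and the
    time-domain group/phase delay estimate of claims 12-13."""
    N = len(left)
    X1, X2 = np.fft.fft(left), np.fft.fft(right)   # time-frequency transform
    mono = 0.5 * (X1 + X2)                          # simple mono downmix
    # Weighted cross-correlation: weight 1/(|X1||X2|) at bins 0 and N/2,
    # 2/(|X1||X2|) elsewhere, zero above N/2.
    Cr = np.zeros(N, dtype=complex)
    h = np.arange(N // 2 + 1)
    denom = np.abs(X1[h]) * np.abs(X2[h]) + 1e-12   # guard for silent bins
    Cr[:N // 2 + 1] = X1[h] * np.conj(X2[h]) / denom
    Cr[1:N // 2] *= 2
    # Time-domain estimate of the group delay and phase delay.
    c_t = np.fft.ifft(Cr)
    n_max = int(np.argmax(np.abs(c_t)))
    d_g = n_max if n_max <= N // 2 else n_max - N
    theta_g = np.angle(c_t[d_g] if d_g >= 0 else c_t[d_g + N])
    return mono, d_g, theta_g
```

A real encoder would additionally extract the spatial parameters and quantize everything before transmission; this sketch only traces the data flow between the claimed devices.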
22. The apparatus according to claim 21, wherein before the stereo signal estimation device estimates the group delay and the phase delay between the stereo left and right channels using the frequency-domain left- and right-channel signals, it further determines a cross-correlation function between the frequency-domain stereo left- and right-channel signals, the cross-correlation function comprising a weighted cross-correlation of the product of the frequency-domain left-channel signal and the conjugate of the right-channel signal.
23. The apparatus according to claim 20 or 22, wherein the weighted cross-correlation function between the frequency-domain stereo left- and right-channel signals determined by the stereo signal estimation device can be expressed as:
C_r(k) = X_1(k)·X_2*(k) / (|X_1(k)|·|X_2(k)|), k = 0;
C_r(k) = 2·X_1(k)·X_2*(k) / (|X_1(k)|·|X_2(k)|), 1 ≤ k ≤ N/2 − 1;
C_r(k) = X_1(k)·X_2*(k) / (|X_1(k)|·|X_2(k)|), k = N/2;
C_r(k) = 0, k > N/2,
or
C_r(k) = X_1(k)·X_2*(k) / (|X_1(k)|² + |X_2(k)|²), k = 0;
C_r(k) = 2·X_1(k)·X_2*(k) / (|X_1(k)|² + |X_2(k)|²), 1 ≤ k ≤ N/2 − 1;
C_r(k) = X_1(k)·X_2*(k) / (|X_1(k)|² + |X_2(k)|²), k = N/2;
C_r(k) = 0, k > N/2,
where N is the time-frequency transform length of the stereo signal, and |X_1(k)| and |X_2(k)| are the amplitudes of X_1(k) and X_2(k). The weight of the cross-correlation function at bins 0 and N/2 is the reciprocal of the product of the left- and right-channel amplitudes at the corresponding bin, and at the other bins it is twice that reciprocal.
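The first normalization of claim 23 can be sketched as follows (illustrative only; the small constant substituted into the denominator guards against silent bins and is not part of the claim):

```python
import numpy as np

def weighted_cross_correlation(X1, X2, N):
    """Weighted frequency-domain cross-correlation C_r(k), first
    normalization of claim 23 (amplitude-product weighting).
    X1, X2: complex spectra of the left/right channels, length >= N/2 + 1.
    Returns a length-N complex array; bins above N/2 are zero."""
    Cr = np.zeros(N, dtype=complex)
    h = np.arange(N // 2 + 1)
    prod = X1[h] * np.conj(X2[h])
    denom = np.abs(X1[h]) * np.abs(X2[h])
    denom[denom == 0] = 1e-12          # guard against silent bins
    w = np.full(N // 2 + 1, 2.0)       # weight 2/(|X1||X2|) for 1 <= k <= N/2-1
    w[0] = 1.0                         # weight 1/(|X1||X2|) at k = 0
    w[N // 2] = 1.0                    # and at k = N/2
    Cr[:N // 2 + 1] = w * prod / denom
    return Cr
```

Zeroing the upper half of the spectrum and doubling the interior bins makes the inverse transform an analytic-style signal, so the magnitude peak used in claims 12-13 is well defined.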
24. The apparatus according to claim 23, wherein the stereo signal estimation device comprises a frequency-to-time transform unit, configured to apply an inverse frequency-to-time transform to the cross-correlation function to obtain a time-domain cross-correlation signal.
25. The apparatus according to claim 24, wherein the stereo signal estimation device comprises an estimation unit, configured to estimate the group delay from the index corresponding to the maximum-amplitude value of the time-domain cross-correlation signal, or of the processed time-domain cross-correlation signal, and to estimate the phase delay by obtaining the phase angle of the cross-correlation value corresponding to the group delay.
26. The apparatus according to claim 24, wherein the stereo signal estimation device comprises an estimation unit, configured to extract the phase of the cross-correlation function, determine the group delay from the ratio of the product of the phase difference and the transform length to the frequency information, and obtain the phase delay from the difference between the phase of the current bin of the cross-correlation function and the product of the bin index and the mean phase difference.
27. A stereo encoding system, wherein the system comprises: a receiving device, configured to receive a stereo input signal and to supply it to a stereo encoding apparatus according to any one of claims 21 to 26; and a transfer device, configured to transmit the output of the stereo encoding apparatus.
CN201010113805.9A 2010-02-12 2010-02-12 Method for coding stereo and device thereof Active CN102157152B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201010113805.9A CN102157152B (en) 2010-02-12 2010-02-12 Method for coding stereo and device thereof
PCT/CN2010/079410 WO2011097915A1 (en) 2010-02-12 2010-12-03 Method and device for stereo coding
US13/567,982 US9105265B2 (en) 2010-02-12 2012-08-06 Stereo coding method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010113805.9A CN102157152B (en) 2010-02-12 2010-02-12 Method for coding stereo and device thereof

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN2013102709304A Division CN103366748A (en) 2010-02-12 2010-02-12 Stereo coding method and device

Publications (2)

Publication Number Publication Date
CN102157152A true CN102157152A (en) 2011-08-17
CN102157152B CN102157152B (en) 2014-04-30

Family

ID=44367218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010113805.9A Active CN102157152B (en) 2010-02-12 2010-02-12 Method for coding stereo and device thereof

Country Status (3)

Country Link
US (1) US9105265B2 (en)
CN (1) CN102157152B (en)
WO (1) WO2011097915A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102446507A (en) * 2011-09-27 2012-05-09 华为技术有限公司 Down-mixing signal generating and reducing method and device
CN103700372A (en) * 2013-12-30 2014-04-02 北京大学 Orthogonal decoding related technology-based parametric stereo coding and decoding methods
CN103971692A (en) * 2013-01-28 2014-08-06 北京三星通信技术研究有限公司 Audio processing method, device and system
CN104681029A (en) * 2013-11-29 2015-06-03 华为技术有限公司 Coding method and coding device for stereo phase parameters
CN108028988A (en) * 2015-06-17 2018-05-11 三星电子株式会社 Handle the apparatus and method of the inside sound channel of low complexity format conversion
CN109389985A (en) * 2017-08-10 2019-02-26 华为技术有限公司 Time domain stereo decoding method and Related product
WO2020135610A1 (en) * 2018-12-28 2020-07-02 南京中感微电子有限公司 Audio data recovery method and apparatus and bluetooth device
CN111988726A (en) * 2019-05-06 2020-11-24 深圳市三诺数字科技有限公司 Method and system for synthesizing single sound channel by stereo
CN112242150A (en) * 2020-09-30 2021-01-19 上海佰贝科技发展股份有限公司 Method and system for detecting stereo
CN112242150B (en) * 2020-09-30 2024-04-12 上海佰贝科技发展股份有限公司 Method and system for detecting stereo

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102157152B (en) 2010-02-12 2014-04-30 华为技术有限公司 Method for coding stereo and device thereof
WO2012167479A1 (en) * 2011-07-15 2012-12-13 Huawei Technologies Co., Ltd. Method and apparatus for processing a multi-channel audio signal
CN108269577B (en) 2016-12-30 2019-10-22 华为技术有限公司 Stereo encoding method and stereophonic encoder
CN109215667B (en) 2017-06-29 2020-12-22 华为技术有限公司 Time delay estimation method and device
CN117133297A (en) * 2017-08-10 2023-11-28 华为技术有限公司 Coding method of time domain stereo parameter and related product
US10306391B1 (en) 2017-12-18 2019-05-28 Apple Inc. Stereophonic to monophonic down-mixing
EP3985665A1 (en) * 2018-04-05 2022-04-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method or computer program for estimating an inter-channel time difference
CN114205821B (en) * 2021-11-30 2023-08-08 广州万城万充新能源科技有限公司 Wireless radio frequency anomaly detection method based on depth prediction coding neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1860526A (en) * 2003-09-29 2006-11-08 皇家飞利浦电子股份有限公司 Encoding audio signals
CN101149925A (en) * 2007-11-06 2008-03-26 武汉大学 Space parameter selection method for parameter stereo coding
US20080097766A1 (en) * 2006-10-18 2008-04-24 Samsung Electronics Co., Ltd. Method, medium, and apparatus encoding and/or decoding multichannel audio signals
CN101313355A (en) * 2005-09-27 2008-11-26 Lg电子株式会社 Method and apparatus for encoding/decoding multi-channel audio signal
EP2144229A1 (en) * 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Efficient use of phase information in audio encoding and decoding

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE426235T1 (en) * 2002-04-22 2009-04-15 Koninkl Philips Electronics Nv DECODING DEVICE WITH DECORORATION UNIT
KR20050021484A (en) 2002-07-16 2005-03-07 코닌클리케 필립스 일렉트로닉스 엔.브이. Audio coding
CN1748247B (en) * 2003-02-11 2011-06-15 皇家飞利浦电子股份有限公司 Audio coding
SE0402650D0 (en) * 2004-11-02 2004-11-02 Coding Tech Ab Improved parametric stereo compatible coding or spatial audio
KR20070090219A (en) 2004-12-28 2007-09-05 마츠시타 덴끼 산교 가부시키가이샤 Audio encoding device and audio encoding method
US9009057B2 (en) * 2006-02-21 2015-04-14 Koninklijke Philips N.V. Audio encoding and decoding to generate binaural virtual spatial signals
GB2453117B (en) * 2007-09-25 2012-05-23 Motorola Mobility Inc Apparatus and method for encoding a multi channel audio signal
CN100571043C (en) * 2007-11-06 2009-12-16 武汉大学 A kind of space parameter stereo coding/decoding method and device thereof
US20100318353A1 (en) * 2009-06-16 2010-12-16 Bizjak Karl M Compressor augmented array processing
CN102157152B (en) 2010-02-12 2014-04-30 华为技术有限公司 Method for coding stereo and device thereof
CN102157150B (en) * 2010-02-12 2012-08-08 华为技术有限公司 Stereo decoding method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1860526A (en) * 2003-09-29 2006-11-08 皇家飞利浦电子股份有限公司 Encoding audio signals
CN101313355A (en) * 2005-09-27 2008-11-26 Lg电子株式会社 Method and apparatus for encoding/decoding multi-channel audio signal
US20080097766A1 (en) * 2006-10-18 2008-04-24 Samsung Electronics Co., Ltd. Method, medium, and apparatus encoding and/or decoding multichannel audio signals
CN101149925A (en) * 2007-11-06 2008-03-26 武汉大学 Space parameter selection method for parameter stereo coding
EP2144229A1 (en) * 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Efficient use of phase information in audio encoding and decoding

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9516447B2 (en) 2011-09-27 2016-12-06 Huawei Technologies Co., Ltd. Method and apparatus for generating and restoring downmixed signal
WO2013044826A1 (en) * 2011-09-27 2013-04-04 华为技术有限公司 Method and device for generating and restoring downmix signal
CN102446507B (en) * 2011-09-27 2013-04-17 华为技术有限公司 Down-mixing signal generating and reducing method and device
CN102446507A (en) * 2011-09-27 2012-05-09 华为技术有限公司 Down-mixing signal generating and reducing method and device
CN103971692A (en) * 2013-01-28 2014-08-06 北京三星通信技术研究有限公司 Audio processing method, device and system
US10008211B2 (en) 2013-11-29 2018-06-26 Huawei Technologies Co., Ltd. Method and apparatus for encoding stereo phase parameter
CN104681029A (en) * 2013-11-29 2015-06-03 华为技术有限公司 Coding method and coding device for stereo phase parameters
WO2015078123A1 (en) * 2013-11-29 2015-06-04 华为技术有限公司 Method and device for encoding stereo phase parameter
CN104681029B (en) * 2013-11-29 2018-06-05 华为技术有限公司 The coding method of stereo phase parameter and device
CN103700372A (en) * 2013-12-30 2014-04-02 北京大学 Orthogonal decoding related technology-based parametric stereo coding and decoding methods
US10607622B2 (en) 2015-06-17 2020-03-31 Samsung Electronics Co., Ltd. Device and method for processing internal channel for low complexity format conversion
CN108028988A (en) * 2015-06-17 2018-05-11 三星电子株式会社 Handle the apparatus and method of the inside sound channel of low complexity format conversion
CN108028988B (en) * 2015-06-17 2020-07-03 三星电子株式会社 Apparatus and method for processing internal channel of low complexity format conversion
CN109389985A (en) * 2017-08-10 2019-02-26 华为技术有限公司 Time domain stereo decoding method and Related product
CN109389985B (en) * 2017-08-10 2021-09-14 华为技术有限公司 Time domain stereo coding and decoding method and related products
US11355131B2 (en) 2017-08-10 2022-06-07 Huawei Technologies Co., Ltd. Time-domain stereo encoding and decoding method and related product
US11900952B2 (en) 2017-08-10 2024-02-13 Huawei Technologies Co., Ltd. Time-domain stereo encoding and decoding method and related product
WO2020135610A1 (en) * 2018-12-28 2020-07-02 南京中感微电子有限公司 Audio data recovery method and apparatus and bluetooth device
CN111988726A (en) * 2019-05-06 2020-11-24 深圳市三诺数字科技有限公司 Method and system for synthesizing single sound channel by stereo
CN112242150A (en) * 2020-09-30 2021-01-19 上海佰贝科技发展股份有限公司 Method and system for detecting stereo
CN112242150B (en) * 2020-09-30 2024-04-12 上海佰贝科技发展股份有限公司 Method and system for detecting stereo

Also Published As

Publication number Publication date
US9105265B2 (en) 2015-08-11
WO2011097915A1 (en) 2011-08-18
US20120300945A1 (en) 2012-11-29
CN102157152B (en) 2014-04-30

Similar Documents

Publication Publication Date Title
CN102157152B (en) Method for coding stereo and device thereof
RU2645271C2 (en) Stereophonic code and decoder of audio signals
CN1748247B (en) Audio coding
CN101681623B (en) Method and apparatus for encoding and decoding high frequency band
CN101002261B (en) Method and apparatus for encoding and decoding multi-channel audio signal using virtual source location information
CN1307612C (en) Parametric representation of spatial audio
CN102157149B (en) Stereo signal down-mixing method and coding-decoding device and system
EP2467850B1 (en) Method and apparatus for decoding multi-channel audio signals
CN102428513B (en) Apparatus and method for encoding/decoding a multichannel signal
EP3518234B1 (en) Audio encoding device and method
CN103262158B (en) The multi-channel audio signal of decoding or stereophonic signal are carried out to the apparatus and method of aftertreatment
CN101675471A (en) The method and apparatus that is used for audio signal
JP2007507726A (en) Audio signal encoding
EP3511934B1 (en) Method, apparatus and system for processing multi-channel audio signal
CN103262160B (en) Method and apparatus for downmixing multi-channel audio signals
CN101568959A (en) Method, medium, and apparatus with bandwidth extension encoding and/or decoding
EP3608910B1 (en) Decoding device and method, and program
CN106033671B (en) Method and apparatus for determining inter-channel time difference parameters
CN102157150B (en) Stereo decoding method and device
US20120070007A1 (en) Apparatus and method for bandwidth extension for multi-channel audio
US8626518B2 (en) Multi-channel signal encoding and decoding method, apparatus, and system
WO2017206794A1 (en) Method and device for extracting inter-channel phase difference parameter
CN103366748A (en) Stereo coding method and device
CN101562015A (en) Audio-frequency processing method and device
CN106033672B (en) Method and apparatus for determining inter-channel time difference parameters

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant