CN101673549B - Spatial audio parameters prediction coding and decoding methods of movable sound source and system - Google Patents


Info

Publication number
CN101673549B
CN101673549B (application CN200910272282XA)
Authority
CN
China
Prior art keywords
interaural
spatial audio
prediction
prediction residual
itd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200910272282XA
Other languages
Chinese (zh)
Other versions
CN101673549A (en)
Inventor
胡瑞敏
周成
高丽
杭波
王晓晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN200910272282XA priority Critical patent/CN101673549B/en
Publication of CN101673549A publication Critical patent/CN101673549A/en
Application granted granted Critical
Publication of CN101673549B publication Critical patent/CN101673549B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to the technical field of audio, and in particular to spatial audio parameter predictive coding and decoding methods and a system for a moving sound source. The coding method provided by the invention comprises: inputting a multi-channel audio signal; extracting spatial audio parameters from the input multi-channel audio signal; predicting the spatial audio parameters of the current frame according to the speed of the moving sound source, obtaining the spatial audio parameter prediction coefficients and the spatial audio parameter prediction residual of the current frame; and encoding the prediction residual of the current frame to obtain a coded bitstream. The decoding method provided by the invention comprises: inputting the coded bitstream; decoding the bitstream to obtain the spatial audio parameter prediction residual; and rebuilding the spatial audio parameters of the current frame according to the speed of the moving sound source and the prediction residual of the current frame. Based on the Doppler effect and the principles of kinematics, geometric acoustics and wave acoustics, the invention accurately predicts how the spatial parameters of a moving sound source change; the prediction error is small and the coding bit rate is effectively reduced.

Description

Spatial audio parameter predictive coding and decoding methods and system for a moving sound source
Technical field
The present invention relates to the field of audio technology, and in particular to spatial audio parameter predictive coding and decoding methods and a system for a moving sound source.
Background technology
The theoretical foundation of spatial audio coding is the physiological and psychological acoustics of human spatial hearing. In 1983, Blauert et al. gave mathematical-physical models and experimental analyses of how the human ear localizes single and multiple sound sources in space, defining spatial cue parameters such as the interaural time difference ITD, the interaural intensity difference ILD and the interaural coherence IC. ITD and ILD respectively represent the time difference and the intensity difference with which the sound emitted by one source arrives at the left and right ears; these two parameters suffice to localize the source, while IC measures the similarity of the signals entering the two ears and determines the width of the sound image. In 2001, Faller and Baumgarte proposed Binaural Cue Coding (BCC), in which the stereo signal is downmixed to a mono signal handed to a conventional codec while the spatial audio parameters extracted from the input signal are coded separately; at decoding, the stereo signal is reconstructed jointly from the mono signal and the spatial audio parameters.
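The ITD and ILD cues just defined can be estimated directly from one stereo frame. The patent does not prescribe an extraction method, so the sketch below (function name, frame length and the cross-correlation/energy-ratio approach are all illustrative assumptions) only shows one common way of computing them:

```python
import numpy as np

def extract_itd_ild(left, right, fs):
    """Estimate ITD (seconds) and ILD (dB) for one stereo frame.

    ITD: lag of the peak of the cross-correlation between the channels
    (negative when the left channel leads). ILD: channel energy ratio
    in dB. Illustrative sketch only; the patent leaves the method open.
    """
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)  # lag in samples
    itd = lag / fs
    eps = 1e-12                                    # guard against log(0)
    ild = 10.0 * np.log10((np.sum(left ** 2) + eps) /
                          (np.sum(right ** 2) + eps))
    return itd, ild
```

For a right channel that is a delayed, attenuated copy of the left, the estimate recovers the delay with a negative sign (left leads) and the energy ratio in dB.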
Because the movement of an audio object through space is a continuous process, the spatial audio parameters that characterize the object's spatial orientation are correlated in the time domain. Targeting this time-domain correlation, the current mainstream spatial audio coding algorithm EAAC+ adopts a differential Huffman coding scheme that Huffman-codes the difference between the spatial audio parameters of the current frame and those of the previous frame, in order to narrow the dynamic range of the coded parameter values and reduce the spatial audio coding bit rate. This differential coding method, however, simply takes the previous frame's spatial audio parameters as the prediction for the current frame and ignores how the spatial audio parameters of a moving sound source actually evolve, so room for improvement remains.
Summary of the invention
The purpose of the present invention is to provide spatial audio parameter predictive coding and decoding methods and a system for a moving sound source that eliminate coding redundancy: the current-frame spatial audio parameters are predicted according to the Doppler effect and the principles of kinematics, geometric acoustics and wave acoustics, and the difference between the actual and predicted values (the spatial audio parameter prediction residual) is then encoded and decoded.
To achieve the above object, the present invention adopts the following technical scheme:
A moving-sound-source spatial audio parameter predictive coding method comprises the following steps:
1. inputting a multi-channel audio signal;
2. extracting spatial audio parameters from the input multi-channel audio signal;
3. predicting the spatial audio parameters of the current frame according to the speed of the moving sound source, obtaining the spatial audio parameter prediction coefficients and the spatial audio parameter prediction residual of the current frame;
4. encoding the spatial audio parameter prediction residual of the current frame to obtain a coded bitstream.
After step 2. is carried out, the interaural time difference ITD and the interaural intensity difference ILD are obtained;
In step 3., the predicted interaural time difference ITD is subtracted from the extracted interaural time difference ITD to obtain the ITD prediction residual, and the predicted interaural intensity difference ILD is subtracted from the extracted interaural intensity difference ILD to obtain the ILD prediction residual;
Wherein the predicted interaural time difference ITD is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

and the predicted interaural intensity difference ILD is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$, so that $\mathrm{ITD}(f)=\frac{a}{u}\,\Phi(f)$.
Here the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c; the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f, where r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index; the head is approximated as a sphere of radius a, the two ears as two diametrically opposite points on the sphere, and θ(t) is the incidence direction of the plane sound wave in the horizontal plane.
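The two closed-form predictors above can be evaluated per frame as sketched below. The function follows the description's symbols (a, u, f_c, v, Δt, r_0, θ_0); all default numeric values are illustrative assumptions, and arctan2 stands in for arctan so the expression stays defined when r_0·cos θ_0 = 0:

```python
import numpy as np

def predict_itd_ild(f, r0, theta0, v, dt, a=0.0875, u=340.0, fc=1000.0):
    """Predicted ITD (s) and ILD (dB) of frame f from the moving-source model.

    Symbols follow the description: head radius a, sound speed u, source
    speed v, inter-frame interval dt, initial polar position (r0, theta0),
    source frequency fc. Default values are illustrative assumptions.
    """
    s = r0 * np.sin(theta0) - v * dt * f       # horizontal offset at frame f
    c = r0 * np.cos(theta0)
    sin_th = s / np.hypot(s, c)                # sin of incidence angle theta(t)
    phi = sin_th + np.arctan2(s, c)            # bracketed term Phi(f)
    itd = (a / u) * phi
    denom = u - v * sin_th                     # Doppler term in the ILD phase
    # Note: the sine ratio can be negative; a practical implementation
    # might take its magnitude before the logarithm.
    ild = 10.0 * np.log10(
        np.sin(2 * np.pi * fc * (u * dt * f - 0.5 * a * phi) / denom)
        / np.sin(2 * np.pi * fc * (u * dt * f + 0.5 * a * phi) / denom)
    )
    return itd, ild
```

For a stationary source directly ahead (v = 0, θ_0 = 0) both predictions vanish, and for a source off to one side the ITD takes the sign of the azimuth, as the model requires.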
A moving-sound-source spatial audio parameter predictive decoding method comprises the following steps:
1. inputting the coded bitstream;
2. decoding the spatial audio parameter prediction residual from the coded bitstream;
3. rebuilding the current-frame spatial audio parameters according to the speed of the moving sound source and the current-frame spatial audio parameter prediction residual.
After step 2. is carried out, the interaural time difference ITD prediction residual and the interaural intensity difference ILD prediction residual are obtained;
In step 3., the predicted interaural time difference ITD is added to the ITD prediction residual to obtain the interaural time difference ITD, and the predicted interaural intensity difference ILD is added to the ILD prediction residual to obtain the interaural intensity difference ILD;
Wherein the predicted interaural time difference ITD is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

and the predicted interaural intensity difference ILD is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$, so that $\mathrm{ITD}(f)=\frac{a}{u}\,\Phi(f)$.
Here the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c; the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f, where r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index; the head is approximated as a sphere of radius a, the two ears as two diametrically opposite points on the sphere, and θ(t) is the incidence direction of the plane sound wave in the horizontal plane.
A moving-sound-source spatial audio parameter predictive coding and decoding system comprises:
a spatial audio parameter extraction module (1), which receives the input multi-channel audio signal (6), extracts the spatial audio parameters from it and outputs them to the spatial audio parameter prediction module (2);
a spatial audio parameter prediction module (2), which receives the spatial audio parameters, predicts the spatial audio parameters of the current frame according to the speed of the moving sound source, obtains the spatial audio parameter prediction coefficients and the spatial audio parameter prediction residual of the current frame, and outputs the prediction residual to the spatial audio parameter coding module (3);
a spatial audio parameter coding module (3), which receives the prediction residual, encodes the spatial audio parameter prediction residual of the current frame into a coded bitstream and outputs the bitstream to the spatial audio parameter decoding module (4);
a spatial audio parameter decoding module (4), which receives the coded bitstream, decodes the spatial audio parameter prediction residual from it and outputs the residual to the spatial audio parameter rebuilding module (5);
a spatial audio parameter rebuilding module (5), which receives the prediction residual, rebuilds the current-frame spatial audio parameters according to the speed of the moving sound source and the current-frame prediction residual, and outputs the spatial audio parameters (7).
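The data flow through modules (1)-(5) can be sketched minimally as follows. The class name, the `predict_fn` interface and the unquantized residual representation are illustrative stand-ins; parameter extraction and entropy coding are omitted:

```python
class PredictiveCodecPipeline:
    """Minimal sketch of the module (1)-(5) data flow (names illustrative).

    predict_fn maps a frame index to (predicted ITD, predicted ILD); in
    the full system it evaluates the moving-source model. Encoder and
    decoder share the same predictor, so the residuals cancel exactly.
    """

    def __init__(self, predict_fn):
        self.predict = predict_fn

    def encode(self, frame, itd, ild):
        """Modules (2)-(3): prediction residuals destined for the bitstream."""
        p_itd, p_ild = self.predict(frame)
        return itd - p_itd, ild - p_ild

    def decode(self, frame, r_itd, r_ild):
        """Modules (4)-(5): rebuild the parameters from the residuals."""
        p_itd, p_ild = self.predict(frame)
        return p_itd + r_itd, p_ild + r_ild
```

Because the same prediction is subtracted at the encoder and added back at the decoder, the reconstruction is exact up to whatever loss the residual coder introduces.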
The present invention has the following advantages and beneficial effects:
1) the encoder predicts the spatial audio parameters and places only the prediction residual into the coded bitstream, and the decoder rebuilds the spatial audio parameters from that residual;
2) the changes in the spatial parameters of a moving sound source are accurately estimated according to the Doppler effect and the principles of kinematics, geometric acoustics and wave acoustics, so the prediction error is small and the coding bit rate is effectively reduced.
Description of drawings
Fig. 1 is a flow diagram of the moving-sound-source spatial audio parameter predictive coding method provided by the invention.
Fig. 2 is a flow diagram of the moving-sound-source spatial audio parameter predictive decoding method provided by the invention.
Fig. 3 is a structural diagram of the moving-sound-source spatial audio parameter predictive coding and decoding system provided by the invention.
Fig. 4 is a schematic diagram of the movement of the sound source provided by the invention.
Fig. 5 is a schematic diagram of the sound propagation of the moving source provided by the invention.
Wherein,
S1.1 - input the multi-channel audio signal; S1.2 - extract the spatial audio parameters; S1.3 - predict the spatial audio parameters of the current frame; S1.4 - encode the spatial audio parameter prediction residual of the current frame; S2.1 - input the coded bitstream; S2.2 - decode to obtain the spatial audio parameter prediction residual; S2.3 - rebuild the current-frame spatial audio parameters; 1 - spatial audio parameter extraction module; 2 - spatial audio parameter prediction module; 3 - spatial audio parameter coding module; 4 - spatial audio parameter decoding module; 5 - spatial audio parameter rebuilding module; 6 - multi-channel audio signal; 7 - spatial audio parameters.
Embodiment
The invention is further described below in conjunction with the accompanying drawings and a specific embodiment:
The moving-sound-source spatial audio parameter predictive coding method provided by the invention specifically adopts the following technical scheme (see Fig. 1), comprising the following steps:
S1.1: input a multi-channel audio signal;
S1.2: extract spatial audio parameters from the input multi-channel audio signal;
S1.3: predict the spatial audio parameters of the current frame according to the speed of the moving sound source, obtaining the spatial audio parameter prediction coefficients and the spatial audio parameter prediction residual of the current frame;
S1.4: encode the spatial audio parameter prediction residual of the current frame to obtain the coded bitstream.
The present invention is described in detail below according to this embodiment:
In a specific implementation of step S1.2, any spatial audio parameter computation method may be used to calculate the interaural time difference ITD and the interaural intensity difference ILD;
In a specific implementation of step S1.3 (see Fig. 4), suppose that the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c, and that the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f; r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index.
(1) Referring to Fig. 5, the head is approximated as a sphere of radius a and the two ears as two diametrically opposite points on the sphere. For a plane sound wave incident from the direction θ(t) in the horizontal plane, taking into account the propagation of the wave along the curved surface of the head, the predicted interaural time difference ITD of the current frame obtained from the geometric acoustics principle is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

(2) From the ITD result, the predicted interaural intensity difference ILD of the current frame obtained from the Doppler effect and the wave acoustics principle is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$, so that $\mathrm{ITD}(f)=\frac{a}{u}\,\Phi(f)$.
Subtracting the predicted interaural time difference ITD from the ITD obtained in step S1.2 yields the ITD prediction residual.
Subtracting the predicted interaural intensity difference ILD from the ILD obtained in step S1.2 yields the ILD prediction residual.
In a specific implementation of step S1.4, any coding algorithm may be used to encode the spatial audio parameter (ITD and ILD) prediction residuals into the coded bitstream.
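Since step S1.4 permits any coding algorithm, one hedged possibility is to quantize the residuals uniformly to integer indices before entropy coding; the function name and the step sizes below are purely illustrative assumptions, not part of the patent:

```python
def encode_residuals(itd, ild, itd_pred, ild_pred, q_itd=1e-5, q_ild=0.5):
    """Quantize the ITD/ILD prediction residuals of one frame to indices.

    q_itd (seconds) and q_ild (dB) are illustrative step sizes; the
    patent permits any coding algorithm for the residuals. The integer
    indices would then be entropy-coded into the bitstream.
    """
    r_itd = round((itd - itd_pred) / q_itd)
    r_ild = round((ild - ild_pred) / q_ild)
    return r_itd, r_ild
```

Because the residuals are small when the prediction is accurate, the indices cluster near zero and compress well under a subsequent entropy coder.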
The coded bitstream obtained by the above process is exactly the object processed by the moving-sound-source spatial audio parameter predictive decoding provided by the present invention; the decoding process is the inverse of the coding process.
The moving-sound-source spatial audio parameter predictive decoding method provided by the invention specifically adopts the following technical scheme (see Fig. 2), comprising the following steps:
Step S2.1: input the coded bitstream;
Step S2.2: decode the spatial audio parameter prediction residual from the coded bitstream;
Step S2.3: rebuild the current-frame spatial audio parameters according to the speed of the moving sound source and the current-frame spatial audio parameter prediction residual.
The present invention is described in detail below with a specific embodiment:
In a specific implementation of step S2.2, a decoding algorithm corresponding to step S1.4 is used to decode the interaural time difference ITD prediction residual and the interaural intensity difference ILD prediction residual from the coded bitstream;
In a specific implementation of step S2.3 (see Fig. 4), suppose that the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c, and that the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f; r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index.
(1) Referring to Fig. 5, the head is approximated as a sphere of radius a and the two ears as two diametrically opposite points on the sphere. For a plane sound wave incident from the direction θ(t) in the horizontal plane, taking into account the propagation of the wave along the curved surface of the head, the predicted interaural time difference ITD of the current frame obtained from the geometric acoustics principle is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

(2) From the ITD result, the predicted interaural intensity difference ILD of the current frame obtained from the Doppler effect and the wave acoustics principle is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$, so that $\mathrm{ITD}(f)=\frac{a}{u}\,\Phi(f)$.
Adding the predicted interaural time difference ITD to the ITD prediction residual obtained in step S2.2 yields the interaural time difference ITD.
Adding the predicted interaural intensity difference ILD to the ILD prediction residual obtained in step S2.2 yields the interaural intensity difference ILD.
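Conversely, steps S2.2-S2.3 can be sketched as dequantization plus prediction. The step sizes mirror an assumed uniform quantizer on the encoder side and, like the function name, are illustrative only (the patent leaves the residual coder open):

```python
def reconstruct_parameters(r_itd, r_ild, itd_pred, ild_pred,
                           q_itd=1e-5, q_ild=0.5):
    """Rebuild current-frame ITD/ILD from decoded residual indices.

    Adds the dequantized residuals to the model predictions; q_itd (s)
    and q_ild (dB) are illustrative step sizes matching an assumed
    uniform quantizer at the encoder.
    """
    itd = itd_pred + r_itd * q_itd
    ild = ild_pred + r_ild * q_ild
    return itd, ild
```

With matching step sizes on both sides, the reconstruction error is bounded by half a quantization step per parameter.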
The present invention simultaneously provides a moving-sound-source spatial audio parameter predictive coding and decoding system, which specifically adopts the following technical scheme (see Fig. 3), comprising:
a spatial audio parameter extraction module 1, a spatial audio parameter prediction module 2, a spatial audio parameter coding module 3, a spatial audio parameter decoding module 4 and a spatial audio parameter rebuilding module 5. The spatial audio parameter extraction module 1 receives the input multi-channel audio signal 6, extracts the spatial audio parameters from it and outputs them to the spatial audio parameter prediction module 2. The spatial audio parameter prediction module 2 receives the spatial audio parameters, predicts the spatial audio parameters of the current frame according to the speed of the moving sound source, obtains the spatial audio parameter prediction coefficients and the spatial audio parameter prediction residual of the current frame, and outputs the prediction residual to the spatial audio parameter coding module 3. The spatial audio parameter coding module 3 receives the prediction residual, encodes the spatial audio parameter prediction residual of the current frame into a coded bitstream and outputs the bitstream to the spatial audio parameter decoding module 4. The spatial audio parameter decoding module 4 receives the coded bitstream, decodes the spatial audio parameter prediction residual from it and outputs the residual to the spatial audio parameter rebuilding module 5. The spatial audio parameter rebuilding module 5 receives the prediction residual, rebuilds the current-frame spatial audio parameters according to the speed of the moving sound source and the current-frame prediction residual, and outputs the spatial audio parameters 7.
In a specific implementation, the above steps may be executed automatically by computer software, or, following the customs of the audio technology field, a codec may be built and provided in the form of a hardware device. All implementations that conform to the spirit of the technical scheme provided by the present invention, including equivalent substitutions, fall within the scope claimed by the present invention.

Claims (3)

1. A moving-sound-source spatial audio parameter predictive coding method, characterized by comprising the following steps:
1. inputting a multi-channel audio signal;
2. extracting spatial audio parameters from the input multi-channel audio signal;
3. predicting the spatial audio parameters of the current frame according to the speed of the moving sound source, obtaining the spatial audio parameter prediction coefficients and the spatial audio parameter prediction residual of the current frame;
4. encoding the spatial audio parameter prediction residual of the current frame to obtain a coded bitstream;
after step 2. is carried out, the interaural time difference ITD and the interaural intensity difference ILD are obtained;
in step 3., the predicted interaural time difference ITD is subtracted from the interaural time difference ITD to obtain the ITD prediction residual, and the predicted interaural intensity difference ILD is subtracted from the interaural intensity difference ILD to obtain the ILD prediction residual;
wherein, for a plane sound wave incident from the direction θ(t) in the horizontal plane, the predicted interaural time difference ITD is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

and the predicted interaural intensity difference ILD is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$;
wherein the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c; the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f; r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index; the head is approximated as a sphere of radius a and the two ears as two diametrically opposite points on the sphere.
2. A moving-sound-source spatial audio parameter predictive decoding method, characterized by comprising the following steps:
1. inputting the coded bitstream;
2. decoding the spatial audio parameter prediction residual from the coded bitstream;
3. rebuilding the current-frame spatial audio parameters according to the speed of the moving sound source and the current-frame spatial audio parameter prediction residual;
after step 2. is carried out, the interaural time difference ITD prediction residual and the interaural intensity difference ILD prediction residual are obtained;
in step 3., the predicted interaural time difference ITD is added to the ITD prediction residual to obtain the interaural time difference ITD, and the predicted interaural intensity difference ILD is added to the ILD prediction residual to obtain the interaural intensity difference ILD;
wherein, for a plane sound wave incident from the direction θ(t) in the horizontal plane, the predicted interaural time difference ITD is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

and the predicted interaural intensity difference ILD is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$;
wherein the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c; the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f; r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index; the head is approximated as a sphere of radius a and the two ears as two diametrically opposite points on the sphere.
3. A moving-sound-source spatial audio parameter predictive coding and decoding system, characterized by comprising:
a spatial audio parameter extraction module (1), which receives the input multi-channel audio signal (6), extracts the interaural time difference ITD and the interaural intensity difference ILD from the input multi-channel audio signal and outputs them to the spatial audio parameter prediction module (2);
a spatial audio parameter prediction module (2), which receives the interaural time difference ITD and the interaural intensity difference ILD, subtracts the predicted interaural time difference ITD from the interaural time difference ITD to obtain the ITD prediction residual, and subtracts the predicted interaural intensity difference ILD from the interaural intensity difference ILD to obtain the ILD prediction residual,
wherein, for a plane sound wave incident from the direction θ(t) in the horizontal plane, the predicted interaural time difference ITD is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

and the predicted interaural intensity difference ILD is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$;
wherein the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c; the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f; r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index; the head is approximated as a sphere of radius a and the two ears as two diametrically opposite points on the sphere;
and outputs the interaural time difference ITD prediction residual and the interaural intensity difference ILD prediction residual to the spatial audio parameter coding module (3);
a spatial audio parameter coding module (3), which receives the ITD prediction residual and the ILD prediction residual, encodes them into a coded bitstream and outputs the bitstream to the spatial audio parameter decoding module (4);
a spatial audio parameter decoding module (4), which receives the coded bitstream, decodes the ITD prediction residual and the ILD prediction residual from it and outputs them to the spatial audio parameter rebuilding module (5);
a spatial audio parameter rebuilding module (5), which receives the ITD prediction residual and the ILD prediction residual, adds the predicted interaural time difference ITD to the ITD prediction residual to obtain the interaural time difference ITD, and adds the predicted interaural intensity difference ILD to the ILD prediction residual to obtain the interaural intensity difference ILD;
wherein, for a plane sound wave incident from the direction θ(t) in the horizontal plane, the predicted interaural time difference ITD is:

$$\mathrm{ITD}(f)=\frac{a}{u}\left(\frac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}+\arctan\frac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}\right)$$

and the predicted interaural intensity difference ILD is:

$$\mathrm{ILD}(f)=10\log_{10}\frac{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f-\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}{\sin\!\left(\dfrac{2\pi f_c\left(u\,\Delta t\,f+\tfrac{a}{2}\,\Phi(f)\right)}{u-v\sin\theta(f)}\right)}$$

where $\sin\theta(f)=\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{\sqrt{(r_0\sin\theta_0-v\,\Delta t\,f)^2+(r_0\cos\theta_0)^2}}$ and $\Phi(f)=\sin\theta(f)+\arctan\dfrac{r_0\sin\theta_0-v\,\Delta t\,f}{r_0\cos\theta_0}$;
wherein the sound emitted by the moving source propagates in the medium with speed u and has frequency f_c; the source moves from the initial position A(r_0, θ_0) at a constant horizontal speed v, reaching position B(r_t, θ_t) after the elapsed time t = Δt·f; r and θ denote radius and azimuth, Δt is the inter-frame time interval and f is the current frame index; the head is approximated as a sphere of radius a and the two ears as two diametrically opposite points on the sphere;
and outputs the interaural time difference ITD and the interaural intensity difference ILD, i.e. the spatial audio parameters (7).
CN200910272282XA 2009-09-28 2009-09-28 Spatial audio parameters prediction coding and decoding methods of movable sound source and system Expired - Fee Related CN101673549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910272282XA CN101673549B (en) 2009-09-28 2009-09-28 Spatial audio parameters prediction coding and decoding methods of movable sound source and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910272282XA CN101673549B (en) 2009-09-28 2009-09-28 Spatial audio parameters prediction coding and decoding methods of movable sound source and system

Publications (2)

Publication Number Publication Date
CN101673549A CN101673549A (en) 2010-03-17
CN101673549B true CN101673549B (en) 2011-12-14

Family

ID=42020738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910272282XA Expired - Fee Related CN101673549B (en) 2009-09-28 2009-09-28 Spatial audio parameters prediction coding and decoding methods of movable sound source and system

Country Status (1)

Country Link
CN (1) CN101673549B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2656815T3 (en) * 2010-03-29 2018-02-28 Fraunhofer-Gesellschaft Zur Förderung Der Angewandten Forschung Spatial audio processor and procedure to provide spatial parameters based on an acoustic input signal
CN102280107B (en) 2010-06-10 2013-01-23 华为技术有限公司 Sideband residual signal generating method and device
TWI579831B (en) * 2013-09-12 2017-04-21 杜比國際公司 Method for quantization of parameters, method for dequantization of quantized parameters and computer-readable medium, audio encoder, audio decoder and audio system thereof
CN106033671B (en) 2015-03-09 2020-11-06 华为技术有限公司 Method and apparatus for determining inter-channel time difference parameters
WO2019021276A1 (en) * 2017-07-23 2019-01-31 Waves Audio Ltd. Stereo virtual bass enhancement
CN108550369B (en) * 2018-04-14 2020-08-11 全景声科技南京有限公司 Variable-length panoramic sound signal coding and decoding method
CN116700659B (en) * 2022-09-02 2024-03-08 荣耀终端有限公司 Interface interaction method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1600791A1 (en) * 2004-05-26 2005-11-30 Honda Research Institute Europe GmbH Sound source localization based on binaural signals
CN101521013A (en) * 2009-04-08 2009-09-02 武汉大学 Spatial audio parameter bidirectional interframe predictive coding and decoding devices
CN101539455A (en) * 2009-04-22 2009-09-23 合肥工业大学 Method for re-establishing moving sound source by adopting moving equivalent source method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kimura, T., et al. "Spatial coding based on the extraction of moving sound sources in wavefield synthesis." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005, vol. 3, pp. 293-296. *

Also Published As

Publication number Publication date
CN101673549A (en) 2010-03-17

Similar Documents

Publication Publication Date Title
CN101673549B (en) Spatial audio parameters prediction coding and decoding methods of movable sound source and system
JP6982113B2 (en) Methods and Devices for Encoding and Decoding a Series of Frames in an Ambisonics Representation of a 2D or 3D Sound Field
KR101945309B1 (en) Apparatus and method for encoding/decoding using phase information and residual signal
US7573912B2 (en) Near-transparent or transparent multi-channel encoder/decoder scheme
TWI404429B (en) Method and apparatus for encoding/decoding multi-channel audio signal
KR100946688B1 (en) A multi-channel audio decoder, a multi-channel encoder, a method for processing an audio signal, and a recording medium which records a program for performing the processing method
CN1748247B (en) Audio coding
CN102292767B (en) Stereo acoustic signal encoding apparatus, stereo acoustic signal decoding apparatus, and methods for the same
CN103180899B (en) Stereo signal encoding device, stereo signal decoding device, stereo signal encoding method, and stereo signal decoding method
EP2431971B1 (en) Audio decoding method and audio decoder
RU2007120634A (en) STEREOPHONICALLY COMPATIBLE MULTI-CHANNEL SOUND ENCRYPTION
TW201411604A (en) Method and device for improving the rendering of multi-channel audio
WO2009049896A8 (en) Audio coding using upmix
RU2012143501A (en) STEREOPHONIC MDCT-BASED ENCRYPTION ENCODING
JP2009501957A5 (en)
CN105405445B Parametric stereo encoding and decoding method based on the inter-channel transfer function
US9866985B2 (en) Audio signal output device and method, encoding device and method, decoding device and method, and program
TW200738038A (en) Audio encoding and decoding
RU2007136792A (en) DEVICE AND METHOD FOR FORMING A CODED STEREO AUDIO PART OR AUDIO DATA FLOW
WO2018058379A1 (en) Method, apparatus and system for processing multi-channel audio signal
CN102165520A (en) A method and an apparatus for processing a signal
CN105164749A (en) Hybrid encoding of multichannel audio
CN104541326A (en) Device and method for processing audio signal
CN101582262B Spatial audio parameter inter-frame predictive encoding and decoding method
CN102682779B Two-channel encoding and decoding method and codec for 3D audio
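The scheme named in the title and abstract — predict each frame's spatial audio parameters (for example the interaural time difference, ITD) from the moving source's velocity, then encode only the prediction residual — can be sketched as follows. The free-field two-ear geometry, 20 ms frame length, straight-line trajectory, and noise model below are illustrative assumptions for the sketch, not the patent's actual model:

```python
import math

C = 343.0        # speed of sound, m/s
EAR_DIST = 0.18  # assumed inter-ear distance, m (illustrative)

def itd(x, y):
    """ITD (s) for a source at (x, y), ears at (+/- EAR_DIST/2, 0), free field."""
    d_left = math.hypot(x + EAR_DIST / 2, y)   # distance to left ear
    d_right = math.hypot(x - EAR_DIST / 2, y)  # distance to right ear
    return (d_left - d_right) / C

def predict_itds(x0, y0, vx, vy, frame_dt, n_frames):
    """Extrapolate per-frame ITDs from the source's start position and velocity."""
    return [itd(x0 + vx * k * frame_dt, y0 + vy * k * frame_dt)
            for k in range(n_frames)]

# Source starts 2 m left of center and 3 m ahead, moving right at 5 m/s;
# frames are 20 ms long (all values illustrative).
predicted = predict_itds(-2.0, 3.0, 5.0, 0.0, 0.02, 50)

# Encoder side: the "measured" ITDs follow the motion model up to a small
# perturbation standing in for estimation noise; only the residual between
# measurement and prediction needs to be coded.
measured = [p + 1e-6 * ((-1) ** k) for k, p in enumerate(predicted)]
residuals = [m - p for m, p in zip(measured, predicted)]

print(max(abs(r) for r in residuals))  # residual magnitude (small)
print(max(abs(p) for p in predicted))  # raw ITD magnitude (much larger)
```

Because the motion model tracks the trajectory closely, the residuals are orders of magnitude smaller than the raw ITDs, which is what makes coding the residual cheaper than coding the parameters directly.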

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111214

Termination date: 20140928

EXPY Termination of patent right or utility model