CN103065634A - Three-dimensional audio space parameter quantification method based on perception characteristic - Google Patents

Three-dimensional audio space parameter quantification method based on perception characteristic Download PDF

Info

Publication number
CN103065634A
CN103065634A (application CN201210558954.5A; granted as CN103065634B)
Authority
CN
China
Prior art keywords
dimensional audio
dimensional
parameter
perception
quantization table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105589545A
Other languages
Chinese (zh)
Other versions
CN103065634B (en)
Inventor
胡瑞敏
王恒
杨姗姗
张茂胜
李登实
涂卫平
王晓晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOOSLINK SUZHOU INFORMATION TECHNOLOGY Co.,Ltd.
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201210558954.5A priority Critical patent/CN103065634B/en
Publication of CN103065634A publication Critical patent/CN103065634A/en
Application granted granted Critical
Publication of CN103065634B publication Critical patent/CN103065634B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

At present, three-dimensional audio signals are mostly stored and played back locally: either the audio is not compression-coded at all, or each channel is coded independently, so the bit rate grows linearly with the number of channels and the audio files become very large. Parametric coding can greatly improve the compression ratio, but the spatial-information bit rate still grows linearly with the number of channels, and when the bit rate is constrained by real-time broadcast bandwidth or storage-medium capacity, parameter quantization errors distort the spatial directional impression and markedly degrade the reconstructed three-dimensional audio. To address these problems, the proposed perception-based quantization method for three-dimensional audio spatial parameters obtains perceptual thresholds in different directions and uses them to generate a non-uniform quantization table, which effectively reduces the parameter coding rate compared with a conventional uniform quantization table.

Description

Quantization method for three-dimensional audio spatial parameters based on perceptual characteristics
Technical field
The invention belongs to the field of audio coding, and in particular relates to spatial parameter quantization technology for three-dimensional audio.
Background technology
At the end of 2009, the three-dimensional film "Avatar" topped the box office in more than 30 countries worldwide, and by September 2010 its cumulative global box office exceeded 2.7 billion dollars. "Avatar" achieved this splendid result because its brand-new three-dimensional special-effects technology delivered a striking sensory impact. Its gorgeous images and lifelike sound not only shook audiences but also led the industry to assert that "film has entered the three-dimensional era", and it is expected to spur related technologies and standards in film, recording, and broadcasting. At the International Consumer Electronics Show held in Las Vegas in January 2010, the new television products unveiled one after another by the major manufacturers raised fresh expectations: three-dimensional display has become a new focus of competition among television makers worldwide. To reach a better audiovisual experience, however, a three-dimensional sound field synchronized with the three-dimensional video content is needed to create a truly immersive impression. Early three-dimensional audio systems (such as the Ambisonics system) were difficult to popularize in practice because of their structural complexity and their demanding requirements on capture and playback equipment. In recent years Japan's NHK introduced a 22.2-channel system that can reproduce the original three-dimensional sound field through 24 loudspeakers. In 2011, MPEG began drafting an international standard for three-dimensional audio, aiming to reproduce the three-dimensional sound field through fewer loudspeakers or through headphones while reaching a certain coding efficiency, so that the technology can be popularized to ordinary household users. Three-dimensional audio-visual technology has thus become a research hotspot of the multimedia field and an important direction of its further development.
To obtain a better three-dimensional sound effect, the number of channels must be increased as far as possible, and this growth confronts three-dimensional audio with many challenges: the surge in channel count makes the data volume very large, and when real-time relay bandwidth and storage-medium capacity are limited, the reconstruction quality of three-dimensional audio drops significantly. Among current compression techniques for three-dimensional audio, lossless coding of each channel can meet the requirement of lossless compression, but its compression efficiency cannot satisfy the demands of three-dimensional audio storage and transmission. Parametric coding of three-dimensional audio exploits the signal redundancy and perceptual redundancy within channels as well as the spatial-information redundancy between channels, and therefore achieves better compression efficiency. Current three-dimensional audio parametric coders mainly extract spatial direction parameters and quantize them uniformly, so that the quantization error of the direction parameter is identical in all directions. When the bit rate is limited by transmission-channel bandwidth or storage capacity, this distorts the perceived direction of the sound image, so that existing three-dimensional audio systems cannot be applied to real-time relay or home-theater environments, which severely limits their application and development. Studies of the perceptual characteristics of the human ear for spatial sound sources show that the auditory system has a perceptual threshold for the spatial direction of a source, and that perceptual sensitivity differs greatly between directions; if the error remains below the perceptual threshold of the corresponding direction, no three-dimensional perceptual quality is lost. Exploiting the spatial perception characteristics of the human ear for parametric coding of three-dimensional audio is therefore an effective way to break the bottleneck of current three-dimensional audio parameter coding technology, and perception-based parameter coding research is urgently needed.
Summary of the invention
To remedy the deficiency of existing uniform quantization methods, the present invention proposes a non-uniform spatial-parameter quantization method based on the directional perception characteristics of the human ear, to guide spatial audio quantization coding.
The technical solution provided by the invention is a quantization method for three-dimensional audio spatial parameters based on perceptual characteristics, comprising the following steps:
Step 1: divide the signal into frequency bands and obtain the just-noticeable difference (JND) of each direction in each band; a direction is expressed as a point (α, β) on a sphere, where α and β are respectively the horizontal angle and elevation angle in a three-dimensional coordinate system centered at the sphere's center;
Step 2: according to the JNDs of the different directions in each band obtained in step 1, establish a quantization table of directions for each band;
Step 3: downmix the three-dimensional audio signal to be processed into a mono signal, and extract the direction parameters;
Step 4: quantize the direction parameters extracted in step 3 according to the quantization tables obtained in step 2;
Step 5: quantization-code the mono signal obtained in step 3, and combine it with the quantized direction parameters from step 4 into a single bitstream.
Moreover, in step 2, when establishing the quantization table of directions for each band, the interval between any two adjacent quantized values within a table does not exceed twice the corresponding just-noticeable difference.
By obtaining perceptual thresholds in different directions, the invention provides a non-uniform quantization table, which effectively reduces the parameter coding rate relative to a conventional uniform quantization table.
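The table-construction rule of steps 1–2 (adjacent quantized values at most twice the local JND apart) can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the JND samples are loosely modeled on the elevation-75° table in the description, and the piecewise-linear `jnd_at` interpolation and the stepping loop are assumptions.

```python
import bisect

# Measured JND samples (horizontal angle in degrees, JND in degrees),
# loosely modeled on the elevation-75 table in the description.
JND_POINTS = [(0, 5.3), (7, 5.9), (15, 6.7), (25, 7.4), (40, 8.3),
              (60, 8.9), (80, 9.4), (105, 10.6), (130, 9.4),
              (150, 8.3), (168, 6.6), (180, 6.1)]

def jnd_at(angle, points=JND_POINTS):
    """Piecewise-linear interpolation of the JND between samples
    (an assumption: the patent does not state how to interpolate)."""
    angles = [a for a, _ in points]
    i = bisect.bisect_left(angles, angle)
    if i == 0:
        return points[0][1]
    if i >= len(points):
        return points[-1][1]
    (a0, j0), (a1, j1) = points[i - 1], points[i]
    return j0 + (angle - a0) / (a1 - a0) * (j1 - j0)

def build_levels(start=0.0, stop=180.0):
    """Step through [start, stop), never letting adjacent quantized
    values lie more than twice the local JND apart (the step-2 rule)."""
    levels = [start]
    while levels[-1] + 2 * jnd_at(levels[-1]) < stop:
        levels.append(levels[-1] + 2 * jnd_at(levels[-1]))
    return levels

levels = build_levels()
```

Because the JND is small near the front and large at the sides, the resulting levels are dense ahead of the listener and sparse to the sides, which is exactly what makes the table non-uniform.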
Description of drawings
Fig. 1 is a framework diagram of the perception-based quantization of three-dimensional audio spatial parameters in an embodiment of the invention;
Fig. 2 is a schematic diagram of the loudspeaker positions and the signal-calculation relationship in an embodiment of the invention.
Embodiment
The present invention is based mainly on spatial psychoacoustics: considering the auditory perception characteristics and the frequency-dependent behavior of the spatial parameters, it proposes a direction-parameter quantization method based on the perceptual characteristics of spatial cues. With reference to Fig. 1, the concrete implementation of the embodiment is described as follows:
Step 1: obtain the perceptual JND of each direction in each frequency band.
Each channel of the three-dimensional audio signal is divided into sub-bands by an existing band-division scheme; let B_k be the k-th band of a single channel, with every channel using the same division. A point (α, β) denotes a position on the sphere, with α the horizontal angle and β the elevation angle. Suppose β takes p values, and under the j-th elevation value the horizontal angle α takes q_j values (the concrete values are set according to circumstances); then there are in total

$$\sum_{j=1}^{p} q_j$$

different position combinations.
In implementation, listening tests may be conducted with reference to existing methods to obtain the directional perception characteristics of the various positions in each band. For example, the method of successive approximation can be used to measure, for each band B_k, the JND (just-noticeable difference) of each position (α, β); the JND of the horizontal angle is denoted Δα, and the JND of the elevation angle Δβ.
In the embodiment, the full band is divided into 24 critical bands according to the Bark-band division; either the center frequency of a Bark band or narrow-band noise within that band may be used as the test stimulus, and in each band a number of positions are selected for testing according to a qualitative analysis of directional perception. For example, in the sixth band (B_6) different horizontal angles under 5 elevation angles are tested; because a person is more sensitive to azimuth directly ahead, smaller angle intervals are chosen there, while at the sides, where sensitivity is lower, larger intervals may be chosen. The concrete angles are given by the following table of selected horizontal-angle test points:
(Table of selected horizontal-angle test points: reproduced as an image in the original; content not recoverable.)
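The Bark-band division above assigns each frequency a critical-band index B_k in 1..24. The patent does not specify a formula; as a hedged sketch, Traunmüller's well-known approximation of the Bark scale is one common way to compute the index:

```python
def bark_band(freq_hz):
    """Map a frequency in Hz to a critical-band index in 1..24 using
    Traunmueller's approximation of the Bark scale. The description
    only states that the full band is split into 24 Bark critical
    bands; this particular formula is an assumption, not the patent's.
    """
    z = 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53
    return max(1, min(24, int(z) + 1))
```

For instance, 1 kHz falls in band 9 and the top of the audible range in band 24, consistent with the usual 24-band Bark partition.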
Step 2: establish the quantization tables for the different directions.
Using the horizontal-angle and elevation-angle JND values obtained in step 1, quantization tables of horizontal angle and elevation angle are established for the different spatial positions under each band B_k. It is recommended that the interval between two adjacent quantized values be below twice the minimum resolvable angle of a person at that position, i.e. no more than twice the JND value there.
For instance, if the JND at some horizontal angle α is Δα, then when building the table the quantization interval near that position must not exceed 2Δα; this guarantees that the quantization error is not perceived by the human ear. In this way a quantization table covering the whole spherical space can be obtained under each band, with each band B_k getting its own quantization table.
For example, the measured horizontal perception-threshold table at an elevation angle of 75 degrees is as follows:
Horizontal angle (°) | JND value (°) | Quantization interval (°)
0   | 5.3  | 7
7   | 5.9  | 7
15  | 6.7  | 8
25  | 7.4  | 8
40  | 8.3  | 9
60  | 8.9  | 10
80  | 9.4  | 10
105 | 10.6 | 11
130 | 9.4  | 10
150 | 8.3  | 9
168 | 6.6  | 7
180 | 6.1  | 7
The final quantization table can then be determined from the quantization-interval values; the horizontal-angle quantized values under this elevation angle are: 0, 7, 14, 22, 30, 38, 47, 56, 66, 76, 86, 97, 108, 119, 129, 139, 149, 158, 167, 174, 180, 186, 193, 202, 211, 221, 231, 241, 252, 263, 274, 284, 294, 304, 313, 322, 330, 338, 346, 353. This realizes a direction-parameter quantization table based on the direction-perception model.
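A minimal sketch (mine, not the patent's) of applying such a table: quantize a horizontal angle to the nearest entry of the level list above, treating angles as circular.

```python
# Horizontal-angle quantization levels at elevation 75 degrees, copied
# from the quantized values listed in the description above.
LEVELS_75 = [0, 7, 14, 22, 30, 38, 47, 56, 66, 76, 86, 97, 108, 119,
             129, 139, 149, 158, 167, 174, 180, 186, 193, 202, 211,
             221, 231, 241, 252, 263, 274, 284, 294, 304, 313, 322,
             330, 338, 346, 353]

def quantize_angle(angle, levels=LEVELS_75):
    """Quantize a horizontal angle (degrees) to the nearest level,
    with wrap-around at 360 degrees."""
    return min(levels, key=lambda s: min(abs(angle - s), 360 - abs(angle - s)))
```

Note that this is a one-dimensional illustration; the embodiment's step 4 selects jointly over (horizontal angle, elevation angle) pairs.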
Step 3: preprocess the three-dimensional audio signal, downmix it to a mono signal, and extract the direction parameters from it.
The embodiment applies a time-frequency transform to the three-dimensional audio signal and then divides it into 24 bands according to the Bark-band division. Within each band the three-dimensional audio signal is downmixed; the resulting mono signal enters step 5 for quantization, while a set of direction parameters is extracted at the same time and enters step 4 for quantization.
The spatial direction can take the listener as reference point, with the listener at the origin of the three-dimensional coordinate system. Each channel signal of the multichannel system is decomposed into its components on the X, Y and Z axes of the Cartesian coordinate system; the component of each channel is the projection of the original mono source onto that channel. After obtaining the X, Y and Z components of every channel, the components on each axis are summed to obtain the components of the original mono source at the listener's position; the sum of the squares of the axis components gives the squared intensity of the downmixed mono signal, which in implementation can be coded with a mono encoder.
First, each channel signal undergoes a time-frequency transform; existing methods such as the short-time Fourier transform or a quadrature mirror filter bank may be used.
As shown in Fig. 2, each loudspeaker has direction parameters (μ, η), where μ is the horizontal angle and η the elevation angle. Consider a loudspeaker located at horizontal angle μ_i and elevation angle η_i, and represent the three-dimensional audio signal in this loudspeaker by a vector P_s^i(k, n), where g_s^i(k, n) is the time-frequency value of the i-th channel of sound source S, k and n are respectively the frequency-band and time-frame indices, and i is the loudspeaker index. This signal can be decomposed as in formula (1):

$$P_s^i(k,n) = g_s^i(k,n)\cdot\begin{pmatrix}\cos\mu_i\cos\eta_i\\ \sin\mu_i\cos\eta_i\\ \sin\eta_i\end{pmatrix} \qquad (1)$$

where g_s^i(k, n) is the intensity of the frequency-domain point. Consider a three-dimensional audio playback environment with N loudspeakers: each component of the final signal intensity (the downmix channel) is the sum of the channel intensities along that axis of the Cartesian coordinate system, and the final intensity G_s(k, n) satisfies formula (2):

$$G_s^2(k,n) = \Big[\sum_{i=1}^{N} g_s^i(k,n)\cos\mu_i\cos\eta_i\Big]^2 + \Big[\sum_{i=1}^{N} g_s^i(k,n)\sin\mu_i\cos\eta_i\Big]^2 + \Big[\sum_{i=1}^{N} g_s^i(k,n)\sin\eta_i\Big]^2 \qquad (2)$$

The downmix channel can be regarded as a sound source stripped of its spatial position information. Its direction, split into horizontal angle μ and elevation angle η, is calculated by formulas (3) and (4):

$$\tan\mu_s(k,n) = \frac{\displaystyle\sum_{i=1}^{N} g_s^i(k,n)\sin\mu_i\cos\eta_i}{\displaystyle\sum_{i=1}^{N} g_s^i(k,n)\cos\mu_i\cos\eta_i} \qquad (3)$$

$$\tan\eta_s(k,n) = \frac{\displaystyle\sum_{i=1}^{N} g_s^i(k,n)\sin\eta_i}{\sqrt{\Big[\displaystyle\sum_{i=1}^{N} g_s^i(k,n)\cos\mu_i\cos\eta_i\Big]^2 + \Big[\displaystyle\sum_{i=1}^{N} g_s^i(k,n)\sin\mu_i\cos\eta_i\Big]^2}} \qquad (4)$$

where μ_s(k, n) and η_s(k, n) are the horizontal angle and elevation angle of sound source S in frame n and band k.
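The downmix and direction computation above can be sketched in code. This is an illustrative reading of formulas (2)–(4) (function names, degree convention, and the use of `atan2` to resolve the quadrant are my own choices, not stated in the patent):

```python
import math

def downmix_direction(gains, directions):
    """Given per-loudspeaker gains g_i(k, n) and directions (mu_i,
    eta_i) in degrees, return (G, mu_s, eta_s): the downmix intensity
    of formula (2) and the source direction of formulas (3)-(4)."""
    x = y = z = 0.0
    for g, (mu, eta) in zip(gains, directions):
        mu_r, eta_r = math.radians(mu), math.radians(eta)
        x += g * math.cos(mu_r) * math.cos(eta_r)  # X-axis component
        y += g * math.sin(mu_r) * math.cos(eta_r)  # Y-axis component
        z += g * math.sin(eta_r)                   # Z-axis component
    G = math.sqrt(x * x + y * y + z * z)                   # formula (2)
    mu_s = math.degrees(math.atan2(y, x)) % 360.0          # formula (3)
    eta_s = math.degrees(math.atan2(z, math.hypot(x, y)))  # formula (4)
    return G, mu_s, eta_s
```

As a sanity check, a single active loudspeaker yields its own direction back: with gain 2 at (60°, 30°), the function returns intensity 2 and direction (60°, 30°).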
Using the formulas of step 3, the embodiment calculates the direction parameter of each band; for example, the direction computed for band 4 is (60°, 72°).
Step 4: quantize the extracted direction parameters (μ, η) non-uniformly according to the quantization tables of step 2.
The quantization table to use is determined by the band parameter B_k from step 3; then the quantization interval containing (μ, η) is found, with the target quantized value denoted (s, t). The concrete selection method is: compare (μ, η) one by one with every quantized value in the table, and choose as (s, t) the entry that minimizes (|μ − s| + |η − t|); the direction parameter (μ, η) is then quantized to (s, t).
For example, to quantize the extracted direction parameter (μ, η) = (60°, 72°) non-uniformly according to the tables of step 2, the value (|60° − s| + |72° − t|) is computed for the different quantized values (s, t); the quantized value (s, t) = (66°, 75°) yields the minimum, so the direction parameter (60°, 72°) is finally quantized to (66°, 75°). The direction parameters of all bands are quantized in the same way.
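Step 4's selection rule can be sketched as below. The candidate entries are hypothetical (the patent's actual band-4 table is not reproduced in the text); only the L1-minimization rule comes from the description:

```python
def quantize_direction(mu, eta, table):
    """Return the quantization-table entry (s, t) that minimizes the
    L1 distance |mu - s| + |eta - t|, as in step 4."""
    return min(table, key=lambda st: abs(mu - st[0]) + abs(eta - st[1]))

# Hypothetical candidate entries near the worked example (60, 72).
table = [(40, 60), (66, 75), (97, 75), (66, 45)]
```

With these candidates, `quantize_direction(60, 72, table)` selects (66, 75), matching the worked example in the description.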
Step 5: code the downmix channel and synthesize the bitstream.
After the extracted spatial parameters have been quantized, a general perceptual audio coder quantizes the mono signal; finally the two quantized streams are combined and the resulting bitstream is output.
The specific embodiment described here merely illustrates the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute it in a similar manner, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (2)

1. A quantization method for three-dimensional audio spatial parameters based on perceptual characteristics, characterized by comprising the following steps:
Step 1: divide the signal into frequency bands and obtain the just-noticeable difference (JND) of each direction in each band; a direction is expressed as a point (α, β) on a sphere, where α and β are respectively the horizontal angle and elevation angle in a three-dimensional coordinate system centered at the sphere's center;
Step 2: according to the JNDs of the different directions in each band obtained in step 1, establish a quantization table of directions for each band;
Step 3: downmix the three-dimensional audio signal to be processed into a mono signal, and extract the direction parameters;
Step 4: quantize the direction parameters extracted in step 3 according to the quantization tables obtained in step 2;
Step 5: quantization-code the mono signal obtained in step 3, and combine it with the quantized direction parameters from step 4 into a single bitstream.
2. The quantization method for three-dimensional audio spatial parameters based on perceptual characteristics according to claim 1, characterized in that: in step 2, when establishing the quantization table of directions for each band, the interval between any two adjacent quantized values within a table does not exceed twice the corresponding just-noticeable difference.
CN201210558954.5A 2012-12-20 2012-12-20 Three-dimensional audio space parameter quantification method based on perception characteristic Active CN103065634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210558954.5A CN103065634B (en) 2012-12-20 2012-12-20 Three-dimensional audio space parameter quantification method based on perception characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210558954.5A CN103065634B (en) 2012-12-20 2012-12-20 Three-dimensional audio space parameter quantification method based on perception characteristic

Publications (2)

Publication Number Publication Date
CN103065634A true CN103065634A (en) 2013-04-24
CN103065634B CN103065634B (en) 2014-11-19

Family

ID=48108234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210558954.5A Active CN103065634B (en) 2012-12-20 2012-12-20 Three-dimensional audio space parameter quantification method based on perception characteristic

Country Status (1)

Country Link
CN (1) CN103065634B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040181393A1 (en) * 2003-03-14 2004-09-16 Agere Systems, Inc. Tonal analysis for perceptual audio coding using a compressed spectral representation
CN101162904A (en) * 2007-11-06 2008-04-16 武汉大学 Space parameter stereo coding/decoding method and device thereof
CN101408615A (en) * 2008-11-26 2009-04-15 武汉大学 Method and device for measuring binaural sound time difference ILD critical apperceive characteristic
CN101499279A (en) * 2009-03-06 2009-08-05 武汉大学 Bit distribution method and apparatus with progressively fine spacing parameter


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汤永清 et al.: "Three-Dimensional Spatial Localization of Sound Sources Based on the Spherical Fourier Transform", Signal Processing (《信号处理》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607690A (en) * 2013-12-06 2014-02-26 武汉轻工大学 Down conversion method for multichannel signals in 3D (Three Dimensional) voice frequency
CN104064194A (en) * 2014-06-30 2014-09-24 武汉大学 Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency
CN104064194B (en) * 2014-06-30 2017-04-26 武汉大学 Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency
CN104240712A (en) * 2014-09-30 2014-12-24 武汉大学深圳研究院 Three-dimensional audio multichannel grouping and clustering coding method and three-dimensional audio multichannel grouping and clustering coding system
CN104240712B (en) * 2014-09-30 2018-02-02 武汉大学深圳研究院 A kind of three-dimensional audio multichannel grouping and clustering coding method and system
CN104464742A (en) * 2014-12-31 2015-03-25 武汉大学 System and method for carrying out comprehensive non-uniform quantitative coding on 3D audio space parameters
CN104464742B (en) * 2014-12-31 2017-07-11 武汉大学 A kind of comprehensive non-uniform quantizing coded system of 3D audio spaces parameter and method
CN112584297A (en) * 2020-12-01 2021-03-30 中国电影科学技术研究所 Audio data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN103065634B (en) 2014-11-19

Similar Documents

Publication Publication Date Title
US11962990B2 (en) Reordering of foreground audio objects in the ambisonics domain
CN105325015B Binauralization of rotated higher-order ambisonics
KR101877604B1 (en) Determining renderers for spherical harmonic coefficients
KR101921403B1 (en) Higher order ambisonics signal compression
CN106797527B Screen-related adaptation of HOA content
CN106575506A (en) Intermediate compression for higher order ambisonic audio data
ES2841419T3 (en) Signaling channels for scalable encoding of higher-order ambisonic audio data
CN106796794A Normalization of ambient higher-order ambisonic audio data
CN103065634B (en) Three-dimensional audio space parameter quantification method based on perception characteristic
CN104064194B (en) Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency
US10075802B1 (en) Bitrate allocation for higher order ambisonic audio data
KR20170067764A (en) Signaling layers for scalable coding of higher order ambisonic audio data
KR20160136361A (en) Inserting audio channels into descriptions of soundfields
US20190392846A1 (en) Demixing data for backward compatible rendering of higher order ambisonic audio
EP3363213B1 (en) Coding higher-order ambisonic coefficients during multiple transitions
CN112313744B (en) Rendering different portions of audio data using different renderers
Wu et al. Distortion reduction via CAE and DenseNet mixture network for low bitrate spatial audio object coding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210714

Address after: 215000 unit 01, 5 / F, building a, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee after: BOOSLINK SUZHOU INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 430072 Hubei Province, Wuhan city Wuchang District of Wuhan University Luojiashan

Patentee before: WUHAN University