CN106023999A - Encoding and decoding method and system for improving three-dimensional audio spatial parameter compression ratio
- Publication number: CN106023999A
- Application number: CN201610541939.8A
- Authority: CN (China)
- Prior art keywords: spatial parameter, audio, decoding, coding, dimensional
- Legal status: Granted (the listed status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Abstract
The invention provides an encoding and decoding method and system for improving the compression ratio of three-dimensional audio spatial parameters. The encoder takes as input the three-dimensional audio signal, its spatial side information (the spatial parameters), and the index of the audio object to which each spatial parameter belongs. During encoding, the spatial parameters are successively clustered, quantized, intra-frame encoded, and inter-frame differentially encoded; during decoding, inter-frame differential decoding, intra-frame decoding, inverse quantization, and spatial parameter mapping are carried out in that order. Because the spatial parameters of different sub-bands of the same sound source within the same frame are similar, clustering them removes intra-frame redundancy and yields a high compression ratio for the three-dimensional audio spatial parameters.
Description
Technical field
The present invention relates to the field of digital audio and, addressing the need to raise the compression ratio of three-dimensional audio spatial parameters, relates in particular to an encoding and decoding method and system for improving that compression ratio.
Background
At the end of 2009, the three-dimensional film "Avatar" topped the box office in more than 30 countries, and by early September 2010 its cumulative worldwide box office had exceeded 2.7 billion US dollars. "Avatar" achieved this result because the new three-dimensional special-effects production technology it used had a striking sensory impact on audiences.
To give listeners a more immersive sensation and a more realistic sound field in 3D space, Spatial Audio Object Coding (SAOC), Directional Audio Coding (DirAC), and Spatially Squeezed Surround Audio Coding (S3AC) have been proposed. As 3D spatial resolution rises and the number of channels or objects grows, the bit rate of the spatial parameters increases sharply. For example, in the spatial localization quantization point (SLQP) method of S3AC coding, the spatial parameter bit rate is 18 kbps per object, so 16 sound objects require 288 kbps for spatial parameters alone. Reducing the bit rate of the spatial parameters in 3D audio coding is therefore urgent.
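The bit-rate figure quoted above is a simple product, and a short Python sketch makes the scaling explicit. The function name `spatial_parameter_bitrate_kbps` is hypothetical; only the two numbers (18 kbps per object, 16 objects) come from the text.

```python
# Figures taken from the text: 18 kbps of spatial parameters per object, 16 objects.
KBPS_PER_OBJECT = 18
NUM_OBJECTS = 16


def spatial_parameter_bitrate_kbps(kbps_per_object: int, num_objects: int) -> int:
    """Total side-information bit rate when every object carries its own parameters."""
    return kbps_per_object * num_objects


total_kbps = spatial_parameter_bitrate_kbps(KBPS_PER_OBJECT, NUM_OBJECTS)
print(total_kbps)  # 288
```

The cost grows linearly with the number of objects, which is why the side information becomes the bottleneck as scenes get denser.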
The spatial parameter compression methods of BCC, MPEG Surround, and S3AC exploit the correlation between adjacent frames and reduce the spatial parameter bit rate through differential coding. These methods remove the inter-frame redundancy of spatial parameters in the same frequency band across adjacent frames, but the intra-frame redundancy of spatial parameters across different frequency bands of the same sound source within one frame remains. If this intra-frame redundancy could be removed, the spatial parameter bit rate could be compressed further.
Summary of the invention
The object of the present invention is to address the above shortcoming of the prior art in compressing 3D audio spatial parameters by providing a new object-based spatial parameter compression method for 3D audio. The method relies on the property that different frequency bands of the same sound source within the same frame have the same spatial parameters. It removes, at a high ratio, the intra-frame redundancy of spatial parameters that existing spatial parameter compression methods do not consider, and thereby further reduces the spatial parameter bit rate.
The technical solution of the present invention provides an encoding and decoding method for improving the compression ratio of three-dimensional audio spatial parameters, comprising an encoding process and a decoding process. The encoding process comprises the following steps:
Step C1, input the three-dimensional audio signal comprising n objects, the three-dimensional audio spatial parameters, and the index of the audio object to which each spatial parameter belongs, and transform the three-dimensional audio time-domain signal to the frequency domain, as follows.
Let the time-domain signal of the three-dimensional audio be s(t), comprising s1(t), s2(t), …, sk(t), …, sK(t); let the spatial parameters of the three-dimensional audio be (θ, φ, r), comprising (θ1, φ1, r1), (θ2, φ2, r2), …, (θk, φk, rk), …, (θK, φK, rK); and let the index of the audio object to which a spatial parameter belongs be Index(n, f). Transforming s(t) to the frequency domain gives the frequency-domain signal S(n, f) of the three-dimensional audio, comprising S1(n, f), S2(n, f), …, Sk(n, f), …, SK(n, f). Here sk(t) is the time-domain representation of the k-th directional audio signal and t denotes time; Sk(n, f) is the frequency-domain representation of the k-th directional audio signal; (θk, φk, rk) is the spatial parameter corresponding to the k-th directional audio signal, where θ is the horizontal angle, φ is the elevation angle, and r is the distance side information; k takes the values 1, 2, …, K, where K is the total number of original directional audio signals; the value of Index(n, f) is the index of the audio object to which the spatial parameter belongs; n is the frame index and f is the frequency index;
Step C2, perform intra-frame encoding on the input spatial parameters, as follows: cluster the spatial parameters of the different frequency bands that belong to the same audio object within the same frame; quantize the clustered spatial parameters; and intra-frame encode the quantized spatial parameters;
Step C3, perform inter-frame encoding on the spatial parameters to generate the three-dimensional audio coded bitstream, the encoding method being differential coding;
The decoding process comprises the following steps:
Step D1, perform inter-frame decoding on the spatial parameters, the decoding method being differential decoding;
Step D2, perform intra-frame decoding on the spatial parameters, as follows: intra-frame decode the spatial parameters; inverse-quantize the intra-frame decoded spatial parameters; and restore the original spatial parameters;
Step D3, transform the frequency-domain representation S'(n, f) of the audio signal to the time domain to obtain the time-domain representation s'(t) of the audio signal, where S'(n, f) is S(n, f) after encoding and decoding and s'(t) is s(t) after encoding and decoding. The time-domain representation s'(t) of the audio signal comprising n objects, the spatial parameters obtained in step D2, and the original index Index(n, f) of the audio object to which each spatial parameter belongs together constitute the decoded three-dimensional audio comprising n objects: its audio signal, its spatial parameters, and the audio object index of each spatial parameter.
Further, in step C2, the spatial parameters of the different frequency bands belonging to the same audio object within the same frame are clustered; that is, the spatial parameters for which n is the same and the value of Index(n, f) is the same but f differs are clustered to generate the clustered spatial parameters.
Further, in step D2, the clustered spatial parameters of the different frequency bands of the same audio object within the same frame are mapped back to their corresponding frequency bands and thereby restored to the original spatial parameters.
Further, in step C2, the clustered spatial parameters are quantized, the quantization being either perceptual quantization or direct quantization; the quantized spatial parameters are intra-frame encoded, the encoding being either perceptual coding or direct coding.
Further, in step D2, the spatial parameters are intra-frame decoded, the decoding being either perceptual decoding or direct decoding; the intra-frame decoded spatial parameters are inverse-quantized, the inverse quantization being the inverse of either perceptual quantization or direct quantization.
An encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters comprises an encoder and a decoder.
The encoder comprises the following modules:
A time-frequency transform module, for receiving as input the three-dimensional audio signal comprising n objects, the three-dimensional audio spatial parameters, and the index of the audio object to which each spatial parameter belongs, and for transforming the three-dimensional audio time-domain signal to the frequency domain. Specifically, let the time-domain signal of the three-dimensional audio be s(t), comprising s1(t), s2(t), …, sk(t), …, sK(t); let the spatial parameters of the three-dimensional audio be (θ, φ, r), comprising (θ1, φ1, r1), (θ2, φ2, r2), …, (θk, φk, rk), …, (θK, φK, rK); and let the index of the audio object to which a spatial parameter belongs be Index(n, f). Transforming s(t) to the frequency domain gives the frequency-domain signal S(n, f) of the three-dimensional audio, comprising S1(n, f), S2(n, f), …, Sk(n, f), …, SK(n, f). Here sk(t) is the time-domain representation of the k-th directional audio signal and t denotes time; Sk(n, f) is the frequency-domain representation of the k-th directional audio signal; (θk, φk, rk) is the spatial parameter corresponding to the k-th directional audio signal, where θ is the horizontal angle, φ is the elevation angle, and r is the distance side information; k takes the values 1, 2, …, K, where K is the total number of original directional audio signals; the value of Index(n, f) is the index of the audio object to which the spatial parameter belongs; n is the frame index and f is the frequency index;
An intra-frame encoding module, for performing intra-frame encoding on the input spatial parameters, including clustering the spatial parameters of the different frequency bands that belong to the same audio object within the same frame, quantizing the clustered spatial parameters, and intra-frame encoding the quantized spatial parameters;
An inter-frame encoding module, for performing inter-frame encoding on the spatial parameters to generate the three-dimensional audio coded bitstream, the encoding method being differential coding;
The decoder comprises the following modules:
An inter-frame decoding module, for performing inter-frame decoding on the spatial parameters, the decoding method being differential decoding;
An intra-frame decoding module, for performing intra-frame decoding on the spatial parameters, including intra-frame decoding the spatial parameters, inverse-quantizing the intra-frame decoded spatial parameters, and restoring the original spatial parameters;
A time-frequency inverse transform module, for transforming the frequency-domain representation S'(n, f) of the audio signal to the time domain to obtain the time-domain representation s'(t) of the audio signal, where S'(n, f) is S(n, f) after encoding and decoding and s'(t) is s(t) after encoding and decoding. The time-domain representation s'(t) of the audio signal comprising n objects, the spatial parameters restored by the intra-frame decoding module, and the original index Index(n, f) of the audio object to which each spatial parameter belongs together constitute the decoded three-dimensional audio comprising n objects: its audio signal, its spatial parameters, and the audio object index of each spatial parameter.
Further, the intra-frame encoding module comprises a clustering module, which clusters the spatial parameters of the different frequency bands belonging to the same audio object within the same frame; that is, the spatial parameters for which n is the same and the value of Index(n, f) is the same but f differs are clustered to generate the clustered spatial parameters.
Further, the intra-frame decoding module comprises a restoration module, which maps the clustered spatial parameters of the different frequency bands of the same audio object within the same frame back to their corresponding frequency bands, restoring the original spatial parameters.
Further, the intra-frame encoding module comprises a quantization module, which quantizes the clustered spatial parameters, the quantization being either perceptual quantization or direct quantization; the quantized spatial parameters are intra-frame encoded, the encoding being either perceptual coding or direct coding.
Further, the intra-frame decoding module comprises an inverse quantization module, which intra-frame decodes the spatial parameters, the decoding being either perceptual decoding or direct decoding, and inverse-quantizes the intra-frame decoded spatial parameters, the inverse quantization being the inverse of either perceptual quantization or direct quantization.
The beneficial effects of the invention are as follows. The invention relies on the fact that different frequency bands of the same sound source within the same frame have the same spatial parameters. At the encoder, the spatial parameters are clustered, quantized, and intra-frame encoded, and then inter-frame differentially encoded, which further compresses the three-dimensional audio spatial parameter bit rate and raises the spatial parameter compression ratio. The decoder decodes the three-dimensional audio bitstream: it applies inter-frame differential decoding and intra-frame decoding to the spatial parameters, inverse-quantizes the intra-frame decoded spatial parameters, and maps the clustered spatial parameters back to their frequency bands, yielding the audio signal of the three-dimensional audio, the spatial parameters, and the index of the audio object to which each spatial parameter belongs. By adding intra-frame encoding and decoding, the invention therefore overcomes the failure of existing spatial parameter compression methods to consider intra-frame redundancy of spatial parameters, further compresses the three-dimensional audio spatial parameter bit rate, and raises the spatial parameter compression ratio.
Brief description of the drawings
Fig. 1 is a flow chart of the encoder side of an embodiment of the present invention;
Fig. 2 is a flow chart of the decoder side of an embodiment of the present invention.
Detailed description
The technical solution of the present invention is described in detail below with reference to the drawings and an embodiment (steps C1 to C3 form the encoding process and steps D1 to D3 form the decoding process).
Referring to Fig. 1, the encoder side of the embodiment performs the following flow:
Step C1, transform the time-domain signal s(t) of the three-dimensional audio to the frequency domain to obtain the frequency-domain signal S(n, f) of the three-dimensional audio.
The inputs to the encoder are the three-dimensional audio signal comprising n objects, the three-dimensional audio spatial parameters, and the index of the audio object to which each spatial parameter belongs. The time-domain representation of the audio signal of the three-dimensional audio is s(t), which consists of s1(t), s2(t), …, sK(t), where t denotes time. The spatial parameters of the three-dimensional audio, that is, the spatial parameters (θ, φ, r) corresponding to each time-frequency point, consist of (θ1, φ1, r1), (θ2, φ2, r2), …, (θK, φK, rK). The index of the audio object to which a spatial parameter belongs is expressed as Index(n, f). Here sk(t) is the time-domain representation of the k-th directional audio signal and (θk, φk, rk) is the spatial parameter corresponding to the k-th directional audio signal; a spatial parameter consists of the direction parameters (horizontal angle θ and elevation angle φ) and the distance parameter r. k takes the values 1, 2, …, K, where K is the total number of original directional audio signals.
To transform the time-domain signal of the three-dimensional audio to the frequency domain, the short-time Fourier transform (STFT) can be applied to s(t), giving the frequency-domain signal S(n, f) of the three-dimensional audio, which consists of S1(n, f), S2(n, f), …, SK(n, f). Here Sk(n, f) is the frequency-domain representation of the k-th directional audio signal, n is the frame index, and f is the frequency index. In a specific implementation, other methods such as the MDCT or the Hilbert-Huang transform may also be used.
In the embodiment, K = 8 and f = 1, 2, …, 40. The frequency-domain signals of the 8 directional audio signals s1(t), s2(t), …, s8(t) are (S1(n, f), S2(n, f), …, S8(n, f)), their corresponding spatial parameters are (θ1, φ1, r1), (θ2, φ2, r2), …, (θ8, φ8, r8), and the index of the object to which these spatial parameters belong is Index(n, f).
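As a minimal sketch of the time-frequency transform in step C1, the following pure-Python code computes a short-time Fourier transform of one directional signal and returns it indexed by frame n and frequency bin f. The frame length, hop size, and Hann window are assumptions of this sketch; the text specifies only that an STFT can be used. The helper name `stft` and the test signal are hypothetical.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Return S[n][f]: the Hann-windowed DFT of frame n at frequency bin f.

    frame_len and hop are illustrative values, not taken from the text.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for f in range(frame_len // 2 + 1):  # non-negative frequencies only
            spectrum.append(sum(
                segment[i] * cmath.exp(-2j * math.pi * f * i / frame_len)
                for i in range(frame_len)))
        frames.append(spectrum)
    return frames


# One directional signal s_k(t): a sinusoid that falls exactly on bin 8.
s_k = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
S_k = stft(s_k)

# The strongest bin of the first frame is the sinusoid's own bin.
peak_bin = max(range(len(S_k[0])), key=lambda f: abs(S_k[0][f]))
print(len(S_k), len(S_k[0]), peak_bin)
```

A production encoder would use an FFT and would group the bins into the sub-bands indexed by f; the direct DFT is used here only to keep the sketch dependency-free.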
Step C2, perform intra-frame encoding on the spatial parameters. When carrying out step C2, the embodiment performs the following sub-steps:
C21: cluster the spatial parameters of the different frequency bands belonging to the same audio object within the same frame; that is, the spatial parameters for which n is the same and the value of Index(n, f) is the same but f differs are clustered to generate the clustered spatial parameters;
C22: quantize the clustered spatial parameters, using either perceptual quantization or direct quantization;
C23: intra-frame encode the quantized spatial parameters, using either perceptual coding or direct coding;
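Sub-steps C21 and C22 can be sketched as follows for a single frame. This is an illustration under stated assumptions, not the patented procedure: the text does not say how the representative of a cluster is chosen, so the sketch takes the per-component mean (which ignores angle wrap-around), and the uniform step sizes used for direct quantization are made up. The names `cluster_frame` and `quantize` are hypothetical.

```python
from collections import defaultdict


def cluster_frame(params, index):
    """Group one frame's per-band parameters by audio object.

    params: {f: (theta, phi, r)} spatial parameters of frame n, keyed by band f
    index:  {f: object_id}, i.e. Index(n, f) for the same frame
    Returns {object_id: (theta, phi, r)} with one representative per object.
    The representative is the per-component mean (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for f, triple in params.items():
        groups[index[f]].append(triple)
    return {
        obj: tuple(sum(component) / len(triples) for component in zip(*triples))
        for obj, triples in groups.items()
    }


def quantize(triple, steps=(5.0, 10.0, 10.0)):
    """Direct (uniform) quantization; the step sizes are illustrative only."""
    return tuple(round(value / step) for value, step in zip(triple, steps))


# A frame with four bands: bands 1-3 belong to object 0, band 4 to object 1.
params = {
    1: (30.0, 0.0, 100.0),
    2: (30.0, 0.0, 100.0),
    3: (30.0, 0.0, 100.0),
    4: (-45.0, 20.0, 50.0),
}
index = {1: 0, 2: 0, 3: 0, 4: 1}

clustered = cluster_frame(params, index)
quantized = {obj: quantize(triple) for obj, triple in clustered.items()}
print(quantized)
```

Four per-band triples collapse to two per-object triples here; the saving grows with the number of bands that share an object.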
Step C3, perform inter-frame encoding on the spatial parameters to generate the three-dimensional audio coded bitstream. When the embodiment carries out step C3, the encoding method is differential coding.
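Inter-frame differential coding can be illustrated with a short round-trip sketch. It assumes that each frame carries a tuple of quantization indices for one object and that the first frame is transmitted unchanged; both choices, and the names `diff_encode` and `diff_decode`, are hypothetical. Entropy coding of the differences is omitted.

```python
def diff_encode(frames):
    """Inter-frame differential coding of quantized parameters.

    frames: non-empty list of equal-length tuples of quantization indices.
    The first frame is kept as-is; each later frame is replaced by its
    element-wise difference from the previous frame.
    """
    encoded = [tuple(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(tuple(c - p for c, p in zip(cur, prev)))
    return encoded


def diff_decode(encoded):
    """Invert diff_encode by accumulating the differences frame by frame."""
    decoded = [tuple(encoded[0])]
    for delta in encoded[1:]:
        decoded.append(tuple(p + d for p, d in zip(decoded[-1], delta)))
    return decoded


# Quantized (theta, phi, r) indices of one object over four frames.
frames = [(6, 0, 10), (6, 0, 10), (7, 0, 10), (7, 1, 9)]
encoded = diff_encode(frames)
print(encoded)  # [(6, 0, 10), (0, 0, 0), (1, 0, 0), (0, 1, -1)]
```

Slowly moving sources produce differences near zero, which is what makes the differential stream cheaper to code than the raw indices.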
Referring to Fig. 2, the decoder side of the embodiment performs the following flow:
Step D1, perform inter-frame decoding on the spatial parameters. When the embodiment carries out step D1, the decoding method is differential decoding.
Step D2, perform intra-frame decoding on the spatial parameters. When carrying out step D2, the embodiment performs the following sub-steps:
D21: intra-frame decode the spatial parameters, using either perceptual decoding or direct decoding;
D22: inverse-quantize the intra-frame decoded spatial parameters, using the inverse of either perceptual quantization or direct quantization;
D23: map the clustered spatial parameters of the different frequency bands of the same audio object within the same frame back to their corresponding frequency bands, restoring the original spatial parameters.
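Sub-steps D22 and D23 can be sketched as follows. The sketch assumes direct (uniform) quantization with illustrative step sizes and assumes that the decoder has Index(n, f) available for the frame, as the text states for the decoded output. The names `dequantize` and `map_to_bands` are hypothetical.

```python
def dequantize(indices, steps=(5.0, 10.0, 10.0)):
    """Inverse of direct (uniform) quantization; step sizes are illustrative."""
    return tuple(i * step for i, step in zip(indices, steps))


def map_to_bands(clustered, index):
    """Restore per-band parameters for one frame.

    clustered: {object_id: (theta, phi, r)} after inverse quantization
    index:     {f: object_id}, i.e. Index(n, f) for this frame
    Each band f receives the parameter of the object it belongs to.
    """
    return {f: clustered[obj] for f, obj in index.items()}


# Decoded quantization indices for two objects in one frame.
quantized = {0: (6, 0, 10), 1: (-9, 2, 5)}
index = {1: 0, 2: 0, 3: 0, 4: 1}

clustered = {obj: dequantize(q) for obj, q in quantized.items()}
restored = map_to_bands(clustered, index)
print(restored[2], restored[4])
```

Every band of an object gets the same restored triple, so the mapping is lossless only to the extent that the bands really shared one spatial parameter before clustering.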
Step D3, transform the frequency-domain representation S'(n, f) of the audio signal to the time domain to obtain the time-domain representation s'(t) of the audio signal, where S'(n, f) is S(n, f) after encoding and decoding and s'(t) is s(t) after encoding and decoding. The time-domain representation s'(t) of the audio signal comprising n objects, the spatial parameters obtained in step D2, and the original index Index(n, f) of the audio object to which each spatial parameter belongs together constitute the decoded three-dimensional audio comprising n objects: its audio signal, its spatial parameters, and the audio object index of each spatial parameter. In a specific implementation, the three-dimensional sound field can then be reconstructed over loudspeakers in various configurations or over headphones, restoring the original three-dimensional audio.
The embodiment transforms the 8 directional audio signals after encoding and decoding, (S'1(n, f), S'2(n, f), …, S'8(n, f)), to the time domain to obtain the 8 directional audio signals s'1(t), s'2(t), …, s'8(t). Together with the decoded spatial parameters and the original index Index(n, f) of the audio object to which each spatial parameter belongs, they constitute the decoded three-dimensional audio comprising n objects: its audio signal, its spatial parameters, and the audio object index of each spatial parameter. The present embodiment uses headphones to play back the three-dimensional audio signal with distance side information. Headphone reproduction of three-dimensional audio requires a head-related transfer function (HRTF) library. The PKU&IOA HRTF library covers both far-field and near-field measurements, with the distance r ranging from 20 cm to 160 cm and resolutions of 5° and 10° for the horizontal angle and the elevation angle respectively. The PKU&IOA HRTF library was therefore selected to reconstruct the three-dimensional audio that had undergone intra-frame and inter-frame compression.
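When an HRTF library is sampled on a fixed angular grid, a decoded direction has to be matched to a measured one before filtering. The sketch below snaps a decoded horizontal angle and elevation angle to a 5° by 10° grid, the resolutions quoted above. It is an illustration only: the function names are hypothetical, the valid elevation range and the distance grid of the library are not given in the text and are therefore not modelled, and no real HRTF data is loaded.

```python
def snap(value, step):
    """Round value to the nearest multiple of step."""
    return step * round(value / step)


def nearest_hrtf_direction(theta, phi, az_step=5, el_step=10):
    """Map a decoded (horizontal angle, elevation) pair, in degrees, onto a grid.

    az_step and el_step follow the 5 and 10 degree resolutions quoted in the
    text. The horizontal angle is wrapped into [0, 360); the elevation is not
    clamped because the text does not state the library's elevation range.
    """
    return snap(theta, az_step) % 360, snap(phi, el_step)


direction = nearest_hrtf_direction(32.4, 17.0)
print(direction)  # (30, 20)
```

The grid spacing bounds the angular error introduced at playback, which is one reason coarse quantization of θ and φ in the encoder need not be audible.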
Experimental comparison shows that the three-dimensional audio compression method with added intra-frame coding compresses better than the original method with inter-frame coding only: the compression ratio is higher while the quality of the reconstructed audio is maintained. Because the added intra-frame coding eliminates intra-frame redundancy, the method raises the compression ratio of the three-dimensional spatial parameters and lowers the spatial parameter bit rate while preserving the quality of the reconstructed three-dimensional audio.
The method provided by the present invention can be implemented in software to run automatically, and can also be embodied as a corresponding modular system. The encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters provided by the present invention comprises an encoder and a decoder. The encoder comprises the following modules:
A time-frequency transform module, for receiving as input the three-dimensional audio signal comprising n objects, the three-dimensional audio spatial parameters, and the index of the audio object to which each spatial parameter belongs, and for transforming the three-dimensional audio time-domain signal to the frequency domain. Specifically, let the time-domain signal of the three-dimensional audio be s(t), comprising s1(t), s2(t), …, sk(t), …, sK(t); let the spatial parameters of the three-dimensional audio be (θ, φ, r), comprising (θ1, φ1, r1), (θ2, φ2, r2), …, (θk, φk, rk), …, (θK, φK, rK); and let the index of the audio object to which a spatial parameter belongs be Index(n, f). Transforming s(t) to the frequency domain gives the frequency-domain signal S(n, f) of the three-dimensional audio, comprising S1(n, f), S2(n, f), …, Sk(n, f), …, SK(n, f). Here sk(t) is the time-domain representation of the k-th directional audio signal and t denotes time; Sk(n, f) is the frequency-domain representation of the k-th directional audio signal; (θk, φk, rk) is the spatial parameter corresponding to the k-th directional audio signal, where θ is the horizontal angle, φ is the elevation angle, and r is the distance side information; k takes the values 1, 2, …, K, where K is the total number of original directional audio signals; the value of Index(n, f) is the index of the audio object to which the spatial parameter belongs; n is the frame index and f is the frequency index;
An intra-frame encoding module, for performing intra-frame encoding on the input spatial parameters, including clustering the spatial parameters of the different frequency bands that belong to the same audio object within the same frame, quantizing the clustered spatial parameters, and intra-frame encoding the quantized spatial parameters;
An inter-frame encoding module, for performing inter-frame encoding on the spatial parameters to generate the three-dimensional audio coded bitstream, the encoding method being differential coding;
The decoder comprises the following modules:
An inter-frame decoding module, for performing inter-frame decoding on the spatial parameters, the decoding method being differential decoding;
An intra-frame decoding module, for performing intra-frame decoding on the spatial parameters, including intra-frame decoding the spatial parameters, inverse-quantizing the intra-frame decoded spatial parameters, and restoring the original spatial parameters;
A time-frequency inverse transform module, for transforming the frequency-domain representation S'(n, f) of the audio signal to the time domain to obtain the time-domain representation s'(t) of the audio signal, where S'(n, f) is S(n, f) after encoding and decoding and s'(t) is s(t) after encoding and decoding. The time-domain representation s'(t) of the audio signal comprising n objects, the spatial parameters restored by the intra-frame decoding module, and the original index Index(n, f) of the audio object to which each spatial parameter belongs together constitute the decoded three-dimensional audio comprising n objects: its audio signal, its spatial parameters, and the audio object index of each spatial parameter.
The intra-frame encoding module comprises a clustering module, which clusters the spatial parameters of the different frequency bands belonging to the same audio object within the same frame; that is, the spatial parameters for which n is the same and the value of Index(n, f) is the same but f differs are clustered to generate the clustered spatial parameters.
The intra-frame decoding module comprises a restoration module, which maps the clustered spatial parameters of the different frequency bands of the same audio object within the same frame back to their corresponding frequency bands, restoring the original spatial parameters.
The intra-frame encoding module comprises a quantization module, which quantizes the clustered spatial parameters, the quantization being either perceptual quantization or direct quantization; the quantized spatial parameters are intra-frame encoded, the encoding being either perceptual coding or direct coding.
The intra-frame decoding module comprises an inverse quantization module, which intra-frame decodes the spatial parameters, the decoding being either perceptual decoding or direct decoding, and inverse-quantizes the intra-frame decoded spatial parameters, the inverse quantization being the inverse of either perceptual quantization or direct quantization.
Each module is implemented in correspondence with the method steps and is not described further here.
The specific embodiments described herein merely illustrate the spirit of the present invention by way of example. Those skilled in the art to which the invention belongs may make various modifications or supplements to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (10)
1. An encoding and decoding method for improving the compression ratio of three-dimensional audio spatial parameters, characterized in that it comprises an encoding process and a decoding process, the encoding process comprising the following steps:
Step C1, input the three-dimensional audio signal comprising n objects, the three-dimensional audio spatial parameters, and the index of the audio object to which each spatial parameter belongs, and transform the three-dimensional audio time-domain signal to the frequency domain, as follows.
Let the time-domain signal of the three-dimensional audio be s(t), comprising s1(t), s2(t), …, sk(t), …, sK(t); let the spatial parameters of the three-dimensional audio be (θ, φ, r), comprising (θ1, φ1, r1), (θ2, φ2, r2), …, (θk, φk, rk), …, (θK, φK, rK); and let the index of the audio object to which a spatial parameter belongs be Index(n, f). Transforming s(t) to the frequency domain gives the frequency-domain signal S(n, f) of the three-dimensional audio, comprising S1(n, f), S2(n, f), …, Sk(n, f), …, SK(n, f). Here sk(t) is the time-domain representation of the k-th directional audio signal and t denotes time; Sk(n, f) is the frequency-domain representation of the k-th directional audio signal; (θk, φk, rk) is the spatial parameter corresponding to the k-th directional audio signal, where θ is the horizontal angle, φ is the elevation angle, and r is the distance side information; k takes the values 1, 2, …, K, where K is the total number of original directional audio signals; the value of Index(n, f) is the index of the audio object to which the spatial parameter belongs; n is the frame index and f is the frequency index;
Step C2, carries out intraframe coding to the spatial parameter of input, it is achieved as follows, to belonging to same audio object in same frame
The spatial parameter of different frequency bands clusters;To the spatial parameter after clusterQuantify;To the space after quantifying
Parameter carries out intraframe coding;
Step C3, carries out interframe encode to spatial parameter, generates three-dimensional audio encoding code stream, and coded method is differential coding;
Described decoding process comprises the following steps;
Step D1, carries out decoding inter frames to spatial parameter, and coding/decoding method is differential decoding;
Step D2, carries out intraframe decoder to spatial parameter, it is achieved as follows, and spatial parameter is carried out intraframe decoder;To intraframe decoder
After spatial parameter carry out inverse quantization;Reduce original spatial parameter
Step D3, by frequency domain presentation S of audio signal ' (n, f) transforms to time domain, and the time domain obtaining audio signal expresses s ' (t),
(n is f) that (n, f) signal after encoding and decoding, described s ' (t) is the s (t) letter after encoding and decoding to S to S ' described in contracting
Number;The time domain of the audio signal comprising n object expresses s ' (t) and step D2 gained spatial parameterAnd it is original
Spatial parameter belonging to numbering Index of audio object (n f) constitutes the audio frequency of the decoded three-dimensional audio comprising n object
The numbering of audio object belonging to signal, spatial parameter and spatial parameter.
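Steps C3 and D1 name differential coding as the inter-frame method without fixing its details. The following is a minimal sketch of one common form, assuming each frame carries the same number of integer quantization indices and that the first frame is transmitted unchanged; the function names `diff_encode` and `diff_decode` are illustrative and do not come from the patent.

```python
from typing import List, Optional, Sequence


def diff_encode(frames: Sequence[Sequence[int]]) -> List[List[int]]:
    """Inter-frame differential coding: keep frame 0, then store frame-to-frame deltas."""
    encoded: List[List[int]] = []
    previous: Optional[Sequence[int]] = None
    for frame in frames:
        if previous is None:
            encoded.append(list(frame))  # assumption: first frame sent as-is
        else:
            encoded.append([cur - prev for cur, prev in zip(frame, previous)])
        previous = frame
    return encoded


def diff_decode(encoded: Sequence[Sequence[int]]) -> List[List[int]]:
    """Inverse of diff_encode: accumulate the deltas onto the previous decoded frame."""
    decoded: List[List[int]] = []
    previous: Optional[List[int]] = None
    for residual in encoded:
        if previous is None:
            frame = list(residual)
        else:
            frame = [r + p for r, p in zip(residual, previous)]
        decoded.append(frame)
        previous = frame
    return decoded


# Hypothetical quantization indices for three consecutive frames.
frames = [[12, 3, 40], [12, 4, 40], [13, 4, 41]]
residuals = diff_encode(frames)
restored = diff_decode(residuals)
```

When the spatial parameters change slowly between frames, the residuals are mostly small values near zero, which is what makes them cheaper to code than the raw indices.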
2. The encoding and decoding method for improving the compression ratio of three-dimensional audio spatial parameters according to claim 1, characterized in that: in step C2, the spatial parameters of different frequency bands belonging to the same audio object within the same frame are clustered, that is, the spatial parameters for which n is identical and the value of Index(n, f) is identical but f differs are clustered, generating the clustered spatial parameters.
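The clustering rule in this claim (same n, same Index(n, f), different f) can be sketched as a grouping of one frame's per-band parameters by object number. The claim does not state how a cluster is reduced to a single parameter, so the mean used below is an assumption, and `cluster_frame` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (horizontal angle, elevation angle, distance); units are not given in the source.
Param = Tuple[float, float, float]


def cluster_frame(params: Dict[int, Param], index: Dict[int, int]) -> Dict[int, Param]:
    """Cluster one frame's parameters by audio object number.

    params maps frequency index f to the spatial parameter of that band;
    index maps f to Index(n, f) for the same frame n.
    """
    groups: Dict[int, List[Param]] = defaultdict(list)
    for f, p in params.items():
        groups[index[f]].append(p)
    # Assumption: represent each cluster by the component-wise mean.
    return {
        obj: tuple(sum(component) / len(ps) for component in zip(*ps))
        for obj, ps in groups.items()
    }


params = {0: (30.0, 10.0, 2.0), 1: (32.0, 12.0, 2.0),
          2: (-60.0, 0.0, 1.5), 3: (-60.0, 0.0, 1.5)}
index = {0: 1, 1: 1, 2: 2, 3: 2}
clustered = cluster_frame(params, index)
```

A plain mean ignores angle wrap-around (for example, averaging 179 and -179 degrees), so a real implementation would need a circular mean or a different representative.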
3. The encoding and decoding method for improving the compression ratio of three-dimensional audio spatial parameters according to claim 1, characterized in that: in step D2, the clustered spatial parameters of the different frequency bands of the same audio object in the same frame are mapped back to their corresponding frequency bands and restored to the original spatial parameters.
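The restoration in step D2 can be sketched as a lookup: every frequency band receives the clustered parameter of the object it belongs to, using the object numbers Index(n, f). This is a minimal sketch under the assumption that one parameter per object was kept for the frame; `restore_frame` and the sample data are hypothetical.

```python
from typing import Dict, Tuple

# (horizontal angle, elevation angle, distance); units are not given in the source.
Param = Tuple[float, float, float]


def restore_frame(clustered: Dict[int, Param], index: Dict[int, int]) -> Dict[int, Param]:
    """Map each object's clustered parameter back to every band f with that Index(n, f)."""
    return {f: clustered[obj] for f, obj in index.items()}


clustered = {1: (31.0, 11.0, 2.0), 2: (-60.0, 0.0, 1.5)}
index = {0: 1, 1: 1, 2: 2, 3: 2}
per_band = restore_frame(clustered, index)
```

Under this assumption the restored values match the originals only as closely as the bands of one object agreed with each other before clustering.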
4. The encoding and decoding method for improving the compression ratio of three-dimensional audio spatial parameters according to claim 1, characterized in that: in step C2, the clustered spatial parameters are quantized, the quantization being perceptual quantization or direct quantization; and the quantized spatial parameters are intra-frame coded, the coding being perceptual coding or direct coding.
5. The encoding and decoding method for improving the compression ratio of three-dimensional audio spatial parameters according to claim 1, characterized in that: in step D2, the spatial parameters are intra-frame decoded, the decoding being perceptual decoding or direct decoding; and the intra-frame-decoded spatial parameters are inverse-quantized, the inverse quantization being the inverse of perceptual quantization or the inverse of direct quantization.
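As an illustration of the "direct quantization" branch and its inverse, the sketch below uses a uniform scalar quantizer per component. The step sizes in `STEPS` are hypothetical (the claims give no values), and the perceptual variant is not sketched because its rule is not defined in this section.

```python
import math
from typing import Tuple

# Hypothetical step sizes for (horizontal angle, elevation angle, distance).
STEPS: Tuple[float, float, float] = (2.0, 2.0, 0.25)


def quantize(param: Tuple[float, float, float]) -> Tuple[int, ...]:
    """Uniform quantization: round each component to the nearest multiple of its step."""
    return tuple(math.floor(v / s + 0.5) for v, s in zip(param, STEPS))


def dequantize(indices: Tuple[int, ...]) -> Tuple[float, ...]:
    """Inverse quantization: scale each index back by its step size."""
    return tuple(i * s for i, s in zip(indices, STEPS))


param = (31.0, 11.0, 2.1)
indices = quantize(param)
restored = dequantize(indices)
errors = [abs(a - b) for a, b in zip(param, restored)]
```

With a uniform quantizer the reconstruction error of each component is bounded by half its step size, which is the trade-off a perceptual scheme would tune against what listeners can actually resolve.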
6. An encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters, characterized by comprising an encoder and a decoder, the encoder comprising the following modules:
a time-frequency transform module, for inputting the three-dimensional audio signal comprising n objects, the three-dimensional audio spatial parameters, and the numbers of the audio objects to which the spatial parameters belong, and for transforming the three-dimensional audio time-domain signal to the frequency domain. Specifically, let the time-domain signal of the three-dimensional audio be s(t), where s(t) comprises s1(t), s2(t), …, sk(t), …, sK(t). The spatial parameters of the three-dimensional audio comprise the spatial parameter of each directional audio signal, and the number of the audio object to which a spatial parameter belongs is Index(n, f). The time-domain signal s(t) is transformed to the frequency domain to obtain the frequency-domain signal S(n, f) of the three-dimensional audio, where S(n, f) comprises S1(n, f), S2(n, f), …, Sk(n, f), …, SK(n, f). Here sk(t) is the time-domain expression of the k-th directional audio signal and t denotes time; Sk(n, f) is the frequency-domain expression of the k-th directional audio signal; the spatial parameter corresponding to the k-th directional audio signal consists of the horizontal angle θ, the elevation angle, and the distance side information r; k takes the values 1, 2, …, K, where K is the total number of original directional audio signals; the value of Index(n, f) is the number of the audio object to which the spatial parameter belongs; n denotes the frame index and f denotes the frequency index;
an intra-frame coding module, for performing intra-frame coding on the input spatial parameters, including clustering the spatial parameters of different frequency bands that belong to the same audio object within the same frame, quantizing the clustered spatial parameters, and performing intra-frame coding on the quantized spatial parameters;
an inter-frame coding module, for performing inter-frame coding on the spatial parameters to generate the three-dimensional audio coded bitstream, the coding method being differential coding;
the decoder comprising the following modules:
an inter-frame decoding module, for performing inter-frame decoding on the spatial parameters, the decoding method being differential decoding;
an intra-frame decoding module, for performing intra-frame decoding on the spatial parameters, including intra-frame decoding the spatial parameters, inverse-quantizing the intra-frame-decoded spatial parameters, and restoring the original spatial parameters;
a time-frequency inverse transform module, for transforming the frequency-domain expression S'(n, f) of the audio signal to the time domain to obtain the time-domain expression s'(t) of the audio signal, where S'(n, f) is the signal S(n, f) after encoding and decoding, and s'(t) is the signal s(t) after encoding and decoding; the time-domain expression s'(t) of the audio signal comprising n objects, the spatial parameters obtained by the intra-frame decoding module, and the numbers Index(n, f) of the audio objects to which the original spatial parameters belong together constitute the audio signal, the spatial parameters, and the audio object numbers of the decoded three-dimensional audio comprising n objects.
7. The encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters according to claim 6, characterized in that: the intra-frame coding module comprises a clustering module, the clustering module being used to cluster the spatial parameters of different frequency bands belonging to the same audio object within the same frame, that is, to cluster the spatial parameters for which n is identical and the value of Index(n, f) is identical but f differs, generating the clustered spatial parameters.
8. The encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters according to claim 6, characterized in that: the intra-frame decoding module comprises a restoration module, the restoration module being used to map the clustered spatial parameters of the different frequency bands of the same audio object in the same frame back to their corresponding frequency bands, restoring the original spatial parameters.
9. The encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters according to claim 6, characterized in that: the intra-frame coding module comprises a quantization module, the quantization module being used to quantize the clustered spatial parameters, the quantization being perceptual quantization or direct quantization; the quantized spatial parameters are intra-frame coded, the coding being perceptual coding or direct coding.
10. The encoding and decoding system for improving the compression ratio of three-dimensional audio spatial parameters according to claim 6, characterized in that: the intra-frame decoding module comprises an inverse quantization module, the inverse quantization module being used to intra-frame decode the spatial parameters, the decoding being perceptual decoding or direct decoding; the intra-frame-decoded spatial parameters are inverse-quantized, the inverse quantization being the inverse of perceptual quantization or the inverse of direct quantization.
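The module chain of the system claims (clustering, quantization, inter-frame differential coding, and the matching decoder modules) can be sketched end to end for the spatial parameters alone. Everything concrete here is an assumption for illustration: the class names, the step sizes in `STEPS`, the mean as cluster representative, an all-zero reference for the first frame, and the omission of the intra-frame entropy coding and of both time-frequency transform modules.

```python
import math
from collections import defaultdict

STEPS = (2.0, 2.0, 0.25)  # hypothetical steps: horizontal angle, elevation angle, distance


class SpatialParamEncoder:
    def __init__(self):
        self.prev = {}  # object number -> quantization indices of the previous frame

    def encode_frame(self, params, index):
        """params: f -> (angle, elevation, distance); index: f -> Index(n, f)."""
        groups = defaultdict(list)
        for f, p in params.items():  # clustering module: group bands by object
            groups[index[f]].append(p)
        code = {}
        for obj, ps in groups.items():
            mean = [sum(c) / len(ps) for c in zip(*ps)]  # assumption: mean representative
            q = tuple(math.floor(v / s + 0.5) for v, s in zip(mean, STEPS))  # quantization
            ref = self.prev.get(obj, (0, 0, 0))  # assumption: zero reference for frame 0
            code[obj] = tuple(a - b for a, b in zip(q, ref))  # inter-frame differential
            self.prev[obj] = q
        return code


class SpatialParamDecoder:
    def __init__(self):
        self.prev = {}

    def decode_frame(self, code, index):
        restored = {}
        for obj, residual in code.items():
            ref = self.prev.get(obj, (0, 0, 0))
            q = tuple(r + b for r, b in zip(residual, ref))  # differential decoding
            self.prev[obj] = q
            restored[obj] = tuple(i * s for i, s in zip(q, STEPS))  # inverse quantization
        return {f: restored[obj] for f, obj in index.items()}  # restoration module


index = {0: 1, 1: 1, 2: 2}
frame0 = {0: (30.0, 10.0, 2.0), 1: (30.0, 10.0, 2.0), 2: (-60.0, 0.0, 1.5)}
frame1 = {0: (32.0, 10.0, 2.0), 1: (32.0, 10.0, 2.0), 2: (-60.0, 0.0, 1.5)}

enc, dec = SpatialParamEncoder(), SpatialParamDecoder()
code0 = enc.encode_frame(frame0, index)
code1 = enc.encode_frame(frame1, index)
out0 = dec.decode_frame(code0, index)
out1 = dec.decode_frame(code1, index)
```

In this toy run the second frame needs only small residuals per object instead of one full parameter per band, which is the saving the clustering and differential stages aim for.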
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610541939.8A CN106023999B (en) | 2016-07-11 | 2016-07-11 | Encoding and decoding method and system for improving three-dimensional audio spatial parameter compression ratio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106023999A (en) | 2016-10-12 |
CN106023999B (en) | 2019-06-11 |
Family
ID=57108555
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610541939.8A (Active; CN106023999B (en)) | Encoding and decoding method and system for improving three-dimensional audio spatial parameter compression ratio | 2016-07-11 | 2016-07-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106023999B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20070025907A (en) * | 2005-08-30 | 2007-03-08 | 엘지전자 주식회사 | Method of effective bitstream composition for the parameter band number of channel conversion module in multi-channel audio coding |
US7974287B2 (en) * | 2006-02-23 | 2011-07-05 | Lg Electronics Inc. | Method and apparatus for processing an audio signal |
CN101609674A (en) * | 2008-06-20 | 2009-12-23 | 华为技术有限公司 | Decoding method, device and system |
CN102177542A (en) * | 2008-10-10 | 2011-09-07 | 艾利森电话股份有限公司 | Energy conservative multi-channel audio coding |
CN101521013A (en) * | 2009-04-08 | 2009-09-02 | 武汉大学 | Spatial audio parameter bidirectional interframe predictive coding and decoding devices |
CN103165134A (en) * | 2013-04-02 | 2013-06-19 | 武汉大学 | Coding and decoding device of audio signal high frequency parameter |
CN103400582A (en) * | 2013-08-13 | 2013-11-20 | 武汉大学 | Encoding and decoding method and system for multi-channel three-dimensional voice frequency |
CN103928030A (en) * | 2014-04-30 | 2014-07-16 | 武汉大学 | Gradable audio coding system and method based on sub-band space attention measure |
CN104064194A (en) * | 2014-06-30 | 2014-09-24 | 武汉大学 | Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency |
Non-Patent Citations (1)
Title |
---|
Hu Ruimin et al.: "AVS-P10 mobile audio codec standard and key technologies" (AVS-P10移动音频编解码标准与关键技术), Video Engineering (《电视技术》) * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108206022B (en) * | 2016-12-16 | 2020-12-18 | 南京青衿信息科技有限公司 | Codec for transmitting three-dimensional acoustic signals by using AES/EBU channel and coding and decoding method thereof |
RU2763155C2 (en) * | 2017-11-17 | 2021-12-27 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Apparatus and method for encoding or decoding the directional audio encoding parameters using quantisation and entropy encoding |
RU2763313C2 (en) * | 2017-11-17 | 2021-12-28 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Apparatus and method for encoding or decoding the directional audio encoding parameters using various time and frequency resolutions |
US11367454B2 (en) | 2017-11-17 | 2022-06-21 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for encoding or decoding directional audio coding parameters using quantization and entropy coding |
US11783843B2 (en) | 2017-11-17 | 2023-10-10 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for encoding or decoding directional audio coding parameters using different time/frequency resolutions |
WO2020043935A1 (en) | 2018-08-31 | 2020-03-05 | Nokia Technologies Oy | Spatial parameter signalling |
CN112970062A (en) * | 2018-08-31 | 2021-06-15 | 诺基亚技术有限公司 | Spatial parameter signaling |
JP7208385B2 (en) | 2018-11-01 | 2023-01-18 | ノキア テクノロジーズ オーユー | Apparatus, method and computer program for encoding spatial metadata |
WO2020089523A1 (en) * | 2018-11-01 | 2020-05-07 | Nokia Technologies Oy | Apparatus, methods and computer programs for encoding spatial metadata |
JP2022506581A (en) * | 2018-11-01 | 2022-01-17 | ノキア テクノロジーズ オーユー | Devices, methods and computer programs for encoding spatial metadata |
WO2021032909A1 (en) * | 2019-08-16 | 2021-02-25 | Nokia Technologies Oy | Quantization of spatial audio direction parameters |
US12020713B2 (en) | 2019-08-16 | 2024-06-25 | Nokia Technologies Oy | Quantization of spatial audio direction parameters |
WO2022129672A1 (en) * | 2020-12-15 | 2022-06-23 | Nokia Technologies Oy | Quantizing spatial audio parameters |
CN115662448A (en) * | 2022-10-17 | 2023-01-31 | 深圳市超时代软件有限公司 | Method and device for converting audio data coding format |
CN115662448B (en) * | 2022-10-17 | 2023-10-20 | 深圳市超时代软件有限公司 | Method and device for converting audio data coding format |
Also Published As
Publication number | Publication date |
---|---|
CN106023999B (en) | 2019-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106023999B (en) | Encoding and decoding method and system for improving three-dimensional audio spatial parameter compression ratio | |
ES2899286T3 (en) | Temporal Envelope Configuration for Audio Spatial Encoding Using Frequency Domain Wiener Filtering | |
CN111226442B (en) | Method of configuring transforms for video compression and computer-readable storage medium | |
CN101120615B (en) | Multi-channel encoder/decoder and related encoding and decoding method | |
CN106415714A (en) | Coding independent frames of ambient higher-order ambisonic coefficients | |
CN106463121A (en) | Higher order ambisonics signal compression | |
HRP20140400T1 (en) | Decoding of multichannel aufio encoded bit streams using adaptive hybrid transformation | |
CN103108187B (en) | The coded method of a kind of 3 D video, coding/decoding method, encoder | |
CN104064194A (en) | Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency | |
JP2013513330A5 (en) | ||
TW200935403A (en) | Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs | |
CN101371447A (en) | Complex-transform channel coding with extended-band frequency coding | |
US11776552B2 (en) | Methods and apparatus for decoding encoded audio signal(s) | |
TWI702594B (en) | Backward-compatible integration of high frequency reconstruction techniques for audio signals | |
CN109448741A (en) | A kind of 3D audio coding, coding/decoding method and device | |
CN109887517A (en) | Method, decoder and the computer-readable medium that audio scene is decoded | |
TW201503113A (en) | Encoding device and method, decoding device and method, and program | |
JP2020074052A (en) | Backward compatible integration of harmonic converter for high frequency reconstruction of audio signal | |
WO2015096789A1 (en) | Method and device for use in vector quantization encoding/decoding of audio signal | |
CN103065634A (en) | Three-dimensional audio space parameter quantification method based on perception characteristic | |
JP6094322B2 (en) | Orthogonal transformation device, orthogonal transformation method, computer program for orthogonal transformation, and audio decoding device | |
CN104347077B (en) | A kind of stereo coding/decoding method | |
CN112365896A (en) | Object-oriented encoding method based on stack type sparse self-encoder | |
KR102546098B1 (en) | Apparatus and method for encoding / decoding audio based on block | |
CN105336334B (en) | Multi-channel sound signal coding method, decoding method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |