CN105895108B - Panoramic sound processing method - Google Patents

Panoramic sound processing method

Info

Publication number
CN105895108B
CN105895108B (application CN201610157032.1A)
Authority
CN
China
Prior art keywords
sound
sound object
dimensional coordinate
bits
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610157032.1A
Other languages
Chinese (zh)
Other versions
CN105895108A (en)
Inventor
潘兴德
吴超刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panorama Sound (Beijing) Intelligent Technology Co.,Ltd.
Original Assignee
NANJING QINGJIN INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING QINGJIN INFORMATION TECHNOLOGY Co Ltd filed Critical NANJING QINGJIN INFORMATION TECHNOLOGY Co Ltd
Priority to CN201610157032.1A priority Critical patent/CN105895108B/en
Publication of CN105895108A publication Critical patent/CN105895108A/en
Application granted granted Critical
Publication of CN105895108B publication Critical patent/CN105895108B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention discloses a panoramic sound processing method comprising the following steps: acquiring the sound objects of a sound field space; establishing a three-dimensional coordinate system with the monitoring point as the origin and determining the three-dimensional coordinate values of each sound object; dividing the three-dimensional coordinate values of a sound object into reference blocks and prediction blocks in time order; directly coding the three-dimensional coordinate values of the reference blocks and differentially coding the three-dimensional coordinate values of the prediction blocks; and determining the effective action area of the sound object according to its three-dimensional coordinate values before encoding or after decoding. The invention provides methods for coordinate definition, motion track and action region representation of the sound objects of a three-dimensional sound field during recording and production, encoding, decoding and rendering playback, with high coding efficiency, good sound expression and convenient sound production.

Description

Panoramic sound processing method
Technical Field
The invention relates to the technical field of sound coding, in particular to a panoramic sound processing method.
Background
With the rapid development of computing power and networks, audio recording, mixing and editing, encoding, decoding, rendering and playback technologies capable of representing a real three-dimensional sound field have important application value in fields such as movies, television, music, games, virtual reality and online video. "Panoramic sound" is a figurative description of such a three-dimensional sound field.
At present, MPEG has introduced the MPEG-H three-dimensional audio coding technology and Dolby has introduced the Atmos panoramic sound coding technology; both propose the concept of sound object coding on top of traditional multi-channel signal coding. Dolby Atmos encodes the three-dimensional coordinates (x, y, z) of a sound object by directly recording its three-dimensional motion trajectory, and divides the rendering and playback of sound objects into 9 rectangular regions. MPEG-H does not encode the sound objects directly; instead it adopts a parametric stereo coding technique that mixes several sounds into one mono sum signal and encodes the spatial perception information (phase, intensity and correlation) of each sound object. During decoding, the mono sum signal is decoded first, and then each sound object is restored using its spatial perception information.
In high quality applications, such as movies, Dolby Atmos can achieve higher sound quality than MPEG H. However, the spatial coordinate system, the coordinate representation method, the audio object coordinate encoding method, and the audio object partition representation method of Dolby Atmos have limitations such as low encoding efficiency, poor audio expression, and inconvenience in audio production.
When describing a sound field, Dolby Atmos places the origin of coordinates at the height of the front-left screen loudspeaker, with the X-axis running from the origin to the right wall, the Y-axis from the origin to the rear wall, and the Z-axis from the origin to the roof. At the same time, the room is divided into nine areas: a left screen speaker area, a center screen speaker area, a right screen speaker area, a left wall speaker area, a right wall speaker area, a rear-wall left speaker area, a rear-wall right speaker area, a left roof speaker area and a right roof speaker area. Sound objects are encoded with the above position coordinates and area division.
In Dolby Atmos, the definition of the coordinate origin and the region division are decoupled, so the representation efficiency for sound objects such as point sources, plane sources and diffuse sources is not high. Moreover, the loudspeaker areas of Dolby Atmos are not equivalent to the effective action area of the actual sound object; the latter is a more accurate description of the actual physical sound field.
From the perspective of sound coding efficiency, generally speaking, the fewer bits the code stream consumes while still expressing the complete information, the higher the coding efficiency. Existing coordinate definition methods encode the coordinates with a fixed number of bits; for example, Dolby Atmos maps the position coordinates into a unit cube to obtain decimals in the range [0, 1] and stores each unsigned decimal with 12 bits. As a result, 12 bits are used regardless of whether the position coordinates change, which wastes a large amount of code stream. In practice, the position of a sound object often changes slowly, and there is a large redundancy between the position coordinate data of adjacent frames or adjacent blocks.
From the perspective of sound expression, the existing space division is fixed; for example, Dolby Atmos divides the space into the nine areas listed above. This lacks flexibility in positioning sound objects and leaves little room for choice, which makes the sound presentation less flexible.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the prior art, the invention provides a panoramic sound processing method which is high in coding efficiency and good in sound expression.
The technical scheme is as follows: the panoramic sound processing method comprises the following steps:
acquiring a sound object of a sound field space;
establishing a three-dimensional coordinate system by taking the monitoring point as an origin, and determining a three-dimensional coordinate value of the sound object;
dividing three-dimensional coordinate values of a sound object into a reference block and a prediction block in time order;
directly coding the three-dimensional coordinate value of the reference block, and differentially coding the three-dimensional coordinate value of the prediction block;
and determining the effective action area of the sound object according to the three-dimensional coordinate value of the sound object before encoding or after decoding.
In a further refinement of the technical scheme, the origin is defined at the center of the horizontal section of the sound field space, at the same height as the midpoint of the line connecting the two ears of the recording engineer.
Further, the position track of the sound object is in units of frames, each frame comprises a plurality of blocks, the first block of each frame is the reference block, and the subsequent blocks are the prediction blocks.
Further, the three-dimensional coordinate value of each block of the sound object is (x_i, y_i, z_i); (x_i, y_i, z_i) is mapped to (pID_i, Ax_i, Ay_i, Az_i), where pID_i is a quadrant identifier and Ax_i, Ay_i, Az_i are the absolute values of the position coordinates.
Further, for the reference block, (pID_i, Ax_i, Ay_i, Az_i) is directly coded as (pID_i, Dx_i, Dy_i, Dz_i): pID_i uses 3 bits, and Ax_i, Ay_i, Az_i, which lie in the range [0, 1], are coded as unsigned numbers Dx_i, Dy_i, Dz_i of 4 to 16 bits. For the prediction block, the difference (Δx_k, Δy_k, Δz_k) between the coordinate values of the current block and the previous block is coded, where Δx_k, Δy_k and Δz_k are the differences of the x-, y- and z-axis coordinates of the current block and the previous block respectively. The difference (Δx_k, Δy_k, Δz_k) is mapped to (pID_k, |Δx_k|, |Δy_k|, |Δz_k|), where pID_k is the quadrant identifier of (Δx_k, Δy_k, Δz_k) and |Δx_k|, |Δy_k|, |Δz_k| are the absolute values of Δx_k, Δy_k, Δz_k; |Δx_k|, |Δy_k|, |Δz_k| lie in [0, 2] and are coded as unsigned numbers Dx_k, Dy_k, Dz_k of 4 to 17 bits.
Further, the unsigned numbers Dx_k, Dy_k, Dz_k are coded with the DIF(n) method: take any one of the unsigned position coordinates Dx_k, Dy_k, Dz_k, denoted DIFFDATA, and compare it with (2^n - 1); if it is smaller than (2^n - 1), DIFFDATA is stored with n bits; otherwise all n bits are set to 1 and a field of 2n bits follows; and so on, until (2^(kn) - 1) > DIFFDATA (k is a positive integer).
Further, the unsigned position coordinate DIFFDATA is stored using 4, 8 or 12 bits.
Further, the effective action area of the sound object is a cone described by (φ, θ, γ), where φ is the angle between the x-axis and the projection onto the xoy plane of the line connecting the sound object and the origin, in the range [0, 2π]; θ is the angle between the line connecting the sound object and the origin and the z-axis; and γ describes the opening size of the cone, defined as the angle between the generatrix of the cone and its central axis, in the range [0, π/2].
Further, φ and θ are obtained from the coordinates (x_i, y_i, z_i) of the sound object, and γ is encoded as a 4-bit unsigned number B with the mapping γ = π/2 × B/(2^4 - 1), 0 ≤ B ≤ 2^4 - 1.
Beneficial effects: compared with the prior art, the invention introduces a sound-object-based three-dimensional sound technology on top of the traditional multi-channel stereo sound field, and provides methods for representing the coordinate definition, motion track and action region of the sound objects of a three-dimensional sound field during recording and production, encoding, decoding and rendering playback. It introduces the effective action area of a sound object and represents a sound object by its coordinates (x, y, z) together with an effective action area described by a cone (φ, θ, γ). A point source can be represented by the three-dimensional coordinate values alone, while an area source requires both the three-dimensional coordinate values and the area information; point-source and area-source sound objects are therefore represented more effectively, giving a more efficient spatial representation, a better sound field effect and a more complete three-dimensional sound field. The coding efficiency is high, the sound expression is good, and sound production is convenient.
The invention adopts a differential coding method, which allows most sound objects to be coded with fewer bits; for example, low-speed objects moving no faster than 53 km/h can be coded with only 4 bits, greatly saving code stream space. For the few high-speed objects, coding is completed by extending the field in the DIF(n) manner. Although high-speed objects use more bits, most objects are low-speed, so the overall coding efficiency is improved.
The invention provides a new division mode: a cone is constructed with the line connecting the object and the origin as its central axis, the opening angle of the cone is adjustable, and the area covered by the cone is the effective action area of the object. The effective action area is divided from the viewpoint of the object, which helps the sound engineer define the ideal effective action area; when the object is rendered, the selection of loudspeakers can be decided flexibly according to the loudspeaker layout of the actual sound field and the rendering algorithm adopted, so the resulting area division makes the reconstruction of the sound object more expressive.
From the perspective of sound production, the position of a sound object and the area division of the sound field space are defined flexibly, so sound objects can be conveniently added on top of traditional 3D stereo during production, making the production or recording stage much more flexible.
Drawings
Fig. 1 is a schematic diagram of the area division of the loudspeaker of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
Example 1: a cube is taken as an example of the sound field space; a typical application places the loudspeakers on the boundary surfaces of the cube. Definition of the spatial coordinates of a sound object: the origin of coordinates is defined at the center of the horizontal section, at the height of the sound engineer's ears at the monitoring position, with the x-axis pointing to the right (wall), the y-axis pointing to the front (typically the screen) and the z-axis pointing vertically up (roof).
The sound field space is expressed in normalized coordinates: the maximum absolute coordinate value on the x-, y- and z-axes is 1; along the z-axis the ground side is the shorter one, with normalized absolute coordinate value a (a < 1). The eight vertex coordinates of the sound field space are then:
(1, 1, 1) - the upper right corner in front of the region;
(-1, 1, 1) - the upper left corner in front of the region;
(1, 1, -a) - the lower right corner in front of the region;
(-1, 1, -a) - the lower left corner in front of the region;
(1, -1, 1) - the upper right corner behind the region;
(-1, -1, 1) - the upper left corner behind the region;
(1, -1, -a) - the lower right corner behind the region;
(-1, -1, -a) - the lower left corner behind the region.
The position track of the sound object is coded in units of frames, and each frame is further divided into several blocks. For compatibility with compression coding, 1024 samples form a frame: at a sampling frequency of 48 kHz each block has 256 samples, giving a time interval of 5.3 ms; at a sampling frequency of 96 kHz each block has 512 samples, again 5.3 ms. The position coordinates of a sound object in the i-th block are written (x(i), y(i), z(i)), i = 1, 2, 3, 4. The position coordinates (x, y, z) of a sound object can be mapped to four quantities (pID, Ax, Ay, Az): the quadrant identifier pID and the absolute values of the position coordinates Ax, Ay, Az (value range [0, 1]).
The quadrant identifier pID of the sound object describes the quadrant position of the coordinates (x, y, z); it corresponds to the sign-bit information (signb(x), signb(y), signb(z)) of (x, y, z), where signb() is the sign-bit operation:
signb(x) = 0 when x >= 0;
signb(x) = 1 when x < 0.
The quadrant identifier may take the following values:
TABLE 1 Quadrant identifier pID
pID index   Sign bits (signb(x), signb(y), signb(z))
0           (0,0,0)
1           (0,0,1)
2           (0,1,0)
3           (0,1,1)
4           (1,0,0)
5           (1,0,1)
6           (1,1,0)
7           (1,1,1)
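To make the mapping concrete, here is a minimal sketch (helper names are my own, not the patent's) that maps a coordinate triple to the quadrant identifier of Table 1 together with the absolute values (pID, Ax, Ay, Az):

```python
def signb(v: float) -> int:
    """Sign-bit operation from the text: 0 for v >= 0, 1 for v < 0."""
    return 0 if v >= 0 else 1

def map_to_pid(x: float, y: float, z: float):
    """Map (x, y, z) to (pID, |x|, |y|, |z|); the pID index packs
    signb(x), signb(y), signb(z) from most to least significant bit,
    which reproduces Table 1."""
    pid = (signb(x) << 2) | (signb(y) << 1) | signb(z)
    return pid, abs(x), abs(y), abs(z)

# Example: a sound object to the front right, below ear height.
print(map_to_pid(0.5, 0.8, -0.2))   # -> (1, 0.5, 0.8, 0.2), cf. row 1 of Table 1
```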
The first block of each frame is a reference block, and the spatial position information of the sound object of the block is directly coded; the subsequent block is a prediction block, and the sound object spatial position information of the block is differentially encoded.
The first block encodes (pID, Ax, Ay, Az) directly: pID uses three bits, as shown in Table 1; Ax, Ay, Az, which lie in the range [0, 1], are coded as 10-bit unsigned numbers Dx, Dy, Dz satisfying the mapping Ax = Dx/(2^10 - 1), Ay = Dy/(2^10 - 1), Az = Dz/(2^10 - 1), with 0 ≤ Dx, Dy, Dz ≤ 2^10 - 1.
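As a small illustration of this 10-bit mapping, here is a sketch of the uniform quantizer it implies (round-to-nearest is my assumption; the text only fixes the range and the bit width):

```python
BITS_REF = 10  # Example 1 stores Ax, Ay, Az of the reference block with 10 bits each

def quantize_abs(a: float, bits: int = BITS_REF) -> int:
    """Map a value a in [0, 1] to an unsigned code D in [0, 2^bits - 1]."""
    return round(a * (2 ** bits - 1))

def dequantize_abs(d: int, bits: int = BITS_REF) -> float:
    """Inverse mapping: a = D / (2^bits - 1)."""
    return d / (2 ** bits - 1)

ax = 0.3333
dx = quantize_abs(ax)
print(dx, round(dequantize_abs(dx), 4))   # 341 0.3333
```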
The subsequent blocks are coded differentially, i.e. the difference (Δx, Δy, Δz) between the coordinate values of the current block and the previous block is coded, where Δx, Δy and Δz are the differences of the x-, y- and z-axis coordinates of the current block and the previous block. They satisfy:
x(k) = x(k-1) + Δx, -2 ≤ Δx ≤ 2;
y(k) = y(k-1) + Δy, -2 ≤ Δy ≤ 2;
z(k) = z(k-1) + Δz, -2 ≤ Δz ≤ 2.
As before, the difference (Δx, Δy, Δz) is mapped to four quantities (pID, |Δx|, |Δy|, |Δz|): pID is the quadrant identifier of (Δx, Δy, Δz), and |Δx|, |Δy|, |Δz| are the absolute values of Δx, Δy, Δz, with value range [0, 2]. pID uses three bits, as shown in Table 1; |Δx|, |Δy|, |Δz| are mapped to 11-bit unsigned numbers Dx, Dy, Dz satisfying |Δx| = Dx/(2^10 - 1), |Δy| = Dy/(2^10 - 1), |Δz| = Dz/(2^10 - 1), with 0 ≤ Dx, Dy, Dz ≤ 2(2^10 - 1), so that the quantization step matches that of the first block.
The unsigned numbers Dx, Dy and Dz are coded with the DIF(n) method. The DIF(n) coding process is: first compare the unsigned position coordinate DIFFDATA to be coded (DIFFDATA is any of Dx, Dy, Dz) with (2^n - 1); if it is smaller than (2^n - 1), DIFFDATA is stored with n bits; otherwise all n bits are set to 1 and a field of 2n bits follows; and so on, until (2^(kn) - 1) > DIFFDATA (k is a positive integer). Taking DIF(4) as an example, when DIF(4) coding is used for the unsigned numbers Dx, Dy and Dz, the possible values of k are 1, 2 and 3.
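The escape structure of DIF(n), as I read the description above, can be sketched as follows (bit strings are used for clarity; this is not the patent's own pseudo code):

```python
def dif_encode(value: int, n: int = 4) -> str:
    """DIF(n) escape coding of an unsigned integer: a value below 2^n - 1 is
    stored in n bits; otherwise an all-ones n-bit field is emitted as an escape,
    followed by a 2n-bit field, and so on until the value fits (2^(k*n) - 1 > value)."""
    bits, k = "", 1
    while value >= 2 ** (k * n) - 1:           # field of width k*n is all ones (escape)
        bits += "1" * (k * n)
        k += 1
    return bits + format(value, f"0{k * n}b")  # value stored in k*n bits

def dif_decode(bits: str, n: int = 4):
    """Inverse of dif_encode; returns (value, number_of_bits_consumed)."""
    pos, k = 0, 1
    while bits[pos:pos + k * n] == "1" * (k * n):
        pos += k * n
        k += 1
    return int(bits[pos:pos + k * n], 2), pos + k * n

print(dif_encode(9))                  # '1001'               -> 4 bits (k = 1)
print(dif_encode(108))                # '1111' + 8 data bits -> 12 bits (k = 2)
print(dif_decode(dif_encode(2046)))   # (2046, 24)           -> k = 3 for the largest 11-bit code
```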
In the differential encoding of the sound object, enough range is reserved for the coordinate difference that its storage precision coincides with that of the position coordinates in the first block. This gives the relation
(L/R) × 2^10 < 2^n, i.e. L < R × 2^(n-10),
where R is the half-length of the room, L is the displacement of the object between two adjacent blocks, and n is the number of bits used to store the difference value.
For a 10 m square room, 4 bits are first chosen to store this difference value; substituting R = 5 m and n = 4 gives L < 5 × 2^(-6) = 0.0781 m, so the maximum speed of the sound object in this case is about 0.0781 m per 5.3 ms block, i.e. roughly 14.7 m/s ≈ 53 km/h.
In practical recording, the speed of most sound objects is below 53 km/h, so 4-bit storage is sufficient and very efficient. For sound objects moving at high speed, i.e. faster than 53 km/h, the coding can be extended to 8-bit storage. Even for something as fast as an airplane (assuming 100 m/s), L = 100 × 0.0053 = 0.53 m between two adjacent blocks, and since L/2^8 < 5/2^10, 8 bits are fully sufficient.
When the room is enlarged to 100 m and 10 bits are used for storage, the precision is 50/2^10, which is more than enough for storing the residual. The following table lists the maximum sound-image speed that can be stored for different bit widths and room sizes:
TABLE 2 Object speeds that can be stored in different cases
           10 m room      100 m room
4 bits     53 km/h        530 km/h
8 bits     848 km/h       8480 km/h
12 bits    13568 km/h     135680 km/h
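As a quick check, the relation L < R × 2^(n-10) together with the 5.3 ms block interval reproduces the 4-bit entry of Table 2; the remaining entries follow by scaling that rounded value (a small illustrative script, not part of the patent):

```python
block_dt = 256 / 48000                      # ~5.3 ms between adjacent blocks

def base_speed_kmh(room_m: float, n_bits: int) -> float:
    r = room_m / 2                          # half-length R of the room
    l_max = r * 2 ** (n_bits - 10)          # largest per-block displacement in metres
    return l_max / block_dt * 3.6

base = round(base_speed_kmh(10, 4))
print(base)                                 # 53    (4 bits, 10 m room)
print(base * 16, base * 256)                # 848 13568        (8 and 12 bits, 10 m room)
print(base * 10, base * 160, base * 2560)   # 530 8480 135680  (100 m room)
```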
Within the three-dimensional region, for the reconstruction of a given sound object, some sub-regions matter greatly while others may have no effect. From this point of view, an action region is assigned to each sound object, and only the loudspeakers within that region are used, which keeps the computation model and the mixing operation simpler. Besides point sources, typical sound objects include surface (area) sources, which can be understood as point sources at a great distance, and diffuse sources such as explosions; the effective action area of a sound object is used to describe surface sources. The effective action area is actually specified by the recording engineer while monitoring: the engineer supplies the ideal effective action area to the encoder as metadata, and the encoder writes it into the code stream accordingly. Since only the decoded three-dimensional coordinate values are available at the decoder, the effective action region should be specified at encoding time in terms of the decoded three-dimensional coordinate values, so that the effective action region before encoding and the one after decoding coincide. In fact, within a certain accuracy, the three-dimensional coordinate values before encoding and after decoding are very close; the difference is just the quantization error of the three-dimensional coordinate values.
The division method is shown in Fig. 1: once the position of the sound object is determined, a cone is opened up with the line connecting the origin and the sound object as its axis, the origin being the apex of the cone. The loudspeakers enclosed by the cone are the effective loudspeakers.
For convenience of expression, this division is written in polar form with three parameters (φ, θ, γ), where φ and θ give the direction (azimuth and elevation) of the sound object: φ is the angle between the x-axis and the projection onto the xoy plane of the line connecting the object and the origin, in the range [0, 2π], and θ is the angle between that line and the z-axis. The third parameter γ describes the opening size of the cone and is defined as the angle between the generatrix of the cone and its central axis, in the range [0, π/2]. The whole cone is thereby determined, and the division of the three-dimensional space is complete. As for φ and θ, the position of the object has already been defined above, and since the position coordinates of the sound object are expressed as (x, y, z), they are easily obtained.
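A sketch of this cone test follows; the spherical-coordinate formulas for φ and θ and the inside-the-cone criterion are standard geometry applied to the description of Fig. 1, not formulas quoted from the patent:

```python
import math

def direction_angles(x: float, y: float, z: float):
    """phi: angle in [0, 2*pi) between the x-axis and the projection of the
    object-origin line onto the xoy plane; theta: angle to the z-axis."""
    phi = math.atan2(y, x) % (2 * math.pi)
    theta = math.acos(z / math.sqrt(x * x + y * y + z * z))
    return phi, theta

def gamma_from_code(b: int) -> float:
    """4-bit coding of the cone half-angle: gamma = pi/2 * B / (2^4 - 1)."""
    return math.pi / 2 * b / (2 ** 4 - 1)

def speaker_is_effective(speaker_xyz, object_xyz, gamma: float) -> bool:
    """A loudspeaker counts as effective when the angle between the
    origin->speaker and origin->object directions is at most gamma."""
    dot = sum(s * o for s, o in zip(speaker_xyz, object_xyz))
    ns = math.sqrt(sum(s * s for s in speaker_xyz))
    no = math.sqrt(sum(o * o for o in object_xyz))
    angle = math.acos(max(-1.0, min(1.0, dot / (ns * no))))
    return angle <= gamma

obj = (0.5, 0.8, 0.2)                         # a sound object in normalized coordinates
print(direction_angles(*obj))                 # (phi, theta) in radians
print(speaker_is_effective((1, 1, 1), obj, gamma_from_code(8)))   # True: the roof corner lies in the cone
```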
Pseudo code for the above sound object coding is provided in the original patent as a figure.
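Since that pseudo code is only available as an image, the following is a rough, hypothetical sketch of the frame-level position-track encoding described above: the reference block is coded directly as a 3-bit pID plus three 10-bit magnitudes, and each prediction block codes the difference to its predecessor as a 3-bit pID plus three DIF(4) codes; helper names and the exact bit packing are assumptions.

```python
def _pid(v):
    """Quadrant identifier: pack the sign bits of (x, y, z), cf. Table 1."""
    return (int(v[0] < 0) << 2) | (int(v[1] < 0) << 1) | int(v[2] < 0)

def _q(a):
    """Quantize a magnitude on the 1/(2^10 - 1) grid (also used for differences in [0, 2])."""
    return round(a * (2 ** 10 - 1))

def _dif(value, n=4):
    """DIF(n) escape code of an unsigned integer, returned as a bit string."""
    out, k = "", 1
    while value >= 2 ** (k * n) - 1:
        out += "1" * (k * n)
        k += 1
    return out + format(value, f"0{k * n}b")

def encode_frame(blocks):
    """blocks: four (x, y, z) triples, one per block, in normalized coordinates."""
    bits = ""
    for i, cur in enumerate(blocks):
        if i == 0:                                      # reference block: direct coding
            bits += format(_pid(cur), "03b")
            bits += "".join(format(_q(abs(c)), "010b") for c in cur)
        else:                                           # prediction blocks: differential coding
            diff = tuple(c - p for c, p in zip(cur, blocks[i - 1]))
            bits += format(_pid(diff), "03b")
            bits += "".join(_dif(_q(abs(d))) for d in diff)
    # (the frame header and the effective-area parameters are omitted from this sketch)
    return bits

track = [(0.50, 0.80, -0.20), (0.51, 0.80, -0.20), (0.52, 0.81, -0.20), (0.52, 0.81, -0.19)]
print(len(encode_frame(track)), "bits")                 # 78 bits for this slowly moving object
```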
The method above provides representation methods for the coordinate definition, motion track and action region of the sound objects of a three-dimensional sound field during recording and production, encoding, decoding and rendering playback. In three-dimensional sound coding, the waveform of a sound object must be encoded in addition to information such as its track and action region.
Given that sound objects are independent of one another, high-quality sound object waveforms can be encoded independently, using various known lossless and lossy audio coding techniques such as APE, FLAC, MP3, AAC, AVS and so on. In low-bit-rate scenarios with tight bandwidth requirements, a parametric coding approach can also be used: several sound objects are mixed into a sum channel and represented efficiently with a parametric coding method. Such parametric coding methods include SAC (Spatial Audio Coding), BCC (Binaural Cue Coding), MPEG Surround and the like.
Since the method of encoding the sound waveform is mature, it will not be described herein.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method of panoramic acoustic processing, comprising:
acquiring a sound object of a sound field space;
establishing a three-dimensional coordinate system by taking the monitoring point as an origin, and determining a three-dimensional coordinate value of the sound object;
dividing three-dimensional coordinate values of a sound object into a reference block and a prediction block in time order;
directly coding the three-dimensional coordinate value of the reference block, and differentially coding the three-dimensional coordinate value of the prediction block;
and determining the effective action area of the sound object according to the three-dimensional coordinate value of the sound object before encoding or after decoding.
2. The panoramic acoustic processing method of claim 1, wherein: the origin is defined at the center of the horizontal section of the sound field space, at the same height as the midpoint of the line connecting the two ears of the recording engineer.
3. The panoramic acoustic processing method of claim 1, wherein: the position track of the sound object is in units of frames, each frame comprises a plurality of blocks, the first block of each frame is the reference block, and the subsequent blocks are the prediction blocks.
4. The panoramic acoustic processing method of claim 3, wherein: the three-dimensional coordinate value of each block of the sound object is (x_i, y_i, z_i); (x_i, y_i, z_i) is mapped to (pID_i, Ax_i, Ay_i, Az_i), where pID_i is a quadrant identifier and Ax_i, Ay_i, Az_i are the absolute values of the position coordinates.
5. The panoramic acoustic processing method of claim 4, wherein: for the reference block, (pID_i, Ax_i, Ay_i, Az_i) is directly coded as (pID_i, Dx_i, Dy_i, Dz_i): pID_i uses 3 bits, and Ax_i, Ay_i, Az_i, which lie in the range [0, 1], are coded as unsigned numbers Dx_i, Dy_i, Dz_i of 4 to 16 bits; for the prediction block, the difference (Δx_k, Δy_k, Δz_k) between the coordinate values of the current block and the previous block is coded, where Δx_k, Δy_k and Δz_k are the differences of the x-, y- and z-axis coordinates of the current block and the previous block respectively; the difference (Δx_k, Δy_k, Δz_k) is mapped to (pID_k, |Δx_k|, |Δy_k|, |Δz_k|), where pID_k is the quadrant identifier of (Δx_k, Δy_k, Δz_k) and |Δx_k|, |Δy_k|, |Δz_k| are the absolute values of Δx_k, Δy_k, Δz_k; |Δx_k|, |Δy_k|, |Δz_k| lie in [0, 2] and are coded as unsigned numbers Dx_k, Dy_k, Dz_k of 4 to 17 bits.
6. The panoramic acoustic processing method of claim 5, wherein: the unsigned numbers Dx_i, Dy_i, Dz_i and Dx_k, Dy_k, Dz_k are coded with the DIF(n) method: the value of any one of Dx_i, Dy_i, Dz_i or Dx_k, Dy_k, Dz_k, taken as the unsigned position coordinate DIFFDATA, is compared with (2^n - 1); if it is smaller than (2^n - 1), it is stored with n bits; otherwise all n bits are set to 1 and a field of 2n bits follows, and so on, until (2^(kn) - 1) > DIFFDATA, k being a positive integer.
7. The panoramic acoustic processing method of claim 6, wherein: the unsigned position coordinate DIFFDATA is stored using any of 4 bits, 8 bits, 10 bits or 12 bits.
8. The panoramic acoustic processing method of claim 6, wherein: the effective action area of the sound object is a cone described by (φ, θ, γ), where φ is the angle between the x-axis and the projection onto the xoy plane of the line connecting the sound object and the origin, in the range [0, 2π]; θ is the angle between the line connecting the sound object and the origin and the z-axis; and γ describes the opening size of the cone, defined as the angle between the generatrix of the cone and its central axis, in the range [0, π/2].
9. The panoramic acoustic processing method of claim 8, wherein: φ and θ are obtained from the coordinates (x_i, y_i, z_i) of the sound object, and γ is encoded as a 4-bit unsigned number B with the mapping γ = π/2 × B/(2^4 - 1), 0 ≤ B ≤ 2^4 - 1.
CN201610157032.1A 2016-03-18 2016-03-18 Panoramic sound processing method Active CN105895108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610157032.1A CN105895108B (en) 2016-03-18 2016-03-18 Panoramic sound processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610157032.1A CN105895108B (en) 2016-03-18 2016-03-18 Panoramic sound processing method

Publications (2)

Publication Number Publication Date
CN105895108A CN105895108A (en) 2016-08-24
CN105895108B true CN105895108B (en) 2020-01-24

Family

ID=57014328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610157032.1A Active CN105895108B (en) 2016-03-18 2016-03-18 Panoramic sound processing method

Country Status (1)

Country Link
CN (1) CN105895108B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109801639B (en) * 2017-11-16 2020-12-18 全景声科技南京有限公司 Coding and decoding method of panoramic sound signal conforming to AC-3 format
CN112073748B (en) * 2019-06-10 2022-03-18 北京字节跳动网络技术有限公司 Panoramic video processing method and device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101051844A (en) * 2006-04-03 2007-10-10 扬智科技股份有限公司 Differential pulse code regulation coding method and its device of no built-in decoder
EP1936813A1 (en) * 2006-12-19 2008-06-25 Deutsche Thomson OHG Method and apparatus for reducing miscorrection in an extended Chase decoder
CN102656628A (en) * 2009-10-15 2012-09-05 法国电信公司 Optimized low-throughput parametric coding/decoding
CN104205790A (en) * 2012-03-23 2014-12-10 杜比实验室特许公司 Placement of talkers in 2d or 3d conference scene
CN104363555A (en) * 2014-09-30 2015-02-18 武汉大学深圳研究院 Method and device for reconstructing directions of 5.1 multi-channel sound sources

Also Published As

Publication number Publication date
CN105895108A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
EP3646619B1 (en) Mixed-order ambisonics (moa) audio data for computer-mediated reality systems
CN106463128B (en) Apparatus and method for screen-dependent audio object remapping
CN104969577B (en) Mapping virtual speakers to physical speakers
CN105898669B (en) A kind of coding method of target voice
KR20140000240A (en) Data structure for higher order ambisonics audio data
JP2002505058A (en) Playing spatially shaped audio
CN109410912B (en) Audio processing method and device, electronic equipment and computer readable storage medium
CN106796795A (en) The layer of the scalable decoding for high-order ambiophony voice data is represented with signal
CN111183658B (en) Rendering for computer-mediated reality systems
CN105392102A (en) Three-dimensional audio signal generation method and system for non-spherical speaker array
Rana et al. Towards generating ambisonics using audio-visual cue for virtual reality
CN104363555A (en) Method and device for reconstructing directions of 5.1 multi-channel sound sources
CN105895108B (en) Panoramic sound processing method
TW202105164A (en) Audio rendering for low frequency effects
EP4091344A1 (en) Apparatus and method for reproducing a spatially extended sound source or apparatus and method for generating a description for a spatially extended sound source using anchoring information
CN113542907B (en) Multimedia data transceiving method, system, processor and player
CN105895106B (en) Panoramic sound coding method
CN109801639B (en) Coding and decoding method of panoramic sound signal conforming to AC-3 format
CN108206022A (en) Utilize the codec and its decoding method of AES/EBU transmission three-dimensional acoustical signals
WO2022262576A1 (en) Three-dimensional audio signal encoding method and apparatus, encoder, and system
CN105898668A (en) Coordinate definition method of sound field space
US11184731B2 (en) Rendering metadata to control user movement based audio rendering
CN114630145A (en) Multimedia data synthesis method, equipment and storage medium
WO2021091769A1 (en) Signalling of audio effect metadata in a bitstream
TW201937944A (en) Apparatuses for converting an object position of an audio object, audio stream provider, audio content production system, audio playback apparatus, methods and computer programs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 210000 stone city, Gulou District, Nanjing, Jiangsu

Patentee after: WAVARTS TECHNOLOGIES CO.,LTD.

Address before: 210000 stone city, Gulou District, Nanjing, Jiangsu

Patentee before: NANJING QINGJIN INFORMATION TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220415

Address after: 101399 room 1001, building 1, No. 8, jinmayuan Third Street, Gaoliying Town, Shunyi District, Beijing

Patentee after: Beijing panoramic sound information technology Co.,Ltd.

Address before: 210000 stone city, Gulou District, Nanjing, Jiangsu

Patentee before: WAVARTS TECHNOLOGIES CO.,LTD.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221206

Address after: 100041 8th Floor, Zhongguancun Science Fiction Industry Entrepreneurship Center, Building 2, Shougang Park, No. 68, Jinanqiao, Shijingshan District, Beijing

Patentee after: Panorama Sound (Beijing) Intelligent Technology Co.,Ltd.

Address before: 101399 room 1001, building 1, No. 8, jinmayuan Third Street, Gaoliying Town, Shunyi District, Beijing

Patentee before: Beijing panoramic sound information technology Co.,Ltd.