CN108206983B - Encoder and method for three-dimensional sound signal compatible with existing audio and video system - Google Patents


Info

Publication number: CN108206983B
Authority: CN (China)
Legal status: Active
Application number: CN201611171106.3A
Other languages: Chinese (zh)
Other versions: CN108206983A (en)
Inventors: 潘兴德, 陈笑天, 吴超刚
Current assignee: Beijing panoramic sound information technology Co.,Ltd.
Original assignee: NANJING QINGJIN INFORMATION TECHNOLOGY Co Ltd
Events: application CN201611171106.3A filed by NANJING QINGJIN INFORMATION TECHNOLOGY Co Ltd; publication of CN108206983A; application granted; publication of CN108206983B.


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention discloses an encoder and a method for three-dimensional sound signals that are compatible with existing audio and video systems. The encoder comprises: a downmix and base channel division module, which receives base channels and/or sound objects, performs downmixing and base channel division according to a downmix scheme, and outputs downmix-compatible base channels, extension base channels and base channel division side information; a compatible coding module, which receives the downmix-compatible base channels and outputs downmix-compatible base channel coded data; an extension coding module, which receives the sound objects, the downmix scheme, the extension base channels and the base channel division side information and outputs extension coded data; and a packing module, which receives the downmix-compatible base channel coded data and the extension coded data and either packs them separately or packs them together into a single three-dimensional sound data stream. The invention is compatible with the coding and decoding methods of existing audio and video systems while also providing three-dimensional sound coding and decoding capability.

Description

Technical Field
The invention relates to the technical field of three-dimensional sound coding and decoding, and in particular to an encoder for three-dimensional sound signals that is compatible with existing audio and video systems, and a corresponding method.
Background
Over the years, stereo, 5.1 and 7.1 surround sound systems have been widely used, but because they lack height information they can present at most two-dimensional sound. In the real world, three-dimensional sound is the most realistic way in which sound is presented and expressed, and it is the future trend whether in nature, in the arts or in audiovisual entertainment.
In existing systems, three-dimensional sound may be a multi-channel signal (e.g. 9.1, 11.1, 13.1 or 22.2), a number of sound objects, or a combination of both. In a three-dimensional sound system, the multi-channel signal may be a surround sound signal such as 5.1 or 7.1, or a multi-layer multi-channel signal (i.e. the channels are distributed over planes at different heights). For example, some three-dimensional sound systems use two layers, a middle layer and a top layer, while others use three layers. Some systems carry only multi-layer multi-channel signals and no sound objects, such as the SMPTE 22.2 system and the AURO 9.1 system. Others carry both multi-layer multi-channel signals and sound objects, such as MPEG-H, Dolby Atmos and DTS:X. At the extreme, three-dimensional sound may consist entirely of sound object signals.
As a newly emerged technology and system, three-dimensional sound is not yet widely used, and its adoption will take a long time. Because stereo and surround sound systems are ubiquitous, a three-dimensional sound system can gain market acceptance and become mainstream only if it is compatible, to the greatest possible extent, with them.
Disclosure of Invention
Object of the invention: for three-dimensional sound applications such as network and television services, the invention provides an encoder for three-dimensional sound signals that is compatible with existing audio and video systems, and a corresponding method.
Technical solution: the three-dimensional sound encoder of the invention comprises: a downmix and base channel division module, which receives base channels and/or sound objects, performs downmixing and base channel division according to a downmix scheme, and outputs downmix-compatible base channels, extension base channels and base channel division side information; a compatible coding module, which receives the downmix-compatible base channels and outputs downmix-compatible base channel coded data; an extension coding module, which receives the sound objects, the downmix scheme, the extension base channels and the base channel division side information and outputs extension coded data; and a packing module, which receives the downmix-compatible base channel coded data and the extension coded data and either packs them separately, outputting a downmix-compatible base channel data stream and an extension coded data stream, or packs them together, outputting a single three-dimensional sound data stream.
As a refinement of this solution, when the downmix scheme is selected adaptively by the system, the downmix and base channel division module comprises a downmix module and a base channel division module. The downmix module receives the base channels and sound objects and outputs the downmix-compatible base channels and the downmix scheme; the base channel division module receives the base channels and the downmix scheme generated by the downmix module and outputs the extension base channels and the base channel division side information.
When the downmix scheme is supplied as an external input, the downmix and base channel division module likewise comprises a downmix module and a base channel division module. The downmix module receives the base channels, the sound objects and the externally supplied downmix scheme and outputs the downmix-compatible base channels; the base channel division module receives the base channels and the externally supplied downmix scheme and outputs the extension base channels and the base channel division side information.
The extension coding module may use lossy or lossless coding. When lossy coding is combined with an externally supplied downmix scheme, the downmix and base channel division module comprises a downmix module and a base channel division module, where the base channel division module receives the base channels and the externally supplied downmix scheme and outputs the extension base channels and the base channel division side information. The extension coded data output by the extension coding module is decoded by an extension decoding module; the decoded downmix scheme, decoded extension base channels, decoded sound objects and decoded base channel division side information, together with the base channels, are fed to the downmix module, which outputs the downmix-compatible base channels. The downmix is thus computed from the same decoded extension data that the decoder later subtracts, so coding errors in the extension data do not leak into the recovered compatible base channels.
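The benefit of feeding decoded extension data back into the downmix can be seen in a toy model. The sketch below is hypothetical: one downmix channel carries a compatible base channel with gain a_c and an extension base channel with gain a_e, uniform quantization stands in for the lossy extension codec, and all function names are illustrative rather than taken from the patent.

```python
# Toy model of closed-loop downmixing (hypothetical names and parameters).
# One downmix channel carries a compatible base channel (gain a_c) and an
# extension base channel (gain a_e). Uniform quantization stands in for the
# lossy extension codec; the compatible path is treated as lossless here.

def quantize(samples, step):
    """Hypothetical lossy codec: round each sample to a multiple of step."""
    return [step * round(x / step) for x in samples]

def downmix(comp, ext, a_c, a_e):
    """Downmix-compatible channel = a_c * compatible + a_e * extension."""
    return [a_c * c + a_e * e for c, e in zip(comp, ext)]

def de_downmix(mix, ext_decoded, a_c, a_e):
    """Decoder side: subtract the decoded extension component, undo the gain."""
    return [(m - a_e * e) / a_c for m, e in zip(mix, ext_decoded)]

def max_error(x, y):
    return max(abs(p - q) for p, q in zip(x, y))

comp = [0.30, -0.12, 0.57, 0.05]
ext = [0.21, 0.44, -0.33, 0.18]
a_c, a_e, step = 1.0, 0.7, 0.1
ext_dec = quantize(ext, step)  # what the decoder will reconstruct

# Open loop: the encoder downmixes with the original extension channel.
open_loop = de_downmix(downmix(comp, ext, a_c, a_e), ext_dec, a_c, a_e)
# Closed loop: the encoder downmixes with the decoded extension channel.
closed_loop = de_downmix(downmix(comp, ext_dec, a_c, a_e), ext_dec, a_c, a_e)

print(max_error(open_loop, comp), max_error(closed_loop, comp))
```

In the open-loop case the recovered compatible channel is off by a_e/a_c times the extension quantization error (about 0.028 here); in the closed-loop case only floating-point rounding remains.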
The downmix module downmixes the base channels and sound objects into the downmix-compatible base channels according to the downmix scheme. Each downmix-compatible base channel signal consists of a base channel downmix component and a sound object downmix component, and the base channel downmix component in turn consists of an extension base channel downmix component and a compatible base channel downmix component. The downmix module may perform the downmix using panning (PAN), wave field synthesis (WFS), Ambisonics, or any downmix system with similar functions.
The base channel division module divides the base channels into compatible base channels and extension base channels. The division scheme is determined by the channel configuration of the base channels, the multi-channel system to be supported and the downmix mapping function; two examples are division by corresponding downmix channel and division based on QR decomposition.
The base channel division scheme obtained by division by corresponding downmix channel comprises the following steps:
S11: Let Sbedt = Sbed, where Sbed is the set of the Nb base channel signals; let Ssrt = Ssr, where Ssr is the set of the Ns downmix-compatible base channel signals; and let Sc, the set of selected compatible base channels, be empty. fb(k, i) is the downmix mapping function.
S12: Traverse Ssrt and find a downmix-compatible base channel k that satisfies the selection condition [equation not reproduced in this text] and for which fb(k, n) = 0 for every channel n in Sc. If no such channel exists, go to step S15.
S13: For the channel k found in step S12, traverse Sbedt and find a base channel m for which fb(k, m) ≠ 0 and fb(k, m) is invertible. If no such channel exists, go to step S15.
S14: Add the base channel m found in step S13 to Sc, remove the downmix-compatible base channel k from Ssrt, and remove from Sbedt every base channel i with fb(k, i) ≠ 0. If neither the new Ssrt nor the new Sbedt is empty, return to step S12; otherwise go to step S15.
S15: Use Sc, or a subset of Sc, as the compatible base channel set of the base channel set Sbed.
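Steps S11 to S15 can be sketched as a greedy loop. This is a minimal sketch, assuming each fb(k, i) is a pure gain A[k][i] (so "invertible" simply means non-zero). The selection condition missing from step S12 is replaced by a hypothetical stand-in: channel k must still receive at least one channel from Sbedt. Indices are 0-based and the function name is illustrative.

```python
def divide_by_downmix_channel(A, tol=1e-12):
    """Greedy compatible-channel selection (steps S11 to S15), pure-gain fb assumed.

    A[k][i] is the gain with which base channel i is mixed into downmix channel k.
    Returns Sc, the selected compatible base channel indices (0-based).
    """
    ns, nb = len(A), len(A[0])
    ssrt = list(range(ns))     # S11: Ssrt = Ssr
    sbedt = list(range(nb))    # S11: Sbedt = Sbed
    sc = []                    # S11: Sc starts empty
    while ssrt and sbedt:
        # S12: hypothetical stand-in condition (k still receives a channel from
        # Sbedt) plus the stated one (fb(k, n) = 0 for every n already in Sc).
        k = next((c for c in ssrt
                  if any(abs(A[c][i]) > tol for i in sbedt)
                  and all(abs(A[c][n]) <= tol for n in sc)), None)
        if k is None:
            break                                    # go to S15
        # S13: base channel m in Sbedt with an invertible (non-zero) gain into k
        m = next((i for i in sbedt if abs(A[k][i]) > tol), None)
        if m is None:
            break                                    # go to S15
        # S14: record m, retire k, drop every base channel that feeds k
        sc.append(m)
        ssrt.remove(k)
        sbedt = [i for i in sbedt if abs(A[k][i]) <= tol]
    return sc                                        # S15
```

For a 5.1.4 bed folded into 5.1 (each top channel added to its nearest middle-layer channel), the loop picks the six middle-layer channels and leaves the four top channels as extension base channels.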
The base channel division scheme obtained by division based on QR decomposition comprises the following steps:
S21: Let Sbedc = Sbed, where Sbed is the set of base channel signals.
S22: Express the downmix of Sbedc in matrix form, Hs_bedcop = HAc × Hbc, where Hs_bedcop is the matrix of downmix components produced by downmixing Sbedc, Hbc is the matrix whose rows are the base channel signals in Sbedc, and HAc is the matrix of downmix coefficients of Sbedc.
S23: Perform a QR decomposition HAc = Q × HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix.
S24: Let M = min(Ns, Nbc), where Ns is the number of base channel downmix channels and Nbc is the number of channels in Sbedc. If r(n, n) > 0 for every n = 1…M, where r(n, n) is the nth diagonal element of HR, go to step S25. Otherwise, remove from Sbedc every nth channel (n = 1…M) with r(n, n) = 0 to form a new set Sbedc', let Sbedc = Sbedc', and return to step S22.
S25: Retain the channels n = 1…M of Sbedc; this set, or a subset of it, serves as the compatible base channel set of the base channel set Sbed.
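The loop S21 to S25 can be sketched in pure Python. As an assumption, the diagonal of HR is obtained here by modified Gram-Schmidt, whose r(n, n) is non-negative and equals the distance of column n from the span of the earlier columns, so r(n, n) = 0 flags a linearly dependent channel. HAc is passed as the full Ns × Nb coefficient matrix (a list of rows) and restricted to the columns still in Sbedc; indices are 0-based and the function names are illustrative.

```python
import math

def gram_schmidt_diag(A, cols, tol):
    """|r(n, n)| of a QR factorization of the columns `cols` of A (modified Gram-Schmidt)."""
    rows = len(A)
    basis, diag = [], []
    for c in cols:
        v = [A[r][c] for r in range(rows)]
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        diag.append(norm)
        if norm > tol:  # dependent columns add nothing to the basis
            basis.append([vi / norm for vi in v])
    return diag

def divide_by_qr(HAc, tol=1e-9):
    """Steps S21 to S25: return 0-based indices of the retained compatible channels."""
    ns = len(HAc)
    sbedc = list(range(len(HAc[0])))                  # S21: Sbedc = Sbed
    while sbedc:
        m = min(ns, len(sbedc))                        # S24: M = min(Ns, Nbc)
        diag = gram_schmidt_diag(HAc, sbedc, tol)[:m]  # S22-S23 on current Sbedc
        zero = {sbedc[n] for n in range(m) if diag[n] <= tol}
        if not zero:
            return sbedc[:m]                           # S25
        sbedc = [c for c in sbedc if c not in zero]    # S24: drop, back to S22
    return []
```

For example, if two base channels feed the same downmix channel with identical gains, the second one gets r(n, n) = 0 and is removed before the decomposition is repeated.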
The compatible coding module and the extension coding module may use the same coding format or different coding formats.
In one arrangement, the compatible coding module is an audio coding module that receives the downmix-compatible base channels and outputs audio coded data, and the packing module is a TS/PS packing module that packs the audio coded data and the extension coded data separately and outputs an audio stream and a private stream conforming to the TS/PS (MPEG-2 transport stream / program stream) standards.
The three-dimensional sound coding method using the above encoder comprises: downmixing the base channels and/or sound objects into downmix-compatible base channels according to a downmix scheme, dividing the base channels into extension base channels and compatible base channels, and determining the base channel division side information; coding the sound objects, the downmix scheme, the extension base channels and the base channel division side information to obtain extension coded data; and coding the downmix-compatible base channels to generate downmix-compatible base channel coded data, then packing the downmix-compatible base channel coded data and the extension coded data either separately or together and outputting the result.
A three-dimensional sound decoder, compatible with audio and video systems, decodes the streams generated by the above encoder and comprises: a stream separation module, which receives the separately packed downmix-compatible base channel data stream and extension coded data stream and/or the jointly packed three-dimensional sound data stream, and separates and outputs the downmix-compatible base channel data and the extension coded data; a compatible decoding module, which receives the downmix-compatible base channel data and outputs the decoded downmix-compatible base channels; an extension decoding module, which receives the extension coded data and outputs the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects; a de-downmix module, which receives the decoded downmix-compatible base channels, the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects, and outputs the compatible base channels; a base channel combination module, which receives the compatible base channels, the decoded extension base channels and the decoded base channel division side information and outputs the base channels; and a rendering module, which receives the base channels and the decoded sound objects and outputs a three-dimensional sound multi-channel PCM stream.
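The base channel combination step amounts to scattering the decoded channels back to their original positions using the division side information. A minimal sketch, assuming β(m) and bctob(n) are transmitted as 1-based position lists as in Example 1; the function name and the use of strings as stand-in signals are illustrative only.

```python
def combine_base_channels(compatible, extension, beta, bctob, nb):
    """Place decoded channels back at their base channel positions.

    beta[m]:  1-based position of the m-th extension base channel (Sbede).
    bctob[n]: 1-based position of the n-th compatible base channel (Sbedc).
    """
    base = [None] * nb
    for m, signal in enumerate(extension):
        base[beta[m] - 1] = signal
    for n, signal in enumerate(compatible):
        base[bctob[n] - 1] = signal
    if any(signal is None for signal in base):
        raise ValueError("side information does not cover every base channel")
    return base
```

The explicit check catches side information that leaves a base channel unassigned, which would otherwise surface later as a missing channel in the renderer.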
To better support audio and video streams, the invention also provides a three-dimensional sound decoder for streams conforming to the TS/PS standards, comprising: a TS/PS stream unpacking module, which receives the TS/PS stream, parses out the audio stream and the private stream, and outputs audio coded data from the audio stream and extension coded data from the private stream; an audio decoding module, which receives the audio coded data and outputs the decoded downmix-compatible base channels; an extension decoding module, which receives the extension coded data and outputs the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects; a de-downmix module, which receives the decoded downmix-compatible base channels, the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects, and outputs the compatible base channels; a base channel combination module, which receives the compatible base channels, the decoded extension base channels and the decoded base channel division side information and outputs the base channels; and a rendering module, which receives the base channels and the decoded sound objects and outputs a three-dimensional sound multi-channel PCM stream.
The de-downmix module removes the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix-compatible base channel signals according to the decoded downmix scheme, obtaining the decoded compatible base channel signals.
Specifically, the de-downmix module performs the following steps:
1) Calculate the compatible base channel downmix components: according to the decoded downmix scheme, remove the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix-compatible base channel signals, obtaining the decoded compatible base channel downmix components.
2) Inverse mapping: inversely map the decoded compatible base channel downmix components to obtain the decoded compatible base channel signals.
For a base channel division scheme obtained by division by corresponding downmix channel, the de-downmix module performs the following steps:
1) Calculate the compatible base channel downmix components: for each compatible base channel n = 1…Nbc, let k = compot(n) be its corresponding downmix-compatible base channel. Remove the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix-compatible base channel signal k, obtaining the compatible base channel downmix component of channel k.
2) Inverse mapping: for each compatible base channel n = 1…Nbc, with k = compot(n) as above, apply the inverse function fb⁻¹(k, bctob(n)) of the mapping function fb(k, bctob(n)) in the decoded downmix scheme to that compatible base channel downmix component, obtaining the decoded compatible base channel n.
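Under the simple gain-and-delay mapping fb(k, i)(x(t)) = a(k, i) · x(t − Δ(k, i)) mentioned in Example 1, the two steps reduce to a subtraction followed by a time advance and a division. The sketch below assumes an integer sample delay and a non-zero gain; `de_downmix_channel` and its arguments are hypothetical names, and the last d samples of a frame are left at zero because they depend on the next frame.

```python
def de_downmix_channel(mix_k, known_components, a, d):
    """Recover compatible base channel n from downmix channel k = compot(n).

    mix_k: decoded downmix-compatible base channel k (one frame of samples).
    known_components: decoded downmix components in channel k of the extension
        base channels and/or sound objects (each a list of samples).
    a, d: gain and integer delay of fb(k, bctob(n)); a must be non-zero.
    """
    n = len(mix_k)
    # Step 1: compatible base channel downmix component = mix minus known parts
    comp = list(mix_k)
    for part in known_components:
        comp = [c - p for c, p in zip(comp, part)]
    # Step 2: inverse mapping x(t) = comp(t + d) / a; the final d samples
    # need the next frame and are left at zero in this single-frame sketch.
    return [comp[t + d] / a if t + d < n else 0.0 for t in range(n)]
```

A real decoder would carry the d look-ahead samples across frame boundaries instead of zeroing them.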
For a base channel division scheme obtained by division based on QR decomposition, the de-downmix module performs the following steps:
1) Calculate the compatible base channel downmix components: for each downmix-compatible base channel k, remove the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix-compatible base channel signal k, obtaining the compatible base channel downmix component of channel k. All of these components form the matrix Hs_bedcop.
2) Inverse mapping: using invHRQ, the decoded inverse of the compatible base channel mapping-coefficient matrix, inversely map Hs_bedcop to obtain Hbc:
Hbc = invHRQ * Hs_bedcop
Row n of Hbc is the nth decoded compatible base channel signal.
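A pure-Python sketch of the matrix form follows. The text does not say how invHRQ is built, so this example assumes it is the left inverse (HAcᵀHAc)⁻¹HAcᵀ of HAc = Q × HR, which reduces to HAc⁻¹ when HAc is square and invertible; the helper names are illustrative.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inverse_map(HAc, Hs_bedcop):
    """Hbc = invHRQ * Hs_bedcop, taking invHRQ as the left inverse of HAc (assumption)."""
    At = transpose(HAc)
    inv_hrq = matmul(inverse(matmul(At, HAc)), At)
    return matmul(inv_hrq, Hs_bedcop)
```

Because step S24 keeps only channels with r(n, n) > 0, HAc has full column rank on the retained channels, so HAcᵀHAc is invertible in this construction.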
The method by which the three-dimensional sound decoder decodes the streams output by the three-dimensional sound encoder comprises: acquiring the separately packed downmix-compatible base channel data stream and extension coded data stream and/or the jointly packed three-dimensional sound data stream, and separating out the downmix-compatible base channel data and the extension coded data; decoding the downmix-compatible base channel data to obtain the decoded downmix-compatible base channels; decoding the extension coded data to output the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects, which are used together with the decoded downmix-compatible base channels to perform the de-downmix operation and output the compatible base channels; combining the compatible base channels and the decoded extension base channels according to the decoded base channel division side information to generate the base channels; and rendering the base channels and the decoded sound objects to generate a three-dimensional sound multi-channel PCM stream.
The three-dimensional sound decoding method for streams conforming to the TS/PS standards comprises: acquiring the separately packed audio stream and private stream, and outputting the audio coded data and the extension coded data; decoding the audio coded data to obtain the decoded downmix-compatible base channels; decoding the extension coded data to output the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects, which are used together with the decoded downmix-compatible base channels to perform the de-downmix operation and output the compatible base channels; combining the compatible base channels and the decoded extension base channels according to the decoded base channel division side information to generate the base channels; and rendering the base channels and the decoded sound objects to generate a three-dimensional sound multi-channel PCM stream.
Beneficial effects: for three-dimensional sound applications such as network and television services, the invention provides an encoder for three-dimensional sound signals that is compatible with existing audio and video systems, and a corresponding method. When the audio processor (hardware or software) supports only surround sound or stereo formats, only the downmix-compatible base channel data needs to be sent to it, giving the same listening experience as existing stereo, 5.1 or 7.1 surround sound without losing basic two-dimensional sound information; when a three-dimensional sound processor (hardware or software) is available, all data is sent, enabling three-dimensional sound decoding and playback. For applications focused on three-dimensional sound quality, the invention also provides an improved three-dimensional sound coding method that raises coding quality and gives better playback.
Drawings
FIG. 1 is a block diagram of the three-dimensional sound coding method of embodiment 1;
FIG. 2 is a block diagram of the three-dimensional sound coding method of embodiment 2;
FIG. 3 is a schematic diagram of a first downmix operation;
FIG. 4 is a schematic diagram of a second downmix operation;
FIG. 5 is a block diagram of the three-dimensional sound coding method of embodiment 3;
FIG. 6 is a block diagram of the three-dimensional sound coding method of embodiment 4;
FIG. 7 is a schematic diagram of the playback of a digital movie produced with the three-dimensional sound coding method (coded streams output separately);
FIG. 8 is a schematic diagram of the playback of a digital movie produced with the three-dimensional sound coding method (coded streams output jointly);
FIG. 9 is a block diagram of the three-dimensional sound decoding method;
FIG. 10 is a diagram of the operating method of the de-downmix module;
FIG. 11 is a block diagram of the method of embodiment 6.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the embodiments.
Example 1:
The three-dimensional sound signal consists of a multi-channel signal (i.e. the base channels) and/or sound object signals (comprising object rendering description information and object audio data). To remain backward compatible with multi-channel (stereo or surround sound) programs and systems, the three-dimensional sound coding method of the invention downmixes the three-dimensional sound signal into downmix-compatible base channel data according to a downmix scheme, and codes the downmix scheme, the extension base channels, the base channel division side information and the sound objects to obtain extension coded data.
The base channels may be a multi-channel signal such as stereo, 5.1 or 7.1, or a multi-layer multi-channel three-dimensional sound signal such as 9.1, 11.1, 13.1 or 22.2. Nb is the number of base channels; when Nb = 0, the three-dimensional sound signal contains no multi-channel signal, only sound object signals. All base channel signals form the set Sbed. Each sound object signal obj_signal[j] comprises object rendering description information obj_info[j] and an object audio signal, j = 1…M, where M is the number of sound objects; when M = 0, the three-dimensional sound signal contains no sound objects. A sound object signal may be mono, stereo or multi-channel. The downmix-compatible base channel signals number Ns, the channel count of the stereo or surround sound system to be supported, and together form the set Ssr. The data of each channel or sound object is a time-domain signal, i.e. PCM (pulse-code modulation) samples; when the signal is divided into frames, it denotes the time-domain signal within one frame.
When the three-dimensional sound signal is downmixed into downmix-compatible base channel signals according to a given downmix scheme, each downmix-compatible base channel signal consists of a base channel downmix component and a sound object downmix component.
The downmix scheme can be expressed as a set of mapping functions fo(k, j) and fb(k, i). For the downmix-compatible base channels k = 1…Ns, the base channel downmix components form one set and the sound object downmix components form another. The base channel downmix component is generated from the base channel signals: for channel k it is the sum, over i = 1…Nb, of fb(k, i) applied to the ith base channel signal, where fb(k, i) is the downmix mapping function used when the ith base channel signal is downmixed into the kth downmix-compatible base channel. The sound object downmix component is generated by downmix-rendering each sound object signal to the downmix-compatible base channel layout to be supported: for channel k it is the sum, over j = 1…M, of fo(k, j) applied to the jth object signal, where fo(k, j) is the downmix mapping function used when the jth object is downmixed into the kth downmix-compatible base channel and depends on information such as the object's position coordinates.
The functions fo(k, j) and fb(k, i) may be simple gain and delay operations, e.g. fb(k, i)(x(t)) = a(k, i) · x(t − Δ(k, i)), or more complex mappings such as WFS or HOA driving functions. WFS (wave field synthesis) is a sound rendering method that solves the wave equation via the Kirchhoff-Helmholtz integral and reconstructs the original sound field with a loudspeaker array. HOA (higher-order Ambisonics) is also a sound rendering method; it solves the wave equation using a spherical harmonic expansion up to a given order and reconstructs the original sound field with a set of loudspeakers. For details of WFS and HOA see S. Spors and J. Ahrens, "Comparison of Higher Order Ambisonics and Wave Field Synthesis with Respect to Spatial Discretization Artifacts in Time Domain", 19th International Congress on Acoustics, 2-7 Sept. 2007.
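The gain-and-delay form of the mapping functions makes the downmix easy to sketch. In this hypothetical example, both fb and fo are restricted to a gain and an integer sample delay stored in dictionaries keyed by (k, i) and (k, j), and each sound object is a mono signal; a real fo would be derived from the object's position, which is not modelled here.

```python
def gain_delay(x, a, d):
    """fb(k, i)(x(t)) = a * x(t - d) with integer delay d >= 0 and zero history."""
    return [a * x[t - d] if t >= d else 0.0 for t in range(len(x))]

def downmix(bed, objects, fb, fo, ns):
    """Ns downmix-compatible channels = base channel components + object components.

    bed: Nb base channel signals; objects: M mono object signals (assumption).
    fb[(k, i)], fo[(k, j)]: (gain, delay) pairs; missing keys mean no contribution.
    """
    length = len((bed or objects)[0])
    out = []
    for k in range(ns):
        y = [0.0] * length
        for i, x in enumerate(bed):          # base channel downmix component
            if (k, i) in fb:
                y = [p + q for p, q in zip(y, gain_delay(x, *fb[(k, i)]))]
        for j, s in enumerate(objects):      # sound object downmix component
            if (k, j) in fo:
                y = [p + q for p, q in zip(y, gain_delay(s, *fo[(k, j)]))]
        out.append(y)
    return out
```

Keeping fb and fo as sparse dictionaries mirrors the fact that most base channels and objects contribute to only a few downmix channels.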
The base channel signal set Sbed can be divided by channel into two sets, Sbede with Nbe channels and Sbedc with Nbc channels, satisfying:
Nbe + Nbc = Nb
Sbede ∪ Sbedc = Sbed
Because the channel counts add up to Nb and the union is Sbed, the two sets are disjoint.
Correspondingly, the base channel downmix component of each downmix-compatible base channel k can be split into the sum of an extension base channel downmix component, obtained by applying fb(k, β(m)) to each channel of Sbede (m = 1…Nbe) and summing, and a compatible base channel downmix component, obtained by applying fb(k, bctob(n)) to each channel of Sbedc (n = 1…Nbc) and summing.
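With pure-gain mapping functions the split can be checked numerically. This hypothetical sketch uses 0-based index lists beta and bctob (the text numbers channels from 1) and verifies that the extension and compatible components add up to the full base channel downmix component of channel k.

```python
def split_bed_component(A, bed, beta, bctob, k):
    """Split the base channel downmix component of channel k (pure-gain fb assumed).

    A[k][i]: gain of base channel i in downmix channel k; bed: base channel signals.
    """
    n = len(bed[0])

    def part(indices):
        return [sum(A[k][i] * bed[i][t] for i in indices) for t in range(n)]

    ext = part(beta)                # extension base channel downmix component
    comp = part(bctob)              # compatible base channel downmix component
    total = part(range(len(bed)))   # full base channel downmix component
    return ext, comp, total
```

Since every base channel belongs to exactly one of the two sets, the two partial sums always reproduce the full component.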
Here β(m) (m = 1…Nbe) is the index, within the base channels, of the mth channel of Sbede, and bctob(n) (n = 1…Nbc) is the index, within the base channels, of the nth channel of Sbedc.
The compatible base channel downmix components of all downmix-compatible base channels, generated by downmixing the channels of Sbedc, form the set Ssrbed_cop. If all signals in Sbedc can be computed from Ssrbed_cop, the downmix mapping function fb(k, i) and the base channel division side information (β(m), bctob(n), etc.), then Sbedc is called a compatible base channel set and its channel signals are called compatible base channels; Sbede is called the extension base channel set corresponding to Sbedc and its channel signals are called extension base channels. The downmix components generated from Sbede are called extension base channel downmix components; Ssrbed_cop is called the set of compatible base channel downmix components, and its elements are called compatible base channel downmix components.
A base channel set Sbed can be divided into a compatible base channel set Sbedc and an extension base channel set Sbede in various ways and according to various criteria. Trivially, the division Sbede = Sbed, Sbedc = ∅ satisfies the above definition. In addition, if Sbedc1 is a compatible base channel set of Sbed, then any subset Sbedct of Sbedc1 is also a compatible base channel set of Sbed.
For example, suppose the three-dimensional sound signal consists of a 5.1.4 two-layer multichannel system (5 channels in the middle layer, 1 subwoofer channel and 4 top channels) plus 20 sound objects, and it must remain compatible with a 5.1 surround sound system. The 5.1 channels are then processed independently as the compatible base channel data and transmitted over the surround sound channel. The 4 top channels are treated as extension base channels, processed together with the 20 sound objects, and packaged for transmission over other channels.
As shown in fig. 1, the three-dimensional sound encoder provided by the present invention includes a downmix and base channel dividing module, a compatible coding module, an extension coding module, and a packing module.
Step 1.1) Downmix and base channel dividing module
The three-dimensional sound program is downmixed into a compatible stereo/multichannel signal, yielding the downmix compatible base channel signals. The downmix scheme is either input externally or selected adaptively by the system, and it may be expressed, for example, as a set of mapping functions fb(k, i), fo(k, j). The base channels are then divided into two parts, the compatible base channels and the extension base channels. The base channel division side information (e.g., betob(m), bctob(n)) is determined, and the downmix scheme used is output.
"External input" generally refers to a downmix scheme selected manually by a sound engineer while downmixing the three-dimensional sound program. This lets the engineer compare candidate schemes through repeated monitoring and choose among them. "System adaptive" refers to the coding system selecting the downmix scheme intelligently.

For the downmix of base channels, the system can adapt to the relationship between the loudspeaker layouts of the base channel system and the downmix compatible multichannel system. It downmixes the channel layers that carry height information in a multi-layer multichannel system into the surround channels of the middle layer according to their positions. For example, it can mix the top front left channel directly into the left channel and the top front right channel directly into the right channel. This forms a downmix scheme that can be expressed as a set of mapping functions fb(k, i).

For the downmix of sound objects, a downmix scheme (expressible as a set of mapping functions fo(k, j)) can be formed adaptively. It is based on the object rendering description information (object position coordinates, etc.) and on the rendering method (WFS, HOA, PAN, etc.).
Step 1.1 may be omitted in two cases: if the three-dimensional sound program contains no base channels, or if there is a simple, explicit one-to-one correspondence between the compatible base channels, the extension base channels and the base channels.
Step 1.2) Extension coding module
The extension base channels, the base channel division side information, the downmix scheme and the sound objects are extension-coded to obtain the extension coded data. If the three-dimensional program has no base channels, the parts that encode the extension base channels and the base channel division side information may be omitted. If it has no sound objects, the part that encodes the sound objects may be omitted. If the encoder and decoder both follow the same predetermined base channel division, the base channel division side information need not be encoded. Likewise, if both follow the same predetermined downmix scheme, the downmix scheme need not be encoded.
The extension base channels and the sound objects may be encoded with uncompressed or compressed coding, using either vector or scalar coding. Examples include Dolby AC3, MPEG-1 Layer 3, MPEG-2/4 AAC, MPEG-H, Dolby Atmos and AVS three-dimensional audio coding. Encoding a sound object includes encoding its object rendering description information obj_info[j] and its object signal obj_data[j]. When the downmix scheme is encoded, the parameters of the downmix mapping functions fb(k, i), fo(k, j), such as a(k, i) and Δ(k, i), may be coded lossily or losslessly, and a(k, i) and Δ(k, i) may be vector-coded. The base channel division side information (e.g., betob(m) (m = 1…Nbe) and bctob(n) (n = 1…Nbc)) may likewise be coded lossily or losslessly, and vector coding may also be used.
Step 1.3) Compatible coding module
The downmix compatible base channels are encoded to obtain the downmix compatible base channel coded data.
The downmix compatible base channels may be encoded with uncompressed or compressed coding, using vector or scalar coding, in formats such as Dolby AC3, MPEG-1 Layer 3, MPEG-2/4 AAC or AVS. To meet the compatibility requirement, the coding method must be one supported by the multichannel system that the bitstream is to be compatible with.
Note that the compatible coding module and the extension coding module may use the same coding format or different ones, as long as the corresponding decoders are selected at decoding time.
Step 1.4) Packing module
The packing module supports two packing modes.
In the first mode, a downmix compatible base channel packing module packs the downmix compatible base channel data, and a separate extension coded data packing module packs the extension coded data. The packed data can be transmitted over channels such as IP networks, broadcast television networks, mobile networks and USB. Packing may use any protocol suitable for broadcast, IP or mobile networks, such as MPEG TS, MPEG DASH or HLS (HTTP Live Streaming).
In the second mode, a hybrid packing module packs all data into a single bitstream, called the three-dimensional sound data bitstream. This bitstream can likewise be transmitted over IP networks, broadcast television networks, mobile networks, USB and similar channels, using any suitable packing protocol such as MPEG TS, MPEG DASH or HLS (HTTP Live Streaming).
Example 2:
As shown in fig. 2, when the downmix scheme is selected adaptively by the system, the three-dimensional sound encoder provided by the present invention includes a downmix module, a base channel dividing module, a compatible coding module, an extension coding module, and a packing module. It differs from Example 1 in that the downmix and base channel dividing module is split into a separate downmix module and base channel dividing module.
Step 2.1) Downmix module
The three-dimensional sound program is downmixed into a compatible stereo/multichannel signal according to the downmix scheme selected adaptively by the system. This yields the downmix compatible base channel signals, and the downmix scheme used is output.
As described previously, each downmix compatible base channel signal comprises a base channel downmix component and a sound object downmix component:
$$s_k(t) = s_{bed,k}(t) + s_{obj,k}(t)$$
The base channel downmix component is generated from the signals of all base channels. For each downmix compatible base channel k it can be calculated as:
$$s_{bed,k}(t) = \sum_{i=1}^{N_b} fb(k, i)\big(b_i(t)\big)$$
where fb(k, i) is the downmix mapping function used when the ith base channel signal is downmixed into the kth channel of the downmix multichannel signal.
The sound object downmix component is generated by downmix-rendering each sound object signal for the multichannel system to be compatible with. For each downmix compatible base channel k it can be calculated as:
$$s_{obj,k}(t) = \sum_{j=1}^{N_{obj}} fo(k, j)\big(o_j(t)\big)$$
where $o_j(t)$ is the signal obj_data[j] of the jth of the $N_{obj}$ sound objects, and fo(k, j) is the downmix mapping function used when the jth object is downmixed into the kth channel of the downmix multichannel signal. A schematic diagram of the downmix operation in this case is shown in fig. 3.
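The following sketch shows the downmix described above. Each downmix compatible base channel is the sum of a base channel downmix component and a sound object downmix component. It rests on two assumptions that are not in the source: every fb(k, i) and fo(k, j) is a pure gain, and fo(k, j) comes from a constant-power stereo panning law with loudspeakers at ±30 degrees, one possible reading of the "PAN" rendering method. The names `pan_gains` and `downmix` are hypothetical, and indices are 0-based.

```python
import math
from typing import List, Sequence

Signal = List[float]


def pan_gains(azimuth_deg: float) -> List[float]:
    """Hypothetical fo(k, j) gains for one object on a 2-channel (L, R) target.

    Constant-power panning between loudspeakers assumed at -30 (left) and
    +30 (right) degrees; azimuths outside that range are clamped.
    """
    a = max(-30.0, min(30.0, azimuth_deg))
    theta = (a + 30.0) / 60.0 * (math.pi / 2)  # -30 deg -> 0, +30 deg -> pi/2
    return [math.cos(theta), math.sin(theta)]  # [gain to L, gain to R]


def downmix(beds: Sequence[Signal], fb: Sequence[Sequence[float]],
            objs: Sequence[Signal], fo: Sequence[Sequence[float]]) -> List[Signal]:
    """s_k(t) = sum_i fb[k][i]*b_i(t) + sum_j fo[k][j]*o_j(t) with gain-only mappings.

    fb is an Ns x Nb gain matrix and fo an Ns x Nobj gain matrix (0-based indices).
    """
    length = len(beds[0]) if beds else len(objs[0])
    out = []
    for k in range(len(fb)):
        s_k = [0.0] * length
        for i, b in enumerate(beds):  # base channel downmix component s_bed,k
            for t in range(length):
                s_k[t] += fb[k][i] * b[t]
        for j, o in enumerate(objs):  # sound object downmix component s_obj,k
            for t in range(length):
                s_k[t] += fo[k][j] * o[t]
        out.append(s_k)
    return out


# Two base channels (L, R) passed straight through, plus one object at the centre.
g = pan_gains(0.0)  # both gains equal cos(pi/4)
s = downmix([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]],
            [[2.0, 2.0]], [[g[0]], [g[1]]])
```

A WFS or HOA renderer would produce fo(k, j) differently. The summation structure stays the same.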
The base channel downmix component $s_{bed,k}(t)$ can further be divided into the extension base channel downmix component $s_{bede,k}(t)$ and the compatible base channel downmix component $s_{bedcop,k}(t)$:
$$s_{bed,k}(t) = s_{bede,k}(t) + s_{bedcop,k}(t)$$
These are calculated, respectively, as:
$$s_{bede,k}(t) = \sum_{m=1}^{N_{be}} fb(k, betob(m))\big(b_{betob(m)}(t)\big)$$
$$s_{bedcop,k}(t) = \sum_{n=1}^{N_{bc}} fb(k, bctob(n))\big(b_{bctob(n)}(t)\big)$$
A schematic diagram of the downmix operation in this case is shown in fig. 4.
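A short sketch of the split above, under the assumption of gain-only mappings fb(k, i)(x(t)) = a(k, i)*x(t). The component from all base channels equals the component from the extension channels plus the component from the compatible channels. Indices are 0-based, and `partial_downmix` and `split_bed_component` are hypothetical names.

```python
from typing import Iterable, List, Sequence

Signal = List[float]


def partial_downmix(beds: Sequence[Signal], fb: Sequence[Sequence[float]],
                    k: int, channels: Iterable[int]) -> Signal:
    """Sum of fb[k][i] * b_i(t) over the base channel indices in `channels`."""
    idx = list(channels)
    return [sum(fb[k][i] * beds[i][t] for i in idx) for t in range(len(beds[0]))]


def split_bed_component(beds, fb, k, betob, bctob):
    """Return (s_bed,k, s_bede,k, s_bedcop,k) for downmix compatible channel k."""
    s_bed = partial_downmix(beds, fb, k, range(len(beds)))
    s_bede = partial_downmix(beds, fb, k, betob)    # extension base channels
    s_bedcop = partial_downmix(beds, fb, k, bctob)  # compatible base channels
    return s_bed, s_bede, s_bedcop


# Three base channels feeding one downmix channel: channel 0 is compatible,
# channels 1 and 2 are extension channels.
beds = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fb = [[1.0, 0.5, 0.5]]
s_bed, s_bede, s_bedcop = split_bed_component(beds, fb, 0, betob=[1, 2], bctob=[0])
```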
Step 2.2) Base channel dividing module
The base channels are divided into two parts, the compatible base channels and the extension base channels, according to the downmix scheme used by the downmix module (expressible, for example, as a set of mapping functions fb(k, i), fo(k, j)). The base channel division side information (e.g., betob(m), bctob(n)) is also determined.
Step 2.2 may be omitted in two cases: if the three-dimensional sound program contains no base channels, or if there is a simple, explicit one-to-one correspondence between the compatible base channels, the extension base channels and the base channels.
The base channel division method is described in detail below, using two cases as examples.
1) Division method based on corresponding downmix channels:
Suppose the channel configuration of the base channels of the three-dimensional sound program, the multichannel system to be compatible with, and the downmix mapping functions fb(k, i) are all determined. The base channels $S_{bed} = \{b_1(t), \ldots, b_{N_b}(t)\}$ can then be divided into two parts, $S_{bede} = \{b_{betob(1)}(t), \ldots, b_{betob(N_{be})}(t)\}$ and $S_{bedc} = \{b_{bctob(1)}(t), \ldots, b_{bctob(N_{bc})}(t)\}$, according to the following rule.
For each n = 1…Nbc there must exist k = compat(n) and n = invcompat(k), where compat(n) is the index of the downmix compatible base channel corresponding to the nth compatible base channel. This channel must satisfy fb(k, bctob(l)) = 0 for every l ≠ n (l = 1…Nbc), and fb(k, bctob(n)) must have an inverse function.
When the above rule is satisfied, the compatible base channel downmix component of channel k = compat(n) contains only the nth compatible base channel:
$$s_{bedcop,k}(t) = fb(k, bctob(n))\big(b_{bctob(n)}(t)\big)$$
so for each n, with k = compat(n), the decoded compatible base channel signal can be calculated as:
$$b_{bctob(n)}(t) = fb^{-1}(k, bctob(n))\big(s_{bedcop,k}(t)\big)$$
That is, the set Sbedc can be computed from the set Ssrbedcop together with the downmix mapping functions fb(k, i), the base channel division side information (betob(m), bctob(n)) and similar information. This satisfies the definition of a compatible base channel set, so Sbedc is a compatible base channel set of Sbed. The characteristic feature of this case is that compatible base channel n is obtained by inverse mapping the compatible base channel downmix component $s_{bedcop,k}(t)$ of its corresponding downmix compatible base channel k. The inverse mapping function is the inverse of the downmix mapping function.
In this case, the base channel division side information includes betob(m) (m = 1…Nbe), bctob(n) (n = 1…Nbc) and compat(n) (n = 1…Nbc). Here compat(n) is the index of the downmix compatible base channel corresponding to the nth compatible base channel, and fb(k, bctob(n)) has an inverse function fb^-1(k, bctob(n)). For example (with a(k, i) ≠ 0):
fb(k,i)(x(t)) = a(k,i)*x(t-Δ(k,i))
fb^-1(k,i)(x(t)) = x(t+Δ(k,i))/a(k,i)
In the special case
fb(k,i)(x(t)) = x(t)
fb^-1(k,i)(x(t)) = x(t)
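A minimal discrete-time sketch of the gain-delay mapping and its inverse given above, assuming an integer sample delay. The forward map lengthens the signal by the delay, so the inverse can drop the leading samples and divide by the gain to recover every input sample. The function names are hypothetical.

```python
from typing import List


def fb_map(x: List[float], a: float, delay: int) -> List[float]:
    """fb(k, i)(x(t)) = a * x(t - delay); the output is `delay` samples longer."""
    return [0.0] * delay + [a * v for v in x]


def fb_inverse(y: List[float], a: float, delay: int) -> List[float]:
    """fb^-1(k, i)(y(t)) = y(t + delay) / a; only defined when a != 0."""
    if a == 0:
        raise ZeroDivisionError("a(k, i) = 0, so fb(k, i) has no inverse")
    return [v / a for v in y[delay:]]


x = [1.0, 2.0, 3.0]
y = fb_map(x, 0.5, 2)         # [0.0, 0.0, 0.5, 1.0, 1.5]
print(fb_inverse(y, 0.5, 2))  # [1.0, 2.0, 3.0]
```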
For example, let the base channels be 5.1.4 (5.1 + 4H, i.e., 5.1 surround plus 4 top speakers) and the compatible multichannel system be 5.1.
The 5.1.4 channel order is assumed to be left (1), right (2), center (3), subwoofer (4), left surround (5), right surround (6), top front left (7), top front right (8), top back left (9) and top back right (10).
The 5.1 channel order is left (1), right (2), center (3), subwoofer (4), left surround (5) and right surround (6).
The downmix mixes the top front left (7) and top back left (9) channels directly into the left channel, and the top front right (8) and top back right (10) channels directly into the right channel. The channel mapping functions then simplify to:
betob(m)=m+6
bctob(n)=n
The correspondence compat(n) between compatible base channels and downmix compatible base channels simplifies to:
compat(n)=n
The downmix mapping function of the base channels simplifies to the following for every pair (k, i) that is mixed, with fb(k, i) = 0 otherwise:
fb(k,i)(x(t))=x(t)
The base channels can then be divided into two parts. The extension base channels are {top front left (7), top front right (8), top back left (9), top back right (10)}. The compatible base channels are {left (1), right (2), center (3), subwoofer (4), left surround (5), right surround (6)}.
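This sketch checks the 5.1.4 to 5.1 example numerically. It assumes the identity mappings are represented as a 6 x 10 gain matrix with 0-based indices, so betob(m) = m + 6 becomes indices 6 to 9. It also assumes the decoder knows the extension channels exactly (a lossless stand-in for the decoded extension channels). Removing the top channels' contribution from each 5.1 channel then returns the original middle-layer channels. All names are hypothetical.

```python
import random
from typing import List

# 0-based order of the 5.1.4 example: L, R, C, LFE, Ls, Rs, TFL, TFR, TBL, TBR
BETOB = [m + 6 for m in range(4)]  # extension channels TFL..TBR (betob(m) = m + 6, 0-based)
BCTOB = list(range(6))             # compatible channels L..Rs (bctob(n) = n)
COMPAT = list(range(6))            # compat(n) = n


def make_fb() -> List[List[float]]:
    """6 x 10 gain matrix: fb(k, i)(x) = x where channel i is mixed into k, 0 elsewhere."""
    fb = [[0.0] * 10 for _ in range(6)]
    for k in range(6):
        fb[k][k] = 1.0         # middle-layer channels pass straight through
    fb[0][6] = fb[0][8] = 1.0  # top front left, top back left -> left
    fb[1][7] = fb[1][9] = 1.0  # top front right, top back right -> right
    return fb


def downmix(fb, beds):
    return [[sum(fb[k][i] * beds[i][t] for i in range(len(beds)))
             for t in range(len(beds[0]))] for k in range(len(fb))]


def recover_compatible(fb, s, ext):
    """Subtract the extension channels' downmix, then apply fb^-1 (the identity here).

    `ext` maps each base channel index in BETOB to its (decoded) signal.
    """
    out = []
    for n in range(len(BCTOB)):
        k = COMPAT[n]
        s_bedcop = [s[k][t] - sum(fb[k][m] * ext[m][t] for m in BETOB)
                    for t in range(len(s[k]))]
        out.append(s_bedcop)  # fb(k, bctob(n)) is the identity, so no rescaling
    return out


rng = random.Random(0)
beds = [[float(rng.randint(-100, 100)) for _ in range(8)] for _ in range(10)]
fb = make_fb()
s = downmix(fb, beds)  # six 5.1 channels
rec = recover_compatible(fb, s, {m: beds[m] for m in BETOB})
assert rec == [beds[i] for i in BCTOB]
```

Integer-valued test signals keep the float arithmetic exact, so the recovered channels compare equal.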
For example, the base channel division may be performed as follows:
Step 2.2a.1): Let Sbedt = Sbed, Sc = ∅ [original formula image; it may also initialize Se], and Ssrt = Ssr.
Step 2.2a.2): Traverse the set Ssrt until either a downmix compatible base channel k is found such that fb(k, n) = 0 for every channel n in Sc, or the traversal ends.
If no such downmix compatible base channel k can be found, jump to step 2.2a.5.
Step 2.2a.3): For the downmix compatible base channel k found in step 2.2a.2, select a base channel m from Sbedt such that fb(k, m) ≠ 0, fb(k, m) is invertible, and m is not in Se. If no such base channel m can be found, jump to step 2.2a.5.
Step 2.2a.4): Remove from Sbedt every base channel i with fb(k, i) ≠ 0 to obtain the new Sbedt. Add the base channel m found in step 2.2a.3 to Sc to obtain the new Sc. Remove the downmix compatible base channel k from Ssrt to obtain the new Ssrt. If neither Ssrt nor Sbedt is empty, jump to step 2.2a.2; otherwise continue with step 2.2a.5.
Step 2.2a.5): Sc is the compatible base channel set.
2) Division method based on QR decomposition
If each downmix function can be expressed as fb(k, i)(x(t)) = a(k, i)*x(t), where a(k, i) is a real number, the downmix of the base channels can be written as a matrix operation:
$$s_{bed,k}(t) = \sum_{i=1}^{N_b} a(k, i)\, b_i(t), \quad k = 1, \ldots, N_s$$
The base channel downmix components form the downmix multichannel signal matrix Hs_bed:
$$H_{s\_bed} = \begin{bmatrix} s_{bed,1}(t) \\ \vdots \\ s_{bed,N_s}(t) \end{bmatrix}$$
The base channel signals form the base channel signal matrix Hb:
$$H_b = \begin{bmatrix} b_1(t) \\ \vdots \\ b_{N_b}(t) \end{bmatrix}$$
The coefficients a(k, i) form the Ns × Nb base channel downmix coefficient matrix HA:
$$HA = \begin{bmatrix} a(1,1) & \cdots & a(1,N_b) \\ \vdots & \ddots & \vdots \\ a(N_s,1) & \cdots & a(N_s,N_b) \end{bmatrix}$$
so that
Hs_bed = HA*Hb
The base channels can then be divided as follows:
Step 2.2b.1): Let Sbedc = Sbed.
Step 2.2b.2): Form the matrix Hbc from all channel signals in Sbedc:
$$H_{bc} = \begin{bmatrix} b_{bctob(1)}(t) \\ \vdots \\ b_{bctob(N_{bc})}(t) \end{bmatrix}$$
Their corresponding coefficients a(k, i) form the Ns × Nbc compatible base channel downmix coefficient matrix HAc:
$$HAc = \begin{bmatrix} a(1,bctob(1)) & \cdots & a(1,bctob(N_{bc})) \\ \vdots & \ddots & \vdots \\ a(N_s,bctob(1)) & \cdots & a(N_s,bctob(N_{bc})) \end{bmatrix}$$
Their downmix forms the compatible base channel downmix component set Ssrbedcop, whose elements form the matrix Hs_bedcop:
$$H_{s\_bedcop} = \begin{bmatrix} s_{bedcop,1}(t) \\ \vdots \\ s_{bedcop,N_s}(t) \end{bmatrix}$$
These matrices satisfy Hs_bedcop = HAc*Hbc.
Step 2.2b.3): Compute the QR decomposition HAc = Q*HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix (taking r(n, n) ≥ 0):
$$HR = \big[r(n, l)\big]_{N_s \times N_{bc}}, \qquad r(n, l) = 0 \text{ for } n > l$$
Step 2.2b.4): Let M = min(Ns, Nbc). If r(n, n) > 0 for every n = 1…M, go to step 2.2b.5. Otherwise, for each n = 1…M with r(n, n) = 0, remove the nth channel $b_{bctob(n)}(t)$ of Sbedc from Sbedc, forming a new set Sbedc'. Let Sbedc = Sbedc' and jump to step 2.2b.2.
Step 2.2b.5): Keep only the channels n = 1…M of Sbedc to form the new Sbedc, so that Nbc = M ≤ Ns. Any subset of Sbedc may also be used as the new Sbedc.
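Steps 2.2b.1 to 2.2b.5 can be sketched in plain Python as shown below. The sketch computes the diagonal r(n, n) with modified Gram-Schmidt, which gives a column that depends on earlier columns r(n, n) = 0 and skips it when orthonormalizing later columns. A Householder QR of a rank-deficient HAc can place its zeros differently, so the selected set may differ in edge cases. The absolute tolerance assumes downmix gains of order 1. The function names are hypothetical, and indices are 0-based.

```python
import math
from typing import List

Matrix = List[List[float]]


def qr_diagonal(cols: List[List[float]], tol: float = 1e-10) -> List[float]:
    """Diagonal r(n, n) >= 0 of the QR decomposition of the matrix with columns `cols`.

    Modified Gram-Schmidt; a column that is (numerically) a combination of earlier
    columns gets r(n, n) = 0 and contributes no new basis vector.
    """
    basis: List[List[float]] = []
    diag: List[float] = []
    for c in cols:
        v = list(c)
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:
            diag.append(norm)
            basis.append([vi / norm for vi in v])
        else:
            diag.append(0.0)
    return diag


def divide_by_qr(ha: Matrix, tol: float = 1e-10) -> List[int]:
    """Steps 2.2b.1-2.2b.5; returns the 0-based indices bctob of the compatible channels."""
    n_s = len(ha)
    sbedc = list(range(len(ha[0])))                            # 2.2b.1: Sbedc = Sbed
    while True:
        cols = [[ha[k][i] for k in range(n_s)] for i in sbedc]  # 2.2b.2: columns of HAc
        diag = qr_diagonal(cols, tol)                           # 2.2b.3
        m = min(n_s, len(sbedc))                                # 2.2b.4
        zero = {n for n in range(m) if diag[n] == 0.0}
        if not zero:
            return sbedc[:m]                                    # 2.2b.5
        sbedc = [ch for n, ch in enumerate(sbedc) if n not in zero]
```

Each pass removes at least one channel, so the loop terminates. An empty Sbedc returns an empty list.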
Once the above steps have produced Sbedc, all channel signals $b_{bctob(n)}(t)$ (n = 1…Nbc) in Sbedc form the matrix
$$H_{bc} = \begin{bmatrix} b_{bctob(1)}(t) \\ \vdots \\ b_{bctob(N_{bc})}(t) \end{bmatrix}$$
Their corresponding coefficients a(k, bctob(n)) form the Ns × Nbc downmix coefficient matrix HAc. Their downmix forms the compatible base channel downmix component set Ssrbedcop, whose elements form the matrix
$$H_{s\_bedcop} = \begin{bmatrix} s_{bedcop,1}(t) \\ \vdots \\ s_{bedcop,N_s}(t) \end{bmatrix}$$
These matrices satisfy Hs_bedcop = HAc*Hbc. The QR decomposition of HAc is HAc = Q*HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix:
$$HR = \big[r(n, l)\big]_{N_s \times N_{bc}}, \qquad r(n, l) = 0 \text{ for } n > l$$
Here M = Nbc ≤ Ns, and r(n, n) > 0 for each n = 1…M.
It can be shown that the Sbedc obtained by the above steps is a compatible base channel set of Sbed:
Hs_bedcop = HAc*Hbc = Q*HR*Hbc
Since Q is an Ns × Ns unitary matrix, it has an inverse Q^-1 = Q^H, so
HR*Hbc = Q^-1*Hs_bedcop = Q^H*Hs_bedcop
Since M = min(Ns, Nbc) = Nbc and r(n, n) > 0 for each n = 1…M, the first Nbc rows of HR form a square matrix HRm. The remaining Ns - Nbc rows of HR are zero.
$$HRm = \big[r(n, l)\big]_{N_{bc} \times N_{bc}}, \qquad r(n, l) = 0 \text{ for } n > l$$
The square matrix HRm is upper triangular with r(n, n) > 0 for each n = 1…M, so HRm has an inverse HRm^-1. Taking the first Nbc rows of Q^-1 to form the matrix invQm gives
HRm*Hbc = invQm*Hs_bedcop
Therefore:
Hbc = HRm^-1*HRm*Hbc = HRm^-1*invQm*Hs_bedcop
Letting invHRQ = HRm^-1*invQm gives
Hbc = invHRQ*Hs_bedcop
That is, all channel signals in Sbedc can be computed from the set Ssrbedcop together with fb(k, i), betob(m) and bctob(n). This satisfies the definition of a compatible base channel set, so Sbedc is a compatible base channel set of Sbed.

The characteristic feature of this case is that the matrix of compatible base channels is obtained by inverse mapping the matrix of all compatible base channel downmix components. This inverse mapping can be expressed by the matrix invHRQ. In this case, the base channel division side information includes betob(m) (m = 1…Nbe), bctob(n) (n = 1…Nbc) and invHRQ. The matrix invHRQ is called the compatible base channel mapping coefficient inverse matrix and can be calculated from information such as fb(k, i), betob(m) and bctob(n).
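The derivation above can be checked numerically with the sketch below. It assumes a real HAc with independent columns, so that Q^-1 = Q^T. With this assumption, the first Nbc columns of Q (a thin QR) supply invQm as their transpose, and HRm is the Nbc × Nbc triangular factor. invHRQ is obtained by back substitution, and applying it to Hs_bedcop = HAc*Hbc returns Hbc. All names are hypothetical, and plain lists are used instead of a linear algebra library.

```python
import math
from typing import List

Matrix = List[List[float]]


def thin_qr(a: Matrix, tol: float = 1e-12):
    """Thin QR of an Ns x Nbc matrix with independent columns (modified Gram-Schmidt).

    Returns (q, r): q is Ns x Nbc with orthonormal columns (the first Nbc columns
    of Q) and r is the Nbc x Nbc upper triangular HRm with r[n][n] > 0.
    """
    n_s, n_c = len(a), len(a[0])
    q = [[0.0] * n_c for _ in range(n_s)]
    r = [[0.0] * n_c for _ in range(n_c)]
    for n in range(n_c):
        v = [a[k][n] for k in range(n_s)]
        for p in range(n):
            r[p][n] = sum(q[k][p] * v[k] for k in range(n_s))
            v = [v[k] - r[p][n] * q[k][p] for k in range(n_s)]
        r[n][n] = math.sqrt(sum(x * x for x in v))
        if r[n][n] <= tol:
            raise ValueError("HAc has dependent columns; redo the base channel division")
        for k in range(n_s):
            q[k][n] = v[k] / r[n][n]
    return q, r


def inv_hrq(hac: Matrix) -> Matrix:
    """invHRQ = HRm^-1 * invQm, with invQm the first Nbc rows of Q^-1 = Q^T (real case)."""
    q, r = thin_qr(hac)
    n_s, n_c = len(hac), len(hac[0])
    inv_qm = [[q[k][n] for k in range(n_s)] for n in range(n_c)]  # Nbc x Ns
    x = [[0.0] * n_s for _ in range(n_c)]
    for col in range(n_s):  # solve HRm * x = invQm by back substitution
        for n in reversed(range(n_c)):
            acc = inv_qm[n][col] - sum(r[n][p] * x[p][col] for p in range(n + 1, n_c))
            x[n][col] = acc / r[n][n]
    return x


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


hac = [[1.0, 0.5], [0.0, 1.0], [1.0, 0.0]]   # Ns = 3, Nbc = 2
hbc = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # two compatible channels, 3 samples
hs_bedcop = matmul(hac, hbc)                 # encoder side: Hs_bedcop = HAc * Hbc
recovered = matmul(inv_hrq(hac), hs_bedcop)  # decoder side: Hbc = invHRQ * Hs_bedcop
```

invHRQ*HAc equals the identity here, so invHRQ acts as a left inverse of HAc.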
More generally, suppose fb(k, i)(x(t)) = a(k, i)*fb(k, 1)(x(t)) and fb(k, 1) has an inverse function fb^-1(k, 1). The base channels can still be divided with the QR-decomposition-based method above. In this case, after Hbc is obtained by the matrix operation, fb^-1(k, 1) must be applied to each decoded channel signal to obtain the final decoded compatible base channel signals.
Example 3
As shown in fig. 5, the three-dimensional sound encoder provided by the present invention includes a downmix module, a base channel dividing module, a compatible coding module, an extension coding module, and a packing module. It differs from Example 2 in that the downmix scheme is input externally.
Step 3.1) Base channel dividing module
The base channels are divided into two parts, the compatible base channels and the extension base channels, according to the externally input downmix scheme (expressible, for example, as a set of mapping functions fb(k, i), fo(k, j)). The base channel division side information (e.g., betob(m), bctob(n)) is also determined.
Step 3.1 may be omitted in two cases: if the three-dimensional sound program contains no base channels, or if there is a simple, explicit one-to-one correspondence between the compatible base channels, the extension base channels and the base channels.
Step 3.2) Downmix module
The three-dimensional sound program is downmixed into a compatible stereo/multichannel signal according to the externally input downmix scheme (expressible, for example, as a set of mapping functions fb(k, i), fo(k, j)), yielding the downmix compatible base channel signals.
As described previously, each downmix compatible base channel signal comprises a base channel downmix component and a sound object downmix component:
$$s_k(t) = s_{bed,k}(t) + s_{obj,k}(t)$$
The base channel downmix component is generated from the signals of all base channels. For each downmix compatible base channel k it can be calculated as:
$$s_{bed,k}(t) = \sum_{i=1}^{N_b} fb(k, i)\big(b_i(t)\big)$$
where fb(k, i) is the downmix mapping function used when the ith base channel signal is downmixed into the kth channel of the downmix multichannel signal.
The sound object downmix component is generated by rendering each sound object signal for the multichannel system to be compatible with. For each downmix compatible base channel k it can be calculated as:
$$s_{obj,k}(t) = \sum_{j=1}^{N_{obj}} fo(k, j)\big(o_j(t)\big)$$
where $o_j(t)$ is the signal obj_data[j] of the jth of the $N_{obj}$ sound objects, and fo(k, j) is the downmix mapping function used when the jth object is downmixed into the kth channel of the downmix multichannel signal.
The base channel downmix component $s_{bed,k}(t)$ can further be divided into the extension base channel downmix component $s_{bede,k}(t)$ and the compatible base channel downmix component $s_{bedcop,k}(t)$:
$$s_{bed,k}(t) = s_{bede,k}(t) + s_{bedcop,k}(t)$$
These are calculated, respectively, as:
$$s_{bede,k}(t) = \sum_{m=1}^{N_{be}} fb(k, betob(m))\big(b_{betob(m)}(t)\big)$$
$$s_{bedcop,k}(t) = \sum_{n=1}^{N_{bc}} fb(k, bctob(n))\big(b_{bctob(n)}(t)\big)$$
Example 4
When the extension coding module uses lossy coding, the three-dimensional sound encoder can be further improved by adding an extension decoding module. As shown in fig. 6, the improved three-dimensional sound encoder includes a downmix module, a base channel dividing module, an extension coding module, an extension decoding module, a compatible coding module and a packing module. The extension decoding module decodes the extension coded data output by the lossy extension coding module. It outputs the decoded downmix scheme, the decoded extension base channels, the decoded sound objects and the decoded base channel division side information to the downmix module.
Because the downmix module uses the decoded extension base channel and sound object data, the improved three-dimensional sound coding method has the following characteristics:
1. When sound data produced by the improved method is played on an existing stereo or surround sound system, the quality of the downmix compatible base channels is somewhat reduced. The extension base channel and sound object data that are downmixed into the downmix compatible base channels have been encoded twice, which degrades the sound quality of these components.
2. When sound data produced by the improved method is played on a three-dimensional sound system, the improved method can raise the three-dimensional sound coding quality, provided that the coding distortion of the compatible coding module is small. In that case the three-dimensional sound decoder introduces fewer new errors during de-downmixing, which improves the quality of the compatible base channels in the three-dimensional sound signal.
The improved three-dimensional sound coding method is therefore suited to applications where the coding distortion of the compatible coding module is sufficiently small and the emphasis is on improving three-dimensional sound quality.
Example 5
A bitstream produced by the three-dimensional sound coding method is compatible with existing stereo and surround sound playback systems. If the audio processor (hardware/software) supports only surround or stereo formats, only the downmix compatible base channel data needs to be sent to it. The listener then gets the same experience as existing stereo, 5.1 or 7.1 surround sound, without losing substantial two-dimensional sound information. If a three-dimensional sound processor (hardware/software) is available, all data is sent, enabling three-dimensional sound decoding and playback. The playback and processing of bitstreams produced by the three-dimensional sound coding method of the invention are shown in fig. 7 and fig. 8.
The server or host computer holds and processes the bitstream. Typical servers include broadcast audio/video servers, streaming media servers and video-on-demand back-end servers. Typical host computers include mobile phones, computers and tablets (PADs) connected to a digital headphone amplifier. In general, the server or host computer either provides the full three-dimensional sound data or sends only the downmix compatible base channel data. The choice depends on network conditions and/or whether the audio player supports three-dimensional sound. By convention, the input source may use one or both of the two packing modes. The bitstream separation module either separates out the downmix compatible base channel data for the surround or stereo processor, or sends the complete data to the three-dimensional sound processor.
The three-dimensional sound decoder shown in fig. 9 includes a bitstream separation module, a compatible decoding module, an extension decoding module, a de-downmix module, a base channel combining module, and a three-dimensional sound rendering module.
Step 5.1) Bitstream separation module
The module reads its input from a source such as a server or host computer: either the parallel downmix compatible base channel data bitstream and extension coded bitstream, or the three-dimensional sound data bitstream, or both. It then separates the downmix compatible base channel data from the extension coded data.
Step 5.2) Compatible decoding module
The downmix compatible base channel data is decoded to obtain the decoded downmix compatible base channel signals.
Step 5.3) Extension decoding module
The extension coded data is decoded to obtain the decoded sound objects, the decoded extension base channel signals, the decoded downmix scheme and the decoded base channel division side information. If encoding and decoding follow the same predetermined downmix scheme and base channel division, these need not be decoded. Instead, they can be generated according to the predetermined rule.
The extension decoding process is the inverse of the extension coding described above.
Step 5.4) De-downmix module
The de-downmix module receives the decoded downmix scheme, the decoded extension base channel data, the decoded base channel division side information and the decoded sound objects from the extension decoding module. It receives the decoded downmix compatible base channels from the compatible decoding module. It then performs the inverse of the downmix process, removing the extension base channel and sound object contributions to obtain the compatible base channels.
Following the decoded downmix scheme, the downmix components of the decoded extension base channel signals and the decoded sound objects are removed from the decoded downmix compatible base channel signals. Inverse mapping is then applied to obtain the decoded compatible base channel signals. The process consists of two steps:
Step 5.4.1): Computing the compatible base channel downmix components
The decoded extension base channel signals and the decoded sound objects are downmixed according to the decoded downmix scheme. These downmix components are then removed from the decoded downmix compatible base channel signals to obtain the decoded compatible base channel downmix components.
Step 5.4.2): Inverse mapping
The decoded compatible base channel downmix components are inverse-mapped to obtain the decoded compatible base channel signals.
If the inverse mapping is a simple pass-through, i.e., each decoded compatible base channel signal is identical to its decoded compatible base channel downmix component, step 5.4.2 (inverse mapping) is not required.
The de-downmix process is described in detail below for two cases, each corresponding to one of the division methods used at the encoder.
1) If the division method based on corresponding downmix channels was used during encoding:
The decoded base channel division side information includes betob(m) (m = 1…Nbe), bctob(n) (n = 1…Nbc) and compat(n) (n = 1…Nbc). The decoded downmix scheme includes the downmix mapping functions fb(k, i) and fo(k, j).
Step 5.4 a.1): computing compatible base channel downmix components
As shown in fig. 11, for each compatible base channel n-1 … Nbc, k-compot (n) for its corresponding downmix compatible base channel, the decoded downmix compatible base channel signal is derived from
Figure BDA0001183485150000221
Removing the decoded extension base channel signal, the downmix component of the decoded sound object to obtain a compatible base channel downmix component
Figure BDA0001183485150000222
Figure BDA0001183485150000223
Figure BDA0001183485150000224
Wherein the base channel downmix component is extended
Figure BDA0001183485150000225
Down-mixing the decoded extended base channel signal according to the decoded down-mixing scheme to obtain:
Figure BDA0001183485150000226
sound object downmix component
Figure BDA0001183485150000227
Rendering the decoded sound object downmix in accordance with the decoded downmix scheme to:
Figure BDA0001183485150000231
Step 5.4a.2): Inverse mapping
The decoded downmix scheme contains the mapping function fb(k, bctob(n)) and its inverse fb^-1(k, bctob(n)). The inverse is applied to the compatible base channel downmix component $\hat{s}_{bedcop,k}(t)$ to obtain the decoded compatible base channel $\hat{b}_{bctob(n)}(t)$:
$$\hat{b}_{bctob(n)}(t) = fb^{-1}(k, bctob(n))\big(\hat{s}_{bedcop,k}(t)\big), \quad k = compat(n)$$
Applying the inverse function fb^-1(k, bctob(n)) constitutes the inverse mapping.
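The following sketch implements steps 5.4a.1 and 5.4a.2 under the assumption that every fb(k, i) and fo(k, j) is a pure gain, so the inverse of fb(k, bctob(n)) is division by its gain. The decoded signals are passed in as plain lists with 0-based indices. The function name and argument layout are hypothetical.

```python
from typing import List, Sequence

Signal = List[float]


def de_downmix_compat(s: Sequence[Signal], ext: Sequence[Signal], objs: Sequence[Signal],
                      fb: Sequence[Sequence[float]], fo: Sequence[Sequence[float]],
                      betob: Sequence[int], bctob: Sequence[int],
                      compat: Sequence[int]) -> List[Signal]:
    """Steps 5.4a.1-5.4a.2 with gain-only mappings (0-based indices).

    s[k]: decoded downmix compatible base channel k; ext[m]: decoded extension base
    channel betob[m]; objs[j]: decoded object signal j. Returns the decoded
    compatible base channels in the order of bctob.
    """
    out = []
    for n, i in enumerate(bctob):
        k = compat[n]
        comp = list(s[k])
        for m, e in zip(betob, ext):  # remove s_bede,k
            comp = [c - fb[k][m] * x for c, x in zip(comp, e)]
        for j, o in enumerate(objs):  # remove s_obj,k
            comp = [c - fo[k][j] * x for c, x in zip(comp, o)]
        out.append([c / fb[k][i] for c in comp])  # fb^-1 of gain a(k, i) is 1 / a(k, i)
    return out


fb = [[2.0, 0.0, 1.0], [0.0, 0.5, 1.0]]  # base channels 0, 1 compatible; 2 extension
fo = [[1.0], [1.0]]
s = [[6.0, 8.0], [6.0, 8.0]]             # downmix of b0=[1,2], b1=[4,8], b2=[3,3], o0=[1,1]
print(de_downmix_compat(s, [[3.0, 3.0]], [[1.0, 1.0]], fb=fb, fo=fo,
                        betob=[2], bctob=[0, 1], compat=[0, 1]))  # [[1.0, 2.0], [4.0, 8.0]]
```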
2) If the division method based on QR decomposition was used during encoding:
The decoded base channel division side information includes betob(m) (m = 1…Nbe), bctob(n) (n = 1…Nbc) and the compatible base channel mapping coefficient inverse matrix invHRQ. The decoded downmix scheme includes the downmix mapping functions fb(k, i) and fo(k, j).
Step 5.4b.1): Computing the compatible base channel downmix components
For each downmix compatible base channel k, the downmix components of the decoded extension base channel signals and the decoded sound objects are removed from the decoded downmix compatible base channel signal $\hat{s}_k(t)$. This gives the compatible base channel downmix component:
$$\hat{s}_{bedcop,k}(t) = \hat{s}_k(t) - \hat{s}_{bede,k}(t) - \hat{s}_{obj,k}(t)$$
The extension base channel downmix component $\hat{s}_{bede,k}(t)$ is obtained by downmixing the decoded extension base channel signals according to the decoded downmix scheme:
$$\hat{s}_{bede,k}(t) = \sum_{m=1}^{N_{be}} fb(k, betob(m))\big(\hat{b}_{betob(m)}(t)\big)$$
The sound object downmix component $\hat{s}_{obj,k}(t)$ is obtained by downmix-rendering the decoded sound objects $\hat{o}_j(t)$ (the decoded object signals obj_data[j]) according to the decoded downmix scheme:
$$\hat{s}_{obj,k}(t) = \sum_{j=1}^{N_{obj}} fo(k, j)\big(\hat{o}_j(t)\big)$$
All $\hat{s}_{bedcop,k}(t)$ (k = 1…Ns) form the matrix
$$H_{s\_bedcop} = \begin{bmatrix} \hat{s}_{bedcop,1}(t) \\ \vdots \\ \hat{s}_{bedcop,N_s}(t) \end{bmatrix}$$
Step 5.4 b.2): inverse mapping
As described above, the downmix mapping function fb(k, i) in this case is a linear gain:
fb(k,i)(x(t)) = a(k,i) * x(t)
Based on the decoded inverse matrix invHRQ of the compatible base channel mapping coefficients, Hbc is obtained by inverse mapping as follows:
Hbc=invHRQ*Hs_bedcop
Row n of Hbc is the decoded compatible base channel signal $\hat{x}_{bctob(n)}(t)$.
If the encoder does not encode the inverse matrix invHRQ of the compatible base channel mapping coefficients, the decoder computes invHRQ itself using the same method as the base channel division module of step 2.2).
Note that the above expression only states the mathematical relationship of the inverse mapping; there are various equivalent implementations. For example, Hbc can also be obtained as follows:
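Construct the compatible base channel downmix coefficient matrix HAc, an Ns × Nbc matrix, from the downmix coefficients a(k, i) of the compatible base channels:
$$HAc=\begin{bmatrix} a(1,bctob(1)) & \cdots & a(1,bctob(Nbc))\\ \vdots & \ddots & \vdots\\ a(Ns,bctob(1)) & \cdots & a(Ns,bctob(Nbc)) \end{bmatrix}$$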
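The QR decomposition of HAc is HAc = Q * HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix:
$$HR=\begin{bmatrix} r(1,1) & r(1,2) & \cdots & r(1,Nbc)\\ 0 & r(2,2) & \cdots & r(2,Nbc)\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & r(Nbc,Nbc)\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0 \end{bmatrix}$$
where M = Nbc ≤ Ns and r(n, n) > 0 for each n = 1 … M.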
Since Q is an Ns × Ns unitary matrix, its inverse Q^-1 exists, and the following matrix is constructed:
QHs_bedcop=Q^-1*Hs_bedcop
Since M = min(Ns, Nbc) = Nbc and r(n, n) > 0 for each n = 1 … M, the first Nbc rows of HR form a square matrix HRm. HRm is upper triangular with r(n, n) > 0 for each n = 1 … M, so its inverse HRm^-1 exists. Because Hs_bedcop = HAc * Hbc = Q * HR * Hbc, it follows that QHs_bedcop = HR * Hbc, and the last Ns - Nbc rows of HR are zero. Therefore, taking the first Nbc rows of QHs_bedcop as the matrix QHs_bedcopm, Hbc is computed as:
Hbc=HRm^-1*QHs_bedcopm
Row n of Hbc is the decoded compatible base channel signal $\hat{x}_{bctob(n)}(t)$.
More generally, if fb(k, i)(x(t)) = a(k, i) × fb(k, 1)(x(t)) and fb(k, 1) has an inverse function fb^-1(k, 1), then after Hbc is obtained by the above calculation, fb^-1(k, 1) must be applied to each decoded channel signal to obtain the final decoded compatible base channel signals.
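The QR route to Hbc can be sketched in pure Python. This sketch assumes real-valued downmix coefficients, so Q^-1 = Q^T, and uses modified Gram-Schmidt, which directly yields the first Nbc columns of Q and the square matrix HRm; it solves HRm * Hbc = QHs_bedcopm by back substitution rather than forming HRm^-1 explicitly. It covers only the linear case fb(k, i)(x(t)) = a(k, i) * x(t); in the more general case, fb^-1(k, 1) would still be applied to each row afterwards. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def thin_qr(hac: Matrix) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt QR of an Ns x Nbc matrix with full column rank.

    Returns (q_cols, hrm): q_cols are the Nbc orthonormal columns of Q, i.e. the
    first Nbc rows of Q^-1 = Q^T, and hrm is the Nbc x Nbc upper-triangular HRm.
    """
    ns, nbc = len(hac), len(hac[0])
    q_cols: Matrix = []
    hrm = [[0.0] * nbc for _ in range(nbc)]
    for j in range(nbc):
        v = [hac[i][j] for i in range(ns)]
        for p, q in enumerate(q_cols):
            hrm[p][j] = sum(q[i] * v[i] for i in range(ns))
            v = [v[i] - hrm[p][j] * q[i] for i in range(ns)]
        hrm[j][j] = math.sqrt(sum(x * x for x in v))
        if hrm[j][j] <= 1e-12:
            raise ValueError("r(n, n) must be > 0 for every compatible base channel")
        q_cols.append([x / hrm[j][j] for x in v])
    return q_cols, hrm


def inverse_map_qr(hac: Matrix, hs_bedcop: Matrix) -> Matrix:
    """Recover Hbc from Hs_bedcop = HAc * Hbc (rows = channels, columns = samples)."""
    q_cols, hrm = thin_qr(hac)
    nbc, num_samples = len(hrm), len(hs_bedcop[0])
    # QHs_bedcopm: the first Nbc rows of Q^-1 * Hs_bedcop.
    qhs = [[sum(q[i] * hs_bedcop[i][t] for i in range(len(q))) for t in range(num_samples)]
           for q in q_cols]
    # Back substitution solves HRm * Hbc = QHs_bedcopm without forming HRm^-1.
    hbc = [[0.0] * num_samples for _ in range(nbc)]
    for n in reversed(range(nbc)):
        for t in range(num_samples):
            acc = qhs[n][t] - sum(hrm[n][p] * hbc[p][t] for p in range(n + 1, nbc))
            hbc[n][t] = acc / hrm[n][n]
    return hbc


# Hypothetical example: Ns = 2 downmix channels, Nbc = 2 compatible base channels.
hac = [[1.0, 0.5],
       [0.0, 0.5]]
hs_bedcop = [[3.0, 1.0],   # equals hac * [[1, 2], [4, -2]]
             [2.0, -1.0]]
print(inverse_map_qr(hac, hs_bedcop))  # [[1.0, 2.0], [4.0, -2.0]]
```

Multiplying out HRm^-1 times the first Nbc rows of Q^T would give the matrix called invHRQ above; back substitution reaches the same Hbc with fewer operations.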
Step 5.5): Base channel combination module
The decoded compatible base channel signals and the decoded extension base channel signals are combined according to the decoded base channel division side information to obtain the decoded base channel signals.
Step 5.5) may be skipped if the three-dimensional program has no base channel signals, if there is a simple one-to-one correspondence between the compatible base channels, the extension base channels and the base channels, or if the extension base channels already contain all base channels.
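Step 5.5) essentially puts each decoded signal back at its base channel index. A minimal sketch, assuming 0-based indices and that bctob and β arrive in the decoded side information as plain integer lists (a hypothetical representation):

```python
from typing import List, Optional, Sequence

Signal = Sequence[float]


def combine_base_channels(
    num_base: int,
    compatible: Sequence[Signal],
    bctob: Sequence[int],
    extension: Sequence[Signal],
    beta: Sequence[int],
) -> List[Signal]:
    """Place compatible[n] at base channel bctob[n] and extension[m] at beta[m]."""
    base: List[Optional[Signal]] = [None] * num_base
    for n, signal in enumerate(compatible):
        base[bctob[n]] = signal
    for m, signal in enumerate(extension):
        base[beta[m]] = signal
    missing = [i for i, signal in enumerate(base) if signal is None]
    if missing:
        raise ValueError(f"base channels {missing} were not reconstructed")
    return [signal for signal in base if signal is not None]


# Hypothetical 5-channel bed: L, R compatible; C, Ls, Rs extension.
print(combine_base_channels(5, [[1.0], [2.0]], [0, 1], [[3.0], [4.0], [5.0]], [2, 3, 4]))
# [[1.0], [2.0], [3.0], [4.0], [5.0]]
```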
Step 5.6): Three-dimensional sound rendering module
This module receives the compatible base channels, the decoded extension base channels and the sound object data, performs three-dimensional sound rendering, and generates three-dimensional sound PCM data.
Steps 5.2) and 5.3) may be performed in either order.
In a compatible stereo or surround sound system, only step 5.1) is performed, yielding the decoded downmix compatible base channel signals.
Example 6
In application scenarios such as digital television systems, live network broadcasting, network on-demand and local playback, support for audio and video streams is the core requirement. Digital television systems currently use the MPEG TS (Transport Stream) protocol, and existing streaming media protocols (such as Apple HTTP Live Streaming, MPEG-DASH and Microsoft Smooth Streaming) also use audio/video mechanisms similar to MPEG-TS. Hereafter, these transmission audio/video protocols are collectively called TS protocols, and the audio and video streams are encapsulated in TS streams. Similarly, locally played media commonly use MPEG PS (Program Stream) or a similar audio/video packing protocol, in which case the audio and video streams are encapsulated in PS streams.
TS streams are used for network transmission and PS streams for local playback, but the two are essentially the same: in both stream organizations a specific identifier (such as stream_type) indicates whether a given stream is audio or video, and some values of this identifier are reserved for user extensions so that customized data streams (called private streams in the present invention) can be carried.
The invention packs the extension encoded data into such a private stream. An ordinary stereo/surround processor cannot identify this private stream, so it cannot interpret it and simply discards it; a three-dimensional sound terminal or application recognizes the private stream and then performs the subsequent decoding.
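How a legacy unpacker skips the private stream while a three-dimensional sound decoder keeps it can be sketched as a filter over already demultiplexed elementary streams. This sketch assumes PMT/PSM parsing has already produced (stream_type, payload) pairs. 0x1B (H.264 video) and 0x0F (AAC with ADTS) are stream_type values from ISO/IEC 13818-1, while 0xC0 is an arbitrary placeholder from the user-private range 0x80-0xFF, because the text does not specify which value is used. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# stream_type codes from ISO/IEC 13818-1 (MPEG-2 Systems).
AVC_VIDEO = 0x1B       # ITU-T H.264 | ISO/IEC 14496-10 video
AAC_ADTS_AUDIO = 0x0F  # ISO/IEC 13818-7 audio with ADTS transport syntax
# Hypothetical placeholder: the text does not say which user-private value is used.
EXTENSION_PRIVATE = 0xC0


def route_streams(
    streams: Sequence[Tuple[int, bytes]], supports_3d: bool
) -> Dict[str, List[bytes]]:
    """Sort elementary stream payloads by stream_type, as a TS/PS unpacking module might."""
    routed: Dict[str, List[bytes]] = {"video": [], "audio": [], "extension": []}
    for stream_type, payload in streams:
        if stream_type == AVC_VIDEO:
            routed["video"].append(payload)
        elif stream_type == AAC_ADTS_AUDIO:
            routed["audio"].append(payload)
        elif stream_type == EXTENSION_PRIVATE and supports_3d:
            routed["extension"].append(payload)
        # Anything else, including the private stream on a legacy
        # stereo/surround decoder, is simply skipped.
    return routed


program = [(AVC_VIDEO, b"v"), (AAC_ADTS_AUDIO, b"a"), (EXTENSION_PRIVATE, b"x")]
print(route_streams(program, supports_3d=False)["extension"])  # []
print(route_streams(program, supports_3d=True)["extension"])   # [b'x']
```

Because the legacy path never looks inside the unknown stream, adding the extension data does not change what an existing stereo/surround decoder plays.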
Reference specifications:
GB/T 17975.1-2010 Information technology, Generic coding of moving pictures and associated audio information, Part 1: Systems;
GB/T 17975.2-2000 Information technology, Generic coding of moving pictures and associated audio information, Part 2: Video;
GB/T 17975.3-2002 Information technology, Generic coding of moving pictures and associated audio information, Part 3: Audio.
Step 6.1): Encoding
Uncompressed video is encoded by a video encoding module (H.264, JPEG, etc.) to generate video encoded data;
the three-dimensional sound signals (base channels and/or sound objects) are downmixed, and the resulting downmix compatible base channels are encoded by an audio encoding module (e.g. AAC, AC3, MP3, AVS) to generate audio encoded data; the sound objects, the downmix scheme, the extension base channels and the base channel division side information are encoded by an extension encoding module (e.g. Dolby Atmos, AVS2-P3, MPEG-H) to generate extension encoded data.
The audio encoding module and the extension encoding module may use common coding standards such as AAC, AC3, MP3, AVS, Dolby Atmos, AVS2-P3 and MPEG-H, or any proprietary format, and the two modules may use the same format or different formats. Typically, the audio encoding module uses a common coding standard to remain compatible with existing audio/video systems.
Step 6.1): Packing
In the TS/PS packing module, the video encoded data is packed into a video stream conforming to the TS/PS specifications, the audio encoded data into an audio stream conforming to the TS/PS specifications, and the extension encoded data into a private stream conforming to the TS/PS specifications.
For terminals or applications that support only stereo or surround sound:
Step 6.2.1):
The TS/PS stream unpacking module extracts the audio and video encoded data from the stream; because it cannot identify the private stream containing the extension encoded data, that data is simply ignored.
Step 6.2.2):
The video encoded data is decoded by a video decoding module and output to the display device;
the audio encoded data is decoded by an audio decoding module to obtain the downmix compatible base channels, which are then played.
For terminals or applications that support three-dimensional sound:
Step 6.3.1):
The TS/PS stream unpacking module extracts the audio and video encoded data from the stream and, because it can identify the private stream containing the extension encoded data, also unpacks the extension encoded data;
Step 6.3.2):
the video encoded data is decoded by a video decoding module and output to the display device;
the audio encoded data is decoded by an audio decoding module, which outputs the decoded downmix compatible base channels;
the extension encoded data is decoded by an extension decoding module into the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects;
Step 6.3.3):
the decoded downmix scheme, the decoded downmix compatible base channels, the decoded base channel division side information, the decoded extension base channels and the decoded sound objects are passed through the de-downmix module to obtain the compatible base channels;
Step 6.3.4):
according to the decoded base channel division side information, the base channel combination module processes the compatible base channels, the decoded extension base channels and the decoded sound objects to recover the base channels;
Step 6.3.5):
the base channels and the decoded sound objects are processed by the rendering module to generate a three-dimensional sound multichannel PCM stream for playback.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (19)

1. A three-dimensional sound encoder, the three-dimensional sound comprising base channels and/or sound objects, characterized in that the encoder comprises: a downmix and base channel division module for receiving the three-dimensional sound, performing the downmix and base channel division operation according to a downmix scheme, and outputting downmix compatible base channels, extension base channels and base channel division side information; a compatible encoding module for receiving the downmix compatible base channels and outputting downmix compatible base channel encoded data; an extension encoding module for receiving the sound objects, the downmix scheme, the extension base channels and the base channel division side information and outputting extension encoded data; and a packing module for receiving the downmix compatible base channel encoded data and the extension encoded data and either packing them separately to output a downmix compatible base channel data stream and an extension encoded data stream, or packing them together to output a three-dimensional sound data stream; wherein, when the three-dimensional sound does not comprise base channels, the downmix and base channel division module does not perform the base channel division operation, the compatible encoding module does not perform the compatible encoding operation, and the extension encoding module does not perform the extension encoding of the extension base channels and the base channel division side information; and when the three-dimensional sound does not comprise sound objects, the extension encoding module does not perform the extension encoding of the sound objects.
2. The three-dimensional sound encoder according to claim 1, wherein the downmix and base channel division module comprises a downmix module and a base channel division module; the downmix module receives the base channels and the sound objects and outputs the downmix compatible base channels and the downmix scheme, and the base channel division module receives the base channels and the downmix scheme generated by the downmix module and outputs the extension base channels and the base channel division side information.
3. The three-dimensional sound encoder according to claim 1, wherein the downmix and base channel division module comprises a downmix module and a base channel division module; the downmix module receives the base channels, the sound objects and an externally input downmix scheme and outputs the downmix compatible base channels; the base channel division module receives the base channels and the externally input downmix scheme and outputs the extension base channels and the base channel division side information.
4. The three-dimensional sound encoder according to claim 1, wherein the downmix and base channel division module comprises a downmix module and a base channel division module; the base channel division module receives the base channels and an externally input downmix scheme and outputs the extension base channels and the base channel division side information; the extension encoded data output by the extension encoding module is decoded by an extension decoding module, the decoded downmix scheme, the decoded extension base channels, the decoded sound objects, the decoded base channel division side information and the base channels are input to the downmix module, and the downmix module outputs the downmix compatible base channels.
5. The three-dimensional sound encoder according to any one of claims 2 to 4, wherein the downmix module downmixes the base channels and the sound objects into downmix compatible base channels according to the downmix scheme; the downmix compatible base channel signal is divided into a base channel downmix component and a sound object downmix component, and the base channel downmix component is divided into an extension base channel downmix component and a compatible base channel downmix component.
6. The three-dimensional sound encoder according to any one of claims 2 to 4, wherein the base channel division module divides the base channels into compatible base channels and extension base channels, and the base channel division scheme used by the base channel division module is determined according to the channel configuration of the base channels, the multichannel system to be made compatible, and the downmix mapping function.
7. The three-dimensional sound encoder according to claim 6, wherein the base channel division scheme determined by the division method of the corresponding downmix channel comprises:
S11: let the set Sbedt = Sbed and the set Sc = ∅, and let Ssrt = Ssr, where Sbed is the set of base channel signals, Ssr is the set of downmix compatible base channel signals, and fb(k, i) is the downmix mapping function from base channel i to downmix compatible base channel k;
S12: traverse the set Ssrt to find a downmix compatible base channel k ∈ Ssrt such that fb(k, n) = 0 for every channel n in Sc; if no such k exists, go to step S15;
S13: for the downmix compatible base channel k found in step S12, traverse the set Sbedt to find a base channel m for which fb(k, m) is non-zero and invertible; if none exists, go to step S15;
S14: add the base channel m found in step S13 to Sc to obtain a new Sc, remove the downmix compatible base channel k from Ssrt to obtain a new Ssrt, and remove from Sbedt every base channel i with fb(k, i) ≠ 0 to obtain a new Sbedt; if neither the new Ssrt nor the new Sbedt is empty, return to step S12; otherwise, go to step S15;
S15: Sc or a subset of Sc is used as the compatible base channel set of the base channel set Sbed.
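Steps S11 to S15 can be sketched in Python for the common case where every downmix mapping is a linear gain, so fb(k, i) is invertible exactly when its gain is non-zero. This is a hypothetical illustration, not the patentee's implementation; it follows the claim literally, including stopping at S15 when S12 or S13 finds nothing. Each returned pair (m, k) pairs compatible base channel m with downmix channel k, i.e. the information later carried as bctob(n) and compot(n); base channels not returned become extension base channels.

```python
from typing import List, Tuple


def divide_by_downmix_channel(gains: List[List[float]]) -> List[Tuple[int, int]]:
    """Sketch of steps S11-S15 with linear mappings fb(k, i)(x) = gains[k][i] * x.

    gains[k][i] is the gain of base channel i in downmix compatible base channel k.
    Returns (m, k) pairs: compatible base channel m is recovered from downmix channel k.
    """
    sbedt = list(range(len(gains[0]) if gains else 0))  # S11: Sbedt = Sbed
    ssrt = list(range(len(gains)))                      # S11: Ssrt = Ssr
    sc: List[Tuple[int, int]] = []                      # S11: Sc starts empty
    while ssrt and sbedt:                               # S14: continue while neither set is empty
        # S12: a downmix channel containing none of the channels already in Sc.
        k = next((c for c in ssrt if all(gains[c][m] == 0 for m, _ in sc)), None)
        if k is None:
            break
        # S13: a remaining base channel whose mapping into k is non-zero (hence invertible).
        m = next((i for i in sbedt if gains[k][i] != 0), None)
        if m is None:
            break
        # S14: keep m, retire k, and drop every base channel that feeds k.
        sc.append((m, k))
        ssrt.remove(k)
        sbedt = [i for i in sbedt if gains[k][i] == 0]
    return sc                                           # S15


# Hypothetical stereo downmix of L, R, C, Ls, Rs (base channel indices 0-4).
gains = [[1.0, 0.0, 0.707, 0.707, 0.0],
         [0.0, 1.0, 0.707, 0.0, 0.707]]
print(divide_by_downmix_channel(gains))  # [(0, 0), (1, 1)]
```

In this example L and R become compatible base channels, while C, Ls and Rs become extension base channels.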
8. The three-dimensional sound encoder according to claim 6, wherein the base channel division scheme determined by the QR decomposition-based division method comprises:
S21: let Sbedc = Sbed, where Sbed is the set of base channel signals;
S22: express the downmix of Sbedc as a matrix operation Hs_bedcop = HAc × Hbc, where Hs_bedcop is the matrix of downmix components formed by downmixing Sbedc, Hbc is the matrix of base channel signals in Sbedc, and HAc is the matrix of downmix coefficients of Sbedc;
S23: perform QR decomposition on HAc to obtain HAc = Q × HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix;
S24: let M = min(Ns, Nbc), where Ns is the number of channels of the base channel downmix and Nbc is the number of channels in Sbedc; if r(n, n) > 0 in HR for every n = 1 … M, go to step S25; otherwise, for each n = 1 … M with r(n, n) = 0 in HR, remove the nth channel from Sbedc to form a new set Sbedc', set Sbedc = Sbedc', and return to step S22;
S25: keep the set of channels n = 1 … M in Sbedc; this set, or a subset of it, is used as the compatible base channel set of the base channel set Sbed.
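Steps S21 to S25 can be sketched in the same way. This sketch assumes real-valued linear downmix coefficients, uses Gram-Schmidt as one possible QR routine (it yields r(n, n) >= 0, matching the r(n, n) > 0 test), and treats r(n, n) below a small tolerance as zero; the function names and the tolerance are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def qr_diagonal(columns: Matrix, tol: float) -> List[float]:
    """Diagonal r(n, n) of a Gram-Schmidt QR of the matrix whose columns are `columns`."""
    basis: Matrix = []
    diagonal: List[float] = []
    for column in columns:
        v = list(column)
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        diagonal.append(norm)
        if norm > tol:  # only linearly independent columns extend the orthonormal basis
            basis.append([vi / norm for vi in v])
    return diagonal


def divide_by_qr(hac_columns: Matrix, ns: int, tol: float = 1e-9) -> List[int]:
    """QR-based base channel division sketch (steps S21-S25).

    hac_columns[i] holds the Ns downmix coefficients of base channel i.
    Returns the indices kept as compatible base channels.
    """
    sbedc = list(range(len(hac_columns)))              # S21
    while sbedc:
        columns = [hac_columns[i] for i in sbedc]       # S22: HAc of the current Sbedc
        diagonal = qr_diagonal(columns, tol)            # S23
        m = min(ns, len(sbedc))                         # S24
        zero = {n for n in range(m) if diagonal[n] <= tol}
        if not zero:
            return sbedc[:m]                            # S25
        sbedc = [ch for n, ch in enumerate(sbedc) if n not in zero]
    return []


# Hypothetical example: channel 1 is just a scaled copy of channel 0 in the downmix.
print(divide_by_qr([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]], ns=2))  # [0, 2]
```

Other QR routines may place zeros on the diagonal differently for rank-deficient matrices, so which channels survive can depend on the routine chosen.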
9. The three-dimensional sound encoder according to claim 1, wherein the compatible encoding module and the extension encoding module encode using the same coding format or different coding formats.
10. The three-dimensional sound encoder according to claim 1, wherein the compatible encoding module is an audio encoding module for receiving the downmix compatible base channels and outputting audio encoded data; and the packing module is a TS/PS packing module for packing the audio encoded data and the extension encoded data separately, or packing them together, to output an audio stream and a private stream conforming to the TS/PS specifications.
11. A three-dimensional sound encoding method, comprising the steps of: downmixing the base channels and/or sound objects into downmix compatible base channels according to a downmix scheme, dividing the base channels into extension base channels and compatible base channels, and determining base channel division side information; when the three-dimensional sound does not comprise base channels, the downmix and base channel division module does not perform the base channel division operation, the compatible encoding module does not perform the compatible encoding operation, and the extension encoding module does not perform the extension encoding of the extension base channels and the base channel division side information; when the three-dimensional sound does not comprise sound objects, the extension encoding module does not perform the extension encoding of the sound objects; encoding the sound objects, the downmix scheme, the extension base channels and the base channel division side information to obtain extension encoded data; and encoding the downmix compatible base channels to generate downmix compatible base channel encoded data, and outputting the downmix compatible base channel encoded data and the extension encoded data after packing them separately or together.
12. A three-dimensional sound decoder, characterized by comprising: a code stream separation module for receiving the downmix compatible base channel data stream and the extension encoded data stream input as separately packed streams and/or the three-dimensional sound data stream input as a mixed packed stream, and separating and outputting the downmix compatible base channel data and the extension encoded data; a compatible decoding module for receiving the downmix compatible base channel data and outputting decoded downmix compatible base channels; an extension decoding module for receiving the extension encoded data and outputting a decoded downmix scheme, decoded extension base channels, decoded base channel division side information and decoded sound objects; a de-downmix module for receiving the decoded downmix compatible base channels, the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects, and outputting compatible base channels; a base channel combination module for receiving the compatible base channels, the decoded extension base channels and the decoded base channel division side information and outputting base channels; and a rendering module for receiving the base channels and the decoded sound objects and outputting a three-dimensional sound multichannel PCM stream.
13. A three-dimensional sound decoder for decoding output streams conforming to the TS/PS specifications, characterized by comprising: a TS/PS stream unpacking module for receiving a TS/PS stream, parsing an audio stream and a private stream from the TS/PS stream, outputting audio encoded data from the audio stream, and outputting extension encoded data from the private stream; an audio decoding module for receiving the audio encoded data and outputting decoded downmix compatible base channels; an extension decoding module for receiving the extension encoded data and outputting a decoded downmix scheme, decoded extension base channels, decoded base channel division side information and decoded sound objects; a de-downmix module for receiving the decoded downmix compatible base channels, the decoded downmix scheme, the decoded extension base channels, the decoded base channel division side information and the decoded sound objects, and outputting compatible base channels; a base channel combination module for receiving the compatible base channels, the decoded extension base channels and the decoded base channel division side information and outputting base channels; and a rendering module for receiving the base channels and the decoded sound objects and outputting a three-dimensional sound multichannel PCM stream.
14. The three-dimensional sound decoder according to claim 12 or 13, wherein the de-downmix module removes the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signals according to the decoded downmix scheme, obtaining the decoded compatible base channel signals.
15. The three-dimensional sound decoder according to claim 14, wherein the de-downmix module performs the following steps:
1) calculating the compatible base channel downmix components: removing the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signals according to the decoded downmix scheme to obtain the decoded compatible base channel downmix components;
2) inverse mapping: inverse mapping the decoded compatible base channel downmix components to obtain the decoded compatible base channel signals.
16. The three-dimensional sound decoder according to claim 15, wherein, for a downmix and base channel division operation performed according to a base channel division scheme determined by the division method of the corresponding downmix channel, the de-downmix module performs the following steps:
1) calculating the compatible base channel downmix components: for each compatible base channel n = 1 … Nbc, with corresponding downmix compatible base channel k = compot(n), removing the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signal $\hat{s}(k,t)$ to obtain the compatible base channel downmix component $\hat{s}_{bc}(k,t)$;
2) inverse mapping: for each compatible base channel n = 1 … Nbc, with k = compot(n), inverse mapping the decoded compatible base channel downmix component $\hat{s}_{bc}(k,t)$ according to the inverse function fb^-1(k, bctob(n)) of the mapping function fb(k, bctob(n)) in the decoded downmix scheme to obtain the decoded compatible base channel
$$\hat{x}_{bctob(n)}(t)=fb^{-1}\big(k,bctob(n)\big)\big(\hat{s}_{bc}(k,t)\big)$$
17. The three-dimensional sound decoder according to claim 15, wherein, for a downmix and base channel division operation performed according to a base channel division scheme determined by the QR decomposition-based division method, the de-downmix module performs the following steps:
1) calculating the compatible base channel downmix components: for each downmix compatible base channel k, removing the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signal $\hat{s}(k,t)$ to obtain the compatible base channel downmix component $\hat{s}_{bc}(k,t)$; all $\hat{s}_{bc}(k,t)$, k = 1 … Ns, form the matrix
$$\mathrm{Hs\_bedcop}=\begin{bmatrix}\hat{s}_{bc}(1,t)\\ \vdots\\ \hat{s}_{bc}(Ns,t)\end{bmatrix}$$
2) inverse mapping: inverse mapping Hs_bedcop according to the decoded inverse matrix invHRQ of the compatible base channel mapping coefficients to obtain Hbc:
Hbc=invHRQ*Hs_bedcop
where row n of Hbc is the decoded compatible base channel signal $\hat{x}_{bctob(n)}(t)$.
18. A three-dimensional sound decoding method, comprising the steps of: acquiring a downmix compatible base channel data stream and an extension encoded data stream input as separately packed streams and/or a three-dimensional sound data stream input as a mixed packed stream, and separating and outputting the downmix compatible base channel data and the extension encoded data; decoding the downmix compatible base channel data to obtain decoded downmix compatible base channels; decoding the extension encoded data to output a decoded downmix scheme, decoded extension base channels, decoded base channel division side information and decoded sound objects, which, together with the decoded downmix compatible base channels, undergo a de-downmix operation that outputs compatible base channels; combining the compatible base channels and the decoded extension base channels according to the decoded base channel division side information to generate base channels; and performing three-dimensional sound rendering on the base channels and the decoded sound objects to generate a three-dimensional sound multichannel PCM stream.
19. A three-dimensional sound decoding method for decoding output streams conforming to the TS/PS specifications, characterized by comprising the steps of: acquiring an audio stream and a private stream input as separately packed streams, and outputting audio encoded data and extension encoded data; decoding the audio encoded data to obtain decoded downmix compatible base channels; decoding the extension encoded data to output a decoded downmix scheme, decoded extension base channels, decoded base channel division side information and decoded sound objects, which, together with the decoded downmix compatible base channels, undergo a de-downmix operation that outputs compatible base channels; combining the compatible base channels and the decoded extension base channels according to the decoded base channel division side information to generate base channels; and performing three-dimensional sound rendering on the base channels and the decoded sound objects to generate a three-dimensional sound multichannel PCM stream.
CN201611171106.3A 2016-12-16 2016-12-16 Encoder and method for three-dimensional sound signal compatible with existing audio and video system Active CN108206983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611171106.3A CN108206983B (en) 2016-12-16 2016-12-16 Encoder and method for three-dimensional sound signal compatible with existing audio and video system


Publications (2)

Publication Number Publication Date
CN108206983A CN108206983A (en) 2018-06-26
CN108206983B true CN108206983B (en) 2020-02-14

Family

ID=62601691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611171106.3A Active CN108206983B (en) 2016-12-16 2016-12-16 Encoder and method for three-dimensional sound signal compatible with existing audio and video system

Country Status (1)

Country Link
CN (1) CN108206983B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225445A (en) * 2019-05-22 2019-09-10 上海德衡数据科技有限公司 A kind of processing voice signal realizes the method and device of three-dimensional sound field auditory effect

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1984913A1 (en) * 2006-02-07 2008-10-29 LG Electronics Inc. Apparatus and method for encoding/decoding signal
CN104064194A (en) * 2014-06-30 2014-09-24 武汉大学 Parameter coding/decoding method and parameter coding/decoding system used for improving sense of space and sense of distance of three-dimensional audio frequency
CN105405445A (en) * 2015-12-10 2016-03-16 北京大学 Parameter stereo coding, decoding method based on inter-channel transfer function

Also Published As

Publication number Publication date
CN108206983A (en) 2018-06-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 210000 Shitoucheng, Gulou District, Nanjing, Jiangsu

Patentee after: WAVARTS TECHNOLOGIES CO.,LTD.

Address before: 210000 Room 302, No. 69, Shitoucheng, Nanjing, Jiangsu

Patentee before: NANJING QINGJIN INFORMATION TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220408

Address after: 101399 Room 1001, Building 1, No. 8, Jinmayuan Third Street, Gaoliying Town, Shunyi District, Beijing

Patentee after: Beijing panoramic sound information technology Co.,Ltd.

Address before: 210000 Shitoucheng, Gulou District, Nanjing, Jiangsu

Patentee before: WAVARTS TECHNOLOGIES CO.,LTD.