CN108206983B

CN108206983B - Encoder and method for three-dimensional sound signal compatible with existing audio and video system

Info

Publication number: CN108206983B
Application number: CN201611171106.3A
Authority: CN
Inventors: Pan Xingde (潘兴德); Chen Xiaotian (陈笑天); Wu Chaogang (吴超刚)
Original assignee: NANJING QINGJIN INFORMATION TECHNOLOGY Co Ltd
Current assignee: Beijing Panoramic Sound Information Technology Co., Ltd.
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2020-02-14
Anticipated expiration: 2036-12-16
Also published as: CN108206983A

Abstract

The invention discloses an encoder and a method for three-dimensional sound signals compatible with existing audio and video systems. The encoder comprises: a downmix and base channel division module, which receives base channels and/or sound objects, performs downmixing and base channel division according to a downmix scheme, and outputs downmix compatible base channels, extended base channels and base channel division side information; a compatible coding module, which receives the downmix compatible base channels and outputs downmix compatible base channel coded data; an extension coding module, which receives the sound objects, the downmix scheme, the extended base channels and the base channel division side information and outputs extension coded data; and a packaging module, which receives the downmix compatible base channel coded data and the extension coded data and either packages them separately or mixes and packages them into a single three-dimensional sound data code stream. The invention remains compatible with the coding and decoding methods of existing audio and video systems while providing three-dimensional sound coding and decoding capability.

Description

Encoder and method for three-dimensional sound signal compatible with existing audio and video system

Technical Field

The invention relates to the technical field of three-dimensional sound coding and decoding, and in particular to an encoder for three-dimensional sound signals that is compatible with existing audio and video systems, and to a corresponding method.

Background

For many years, stereo, 5.1 and 7.1 surround sound and similar systems have been widely used; because they carry no sound height information, they can present at most two-dimensional sound. In the real world, three-dimensional sound is the most realistic way of presenting and expressing sound, and it is the future trend, whether for natural sound, the arts or audiovisual entertainment.

In existing systems, three-dimensional sound may be a multi-channel signal (e.g., 9.1, 11.1, 13.1 or 22.2), a set of sound objects, or a combination of both. In a three-dimensional sound system, the multi-channel signal may be a surround sound signal such as 5.1 or 7.1, or a multi-layer multi-channel signal (i.e., channels distributed over planes at different heights). For example, some three-dimensional sound systems use two planes (a middle layer and a top layer), others use three layers, and so on. Some three-dimensional sound systems carry only multi-layer multi-channel signals and no sound objects, such as the SMPTE 22.2 system and the AURO 9.1 system. Others carry both multi-layer multi-channel signals and sound objects, such as MPEG-H, Dolby Atmos and DTS:X. In the extreme case, the three-dimensional sound may consist entirely of sound object signals.

As a newly emerged technology and system, three-dimensional sound has not yet been widely adopted, and its popularization will take a long time. Because stereo and surround sound systems are ubiquitous, three-dimensional sound systems can gain market acceptance and become mainstream only if they are compatible, to the greatest possible extent, with those existing stereo and surround sound systems.

Disclosure of Invention

Purpose of the invention: for three-dimensional sound applications in networks, television and the like, the invention provides an encoder for three-dimensional sound signals that is compatible with existing audio and video systems, and a corresponding method.

Technical solution: the three-dimensional sound encoder of the present invention includes: a downmix and base channel division module for receiving base channels and/or sound objects, performing downmixing and base channel division according to a downmix scheme, and outputting downmix compatible base channels, extended base channels and base channel division side information; a compatible coding module for receiving the downmix compatible base channels and outputting downmix compatible base channel coded data; an extension coding module for receiving the sound objects, the downmix scheme, the extended base channels and the base channel division side information and outputting extension coded data; and a packaging module for receiving the downmix compatible base channel coded data and the extension coded data, and either packaging them separately to output a downmix compatible base channel data code stream and an extension coded data code stream, or mixing and packaging them to output a single three-dimensional sound data code stream.

As a refinement of the above solution, when the downmix scheme is selected adaptively by the system, the downmix and base channel division module includes a downmix module and a base channel division module: the downmix module receives the base channels and sound objects and outputs the downmix compatible base channels and the downmix scheme, and the base channel division module receives the base channels and the downmix scheme generated by the downmix module and outputs the extended base channels and the base channel division side information.

Further, when the downmix scheme is determined by external input, the downmix and base channel division module includes a downmix module and a base channel division module: the downmix module receives the base channels, the sound objects and the externally input downmix scheme and outputs the downmix compatible base channels; the base channel division module receives the base channels and the externally input downmix scheme and outputs the extended base channels and the base channel division side information.

The extension coding module may use lossy or lossless coding. When it uses lossy coding and the downmix scheme is determined by external input, the downmix and base channel division module includes a downmix module and a base channel division module, where the base channel division module receives the base channels and the externally input downmix scheme and outputs the extended base channels and the base channel division side information. The extension coded data output by the extension coding module is decoded by an extension decoding module; the decoded downmix scheme, decoded extended base channels, decoded sound objects and decoded base channel division side information, together with the base channels, are input to the downmix module, which outputs the downmix compatible base channels (so that the components the decoder later removes match those mixed in at the encoder).

Further, the downmix module downmixes the base channels and sound objects into the downmix compatible base channels according to the downmix scheme. Each downmix compatible base channel signal consists of a base channel downmix component and a sound object downmix component, and the base channel downmix component in turn consists of an extended base channel downmix component and a compatible base channel downmix component. The downmix module may perform the downmix with amplitude panning (PAN), wave field synthesis (WFS), Ambisonics, or any downmix method with a similar function.

Further, the base channel division module divides the base channels into compatible base channels and extended base channels. The division scheme it uses is determined by the channel configuration of the base channels, the multi-channel system to be compatible with and the downmix mapping functions, for example by division according to the corresponding downmix channels or by division based on QR decomposition.

The base channel division scheme determined by division according to the corresponding downmix channels comprises the following steps:

S11: Let Sbedt = Sbed, Ssrt = Ssr and Sc = ∅, where $Sbed = \{x_{bed}[i] \mid i = 1,\dots,Nb\}$ is the set of base channel signals, $Ssr = \{x_{sr}[k] \mid k = 1,\dots,Ns\}$ is the set of downmix compatible base channel signals, and fb(k, i) is the downmix mapping function;

S12: Traverse the set Ssrt and find a downmix compatible base channel k satisfying fb(k, n) = 0 for every channel n in Sc; if no such k exists, go to step S15;

S13: For the downmix compatible base channel k found in step S12, traverse the set Sbedt and find a base channel m for which fb(k, m) ≠ 0 and fb(k, m) is invertible; if no such m exists, go to step S15;

S14: Add the base channel m found in step S13 to Sc, remove the downmix compatible base channel k from Ssrt, and remove from Sbedt every base channel i with fb(k, i) ≠ 0. If neither the new Ssrt nor the new Sbedt is empty, return to step S12; otherwise, go to step S15;

S15: Take Sc, or a subset of Sc, as the compatible base channel set of the base channel set Sbed.
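Steps S11 to S15 can be sketched in Python as below. This is a minimal sketch, not the patent's implementation: it assumes every mapping function is a pure gain, fb(k, i)(x) = A[k][i]·x, so that "fb(k, m) is invertible" reduces to A[k][m] ≠ 0. The function name, the gain-matrix representation and the 0-based indices are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def divide_by_downmix_channel(
    A: Sequence[Sequence[float]],
) -> Tuple[List[int], List[int], List[int]]:
    """Sketch of S11-S15 assuming gain-only mapping fb(k, i)(x) = A[k][i] * x.

    Returns (sc, compot, ext): the compatible base channels, the downmix
    channel paired with each of them, and the remaining (extended) channels.
    """
    ns = len(A)
    nb = len(A[0]) if ns else 0
    sbedt = list(range(nb))  # S11: Sbedt = Sbed
    ssrt = list(range(ns))   # S11: Ssrt = Ssr
    sc: List[int] = []       # S11: Sc starts empty
    compot: List[int] = []   # downmix channel k chosen for each entry of sc

    while ssrt and sbedt:
        # S12: a downmix channel that carries none of the channels already in Sc
        k: Optional[int] = next(
            (c for c in ssrt if all(A[c][n] == 0 for n in sc)), None)
        if k is None:
            break  # go to S15
        # S13: a base channel feeding k whose (gain) mapping is invertible
        m: Optional[int] = next((i for i in sbedt if A[k][i] != 0), None)
        if m is None:
            break  # go to S15
        # S14: keep m, retire k, and drop every base channel that feeds k
        sc.append(m)
        compot.append(k)
        ssrt.remove(k)
        sbedt = [i for i in sbedt if A[k][i] == 0]

    # S15: Sc is the compatible set; everything else is an extended channel
    ext = [i for i in range(nb) if i not in sc]
    return sc, compot, ext
```

With A = [[1, 0, 0, 0.7, 0], [0, 1, 0, 0, 0.7], [0, 0, 1, 0, 0]] (downmix Lo = L + 0.7·Ltop, Ro = R + 0.7·Rtop, Co = C over the base channels L, R, C, Ltop, Rtop), the sketch returns sc = [0, 1, 2] and ext = [3, 4]: L, R and C become compatible base channels and the two top channels become extended base channels.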

The base channel division scheme determined by the QR-decomposition-based division method comprises the following steps:

S21: Let Sbedc = Sbed, where Sbed is the set of base channel signals;

S22: Express the downmix of Sbedc as a matrix operation, $Hs\_bedcop = HAc \times Hbc$, where Hs_bedcop is the matrix of downmix components produced by downmixing Sbedc, Hbc is the matrix formed by the base channel signals in Sbedc, and HAc is the matrix of downmix coefficients of Sbedc;

S23: Perform QR decomposition of HAc to obtain $HAc = Q \times HR$, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix;

S24: Let M = min(Ns, Nbc), where Ns is the number of channels of the base channel downmix and Nbc is the number of channels in Sbedc. If the diagonal elements of HR satisfy r(n, n) > 0 for every n = 1…M, go to step S25; otherwise, for each n = 1…M with r(n, n) = 0, remove the n-th channel of Sbedc to form a new set Sbedc', set Sbedc = Sbedc', and return to step S22;

S25: Retain the channels n = 1…M of Sbedc; this set, or a subset of it, serves as the compatible base channel set of the base channel set Sbed.
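Steps S21 to S25 can be sketched without external libraries as follows. It is a sketch under stated assumptions: the downmix coefficients are real gains A[k][i] (base channel i into downmix channel k), the diagonal of HR is obtained by modified Gram-Schmidt (which yields non-negative r(n, n), consistent with the r(n, n) > 0 test), and a small tolerance tol stands in for an exact zero test. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def qr_diagonal(cols: Sequence[Sequence[float]], tol: float) -> List[float]:
    """Diagonal r(n, n) of the R factor of [cols], via modified Gram-Schmidt."""
    basis: List[List[float]] = []  # orthonormal columns of Q found so far
    diag: List[float] = []
    for col in cols:
        w = [float(v) for v in col]
        for q in basis:
            proj = sum(a * b for a, b in zip(q, w))
            w = [a - proj * b for a, b in zip(w, q)]
        r_nn = math.sqrt(sum(a * a for a in w))  # non-negative by construction
        diag.append(r_nn)
        if r_nn > tol:  # a zero residual means the column adds no new direction
            basis.append([a / r_nn for a in w])
    return diag


def divide_by_qr(A: Sequence[Sequence[float]], tol: float = 1e-9) -> List[int]:
    """Sketch of S21-S25; returns indices of the compatible base channels."""
    ns = len(A)
    sbedc = list(range(len(A[0]))) if ns else []  # S21: Sbedc = Sbed
    while sbedc:
        # S22: columns of HAc are the downmix coefficients of Sbedc
        cols = [[A[k][i] for k in range(ns)] for i in sbedc]
        diag = qr_diagonal(cols, tol)             # S23: HAc = Q * HR
        m = min(ns, len(sbedc))                   # S24: M = min(Ns, Nbc)
        zero = {n for n in range(m) if diag[n] <= tol}
        if not zero:
            return sbedc[:m]                      # S25: keep channels 1..M
        sbedc = [c for n, c in enumerate(sbedc) if n not in zero]
    return []
```

For the same five-channel example as the division by corresponding downmix channels, the sketch keeps [0, 1, 2]; for A = [[1, 0.5, 0], [0, 0, 1]], where channel 1 only repeats the direction of channel 0, it returns [0, 2].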

Further, the compatible coding module and the extension coding module may use the same coding format or different coding formats.

Further, the compatible encoding module is an audio encoding module, and is configured to receive a downmix compatible base channel and output audio encoded data; and the packaging module is a TS/PS packaging module and is used for respectively packaging the audio coding data and the extended coding data and outputting an audio stream and a private stream which accord with TS/PS standards.

The three-dimensional sound coding method using the above encoder comprises: downmixing the base channels and/or sound objects into downmix compatible base channels according to a downmix scheme, dividing the base channels into extended base channels and compatible base channels, and determining base channel division side information; encoding the sound objects, the downmix scheme, the extended base channels and the base channel division side information to obtain extension coded data; and encoding the downmix compatible base channels to generate downmix compatible base channel coded data, then packaging this data and the extension coded data either separately or together (mixed) and outputting the result.

A three-dimensional sound decoder that decodes the code streams generated by the above three-dimensional sound encoder is compatible with existing audio and video systems and comprises: a code stream separation module for receiving the separately packaged downmix compatible base channel data code stream and extension coded data code stream, and/or the mixed-packaged three-dimensional sound data code stream, and separating out and outputting the downmix compatible base channel data and the extension coded data; a compatible decoding module for receiving the downmix compatible base channel data and outputting decoded downmix compatible base channels; an extension decoding module for receiving the extension coded data and outputting the decoded downmix scheme, decoded extended base channels, decoded base channel division side information and decoded sound objects; a de-downmix module for receiving the decoded downmix compatible base channels, the decoded downmix scheme, the decoded extended base channels, the decoded base channel division side information and the decoded sound objects, and outputting the compatible base channels; a base channel combination module for receiving the compatible base channels, the decoded extended base channels and the decoded base channel division side information and outputting the base channels; and a rendering module for receiving the base channels and the decoded sound objects and outputting a three-dimensional sound multi-channel PCM stream.

To better support audio and video code streams, the invention also provides a three-dimensional sound decoder for code streams conforming to the TS/PS standards, comprising: a TS/PS stream unpacking module for receiving a TS/PS stream, parsing out the audio stream and the private stream, and outputting the audio coded data carried in the audio stream and the extension coded data carried in the private stream; an audio decoding module for receiving the audio coded data and outputting decoded downmix compatible base channels; an extension decoding module for receiving the extension coded data and outputting the decoded downmix scheme, decoded extended base channels, decoded base channel division side information and decoded sound objects; a de-downmix module for receiving the decoded downmix compatible base channels, the decoded downmix scheme, the decoded extended base channels, the decoded base channel division side information and the decoded sound objects, and outputting the compatible base channels; a base channel combination module for receiving the compatible base channels, the decoded extended base channels and the decoded base channel division side information and outputting the base channels; and a rendering module for receiving the base channels and the decoded sound objects and outputting a three-dimensional sound multi-channel PCM stream.

Further, the de-downmix module removes the downmix components of the decoded extended base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signals according to the decoded downmix scheme, thereby obtaining the decoded compatible base channel signals.

Further, the de-downmix module performs the following steps:

1) Calculating compatible base channel downmix components: remove the downmix components of the decoded extended base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signals according to the decoded downmix scheme, obtaining the decoded compatible base channel downmix components;

2) Inverse mapping: inverse-map the decoded compatible base channel downmix components to obtain the decoded compatible base channel signals.

Further, for a base channel division scheme determined by division according to the corresponding downmix channels, the de-downmix module performs the following steps (a hat denotes a decoded signal):

1) Calculating compatible base channel downmix components: for each compatible base channel n = 1…Nbc, let k = compot(n) be its corresponding downmix compatible base channel. Removing from the decoded downmix compatible base channel signal $\hat{x}_{sr}[k]$ the downmix components of the decoded extended base channel signals $\hat{x}_{bed}[beta(m)]$ and/or the decoded sound objects $\widehat{obj\_data}[j]$ gives the compatible base channel downmix component

$$\hat{x}_{srbed\_cop}[k] = \hat{x}_{sr}[k] - \sum_{m=1}^{Nbe} fb(k, beta(m))\big(\hat{x}_{bed}[beta(m)]\big) - \sum_{j=1}^{M} fo(k, j)\big(\widehat{obj\_data}[j]\big)$$

2) Inverse mapping: for each compatible base channel n = 1…Nbc, with k = compot(n) as its corresponding downmix compatible base channel, apply the inverse function $fb^{-1}(k, bctob(n))$ of the mapping function fb(k, bctob(n)) in the decoded downmix scheme to the compatible base channel downmix component, obtaining the decoded compatible base channel

$$\hat{x}_{bed}[bctob(n)] = fb^{-1}(k, bctob(n))\big(\hat{x}_{srbed\_cop}[k]\big)$$
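The two de-downmix steps for the corresponding-downmix-channel scheme can be sketched as follows. For illustration only, it assumes every mapping function is a pure gain, so fb(k, i) = A[k][i], fo(k, j) = G[k][j], and the inverse fb^-1 is a division by the gain. The function name, argument layout and 0-based indices are hypothetical.

```python
from typing import List, Sequence

Signal = List[float]


def de_downmix_by_channel(
    x_sr: Sequence[Signal],        # decoded downmix compatible base channels
    A: Sequence[Sequence[float]],  # assumed gain-only fb(k, i) = A[k][i]
    G: Sequence[Sequence[float]],  # assumed gain-only fo(k, j) = G[k][j]
    ext: Sequence[int],            # beta(m): indices of extended base channels
    x_ext: Sequence[Signal],       # decoded extended base channels
    objs: Sequence[Signal],        # decoded object signals obj_data[j]
    bctob: Sequence[int],          # indices of compatible base channels
    compot: Sequence[int],         # downmix channel k = compot(n)
) -> List[Signal]:
    out: List[Signal] = []
    for n, k in enumerate(compot):
        # Step 1: strip the extended-channel and object downmix components
        y = list(x_sr[k])
        for m, i in enumerate(ext):
            y = [a - A[k][i] * b for a, b in zip(y, x_ext[m])]
        for j, obj in enumerate(objs):
            y = [a - G[k][j] * b for a, b in zip(y, obj)]
        # Step 2: invert fb(k, bctob(n)); for a pure gain this is a division
        gain = A[k][bctob[n]]
        out.append([a / gain for a in y])
    return out
```

If the extension data were lossless, the recovered compatible channels match the encoder input up to floating-point rounding; with lossy extension coding, any mismatch between the encoder-side and decoded extension components would leak into them, which is the situation the encoder variant that downmixes the decoded extension data is meant to address.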

Further, for a base channel division scheme determined by the QR-decomposition-based division method, the de-downmix module performs the following steps:

1) Calculating compatible base channel downmix components: for each downmix compatible base channel k, remove from the decoded downmix compatible base channel signal $\hat{x}_{sr}[k]$ the downmix components of the decoded extended base channel signals and/or the decoded sound objects, obtaining the compatible base channel downmix component $\hat{x}_{srbed\_cop}[k]$; all $\hat{x}_{srbed\_cop}[k]$, k = 1…Ns, form the matrix Hs_bedcop.

2) Inverse mapping: using the decoded inverse matrix invHRQ of the compatible base channel mapping coefficients, inverse-map Hs_bedcop to obtain Hbc:

$$Hbc = invHRQ \cdot Hs\_bedcop$$

Row n of Hbc is the decoded compatible base channel signal $\hat{x}_{bed}[bctob(n)]$.
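The inverse mapping Hbc = invHRQ · Hs_bedcop can be illustrated with a real-valued round trip. The sketch assumes that, after S21 to S25, HAc has full column rank with Nbc ≤ Ns, so that HAc = Q1·R1 with Q1 (Ns × Nbc, orthonormal columns) and R1 (Nbc × Nbc, upper triangular); then invHRQ = R1^-1·Q1^T, since Q^H reduces to Q^T for real coefficients. Q1, R1 and the helper names are assumptions for illustration; the patent only states that the decoded invHRQ is applied. Instead of forming invHRQ explicitly, the sketch applies Q1^T and back-substitutes through R1, which is mathematically equivalent.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def thin_qr(A: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt: A (Ns x Nbc, full column rank) = Q1 * R1."""
    ns, nbc = len(A), len(A[0])
    q_cols: Matrix = []                       # orthonormal columns of Q1
    R: Matrix = [[0.0] * nbc for _ in range(nbc)]
    for j in range(nbc):
        w = [float(A[r][j]) for r in range(ns)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, w))
            w = [a - R[i][j] * b for a, b in zip(w, q)]
        R[j][j] = sum(a * a for a in w) ** 0.5
        q_cols.append([a / R[j][j] for a in w])
    Q1 = [[q_cols[c][r] for c in range(nbc)] for r in range(ns)]
    return Q1, R


def inverse_map(Q1: Matrix, R: Matrix, Hs: Sequence[Sequence[float]]) -> Matrix:
    """Hbc = R1^-1 * Q1^T * Hs_bedcop, i.e. invHRQ * Hs_bedcop (real case)."""
    ns, nbc, nt = len(Q1), len(R), len(Hs[0])
    # Y = Q1^T * Hs_bedcop
    Y = [[sum(Q1[r][c] * Hs[r][t] for r in range(ns)) for t in range(nt)]
         for c in range(nbc)]
    # Solve R1 * Hbc = Y by back substitution (R1 is upper triangular)
    Hbc: Matrix = [[0.0] * nt for _ in range(nbc)]
    for n in reversed(range(nbc)):
        for t in range(nt):
            s = Y[n][t] - sum(R[n][c] * Hbc[c][t] for c in range(n + 1, nbc))
            Hbc[n][t] = s / R[n][n]
    return Hbc
```

Building Hs_bedcop = HAc × Hbc from any known Hbc and passing it through `inverse_map` returns that Hbc up to rounding, which is the property the decoder relies on.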

The method by which the three-dimensional sound decoder decodes the code streams output by the three-dimensional sound encoder comprises: acquiring the separately packaged downmix compatible base channel data code stream and extension coded data code stream, and/or the mixed-packaged three-dimensional sound data code stream, and separating out and outputting the downmix compatible base channel data and the extension coded data; decoding the downmix compatible base channel data to obtain decoded downmix compatible base channels; decoding the extension coded data to output the decoded downmix scheme, decoded extended base channels, decoded base channel division side information and decoded sound objects, and performing the de-downmix operation on the decoded downmix compatible base channels to output the compatible base channels; combining the compatible base channels and the decoded extended base channels according to the decoded base channel division side information to generate the base channels; and performing three-dimensional sound rendering on the base channels and the decoded sound objects to generate a three-dimensional sound multi-channel PCM stream.

The three-dimensional sound decoding method for code streams conforming to the TS/PS specification comprises: acquiring the separately packaged audio stream and private stream, and outputting the audio coded data and the extension coded data; decoding the audio coded data to obtain decoded downmix compatible base channels; decoding the extension coded data to output the decoded downmix scheme, decoded extended base channels, decoded base channel division side information and decoded sound objects, and performing the de-downmix operation on the decoded downmix compatible base channels to output the compatible base channels; combining the compatible base channels and the decoded extended base channels according to the decoded base channel division side information to generate the base channels; and performing three-dimensional sound rendering on the base channels and the decoded sound objects to generate a three-dimensional sound multi-channel PCM stream.

Beneficial effects: for three-dimensional sound applications such as networks and television, the invention provides an encoder for three-dimensional sound signals compatible with existing audio and video systems, and a corresponding method. When the audio processor (hardware/software) supports only surround sound or stereo formats, only the downmix compatible base channel data needs to be sent to it, so that the same listening experience as existing stereo, 5.1 or 7.1 surround sound is obtained without losing the basic two-dimensional sound information; when a three-dimensional sound processor (hardware/software) is available, all data is transmitted, enabling three-dimensional sound decoding and playback. For applications that emphasize three-dimensional sound quality, the invention also provides an improved three-dimensional sound coding method that raises the coding quality of three-dimensional sound and yields a better playback result.

Drawings

FIG. 1 is a block diagram of the three-dimensional sound coding method according to Embodiment 1;

FIG. 2 is a block diagram of the three-dimensional sound coding method according to Embodiment 2;

FIG. 3 is a diagram illustrating a first downmix operation;

FIG. 4 is a diagram illustrating a second downmix operation;

FIG. 5 is a block diagram of the three-dimensional sound coding method according to Embodiment 3;

FIG. 6 is a block diagram of the three-dimensional sound coding method according to Embodiment 4;

FIG. 7 is a schematic diagram of the playback process of a digital movie produced with the three-dimensional sound coding method (coded streams output separately);

FIG. 8 is a schematic diagram of the playback process of a digital movie produced with the three-dimensional sound coding method (coded streams output mixed);

FIG. 9 is a block diagram of the three-dimensional sound decoding method;

FIG. 10 is a diagram of the operating method of the de-downmix module;

FIG. 11 is a block diagram of the method of Embodiment 6.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the embodiments.

Example 1:

The three-dimensional sound signal consists of a multi-channel signal (i.e., the base channels) and/or sound object signals (each comprising object rendering description information and object audio data). To remain backward compatible with multi-channel (stereo, surround sound) programs or systems, the three-dimensional sound coding method of the invention downmixes the three-dimensional sound signal into downmix compatible base channel data according to a downmix scheme, and encodes the downmix scheme, the extended base channels, the base channel division side information and the sound objects to obtain extension coded data.

The base channels may be stereo, 5.1, 7.1 or other multi-channel signals, or multi-layer multi-channel three-dimensional sound signals such as 9.1, 11.1, 13.1 or 22.2. The data of the i-th base channel is denoted $x_{bed}[i]$, $i = 1,\dots,Nb$, where Nb is the number of base channels; when Nb = 0, the three-dimensional sound signal contains no multi-channel signal, only sound object signals. All base channel signals form the set $Sbed = \{x_{bed}[i] \mid i = 1,\dots,Nb\}$. The sound object signal obj_signal[j] consists of object rendering description information obj_info[j] and an object signal obj_data[j], $j = 1,\dots,M$, where M is the number of sound objects; when M = 0, the three-dimensional sound signal contains no sound object. A sound object signal may be a mono, stereo or multi-channel signal.

The downmix compatible base channel signals are denoted $x_{sr}[k]$, $k = 1,\dots,Ns$, where Ns is the number of channels of the stereo or surround sound system to be compatible with, and all downmix compatible base channel signals form the set $Ssr = \{x_{sr}[k] \mid k = 1,\dots,Ns\}$. The data of each channel or sound object ($x_{bed}[i]$, obj_data[j], $x_{sr}[k]$) is a time-domain signal, i.e., PCM (pulse code modulation) sample data; after framing, it represents the time-domain signal within one frame.

When the three-dimensional sound signal is downmixed into downmix compatible base channel signals $x_{sr}[k]$ according to a given downmix scheme, each downmix compatible base channel signal consists of a base channel downmix component $x_{srbed}[k]$ and a sound object downmix component $x_{srobj}[k]$:

$$x_{sr}[k] = x_{srbed}[k] + x_{srobj}[k], \quad k = 1,\dots,Ns$$

The downmix scheme can be expressed as a set of mapping functions fo(k, j) and fb(k, i). The base channel downmix components of all downmix compatible base channel signals form the set $Ssrbed = \{x_{srbed}[k] \mid k = 1,\dots,Ns\}$, and their sound object downmix components form the set $Ssrobj = \{x_{srobj}[k] \mid k = 1,\dots,Ns\}$. The base channel downmix component is generated from the base channel signals $x_{bed}[i]$ and can be expressed as:

$$x_{srbed}[k] = \sum_{i=1}^{Nb} fb(k, i)\big(x_{bed}[i]\big)$$

where fb(k, i) is the downmix mapping function applied when the i-th base channel signal is downmixed into the k-th channel of the downmix compatible base channel signal. The sound object downmix component is generated by downmix-rendering the signal of each sound object according to the downmix compatible base channel system to be compatible with, and can be expressed as:

$$x_{srobj}[k] = \sum_{j=1}^{M} fo(k, j)\big(obj\_data[j]\big)$$

where fo(k, j) is the downmix mapping function applied when the j-th object is downmixed into the k-th downmix compatible base channel; it depends on information such as the object's position coordinates. The functions fo(k, j) and fb(k, i) can be expressed as gain, delay and similar operations, e.g. $fb(k, i)\big(x(t)\big) = a(k, i)\, x\big(t - \Delta(k, i)\big)$; they may also be more complex mapping functions such as WFS and HOA driving functions. WFS (Wave Field Synthesis) is a sound rendering method that, starting from the Kirchhoff-Helmholtz integral solution of the wave equation, reproduces the original sound field with a loudspeaker array. HOA (Higher Order Ambisonics) is also a sound rendering method; it solves the wave equation through a spherical harmonic expansion and reproduces the original sound field with a loudspeaker array. For details of WFS and HOA, see "Comparison of Higher Order Ambisonics and Wave Field Synthesis with Respect to Spatial Discretization Artifacts in Time Domain" (Sascha Spors and Jens Ahrens, 19th International Congress on Acoustics, Madrid, 2-7 Sept. 2007).
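The downmix model $x_{sr}[k] = \sum_i fb(k,i)(x_{bed}[i]) + \sum_j fo(k,j)(obj\_data[j])$ with gain-and-delay mapping functions can be sketched per frame as follows. It assumes integer sample delays Δ(k, i) ≥ 0 and treats samples before the frame start as zero (a real implementation would carry signal history across frames); the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Signal = List[float]
GainDelay = Tuple[float, int]  # (a, delta): gain and delay in whole samples


def apply_map(f: GainDelay, x: Sequence[float]) -> Signal:
    """f(x)(t) = a * x(t - delta); samples outside the frame are taken as 0."""
    a, delta = f
    n = len(x)
    return [a * x[t - delta] if 0 <= t - delta < n else 0.0 for t in range(n)]


def downmix(
    fb: Sequence[Sequence[GainDelay]],  # fb[k][i]: base channel i -> channel k
    fo: Sequence[Sequence[GainDelay]],  # fo[k][j]: sound object j -> channel k
    bed: Sequence[Signal],               # x_bed[i], one frame per channel
    objs: Sequence[Signal],              # obj_data[j], one frame per object
    frame_len: int,
) -> List[Signal]:
    """x_sr[k] = sum_i fb(k, i)(x_bed[i]) + sum_j fo(k, j)(obj_data[j])."""
    x_sr: List[Signal] = []
    for k in range(len(fb)):
        y = [0.0] * frame_len
        for i, x in enumerate(bed):          # base channel downmix component
            y = [u + v for u, v in zip(y, apply_map(fb[k][i], x))]
        for j, o in enumerate(objs):         # sound object downmix component
            y = [u + v for u, v in zip(y, apply_map(fo[k][j], o))]
        x_sr.append(y)
    return x_sr
```

Keeping fb and fo as (gain, delay) pairs mirrors the parameters a(k, i) and Δ(k, i) that the extension coding step may encode as the downmix scheme.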

The base channel signal set Sbed can be divided into two sets, Sbede and Sbedc, containing Nbe and Nbc channels respectively and satisfying:

$$Nbe + Nbc = Nb$$

$$Sbede \cup Sbedc = Sbed$$

Accordingly, the base channel downmix component $x_{srbed}[k]$ can also be split into the sum of $x_{srbed\_ext}[k]$ and $x_{srbed\_cop}[k]$:

$$x_{srbed}[k] = x_{srbed\_ext}[k] + x_{srbed\_cop}[k]$$

$$x_{srbed\_ext}[k] = \sum_{m=1}^{Nbe} fb(k, beta(m))\big(x_{bed}[beta(m)]\big), \qquad x_{srbed\_cop}[k] = \sum_{n=1}^{Nbc} fb(k, bctob(n))\big(x_{bed}[bctob(n)]\big)$$

where beta(m) (m = 1…Nbe) is the index, within the base channels, of the m-th channel of Sbede, and bctob(n) (n = 1…Nbc) is the index, within the base channels, of the n-th channel of Sbedc. $x_{srbed\_cop}[k]$ is generated by downmixing the base channels in Sbedc, and all $x_{srbed\_cop}[k]$ (k = 1…Ns) form the set Ssrbed_cop.

If all signals in Sbedc can be computed from the set Ssrbed_cop, the downmix mapping functions fb(k, i) and the base channel division side information (beta(m), bctob(n)), then Sbedc is called a compatible base channel set and its channel signals $x_{bed}[bctob(n)]$ are called compatible base channels; Sbede is called the extended base channel set corresponding to Sbedc, and its channel signals $x_{bed}[beta(m)]$ are called extended base channels. $x_{srbed\_ext}[k]$ is called the extended base channel downmix component; Ssrbed_cop is called the set of compatible base channel downmix components, and $x_{srbed\_cop}[k]$ is called the compatible base channel downmix component.

For the base channel set Sbed, the compatible base channel set Sbedc and the extended base channel set Sbede can be chosen in various ways and by various criteria. Obviously, the trivial division Sbede = Sbed, Sbedc = ∅ satisfies the above definition; and if Sbedc1 is a compatible base channel set of Sbed, then any subset Sbedct of Sbedc1 is also a compatible base channel set of Sbed.

For example, suppose the three-dimensional sound signal consists of a 5.1.4 two-layer multi-channel system (5 middle-layer channels, 1 subwoofer channel and 4 top channels) plus 20 sound objects, and must be compatible with a 5.1 surround sound system. The 5.1 channels can then be processed independently as compatible base channel data and transmitted over the surround sound channel, while the 4 top channels are processed together with the 20 sound objects as extension data and packaged for transmission over another type of channel.

As shown in FIG. 1, the three-dimensional sound encoder provided by the present invention includes a downmix and base channel division module, a compatible coding module, an extension coding module and a packing module.

Step 1.1) Downmix and base channel division module

Downmix the three-dimensional sound program into a compatible stereo/multi-channel signal according to a downmix scheme that is either externally input or adaptively selected by the system (expressible, for example, as a set of mapping functions fb(k, i), fo(k, j)), yielding the downmix compatible base channel signals $x_{sr}[k]$, k = 1…Ns; divide the base channels into compatible base channels and extended base channels, determine the base channel division side information (such as beta(m), bctob(n)), and output the downmix scheme used.

"External input" generally refers to a downmix scheme chosen manually by the sound engineer while downmixing the three-dimensional sound program, which lets the engineer compare candidate schemes by repeated monitoring and pick one. "System adaptive" refers to a downmix scheme chosen intelligently by the coding system. For the downmix of base channels, for example, the system can adaptively fold the signals of the channel layers that carry height information in a multi-layer multi-channel system into the surround channels of the middle layer according to their positions (e.g., mixing the top front left channel directly into the left channel and the top front right channel directly into the right channel), based on the relationship between the loudspeaker layouts of the base channel system and the downmix compatible multi-channel system, thereby forming a downmix scheme (expressible as a set of mapping functions fb(k, i)). For the downmix of sound objects, a downmix scheme (expressible as a set of mapping functions fo(k, j)) can be formed adaptively from the object rendering description information (object position coordinates, etc.) and a rendering method such as WFS, HOA or PAN.

Step 1.1 may be omitted if the three-dimensional sound program contains no base channels, or if there is a simple and clear one-to-one correspondence between the compatible base channels, the extended base channels and the base channels.
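A system-adaptive scheme of the kind described in step 1.1 can be sketched as below. Everything layout-specific is an assumption made for illustration: the nominal azimuths, the 1/√2 fold-down gain for top channels, the nearest-azimuth rule for base channels, and the constant-power pairwise pan law used for objects (one simple PAN variant, not WFS or HOA). The LFE channel is left out for brevity.

```python
import math
from typing import Dict, Tuple

# Assumed nominal azimuths in degrees (positive = left); illustrative values.
MIDDLE = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}
TOP = {"Ltf": 45.0, "Rtf": -45.0, "Ltr": 135.0, "Rtr": -135.0}


def ang_dist(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def base_channel_scheme(top_gain: float = math.sqrt(0.5)) -> Dict[str, Tuple[str, float]]:
    """fb as {base channel: (downmix channel, gain)}; top -> nearest middle."""
    scheme = {name: (name, 1.0) for name in MIDDLE}
    for name, az in TOP.items():
        target = min(MIDDLE, key=lambda m: ang_dist(az, MIDDLE[m]))
        scheme[name] = (target, top_gain)
    return scheme


def object_pan_gains(obj_az: float) -> Dict[str, float]:
    """fo gains: constant-power pan between the two middle speakers around obj_az."""
    ring = sorted(MIDDLE.items(), key=lambda kv: kv[1] % 360.0)
    for (n1, a1), (n2, a2) in zip(ring, ring[1:] + ring[:1]):
        span = (a2 - a1) % 360.0
        off = (obj_az - a1) % 360.0
        if off <= span:
            frac = off / span
            return {n1: math.cos(frac * math.pi / 2), n2: math.sin(frac * math.pi / 2)}
    raise ValueError("azimuth not covered")  # unreachable for a closed ring
```

Under these assumptions the top front channels fold into L and R and the top rear channels into Ls and Rs, matching the "top front left into left" example above; an object at 15° is split equally between C and L with unit total power.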

Step 1.2) Extension coding module

Perform extension coding on the extended base channels, the base channel division side information, the downmix scheme and the sound objects to obtain extension coded data. If the three-dimensional program has no base channels, the parts that encode the extended base channels and the base channel division side information may be omitted; if it has no sound objects, the part that encodes the sound objects may be omitted. If the encoder and decoder follow the same determinable base channel division, the base channel division side information need not be encoded; likewise, if they follow the same determinable downmix scheme, the downmix scheme need not be encoded.
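The omission rules above decide which parts the extension coded data carries. A minimal sketch, with hypothetical field names and a flag per condition, is:

```python
from typing import List


def extension_fields(has_base_channels: bool, has_objects: bool,
                     fixed_division: bool, fixed_downmix: bool) -> List[str]:
    """Parts carried in the extension coded data (field names are hypothetical)."""
    fields: List[str] = []
    if has_base_channels:
        fields.append("extended_base_channels")
        if not fixed_division:  # omitted when both ends share a determinable division
            fields.append("division_side_info")
    if not fixed_downmix:       # omitted when both ends share a determinable downmix
        fields.append("downmix_scheme")
    if has_objects:
        fields.append("sound_objects")
    return fields
```

For an object-only program whose downmix scheme is fixed on both ends, for instance, only the sound objects remain to be coded.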

The extension base channels and the sound objects may be encoded with uncompressed or compressed coding, using either vector or scalar coding, for example Dolby AC3, MPEG-1 Layer 3, MPEG-2/4 AAC, MPEG-H, Dolby Atmos or AVS three-dimensional audio coding. Encoding a sound object comprises encoding its object rendering description information obj_info[j] and its object signal obj_data[j]. When the downmix scheme is encoded, the parameters of the downmix mapping functions fb(k, i) and fo(k, j), such as a(k, i) and Δ(k, i), may be encoded lossily or losslessly, and a(k, i) and Δ(k, i) may be vector-coded. The base channel division side information (for example beta(m) (m = 1 … Nbe) and bctob(n) (n = 1 … Nbc)) may likewise be encoded lossily or losslessly, and vector coding may also be used.

Step 1.3) compatible coding module

The downmix compatible base channels are encoded to obtain the downmix compatible base channel coded data.

The downmix compatible base channels may be encoded with uncompressed or compressed coding, using either vector or scalar coding, in formats such as Dolby AC3, MPEG-1 Layer 3, MPEG-2/4 AAC or AVS. To meet the compatibility requirement, the coding method must be one that the compatible multi-channel system supports.

Note that the compatible coding module and the extension coding module may use the same coding format or different ones, as long as the matching decoder is selected at decoding time.

Step 1.4) packing module

The packing module supports two packing modes. In the first mode, a downmix compatible base channel packing module and an extension coded data packing module package the downmix compatible base channel data and the extension coded data separately. The packed data can be transmitted over channels such as an IP network, a broadcast television network, a mobile network or USB, and the packing may use any protocol suited to broadcast, IP or mobile networks, such as MPEG TS, MPEG DASH or HLS (HTTP Live Streaming).

In the second mode, a mixed packing module packs all data into the same code stream, and the output is called the three-dimensional sound data code stream. It can be transmitted over the same kinds of channels (IP network, broadcast television network, mobile network, USB, etc.), and the packing may likewise use any suitable protocol, such as MPEG TS, MPEG DASH or HLS (HTTP Live Streaming).
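To make the second mode concrete, the sketch below packs both payloads into one stream and then splits them again, as the code stream separation module does on the decoder side. The tag-length-value layout and the tag values are hypothetical stand-ins chosen for illustration; a real system would use MPEG TS, MPEG DASH or HLS framing instead.

```python
import struct

# Hypothetical tags for this sketch only; they are not part of MPEG TS, DASH or HLS.
TAG_COMPAT = 0x01  # downmix compatible base channel coded data
TAG_EXT = 0x02     # extension coded data


def pack_mixed(compat: bytes, ext: bytes) -> bytes:
    """Mode two: write both payloads into a single 3D sound data stream."""
    out = bytearray()
    for tag, payload in ((TAG_COMPAT, compat), (TAG_EXT, ext)):
        # 1-byte tag followed by a 4-byte big-endian payload length.
        out += struct.pack(">BI", tag, len(payload))
        out += payload
    return bytes(out)


def separate(stream: bytes) -> dict:
    """Split a mixed stream back into its payloads, keyed by tag."""
    parts, pos = {}, 0
    while pos < len(stream):
        tag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5  # size of the ">BI" header
        parts[tag] = stream[pos:pos + length]
        pos += length
    return parts
```

A stereo or surround player would read only the TAG_COMPAT payload, while a three-dimensional sound player would read both.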

Example 2:

As shown in fig. 2, when a system-adaptively selected downmix scheme is used, the three-dimensional sound encoder provided by the invention comprises a downmix module, a base channel dividing module, a compatible coding module, an extension coding module and a packing module. The difference from Example 1 is that the downmix and base channel dividing module is split into a separate downmix module and base channel dividing module.

Step 2.1) Downmix module

The three-dimensional sound program is downmixed into a compatible stereo/multi-channel signal according to the system-adaptively selected downmix scheme, yielding the downmix compatible base channel signal, and the downmix scheme used is output.

As described previously, the downmix compatible base channel signal comprises the base channel downmix component and the sound object downmix component:

The base channel downmix component is generated from the signals of the base channels: the base channel downmix component of each downmix compatible base channel k is obtained by applying fb(k, i) to the ith base channel signal and summing over all base channels i, where fb(k, i) is the downmix mapping function used when the ith base channel signal is downmixed to the kth channel of the downmix multi-channel signal.

The sound object downmix component is generated by downmix rendering the signal of each sound object for the multi-channel system to be compatible with: the sound object downmix component of each downmix compatible base channel k is obtained by applying fo(k, j) to the jth sound object signal and summing over all sound objects j, where fo(k, j) is the downmix mapping function used when the jth sound object is downmixed to the kth channel of the downmix multi-channel signal. A schematic diagram of the downmix operation in this case is shown in fig. 3.
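The following sketch computes the downmix compatible base channels from base channel and object signals. It assumes the additive form (base channel components plus object components) and uses the gain-and-delay mapping a*x(t - Δ) given later in the text for both fb and fo; real object rendering (WFS, HOA, PAN) would replace the simple object gains. Channel indices are 0-based here, unlike the 1-based numbering in the text, and a missing (k, i) entry means that channel i is not mixed into k.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]
Mapping = Callable[[Sequence[float]], Signal]


def gain_delay(a: float, delay: int) -> Mapping:
    """Mapping x(t) -> a * x(t - delay) on a finite block (zeros before t = 0)."""
    def f(x: Sequence[float]) -> Signal:
        return [a * x[t - delay] if t - delay >= 0 else 0.0 for t in range(len(x))]
    return f


def downmix(beds: List[Signal], objs: List[Signal],
            fb: Dict[Tuple[int, int], Mapping],
            fo: Dict[Tuple[int, int], Mapping], ns: int) -> List[Signal]:
    """Return the ns downmix compatible base channels."""
    length = len(beds[0]) if beds else len(objs[0])
    out = []
    for k in range(ns):
        acc = [0.0] * length
        # Base channel downmix component of channel k.
        for (kk, i), f in fb.items():
            if kk == k:
                acc = [s + y for s, y in zip(acc, f(beds[i]))]
        # Sound object downmix component of channel k.
        for (kk, j), f in fo.items():
            if kk == k:
                acc = [s + y for s, y in zip(acc, f(objs[j]))]
        out.append(acc)
    return out
```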

The base channel downmix component can be further divided into an extension base channel downmix component and a compatible base channel downmix component: for each downmix compatible base channel k, the extension base channel downmix component is obtained by summing fb(k, i) applied to the extension base channels, and the compatible base channel downmix component by summing fb(k, i) applied to the compatible base channels.

A schematic diagram of the downmix operation in this case is shown in fig. 4.
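Written out in equation form (the original formulas were lost in extraction, so this is a reconstruction under the additive assumption), with bed_i(t) the ith base channel signal, obj_j(t) the jth object signal, N_o the number of sound objects, s_k(t) the kth downmix compatible base channel, and beta(m), bctob(n) the base channel indices of the mth extension and nth compatible base channel, the downmix and its split read as follows. The symbol names bed_i, obj_j, s_k and N_o are assumptions chosen for this illustration.

$$
s_k(t) = s_k^{\mathrm{bed}}(t) + s_k^{\mathrm{obj}}(t), \qquad
s_k^{\mathrm{bed}}(t) = \sum_{i=1}^{N_b} fb(k,i)\big(\mathrm{bed}_i(t)\big), \qquad
s_k^{\mathrm{obj}}(t) = \sum_{j=1}^{N_o} fo(k,j)\big(\mathrm{obj}_j(t)\big)
$$

$$
s_k^{\mathrm{bed}}(t) = \underbrace{\sum_{m=1}^{N_{be}} fb\big(k,\mathrm{beta}(m)\big)\big(\mathrm{bed}_{\mathrm{beta}(m)}(t)\big)}_{\text{extension part}} + \underbrace{\sum_{n=1}^{N_{bc}} fb\big(k,\mathrm{bctob}(n)\big)\big(\mathrm{bed}_{\mathrm{bctob}(n)}(t)\big)}_{\text{compatible part}}
$$

The split is exact because beta(m) and bctob(n) together enumerate every base channel exactly once, so N_be + N_bc = N_b.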

Step 2.2) Base channel dividing module

According to the downmix scheme used by the downmix module (which may be expressed, for example, by a set of mapping functions fb(k, i), fo(k, j)), the base channels are divided into two parts, compatible base channels and extension base channels, and the base channel division side information (e.g. beta(m), bctob(n)) is determined.

Step 2.2) may be omitted if the three-dimensional sound program contains no base channels, or if there is a simple, explicit one-to-one correspondence between the compatible base channels, the extension base channels and the base channels.

The base channel division method is described in detail below, using two cases as examples.

1) Division by correspondence with downmix channels:

When the channel configuration of the base channels of the three-dimensional sound program, the multi-channel system to be compatible with, and the downmix mapping function fb(k, i) are all determined, the base channels can be divided into a compatible part and an extension part according to the following rule: for each n = 1 … Nbc there exist k = compat(n) and n = invcompat(k), where compat(n) is the sequence number of the downmix compatible base channel corresponding to the nth compatible base channel, such that no other compatible base channel is downmixed into channel k (fb(k, bctob(n')) = 0 for every n' ≠ n), and fb(k, bctob(n)) has an inverse function.

When this rule is satisfied, the decoded compatible base channel signals can be calculated as follows: for each n, with k = compat(n), the compatible base channel downmix component of downmix compatible base channel k is inverse mapped with fb^-1(k, bctob(n)).

That is, the set Sbedc of compatible base channels can be computed from the set of compatible base channel downmix components together with the downmix mapping function fb(k, i) and the base channel division side information (beta(m), bctob(n), etc.). This satisfies the definition of a compatible base channel set, so Sbedc is a compatible base channel set of Sbed. The characteristic of this case is that each compatible base channel n is obtained by inverse mapping the compatible base channel downmix component of its corresponding downmix compatible base channel k, the inverse mapping function being the inverse of the downmix mapping function.

In this case, the base channel division side information comprises beta(m) (m = 1 … Nbe), bctob(n) (n = 1 … Nbc) and compat(n) (n = 1 … Nbc), where compat(n) is the sequence number of the downmix compatible base channel corresponding to the nth compatible base channel, and fb(k, bctob(n)) has an inverse function fb^-1(k, bctob(n)), for example:

fb(k,i)(x(t))＝a(k,i)*x(t-Δ(k,i))

fb^-1(k,i)(x(t))＝x(t+Δ(k,i))/a(k,i)

In the special case a(k, i) = 1 and Δ(k, i) = 0:

fb(k,i)(x(t))＝x(t)

fb^-1(k,i)(x(t))＝x(t)
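The gain-and-delay example above can be checked numerically. In this sketch, signals are finite blocks of samples, Δ is an integer sample delay, and samples outside the block are taken as zero, so the last Δ samples cannot be recovered by the inverse; that boundary handling is an assumption of the sketch, not something the text specifies.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def make_fb(a: float, delta: int) -> Callable[[Sequence[float]], Signal]:
    """Forward mapping fb(k,i)(x(t)) = a * x(t - delta)."""
    def fb(x: Sequence[float]) -> Signal:
        return [a * x[t - delta] if t >= delta else 0.0 for t in range(len(x))]
    return fb


def make_fb_inv(a: float, delta: int) -> Callable[[Sequence[float]], Signal]:
    """Inverse mapping fb^-1(k,i)(x(t)) = x(t + delta) / a; requires a != 0."""
    if a == 0:
        raise ValueError("fb is not invertible when a(k, i) = 0")

    def fb_inv(y: Sequence[float]) -> Signal:
        n = len(y)
        return [y[t + delta] / a if t + delta < n else 0.0 for t in range(n)]
    return fb_inv


x = [1.0, 2.0, 3.0, 4.0]
y = make_fb(0.5, 1)(x)            # [0.0, 0.5, 1.0, 1.5]
x_rec = make_fb_inv(0.5, 1)(y)    # [1.0, 2.0, 3.0, 0.0]: the last sample is lost
```

In the special case a(k, i) = 1 and Δ(k, i) = 0, both functions reduce to the identity.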

For example, suppose the base channels are 5.1.4 (5.1 + 4H, i.e. 5.1 surround plus 4 top speakers) and the compatible multi-channel system is 5.1. Let the 5.1.4 channel order be left (1), right (2), center (3), subwoofer (4), left surround (5), right surround (6), top front left (7), top front right (8), top back left (9), top back right (10), and the 5.1 channel order be left (1), right (2), center (3), subwoofer (4), left surround (5), right surround (6). If the downmix mixes the top front left (7) and top back left (9) channels directly into the left channel, and the top front right (8) and top back right (10) channels directly into the right channel, the channel mapping functions simplify to:

betob(m)＝m+6

bctob(n)＝n

The correspondence compat(n) between the compatible base channels and the downmix compatible base channels simplifies to:

compat(n)＝n

and, for every pair (k, i) in which base channel i is mixed into downmix channel k, the downmix mapping function of the base channels simplifies to

fb(k,i)(x(t))＝x(t)

The base channels can then be divided into two parts: the extension base channels {top front left (7), top front right (8), top back left (9), top back right (10)} and the compatible base channels {left (1), right (2), center (3), subwoofer (4), left surround (5), right surround (6)}.

For example, the base channel division can be performed as follows:

Step 2.2a.1): Let Sbedt = Sbed and Ssrt = Ssr.

Step 2.2a.2): Traverse the set Ssrt until a downmix compatible base channel k satisfying the following relation is found, or the traversal ends: fb(k, n) = 0 for every channel n in Sc.

If no such downmix compatible base channel k can be found, jump to step 2.2a.5).

Step 2.2a.3): For the downmix compatible base channel k found in step 2.2a.2), select a base channel m from Sbedt such that fb(k, m) is non-zero and invertible and m is not in Se. If no such base channel m can be found, jump to step 2.2a.5).

Step 2.2a.4): Remove from Sbedt every base channel i with fb(k, i) ≠ 0 to obtain the new Sbedt; add the base channel m found in step 2.2a.3) to Sc to obtain the new Sc; and remove the downmix compatible base channel k from Ssrt to obtain the new Ssrt. If neither Ssrt nor Sbedt is empty, jump to step 2.2a.2).

Step 2.2 a.5): sc is the compatible base channel set.

2) Division method based on QR decomposition

If the downmix function can be expressed as fb(k, i)(x(t)) = a(k, i)*x(t), where a(k, i) is a real number, the downmix of the base channels can be written as a matrix operation. The downmix multi-channel signals form the matrix Hs_bed and the base channel signals form the base channel signal matrix Hb (one channel per row), while the coefficients a(k, i) form the base channel downmix coefficient matrix HA, an Ns × Nb matrix, so that

Hs_bed＝HA*Hb

In this case, the base channels can be divided as follows:

Step 2.2b.1): Let Sbedc = Sbed.

Step 2.2b.2): The signals of all channels in Sbedc form the matrix Hbc; their coefficients a(k, i) form the compatible base channel downmix coefficient matrix HAc, an Ns × Nbc matrix; and their downmix contributions form the set of compatible base channel downmix components Ssrbed_cop, all of which together form the matrix Hs_bedcop. These satisfy Hs_bedcop = HAc*Hbc.

Step 2.2b.3): The QR decomposition of HAc gives HAc = Q*HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix.

Step 2.2b.4): Let M = min(Ns, Nbc), and let r(n, n) denote the diagonal elements of HR. If r(n, n) > 0 for every n = 1 … M, go to step 2.2b.5). Otherwise, for each n = 1 … M with r(n, n) = 0, remove the nth channel of Sbedc from Sbedc, forming a new set Sbedc'; let Sbedc = Sbedc' and jump to step 2.2b.2).

Step 2.2b.5): Keep only channels n = 1 … M of Sbedc to form the new Sbedc, so that Nbc = M ≤ Ns. Any subset of this Sbedc may also be used as the new Sbedc.
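Steps 2.2b.1) to 2.2b.5) can be sketched in plain Python as below. Instead of the full Ns × Ns Q, the sketch uses a thin QR decomposition computed by modified Gram-Schmidt, which gives the same triangular factor when the columns are independent; with real coefficients a(k, i) the unitary Q is simply orthogonal. With Gram-Schmidt, r(n, n) = 0 flags exactly the columns that lie in the span of the earlier ones; other QR algorithms (for example Householder without pivoting) can behave differently on rank-deficient inputs. The tolerance for treating r(n, n) as zero is an assumption, since floating-point QR rarely produces exact zeros.

```python
from typing import List

Matrix = List[List[float]]


def r_diagonal(hac: Matrix, tol: float) -> List[float]:
    """Diagonal r(n, n) of the triangular QR factor of hac (modified Gram-Schmidt)."""
    rows, cols = len(hac), len(hac[0])
    v = [[hac[i][c] for i in range(rows)] for c in range(cols)]  # columns of HAc
    q: List[List[float]] = []  # orthonormal directions found so far
    diag: List[float] = []
    for j in range(cols):
        for qi in q:  # remove the component along each earlier direction
            proj = sum(x * y for x, y in zip(qi, v[j]))
            v[j] = [x - proj * y for x, y in zip(v[j], qi)]
        norm = sum(x * x for x in v[j]) ** 0.5
        diag.append(norm if norm > tol else 0.0)
        if norm > tol:
            q.append([x / norm for x in v[j]])
    return diag


def select_compatible(ha: Matrix, tol: float = 1e-9) -> List[int]:
    """Return the 0-based base channel indices kept in Sbedc."""
    ns = len(ha)
    sbedc = list(range(len(ha[0])))                    # step 2.2b.1: Sbedc = Sbed
    while sbedc:
        hac = [[row[i] for i in sbedc] for row in ha]  # step 2.2b.2: build HAc
        diag = r_diagonal(hac, tol)                    # step 2.2b.3: QR of HAc
        m = min(ns, len(sbedc))                        # step 2.2b.4
        zero = {n for n in range(m) if diag[n] == 0.0}
        if not zero:
            return sbedc[:m]                           # step 2.2b.5
        sbedc = [ch for pos, ch in enumerate(sbedc) if pos not in zero]
    return []


# HA for the 5.1.4 -> 5.1 example (6 x 10), with top channels folded into L and R.
ha = [[0.0] * 10 for _ in range(6)]
for k in range(6):
    ha[k][k] = 1.0
ha[0][6] = ha[0][8] = ha[1][7] = ha[1][9] = 1.0
kept = select_compatible(ha)
```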

The steps above yield Sbedc. The signals of all channels in Sbedc form the matrix Hbc; their coefficients a(k, i) form the downmix coefficient matrix HAc, an Ns × Nbc matrix; and their downmix contributions, the compatible base channel downmix components Ssrbed_cop, form the matrix Hs_bedcop, with Hs_bedcop = HAc*Hbc. The QR decomposition of HAc is HAc = Q*HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix, with M = Nbc ≤ Ns and r(n, n) > 0 for each n = 1 … M.

It can be shown that Sbedc obtained by the above step operation is a compatible base channel set of Sbed:

Hs_bedcop＝HAc*Hbc＝Q*HR*Hbc

Since Q is an Ns × Ns unitary matrix, its inverse Q^-1 exists, which gives

HR*Hbc＝Q^-1*Hs_bedcop＝QHs_bedcop

Since M = min(Ns, Nbc) = Nbc and r(n, n) > 0 for each n = 1 … M, the first Nbc rows of HR form a square matrix HRm.

The square matrix HRm is triangular with r(n, n) > 0 for each n = 1 … M, so its inverse HRm^-1 exists. Taking the first Nbc rows of Q^-1 to form the matrix invQm, we have

HRm*Hbc＝invQm*Hs_bedcop

Therefore:

Hbc＝HRm^-1*HRm*Hbc＝HRm^-1*invQm*Hs_bedcop

Letting invHRQ = HRm^-1*invQm gives

Hbc＝invHRQ*Hs_bedcop

That is, all channel signals in Sbedc can be computed from the set Ssrbed_cop together with fb(k, i), beta(m) and bctob(n). This satisfies the definition of a compatible base channel set, so Sbedc is a compatible base channel set of Sbed. The characteristic of this case is that the matrix formed by the compatible base channels is obtained by inverse mapping the matrix formed by all compatible base channel downmix components, and this inverse mapping can be expressed by the matrix invHRQ. The base channel division side information then comprises beta(m) (m = 1 … Nbe), bctob(n) (n = 1 … Nbc) and invHRQ; invHRQ is called the compatible base channel mapping coefficient inverse matrix and can be calculated from fb(k, i), beta(m), bctob(n) and similar information.
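The derivation above can be checked numerically. The sketch computes invHRQ = HRm^-1*invQm from HAc and verifies that Hbc = invHRQ*Hs_bedcop recovers the compatible channels. It assumes real coefficients, so Q^-1 = Q^T and invQm (the first Nbc rows of Q^-1) is the transpose of the thin Q factor; HRm^-1 is applied by back substitution rather than by forming the inverse explicitly. The HAc and Hbc values are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def thin_qr(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt: q holds Nbc orthonormal columns (each of length Ns);
    r is the Nbc x Nbc upper triangular HRm with a positive diagonal."""
    rows, cols = len(a), len(a[0])
    q: Matrix = []
    r = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        v = [a[i][j] for i in range(rows)]
        for i, qi in enumerate(q):
            r[i][j] = sum(x * y for x, y in zip(qi, v))
            v = [x - r[i][j] * y for x, y in zip(v, qi)]
        r[j][j] = sum(x * x for x in v) ** 0.5
        if r[j][j] == 0.0:
            raise ValueError("HAc must have full column rank (r(n, n) > 0)")
        q.append([x / r[j][j] for x in v])
    return q, r


def back_substitute(r: Matrix, b: List[float]) -> List[float]:
    """Solve r x = b for upper triangular r."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(r[i][j] * x[j] for j in range(i + 1, n))) / r[i][i]
    return x


def inv_hrq(hac: Matrix) -> Matrix:
    """invHRQ = HRm^-1 * invQm, an Nbc x Ns matrix."""
    q, r = thin_qr(hac)
    ns, nbc = len(hac), len(q)
    # Column c of invHRQ solves HRm x = (column c of invQm).
    cols = [back_substitute(r, [q[n][c] for n in range(nbc)]) for c in range(ns)]
    return [[cols[c][n] for c in range(ns)] for n in range(nbc)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


hac = [[1.0, 0.5], [0.0, 1.0], [0.5, 0.0]]    # Ns = 3, Nbc = 2 (illustrative)
hbc = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]     # two compatible channels, 3 samples
hs_bedcop = matmul(hac, hbc)                   # encoder side: Hs_bedcop = HAc*Hbc
recovered = matmul(inv_hrq(hac), hs_bedcop)    # Hbc = invHRQ*Hs_bedcop
```

Because invHRQ is a left inverse of HAc, the recovery is exact up to floating-point rounding.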

More generally, if fb(k, i)(x(t)) = a(k, i)*fb(k, 1)(x(t)) and fb(k, 1) has an inverse function fb^-1(k, 1), the base channels can still be divided with the QR-decomposition-based method above. In this case, after Hbc is obtained by the matrix operation, fb^-1(k, 1) must be applied to each decoded channel signal to obtain the final decoded compatible base channel signals.

Example 3

As shown in fig. 5, the three-dimensional sound encoder provided by the invention comprises a downmix module, a base channel dividing module, a compatible coding module, an extension coding module and a packing module. It differs from Example 2 in that the downmix scheme is input externally.

Step 3.1) Base channel dividing module

According to an externally input downmix scheme (which may be expressed, for example, by a set of mapping functions fb(k, i), fo(k, j)), the base channels are divided into two parts, compatible base channels and extension base channels, and the base channel division side information (e.g. beta(m), bctob(n)) is determined.

Step 3.1) may be omitted if the three-dimensional sound program contains no base channels, or if there is a simple, explicit one-to-one correspondence between the compatible base channels, the extension base channels and the base channels.

Step 3.2) Downmix module

The three-dimensional sound program is downmixed into a compatible stereo/multi-channel signal according to the externally input downmix scheme (which may, for example, be expressed as a set of mapping functions fb(k, i), fo(k, j)), yielding the downmix compatible base channel signal.

The base channel downmix component is generated from the signals of the base channels: the base channel downmix component of each downmix compatible base channel k is obtained by applying fb(k, i) to the ith base channel signal and summing over all base channels i.

The sound object downmix component is generated by rendering the signal of each sound object for the multi-channel system to be compatible with: the sound object downmix component of each downmix compatible base channel k is obtained by applying fo(k, j) to the jth sound object signal and summing over all sound objects j, where fo(k, j) is the downmix mapping function used when the jth sound object is downmixed to the kth channel of the downmix multi-channel signal.

The base channel downmix component can be further divided into an extension base channel downmix component and a compatible base channel downmix component, obtained by summing fb(k, i) applied to the extension base channels and to the compatible base channels, respectively.

Example 4

When the extension coding module uses lossy coding, the three-dimensional sound encoder can be further improved by adding an extension decoding module. As shown in fig. 6, the improved three-dimensional sound encoder comprises a downmix module, a base channel dividing module, an extension coding module, an extension decoding module, a compatible coding module and a packing module. The extension decoding module decodes the extension coded data output by the lossy extension coding module and outputs the decoded downmix scheme, the decoded extension base channels, the decoded sound objects and the decoded base channel division side information to the downmix module.

Because the downmix module uses the decoded extension base channel and sound object data, the improved three-dimensional sound coding method has the following characteristics:

1. When an existing stereo or surround sound system plays sound data produced by the improved method, the quality of the downmix compatible base channels is somewhat reduced. This is because the extension base channels and sound object data that are downmixed into the downmix compatible base channels are encoded twice, which degrades the sound quality of these components.

2. When a three-dimensional sound system plays sound data produced by the improved method, the method can improve the three-dimensional sound coding quality, provided the coding distortion of the compatible coding module is small. This is because the encoder downmixes the same decoded extension base channels and sound objects that the decoder later removes, so the three-dimensional sound decoder introduces fewer new errors when de-downmixing, which improves the quality of the compatible base channels in the three-dimensional sound signal.

The improved method is therefore suited to applications in which the coding distortion of the compatible coding module is small enough and the emphasis is on improving three-dimensional sound quality.

Example 5

A code stream produced by this three-dimensional sound coding method is compatible with existing stereo and surround sound playback systems. When the audio processor (hardware or software) supports only surround or stereo formats, only the downmix compatible base channel data needs to be sent to it; this gives the same listening experience as existing stereo, 5.1 or 7.1 surround sound without losing substantial two-dimensional sound information. With a three-dimensional sound processor (hardware or software), all data is transmitted, enabling three-dimensional sound decoding and playback. The playback and sound processing of a code stream produced by the method are shown in figs. 7 and 8.

The code stream is held and processed by a server or a host computer. Typical servers include broadcast audio/video servers, streaming media servers and back-end servers for on-demand channels; typical host computers include mobile phones, computers and tablets (PAD) connected to a digital headphone amplifier. In general, the server or host computer either provides the full three-dimensional sound data or transmits only the downmix compatible base channel data, depending on network conditions and/or whether the audio player supports three-dimensional sound. The input source may use either or both of the two packing modes by convention, and the code stream separation module either routes the downmix compatible base channel data to the surround or stereo processor or sends the complete data to the three-dimensional sound processor.

The three-dimensional sound decoder shown in fig. 9 comprises a code stream separation module, a compatible decoding module, an extension decoding module, a de-downmix module, a base channel combining module and a three-dimensional sound rendering module.

Step 5.1) Code stream separation module

Read the parallel downmix compatible base channel data code stream and extension coded code stream, and/or the three-dimensional sound data code stream, from an input source such as a server or host computer, and separate the downmix compatible base channel data from the extension coded data.

Step 5.2) compatible decoding module

Decode the downmix compatible base channel data to obtain the decoded downmix compatible base channel signal.

Step 5.3) Extension decoding module

Decode the extension coded data to obtain the decoded sound objects, the decoded extension base channel signals, the decoded downmix scheme and the decoded base channel division side information. If encoding and decoding follow the same predetermined downmix scheme and base channel division side information, these need not be decoded but can be generated according to the predetermined rule.

The process of extension decoding is the inverse of the aforementioned extension encoding.

Step 5.4) De-downmix module

The extension decoding module supplies the de-downmix module with the decoded downmix scheme, the decoded extension base channel data, the decoded base channel division side information and the decoded sound objects, and the de-downmix module also receives the decoded downmix compatible base channels. The de-downmix module performs the inverse of the downmix module's processing to obtain the compatible base channels, from which the extension base channel and sound object information has been removed.

Using the decoded downmix scheme, the downmix components of the decoded extension base channel signals and the decoded sound objects are removed from the decoded downmix compatible base channel signal, and inverse mapping is then applied to obtain the decoded compatible base channel signals. This takes two steps:

Step 5.4.1): Computing the compatible base channel downmix components

Downmix the decoded extension base channel signals and the decoded sound objects according to the decoded downmix scheme, and subtract these downmix components from the decoded downmix compatible base channel signal to obtain the decoded compatible base channel downmix components.

Step 5.4.2): Inverse mapping

Inverse map the decoded compatible base channel downmix components to obtain the decoded compatible base channel signals.

If the inverse mapping is a simple pass-through, i.e. the decoded compatible base channel signal is identical to the decoded compatible base channel downmix component, step 5.4.2) (inverse mapping) is not required.

The de-downmix process is described in detail below for the two cases corresponding to the encoder.

1) If division by correspondence with downmix channels was used during encoding:

The decoded base channel division side information then comprises beta(m) (m = 1 … Nbe), bctob(n) (n = 1 … Nbc) and compat(n) (n = 1 … Nbc), and the decoded downmix scheme comprises the downmix mapping functions fb(k, i) and fo(k, j).

Step 5.4a.1): Computing the compatible base channel downmix components

As shown in fig. 11, for each compatible base channel n = 1 … Nbc with corresponding downmix compatible base channel k = compat(n), the downmix components of the decoded extension base channel signals and the decoded sound objects are removed from the decoded downmix compatible base channel signal to obtain the compatible base channel downmix component. Here, the extension base channel downmix component is obtained by downmixing the decoded extension base channel signals according to the decoded downmix scheme, and the sound object downmix component is obtained by downmix rendering the decoded sound objects according to the decoded downmix scheme.

Step 5.4a.2): Inverse mapping

Using the inverse function fb^-1(k, bctob(n)) of the mapping function fb(k, bctob(n)) in the decoded downmix scheme, the compatible base channel downmix component is inverse mapped to obtain the decoded compatible base channel. Applying fb^-1(k, bctob(n)) is the inverse mapping.
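For the correspondence-based case, steps 5.4a.1) and 5.4a.2) reduce to a subtraction followed by the inverse gain-and-delay mapping. The sketch assumes that the extension base channel and sound object downmix components for each downmix channel k have already been computed from the decoded data as described in step 5.4a.1), that fb(k, bctob(n)) has the gain-and-delay form with gain a[n] and integer delay delta[n], and that samples past the end of the block are zero. Indices are 0-based.

```python
from typing import List, Sequence

Signal = List[float]


def recover_compatible(
    downmix: Sequence[Signal],   # decoded downmix compatible base channels, indexed by k
    ext_comp: Sequence[Signal],  # extension base channel downmix component per k
    obj_comp: Sequence[Signal],  # sound object downmix component per k
    compat: Sequence[int],       # compat[n] = k
    a: Sequence[float],          # gain of fb(k, bctob(n))
    delta: Sequence[int],        # integer delay of fb(k, bctob(n))
) -> List[Signal]:
    out: List[Signal] = []
    for n, k in enumerate(compat):
        # Step 5.4a.1: remove the extension and object contributions from channel k.
        residual = [y - e - o for y, e, o in zip(downmix[k], ext_comp[k], obj_comp[k])]
        # Step 5.4a.2: apply fb^-1(k, bctob(n))(x(t)) = x(t + delta) / a.
        length = len(residual)
        out.append([residual[t + delta[n]] / a[n] if t + delta[n] < length else 0.0
                    for t in range(length)])
    return out
```

For example, a compatible channel [1, 2, 3, 4] mixed with gain 0.5 into a channel that also carries an extension component [1, 1, 1, 1] and an object component [0, 2, 0, 2] gives the decoded downmix [1.5, 4.0, 2.5, 5.0], from which the function recovers [1.0, 2.0, 3.0, 4.0].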

2) If the division method based on QR decomposition was used during encoding:

The decoded base channel division side information then comprises beta(m) (m = 1 … Nbe), bctob(n) (n = 1 … Nbc) and the compatible base channel mapping coefficient inverse matrix invHRQ, and the decoded downmix scheme comprises the downmix mapping functions fb(k, i) and fo(k, j).

Step 5.4b.1): Computing the compatible base channel downmix components

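For each downmix compatible base channel k, the extension base channel downmix component and the sound object downmix component (computed from the decoded extension base channel signals and the decoded sound objects according to the decoded downmix scheme) are removed from the decoded downmix compatible base channel signal to obtain the compatible base channel downmix component. All of these components together form the matrix Hs_bedcop.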

Step 5.4b.2): Inverse mapping

As described above, the downmix mapping function fb (k, i) at this time satisfies:

fb(k,i)(x(t))＝a(k,i)*x(t)

Using the decoded compatible base channel mapping coefficient inverse matrix invHRQ, Hbc is obtained by inverse mapping as follows:

Hbc＝invHRQ*Hs_bedcop

Row n of Hbc is the nth decoded compatible base channel signal.

If the encoder does not encode the compatible base channel mapping coefficient inverse matrix invHRQ, the decoder calculates invHRQ with the same method as the base channel dividing module of step 2.2).

Note that the expression above states the mathematical relationship of the inverse mapping, which admits various equivalent implementations. For example, Hbc can also be obtained as follows.

Construct the compatible base channel downmix coefficient matrix HAc, an Ns × Nbc matrix, from the downmix coefficients a(k, i) of the compatible base channels.

The QR decomposition of HAc is HAc = Q*HR, where Q is an Ns × Ns unitary matrix and HR is an Ns × Nbc upper triangular matrix, with M = Nbc ≤ Ns and r(n, n) > 0 for each n = 1 … M.

Since Q is an Ns × Ns unitary matrix, its inverse Q^-1 exists; construct the matrix

QHs_bedcop＝Q^-1*Hs_bedcop

Since M = min(Ns, Nbc) = Nbc and r(n, n) > 0 for each n = 1 … M, the first Nbc rows of HR form a square matrix HRm.

The square matrix HRm is triangular with r(n, n) > 0 for each n = 1 … M, so its inverse HRm^-1 exists. Taking the first Nbc rows of QHs_bedcop to form the matrix QHs_bedcopm, Hbc is calculated as:

Hbc＝HRm^-1*QHs_bedcopm

Row n of Hbc is the nth decoded compatible base channel signal.

More generally, if fb(k, i)(x(t)) = a(k, i)*fb(k, 1)(x(t)) and fb(k, 1) has an inverse function fb^-1(k, 1), then after Hbc is obtained by the calculation above, fb^-1(k, 1) must be applied to each decoded channel signal to obtain the final decoded compatible base channel signals.

Step 5.5) Base channel combining module

Combine the decoded compatible base channel signals and the decoded extension base channel signals according to the decoded base channel division side information to obtain the decoded base channel signals.

Step 5.5) may be skipped if the three-dimensional program has no base channel signals, if there is a simple one-to-one correspondence between the compatible base channel signals, the extension base channel signals and the base channels, or if the extension base channels already contain all base channels.
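Step 5.5) only places each decoded channel back at its base channel position. In this sketch, bctob and beta are the decoded side information as 0-based index lists (in the 5.1.4 example, bctob(n) = n and beta(m) = m + 6 become range(6) and [6, 7, 8, 9]), and treating a base channel that no side information covers as an error is our assumption.

```python
from typing import List, Optional, Sequence

Signal = List[float]


def combine_base_channels(compatible: Sequence[Signal], extension: Sequence[Signal],
                          bctob: Sequence[int], beta: Sequence[int],
                          nb: int) -> List[Signal]:
    """Rebuild the nb base channels from the compatible and extension channels."""
    bed: List[Optional[Signal]] = [None] * nb
    for n, sig in enumerate(compatible):
        bed[bctob[n]] = list(sig)  # nth compatible channel -> base channel bctob(n)
    for m, sig in enumerate(extension):
        bed[beta[m]] = list(sig)   # mth extension channel -> base channel beta(m)
    missing = [i for i, ch in enumerate(bed) if ch is None]
    if missing:
        raise ValueError(f"base channels {missing} were not assigned")
    return [ch for ch in bed if ch is not None]
```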

Step 5.6): three-dimensional sound rendering module

Receive the decoded compatible base channels, the decoded extension base channels and the decoded sound object data, perform three-dimensional sound rendering, and generate three-dimensional sound PCM data.

There is no ordering constraint between step 5.2) and step 5.3).

In a compatible stereo or surround sound system, only steps 5.1) and 5.2) are performed, yielding the decoded downmix compatible base channel signal.

Example 6

Support for audio/video code streams is the core requirement in application scenarios such as digital television systems, live network broadcasting, network on-demand and local playback. Digital television systems currently use the MPEG TS (Transport Stream) protocol, and existing streaming media protocols (such as Apple HTTP Live Streaming, MPEG-DASH and Microsoft Smooth Streaming) use audio/video mechanisms similar to MPEG TS. Hereafter, these transport protocols are collectively called TS protocols, and the audio/video code streams are encapsulated in TS code streams. Similarly, locally played media commonly use MPEG PS (Program Stream) or a similar audio/video packing protocol, in which case the audio/video code streams are encapsulated in the PS code stream.

TS streams are used for network transmission and PS streams for local playback, but they are essentially the same. Both organize the stream with a specific identifier (such as stream_type) that marks whether a segment of the stream is audio or video, and this identifier reserves some values for user extensions, which can carry a customized data stream (called a private stream in the invention).

The invention packs the extension coded data in a private stream. An ordinary stereo/surround processor cannot identify this private stream, so it cannot interpret it and discards it directly; a three-dimensional sound terminal or application recognizes the private stream and performs the subsequent decoding.
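The compatibility mechanism amounts to routing elementary streams by stream_type and skipping any type a player does not recognize. In ISO/IEC 13818-1, stream_type 0x1B denotes H.264/AVC video, 0x0F denotes AAC audio with ADTS transport syntax, and values 0x80 to 0xFF are user private. The specific private value 0xA0 used for the extension coded data here is a hypothetical choice, and the list of (stream_type, PID) pairs stands in for a parsed program map table.

```python
from typing import Dict, List, Tuple

STREAM_TYPE_H264 = 0x1B      # ISO/IEC 14496-10 (AVC) video
STREAM_TYPE_AAC_ADTS = 0x0F  # ISO/IEC 13818-7 audio with ADTS transport syntax
STREAM_TYPE_3D_EXT = 0xA0    # hypothetical user-private value for extension coded data


def route_streams(pmt: List[Tuple[int, int]], supports_3d: bool) -> Dict[str, int]:
    """Map each recognized stream role to its PID."""
    known = {STREAM_TYPE_H264: "video", STREAM_TYPE_AAC_ADTS: "audio"}
    if supports_3d:
        known[STREAM_TYPE_3D_EXT] = "extension"
    routed: Dict[str, int] = {}
    for stream_type, pid in pmt:
        role = known.get(stream_type)
        # Unknown stream types (the private stream, for a stereo/surround-only
        # player) are skipped, which is how step 6.2.1) ignores the extension data.
        if role is not None:
            routed[role] = pid
    return routed
```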

Reference specifications: GB/T 17975.1-2010, Information technology: Generic coding of moving pictures and associated audio information, Part 1: Systems;

GB/T 17975.3-2000, Generic coding of moving pictures and associated audio information, Part 2: Video;

GB/T 17975.3-2002, Generic coding of moving pictures and associated audio information.

Step 6.1) encoding

Uncompressed video is encoded by a video coding module (H.264, JPEG, etc.) into video coded data;

the downmix compatible base channels obtained by downmixing the three-dimensional sound signal (base channels and/or sound objects) are encoded by an audio coding module (e.g. AAC, AC3, MP3 or AVS) into audio coded data; the sound objects, the downmix scheme, the extension base channels and the base channel division side information are encoded by an extension coding module (e.g. Dolby Atmos, AVS2-P3 or MPEG-H) into extension coded data.

The audio coding module and the extension coding module may use common coding standards such as AAC, AC3, MP3, AVS, Dolby Atmos, AVS2-P3 and MPEG-H, or any proprietary format, and the two modules may use the same format or different formats. Typically, the audio coding module uses a common coding standard to remain compatible with existing audio/video systems.

Step 6.1) packing

In the TS/PS packing module, the video coded data is packed into a video stream, the audio coded data into an audio stream, and the extension coded data into a private stream, each conforming to the TS/PS specification.
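At the byte level, TS packing splits each elementary stream into fixed 188-byte packets tagged with a 13-bit PID, and the PMT announces which stream_type each PID carries. The sketch below shows only the 4-byte packet header layout of ISO/IEC 13818-1. The PIDs are hypothetical, the payloads are dummy bytes rather than real PES packets, no PAT/PMT is generated, and the last packet is padded with 0xFF bytes for brevity, whereas a conforming multiplexer would use adaptation-field stuffing.

```python
# Simplified MPEG-2 TS packetizer (header layout per ISO/IEC 13818-1).
TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - 4


def ts_packetize(pid, pes_bytes, cc_start=0):
    """Split pes_bytes into 188-byte TS packets on the given 13-bit PID."""
    packets = []
    cc = cc_start
    for offset in range(0, len(pes_bytes), TS_PAYLOAD_SIZE):
        chunk = pes_bytes[offset:offset + TS_PAYLOAD_SIZE]
        chunk += b"\xff" * (TS_PAYLOAD_SIZE - len(chunk))  # simplified padding
        pusi = 1 if offset == 0 else 0  # payload_unit_start_indicator
        header = bytes([
            0x47,                               # sync_byte
            (pusi << 6) | ((pid >> 8) & 0x1F),  # PUSI flag + PID bits 12..8
            pid & 0xFF,                         # PID bits 7..0
            0x10 | (cc & 0x0F),                 # payload only + continuity_counter
        ])
        packets.append(header + chunk)
        cc = (cc + 1) & 0x0F
    return packets


def ts_pid(packet):
    """Read the 13-bit PID back out of a TS packet header."""
    if packet[0] != 0x47:
        raise ValueError("lost sync")
    return ((packet[1] & 0x1F) << 8) | packet[2]


AUDIO_PID = 0x101    # hypothetical PID for the downmix compatible audio stream
PRIVATE_PID = 0x102  # hypothetical PID for the extension coded private stream

audio_packets = ts_packetize(AUDIO_PID, b"\x00" * 300)
private_packets = ts_packetize(PRIVATE_PID, b"\x01" * 100)
multiplex = audio_packets + private_packets  # naive, non-interleaved multiplex
print([hex(ts_pid(p)) for p in multiplex])  # ['0x101', '0x101', '0x102']
```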

For terminals or applications that support only stereo or surround sound:

Step 6.2.1):

The TS/PS stream unpacking module extracts the audio and video coded data from the code stream; the extension coded data is simply ignored, because the module cannot recognize the private stream that carries it.

Step 6.2.2):

The video coded data is decoded by a video decoding module and output to a display device;

the audio coded data is decoded by an audio decoding module to obtain the downmix compatible base channel, which is then played back.

For a terminal or application that supports three-dimensional sound:

Step 6.3.1):

The TS/PS stream unpacking module extracts the audio and video coded data from the code stream; because it can recognize the private stream carrying the extension coded data, it unpacks the extension coded data at the same time.

Step 6.3.2):

The audio coded data is decoded by an audio decoding module, which outputs the decoded downmix compatible base channel;

the extension coded data is decoded by an extension decoding module into the decoded downmix scheme, the decoded extension base channel, the decoded base channel division side information and the decoded sound objects;

Step 6.3.3):

The decoded downmix scheme, the decoded downmix compatible base channel, the decoded base channel division side information, the decoded extension base channel and the decoded sound objects are passed through a de-downmix module to obtain the compatible base channel;
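Under the same kind of linear downmix model, de-downmixing amounts to subtracting the contributions of the decoded extension base channels and sound objects and then undoing the gain of the compatible base channel that remains, in the spirit of the inverse mapping of claim 16 (`compot` and the index list `compat_idx` loosely mirror compot(n) and bctob(n)). This is a hypothetical sketch: it assumes each downmix channel carries exactly one compatible base channel with a nonzero gain, models the side information as index lists, and ignores coding loss.

```python
# Hypothetical de-downmix under a linear downmix model.

def de_downmix(y, ext, ext_idx, objects, fb, fo, compat_idx, compot):
    """Recover compatible base channels from decoded downmix channels.

    y[k]: decoded downmix compatible channel k
    ext[m]: decoded extension base channel at base index ext_idx[m]
    objects[j]: decoded sound object j
    fb, fo: gain matrices of the decoded downmix scheme
    compat_idx[n]: base index of compatible channel n
    compot[n]: downmix channel that carries compatible channel n
    """
    compat = []
    for n, i in enumerate(compat_idx):
        k = compot[n]
        residual = list(y[k])
        for m, x in enumerate(ext):  # remove extension base channel components
            g = fb[k][ext_idx[m]]
            for t in range(len(residual)):
                residual[t] -= g * x[t]
        for j, o in enumerate(objects):  # remove sound object components
            g = fo[k][j]
            for t in range(len(residual)):
                residual[t] -= g * o[t]
        gain = fb[k][i]  # assumed nonzero
        compat.append([v / gain for v in residual])  # inverse of the mapping gain
    return compat


fb = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
fo = [[0.5], [0.5]]
y = [[4.0, 1.0], [3.0, 2.0]]       # decoded downmix compatible channels
ext, ext_idx = [[2.0, 2.0]], [2]   # C is the extension base channel
objects = [[4.0, 0.0]]
compat = de_downmix(y, ext, ext_idx, objects, fb, fo, compat_idx=[0, 1], compot=[0, 1])
print(compat)  # [[1.0, 0.0], [0.0, 1.0]]
```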

Step 6.3.4):

According to the decoded base channel division side information, a base channel combination module combines the compatible base channel, the decoded extension base channel and the decoded sound objects to recover the base channels;
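A minimal sketch of the combination step, assuming the base channel division side information can be modelled as the original base-channel positions of the compatible and extension channels. That representation and the function name are hypothetical, and the sketch uses only the compatible and extension channels, as listed for the combination module in claim 12.

```python
# Hypothetical base channel combination: side information = index lists.

def combine_base_channels(compat, compat_idx, ext, ext_idx):
    """Place each channel back at its original base-channel position."""
    base = [None] * (len(compat_idx) + len(ext_idx))
    for ch, i in zip(compat, compat_idx):
        base[i] = ch
    for ch, i in zip(ext, ext_idx):
        base[i] = ch
    if any(ch is None for ch in base):
        raise ValueError("side information does not cover every base channel")
    return base


base = combine_base_channels([[1.0, 0.0], [0.0, 1.0]], [0, 1], [[2.0, 2.0]], [2])
print(base)  # [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
```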

Step 6.3.5):

The base channels and the decoded sound objects are processed by a rendering module to generate a three-dimensional multichannel PCM stream for playback.
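The rendering module is not specified further here. As a much simplified stand-in, the sketch below assumes the base channels already match the output loudspeaker layout and places each sound object between one pair of loudspeakers with a constant-power (sine/cosine) pan law. Real three-dimensional renderers, for example VBAP over loudspeaker triplets, are considerably more involved; the function names and the object tuple format are hypothetical.

```python
import math

# Toy renderer: bed channels pass through, objects are pairwise panned.


def pan_gains(p):
    """Constant-power gains for position p in [0, 1] (squares sum to 1 up to rounding)."""
    theta = p * math.pi / 2
    return math.cos(theta), math.sin(theta)


def render(base, objects, speaker_pairs):
    """base: channels in output layout; objects: (samples, pair_index, p) tuples."""
    out = [list(ch) for ch in base]
    for samples, pair_index, p in objects:
        a, b = speaker_pairs[pair_index]
        gain_a, gain_b = pan_gains(p)
        for t, s in enumerate(samples):
            out[a][t] += gain_a * s
            out[b][t] += gain_b * s
    return out


base = [[0.1, 0.1], [0.0, 0.0]]   # two-channel bed
objects = [([1.0, 0.5], 0, 0.5)]  # object centred between speakers 0 and 1
pcm = render(base, objects, speaker_pairs=[(0, 1)])
```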

As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A three-dimensional sound encoder, the three-dimensional sound comprising base channels and/or sound objects, characterized in that the encoder comprises: a downmix and base channel division module for receiving the three-dimensional sound, performing downmix and base channel division operations according to a downmix scheme, and outputting a downmix compatible base channel, an extension base channel and base channel division side information; a compatible coding module for receiving the downmix compatible base channel and outputting downmix compatible base channel coded data; an extension coding module for receiving the sound objects, the downmix scheme, the extension base channel and the base channel division side information and outputting extension coded data; and a packing module for receiving the downmix compatible base channel coded data and the extension coded data and either packing them separately to output a downmix compatible base channel data stream and an extension coded data stream, or packing them together (mixed packing) to output a three-dimensional sound data stream; when the three-dimensional sound includes no base channel, the downmix and base channel division module performs no base channel division, the compatible coding module performs no compatible coding, and the extension coding module does not encode the extension base channel or the base channel division side information; when the three-dimensional sound includes no sound object, the extension coding module does not encode sound objects.

2. The three-dimensional sound encoder according to claim 1, wherein the downmix and base channel division module comprises a downmix module and a base channel division module; the downmix module receives the base channels and the sound objects and outputs the downmix compatible base channel and the downmix scheme, and the base channel division module receives the base channels and the downmix scheme generated by the downmix module and outputs the extension base channel and the base channel division side information.

3. The three-dimensional sound encoder according to claim 1, wherein the downmix and base channel division module comprises a downmix module and a base channel division module; the downmix module receives the base channels, the sound objects and an externally input downmix scheme and outputs the downmix compatible base channel, and the base channel division module receives the base channels and the externally input downmix scheme and outputs the extension base channel and the base channel division side information.

4. The three-dimensional sound encoder according to claim 1, wherein the downmix and base channel division module comprises a downmix module and a base channel division module; the base channel division module receives the base channels and an externally input downmix scheme and outputs the extension base channel and the base channel division side information; the extension coded data output by the extension coding module is decoded by an extension decoding module, and the decoded downmix scheme, the decoded extension base channel, the decoded sound objects, the decoded base channel division side information and the base channels are input to the downmix module, which outputs the downmix compatible base channel.

5. The three-dimensional sound encoder according to any one of claims 2 to 4, wherein the downmix module downmixes the base channels and the sound objects into the downmix compatible base channel according to the downmix scheme; the downmix compatible base channel signal is divided into a base channel downmix component and a sound object downmix component, and the base channel downmix component is further divided into an extension base channel downmix component and a compatible base channel downmix component.

6. The three-dimensional sound encoder according to any one of claims 2 to 4, wherein the base channel division module divides the base channels into compatible base channels and extension base channels; the base channel division scheme it uses is determined by the channel configuration of the base channels, the multichannel system to be kept compatible, and the downmix mapping function.

7. The three-dimensional sound encoder according to claim 6, wherein the base channel division scheme determined by the division method based on the corresponding downmix channels includes:

S11: let $S_{bedt} = S_{bed}$ and $S_{srt} = S_{sr}$, where $S_{bed}$ is the set of base channel signals and $S_{sr}$ is the set of downmix compatible base channel signals;

$f_b(k, i)$ is the downmix mapping function;

for all channels $n$ belonging to $S_c$, $f_b(k, n) = 0$;

otherwise, go to step S15;

8. The three-dimensional sound encoder according to claim 6, wherein the base channel division scheme determined by the QR-decomposition-based division method includes:

S21: let $S_{bedc} = S_{bed}$, where $S_{bed}$ is the set of base channel signals;

9. The three-dimensional sound encoder according to claim 1, wherein the compatible coding module and the extension coding module use the same coding format or different coding formats.

10. The three-dimensional sound encoder according to claim 1, wherein the compatible coding module is an audio coding module that receives the downmix compatible base channel and outputs audio coded data, and the packing module is a TS/PS packing module that packs the audio coded data and the extension coded data separately or together (mixed packing) to output an audio stream and a private stream conforming to the TS/PS specification.

11. A three-dimensional sound coding method, comprising the steps of: downmixing the base channels and/or sound objects into a downmix compatible base channel according to a downmix scheme, dividing the base channels into extension base channels and compatible base channels, and determining base channel division side information, wherein, when the three-dimensional sound includes no base channel, no base channel division, no compatible coding and no extension coding of the extension base channel and the base channel division side information are performed, and when the three-dimensional sound includes no sound object, no extension coding of sound objects is performed; encoding the sound objects, the downmix scheme, the extension base channel and the base channel division side information to obtain extension coded data; encoding the downmix compatible base channel to generate downmix compatible base channel coded data; and packing the downmix compatible base channel coded data and the extension coded data separately or together (mixed packing) and outputting the result.

12. A three-dimensional sound decoder, characterized by comprising: a code stream separation module for receiving a downmix compatible base channel data stream and an extension coded data stream input as separately packed streams, and/or a three-dimensional sound data stream input as a mixed-packed stream, and separating and outputting the downmix compatible base channel data and the extension coded data; a compatible decoding module for receiving the downmix compatible base channel data and outputting a decoded downmix compatible base channel; an extension decoding module for receiving the extension coded data and outputting a decoded downmix scheme, a decoded extension base channel, decoded base channel division side information and decoded sound objects; a de-downmix module for receiving the decoded downmix compatible base channel, the decoded downmix scheme, the decoded extension base channel, the decoded base channel division side information and the decoded sound objects and outputting a compatible base channel; a base channel combination module for receiving the compatible base channel, the decoded extension base channel and the decoded base channel division side information and outputting the base channels; and a rendering module for receiving the base channels and the decoded sound objects and outputting a three-dimensional multichannel PCM stream.

13. A three-dimensional sound decoder for decoding output code streams conforming to the TS/PS specification, characterized by comprising: a TS/PS stream unpacking module for receiving a TS/PS stream, parsing an audio stream and a private stream from it, and outputting audio coded data from the audio stream and extension coded data from the private stream; an audio decoding module for receiving the audio coded data and outputting a decoded downmix compatible base channel; an extension decoding module for receiving the extension coded data and outputting a decoded downmix scheme, a decoded extension base channel, decoded base channel division side information and decoded sound objects; a de-downmix module for receiving the decoded downmix compatible base channel, the decoded downmix scheme, the decoded extension base channel, the decoded base channel division side information and the decoded sound objects and outputting a compatible base channel; a base channel combination module for receiving the compatible base channel, the decoded extension base channel and the decoded base channel division side information and outputting the base channels; and a rendering module for receiving the base channels and the decoded sound objects and outputting a three-dimensional multichannel PCM stream.

14. The three-dimensional sound decoder according to claim 12 or 13, wherein the de-downmix module removes, according to the decoded downmix scheme, the downmix components of the decoded extension base channel signals and/or the decoded sound objects from the decoded downmix compatible base channel signal, yielding the decoded compatible base channel signal.

15. The three-dimensional sound decoder according to claim 14, wherein the de-downmix module performs the following steps:

16. The three-dimensional sound decoder according to claim 15, wherein, for downmix and base channel division operations performed according to a base channel division scheme determined by the division method based on the corresponding downmix channels, the de-downmix module performs the following steps:

2) Inverse mapping: for each compatible base channel $n = 1, \dots, N_{bc}$, let $k = \mathrm{compot}(n)$ denote its corresponding downmix compatible base channel; following the inverse function $f_b^{-1}(k, \mathrm{bctob}(n))$ of the mapping function $f_b(k, \mathrm{bctob}(n))$ in the decoded downmix scheme, the downmix component on the decoded compatible base channel is inverse-mapped to obtain the decoded compatible base channel.

17. The three-dimensional sound decoder according to claim 15, wherein, for downmix and base channel division operations performed according to a base channel division scheme determined by the QR-decomposition-based division method, the de-downmix module performs:

All of … compose the matrix $H_{s\_bedcop}$;

$H_{bc} = \mathrm{invHRQ} \cdot H_{s\_bedcop}$;

row $n$ of $H_{bc}$ is the decoded compatible base channel signal.

18. A three-dimensional sound decoding method, comprising the steps of: acquiring a downmix compatible base channel data stream and an extension coded data stream input as separately packed streams, and/or a three-dimensional sound data stream input as a mixed-packed stream, and separating and outputting the downmix compatible base channel data and the extension coded data; decoding the downmix compatible base channel data to obtain a decoded downmix compatible base channel; decoding the extension coded data to output a decoded downmix scheme, a decoded extension base channel, decoded base channel division side information and decoded sound objects, and performing a de-downmix operation on these together with the decoded downmix compatible base channel to output a compatible base channel; combining the compatible base channel and the decoded extension base channel according to the decoded base channel division side information to generate the base channels; and performing three-dimensional sound rendering on the base channels and the decoded sound objects to generate a three-dimensional multichannel PCM stream.

19. A three-dimensional sound decoding method for decoding output code streams conforming to the TS/PS specification, characterized by comprising the steps of: acquiring an audio stream and a private stream that were packed and input separately, and outputting audio coded data and extension coded data; decoding the audio coded data to obtain a decoded downmix compatible base channel; decoding the extension coded data to output a decoded downmix scheme, a decoded extension base channel, decoded base channel division side information and decoded sound objects, and performing a de-downmix operation on these together with the decoded downmix compatible base channel to output a compatible base channel; combining the compatible base channel and the decoded extension base channel according to the decoded base channel division side information to generate the base channels; and performing three-dimensional sound rendering on the base channels and the decoded sound objects to generate a three-dimensional multichannel PCM stream.