WO2014005327A1

WO2014005327A1 - Method for encoding multichannel digital audio

Info

Publication number: WO2014005327A1
Application number: PCT/CN2012/078306
Authority: WO
Inventors: 闫建新; 王磊
Original assignee: 深圳广晟信源技术有限公司
Priority date: 2012-07-06
Filing date: 2012-07-06
Publication date: 2014-01-09
Also published as: CN103650036B; CN103650036A

Abstract

The present invention provides a method for encoding multichannel digital audio, comprising: dividing multichannel audio into a base layer and at least one enhancement layer; configuring the number of bytes for the base layer and for the at least one enhancement layer respectively; and encoding the base layer and the at least one enhancement layer respectively. The present invention avoids, to some extent, the loss of encoding efficiency caused by fine layering, while meeting the needs of applications in certain fields such as digital audio broadcasting. The invention is easy to implement, obtains the best overall sound quality by flexibly controlling the quality of the channels in each layer, readily satisfies channel encoding requirements, and is not subject to the various constraints of fine layering, thereby ensuring more efficient compression.

Description

Method for encoding multi-channel digital audio

Technical field

The present invention relates to the field of audio coding processing, and more particularly to a method of encoding multi-channel digital audio.

Background art

In the field of layered coding of multi-channel digital audio, there are already lossy and lossless digital audio coding techniques that use fine layering, such as the bit-sliced arithmetic coding of ISO/IEC 14496-3 MPEG-4 BSAC (Bit Sliced Arithmetic Coding), a coding method similar to MPEG-4 BSAC in AVS (Audio Video coding Standard Workgroup of China), and the lossless enhancement layer approach of MPEG-4 SLS (Scalable Lossless Coding). These methods divide the audio into fine layers and encode each layer separately. However, fine layering has the disadvantages that the layers are too thin, a large amount of auxiliary information is required, coding efficiency is low, the structure is complicated, and the processing logic is highly complex.

The prior art also includes a non-fine layered coding scheme: a scalable sampling rate coding algorithm, AAC-SSR (Advanced Audio Coding-Scalable Sampling Rate), is provided in both MPEG-4 Part 3 and MPEG-2 Part 7. It was first proposed by Sony, and its encoding architecture is similar to Sony's own ATRAC (Adaptive Transform Acoustic Coding). The scheme first passes the input digital audio signal through a 4-band polyphase quadrature filter bank (PQF, Polyphase Quadrature Filter), which divides it into four frequency bands; each of the four bands then undergoes either one 256-point MDCT (512-sample window length) or eight 32-point MDCTs (64-sample window length). The scheme can reduce the data rate by dropping the high PQF bands, and it achieves bit stream layering by reducing the number of frequency bands, thereby obtaining different bit rates and sampling rates. Its advantage is that long-block or short-block MDCT can be selected independently in each frequency band, so that high frequencies can use short-block coding to improve time resolution while low frequencies use long-block coding to obtain high frequency resolution. However, because the four PQF bands overlap, the coding efficiency of the transform-domain coefficients in the regions where adjacent bands meet is reduced.

Summary of the invention

To solve the above technical problem, the present invention provides a method for encoding multi-channel digital audio, comprising: dividing multi-channel audio into a base layer and at least one enhancement layer; configuring the number of bytes for the base layer and for the at least one enhancement layer respectively; and encoding the base layer and the at least one enhancement layer respectively.

Preferably, the multi-channel audio signal is divided into a base layer and an enhancement layer, wherein the base layer includes at least one full-band channel, the enhancement layer includes at least one full-band channel, and the number of full-band channels included in the base layer is not greater than the number of full-band channels included in the enhancement layer.

Preferably, for the case where the number of full-band channels included in the base layer is smaller than the number of full-band channels included in the enhancement layer, the method further includes: the number of bytes configured for the base layer is the total number of bytes of the data frame / 2, and the number of bytes per channel of the base layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the base layer); the number of bytes configured for the enhancement layer is the total number of bytes of the data frame / 2, and the number of bytes per channel of the enhancement layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the enhancement layer).

Preferably, for the case where the number of full-band channels included in the base layer is equal to the number of full-band channels included in the enhancement layer, the method further includes: configuring for the base layer a number of bytes greater than the total number of bytes of the data frame / 2; and configuring for the enhancement layer a number of bytes less than the total number of bytes of the data frame / 2.

Preferably, the method further comprises: configuring the same number of bytes for each full-band channel, namely the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer).

Preferably, the method further comprises: configuring, for each full-band channel in the base layer, a number of bytes equal to the total number of bytes of the data frame / the number of full-band channels included in the base layer, where (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer)); configuring for one channel of the enhancement layer a number of bytes greater than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer; and configuring for at least one of the remaining channels a number of bytes less than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer.

Preferably, the number of bytes is separately configured for the base layer and the enhancement layer according to a block size of LDPC coding, a channel coding condition, a characteristic of the base layer, and/or a characteristic of the enhancement layer in each transmission frame.

Preferably, the multi-channel audio signal is divided into a base layer and a plurality of enhancement layers, wherein the base layer includes at least one full-band channel, each of the plurality of enhancement layers includes at least one full-band channel, and the number of full-band channels included in the base layer is less than the sum of the numbers of full-band channels included in all enhancement layers.

Preferably, the number of bytes configured for the base layer is the total number of bytes of the data frame / 2, and the number of bytes per channel of the base layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the base layer); the sum of the numbers of bytes configured for the at least one enhancement layer is the total number of bytes of the data frame / 2, wherein the number of bytes of each full-band channel of the first enhancement layer is greater than the total number of bytes of the data frame / (2 * (the number of full-band channels included in the enhancement layers + the number of full-band channels included in the base layer)), and the number of bytes of each full-band channel of the remaining at least one enhancement layer is less than the total number of bytes of the data frame / (2 * (the number of full-band channels included in the enhancement layers + the number of full-band channels included in the base layer)).

Preferably, the same number of bytes is configured for each full-band channel, namely the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers).

Preferably, the number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, where (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers)); the number of bytes configured for one channel of the first enhancement layer is greater than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers; and the number of bytes configured for at least one of the remaining channels is less than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers.

Preferably, the number of bytes is separately configured for the base layer and the at least one enhancement layer according to the block size of the LDPC encoding, the channel coding condition, the characteristics of the base layer, and/or the characteristics of the enhancement layer in each transmission frame.

Preferably, the method further includes: encoding the base layer and the at least one enhancement layer respectively by using a DRA coding algorithm.

Preferably, the method further comprises: separately performing bandwidth expansion on the base layer and/or the at least one enhancement layer.

The invention also provides a method for encoding multi-channel digital audio, comprising: dividing a multi-channel audio signal into a base layer and an enhancement layer, wherein the base layer includes at least one full-band channel, the enhancement layer includes at least one full-band channel, and the number of full-band channels included in the base layer is not greater than the number of full-band channels included in the enhancement layer; configuring the number of bytes for the base layer and the enhancement layer respectively, wherein the number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, where (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer)), the number of bytes configured for one channel of the enhancement layer is greater than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer, and the number of bytes configured for at least one of the remaining channels is less than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer; and encoding the base layer and the enhancement layer respectively by using the DRA coding algorithm.

The invention also provides a method for encoding multi-channel digital audio, comprising: dividing a multi-channel audio signal into a base layer and a plurality of enhancement layers, wherein the base layer includes at least one full-band channel, each of the plurality of enhancement layers includes at least one full-band channel, and the number of full-band channels included in the base layer is not greater than the sum of the numbers of full-band channels included in all enhancement layers; configuring the number of bytes for the base layer and the at least one enhancement layer respectively, wherein the number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, where (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers)), the number of bytes configured for one channel of the first enhancement layer is greater than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers, and the number of bytes configured for at least one of the remaining channels is less than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers; and encoding the base layer and the at least one enhancement layer respectively by using the DRA coding algorithm.

The invention avoids the degradation of coding efficiency caused by fine layering while meeting the needs of applications in certain fields, such as digital audio broadcasting. The invention is simple to implement, allows the quality of each channel to be controlled flexibly so as to obtain the best overall sound quality, readily satisfies channel coding requirements, and is free of the various constraints of fine layering, thereby ensuring more efficient compression.

Brief description of the drawings

FIG. 1 is a schematic flow chart of an embodiment of the present invention;

FIG. 2 is a schematic diagram of a multi-channel digital audio two-layer structure according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a multi-channel digital audio multi-layer structure according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a two-layer structure of stereo left and right channels according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a two-layer structure of stereo sum and difference channels according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a surround sound two-layer structure according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a two-layer structure of surround sound according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a three-layer structure of surround sound according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a three-layer structure of surround sound according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a hierarchical structure of DRA & DRA+ surround sound according to an embodiment of the present invention.

Detailed description

The details of the technical contents, structural features, objects and effects of the present invention will be described in detail below with reference to the embodiments and the accompanying drawings.

Referring to the flow chart shown in FIG. 1, the multi-channel digital audio encoding method of the first embodiment of the present invention includes:

Step S1: dividing the multi-channel audio into a base layer and at least one enhancement layer;

Step S2: configuring the number of bytes for the base layer and for the at least one enhancement layer respectively;

Step S3: encoding the base layer and the at least one enhancement layer separately.
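The three steps can be sketched as a small pipeline. This is only an illustrative sketch: the `Layer` class, the function names, the channel labels, and the `dummy_encoder` placeholder are all assumptions introduced here and do not come from the source, and no real audio codec is invoked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Layer:
    name: str
    channels: List[str]   # full-band channel labels (illustrative)
    byte_budget: int = 0  # filled in by step S2


def split_layers(layout: Dict[str, List[str]]) -> List[Layer]:
    """Step S1: first entry is the base layer, the rest are enhancement layers."""
    return [Layer(name, list(chans)) for name, chans in layout.items()]


def configure_bytes(layers: List[Layer], budgets: List[int]) -> None:
    """Step S2: attach one byte budget to each layer."""
    if len(budgets) != len(layers):
        raise ValueError("one budget per layer is required")
    for layer, budget in zip(layers, budgets):
        layer.byte_budget = budget


def encode_layers(layers: List[Layer],
                  encoder: Callable[[List[str], int], bytes]) -> Dict[str, bytes]:
    """Step S3: encode every layer independently within its own budget."""
    out = {}
    for layer in layers:
        payload = encoder(layer.channels, layer.byte_budget)
        if len(payload) > layer.byte_budget:
            raise ValueError(f"{layer.name} exceeds its byte budget")
        out[layer.name] = payload
    return out


def dummy_encoder(channels: List[str], budget: int) -> bytes:
    # Placeholder only: a real system would call an audio codec here.
    return bytes(budget)


layers = split_layers({"base": ["L", "R"], "enh1": ["C", "Ls", "Rs"]})
configure_bytes(layers, [600, 600])
frames = encode_layers(layers, dummy_encoder)
```

Keeping the byte budget as a per-layer attribute means the allocation rule in step S2 can be swapped without touching the splitting or encoding steps.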

Referring to FIG. 2, the second embodiment of the present invention provides a two-layer structure in which the multi-channel audio signal is divided into a base layer and an enhancement layer, wherein the base layer includes at least one full-band channel, the enhancement layer includes at least one full-band channel, and the number of full-band channels included in the base layer is not greater than the number of full-band channels included in the enhancement layer.

Let the base layer contain k full-band channels and the enhancement layer contain m full-band channels. Configuring the number of full-band channels in the base layer to be not greater than that in the enhancement layer, that is, k<=m, means that the base layer encodes relatively few channels, which ensures higher quality for them.

Regarding the allocation of the payload (that is, the number of bytes) between layers, the present invention proposes several embodiments, on the premise that the total payload (i.e., the total number of bytes) is constant.

The third embodiment is a base-layer-emphasis configuration scheme. Because the base layer is more important and the enhancement layer contributes less to the overall sound quality, the payload is divided into roughly equal parts between the two layers. This suits in particular application scenarios in which the enhancement layer may not be received correctly because of channel conditions or the like, so that the quality of the base layer is emphasized.

For the case where the number of full-band channels included in the base layer is smaller than the number of full-band channels included in the enhancement layer, the number of bytes configured for the base layer in this embodiment is the total number of bytes of the data frame / 2, and the number of bytes of each channel of the base layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the base layer); the number of bytes configured for the enhancement layer is the total number of bytes of the data frame / 2, and the number of bytes of each channel of the enhancement layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the enhancement layer).

Let the total number of bytes of a data frame be D. When k<m, the base layer and the enhancement layer are each allocated D/2 bytes; the effective number of bytes per channel of the base layer is D/(2*k), and the number of bytes per channel of the enhancement layer is D/(2*m).

For the case where the number of full-band channels included in the base layer is equal to the number of full-band channels included in the enhancement layer, the number of bytes configured for the base layer in this embodiment is greater than the total number of bytes of the data frame / 2, and the number of bytes configured for the enhancement layer is less than the total number of bytes of the data frame / 2.

That is, when k=m, the base layer can be configured with more than D/2 bytes, for example 3*D/5, with the enhancement layer configured with 2*D/5; other ratios can also be used.

In this way, each channel of the base layer is represented by more bytes than each channel of the enhancement layer, so each channel of the base layer obtains better sound quality.
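The base-layer-emphasis rule above can be worked through numerically. The following is a minimal sketch under stated assumptions: the function name and the `base_share` parameter are introduced here for illustration, exact fractions are returned without the rounding to whole bytes that a real encoder would need, and the frame size of 1200 bytes used in the example is not from the source.

```python
from fractions import Fraction


def base_emphasis_two_layer(D, k, m, base_share=Fraction(3, 5)):
    """Return (base_layer, enh_layer, per_base_channel, per_enh_channel) in bytes.

    D is the total number of bytes of the data frame; k and m are the numbers
    of full-band channels in the base and enhancement layers (k <= m).
    base_share is only used when k == m and must exceed 1/2; the default 3/5
    is the example ratio mentioned in the text.
    """
    if not 1 <= k <= m:
        raise ValueError("expected 1 <= k <= m")
    if k < m:
        share = Fraction(1, 2)          # both layers get D/2
    else:
        share = Fraction(base_share)    # base layer gets more than D/2
        if not Fraction(1, 2) < share < 1:
            raise ValueError("base_share must lie in (1/2, 1) when k == m")
    base = D * share
    enh = D - base
    return base, enh, base / k, enh / m


# k < m: 2 base channels, 3 enhancement channels, 1200-byte frame.
print(base_emphasis_two_layer(1200, 2, 3))
# k == m: base layer takes 3*D/5, enhancement layer 2*D/5.
print(base_emphasis_two_layer(1200, 2, 2))
```

With D = 1200, k = 2, and m = 3, each base channel receives D/(2*k) = 300 bytes and each enhancement channel D/(2*m) = 200 bytes, which matches the inequality the text relies on.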

The fourth embodiment is a k:m configuration scheme, or uniform configuration scheme. The third embodiment above highlights the importance of the base layer; from the overall multi-channel perspective, however, it is more reasonable to give equal attention to every full-band channel. As a result, when only the base layer can be decoded correctly because of channel conditions or the like, the sound quality obtained is slightly inferior to that of the third embodiment, but when both the base layer and the enhancement layer can be decoded, the overall multi-channel quality is superior to that of the third embodiment.

In this embodiment, the same number of bytes is configured for each full-band channel, namely the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer). Let the total number of bytes of one audio frame be D; the number of bytes of each full-band channel is then D/(k+m). Because every full-band channel is encoded with the same number of bytes, every full-band channel has the same sound quality.
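A short sketch of the uniform k:m split follows. The function name and the handling of any remainder bytes (reported separately as `spare`) are assumptions made here; the source only states the per-channel value D/(k+m).

```python
def uniform_two_layer(D, k, m):
    """Give every full-band channel the same whole number of bytes.

    Returns a dict with the per-channel budget, the resulting layer totals,
    and any bytes left over when D is not divisible by k + m.
    """
    if k < 1 or m < 1:
        raise ValueError("each layer needs at least one full-band channel")
    per_channel, spare = divmod(D, k + m)
    return {
        "per_channel": per_channel,
        "base_layer": per_channel * k,
        "enh_layer": per_channel * m,
        "spare": spare,
    }


print(uniform_two_layer(1200, 2, 3))
```

Note that the layer totals come out in the ratio k:m, so with k < m the base layer receives less than half of the frame, unlike in the base-layer-emphasis scheme.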

The fifth embodiment is a near-k:m configuration scheme, which is neither a base-layer-emphasis configuration nor a uniform configuration. The third embodiment described above highlights the importance of the base layer, but in the general case of k < m the base-layer-emphasis configuration may over-emphasize the quality of the base layer. The fourth embodiment treats the full-band channels of the base layer as ordinary full-band channels. The most reasonable configuration, that is, one close to k:m, should therefore be chosen according to the specific multi-channel situation. This embodiment considers each full-band channel in the base layer to be more important than the full-band channels of the enhancement layer, so it should be given more bytes than in the uniform configuration and fewer than in the base-layer-emphasis configuration; the m full-band channels in the enhancement layer also need to be considered individually. In particular, in the typical 5.1 surround sound case, the center channel in a movie audio system generally carries the dialogue and should be given more attention than the two surround channels. This configuration provides better multi-channel quality than the previous two configurations.

The number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, where (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer)); the number of bytes configured for one channel of the enhancement layer is greater than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer, and the number of bytes configured for at least one of the remaining channels is less than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer.

That is, let the total number of bytes of one audio frame be D; the number of bytes configured for each full-band channel in the base layer is D/k, with D/2 > D/k > D/(k+m). The enhancement layer is likewise given an appropriate configuration according to the characteristics of each of its full-band channels. For example, in 5.1 surround sound, the center channel should be configured with more than D(1-1/k)/m bytes, and each channel of the left and right surround pair with less than D(1-1/k)/m bytes.
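The enhancement-layer threshold D(1-1/k)/m quoted above can be evaluated directly. The sketch below takes that expression literally as written; the function names and the example values of D, k, m and the per-channel byte counts are illustrative assumptions, not figures from the source.

```python
from fractions import Fraction


def enhancement_threshold(D, k, m):
    """Evaluate D * (1 - 1/k) / m exactly."""
    return Fraction(D) * (1 - Fraction(1, k)) / m


def check_enhancement_split(D, k, m, favoured, others):
    """True if the favoured channel (e.g. center) is above the threshold
    and every other enhancement channel (e.g. surrounds) is below it."""
    t = enhancement_threshold(D, k, m)
    return favoured > t and all(b < t for b in others)


# Hypothetical 1200-byte frame, k = 2 base channels, m = 3 enhancement channels.
print(enhancement_threshold(1200, 2, 3))             # 200
print(check_enhancement_split(1200, 2, 3, 240, [180, 180]))
```

The check only tests the two inequalities stated for the enhancement layer; it does not decide how far above or below the threshold each channel should sit, which the text leaves to the characteristics of the individual channels.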

The sixth embodiment is a constrained configuration scheme that depends on the requirements of the channel coding conditions. Because channel coding such as LDPC (Low Density Parity Check) coding is block coding, and the two layers need to adopt different protection levels, each layer of the layered coding must be given the most reasonable arrangement and configuration according to the block size of the LDPC coding in each transmission frame and the characteristics of the multi-channel base layer and enhancement layer. In the constrained configuration, the general allocation of bytes between the base layer and the enhancement layer is similar to that of the third embodiment, but it also takes into account the total capacity of the LDPC coded blocks of each layer in the transmission frame.

The scheme configures the number of bytes for the base layer and the enhancement layer respectively according to the block size of the LDPC code, the channel coding condition, the characteristics of the base layer, and/or the characteristics of the enhancement layer in each transmission frame.
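One way to picture the block-size constraint is to allocate whole LDPC blocks to each layer. The sketch below is an assumption-laden illustration, not the method of the source: it assumes every LDPC block carries the same usable payload (`block_bytes`), that a block belongs to exactly one layer, and that ties are rounded in favour of the base layer. The function name and parameters are hypothetical.

```python
import math
from fractions import Fraction


def block_aligned_split(total_blocks, block_bytes, base_share=Fraction(1, 2)):
    """Split a transmission frame's LDPC blocks between two layers.

    Returns (base_layer_bytes, enh_layer_bytes), each a whole multiple of
    block_bytes. base_share is the desired fraction for the base layer.
    """
    if total_blocks < 2:
        raise ValueError("need at least one block per layer")
    # Round up so that the base layer never gets less than its target share.
    base_blocks = math.ceil(total_blocks * Fraction(base_share))
    # Keep at least one block for each layer.
    base_blocks = min(max(base_blocks, 1), total_blocks - 1)
    enh_blocks = total_blocks - base_blocks
    return base_blocks * block_bytes, enh_blocks * block_bytes


print(block_aligned_split(9, 100))                   # odd block count
print(block_aligned_split(10, 100, Fraction(3, 5)))  # 3:2 target split
```

With nine 100-byte blocks and a target of one half, the base layer ends up with five blocks and the enhancement layer with four, so the split is only approximately D/2 each.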

With reference to the multi-channel digital audio multi-layer structure shown in FIG. 3, the present invention also proposes a multi-layering scheme. The multi-channel audio signal is divided into a base layer and a plurality of enhancement layers, wherein the base layer includes at least one full-band channel, each of the plurality of enhancement layers includes at least one full-band channel, and the number of full-band channels included in the base layer is less than the sum of the numbers of full-band channels included in all enhancement layers.

The seventh embodiment, built on the multi-layering scheme, is a base-layer-emphasis configuration scheme in which the base layer occupies half or more of the payload. The reasons for and characteristics of the scheme are similar to those of the third embodiment and are therefore not described again. In this scheme, the number of bytes configured for the base layer is the total number of bytes of the data frame / 2, and the number of bytes per channel of the base layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the base layer); the sum of the numbers of bytes configured for the at least one enhancement layer is the total number of bytes of the data frame / 2, wherein the number of bytes of each full-band channel of the first enhancement layer is greater than the total number of bytes of the data frame / (2 * (the sum of the numbers of full-band channels included in all enhancement layers + the number of full-band channels included in the base layer)), and the number of bytes of each full-band channel of the remaining at least one enhancement layer is less than the total number of bytes of the data frame / (2 * (the sum of the numbers of full-band channels included in all enhancement layers + the number of full-band channels included in the base layer)).

Taking a three-layer structure of a base layer and two enhancement layers as an example, let the total number of bytes of an audio frame be D, and let the base layer contain k full-band channels, the first enhancement layer m full-band channels, and the second enhancement layer n full-band channels. The number of bytes allocated to the base layer is D/2, and the effective number of bytes per channel of the base layer is D/(2*k). The sum of the numbers of bytes of the two enhancement layers is also D/2, but the number of bytes of each full-band channel of the first enhancement layer is greater than D/(2*(m+n)), and the number of bytes of each full-band channel of the second enhancement layer is less than D/(2*(m+n)). In this way each base layer channel can be represented by more bytes than the channels of the two enhancement layers, so each channel of the base layer obtains better sound quality; at the same time, the first enhancement layer is also coded at higher quality than the second enhancement layer. If there are three or more enhancement layers, the number of bytes of each full-band channel of the first enhancement layer is greater than D/(2*(m+n)), and the sum of the numbers of bytes of each full-band channel of the second enhancement layer, the third enhancement layer, through the N-th enhancement layer is less than D/(2*(m+n)).
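The three-layer example fixes only inequalities for the two enhancement layers, so some extra rule is needed to pick concrete values. The sketch below assumes a hypothetical weight `w > 1`, meaning each first-enhancement-layer channel gets `w` times the bytes of a second-enhancement-layer channel; this weight, the function name, and the example numbers are assumptions introduced here, not part of the source.

```python
from fractions import Fraction


def base_emphasis_three_layer(D, k, m, n, w=Fraction(3, 2)):
    """Return per-channel bytes (base, first_enh, second_enh) as exact fractions.

    The base layer gets D/2 in total, i.e. D/(2*k) per channel. The two
    enhancement layers share the other D/2 so that
        m * first_enh + n * second_enh == D/2  and  first_enh == w * second_enh.
    """
    w = Fraction(w)
    if w <= 1:
        raise ValueError("w must be greater than 1")
    base = Fraction(D, 2 * k)
    second_enh = Fraction(D) / (2 * (m * w + n))
    first_enh = w * second_enh
    return base, first_enh, second_enh


D, k, m, n = 1200, 2, 1, 2
base, first_enh, second_enh = base_emphasis_three_layer(D, k, m, n)
threshold = Fraction(D, 2 * (m + n))   # D/(2*(m+n)) from the text
print(base, first_enh, second_enh, threshold)
```

For any `w > 1` the first-layer value lands above D/(2*(m+n)) and the second-layer value below it, while the two enhancement layers together still consume exactly D/2.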

The eighth embodiment is a k:m:n configuration, or a uniform configuration scheme. The reason and features of the configuration are similar to those of the fourth embodiment, and thus are not described herein.

This scheme configures the same number of bytes for each full-band channel, namely the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers). Let the total number of bytes of one audio frame be D, and let the base layer contain k full-band channels, the first enhancement layer m full-band channels, and the second enhancement layer n full-band channels. The number of bytes of each full-band channel is then D/(k+m+n). Because each full-band channel is represented (encoded) with the same number of bytes, each full-band channel has the same sound quality.

The ninth embodiment is a near-k:m:n configuration, which lies between the two configuration schemes provided by the seventh and eighth embodiments. The reasons for and features of the configuration are similar to those of the fifth embodiment and are therefore not described again here.

The number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, where (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers)); the number of bytes configured for one channel of the first enhancement layer is greater than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers, and the number of bytes configured for at least one of the remaining channels is less than the total number of bytes of the data frame * (1 - 1 / the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers.

Taking a three-layer structure of a base layer and two enhancement layers as an example, let the total number of bytes of an audio frame be D; the number of bytes of each full-band channel in the base layer is D/k, with D/2 > D/k > D/(k+m+n). The full-band channels in the first enhancement layer are given a larger allocation than those in the second enhancement layer. For example, in 5.1 surround sound, the first enhancement layer transmits the center channel and the subwoofer channel, and the second enhancement layer transmits the left surround and right surround channels, so m=1 and n=2. More than D(1-1/k)/3 bytes should be configured for the full-band center channel, and less than D(1-1/k)/3 bytes for each channel of the left and right surround pair; the left and right surround channels of the second enhancement layer are given the same allocation (or are coded together as one channel pair).

The tenth embodiment is a limited configuration, and is dependent on the channel coding conditions. The reason and characteristics of the configuration are similar to those of the sixth embodiment, and therefore will not be described again. The scheme configures the number of bytes for the base layer and the at least one enhancement layer respectively according to the block size of the LDPC code, the channel coding condition, the characteristics of the base layer, and/or the characteristics of the enhancement layer in each transmission frame.

In each of the foregoing embodiments, the present invention proposes encoding the base layer and the at least one enhancement layer respectively by using the DRA coding algorithm. A bandwidth extension enhancement coding tool may also be used to perform bandwidth extension on the base layer and/or the at least one enhancement layer respectively.

The following are examples of applying the layering and coding scheme proposed by the present invention to different types of audio signals.

FIG. 4 shows a schematic diagram of the two-layer structure for the stereo left and right channels. The stereo audio signal has only two independent full-band channels, so the base layer transmits the left channel and the enhancement layer transmits the right channel. In this case the two layers should use the uniform configuration, that is, the left and right channels are given the same sound quality and therefore the same number of bytes. The base layer and the enhancement layer can each use the bandwidth extension enhancement coding tool for bandwidth extension, which is indicated by a dashed box in the figure.

Referring to the schematic diagram of the two-layer structure of the stereo sum and difference channels shown in FIG. 5, there are only two full-band channels in this example, so only a two-layer scheme is possible. For stereo signals, sum and difference encoding is usually performed to improve coding efficiency. Since there is a certain correlation between the two channels of a stereo signal, the difference signal is likely to have a smaller dynamic range than the right channel, so encoding it requires less data. In addition, for some applications, such as karaoke stereo signals, one channel carries the accompaniment and the other carries the vocals (speech), and the sum channel can represent both by mixing the two channels together. Based on these two observations, the sum channel (with possible bandwidth extension) should be used as the base layer and the difference channel (with possible bandwidth extension) as the enhancement layer, and the base layer emphasis configuration should be adopted. When only the base layer can be decoded correctly, this application example performs better than the left and right channel layering.
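
The sum and difference layering described above can be sketched on plain sample lists. This is a minimal illustration, assuming the common (L+R)/2 and (L-R)/2 scaling; the source does not give the scaling, and the function names are hypothetical.

```python
def to_sum_diff(left, right):
    """Form the sum (base layer) and difference (enhancement layer) channels."""
    sum_ch = [(l + r) / 2 for l, r in zip(left, right)]
    diff_ch = [(l - r) / 2 for l, r in zip(left, right)]
    return sum_ch, diff_ch


def from_sum_diff(sum_ch, diff_ch=None):
    """Rebuild left/right; fall back to the mono mix if only the base layer decoded."""
    if diff_ch is None:
        # Enhancement layer missing: both outputs carry the mixed signal
        return list(sum_ch), list(sum_ch)
    left = [s + d for s, d in zip(sum_ch, diff_ch)]
    right = [s - d for s, d in zip(sum_ch, diff_ch)]
    return left, right


left = [1.0, 0.5, -0.25]
right = [0.5, 0.5, 0.25]
s, d = to_sum_diff(left, right)
print(from_sum_diff(s, d))  # ([1.0, 0.5, -0.25], [0.5, 0.5, 0.25])
print(from_sum_diff(s))     # ([0.75, 0.5, 0.0], [0.75, 0.5, 0.0])
```

The second call shows why the sum channel suits the base layer: a receiver that decodes only the base layer still hears a mix of both channels rather than the left channel alone.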

Examples focusing on 5.1 surround sound are given below. Referring to the schematic diagram of the surround sound two-layer structure shown in FIG. 6, this embodiment is 5.1 surround sound, with 5 full-band channels and 1 subwoofer channel. The stereo left channel (shown as L) and right channel (shown as R) are transmitted in the base layer; the other channels are transmitted in the enhancement layer, and the order of the channels in the enhancement layer is the center channel (shown as C), the subwoofer channel (shown as LFE), and the left surround and right surround channels (shown as LS and RS respectively). Each full-band channel can optionally use a bandwidth extension enhancement tool (shown by dashed lines in the figure) to improve coding efficiency; in addition, for each channel pair (L&R and LS&RS respectively), a parametric stereo coding tool can be selected to reduce information redundancy. In that case, the corresponding channels are downmixed to mono (shown as M0 and M1 respectively) for basic encoding. The base layer emphasis configuration and the near k:m configuration can both be used for the two layers.

Referring to the schematic diagram of the surround sound two-layer structure shown in FIG. 7, the audio layering structure is similar to that of the previous embodiment, except that the enhancement layer reorders the channels so that the left surround and right surround channels are encoded first, followed by the center channel and the subwoofer channel.

Referring to the schematic diagram of the surround sound three-layer structure shown in FIG. 8, in this example the 5.1 channels are divided into three layers for encoding. The base layer encodes the left and right channels (L and R), optionally with the bandwidth extension enhancement tool and the parametric stereo coding tool to improve coding efficiency. The first enhancement layer encodes the center channel (C), optionally with the bandwidth extension enhancement tool, followed by the subwoofer channel (LFE). The second enhancement layer transmits the left surround and right surround channels (LS and RS), optionally with the bandwidth extension and parametric stereo enhancement tools. If the parametric stereo enhancement tool is selected, the basic encoding of the stereo pair is replaced by mono encoding of the downmixed pair, for example L&R downmixed to M0 and LS&RS downmixed to M1. This application example should adopt the near k:m:n configuration.
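
The three-layer channel assignment of FIG. 8 can be written down as a small data structure. The sketch below also shows which channels are available when only the first few layers are decoded; it assumes layers are used in order starting from the base layer, which the source does not state explicitly, and the names are hypothetical.

```python
# Channel assignment of the three-layer 5.1 example (base layer first)
LAYERS = [
    ("base", ["L", "R"]),
    ("enhancement 1", ["C", "LFE"]),
    ("enhancement 2", ["LS", "RS"]),
]


def decodable_channels(layers_received):
    """Channels available when only the first layers_received layers decode."""
    channels = []
    for _name, layer_channels in LAYERS[:layers_received]:
        channels.extend(layer_channels)
    return channels


print(decodable_channels(1))  # ['L', 'R']
print(decodable_channels(2))  # ['L', 'R', 'C', 'LFE']
print(decodable_channels(3))  # ['L', 'R', 'C', 'LFE', 'LS', 'RS']
```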

Referring to the schematic diagram of the surround sound three-layer structure shown in FIG. 9, the audio layered structure is similar to the previous embodiment, but the first enhancement layer and the second enhancement layer are interchanged.

Referring to the schematic diagram of the DRA & DRA+ surround sound layering structure shown in FIG. 10, a two-layer surround sound structure is adopted, consisting of a base layer and an enhancement layer. In the base layer, DRA (Digital Rise Audio) stereo encoding is applied to the stereo pair consisting of the left and right channels, optionally with the SBR (Spectral Band Replication) bandwidth extension technique and the PS (Parametric Stereo) coding technique. If the parametric stereo coding technique is selected, the DRA coding portion is modified to encode only the downmixed mono signal, and if the SBR technique is also selected, the DRA coding portion is further modified to encode only the low-band part of the downmixed mono signal. In the enhancement layer, the center channel C is DRA-encoded first, optionally using SBR bandwidth extension; the subwoofer channel LFE is then DRA-encoded; finally, the left and right surround channels (LS and RS) are DRA-encoded as a stereo pair, optionally with SBR bandwidth extension and PS parametric stereo encoding to improve coding efficiency for the surround pair. The data structure used in this example should be the near k:m:n configuration, or the limited configuration when applied to digital audio broadcasting.

The present invention can divide an audio signal into four or more layers, but generally adopts a two- or three-layer structure, which is easy to implement. Because the layering is based on channels, the best overall sound quality is achieved by flexibly controlling the quality of the channels in each layer. Channel coding requirements are easy to meet: since LDPC channel coding requires a fixed size for each coding block, channel-based coarse layering can be arranged to fit the channel requirements. The various restrictions of fine layering are not needed. For example, in MPEG AAC-BSAC audio coding, arithmetic coding and its related auxiliary data are required for the MDCT coefficients every 32 groups, which lowers overall coding efficiency, so coarse layering can ensure more efficient compression.

The method for multi-channel digital audio encoding of the present invention achieves the above objects and effects by the method disclosed above, but the above disclosure is only a preferred embodiment of the present invention, and the scope of the present invention is not limited thereto. Other equivalent modifications or variations of the invention are intended to be included within the scope of the appended claims.

Claims

1. A method for encoding multi-channel digital audio, comprising:

Dividing multi-channel audio into a base layer and at least one enhancement layer;

Configuring a number of bytes for each of the base layer and the at least one enhancement layer;

Encoding the base layer and the at least one enhancement layer separately.

2. The method of encoding multi-channel digital audio according to claim 1, wherein:

The multi-channel audio signal is divided into a base layer and an enhancement layer;

Wherein the base layer comprises at least one full-band channel, and the enhancement layer comprises at least one full-band channel;

The number of full-band channels included in the base layer is not greater than the number of full-band channels included in the enhancement layer.

3. The method for encoding multi-channel digital audio according to claim 2, wherein, for the case where the number of full-band channels included in the base layer is smaller than the number of full-band channels included in the enhancement layer, the method further comprises:

The number of bytes configured for the base layer is the total number of bytes of the data frame / 2, and the number of bytes per channel of the base layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the base layer);

The number of bytes configured for the enhancement layer is the total number of bytes of the data frame / 2, and the number of bytes per channel of the enhancement layer is the total number of bytes of the data frame / (2 * the number of full-band channels included in the enhancement layer).

4. The method for encoding multi-channel digital audio according to claim 2, wherein, for the case where the number of full-band channels included in the base layer is equal to the number of full-band channels included in the enhancement layer, the method further comprises:

The number of bytes configured for the base layer is greater than the total number of bytes of the data frame/2;

The number of bytes configured for the enhancement layer is less than the total number of bytes of the data frame/2.

5. The method of encoding multi-channel digital audio according to claim 2, further comprising:

The same number of bytes are configured for each full-band channel, which is the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer).

6. The method for encoding multi-channel digital audio according to claim 2, further comprising:

The number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, and (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer));

The number of bytes configured for one channel of the enhancement layer is greater than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer, and the number of bytes configured for the remaining at least one channel is less than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer.

7. A method of encoding multi-channel digital audio according to claim 2, wherein:

The number of bytes is separately configured for the base layer and the enhancement layer according to the block size of the LDPC encoding in each transmission frame, the channel coding condition, the characteristics of the base layer, and/or the characteristics of the enhancement layer.

8. The method of encoding multi-channel digital audio according to claim 1, wherein the multi-channel audio signal is divided into a base layer and a plurality of enhancement layers;

Wherein the base layer comprises at least one full-band channel, and the plurality of enhancement layers respectively comprise at least one full-band channel;

The number of full-band channels included in the base layer is less than the sum of the numbers of full-band channels included in all enhancement layers.

9. A method of encoding multi-channel digital audio according to claim 8 wherein:

The sum of the numbers of bytes configured for the at least one enhancement layer is the total number of bytes of the data frame / 2, wherein the number of bytes of each full-band channel of the first enhancement layer is greater than the total number of bytes of the data frame / 2(the number of full-band channels included in the enhancement layer + the number of full-band channels included in the base layer), and the number of bytes of each full-band channel of the remaining at least one enhancement layer is less than the total number of bytes of the data frame / 2(the number of full-band channels included in the enhancement layer + the number of full-band channels included in the base layer).

10. A method of encoding multi-channel digital audio according to claim 8 wherein:

The same number of bytes are configured for each full-band channel, which is the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the number of full-band channels included in all enhancement layers).

11. A method of encoding multi-channel digital audio according to claim 8 wherein:

The number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, and (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers));

The number of bytes configured for one channel of the first enhancement layer is greater than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers, and the number of bytes configured for the remaining at least one channel is less than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers.

12. The method of encoding multi-channel digital audio according to claim 8, wherein the number of bytes is configured for the base layer and the at least one enhancement layer respectively according to the LDPC coding block size, the channel coding conditions, the characteristics of the base layer, and/or the characteristics of the enhancement layer.

13. The method for encoding multi-channel digital audio according to any one of claims 1 to 12, further comprising: encoding the base layer and the at least one enhancement layer by using a DRA coding algorithm, respectively.

14. A method of encoding multi-channel digital audio, comprising:

The multi-channel audio signal is divided into a base layer and an enhancement layer; wherein the base layer includes at least one full-band channel, and the enhancement layer includes at least one full-band channel; the number of full-band channels included in the base layer is not greater than the number of full-band channels included in the enhancement layer;

Configuring a number of bytes for the base layer and the enhancement layer respectively; wherein the number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, and (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the number of full-band channels included in the enhancement layer)); the number of bytes configured for one channel of the enhancement layer is greater than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer, and the number of bytes configured for the remaining at least one channel is less than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the number of full-band channels included in the enhancement layer;

The base layer and the enhancement layer are respectively encoded by a DRA coding algorithm.

15. A method of encoding multi-channel digital audio, comprising:

The multi-channel audio signal is divided into a base layer and a plurality of enhancement layers; wherein the base layer includes at least one full-band channel, and the plurality of enhancement layers respectively comprise at least one full-band channel; the number of full-band channels included in the base layer is not greater than the sum of the numbers of full-band channels included in all enhancement layers;

Configuring a number of bytes for each of the base layer and the at least one enhancement layer; wherein the number of bytes configured for each full-band channel in the base layer is the total number of bytes of the data frame / the number of full-band channels included in the base layer, and (the total number of bytes of the data frame / 2) > (the total number of bytes of the data frame / the number of full-band channels included in the base layer) > (the total number of bytes of the data frame / (the number of full-band channels included in the base layer + the sum of the numbers of full-band channels included in all enhancement layers)); the number of bytes configured for one channel of the first enhancement layer is greater than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers, and the number of bytes configured for the remaining at least one channel is less than the total number of bytes of the data frame * (1 - 1/the number of full-band channels included in the base layer) / the sum of the numbers of full-band channels included in all enhancement layers;

The base layer and the at least one enhancement layer are respectively coded by using a DRA coding algorithm.