TWI458365B

TWI458365B - Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation, and storage medium storing a parameter representation

Info

Publication number: TWI458365B
Application number: TW099130574A
Authority: TW
Inventors: Heiko Purnhagen; Lars Villemoes; Jonas Engdegard; Jonas Roeden; Kristofer Kjoerling
Original assignee: Dolby Int Ab
Priority date: 2005-04-12
Filing date: 2005-08-09
Publication date: 2014-10-21
Also published as: TW201116078A

Description

Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation, and storage medium storing a parameter representation

The present invention relates to coding of multi-channel representations of audio signals using spatial parameters. It teaches new methods for estimating and defining suitable parameters for reconstructing a multi-channel signal from a number of channels that is smaller than the number of output channels. In particular, it aims at minimizing the bit rate of the multi-channel representation and at providing a coded representation of the multi-channel signal that allows the data to be easily encoded and decoded for all possible channel configurations.

PCT/SE02/01372, entitled "Efficient and scalable parametric stereo coding for low bit rate audio coding applications", has shown that a stereo image closely resembling the original can be reconstructed from a mono signal, given a very compact representation of the stereo image. The basic principle is to divide the input signal into frequency bands and time segments and, for each of these, to estimate the inter-channel intensity difference (IID) and the inter-channel coherence (ICC). The first parameter measures the power distribution between the two channels in the specific frequency band, and the second estimates the correlation between the two channels in that band. On the decoder side, the stereo image is recreated from the mono signal by distributing the mono signal between the two output channels in accordance with the IID data and by adding a decorrelated signal in order to retain the channel correlation properties of the original stereo channels.
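
As a rough illustration of the two parameters just described, the following Python sketch estimates an IID and an ICC value for one frequency band and one time segment. The function name `iid_icc`, the dB scaling of the IID and the normalized cross-correlation used for the ICC are assumptions made for illustration; the cited application may define the estimators differently.

```python
import math

def iid_icc(left, right, eps=1e-12):
    """Estimate IID (in dB) and ICC for one band and time segment.

    left, right: real-valued subband samples of the two channels.
    eps guards against division by zero for silent segments.
    """
    p_left = sum(x * x for x in left)      # power of the first channel
    p_right = sum(x * x for x in right)    # power of the second channel
    cross = sum(x * y for x, y in zip(left, right))
    iid_db = 10.0 * math.log10((p_left + eps) / (p_right + eps))
    icc = cross / math.sqrt((p_left + eps) * (p_right + eps))
    return iid_db, icc

# Example: the left channel is the right channel scaled by 2,
# so the power ratio is 4 and the channels are fully correlated.
iid_db, icc = iid_icc([2.0, 0.0, -2.0, 0.0], [1.0, 0.0, -1.0, 0.0])
```

In this sketch a fully correlated pair gives an ICC close to 1, and an ICC near 0 would indicate that the decoder needs to add a larger share of decorrelated signal.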

For the multi-channel case (multi-channel in this context meaning more than two output channels), several additional issues have to be addressed. Several multi-channel configurations exist. The most commonly known is the 5.1 configuration (center channel, front left/right channels, surround left/right channels and the LFE channel). However, many other configurations exist. From the point of view of the complete encoder/decoder system, it is desirable to have a system that can use the same parameter set (e.g. IID and ICC), or subsets thereof, for all channel configurations. ITU-R BS.775 defines several down-mix schemes for obtaining a channel configuration comprising fewer channels from a given channel configuration. Instead of always having to decode all channels and then rely on a down-mix, it is desirable to have a multi-channel representation that enables a receiver to extract the parameters relevant to the channel configuration at hand before decoding the channels. Furthermore, an inherently scalable parameter set is desirable from a scalable or embedded coding point of view, where, for example, the data corresponding to the surround channels can be stored in an enhancement layer of the bitstream.

In contrast to the above, it may also be desirable to use different parameter definitions depending on the characteristics of the signal being processed, in order to switch to the parameterization that results in the lowest bit rate overhead for the current signal segment.

Another representation of multi-channel signals using a sum signal or down-mix signal and additional parametric side information is known in the art as binaural cue coding (BCC). This technique is described in F. Baumgarte and C. Faller, "Binaural Cue Coding - Part I: Psychoacoustic Fundamentals and Design Principles", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, November 2003, and in C. Faller and F. Baumgarte, "Binaural Cue Coding - Part II: Schemes and Applications", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, November 2003.

In general, binaural cue coding is a method for multi-channel spatial rendering based on one down-mixed audio channel and side information. Several parameters that are calculated by a BCC encoder and used by a BCC decoder for audio reconstruction or audio rendering include inter-channel level differences, inter-channel time differences and inter-channel coherence parameters. These inter-channel cues are the determining factors for the perception of a spatial image. The parameters are given for blocks of time samples of the original multi-channel signal and are also frequency-selective, so that each block of multi-channel signal samples has several cues for several frequency bands. In the general case of C playback channels, the inter-channel level differences and the inter-channel time differences are considered in each subband between pairs of channels, i.e. for each channel relative to a reference channel. One channel is defined as the reference channel for every inter-channel level difference. With the inter-channel level differences and the inter-channel time differences, a source can be rendered to any direction between one of the loudspeaker pairs of the playback setup in use. To determine the width or diffuseness of a rendered source, it is sufficient to consider one parameter per subband for all audio channels. This parameter is the inter-channel coherence parameter. The width of the rendered source is controlled by modifying the subband signals such that all possible channel pairs have the same inter-channel coherence parameter.

In BCC coding, all inter-channel level differences are determined between the reference channel 1 and every other channel. When, for example, the center channel is chosen as the reference channel, a first inter-channel level difference is calculated between the left channel and the center channel, a second between the right channel and the center channel, a third between the left surround channel and the center channel, and a fourth between the right surround channel and the center channel. This scenario describes a five-channel scheme. When the five-channel scheme additionally includes a low frequency enhancement channel (also known as the "sub-woofer" channel), a fifth inter-channel level difference is calculated between the low frequency enhancement channel and the center channel, which is the single reference channel.

When the original multi-channel signal is reconstructed using the single down-mix channel (also termed the "mono" channel) and the transmitted cues, such as ICLD (inter-channel level difference), ICTD (inter-channel time difference) and ICC (inter-channel coherence), the spectral coefficients of the mono signal are modified using these cues. The level modification is performed using a positive real number that determines the level modification for each spectral coefficient. The inter-channel time difference is generated using a complex number of unit magnitude that determines a phase modification for each spectral coefficient. Another function determines the coherence influence. The level modification factors for each channel are computed by first calculating the factor for the reference channel. The factor for the reference channel is computed such that, for each frequency partition, the sum of the powers of all channels equals the power of the sum signal. Then, based on the level modification factor of the reference channel, the level modification factors for the other channels are calculated using the respective ICLD parameters.

Thus, to perform BCC synthesis, the level modification factor for the reference channel has to be calculated. This calculation requires all ICLD parameters for a frequency band. Then, based on the level modification for this single channel, the level modification factors for the other channels, i.e. the channels that are not the reference channel, can be calculated.
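
The dependency described above can be made concrete with a small Python sketch. It assumes, for illustration only, that each ICLD is given in dB as the power of a non-reference channel relative to the reference channel and that the power of the sum signal is normalized to 1; the function name `bcc_level_gains` is hypothetical.

```python
import math

def bcc_level_gains(icld_db):
    """Level modification factors for one frequency partition.

    icld_db: ICLDs in dB of the non-reference channels relative to
             the reference channel (assumed convention).
    Returns (reference gain, list of gains for the other channels),
    chosen so that the squared gains sum to 1, i.e. the total output
    power equals the power of the sum signal.
    """
    ratios = [10.0 ** (d / 10.0) for d in icld_db]   # power ratios
    g_ref = 1.0 / math.sqrt(1.0 + sum(ratios))       # needs ALL ICLDs
    gains = [g_ref * math.sqrt(r) for r in ratios]
    return g_ref, gains

g_ref, gains = bcc_level_gains([0.0, 0.0, 0.0, 0.0])
# Changing a single ICLD changes the reference gain and therefore
# the gain of every other channel as well.
g_ref_changed, gains_changed = bcc_level_gains([0.0, 0.0, 0.0, 6.0])
```

Because the reference gain depends on the sum over all ICLDs, a lost or corrupted ICLD in this sketch affects every output channel, which is the weakness discussed in the following paragraph.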

The drawback of this approach is that a complete reconstruction requires every inter-channel level difference. This requirement is even more problematic when an error-prone transmission channel is present. Since every inter-channel level difference is needed to calculate each multi-channel output signal, any error in a transmitted inter-channel level difference results in an error in the reconstructed multi-channel signal. Moreover, reconstruction is impossible when an inter-channel level difference is lost during transmission, even if that level difference was needed only for, say, the left surround or the right surround channel, which are not that important for multi-channel reconstruction because most of the information is contained in the front left channel (hereinafter the left channel), the front right channel (hereinafter the right channel) and the center channel. The situation becomes even worse when the inter-channel level difference of the low frequency enhancement channel is lost during transmission. In this case, no multi-channel reconstruction, or only an erroneous one, is possible, even though the low frequency enhancement channel is not decisive for the listener's listening comfort. Thus, an error in a single inter-channel level difference propagates to errors in every reconstructed output channel.

A problem of parametric multi-channel representations is that inter-channel level differences, such as ICLDs in BCC coding or balance values in other parametric multi-channel representations, are usually given as relative rather than absolute values. In BCC, an ICLD parameter describes the level difference between a channel and a reference channel. Balance values can also be given as the ratio between the two channels of a channel pair. When the multi-channel signal is reconstructed, these level differences or balance parameters are applied to a base channel, which can be a mono base channel or a stereo base channel signal having two base channels. The energy contained in the at least one base channel is thus distributed among the, for example, five or six reconstructed output channels. Hence, the absolute energy in a reconstructed output channel is determined by the inter-channel level difference or balance parameter and by the energy of the down-mix signal at the receiver input.

Level variations occur whenever the energy of the down-mix signal at the receiver input differs from that of the down-mix signal output by the encoder. In this context it should be emphasized that, when the parameters are frequency-selective, such level variations can, depending on the parameterization scheme used, lead not only to a general change in loudness of the reconstructed signal but also to serious artifacts. When, for example, a certain frequency band of the down-mix signal is manipulated more than a band elsewhere in the frequency range, this manipulation is clearly noticeable in the reconstructed output signal, because the frequency components of the output channels in that band have a level that is too low or too high.

Additionally, time-varying level manipulations cause the overall level of the reconstructed output signal to vary over time, which is perceived as an annoying artifact.

While the above concerned level manipulations caused by encoding, transmitting and decoding a down-mix signal, other level deviations can occur as well. Because of phase dependencies between the different channels that are down-mixed into one or two channels, the mono signal may have an energy that is not equal to the sum of the energies in the original signal. Since the down-mix is typically performed sample-wise, e.g. by adding the time waveforms, a phase difference of, for example, 180 degrees between the left and the right signal results in a complete cancellation of both channels in the down-mix signal and hence in zero energy, although both signals certainly have some signal energy. While such an extreme case is very unlikely in practice, energy variations still occur because the signals are, of course, not all completely uncorrelated. Since the energy of the reconstructed output signal then differs from the energy of the original multi-channel signal, such variations also lead to loudness fluctuations in the reconstructed output signal and to artifacts.
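
The effect of phase relations on the down-mix energy can be checked numerically. The following Python sketch, which assumes a plain sample-wise sum as the down-mix and uses a hypothetical helper `downmix_energy_ratio`, compares the energy of the down-mix with the sum of the channel energies for an in-phase and an anti-phase pair.

```python
import math

def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)

def downmix_energy_ratio(channels):
    """Energy of the sample-wise sum divided by the summed channel energies."""
    mono = [sum(samples) for samples in zip(*channels)]
    return energy(mono) / sum(energy(c) for c in channels)

n = 64
left = [math.sin(2.0 * math.pi * 4.0 * k / n) for k in range(n)]
inverted = [-v for v in left]                         # 180 degree phase shift

in_phase = downmix_energy_ratio([left, left])         # identical channels
anti_phase = downmix_energy_ratio([left, inverted])   # complete cancellation
```

A ratio of 1 would mean that the down-mix carries exactly the summed energy of the original channels. In this sketch the in-phase pair overshoots that value and the anti-phase pair drops to zero, which is the kind of deviation a level parameter is meant to correct.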

It is an object of the present invention to provide a parameterization concept that results in a multi-channel reconstruction with improved output quality.

This object is achieved by an apparatus for generating a level parameter in accordance with claim 1, an apparatus for generating a reconstructed multi-channel representation in accordance with claim 7, a method for generating a level parameter in accordance with claim 9, a method for generating a reconstructed multi-channel representation in accordance with claim 10, a computer program in accordance with claim 11, or a parameter representation in accordance with claim 12.

The present invention is based on the finding that, for high-quality reconstruction and in view of flexible encoding/transmission and decoding schemes, an additional level parameter is transmitted together with the down-mix signal or the parametric representation of a multi-channel signal, so that a multi-channel reconstructor can use this level parameter together with the level difference parameters and the down-mix signal to regenerate a multi-channel output signal that does not suffer from level variations or from artifacts induced by frequency-selective level changes.

In accordance with the present invention, the level parameter is calculated such that the energy of the at least one down-mix channel, weighted by (e.g. multiplied or divided by) the level parameter, is equal to the sum of the energies of the original channels.

In one embodiment, the level parameter is derived from the ratio between the energy of the down-mix channel(s) and the sum of the energies of the original channels. In this embodiment, any level difference between the down-mix channel(s) and the original multi-channel signal is calculated on the encoder side and written into the data stream as a level correction factor. This factor is treated as an additional parameter that is likewise given for a block of samples of the down-mix channel(s) and for a certain frequency band. Thus, a new level parameter is added for each block and frequency band for which inter-channel level differences or balance parameters exist.
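
A minimal Python sketch of this embodiment follows. It assumes that the level parameter is the plain energy ratio for one block and one frequency band and that the decoder divides the down-mix energy by it; the names `level_parameter` and `apply_level_parameter` are hypothetical, and a real system would quantize the value before transmission.

```python
import math

def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)

def level_parameter(downmix, originals, eps=1e-12):
    """Encoder side: ratio of the down-mix energy to the summed
    energy of the original channels (one block, one band)."""
    e_orig = sum(energy(ch) for ch in originals)
    return (energy(downmix) + eps) / (e_orig + eps)

def apply_level_parameter(downmix, level):
    """Decoder side: rescale the down-mix so that its energy, divided
    by the level parameter, matches the original energy sum."""
    scale = 1.0 / math.sqrt(level)
    return [v * scale for v in downmix]

# Two identical channels: the sample-wise sum has twice the energy
# of the two originals together, so the level parameter is about 2.
a = [1.0, 0.0]
b = [1.0, 0.0]
mono = [x + y for x, y in zip(a, b)]
level = level_parameter(mono, [a, b])
corrected = apply_level_parameter(mono, level)
```

After the correction, the energy of `corrected` equals the summed energy of `a` and `b`, so relative parameters such as balance values distribute the right absolute amount of energy.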

The present invention also provides flexibility, since it allows a down-mix of a multi-channel signal to be transmitted that differs from the down-mix on which the parameters are based. This situation arises, for example, when a broadcast station does not wish to broadcast the down-mix signal generated by a multi-channel decoder but rather a down-mix generated by a sound engineer in a studio, i.e. a down-mix based on the subjective and creative impression of a human being. Nevertheless, the broadcaster may also wish to transmit multi-channel parameters relating to this "master down-mix". In accordance with the present invention, the adaptation between the parameter set and the master down-mix is provided by the level parameter, which in this case is the level difference between the master down-mix and the parametric down-mix, the latter being the down-mix on which the parameter set is based.

An advantage of the present invention is that the additional level parameter provides improved output quality and improved flexibility, since a parameter set relating to one down-mix signal can be adapted to another down-mix that was not generated during parameter calculation.

To reduce the bit rate, delta coding of the new level parameter as well as quantization and entropy coding are preferably applied. In particular, delta coding yields a high coding gain, because the variation from band to band or from time block to time block is not large, so that the resulting difference values are relatively small, which allows a good coding gain when combined with a subsequent entropy coder such as a Huffman coder.
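
The quantization and delta coding step can be sketched in Python as follows. The uniform quantizer, the 1.5 dB step size and the function names are assumptions for illustration; the entropy coder that would follow (e.g. a Huffman coder) is omitted.

```python
def quantize(values_db, step_db=1.5):
    """Uniform quantization of level values in dB to integer indices.
    The step size is a hypothetical choice."""
    return [round(v / step_db) for v in values_db]

def delta_encode(indices):
    """Replace each index by its difference to the previous one
    (across bands or across time blocks)."""
    deltas, previous = [], 0
    for index in indices:
        deltas.append(index - previous)
        previous = index
    return deltas

def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the indices."""
    indices, previous = [], 0
    for delta in deltas:
        previous += delta
        indices.append(previous)
    return indices

levels_db = [3.0, 4.4, 4.6, 6.1]     # slowly varying values over bands
indices = quantize(levels_db)
deltas = delta_encode(indices)
restored = delta_decode(deltas)
```

Because neighbouring values change slowly, most entries of `deltas` are small integers, which is what makes a subsequent entropy coder effective.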

In a preferred embodiment of the present invention, a parametric multi-channel signal representation is used that includes at least two different balance parameters, each indicating the balance between a different channel pair. In particular, flexibility, scalability, error robustness and even bit rate efficiency result from the fact that the first channel pair, on which the first balance parameter is based, differs from the second channel pair, on which the second balance parameter is based, with the four channels forming these channel pairs all being different from one another.

Thus, the inventive concept departs from the single reference channel concept and uses a multi-balance or super-balance concept, which is more intuitive and more natural with respect to the human impression of sound. In particular, the channel pairs underlying the first and second balance parameters can include original channels, down-mix channels or, preferably, certain combinations of input channels.

It has been found that a balance parameter derived from the center channel as the first channel and the sum of the left original channel and the right original channel as the second channel of the channel pair is especially useful for providing an exact energy distribution between the center channel and the left and right channels. Note in this context that these three channels normally carry most of the information of the sound scene, and that, in particular, the left-right stereo localization is influenced not only by the balance between left and right but also by the balance between the center and the sum of left and right. A preferred embodiment of the present invention reflects this observation by using this balance parameter.

Preferably, when a single mono down-mix signal is transmitted, it has been found that, in addition to the center/left-plus-right balance parameter, a left/right balance parameter, a rear-left/rear-right balance parameter and a front/back balance parameter form the optimum solution for a bit rate-efficient parametric representation that is flexible, error-robust and largely free of artifacts.
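
To make the four balance parameters tangible, the Python sketch below computes them from per-band channel energies. It is a simplified illustration under stated assumptions: the dictionary keys, the dB scaling and the use of summed energies for combined channels (ignoring cross-correlation and any down-mix weights) are not taken from the text, and the exact definitions used by the invention may differ.

```python
import math

def balance_db(first_energy, second_energy, eps=1e-12):
    """Balance between two (possibly combined) channels in dB."""
    return 10.0 * math.log10((first_energy + eps) / (second_energy + eps))

def balance_parameters(e):
    """e: band energies keyed by 'L', 'R', 'C', 'Ls', 'Rs' (assumed names)."""
    front = e["L"] + e["R"] + e["C"]
    back = e["Ls"] + e["Rs"]
    return {
        "center_vs_left_plus_right": balance_db(e["C"], e["L"] + e["R"]),
        "left_vs_right": balance_db(e["L"], e["R"]),
        "rear_left_vs_rear_right": balance_db(e["Ls"], e["Rs"]),
        "front_vs_back": balance_db(front, back),
    }

params = balance_parameters({"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.0, "Rs": 1.0})
```

Each parameter in this sketch refers to its own channel pair, so none of them depends on a single shared reference channel.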

On the receiver side, in contrast to BCC synthesis, in which each channel is calculated from the transmitted information alone, the inventive multi-balance representation additionally makes use of information on the down-mix scheme that was used to generate the down-mix channel(s). Thus, information on the down-mix scheme, which is not used in prior art systems, is used for the up-mix in addition to the balance parameters. The up-mix operation is therefore performed such that the balance between the channels of the reconstructed multi-channel signal that form the channel pair of a balance parameter is determined by that balance parameter.

This concept, i.e. having different channel pairs for different balance parameters, makes it possible to generate some channels without knowledge of every transmitted balance parameter. In particular, the left, right and center channels can be reconstructed without any knowledge of the rear-left/rear-right balance or of the front/back balance. This allows very fine-grained scalability, since extracting an additional parameter from a bitstream or transmitting an additional balance parameter to a receiver allows one or more additional channels to be reconstructed. This contrasts with the prior art single-reference system, in which every inter-channel level difference is needed to reconstruct all, or even only a subgroup, of the reconstructed output channels.

The inventive concept is also flexible in that the choice of balance parameters can be adapted to a particular reconstruction environment. When, for example, a five-channel setup forms the original multi-channel setup and a four-channel setup forms the reconstruction setup, which has only a single surround loudspeaker positioned, for example, behind the listener, a front/back balance parameter allows the combined surround channel to be calculated without any knowledge of the left surround channel and the right surround channel. This contrasts with a single-reference-channel system, in which the inter-channel level difference for the left surround channel and the inter-channel level difference for the right surround channel have to be extracted from the data stream. Then the left surround channel and the right surround channel have to be calculated. Finally, both channels have to be added to obtain the single surround loudspeaker channel for the four-channel reconstruction setup. None of these steps is needed with the balance parameter representation: because this more intuitive and more user-oriented representation is not restricted to a single reference channel but also allows a combination of original channels to serve as one channel of a balance parameter channel pair, the combined surround channel is delivered automatically.
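
The four-channel example can be illustrated with a short Python sketch. It assumes that the front/back balance is transmitted in dB as the ratio of front energy to back energy and that the total band energy is already known (for instance from the level-corrected down-mix); the function name `split_front_back` is hypothetical.

```python
import math

def split_front_back(total_energy, front_back_db):
    """Split the total band energy into front and combined surround
    energy using only the front/back balance parameter."""
    ratio = 10.0 ** (front_back_db / 10.0)   # front energy / back energy
    back = total_energy / (1.0 + ratio)
    front = total_energy - back
    return front, back

# Front carries 1.5 times the energy of the back channels.
front_back_db = 10.0 * math.log10(1.5)
front, back = split_front_back(5.0, front_back_db)
```

In this sketch the energy of the combined surround channel follows from a single parameter, without any left surround or right surround value being extracted or summed.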

The present invention relates to the problem of a parameterized multi-channel representation of audio signals. It provides an efficient way to define suitable parameters for the multi-channel representation, as well as the ability to extract the parameters representing the desired channel configuration without having to decode all channels. The invention further solves the problem of choosing the optimal parameter configuration for a given signal segment, in order to minimize the bit rate required to code the spatial parameters for that segment. The invention also outlines how decorrelation methods previously applicable only to the two-channel case can be applied in a general multi-channel environment.

In preferred embodiments, the invention comprises the following features:

- down-mixing the multi-channel signal to a one- or two-channel representation on the encoder side;

- given the multi-channel signal, defining the parameters representing the multi-channel signal, either so as to minimize the bit rate on a flexible per-frame basis or so as to enable the decoder to extract the channel configuration at bitstream level;

- on the decoder side, extracting the relevant parameter set given the channel configuration currently supported by the decoder;

- generating the required number of mutually decorrelated signals given the present channel configuration;

- reconstructing the output signals given the parameter set decoded from the bitstream data and the decorrelated signals;

- defining a parameterization of the multi-channel audio signal such that the same parameters, or a subset of the parameters, can be used irrespective of the channel configuration;

- defining a parameterization of the multi-channel audio signal such that the parameters can be used in a scalable coding scheme in which subsets of the parameter set are transmitted in different layers of the scalable stream;

- defining a parameterization of the multi-channel audio signal such that the energy reconstruction of the output signals from the decoder is not impaired by the underlying audio codec used to code the down-mix signal;

- switching between different parameterizations of the multi-channel audio signal so as to minimize the bit rate overhead for coding the parameterization;

- defining a parameterization of the multi-channel audio signal that includes a parameter representing an energy correction factor for the down-mix signal;

- using several mutually decorrelated decorrelators to reconstruct the multi-channel signal; and

- reconstructing the multi-channel signal from an up-mix matrix calculated from the transmitted parameter set.

The present invention will now be described by way of illustrative examples with reference to the accompanying drawings; these examples do not limit the scope or spirit of the invention.

The embodiments described below are merely illustrative of the principles of the present invention for multi-channel representation of audio signals. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is therefore the intent that the invention be limited only by the scope of the patent claims that follow and not by the specific details presented by way of description and explanation of the embodiments herein.

In the following description of the present invention, which outlines how to parameterize the IID and ICC parameters and how to apply them in order to reconstruct a multi-channel representation of audio signals, it is assumed that all signals referred to are subband signals in a filter bank, or some other frequency-selective representation of a part of the whole frequency range of the corresponding channel. It is therefore understood that the present invention is not limited to a specific filter bank. The invention is outlined below for one frequency band of the subband representation of the signal, and the same operations apply to all subband signals.

Although a balance parameter is also termed an "inter-channel intensity difference (IID)" parameter, it is emphasized that a balance parameter between a channel pair need not be the ratio between the energy or intensity in the first channel of the pair and the energy or intensity in the second channel of the pair. In general, the balance parameter indicates the localization of a sound source between the two channels of the channel pair. Although this localization is usually given by energy/level/intensity differences, other characteristics of the signal can be used, such as a power measure for both channels or the time or frequency envelopes of the channels.

Fig. 1 shows the different channels of a 5.1 channel configuration, where a(t) 101 represents the left surround channel, b(t) 102 the left front channel, c(t) 103 the center channel, d(t) 104 the right front channel, e(t) 105 the right surround channel, and f(t) 106 the LFE (low frequency effects) channel.

Assuming that we define the expectation operator as:

the energies of the channels outlined above can then be defined as follows (here exemplified by the left surround channel):

$A = E\left[a^{2}(t)\right]$

On the encoder side, the five channels are downmixed to a 2-channel representation or a 1-channel representation. This can be done in several ways; one commonly used way is the ITU downmix, defined as follows. 5.1 to 2-channel downmix:

$l_d(t) = \alpha\, b(t) + \beta\, a(t) + \gamma\, c(t) + \delta\, f(t)$

$r_d(t) = \alpha\, d(t) + \beta\, e(t) + \gamma\, c(t) + \delta\, f(t)$

and the 5.1 to 1-channel downmix:

Commonly used values for the constants α, β, γ and δ are: α = 1, β = γ = √0.5, and δ = 0.
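The 2-channel downmix equations and the commonly used constants can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the names `itu_downmix` and `energy` are hypothetical, signals are plain lists of samples for one subband, and the mean-square average used for the energy is one possible reading of the expectation operator, not a form confirmed by this text.

```python
import math

def itu_downmix(a, b, c, d, e, f,
                alpha=1.0, beta=math.sqrt(0.5),
                gamma=math.sqrt(0.5), delta=0.0):
    """5.1 -> 2-channel downmix of equal-length sample sequences.

    a: left surround, b: left front, c: center,
    d: right front, e: right surround, f: LFE.
    """
    l_d = [alpha * bt + beta * at + gamma * ct + delta * ft
           for at, bt, ct, ft in zip(a, b, c, f)]
    r_d = [alpha * dt + beta * et + gamma * ct + delta * ft
           for dt, et, ct, ft in zip(d, e, c, f)]
    return l_d, r_d

def energy(x):
    """Mean-square energy of a segment (an assumed time average)."""
    return sum(v * v for v in x) / len(x)

# Center-only input: both downmix channels receive gamma * c(t).
zeros = [0.0, 0.0]
l_d, r_d = itu_downmix(zeros, zeros, [2.0, -2.0], zeros, zeros, zeros)
```

With a center-only input, the left and right downmix channels are identical, which is why the center channel counts toward both halves of the speaker setup.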

The IID parameters are defined as energy ratios of two arbitrarily chosen channels or weighted groups of channels. Given the energies of the channels outlined above for the 5.1 channel configuration, several sets of IID parameters can be defined.

Fig. 7 shows a general downmixer 700 that uses the above equations to calculate a single mono channel m or two stereo channels l_d and r_d. In general, the downmixer uses certain downmix information. In the preferred embodiment of a linear downmix, this downmix information includes the weighting factors α, β, γ and δ. It is known in the art that more or fewer constant or non-constant weighting factors can be used.

In an ITU-recommended downmix, α is set to 1, β and γ are set equal to the square root of 0.5, and δ is set to 0. Generally, the factor α can vary between 0.5 and 1.5. In addition, the factors β and γ can differ from each other and vary between 0 and 1. The same holds for the low frequency enhancement channel f(t): the factor δ for this channel can vary between 0 and 1. Furthermore, the factors for the left downmix and the right downmix need not be equal to each other. This becomes clear when considering a non-automatic downmix, performed for example by a sound engineer. The sound engineer is directed to performing a creative downmix rather than a downmix governed by any mathematical law, and is guided by his own creative feeling. When this "creative" downmix is recorded by a certain parameter set, it is used in accordance with the present invention by the inventive upmixer shown in Fig. 8, which is guided not only by the parameters but also by additional information on the downmix scheme.

When a linear downmix has been performed as in Fig. 7, the weighting parameters are the preferred information on the downmix scheme to be used by the upmixer. When, however, other information used in the downmix scheme is present, an upmixer can also use this other information as information on the downmix scheme. Such other information can be, for example, certain matrix elements, or certain factors or functions within the matrix elements of an upmix matrix (as indicated, for example, in Fig. 11).

Given the 5.1 channel configuration outlined in Fig. 1, observe how other channel configurations relate to it. For a 3-channel case no surround channels are available, i.e. B, C and D are available according to the notation above. For a 4-channel configuration, B, C and D are available, together with a combination of A and E that represents the single surround channel or, as it is more commonly called in this context, the back channel.

The present invention uses IID parameters that apply to all of these channel configurations, i.e. the 4-channel subset of the 5.1 channel configuration has a corresponding subset within the IID parameter set describing the 5.1 channels. The following IID parameter set solves this problem:

Evidently, the r₁ parameter corresponds to the energy ratio between the left downmix channel and the right downmix channel. The r₂ parameter corresponds to the energy ratio between the center channel and the left and right front channels. The r₃ parameter corresponds to the energy ratio between the three front channels and the two surround channels. The r₄ parameter corresponds to the energy ratio between the two surround channels. The r₅ parameter corresponds to the energy ratio between the LFE channel and all other channels.
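The verbal definitions of r₁ to r₅ can be turned into a small Python sketch. The exact formulas are not reproduced in this text, so everything below is an assumption chosen to match the descriptions: the channels are treated as mutually uncorrelated, each energy is weighted by the square of its downmix factor, the center and LFE energies are counted twice because they feed both downmix channels, and the choice of numerator and denominator in each ratio is a guess. The function name `iid_parameters` is hypothetical.

```python
def iid_parameters(A, B, C, D, E, F, alpha, beta, gamma, delta):
    """Balance parameters r1..r5 from channel energies (assumed forms)."""
    a, e = beta ** 2 * A, beta ** 2 * E      # weighted surround energies
    b, d = alpha ** 2 * B, alpha ** 2 * D    # weighted front left/right
    c, f = gamma ** 2 * C, delta ** 2 * F    # weighted center and LFE
    r1 = (b + a + c + f) / (d + e + c + f)   # left vs. right downmix
    r2 = 2 * c / (b + d)                     # center vs. front left+right
    r3 = (a + e) / (b + d + 2 * c)           # surround vs. three front
    r4 = a / e                               # left vs. right surround
    r5 = 2 * f / (b + d + a + e + 2 * c)     # LFE vs. all other channels
    return r1, r2, r3, r4, r5

# Equal energies and unit weights, purely as an illustration.
r1, r2, r3, r4, r5 = iid_parameters(1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
                                    1.0, 1.0, 1.0, 1.0)
```

Each ratio involves only a nested subset of the channels, which is what allows the parameters to be dropped one by one for smaller channel configurations.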

Fig. 4 illustrates the energy ratios described above. The different output channels are indicated by 101 to 105 and are the same as in Fig. 1, so they are not elaborated on further here. The speaker setup is divided into a left half and a right half, where the center channel 103 is part of both halves. The energy ratio between the left half-plane and the right half-plane is exactly the r₁ parameter. This is indicated by the solid line below r₁ in Fig. 4. Furthermore, the energy distribution between the center channel 103 and the left front 102 and right front 104 channels is indicated by r₂. Finally, the energy distribution between the entire front channel setup (102, 103 and 104) and the back channels (101 and 105) is illustrated by the arrow at the r₃ parameter in Fig. 4.

Given the above parameterization and the energy of the transmitted single downmix channel:

the energies of the reconstructed channels can be expressed as:

Hence, the energy of the M signal can be distributed to the reconstructed channels, resulting in reconstructed channels that have the same energies as the original channels.

The preferred upmix scheme described above is illustrated in Fig. 8. It is clear from the equations for F, A, E, C, B and D that the information on the downmix scheme used by the upmixer consists of the weighting factors α, β, γ and δ. These factors are used to weight the original channels before such weighted or unweighted channels are added together or subtracted from each other in order to arrive at a number of downmix channels that is smaller than the number of original channels. Thus, it is clear from Fig. 8 that, in accordance with the present invention, the energies of the reconstructed channels are determined not only by the balance parameters transmitted from an encoder side to a decoder side, but also by the downmix factors α, β, γ and δ.

Considering Fig. 8, it becomes clear that, for calculating the left and right energies B and D, the already calculated channel energies F, A, E and C are used within the equations. This does not, however, necessarily imply a sequential upmix scheme. Instead, to obtain a fully parallel upmix scheme, performed for example using a certain upmix matrix having certain upmix matrix elements, the equations for A, C, E and F are inserted into the equations for B and D. Thus, it becomes clear that the reconstructed channel energies are determined only by the balance parameters, the downmix channel(s) and the information on the downmix scheme, such as the downmix factors.
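The distribution of the downmix energy M onto the six channels can be sketched as follows. This is a hedged illustration, not the equations of Fig. 8, which are not reproduced in this text. It assumes that 2M equals the sum of the left and right downmix energies, that r₁ is the left/right downmix ratio, r₂ the doubled weighted center energy over front left plus right, r₃ surround over front, r₄ left surround over right surround, and r₅ the doubled weighted LFE energy over everything else. The function name `reconstruct_energies` is hypothetical.

```python
def reconstruct_energies(r1, r2, r3, r4, r5, M,
                         alpha, beta, gamma, delta):
    """Channel energies from balance parameters and downmix energy M."""
    total = 2.0 * M                      # assumed: left + right downmix energy
    f2 = r5 / (1.0 + r5) * total         # 2 * delta^2 * F
    rest = total / (1.0 + r5)            # everything except the LFE share
    surround = r3 / (1.0 + r3) * rest    # beta^2 * (A + E)
    front = rest / (1.0 + r3)            # alpha^2 * (B + D) + 2 * gamma^2 * C
    a = r4 / (1.0 + r4) * surround       # beta^2 * A
    e = surround / (1.0 + r4)            # beta^2 * E
    c = 0.5 * r2 / (1.0 + r2) * front    # gamma^2 * C
    f = 0.5 * f2                         # delta^2 * F
    # B and D follow by subtracting the known shares from each side.
    b = r1 / (1.0 + r1) * total - a - c - f
    d = total / (1.0 + r1) - e - c - f
    return {
        "A": a / beta ** 2, "B": b / alpha ** 2, "C": c / gamma ** 2,
        "D": d / alpha ** 2, "E": e / beta ** 2,
        "F": f / delta ** 2 if delta else 0.0,  # F is unrecoverable if delta == 0
    }

# Parameters of six equal-energy channels with unit weights (2M = 8).
energies = reconstruct_energies(1.0, 1.0, 0.5, 1.0, 1.0 / 3.0, 4.0,
                                1.0, 1.0, 1.0, 1.0)
```

Because every intermediate value is a closed-form function of the parameters and M, the sequential steps can be substituted into the expressions for B and D, which gives the parallel form mentioned in the text.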

Given the above IID parameters, it is evident that the problem of defining a set of IID parameters that can be used for several channel configurations has been solved, as will become apparent from the following. As an example, consider the three-channel configuration (i.e. reconstructing the three front channels from one available channel). It is evident that the r₃, r₄ and r₅ parameters are not needed, since the A, E and F channels do not exist. It is also evident that the parameters r₁ and r₂ are sufficient to reconstruct the three channels from a downmixed mono channel, since r₁ describes the energy ratio between the left and right front channels and r₂ describes the energy ratio between the center channel and the left and right front channels.

In the more general case, it is easily seen that the IID parameters (r₁ ... r₅) defined above apply to all subsets of reconstructing n channels from m channels, where m < n ≤ 6. Observing Fig. 4, one can say:

- For a system reconstructing 2 channels from 1 channel, sufficient information to retain the correct energy ratio between the channels is obtained from the r₁ parameter;

- For a system reconstructing 3 channels from 1 channel, sufficient information to retain the correct energy ratio between the channels is obtained from the r₁ and r₂ parameters;

- For a system reconstructing 4 channels from 1 channel, sufficient information to retain the correct energy ratio between the channels is obtained from the r₁, r₂ and r₃ parameters;

- For a system reconstructing 5 channels from 1 channel, sufficient information to retain the correct energy ratio between the channels is obtained from the r₁, r₂, r₃ and r₄ parameters;

- For a system reconstructing 5.1 channels from 1 channel, sufficient information to retain the correct energy ratio between the channels is obtained from the r₁, r₂, r₃, r₄ and r₅ parameters;

- For a system reconstructing 5.1 channels from 2 channels, sufficient information to retain the correct energy ratio between the channels is obtained from the r₂, r₃, r₄ and r₅ parameters.
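The list above amounts to a lookup from channel configuration to the subset of balance parameters a decoder has to read. A minimal Python sketch of that lookup follows; the table name `REQUIRED_PARAMETERS`, the function `select_parameters`, and the key layout (number of downmix channels, output layout) are illustrative assumptions, not part of any bit stream syntax.

```python
# (downmix channels, output layout) -> balance parameters that are needed
REQUIRED_PARAMETERS = {
    (1, "2"): ("r1",),
    (1, "3"): ("r1", "r2"),
    (1, "4"): ("r1", "r2", "r3"),
    (1, "5"): ("r1", "r2", "r3", "r4"),
    (1, "5.1"): ("r1", "r2", "r3", "r4", "r5"),
    (2, "5.1"): ("r2", "r3", "r4", "r5"),
}

def select_parameters(all_params, downmix_channels, output_layout):
    """Keep only the parameters needed for the requested reconstruction."""
    names = REQUIRED_PARAMETERS[(downmix_channels, output_layout)]
    return {name: all_params[name] for name in names}

# Placeholder values only; a 3-channel decoder ignores r3, r4 and r5.
full_set = {"r1": 1.0, "r2": 1.0, "r3": 0.5, "r4": 1.0, "r5": 0.25}
three_channel = select_parameters(full_set, 1, "3")
```

A decoder for a smaller configuration can therefore skip the layers that carry the parameters it does not need.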

The scalability feature described above is illustrated by the table in Fig. 10b. The scalable bit stream illustrated in Fig. 10a and explained later can also be adapted to the table in Fig. 10b in order to obtain a finer scalability than that shown in Fig. 10a.

The present invention is especially advantageous in that the left and right channels can easily be reconstructed from a single balance parameter r₁ without knowledge or extraction of any other balance parameter. To this end, in the equations for B and D in Fig. 8, the channels A, C, F and E are simply set to zero.

Alternatively, when only the balance parameter r₂ is considered, the reconstructed channels are the sum of the center channel and the low frequency channel (when this channel is not set to zero) on the one hand, and the sum of the left and right channels on the other hand. Thus, the center channel and the mono signal can be reconstructed using only a single parameter. This feature is useful for a simple 3-channel representation in which the left and right signals are derived from the sum of left and right, for example by halving, and in which the energy between the center and the sum of left and right is exactly determined by the balance parameter r₂.

In this context, the balance parameter r₁ or r₂ is located in a lower scaling layer.

As for the second entry in the table of Fig. 10b, which indicates how the three channels B, D and the sum of C and F can be generated using only two balance parameters instead of all five, one of the parameters r₁ and r₂ can already be in a higher scaling layer than the parameter r₁ or r₂ that is located in the lower scaling layer.

Considering the equations in Fig. 8, it becomes clear that, for calculating C, the non-extracted parameter r₅ and the other non-extracted parameter r₃ are set to zero. In addition, the unused channels A, E and F are also set to zero, so that the three channels B, D and the combination of the center channel C and the low frequency enhancement channel F can be calculated.

When a 4-channel representation is to be upmixed, it is sufficient to extract only the parameters r₁, r₂ and r₃ from the parameter data stream. In this context, r₃ can be in a next-higher scaling layer than the parameter r₁ or r₂. The 4-channel configuration is especially suited to the super-balance parameter representation of the present invention because, as described later in connection with Fig. 6, the third balance parameter r₃ is already derived from a combination of the front channels on the one hand and the back channels on the other hand. This is because the parameter r₃ is a front/back balance parameter derived from a channel pair that has the combination of the back channels A and E as its first channel and the combination of the left channel B, the right channel D and the center channel C as its front channel.

Thus, as is the case in a single parametric channel setup, the combined channel energy of the two surround channels is obtained automatically, without any further separate calculation and subsequent combination.

When five channels have to be reconstructed from a single channel, the further balance parameter r₄ is needed. This parameter r₄ can again be located in a next-higher scaling layer.

When a 5.1 reconstruction has to be performed, every balance parameter is needed. Thus, a next-higher scaling layer that includes the next balance parameter r₅ has to be transmitted to a receiver and evaluated by the receiver.

However, using the same approach of extending the IID parameters in accordance with the extended number of channels, the IID parameters above can be extended to cover channel configurations with a larger number of channels than the 5.1 configuration. Hence, the present invention is not limited to the examples outlined above.

Now consider the case where the channel configuration is a 5.1 channel configuration, this being one of the most commonly used cases. Furthermore, assume that the 5.1 channels are reconstructed from two channels. For this case, a different set of parameters can be defined by replacing the parameters r₃ and r₄ by:

The parameters q₃ and q₄ represent the energy ratio between the front and back left channels and the energy ratio between the front and back right channels. Several other parameterizations can be envisioned.
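The per-side parameters q₃ and q₄ can be sketched in Python under explicit assumptions. The defining formulas are not reproduced in this text, so the direction of each ratio (surround over front) and the weighting by the squared downmix factors are guesses that only follow the verbal description; the names `side_parameters` and `split_side` are hypothetical.

```python
def side_parameters(A, B, D, E, alpha, beta):
    """Front/back ratios for the left and right side (assumed forms)."""
    q3 = (beta ** 2 * A) / (alpha ** 2 * B)   # left surround vs. left front
    q4 = (beta ** 2 * E) / (alpha ** 2 * D)   # right surround vs. right front
    return q3, q4

def split_side(side_energy, q):
    """Split a weighted front+surround side energy into (front, surround)."""
    front = side_energy / (1.0 + q)
    surround = q / (1.0 + q) * side_energy
    return front, surround

# Left side with surround energy 1 and front energy 3, unit weights.
q3, q4 = side_parameters(1.0, 3.0, 3.0, 1.0, 1.0, 1.0)
left_front, left_surround = split_side(4.0, q3)
```

Each parameter only involves channels on one side, so with a stereo downmix the left and right sides can be processed independently of each other.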

Fig. 5 visualizes the modified parameterization. Instead of having one parameter outlining the energy distribution between the front and back channels (as outlined by r₃ in Fig. 4) and one parameter describing the energy distribution between the left surround channel and the right surround channel (as outlined by r₄ in Fig. 4), the parameters q₃ and q₄ are used to describe the energy ratio between the left front 102 and left surround 101 channels and the energy ratio between the right front channel 104 and the right surround channel 105.

The present invention teaches that several parameter sets can be used to represent the multi-channel signals. An additional feature of the present invention is that different parameterizations can be chosen depending on the type of quantization used for the parameters.

As an example, a system that uses coarse quantization of the parameterization because of tight bit rate constraints should use a parameterization that does not amplify errors during the upmixing process.

Consider two of the expressions above for the reconstructed energies in a system that reconstructs 5.1 channels from one channel:

It is evident that the subtractions can yield large variations in the B and D energies as a result of quite small quantization effects in the M, A, C and F parameters.

According to the present invention, a different parameterization that is less sensitive to quantization of the parameters should be used. Hence, if coarse quantization is used, the r₁ parameter as defined above:

can be replaced by the alternative definition according to:

This yields equations for the reconstructed energies according to:

while the equations for the reconstructed energies of A, E, C and F stay the same as above. It is evident that this parameterization represents a better-conditioned system from a quantization point of view.
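A reconstruction of B and D without subtractions can be sketched as follows. The alternative definition of r₁ is not reproduced in this text, so the sketch assumes it is the front-only ratio of the weighted left front energy to the weighted right front energy; the forms of r₂, r₃ and r₅ (center over front left plus right, surround over front, LFE over all other channels) and the relation 2M = left plus right downmix energy are likewise assumptions. The function name `reconstruct_front_lr` is hypothetical.

```python
def reconstruct_front_lr(r1, r2, r3, r5, M, alpha):
    """B and D as pure products of ratios; no subtraction is involved."""
    total = 2.0 * M                     # assumed: left + right downmix energy
    rest = total / (1.0 + r5)           # remove the LFE share
    front = rest / (1.0 + r3)           # remove the surround share
    left_right = front / (1.0 + r2)     # remove the center share
    B = r1 / (1.0 + r1) * left_right / alpha ** 2
    D = left_right / (1.0 + r1) / alpha ** 2
    return B, D

# Six equal-energy channels with unit weights: 2M = 8, so B = D = 1.
B, D = reconstruct_front_lr(1.0, 1.0, 0.5, 1.0 / 3.0, 4.0, 1.0)
```

Because each factor lies between 0 and 1, a small quantization error in one parameter changes B and D by a comparably small relative amount, instead of being magnified by a difference of large terms.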

Fig. 6 illustrates the energy ratios described above. The different output channels are indicated by 101 to 105 and are the same as in Fig. 1, so they are not elaborated on further here. The speaker setup is divided into a front part and a back part. The energy distribution between the entire front channel setup (102, 103 and 104) and the back channels (101 and 105) is illustrated by the arrow indicated by the r₃ parameter in Fig. 6.

Another important and noteworthy feature of the present invention is that, when observing the parameterization

it is not only a better-conditioned system from a quantization point of view. The above parameterization also has the advantage that the parameters used to reconstruct the three front channels are derived without any influence from the surround channels. One could envision a parameter r₂ that describes the relation between the center channel and all other channels. However, this would have the drawback that the surround channels would be included in the estimation of the parameters describing the front channels.

Remembering that the parameterization described in the present invention can also be applied to measurements of the correlation or coherence between channels, it is evident that including the back channels in the calculation of r₂ can have a significant negative influence on the success of accurately re-creating the front channels.

As an example, one could imagine a situation with the same signal in all front channels and completely uncorrelated signals in the back channels. This is not uncommon, given that the back channels are frequently used to re-create the ambience information of the original sound.

If the center channel is described in relation to all other channels, the correlation measure between the center and the sum of all other channels will be rather low, since the back channels are completely uncorrelated. The same holds for a parameter that estimates the correlation between the front left/right channels and the back left/right channels.

Hence, we arrive at a parameterization that can reconstruct the energies correctly but that does not include the information that all front channels are identical, i.e. strongly correlated. It does include the information that the left and right front channels are decorrelated from the back channels, and that the center channel is also decorrelated from the back channels. However, the fact that all front channels are the same cannot be derived from such a parameterization.

This can be overcome by using the following parameterization taught by the present invention, since the back channels are not included in the estimation of the parameters used on the decoder side to re-create the front channels:

According to the present invention, the energy distribution between the center channel 103 and the left front 102 and right front 104 channels is indicated by r₂. The energy distribution between the left surround channel 101 and the right surround channel 105 is illustrated by r₄. Finally, the energy distribution between the left front channel 102 and the right front channel 104 is given by r₁. Evidently, all parameters are the same as outlined in Fig. 4 apart from r₁, which here corresponds to the energy distribution between the left front speaker and the right front speaker, as opposed to the entire left side and the entire right side. For completeness, the parameter r₅ is also given, outlining the energy distribution between the center channel 103 and the LFE channel 106.

Fig. 6 shows an overview of the preferred parameterization embodiment of the present invention. The first balance parameter r₁ (indicated by the solid line) constitutes a front-left/front-right balance parameter. The second balance parameter r₂ is a center/left-right balance parameter. The third balance parameter r₃ constitutes a front/back balance parameter. The fourth balance parameter r₄ constitutes a rear-left/rear-right balance parameter. Finally, the fifth balance parameter r₅ constitutes a center/LFE balance parameter.

Fig. 4 shows a related situation. The first balance parameter r₁, illustrated by the solid line in Fig. 4 in the case of a downmix left/right balance, can be replaced by an original front-left/front-right balance parameter defined between the channels B and D as the underlying channel pair. This is illustrated by the dashed line r₁ in Fig. 4 and corresponds to the solid line r₁ in Fig. 5 and Fig. 6.

In a two-channel situation, the parameters r₃ and r₄ (i.e. the front/back balance parameter and the rear-left/right balance parameter) are replaced by two single-sided front/rear parameters. The first single-sided front/rear parameter q₃ can also be regarded as the first balance parameter, which is derived from the channel pair consisting of the left surround channel A and the left channel B. The second single-sided front/rear balance parameter is the parameter q₄, which can be regarded as the second parameter and is based on the second channel pair consisting of the right channel D and the right surround channel E. Again, the two channel pairs are independent of each other. The same holds for the center/left-right balance parameter r₂, which has the center channel C as its first channel and the sum of the left and right channels B and D as its second channel.

Another parameterization that lends itself well to coarse quantization is defined in accordance with the present invention for a system that reconstructs 5.1 channels from one or two channels.

For the one-channel to 5.1-channel case:

and

and for the two-channel to 5.1-channel case:

and

It is evident that the above parameterizations include more parameters than are required, from a strictly theoretical point of view, to correctly redistribute the energy of the transmitted signals to the reconstructed signals. However, the parameterization is very insensitive to quantization errors.

The parameter set referred to above for a 2-channel setup makes use of several reference channels. In contrast to the parameter configuration in Fig. 6, however, the parameter set in Fig. 7 relies solely on downmix channels rather than original channels as reference channels. The balance parameters q₁, q₃ and q₄ are derived from completely different channel pairs.

Several inventive embodiments have been described in which the channel pairs for deriving balance parameters include only original channels (Fig. 4, Fig. 5 and Fig. 6), include original channels as well as downmix channels (Fig. 4 and Fig. 5), or rely solely on the downmix channels as reference channels, as indicated at the bottom of Fig. 7. It is nevertheless preferred that the parameter generator included in the surround data encoder 206 of Fig. 2 is operative to use only original channels or combinations of original channels, rather than a base channel or a combination of base channels, for the channels in the channel pairs on which the balance parameters are based. This is because it cannot be completely guaranteed that the single base channel or the two stereo base channels will not undergo an energy change during transmission from a surround encoder to a surround decoder. Such energy variations of the downmix channels or the single downmix channel can be caused by an audio encoder 205 (Fig. 2) or an audio decoder 302 (Fig. 3) operating under low bit rate conditions. Such situations can result in a manipulation of the energy of the mono downmix channel or the stereo downmix channels, and this manipulation can differ between the left and right stereo downmix channels, or can even be frequency-selective or time-selective.

To be completely safe against such energy changes, an additional level parameter is transmitted in accordance with the present invention for each block and frequency band of every downmix channel. When the balance parameters are based on the original signal rather than the downmix signal, a single correction factor per band is sufficient, because an energy correction does not affect the balance between the original channels. Even when no additional level parameter is transmitted, an energy change of a downmix channel does not distort the localization of sound sources in the audio image; it only causes a general loudness change, which is less annoying than the migration of a sound source caused by a changed balance.

It is important to note that care must be taken so that the energy M (of the downmixed channels) is the sum of the energies B, D, A, E, C and F outlined above. This is not always the case, owing to phase dependencies between the different channels that are downmixed to one channel. The energy correction factor can be transmitted as an additional parameter r_M, and the downmix signal received on the decoder side is thus defined as:

Fig. 9 outlines the application of the additional parameter r_M according to the present invention. The downmix signal is modified by the additional parameter r_M in 901 before it is passed to the upmix modules 701-705. These modules are the same as those described for Fig. 7 and are not elaborated on further here. It is obvious to those skilled in the art that the parameter r_M of the mono-downmix example above can be extended to one parameter per downmix channel and is thus not limited to a single downmix channel.

Fig. 9a shows an inventive level parameter calculator 900, while Fig. 9b shows an inventive level corrector 902. Fig. 9a depicts the situation on the encoder side, and Fig. 9b the corresponding situation on the decoder side. The level parameter, or "additional" parameter, r_M is a correction factor that provides a certain energy ratio. To explain this, assume the following exemplary scenario. For a certain original multi-channel signal there exists a "master downmix" on the one hand and a "parametric downmix" on the other hand. The master downmix has been produced by a sound engineer in a sound studio based on, for example, subjective quality impressions. In addition, a certain audio storage medium also includes the parametric downmix, which has been produced by, for example, the surround encoder 203 of Fig. 2. The parametric downmix comprises one base channel or two base channels, which form the basis for the multi-channel reconstruction using the balance parameter set or any other parametric representation of the original multi-channel signal.

It may be the case, for example, that a broadcaster wishes to transmit not the parametric downmix but the master downmix from a transmitter to a receiver. In addition, to upgrade the master downmix to a multi-channel representation, the broadcaster also transmits a parametric representation of the original multi-channel signal. Since the energy (in one band and in one block) can, and typically will, vary between the master downmix and the parametric downmix, a relative level parameter r_M is generated in block 900 and transmitted to the receiver as an additional parameter. The level parameter is derived from the master downmix and the parametric downmix and is preferably the ratio of the energies of the master downmix and the parametric downmix within one block and one band.

Generally, the level parameter is calculated as the ratio of the sum of the energies of the original channels (E_orig) to the energy of the downmix channel or channels, where the downmix can be the parametric downmix (E_PD), the master downmix (E_MD), or any other downmix signal. Typically, the energy of the specific downmix signal that is transmitted from an encoder to a decoder is used.
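By way of a non-limiting illustration, the ratio described above can be sketched for one block and one band as follows. The function names, the representation of a band as a list of subband samples, and the neutral value returned for a silent downmix band are assumptions made for this sketch only.

```python
from typing import Sequence


def band_energy(samples: Sequence[complex]) -> float:
    """Energy of one block/band: sum of squared sample magnitudes."""
    return sum(abs(x) ** 2 for x in samples)


def level_parameter(original_channels: Sequence[Sequence[complex]],
                    downmix_channel: Sequence[complex]) -> float:
    """Ratio E_orig / E_downmix for one block and one band (hypothetical helper)."""
    e_orig = sum(band_energy(ch) for ch in original_channels)
    e_dmx = band_energy(downmix_channel)
    if e_dmx == 0.0:
        # Assumption: use a neutral factor when the downmix band is silent.
        return 1.0
    return e_orig / e_dmx


# Two original channels with energy 1 each, downmix with energy 0.5.
r_m = level_parameter([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

The same helper can be applied with the master downmix or the parametric downmix as `downmix_channel`, depending on which signal is actually transmitted.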

Fig. 9b shows a decoder-side implementation of the use of the level parameter. The level parameter and the downmix signal are input into the level corrector block 902. The level corrector corrects the single base channel or the several base channels depending on the level parameter. Since the additional parameter r_M is a relative value, this relative value is applied to the energy of the corresponding base channel.
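A minimal sketch of such a level corrector is given below. It assumes that r_M is an energy ratio, so that the amplitude gain applied to the samples is the square root of r_M; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def correct_level(downmix: Sequence[float], r_m: float) -> List[float]:
    """Scale one block/band of a base channel so that its energy is multiplied by r_m.

    Assumption: r_m is an energy ratio, hence the amplitude gain is sqrt(r_m).
    """
    gain = math.sqrt(r_m)
    return [gain * x for x in downmix]


corrected = correct_level([0.5, 0.5], 4.0)
corrected_energy = sum(x * x for x in corrected)
```

For a stereo downmix, the same function would be applied to each base channel with its own level parameter.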

Although Figs. 9a and 9b show a situation in which the level correction is applied to the downmix channel or the downmix channels, the level parameter can also be integrated into the upmix matrix. To this end, each occurrence of M in the equations of Fig. 8 is replaced by the term "r_M M".

For the case of reconstructing 5.1 channels from 2 channels, the following observation is made.

If the present invention is used with an underlying audio codec 205 and 302 as outlined in Fig. 2 and Fig. 3, some further considerations are needed. Consider the IID parameters defined earlier, where r₁ was defined according to:

This parameter is implicitly available on the decoder side, because the system reconstructs 5.1 channels from 2 channels, provided that the two transmitted channels are the stereo downmix of the surround channels.

However, an audio codec operating under a bit-rate constraint may modify the spectral distribution so that the L and R energies measured at the decoder differ from their values on the encoder side. According to the present invention, this influence on the energy distribution of the reconstructed channels is eliminated by transmitting the following parameter also for the case of reconstructing 5.1 channels from two channels:

If signalling means are provided, the encoder can code the current signal segment using different parameter sets and choose the set of IID parameters that gives the lowest overhead for the particular signal segment being processed. It is possible that the energy levels of the right front and right back channels are similar to each other, and that the energy levels of the left front and left back channels are similar to each other, yet significantly different from the levels of the right front and back channels. Given delta coding of the parameters and subsequent entropy coding, it can then be more efficient to use the parameters q₃ and q₄ instead of r₃ and r₄. For another signal segment with different characteristics, a different parameter set may give a lower bit-rate overhead. The present invention allows free switching between different parameter representations in order to minimize the bit-rate overhead for the currently encoded signal segment, given the characteristics of that segment. The ability to switch between different parameterizations of the IID parameters in order to obtain the lowest possible bit-rate overhead, together with signalling means indicating which parameterization is currently used, is an essential feature of the present invention.
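The selection described above can be sketched as follows. This is only an illustration under stated assumptions: the cost measure (sum of absolute time-direction deltas of already quantized parameters) is a stand-in for the real entropy-coded bit count, and the names `delta_cost` and `choose_parameterization` are hypothetical.

```python
from typing import Dict, List


def delta_cost(frames: List[List[int]]) -> int:
    """Proxy for the coding cost: sum of |delta| between consecutive time frames."""
    cost = 0
    for prev, cur in zip(frames, frames[1:]):
        cost += sum(abs(c - p) for p, c in zip(prev, cur))
    return cost


def choose_parameterization(candidates: Dict[str, List[List[int]]]) -> str:
    """Return the name of the candidate parameter set with the lowest proxy cost.

    The returned name is what would be signalled to the decoder.
    """
    return min(candidates, key=lambda name: delta_cost(candidates[name]))


# Quantized parameter values per frame for two alternative parameterizations.
candidates = {
    "r3_r4": [[0, 5], [3, 1], [0, 6]],
    "q3_q4": [[2, 2], [2, 3], [2, 2]],
}
selected = choose_parameterization(candidates)
```

An actual encoder would replace the proxy with the true number of bits produced by its entropy coder.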

Furthermore, delta coding of the parameters can be done in the frequency direction or in the time direction, as can delta coding between different parameters. According to the present invention, a parameter can be delta coded with respect to any other parameter, provided that signalling means indicate the particular delta coding used.
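The two delta-coding directions can be illustrated with the following sketch, which operates on quantized integer parameter values. The function names are hypothetical, and the convention of sending the first band as an absolute value in the frequency direction is an assumption of this sketch.

```python
from typing import List


def delta_encode_freq(params: List[int]) -> List[int]:
    """Frequency direction: first band as-is, then differences between neighbouring bands."""
    if not params:
        return []
    return [params[0]] + [b - a for a, b in zip(params, params[1:])]


def delta_decode_freq(deltas: List[int]) -> List[int]:
    """Inverse of delta_encode_freq: running sum over the bands."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def delta_encode_time(prev: List[int], cur: List[int]) -> List[int]:
    """Time direction: per-band difference to the previous frame."""
    return [c - p for p, c in zip(prev, cur)]


def delta_decode_time(prev: List[int], deltas: List[int]) -> List[int]:
    """Inverse of delta_encode_time."""
    return [p + d for p, d in zip(prev, deltas)]


frame0 = [4, 5, 5, 3]
frame1 = [4, 6, 5, 2]
freq_deltas = delta_encode_freq(frame0)
time_deltas = delta_encode_time(frame0, frame1)
```

Delta coding of one parameter against another parameter follows the pattern of `delta_encode_time`, with the reference parameter taking the place of the previous frame.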

An important feature of any coding scheme is the ability to perform scalable coding. This means that the coded bitstream can be divided into several different layers. The core layer is decodable by itself, and the higher layers can be decoded to enhance the decoded core-layer signal. The number of available layers may vary from one situation to another, but as long as the core layer is available, the decoder can produce output samples. The parameterization of the multi-channel coding outlined above using the parameters r₁ to r₅ lends itself very well to scalable coding. Hence, the data for, e.g., the two surround channels (A and E), i.e. the parameters r₃ and r₄, can be stored in an enhancement layer, and the parameters corresponding to the front channels, represented by the parameters r₁ and r₂, in a core layer.

Fig. 10 outlines a scalable bitstream implementation according to the present invention. The bitstream layers are illustrated by 1001 and 1002, where 1001 is the core layer holding the waveform-coded downmix signals and the parameters r₁ and r₂ required to re-create the front channels (102, 103 and 104). The enhancement layer illustrated by 1002 holds the parameters for re-creating the back channels (101 and 105).
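The layering described above can be sketched with two simple containers. The class and field names are hypothetical, and the assignment of r₃ and r₄ to the enhancement layer follows the scalable-coding example given earlier in this description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoreLayer:
    """Layer 1001: waveform-coded downmix plus front-channel parameters."""
    downmix: List[float]
    r1: float
    r2: float


@dataclass
class EnhancementLayer:
    """Layer 1002: parameters for the back (surround) channels."""
    r3: float
    r4: float


def reconstructable_channels(core: Optional[CoreLayer],
                             enhancement: Optional[EnhancementLayer]) -> List[int]:
    """Reference numerals of the channels a decoder could re-create from the layers received."""
    if core is None:
        return []  # nothing can be decoded without the core layer
    channels = [102, 103, 104]
    if enhancement is not None:
        channels = [101] + channels + [105]
    return channels


core = CoreLayer(downmix=[0.0, 0.1], r1=0.5, r2=0.25)
front_only = reconstructable_channels(core, None)
all_five = reconstructable_channels(core, EnhancementLayer(r3=0.3, r4=0.2))
```

A decoder that receives only the core layer thus still produces output, which is the property that makes the bitstream scalable.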

Another important aspect of the present invention is the use of decorrelators in a multi-channel configuration. The concept of using a decorrelator was elaborated for the one-to-two channel case in PCT/SE02/01372. However, when this theory is extended to more than two channels, several problems arise that are solved by the present invention.

Elementary mathematics shows that, in order to obtain M mutually decorrelated signals from N signals, M - N decorrelators are required, where all the different decorrelators create mutually orthogonal output signals from a common input signal. A decorrelator is typically an all-pass or near all-pass filter that, given an input x(t), produces an output y(t) of equal energy, E[|y|²] = E[|x|²], and with almost vanishing cross-correlation E[xy*]. Further perceptual criteria enter into the design of a good decorrelator; examples of design goals are to minimize the comb-filter character when the original signal is added to the decorrelated signal, and to minimize the effect of a sometimes too long impulse response on transient signals. Some prior-art decorrelators use an artificial reverberator to decorrelate. Prior art also includes fractional delays, obtained for example by modifying the phase of the complex subband samples, in order to achieve a higher echo density and hence more time diffusion.

The present invention proposes methods of modifying a reverberation-based decorrelator in order to obtain multiple decorrelators that create mutually decorrelated output signals from a common input signal. Two decorrelators are mutually decorrelated if their outputs y₁(t) and y₂(t) have vanishing or almost vanishing cross-correlation given the same input. Assuming the input is stationary white noise, it follows that the impulse responses h₁ and h₂ must be orthogonal in the sense that E[h₁h₂*] is vanishing or almost vanishing. Sets of pairwise mutually decorrelated decorrelators can be constructed in several ways. An efficient way of making such modifications is to alter the phase rotation factor q that is part of the fractional delay.

The present invention stipulates that the phase rotation factor can be part of the delay lines in the all-pass filters or just an overall fractional delay. In the latter case, the method is not limited to all-pass or reverberation-like filters but can also be applied to, e.g., simple delays that include a fractional delay part. An all-pass filter link in the decorrelator can be described in the Z-domain as:

where q is the complex-valued phase rotation factor (|q| = 1), m is the delay line length in samples, and a is the filter coefficient. For stability reasons, the magnitude of the filter coefficient must be limited to |a| < 1. However, using the alternative filter coefficient a' = -a defines a new reverberator that has the same reverberation decay properties but an output that is significantly uncorrelated with the output of the unmodified reverberator. Furthermore, the phase rotation factor q can be modified by, e.g., adding a constant phase offset, q' = q·e^(jC). The constant C can be used as a constant phase offset, or it can be scaled so that it corresponds to a constant time offset for all frequency bands to which it is applied. The phase offset constant C can also be a random value that differs between frequency bands.
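The two modifications described above can be sketched as follows. Since the transfer function itself is not reproduced in this text, the sketch assumes one common all-pass form that uses the same symbols, H(z) = (q·z^(-m) - a) / (1 - a·q·z^(-m)) with real a; this form, the function name and the numeric values are assumptions and are not taken from the original equation.

```python
import cmath
from typing import List, Sequence


def allpass_link(x: Sequence[complex], a: float, q: complex, m: int) -> List[complex]:
    """Filter x with the assumed all-pass link H(z) = (q z^-m - a) / (1 - a q z^-m).

    Difference equation: y[n] = -a*x[n] + q*x[n-m] + a*q*y[n-m].
    """
    y: List[complex] = []
    for n in range(len(x)):
        x_delayed = x[n - m] if n >= m else 0.0
        y_delayed = y[n - m] if n >= m else 0.0
        y.append(-a * x[n] + q * x_delayed + a * q * y_delayed)
    return y


impulse = [1.0] + [0.0] * 299
q = cmath.exp(1j * 0.3)          # phase rotation factor, |q| = 1
q_mod = q * cmath.exp(1j * 0.7)  # q' = q * e^(jC) with an example offset C = 0.7

h1 = allpass_link(impulse, 0.5, q, 3)        # original link
h2 = allpass_link(impulse, -0.5, q_mod, 3)   # modified link: a' = -a, q' = q e^(jC)

energy1 = sum(abs(v) ** 2 for v in h1)
energy2 = sum(abs(v) ** 2 for v in h2)
```

Under this assumed form both impulse responses carry unit energy and have the same magnitude envelope, while their signs and phases differ from tap to tap, which is what reduces the correlation between the two outputs.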

According to the present invention, the generation of n channels from m channels is performed by applying an upmix matrix H of size n×(m+p) to a column vector y of (m+p)×1 signals, formed by stacking the signal vectors m and s.

where m denotes the m downmixed and coded signals, and the p signals in s are both mutually decorrelated and decorrelated from all signals in m. These decorrelated signals are produced from the signals in m by decorrelators. The n reconstructed signals a', b', ... are then contained in the column vector

x' = Hy.

This is illustrated in Fig. 11, where the decorrelated signals are created by the decorrelators 1102, 1103 and 1104. The upmix matrix H is given by 1101, which operates on the vector y to give the output signal x'.
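For a single time/frequency sample, the operation x' = Hy can be sketched as a plain matrix-vector product. The function name and the example matrix values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import List, Sequence


def upmix(H: Sequence[Sequence[float]],
          m_signals: Sequence[float],
          s_signals: Sequence[float]) -> List[float]:
    """Compute x' = H y, where y stacks the downmix samples m and the decorrelated samples s."""
    y = list(m_signals) + list(s_signals)
    if any(len(row) != len(y) for row in H):
        raise ValueError("H must have m + p columns")
    return [sum(h * v for h, v in zip(row, y)) for row in H]


# n = 2 outputs from m = 1 downmix sample and p = 1 decorrelated sample.
H = [[1.0, 0.5],
     [1.0, -0.5]]
x_out = upmix(H, [2.0], [4.0])
```

In a complete decoder this product is evaluated for every subband sample, with H updated from the transmitted parameters.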

Let R = E[xx*] be the correlation matrix of the original signal vector, and let R' = E[x'x'*] be the correlation matrix of the reconstructed signal. Here and in the following, for a matrix or vector X with complex entries, X* denotes the adjoint, i.e. the complex conjugate transpose of X.

The diagonal of R contains the energy values A, B, C, ... and can be decoded up to a total energy level from the energy quotas defined above. Since R* = R, there are only n(n-1)/2 distinct off-diagonal cross-correlation values, which contain the information that is to be reconstructed fully or partly by adjusting the upmix matrix H. Reconstruction of the full correlation structure corresponds to the case R' = R. Reconstruction of correct energy levels only corresponds to the case where R' and R are equal on their diagonals.

In the case of n channels from m = 1 channel, reconstruction of the full correlation structure is achieved by using p = n-1 mutually decorrelated decorrelators and an upmix matrix H that satisfies the condition:

where M is the energy of the single transmitted signal. Since R is positive semidefinite, it is well known that such a solution exists. Moreover, n(n-1)/2 degrees of freedom are left over for the design of H, and these are used in the present invention to obtain further desirable properties of the upmix matrix. A central design criterion is that the dependence of H on the transmitted correlation data shall be smooth.

One convenient way of parametrizing the upmix matrix is H = UDV, where U and V are orthogonal matrices and D is a diagonal matrix. The squares of the absolute values of D can be chosen equal to the eigenvalues of R/M. Omitting V and sorting the eigenvalues so that the largest value is applied to the first coordinate minimizes the overall energy of the decorrelated signals in the output. In the real case, the orthogonal matrix U is parametrized by n(n-1)/2 rotation angles. Transmitting the correlation data in the form of those angles and the n diagonal values of D immediately gives the desired smooth dependence of H. However, since the energy data has to be transformed into eigenvalues, scalability is sacrificed by this approach.

A second method taught by the present invention separates the energy part from the correlation part in R by defining a normalized correlation matrix R₀ through R = GR₀G, where G is a diagonal matrix whose diagonal values equal the square roots of the diagonal entries of R (i.e., √A, √B, ...), so that R₀ has ones on the diagonal. Let H₀ be an orthogonal upmix matrix defining the preferred normalized upmix in the case of totally uncorrelated signals of equal energy. Examples of such preferred upmix matrices are:

The upmix is then defined by H = GSH₀/√M, where the matrix S solves SS* = R₀. The dependence of this solution on the normalized cross-correlation values in R₀ is chosen to be continuous and such that S equals the identity matrix in the case R₀ = I.
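The separation of R into an energy part G and a normalized correlation part R₀ can be sketched as follows. The function name is hypothetical, the matrix is assumed real for simplicity, and all diagonal entries (channel energies) are assumed to be non-zero.

```python
import math
from typing import List, Tuple


def normalize_correlation(R: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Split R into the diagonal of G (square roots of the energies) and R0 with R = G R0 G."""
    n = len(R)
    g = [math.sqrt(R[i][i]) for i in range(n)]
    R0 = [[R[i][j] / (g[i] * g[j]) for j in range(n)] for i in range(n)]
    return g, R0


# Two channels with energies 4 and 1 and a cross-correlation of 1.
R = [[4.0, 1.0],
     [1.0, 1.0]]
g, R0 = normalize_correlation(R)

# Rebuild R from G and R0 to confirm the factorization.
rebuilt = [[g[i] * R0[i][j] * g[j] for j in range(2)] for i in range(2)]
```

After this step the energies travel in g and only the off-diagonal entries of R₀ remain as correlation data.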

Dividing the n channels into groups of fewer channels is a convenient way to reconstruct a partial cross-correlation structure. According to the present invention, a particularly advantageous grouping for the case of 5.1 channels from 1 channel is {a,e}, {c}, {b,d}, {f}, where no decorrelation is applied for the groups {c} and {f}, and the groups {a,e} and {b,d} are produced by upmix of the same downmix/decorrelated pair. For these two subsystems, the preferred normalized upmixes in the totally uncorrelated case are chosen as, respectively:

Thus, only two of the total of 15 cross-correlations are transmitted and reconstructed, namely those within the channel groups {a,e} and {b,d}. In the terminology used above, this is an example of a design for the case n = 6, m = 1 and p = 1. The upmix matrix H is of size 6×2, and the two entries in its second column at rows 3 and 6, corresponding to the outputs c' and f', are zero.

A third method taught by the present invention for incorporating decorrelated signals takes the simpler point of view that each output channel has its own decorrelator, giving rise to the decorrelated signals s_a, s_b, .... The reconstructed signals are then formed as:

and so on.

The parameters φ_a, φ_b, ... control the amount of decorrelated signal present in the output channels a', b', .... The correlation data is transmitted in the form of these angles. It is easy to compute that the resulting normalized cross-correlation between, for instance, the channels a' and b' equals the product cos φ_a cos φ_b. As the number of pairwise cross-correlations is n(n-1)/2 and there are n decorrelators, it is in general not possible with this approach to match a given correlation structure if n > 3, but the advantages are a very simple and stable decoding method and direct control over the amount of decorrelated signal present in each output channel. This enables the mixing of decorrelated signals to be based on perceptual criteria incorporating, for instance, the energy level differences of channel pairs.
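The stated cross-correlation can be checked numerically with the sketch below. It assumes the mixing form a' = √(A/M)·(m·cos φ_a + s_a·sin φ_a), which is consistent with the product cos φ_a cos φ_b but is an assumption of this sketch, since the equation itself is not reproduced in this text. The downmix and the decorrelated signals are modelled as mutually orthogonal vectors of equal energy.

```python
import math
from typing import List, Sequence


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, standing in for the expectation E[u v*] of real signals."""
    return sum(x * y for x, y in zip(u, v))


def mix(m: Sequence[float], s: Sequence[float],
        energy: float, M: float, phi: float) -> List[float]:
    """Assumed form: sqrt(energy / M) * (m cos(phi) + s sin(phi))."""
    gain = math.sqrt(energy / M)
    return [gain * (math.cos(phi) * x + math.sin(phi) * d) for x, d in zip(m, s)]


def normalized_xcorr(u: Sequence[float], v: Sequence[float]) -> float:
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))


# Orthogonal unit-energy stand-ins for the downmix m and decorrelator outputs s_a, s_b.
m_sig, s_a, s_b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
phi_a, phi_b = 0.4, 1.1

a_out = mix(m_sig, s_a, energy=2.0, M=1.0, phi=phi_a)
b_out = mix(m_sig, s_b, energy=0.5, M=1.0, phi=phi_b)
rho = normalized_xcorr(a_out, b_out)
```

Under these assumptions `rho` agrees with cos φ_a cos φ_b, and the energy of `a_out` equals the target energy A, independently of the angle.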

For the case of n channels from m > 1 channels, the correlation matrix R_y = E[yy*] can no longer be assumed to be diagonal, and this has to be taken into account when matching R' = HR_yH* to the target R. A simplification occurs because R_y has the block matrix structure

R_y = [[R_m, 0], [0, R_s]],

where R_m = E[mm*] and R_s = E[ss*]. Furthermore, assuming mutually decorrelated decorrelators, the matrix R_s is diagonal. Note that this also affects the upmix design with respect to the reconstruction of correct energies. The solution is to compute in the decoder, or to transmit from the encoder, information about the correlation structure R_m of the downmixed signals.

For the case of 5.1 channels from 2 channels, a preferred method of upmix is:

where s₁ is obtained from decorrelation of m₁ = l_d and s₂ is obtained from decorrelation of m₂ = r_d.

Here, the groups {a,b} and {d,e} are treated as separate 1→2 channel systems that take the pairwise cross-correlations into account. For the channels c and f, the weights are adjusted such that

E|h₃₁m₁ + h₃₂m₂|² = C,

E|h₆₁m₁ + h₆₂m₂|² = F.

The present invention can be implemented in both hardware chips and DSPs, for various kinds of systems, for storage or transmission of signals, analogue or digital, using arbitrary codecs. Fig. 2 and Fig. 3 show a possible implementation of the present invention. In this example, a system operating on six input signals (a 5.1 channel configuration) is shown. In Fig. 2, which shows the encoder side, the analogue input signals of the separate channels are converted to digital signals 201 and analysed using a filterbank for every channel 202. The output of the filterbanks is fed to the surround encoder 203, which includes a parameter generator and performs a downmix creating the one or two channels that are encoded by the audio encoder 205. Furthermore, the surround parameters, such as the IID and ICC parameters, are extracted according to the present invention, as is the control data 204 outlining the time-frequency grid of the data and which parameterization is used. The extracted parameters are encoded 206 as taught by the present invention, either switching between different parameterizations or arranging the parameters in a scalable fashion. The surround parameters 207, the control signals and the encoded downmix signals 208 are multiplexed 209 into a serial bitstream.

Fig. 3 shows a typical decoder implementation, i.e. an apparatus for generating a multi-channel reconstruction. Here it is assumed that the audio decoder outputs a signal in a frequency-domain representation, e.g. the output of the MPEG-4 High Efficiency AAC decoder prior to the QMF synthesis filterbank. The serial bitstream is de-multiplexed 301, the encoded surround data is fed to the surround data decoder 303, and the encoded downmix channels are fed to the audio decoder 302, which in this example is an MPEG-4 High Efficiency AAC decoder. The surround data decoder decodes the surround data and feeds it to the surround decoder 305, which includes an upmixer that re-creates six channels based on the decoded downmix channels, the surround data and the control signals. The frequency-domain output of the surround decoder is synthesized 306 into time-domain signals, which are subsequently converted to analogue signals by the DAC 307.

Although the present invention has mainly been described with reference to the generation and use of balance parameters, it is emphasized here that the same grouping of channel pairs used for deriving the balance parameters is preferably also used for calculating inter-channel coherence parameters or "width" parameters between the two channels of each pair. In addition, inter-channel time differences or a kind of "phase cues" can also be derived using the same channel pairs as used for the balance parameter calculation. On the receiver side, these parameters can be used in addition to, or as an alternative to, the balance parameters to generate a multi-channel reconstruction. Alternatively, the inter-channel coherence parameters or even the inter-channel time differences can be used in addition to other inter-channel level differences determined by other reference channels. In view of the scalability feature of the present invention as discussed in connection with Fig. 10a and Fig. 10b, however, it is preferred to use the same channel pairs for all parameters, so that in a scalable bitstream each scaling layer includes all parameters for reconstructing the sub-group of output channels that can be generated by the respective scaling layer, as outlined in the penultimate column of the table in Fig. 10b. The present invention is also useful when only the coherence parameters or the time difference parameters between the respective channel pairs are calculated and transmitted to a decoder. In this case, the level parameters are already available at the decoder for use when a multi-channel reconstruction is performed.

Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is therefore a computer program product with a program code stored on a machine-readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are therefore a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.

101．．．左環繞聲道101. . . Left surround channel

102．．．左前聲道102. . . Left front channel

103．．．中央聲道103. . . Central channel

104．．．右前聲道104. . . Right front channel

105．．．右環繞聲道105. . . Right surround channel

106．．．LEF聲道106. . . LEF channel

201 ... ADC

202 ... Analysis filter bank

203 ... Surround encoder

204 ... Control signal

205 ... Audio encoder

206 ... Surround data encoder

207 ... Surround parameters

208 ... Encoded downmix signal

209 ... Multiplexer

301 ... Demultiplexer

302 ... Audio decoder

303 ... Surround data decoder

304 ... Control signal

305 ... Surround decoder

306 ... Synthesis filter bank

307 ... DAC

700 ... Downmixer

900 ... Level parameter calculator

902 ... Level corrector

1001 ... Bit stream layer

1002 ... Bit stream layer

1101 ... Decorrelator

1103 ... Decorrelator

1104 ... Decorrelator

l_d, r_d ... Optimal stereo channels

m, l, n ... Mono

r₁ ... Parameter

r₂ ... Parameter

r₃ ... Parameter

r₄ ... Parameter

r₅ ... Parameter

r_M ... Additional parameter

α ... Weighting factor

β ... Weighting factor

γ ... Weighting factor

δ ... Weighting factor

A ... Channel

B ... Channel

C ... Channel

D ... Channel

E ... Channel

E_MD ... Main downmix

E_orig ... Energy of the original channels

E_PD ... Parametric downmix

F ... Energy of the reconstructed channel

Figure 1 illustrates the nomenclature used for a 5.1 channel configuration in the present invention;

Figure 2 illustrates a suitable encoder implementation of a preferred embodiment of the present invention;

Figure 3 illustrates a suitable decoder implementation of a preferred embodiment of the present invention;

Figure 4 illustrates a preferred parameterization of the multi-channel signal according to the present invention;

Figure 5 illustrates a preferred parameterization of the multi-channel signal according to the present invention;

Figure 6 illustrates a preferred parameterization of the multi-channel signal according to the present invention;

Figure 7 illustrates a schematic setup of a downmix scheme for generating a single base channel or two base channels;

Figure 8 illustrates a schematic representation of an upmix scheme that is based on the inventive balance parameters and on information about the downmix scheme;

Figure 9a schematically illustrates the determination of a level parameter on the encoder side according to the present invention;

Figure 9b schematically illustrates the use of the level parameter on the decoder side according to the present invention;

Figure 10a illustrates a scalable bit stream having different parts of the multi-channel parameterization in different layers of the bit stream;

Figure 10b illustrates a scalability table indicating which channels can be constructed using which balance parameters, and which balance parameters and channels are not used or calculated; and

Figure 11 illustrates the application of the upmix matrix according to the present invention.

Claims

An apparatus for generating a level parameter within a parameter representation of a multi-channel signal, the multi-channel signal having a plurality of original channels, the parameter representation comprising a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the apparatus comprising: a level parameter calculator (900) for calculating a level parameter (r_M), the level parameter being a level difference between a main downmix and a parametric downmix, wherein the parameter representation is based on the parametric downmix; an output interface for generating output data, the output data including the level parameter and the parameter set, or the level parameter and the at least one downmix channel; and a parameter generator configured to generate the parameter set, the parameter set including a left/right balance parameter as a first balance parameter, a center balance parameter as a second balance parameter, a front/back balance parameter as a third balance parameter, a rear left/right balance parameter as a fourth balance parameter, and a low frequency enhancement balance parameter as a fifth balance parameter.

The apparatus of claim 1, wherein the parameter representation comprises a parameter set for each of a plurality of frequency bands of the at least one downmix channel, and wherein the level parameter calculator (900) is operative to calculate a level parameter for each of the frequency bands.

The apparatus of claim 1, wherein the parameter representation comprises a parameter set for each time period in a sequence of time periods of the at least one downmix channel, and wherein the level parameter calculator (900) is operative to calculate a level parameter for each time period in the sequence of time periods of the at least one downmix channel.

The apparatus of claim 1, wherein the output interface is operative to generate a scalable data stream, the scalable data stream including, in a lower scaling layer, a first subgroup of parameters of the parameter set, the first subgroup of parameters allowing reconstruction of a first subgroup of output channels, and including, in a higher scaling layer, a second subgroup of parameters of the parameter set, the second subgroup of parameters together with the first subgroup of parameters allowing reconstruction of a second subgroup of output channels, and wherein the output interface is further operative to place the level parameter in the lower scaling layer.

An apparatus for generating a reconstructed multi-channel representation of an original multi-channel signal having at least three original channels, using a parameter representation having a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the parameter representation including a level parameter that is a level difference between a main downmix and a parametric downmix, wherein the parameter representation is based on the parametric downmix, the apparatus comprising: a level corrector (902) for applying a level correction to a single base channel or to a plurality of base channels using the level parameter, the single base channel or the plurality of base channels being weighted by the level parameter, so that an upmix performed with the parameters in the parameter set yields a corrected multi-channel reconstruction.

A method of generating a level parameter within a parameter representation of a multi-channel signal, the multi-channel signal having a plurality of original channels, the parameter representation comprising a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the method comprising: calculating (900) a level parameter (r_M), the level parameter being a level difference between a main downmix and a parametric downmix, wherein the parameter representation is based on the parametric downmix; and generating output data, the output data including the level parameter and the parameter set, or the level parameter and the at least one downmix channel.

A method of generating a reconstructed multi-channel representation of an original multi-channel signal having at least three original channels, using a parameter representation having a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the parameter representation including a level parameter that is a level difference between a main downmix and a parametric downmix, wherein the parameter representation is based on the parametric downmix, the method comprising: applying (902) a level correction to a single base channel or to a plurality of base channels using the level parameter, the single base channel or the plurality of base channels being weighted by the level parameter, so that an upmix performed with the parameters in the parameter set yields a corrected multi-channel reconstruction.

A recording medium storing a computer program having machine-readable instructions which, when executed on a computer, implement the method of claim 6 or 7.