TW201116078A - Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation and a storage media stored parameter representation

Publication number
TW201116078A
Authority
TW
Taiwan
Prior art keywords
parameter
channel
channels
level
representation
Application number
TW099130574A
Other languages
Chinese (zh)
Other versions
TWI458365B (en)
Inventor
Heiko Purnhagen
Lars Villemoes
Jonas Engdegard
Jonas Roeden
Kristofer Kjoerling
Original Assignee
Coding Tech Ab
Priority claimed from PCT/EP2005/003848 (WO2005101370A1)
Application filed by Coding Tech Ab
Publication of TW201116078A
Application granted
Publication of TWI458365B

Landscapes

  • Stereophonic System (AREA)

Abstract

A parameter representation of a multi-channel signal having several original channels includes a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction. An additional level parameter (rM) is calculated such that the energy of the at least one downmix channel, weighted by the level parameter, is equal to the sum of the energies of the original channels. The additional level parameter is transmitted to a multi-channel reconstructor together with the parameter set or together with a downmix channel. An apparatus for generating a multi-channel representation uses the level parameter to correct (902) the energy of the at least one transmitted downmix channel, either before the downmix signal enters an upmixer or within the upmixing process.
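
As an illustration of the level parameter described in the abstract, the following minimal Python sketch computes, for one block of samples in one frequency band, a factor r such that r times the downmix energy equals the summed energy of the original channels, and then applies it as an amplitude correction. The function names, the block-wise energy measure and the choice of multiplication (rather than division) as the weighting are assumptions made for this example and are not taken from the specification.

```python
import math
from typing import List, Sequence


def energy(signal: Sequence[float]) -> float:
    """Energy of one block of samples (sum of squares)."""
    return sum(x * x for x in signal)


def level_parameter(originals: Sequence[Sequence[float]],
                    downmix: Sequence[float],
                    eps: float = 1e-12) -> float:
    """Return r such that r * energy(downmix) == sum of original energies.

    Assumption: the level parameter weights the downmix energy by
    multiplication. eps only guards against a silent downmix block.
    """
    total = sum(energy(channel) for channel in originals)
    return total / max(energy(downmix), eps)


def correct_downmix(downmix: Sequence[float], r: float) -> List[float]:
    """Scale the downmix so that its energy is multiplied by r.

    An energy ratio r corresponds to an amplitude gain of sqrt(r).
    """
    gain = math.sqrt(r)
    return [gain * x for x in downmix]
```

In a frequency-selective scheme, a decoder-side correction of this kind would be applied per band and per block before, or as part of, the upmix.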

Description

[Technical Field]
The present invention relates to the coding of multi-channel representations of audio signals using spatial parameters. It teaches new methods for estimating and defining suitable parameters for reconstructing a multi-channel signal from a number of channels that is smaller than the number of output channels. In particular, it aims at minimizing the bit rate of the multi-channel representation and at providing a coded representation of the multi-channel signal that makes it easy to encode and decode the data for all possible channel configurations.

[Prior Art]
PCT/SE02/01372, "Efficient and Scalable Parametric Stereo Coding for Low Bit Rate Audio Coding Applications", showed that a stereo image very close to the original stereo image can be reconstructed from a mono signal, given a very compact representation of the stereo image. The basic principle is to divide the input signal into frequency bands and time segments and to estimate, for these bands and segments, the inter-channel intensity difference (IID) and the inter-channel coherence (ICC). The first parameter is a measure of the power distribution between the two channels in a specific frequency band, and the second is an estimate of the correlation between the two channels in that band. On the decoder side, the stereo image is recreated from the mono signal by distributing the mono signal between the two output channels in accordance with the IID data and by adding a decorrelated signal in order to retain the channel correlation of the original stereo channels.

For the multi-channel case (multi-channel in this context meaning more than two output channels), several additional issues have to be accounted for. Several multi-channel configurations exist. The most commonly known is the 5.1 configuration (center channel, front left/right channels, surround left/right channels and the LFE channel), but many other configurations exist as well. From the point of view of the complete encoder/decoder system, it is desirable to have a system that can use the same parameter set (e.g. IID and ICC), or subsets of it, for all channel configurations. ITU-R BS.775 defines several downmix schemes for obtaining a channel configuration with fewer channels from a given channel configuration. Instead of always having to decode all channels and rely on a downmix, it is desirable to have a multi-channel representation that enables a receiver to extract the parameters relevant to the channel configuration at hand before decoding the channels. Furthermore, an inherently scalable parameter set is desirable from a scalable or embedded coding point of view, where it is, for example, possible to store the data corresponding to the surround channels in an enhancement layer of the bit stream.

Conversely, it can also be desirable to use different parameter definitions depending on the characteristics of the signal being processed, in order to switch to the parameterization that results in the lowest bit rate overhead for the signal segment currently being processed.

Another representation of multi-channel signals that uses a sum signal or downmix signal and additional parametric side information is known in the art as binaural cue coding (BCC). This technique is described in F. Baumgarte and C. Faller, "Binaural Cue Coding - Part I: Psychoacoustic Fundamentals and Design Principles", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, November 2003, and in C. Faller and F. Baumgarte, "Binaural Cue Coding - Part II: Schemes and Applications", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, November 2003.

Generally, binaural cue coding is a method for multi-channel spatial rendering based on one downmixed audio channel and side information. The parameters calculated by a BCC encoder and used by a BCC decoder for audio reconstruction or audio rendering include inter-channel level differences, inter-channel time differences and inter-channel coherence parameters. These inter-channel cues are the determining factors for the perception of a spatial image. The parameters are given for blocks of time samples of the original multi-channel signal and are also given frequency-selectively, so that each block of multi-channel signal samples has several cues for several frequency bands. In the general case of C playback channels, the inter-channel level differences and the inter-channel time differences are considered in each subband between pairs of channels, i.e. for each channel relative to a reference channel. One channel is defined as the reference channel for each inter-channel level difference. With the inter-channel level differences and the inter-channel time differences, a source can be rendered in any direction between one of the loudspeaker pairs of the playback setup in use. To determine the width or diffuseness of a rendered source, it is sufficient to consider one parameter per subband for all audio channels. This parameter is the inter-channel coherence parameter. The width of the rendered source is controlled by modifying the subband signals such that all possible channel pairs have the same inter-channel coherence parameter.

In BCC coding, all inter-channel level differences are determined between the reference channel 1 and every other channel.
When, for example, the center channel is chosen as the reference channel, a first inter-channel level difference between the left channel and the center channel, a second inter-channel level difference between the right channel and the center channel, a third inter-channel level difference between the left surround channel and the center channel, and a fourth inter-channel level difference between the right surround channel and the center channel are calculated. This scenario describes a five-channel scheme. When the five-channel scheme additionally includes a low-frequency enhancement channel, also known as a "subwoofer" channel, a fifth inter-channel level difference between the low-frequency enhancement channel and the center channel, which is the single reference channel, is calculated.

When the original multi-channel signal is reconstructed using the single downmix channel, also termed the "mono" channel, and the transmitted cues such as ICLD (inter-channel level difference), ICTD (inter-channel time difference) and ICC (inter-channel coherence), the spectral coefficients of the mono signal are modified using these cues. The level modification is performed using a positive real number that determines the level modification for each spectral coefficient. The inter-channel time difference is generated using a complex number of unit magnitude that determines a phase modification for each spectral coefficient. Another function determines the coherence influence. The level modification factors for each channel are computed by first calculating the factor for the reference channel. The factor for the reference channel is computed such that, for each frequency partition, the sum of the powers of all channels is the same as the power of the sum signal. Then, based on the level modification factor for the reference channel, the level modification factors for the other channels are calculated using the respective ICLD parameters.
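
The calculation just described can be sketched in a few lines of Python. This is a minimal illustration for a single frequency partition, assuming that the ICLDs are given in dB relative to the reference channel and that the gains are normalized so that the summed channel power equals the power of the sum signal; the function name and the argument layout are hypothetical.

```python
import math
from typing import List, Sequence


def bcc_level_factors(icld_db: Sequence[float]) -> List[float]:
    """Level modification factors for one frequency partition.

    icld_db[i] is the level difference (in dB) of non-reference channel i
    relative to the reference channel. The returned list holds the factor
    of the reference channel first, followed by the other channels.
    """
    # Power of each non-reference channel relative to the reference channel.
    power_ratios = [10.0 ** (d / 10.0) for d in icld_db]
    # Reference factor: chosen so that the squared factors sum to one,
    # i.e. the summed channel power equals the power of the sum signal.
    g_ref = 1.0 / math.sqrt(1.0 + sum(power_ratios))
    # Every other factor follows from the reference factor and its own ICLD.
    return [g_ref] + [g_ref * math.sqrt(p) for p in power_ratios]
```

Note that the reference factor depends on every ICLD of the band, so all ICLDs must be available before any channel can be scaled.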
BCC synthesis therefore requires the level modification factor for the reference channel to be calculated first. This calculation needs all ICLD parameters of a frequency band. Then, based on the level modification factor of this single (reference) channel, the level modification factors for the other channels, i.e. the channels that are not the reference channel, can be calculated.

This approach is disadvantageous in that a complete reconstruction requires each and every inter-channel level difference. The requirement becomes even more problematic when an error-prone transmission channel is present. Each error within a transmitted inter-channel level difference results in an error in the reconstructed multi-channel signal, since every inter-channel level difference is needed to calculate each of the multi-channel output signals. Moreover, no reconstruction is possible when an inter-channel level difference is lost during transmission, even if that inter-channel level difference is needed only for, e.g., the left surround channel or the right surround channel, which are not that important for multi-channel reconstruction because most of the information is contained in the front left channel (subsequently called the left channel), the front right channel (subsequently called the right channel) and the center channel. The situation becomes even worse when the inter-channel level difference of the low-frequency enhancement channel is lost during transmission. In this situation, no multi-channel reconstruction, or only an erroneous one, is possible, although the low-frequency enhancement channel is not that decisive for the listening comfort of the listener. Thus, an error in a single inter-channel level difference propagates to errors in each of the reconstructed output channels.
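
The error propagation described above can be made concrete with a small Python sketch. It assumes the same single-reference gain rule as in BCC synthesis (ICLDs in dB relative to the center channel, gains normalized to unit total power); the function name and the example ICLD values are made up for illustration.

```python
import math
from typing import List, Sequence


def reference_based_gains(icld_db: Sequence[float]) -> List[float]:
    """Gains [reference, ch1, ch2, ...] for one band, single-reference scheme."""
    ratios = [10.0 ** (d / 10.0) for d in icld_db]
    g_ref = 1.0 / math.sqrt(1.0 + sum(ratios))
    return [g_ref] + [g_ref * math.sqrt(r) for r in ratios]


# Hypothetical ICLDs (dB) of L, R, Ls, Rs and LFE relative to the center channel.
clean = reference_based_gains([0.0, 0.0, -6.0, -6.0, -10.0])
# Only the LFE value is corrupted during transmission.
corrupt = reference_based_gains([0.0, 0.0, -6.0, -6.0, 10.0])

# Every gain changes, including those of channels whose ICLD arrived intact,
# because the reference gain depends on all ICLDs of the band.
affected = [abs(a - b) > 1e-6 for a, b in zip(clean, corrupt)]
```

Here `affected` is true for all six channels, although only one of the five transmitted values was wrong.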
Parametric multi-channel representations are problematic in that inter-channel level differences, such as ICLDs in BCC coding or balance values in other parametric multi-channel representations, are normally given as relative values rather than absolute values. In BCC, an ICLD parameter describes the level difference between a channel and a reference channel. Balance values can also be given as the ratio between the two channels of a channel pair. When the multi-channel signal is reconstructed, such level differences or balance parameters are applied to a base channel, which can be a mono base channel or a stereo base channel signal with two base channels. The energy contained in the at least one base channel is thus distributed among the, e.g., five or six reconstructed output channels. The absolute energy in a reconstructed output channel is therefore determined by the inter-channel level difference or balance parameter and by the energy of the downmix signal at the receiver input.

Level variations occur whenever the energy of the downmix signal at the receiver input differs from that of the downmix signal output by the encoder. In this context, it is to be emphasized that, when the parameters are frequency-selective and depending on the parameterization scheme used, such level variations lead not only to a general loudness variation of the reconstructed signal but also to severe artifacts. When, for example, a certain frequency band of the downmix signal is manipulated more than a band located elsewhere on the frequency scale, this manipulation is clearly audible in the reconstructed output signal, since the frequency components of the output channels in that band have a level that is too low or too high.

Additionally, level manipulations that vary over time cause the overall level of the reconstructed output signal to vary over time and are therefore perceived as an annoying artifact.
While the above situations focus on level manipulations caused by encoding, transmitting and decoding the downmix signal, other level deviations can occur as well. Because of phase dependencies between the different channels that are downmixed into one or two channels, the mono signal can have an energy that is not equal to the sum of the energies of the original signals. Since the downmix is normally performed sample-wise, i.e. by adding the time waveforms, a phase difference of, e.g., 180 degrees between a left signal and a right signal results in complete cancellation of both channels in the downmix signal and thus in zero energy, although both signals naturally carry a certain signal energy. While this extreme case is very unlikely under normal conditions, energy variations still occur, since the signals are, of course, not completely uncorrelated. Such variations also lead to loudness fluctuations in the reconstructed output signal and to artifacts, because the energy of the reconstructed output signal differs from the energy of the original multi-channel signal.

[Summary of the Invention]
It is an object of the present invention to provide a parameterization concept that results in a multi-channel reconstruction with improved output quality.

This object is achieved by an apparatus for generating a level parameter in accordance with claim 1, an apparatus for generating a reconstructed multi-channel representation in accordance with claim 7, a method of generating a level parameter in accordance with claim 9, a method of generating a reconstructed multi-channel representation in accordance with claim 10, a computer program in accordance with claim 11, or a parameter representation in accordance with claim 12.
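
The phase-dependent energy deviation of a sample-wise downmix, noted in the prior-art discussion above, can be reproduced with a short Python sketch. The sine test signals, the block length and the function names are assumptions chosen only to make the effect visible.

```python
import math
from typing import List, Sequence


def energy(signal: Sequence[float]) -> float:
    """Energy of one block of samples (sum of squares)."""
    return sum(x * x for x in signal)


def sine(n: int, cycles: int, phase: float) -> List[float]:
    """n samples of a sine with an integer number of cycles and a phase offset."""
    return [math.sin(2.0 * math.pi * cycles * k / n + phase) for k in range(n)]


def downmix_energy_ratio(phase_deg: float, n: int = 480, cycles: int = 5) -> float:
    """Energy of the sample-wise sum divided by the summed channel energies."""
    left = sine(n, cycles, 0.0)
    right = sine(n, cycles, math.radians(phase_deg))
    mono = [l + r for l, r in zip(left, right)]  # plain sample-wise downmix
    return energy(mono) / (energy(left) + energy(right))
```

For these test signals the ratio is about 2 when the channels are in phase, about 1 at 90 degrees and about 0 at 180 degrees, so the downmix energy matches the summed channel energies only in special cases.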
The present invention is based on the finding that, for high-quality reconstruction and in view of flexible encoding, transmission and decoding schemes, an additional level parameter should be transmitted together with the downmix signal or the parameter representation of a multi-channel signal, so that a multi-channel reconstructor can use this level parameter together with the level-difference parameters and the downmix signal to regenerate a multi-channel output signal that does not suffer from level variations or from artifacts induced by frequency-selective level changes.

In accordance with the present invention, the level parameter is calculated such that the energy of the at least one downmix channel weighted (e.g. multiplied or divided) by the level parameter is equal to the sum of the energies of the original channels.

In one embodiment, the level parameter is derived from the ratio between the energy of the downmix channel(s) and the sum of the energies of the original channels. In this embodiment, any level difference between the downmix channel(s) and the original multi-channel signal is calculated on the encoder side and entered into the data stream as a level correction factor. This factor is treated as an additional parameter that is given for a block of samples of the downmix channel(s) and for a certain frequency band. Thus, a new level parameter is added for each block and frequency band for which inter-channel level differences or balance parameters exist.

The present invention also provides flexibility, since it allows transmitting a downmix of a multi-channel signal that differs from the downmix on which the parameters are based. Such a situation arises, for example, when a broadcaster does not wish to transmit a downmix signal generated by a multi-channel encoder, but rather a downmix signal produced by a sound engineer in a studio, i.e. a downmix based on the subjective and creative impression of a human being. The broadcaster may nevertheless also wish to transmit multi-channel parameters related to this "master downmix". In accordance with the present invention, the adaptation between the parameter set and the master downmix is provided by the level parameter, which in this case is the level difference between the master downmix and the parameter downmix on which the parameter set is based.

The present invention is advantageous in that the additional level parameter provides improved output quality and improved flexibility, since a parameter set related to one downmix signal can also be adapted to another downmix that was not generated during parameter calculation.

For bit rate reduction, it is preferred to apply Δ-coding to the new level parameter, together with quantization and entropy coding. In particular, Δ-coding results in a high coding gain, since the variation from band to band or from time block to time block is not very large, so that relatively small difference values are obtained, which, in combination with a subsequent entropy coder such as a Huffman coder, allow a good coding gain.

In a preferred embodiment of the present invention, a parametric representation of the multi-channel signal is used that includes at least two different balance parameters, each indicating the balance between a different channel pair. In particular, flexibility, scalability, error robustness and even bit rate efficiency result from the fact that the first channel pair, on which the first balance parameter is based, differs from the second channel pair, on which the second balance parameter is based, with the four channels forming these channel pairs all being different from one another.
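
The Δ-coding of the level parameter mentioned above can be sketched as follows in Python. The sketch quantizes a level parameter to an integer index on a dB grid and then codes the indices of neighbouring bands as differences; the 1.5 dB step size and all function names are hypothetical choices for this example, and the subsequent entropy coding stage (e.g. a Huffman coder) is not shown.

```python
import math
from typing import List, Sequence


def quantize_db(r: float, step_db: float = 1.5) -> int:
    """Quantize an energy ratio r to an integer index on a dB grid.

    The 1.5 dB step is an assumption made for this example.
    """
    return round(10.0 * math.log10(r) / step_db)


def delta_encode(indices: Sequence[int]) -> List[int]:
    """Keep the first index and replace each later one by its difference
    to the previous index (across bands or across time blocks)."""
    if not indices:
        return []
    return [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]


def delta_decode(deltas: Sequence[int]) -> List[int]:
    """Invert delta_encode by accumulating the differences."""
    out: List[int] = []
    acc = 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out
```

Because neighbouring bands tend to have similar level parameters, most differences are small integers clustered around zero, which is what makes a subsequent entropy coder effective.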
The inventive concept departs from the single-reference-channel concept and uses a multi-balance or super-balance concept, which is more intuitive and more natural for the human impression of sound. In particular, the channel pairs underlying the first and second balance parameters can include original channels, downmix channels or, preferably, certain combinations of input channels.

It has been found that a balance parameter derived from the center channel as the first channel and the sum of the left original channel and the right original channel as the second channel of the channel pair is especially useful for providing an exact energy distribution between the center channel and the left and right channels. In this context, it is to be noted that these three channels normally contain most of the information of the audio scene, and that, in particular, the left-right stereo localization is influenced not only by the balance between left and right but also by the balance between the center and the sum of left and right. This observation is reflected by the use of this balance parameter in accordance with a preferred embodiment of the present invention.

Preferably, when a single mono downmix signal is transmitted, it has been found that, in addition to the center/left-plus-right balance parameter, a left/right balance parameter, a rear-left/rear-right balance parameter and a front/back balance parameter form an optimum solution for a bit-rate-efficient parameter representation that is flexible, error-robust and largely free of artifacts.

On the receiver side, in contrast to BCC synthesis, in which each channel is calculated from the transmitted information alone, the inventive multi-balance representation additionally makes use of information on the downmix scheme used to generate the downmix channel(s).
Thus, the information on the downmix scheme, which is not used in prior-art systems, is used for upmixing in addition to the balance parameters. The upmixing operation is therefore performed such that the balance between those channels of the reconstructed multi-channel signal that form the channel pair of a balance parameter is determined by that balance parameter.

This concept, i.e. having different channel pairs for different balance parameters, makes it possible to generate some channels without knowing each and every transmitted balance parameter. In particular, the left, right and center channels can be reconstructed without any knowledge of the rear-left/rear-right balance or of the front/back balance. This allows very finely tuned scalability, since extracting an additional parameter from a bit stream, or transmitting an additional balance parameter to a receiver, allows the reconstruction of one or more additional channels. This is in contrast to the prior-art single-reference system, in which each and every inter-channel level difference is needed to reconstruct all, or even only a subgroup, of the reconstructed output channels.

The inventive concept is also flexible in that the choice of balance parameters can be adapted to a certain reconstruction environment. When, for example, a five-channel setup forms the original multi-channel signal setup and a four-channel setup forms the reconstruction setup, which has only a single surround loudspeaker positioned, e.g., behind the listener, a front/back balance parameter allows the combined surround channel to be calculated without any knowledge of the left surround channel and the right surround channel.
This is in contrast to a single reference channel system in which the inter-channel level difference of the left surround channel and the inter-channel level of the right surround channel must be retrieved from the stream. difference. After that, the left surround channel and the right surround channel must be calculated. Finally, two channels must be added to obtain the single surround speaker channel for a 4-channel reconstruction. Because the more intuitive and user-oriented balanced parameter representation is not limited to a single reference channel, the combination of the original channels can be allowed to be used as a balanced channel pair of channels and thus automatically The combined surround channel is sent, so it is not necessary to implement all of these steps in the balanced parameter representation. The invention relates to a parameterized multi-channel representation of an audio signal

The invention provides an efficient way to define suitable parameters for the multi-channel representation, together with the ability to extract the parameters that represent a desired channel configuration without having to decode all channels. It further addresses the problem of choosing the optimal parameter configuration for a given signal segment, so as to minimize the bit rate required to code the spatial parameters for that segment. It also outlines how decorrelation methods that were previously applicable only to the two-channel case can be applied in a general multi-channel environment.

In preferred embodiments, the invention comprises the following features:
- downmixing the multi-channel signal to a one- or two-channel representation on the encoder side;
- given the multi-channel signal, defining the parameters that represent it on a flexible per-frame basis, either to minimize the bit rate or to enable the decoder to extract the channel configuration at the bit-stream level;
- extracting, on the decoder side, the relevant parameter set given the channel configuration currently supported by the decoder;
- creating the required number of mutually decorrelated signals given the present channel configuration;
- recreating the output signals from the parameter set decoded from the bit-stream data and from the decorrelated signals;
- defining a parameterization of the multi-channel audio signal such that the same parameters, or a subset of them, can be used irrespective of the channel configuration;
- defining a parameterization of the multi-channel audio signal such that the parameters can be used in a scalable coding scheme, where subsets of the parameter set are transmitted in different layers of the scalable stream;
- defining a parameterization of the multi-channel audio signal such that the energy reconstruction of the decoder output signals is not impaired by the underlying audio codec used to code the downmix signal;
- switching between different parameterizations of the multi-channel audio signal such that the bit-rate overhead for coding the parameterization is minimized;
- defining a parameterization of the multi-channel audio signal that includes a parameter representing the energy correction factor for the downmix signal;
- using several mutually decorrelated decorrelators to recreate the multi-channel signal; and
- recreating the multi-channel signal from an upmix matrix calculated from the transmitted parameter set.

The invention will now be described by way of illustrative examples with reference to the accompanying drawings; these examples do not limit the scope or spirit of the invention.

[Embodiments]

The embodiments described below merely illustrate the principles of the invention for the multi-channel representation of audio signals. Modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. The intent is therefore to be limited only by the scope of the appended patent claims, and not by the specific details presented by way of description and explanation of the embodiments.

In the following description, which outlines how to parameterize IID and ICC parameters and how to apply them to recreate a multi-channel representation of audio signals, all signals referred to are assumed to be subband signals in a filter bank, or some other frequency-selective representation of a part of the whole frequency range of the corresponding channel. The invention is therefore not limited to a specific filter bank; it is outlined below for one frequency band of the subband representation of the signal, and the same operations apply to all subband signals.

Although a balance parameter is also termed an "inter-channel intensity difference (IID)" parameter, it should be emphasized that a balance parameter for a channel pair does not necessarily have to be the ratio between the energy or intensity of the first channel of the pair and the energy or intensity of the second channel. In general, the balance parameter indicates the localization of a sound source between the two channels of the channel pair. Although this localization is usually given by energy, level or intensity differences, other characteristics of the signal can be used, such as a power measure for both channels or the time or frequency envelopes of the channels.

Fig. 1 shows the channels of a 5.1 channel configuration, where a(t) 101 represents the left surround channel, b(t) 102 the left front channel, c(t) 103 the center channel, d(t) 104 the right front channel, e(t) 105 the right surround channel, and f(t) 106 the LFE (low frequency effects) channel.

Assume that an expectancy operator $E[\cdot]$ is defined, so that the energy of each channel outlined above can be written as follows (here for the left surround channel):

$$A = E\left[a^2(t)\right]$$

On the encoder side, the five channels are downmixed to a two-channel or a one-channel representation.
This can be done in several ways; one commonly used way is the ITU downmix, defined as follows.

The 5.1 to two-channel downmix:

$$l_d(t) = \alpha\, b(t) + \beta\, a(t) + \gamma\, c(t) + \delta\, f(t)$$

$$r_d(t) = \alpha\, d(t) + \beta\, e(t) + \gamma\, c(t) + \delta\, f(t)$$

The 5.1 to one-channel downmix:

$$m_d(t) = \sqrt{\tfrac{1}{2}}\,\bigl(l_d(t) + r_d(t)\bigr)$$

Commonly used values for the constants are $\alpha = 1$, $\beta = \gamma = \sqrt{1/2}$ and $\delta = 0$.

The IID parameters are defined as energy ratios of two arbitrarily chosen channels or weighted groups of channels. Given the channel energies outlined above for the 5.1 channel configuration, several sets of IID parameters can be defined.

Fig. 7 shows a general downmixer 700 that uses the above equations to calculate a single base channel m or two, preferably stereo, base channels $l_d$ and $r_d$. In general, the downmixer uses certain downmixing information. In the preferred embodiment of a linear downmix, this information consists of the weighting factors α, β, γ and δ. It is known in the art that more or fewer weighting factors, constant or non-constant, can be used.

In an ITU-recommended downmix, α is set to 1, β and γ are both set to the square root of 0.5, and δ is set to 0. In general, the factor α can vary between 0.5 and 1.5. The factors β and γ can differ from each other and can vary between 0 and 1. The same holds for the low frequency enhancement channel f(t): its factor δ can vary between 0 and 1. Moreover, the factors for the left downmix and the right downmix do not have to be equal. This becomes clear when one considers a non-automatic downmix performed, for example, by a sound engineer. The sound engineer tends to perform a creative downmix rather than a downmix governed by mathematical laws, and is guided by his or her own creative judgment. When such a "creative" downmix is recorded by a certain parameter set, it is used according to the invention by an inventive upmixer as shown in Fig. 8, which is guided not only by the parameters but also by additional information on the downmixing scheme.

When a linear downmix has been performed as in Fig. 7, the weighting parameters are the preferred information on the downmixing scheme to be used by the upmixer. When other information used in the downmixing scheme is available, however, the upmixer can use that information as the information on the downmixing scheme as well. Such other information can be, for example, certain matrix elements, or certain factors or functions within the matrix elements, of an upmix matrix as indicated, for example, in Fig. 11.

Consider the 5.1 channel configuration outlined in Fig. 1 and how other channel configurations relate to it. For a three-channel case, in which no surround channels are available, the energies B, C and D are available in the notation above. For a four-channel configuration, B, C and D are available together with a combination of A and E that represents the single surround channel, more commonly called the back channel in this context.

The invention defines IID parameters that apply to all of these configurations; that is, the four-channel subset of the 5.1 channel configuration has a corresponding subset within the IID parameter set that describes the 5.1 channels. The following IID parameter set solves this problem:

$$r_1 = \frac{L}{R} = \frac{\alpha^2 B + \beta^2 A + \gamma^2 C + \delta^2 F}{\alpha^2 D + \beta^2 E + \gamma^2 C + \delta^2 F}$$

$$r_2 = \frac{2\gamma^2 C}{\alpha^2 (B + D)}$$

$$r_3 = \frac{\beta^2 (A + E)}{\alpha^2 (B + D) + 2\gamma^2 C}$$

$$r_4 = \frac{\beta^2 A}{\beta^2 E} = \frac{A}{E}$$

$$r_5 = \frac{2\delta^2 F}{\alpha^2 (B + D) + \beta^2 (A + E) + 2\gamma^2 C}$$

The parameter r1 corresponds to the energy ratio between the left downmix channel and the right downmix channel. The parameter r2 corresponds to the energy ratio between the center channel and the left and right front channels. The parameter r3 corresponds to the energy ratio between the three front channels and the two surround channels. The parameter r4 corresponds to the energy ratio between the two surround channels. The parameter r5 corresponds to the energy ratio between the LFE channel and all other channels.

Fig. 4 illustrates these energy ratios. The output channels are indicated by 101 to 105, are the same as in Fig. 1, and are therefore not described again here. The speaker set-up is divided into a left half and a right half, with the center channel 103 belonging to both halves. The energy ratio between the left half plane and the right half plane is exactly the parameter r1; this is indicated by the solid line below r1 in Fig. 4. The energy distribution between the center channel 103 and the left front 102 and right front 104 channels is indicated by r2. Finally, the energy distribution between the entire front channel set-up (102, 103 and 104) and the back channels (101 and 105) is indicated by the arrow labeled r3 in Fig. 4.
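The five IID parameters defined above can be sketched in a few lines of Python. This is only an illustrative sketch for a single frequency band: the function name `iid_parameters`, the tuple `ITU_WEIGHTS` and the example energies are assumptions made here, not part of the patent text, and the code assumes that no denominator is zero.

```python
from math import sqrt

# Downmix weights quoted in the text for the ITU downmix:
# alpha = 1, beta = gamma = sqrt(1/2), delta = 0.
ITU_WEIGHTS = (1.0, sqrt(0.5), sqrt(0.5), 0.0)


def iid_parameters(A, B, C, D, E, F, weights=ITU_WEIGHTS):
    """Return (r1, r2, r3, r4, r5) for one band from the channel energies.

    A: left surround, B: left front, C: center,
    D: right front, E: right surround, F: LFE.
    Hypothetical helper; assumes all denominators are non-zero.
    """
    alpha, beta, gamma, delta = weights
    a2, b2, g2, d2 = alpha**2, beta**2, gamma**2, delta**2

    left = a2 * B + b2 * A + g2 * C + d2 * F    # energy L of the left downmix
    right = a2 * D + b2 * E + g2 * C + d2 * F   # energy R of the right downmix
    front_lr = a2 * (B + D)                     # weighted left + right front
    center = 2 * g2 * C                         # weighted center (counted twice)
    surround = b2 * (A + E)                     # weighted surround pair

    r1 = left / right
    r2 = center / front_lr
    r3 = surround / (front_lr + center)
    r4 = A / E
    r5 = 2 * d2 * F / (front_lr + surround + center)
    return r1, r2, r3, r4, r5


# Example energies (arbitrary illustrative values).
r1, r2, r3, r4, r5 = iid_parameters(A=1.0, B=4.0, C=2.0, D=3.0, E=0.5, F=0.8)
print(r1, r2, r3, r4, r5)
```

With the ITU weights, δ = 0 removes the LFE channel from the downmix, so r5 evaluates to zero in this example.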
Given the above parameterization and the energy of the transmitted single downmix channel,

$$M = \frac{1}{2}\left(\alpha^2 (B + D) + \beta^2 (A + E) + 2\gamma^2 C + 2\delta^2 F\right),$$

the energies of the reconstructed channels can be expressed as:

$$F = \frac{1}{2\delta^2}\,\frac{r_5}{1 + r_5}\,2M$$

$$A = \frac{1}{\beta^2}\,\frac{r_4}{1 + r_4}\,\frac{r_3}{1 + r_3}\,\frac{1}{1 + r_5}\,2M$$

$$E = \frac{1}{\beta^2}\,\frac{1}{1 + r_4}\,\frac{r_3}{1 + r_3}\,\frac{1}{1 + r_5}\,2M$$

$$C = \frac{1}{2\gamma^2}\,\frac{r_2}{1 + r_2}\,\frac{1}{1 + r_3}\,\frac{1}{1 + r_5}\,2M$$

$$B = \frac{1}{\alpha^2}\left(2\,\frac{r_1}{1 + r_1}\,M - \beta^2 A - \gamma^2 C - \delta^2 F\right)$$

$$D = \frac{1}{\alpha^2}\left(2\,\frac{1}{1 + r_1}\,M - \beta^2 E - \gamma^2 C - \delta^2 F\right)$$

Hence the energy of the signal M can be distributed to the reconstructed channels, so that the reconstructed channels have the same energies as the original channels.

This preferred upmixing scheme is illustrated in Fig. 8. The equations for F, A, E, C, B and D make clear that the information on the downmixing scheme used by the upmixer consists of the weighting factors α, β, γ and δ. These factors weight the original channels before the weighted or unweighted channels are added together, or subtracted from each other, to obtain a number of downmix channels that is smaller than the number of original channels. Fig. 8 therefore shows that, according to the invention, the energies of the reconstructed channels are determined not only by the balance parameters transmitted from the encoder side to the decoder side, but also by the downmixing factors α, β, γ and δ.

Fig. 8 also shows that the already calculated channel energies F, A, E and C are used in the equations for the left and right energies B and D. This does not necessarily imply a sequential upmixing scheme. To obtain a fully parallel upmixing scheme, performed for example with an upmix matrix having certain matrix elements, the equations for A, C, E and F are inserted into the equations for B and D. The reconstructed channel energy is thus determined only by the balance parameters, the downmix channel(s), and the information on the downmixing scheme, such as the downmixing factors.

Given the above IID parameters, the problem of defining an IID parameter set that can be used for several channel configurations is solved, as the following shows. As an example, consider the three-channel configuration, that is, recreating the three front channels from one available channel. The parameters r3, r4 and r5 are then obsolete, since the A, E and F channels do not exist. The parameters r1 and r2 are sufficient to recreate the three channels from a single downmix channel, since r1 describes the energy ratio between the left and right front channels and r2 describes the energy ratio between the center channel and the left and right front channels.

In the more general case, the IID parameters (r1 ... r5) defined above apply to all subsets of recreating n channels from m channels, where m < n ≤ 6. With reference to Fig. 4:
- for a system recreating 2 channels from 1 channel, sufficient information to retain the correct energy ratio between the channels is obtained from the parameter r1;
- for a system recreating 3 channels from 1 channel, it is obtained from the parameters r1 and r2;
- for a system recreating 4 channels from 1 channel, it is obtained from the parameters r1, r2 and r3;
- for a system recreating 5 channels from 1 channel, it is obtained from the parameters r1, r2, r3 and r4;
- for a system recreating 5.1 channels from 1 channel, it is obtained from the parameters r1, r2, r3, r4 and r5;
- for a system recreating 5.1 channels from 2 channels, it is obtained from the parameters r2, r3, r4 and r5.

This scalability feature is illustrated by the table in Fig. 10b. The scalable bit stream shown in Fig. 10a and explained later can also be adapted to the table in Fig. 10b, to obtain a much finer scalability than that shown in Fig. 10a.

The inventive concept is especially advantageous in that the left and right channels can easily be reconstructed from the single balance parameter r1, without knowledge or extraction of any other balance parameter. To this end, the channels A, C, F and E are simply set to zero in the equations for B and D in Fig. 8.

Alternatively, when only the balance parameter r2 is considered, the reconstructed channels are the sum of the center channel and the low frequency channel (when this channel is not set to zero) on the one hand, and the sum of the left and right channels on the other hand. The center channel and the mono signal can thus be reconstructed using only a single parameter. This feature is already useful for a simple three-channel representation, in which the left and right signals are derived from the sum of left and right, for example by halving, and in which the energy between the center and the sum of left and right is exactly determined by the balance parameter r2.
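The reconstruction equations for F, A, E, C, B and D can be checked numerically with a small round trip: compute M and r1 ... r5 from known energies, then rebuild the energies from M and the parameters alone. The following Python sketch does this for one band. The function names, the example energies and the choice δ = 0.5 are illustrative assumptions (δ must be non-zero here, because with δ = 0 the LFE energy is not contained in the downmix and cannot be recovered).

```python
from math import sqrt


def downmix_energy(A, B, C, D, E, F, w):
    """M = 1/2 (a^2 (B+D) + b^2 (A+E) + 2 g^2 C + 2 d^2 F), as in the text."""
    alpha, beta, gamma, delta = w
    return 0.5 * (alpha**2 * (B + D) + beta**2 * (A + E)
                  + 2 * gamma**2 * C + 2 * delta**2 * F)


def iid_parameters(A, B, C, D, E, F, w):
    """Balance parameters r1..r5 (hypothetical helper, non-zero denominators)."""
    alpha, beta, gamma, delta = w
    a2, b2, g2, d2 = alpha**2, beta**2, gamma**2, delta**2
    left = a2 * B + b2 * A + g2 * C + d2 * F
    right = a2 * D + b2 * E + g2 * C + d2 * F
    front_lr, center, surround = a2 * (B + D), 2 * g2 * C, b2 * (A + E)
    return (left / right,
            center / front_lr,
            surround / (front_lr + center),
            A / E,
            2 * d2 * F / (front_lr + surround + center))


def reconstruct_energies(r, M, w):
    """Rebuild (A, B, C, D, E, F) from r1..r5 and M; requires delta != 0."""
    r1, r2, r3, r4, r5 = r
    alpha, beta, gamma, delta = w
    total = 2.0 * M
    F = r5 / (1 + r5) * total / (2 * delta**2)
    no_lfe = total / (1 + r5)  # total energy with the LFE share removed
    A = r4 / (1 + r4) * r3 / (1 + r3) * no_lfe / beta**2
    E = 1 / (1 + r4) * r3 / (1 + r3) * no_lfe / beta**2
    C = r2 / (1 + r2) * 1 / (1 + r3) * no_lfe / (2 * gamma**2)
    # B and D use the subtractive form given in the text.
    B = (2 * r1 / (1 + r1) * M - beta**2 * A - gamma**2 * C - delta**2 * F) / alpha**2
    D = (2 / (1 + r1) * M - beta**2 * E - gamma**2 * C - delta**2 * F) / alpha**2
    return A, B, C, D, E, F


weights = (1.0, sqrt(0.5), sqrt(0.5), 0.5)  # delta = 0.5 is an illustrative choice
original = dict(A=1.0, B=4.0, C=2.0, D=3.0, E=0.5, F=0.8)
M = downmix_energy(**original, w=weights)
r = iid_parameters(**original, w=weights)
rebuilt = reconstruct_energies(r, M, weights)
for name, value in zip("ABCDEF", rebuilt):
    print(name, value, original[name])
```

The sketch follows the sequential order of Fig. 8 (F, A, E and C first, then B and D); a parallel matrix form would substitute the first four results into the last two expressions.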
In this context, the balance parameter r1 or r2 is situated in a lower scaling layer.

The second entry in the table of Fig. 10b indicates how three channels, namely B, D and the sum of C and F, can be generated using only two balance parameters instead of all five. Here one of the parameters r1 and r2 may already be situated in a higher scaling layer than the other, which is situated in the lower scaling layer.

The equations in Fig. 8 show that, for calculating C, the non-extracted parameter r5 and the other non-extracted parameter r3 are set to zero. The unused channels A, E and F are also set to zero, so that the three channels B, D and the combination of the center channel C and the low frequency enhancement channel F can be calculated.

When a four-channel representation is to be upmixed, it is sufficient to extract only the parameters r1, r2 and r3 from the parameter data stream. In this context, r3 could be in the next-higher scaling layer relative to r1 or r2. The four-channel configuration is especially suitable in connection with the super-balance parameter representation of the invention because, as described later in connection with Fig. 6, the third balance parameter r3 is already derived from a combination of the front channels on the one hand and the back channels on the other hand. The parameter r3 is a front/back balance parameter derived from a channel pair that has, as its first channel, a combination of the back channels A and E and, as its second channel, a combination of the left channel B, the right channel D and the center channel C.

The combined channel energy of both surround channels is thus obtained automatically, without the separate calculation and subsequent combination that would be needed in a single-reference-channel set-up.

When five channels have to be recreated from a single channel, the further balance parameter r4 is necessary. This parameter can again be situated in a next-higher scaling layer.

When a 5.1 reconstruction has to be performed, every balance parameter is required. A next-higher scaling layer containing the next balance parameter r5 therefore has to be transmitted to the receiver and evaluated by the receiver.

Using the same approach of extending the IID parameters in line with the number of channels, the above IID parameters can be extended to cover channel configurations with more channels than the 5.1 configuration. The invention is hence not limited to the examples outlined above.

Now consider the case in which the channel configuration is a 5.1 configuration, one of the most commonly used cases, and assume that the 5.1 channels are recreated from two channels. For this case, a different parameter set can be defined by replacing the parameters r3 and r4 by:

$$q_3 = \frac{\beta^2 A}{\alpha^2 B}$$

$$q_4 = \frac{\beta^2 E}{\alpha^2 D}$$

The parameters q3 and q4 represent the energy ratio between the front and back left channels and the energy ratio between the front and back right channels. Several other parameterizations can be envisaged.

Fig. 5 shows the modified parameterization. Instead of one parameter describing the energy distribution between the front and back channels (r3 in Fig. 4) and one parameter describing the energy distribution between the left surround channel and the right surround channel (r4 in Fig. 4), the parameters q3 and q4 describe the energy ratio between the left front channel 102 and the left surround channel 101, and the energy ratio between the right front channel 104 and the right surround channel 105.

The invention teaches that several parameter sets can be used to represent the multi-channel signals. An additional feature of the invention is that different parameterizations can be chosen depending on the type of quantization applied to the parameters.

As an example, a system that quantizes the parameters coarsely because of severe bit-rate constraints should use a parameterization that does not amplify errors during the upmixing process.

Consider two of the above expressions for the reconstructed energies in a system that recreates 5.1 channels from one channel:

$$B = \frac{1}{\alpha^2}\left(2\,\frac{r_1}{1 + r_1}\,M - \beta^2 A - \gamma^2 C - \delta^2 F\right)$$

$$D = \frac{1}{\alpha^2}\left(2\,\frac{1}{1 + r_1}\,M - \beta^2 E - \gamma^2 C - \delta^2 F\right)$$

The subtractions can produce large variations in the B and D energies from quite small quantization effects in the M, A, C and F terms.

According to the invention, a different parameterization that is less sensitive to quantization of the parameters should be used. Hence, if coarse quantization is used, the parameter r1 as defined above,

$$r_1 = \frac{L}{R} = \frac{\alpha^2 B + \beta^2 A + \gamma^2 C + \delta^2 F}{\alpha^2 D + \beta^2 E + \gamma^2 C + \delta^2 F},$$

can be replaced by the alternative definition

$$r_1 = \frac{B}{D}.$$

This yields the following equations for the reconstructed energies:

$$B = \frac{1}{\alpha^2}\,\frac{r_1}{1 + r_1}\,\frac{1}{1 + r_2}\,\frac{1}{1 + r_3}\,\frac{1}{1 + r_5}\,2M$$

$$D = \frac{1}{\alpha^2}\,\frac{1}{1 + r_1}\,\frac{1}{1 + r_2}\,\frac{1}{1 + r_3}\,\frac{1}{1 + r_5}\,2M$$

The equations for the reconstructed energies of A, E, C and F stay the same as above. This parameterization represents a better-conditioned system from a quantization point of view.

Fig. 6 illustrates these energy ratios. The output channels are indicated by 101 to 105, are the same as in Fig. 1, and are therefore not described again here. The speaker set-up is divided into a front part and a back part. The energy distribution between the entire front channel set-up (102, 103 and 104) and the back channels (101 and 105) is indicated by the arrow labeled r3 in Fig. 6.

Another important feature of the invention is that the parameterization

$$r_2 = \frac{2\gamma^2 C}{\alpha^2 (B + D)}, \qquad r_1 = \frac{B}{D}$$

is not only a better-conditioned system from a quantization point of view. It also has the advantage that the parameters used to reconstruct the three front channels are derived without any influence from the surround channels. One could envisage a parameter r2 that describes the relation between the center channel and all other channels. This would have the drawback, however, that the surround channels would be included in the estimation of the parameters that describe the front channels.

The parameterization described in this invention can also be applied to measurements of correlation or coherence between channels. Including the back channels in the calculation of r2 can then have a significant negative influence on how accurately the front channels are recreated.

As an example, imagine the same signal in all front channels and completely uncorrelated signals in the back channels. This is not uncommon, given that the back channels are frequently used to recreate the ambience of the original sound.

If the center channel is described in relation to all other channels, the correlation measure between the center and the sum of all other channels will be rather low, since the back channels are completely uncorrelated. The same holds for a parameter that estimates the correlation between the front left/right channels and the back left/right channels.

The result is a parameterization that can reconstruct the energies correctly but does not carry the information that all front channels were identical, that is, strongly correlated. It does carry the information that the left and right front channels are decorrelated from the back channels, and that the center channel is also decorrelated from the back channels. The fact that all front channels are the same, however, cannot be derived from such a parameterization.

Because the back channels are not included in the estimation of the parameters used on the decoder side to recreate the front channels, this is overcome by the following parameterization taught by the invention:

$$r_2 = \frac{2\gamma^2 C}{\alpha^2 (B + D)}$$

$$r_1 = \frac{B}{D}$$

According to the invention, the energy distribution between the center channel 103 and the left front 102 and right front 104 channels is indicated by r2. The energy distribution between the left surround channel 101 and the right surround channel 105 is indicated by r4. Finally, the energy distribution between the left front channel 102 and the right front channel 104 is given by r1. All parameters are thus the same as outlined in Fig. 4, apart from r1, which here corresponds to the energy distribution between the left front speaker and the right front speaker rather than between the entire left side and the entire right side. For completeness, the parameter r5 is also given; it describes the energy distribution between the center channel 103 and the LFE channel 106.

Fig. 6 shows an overview of the preferred parameterization embodiment of the invention. The first balance parameter r1 (indicated by the solid line) is a front-left/front-right balance parameter. The second balance parameter r2 is a center/left-right balance parameter. The third balance parameter r3 is a front/back balance parameter. The fourth balance parameter r4 is a rear-left/rear-right balance parameter. Finally, the fifth balance parameter r5 is a center/LFE balance parameter.

Fig. 4 shows a related situation. The first balance parameter r1, which Fig. 4 shows by solid lines for the case of a downmix left/right balance, can be replaced by an original front-left/front-right balance parameter defined between the channels B and D as the underlying channel pair. This is shown by the dashed line r1 in Fig. 4 and corresponds to the solid line r1 in Fig. 5 and Fig. 6.
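The alternative front parameterization, r1 = B/D together with r2, can be illustrated with a short Python sketch. It shows two properties stated in the text: the front parameters are computed without any surround energy, and B and D are rebuilt as pure products of ratios, with no subtraction. The function names, default weights and example energies are assumptions for illustration only.

```python
from math import sqrt


def front_balance(B, C, D, alpha=1.0, gamma=sqrt(0.5)):
    """r1 = B/D and r2 = 2 g^2 C / (a^2 (B+D)); no surround energy is used."""
    r1 = B / D
    r2 = 2 * gamma**2 * C / (alpha**2 * (B + D))
    return r1, r2


def front_energies(r1, r2, r3, r5, M, alpha=1.0):
    """Product-form reconstruction of B and D from the ratios and M."""
    # Share of the total energy 2M that belongs to alpha^2 * (B + D).
    front_lr = 2.0 * M / ((1 + r2) * (1 + r3) * (1 + r5))
    B = r1 / (1 + r1) * front_lr / alpha**2
    D = 1 / (1 + r1) * front_lr / alpha**2
    return B, D


# Illustrative energies and weights (beta^2 = gamma^2 = 0.5, delta^2 = 0.25).
A, B, C, D, E, F = 1.0, 4.0, 2.0, 3.0, 0.5, 0.8
a2, b2, g2, d2 = 1.0, 0.5, 0.5, 0.25
M = 0.5 * (a2 * (B + D) + b2 * (A + E) + 2 * g2 * C + 2 * d2 * F)
r3 = b2 * (A + E) / (a2 * (B + D) + 2 * g2 * C)
r5 = 2 * d2 * F / (a2 * (B + D) + b2 * (A + E) + 2 * g2 * C)

r1, r2 = front_balance(B, C, D)
B_hat, D_hat = front_energies(r1, r2, r3, r5, M)
print(B_hat, D_hat)
```

Because every factor in `front_energies` is a ratio of non-negative quantities, the rebuilt energies cannot turn negative when a parameter is perturbed, which is one way to read the claim that this form is better conditioned under coarse quantization.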

In a two-base-channel situation, the parameters $r_3$ and $r_4$ (i.e., the front/back balance parameter and the rear-left/rear-right balance parameter) are replaced by two single-sided front/rear parameters. The first single-sided front/rear parameter $q_3$ can be regarded as the first balance parameter, which is derived from the channel pair consisting of the left surround channel A and the left channel B. The second single-sided front/rear balance parameter is the parameter $q_4$, which can be regarded as the second parameter and is based on the second channel pair consisting of the right channel D and the right surround channel E. Again, the two channel pairs are independent of each other. The same is true for the center/left-right balance parameter $r_2$, which has the center channel C as its first channel and the sum of the left and right channels B and D as its second channel.

Another parameterization, which lends itself well to coarse quantization in a system that re-creates 5.1 channels from one or two channels, is defined according to the present invention as follows.

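For the case of one channel to 5.1 channels, the parameters are the ratios of the weighted channel energies to the energy $M$ of the downmix channel:

$$\frac{\beta^2 A}{M},\quad \frac{\alpha^2 B}{M},\quad \frac{\gamma^2 C}{M},\quad \frac{\alpha^2 D}{M},\quad \frac{\beta^2 E}{M}\quad\text{and}\quad \frac{\delta^2 F}{M},$$

and for the case of two channels to 5.1 channels, the weighted channel energies are referred to the energies $L$ and $R$ of the left and right downmix channels:

$$\frac{\beta^2 A}{L},\quad \frac{\alpha^2 B}{L},\quad \frac{\alpha^2 D}{R},\quad \frac{\beta^2 E}{R}\quad\text{and}\quad \frac{\delta^2 F}{M},$$

together with a corresponding ratio for the center term $\gamma^2 C$.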
As is evident, the above parameterization includes more parameters than are required, from a strictly theoretical point of view, to correctly redistribute the energy of the transmitted signals to the re-created signals. However, the parameterization is very insensitive to quantization errors.

The parameter set referred to above for a two-base-channel set-up makes use of several reference channels. In contrast to the parameter configuration in FIG. 6, however, the parameter set in FIG. 7 relies solely on downmix channels rather than original channels as reference channels. The balance parameters $q_1$, $q_3$ and $q_4$ are derived from completely different channel pairs.

Several inventive embodiments have been described in which the channel pairs used for deriving balance parameters include only original channels (FIG. 4, FIG. 5 and FIG. 6), include original channels as well as downmix channels (FIG. 4 and FIG. 5), or rely solely on the downmix channels as reference channels, as indicated at the bottom of FIG. 7. Nevertheless, it is preferred that the parameter generator included in the surround data encoder 206 of FIG. 2 uses only original channels or combinations of original channels, rather than a base channel or a combination of base channels, for the channels in the channel pairs on which the balance parameters are based. The reason is that one cannot completely guarantee that no energy change occurs to the single base channel or the two stereo base channels during their transmission from a surround encoder to a surround decoder. Such energy variations of the downmix channels or the single downmix channel can be caused by an audio encoder 205 (FIG. 2) or an audio decoder 302 (FIG. 3) operating under a low bit rate condition.
Such situations can result in a manipulation of the energy of the mono downmix channel or the stereo downmix channels; this manipulation can differ between the left and right stereo downmix channels and can even be frequency-selective and time-selective.

In order to be completely safe against such energy variations, an additional level parameter is transmitted, in accordance with the present invention, for each block and frequency band of every downmix channel. When the balance parameters are based on the original signal rather than the downmix signal, a single correction factor per band is sufficient, since any energy correction will not influence the balance between the original channels. Even when no additional level parameter is transmitted, downmix channel energy variations will not distort the localization of sound sources in the audio image. They will only cause a general loudness variation, which is not as annoying as the migration of a sound source caused by varying balance conditions.

It is important to note that care needs to be taken so that the energy $M$ (of the downmixed channels) is the sum of the energies B, D, A, E, C and F as outlined above. This is not always the case, because of phase dependencies between the different channels that are downmixed into one channel. The energy correction factor can be transmitted as an additional parameter $r_M$, and the energy of the downmix signal received on the decoder side is thus defined by:

$$r_M M = \tfrac{1}{2}\left(\alpha^2 (B + D) + \beta^2 (A + E) + 2\gamma^2 C + 2\delta^2 F\right)$$

FIG. 9 outlines the application of the additional parameter $r_M$ according to the present invention. The downmix signal is modified by the parameter $r_M$ in 901 before it is sent into the upmix modules 701-705. These modules are the same as in FIG. 7 and are therefore not elaborated on further.

It is evident to those skilled in the art that the parameter $r_M$ of the above single-channel downmix example can be extended to one parameter per downmix channel and is hence not limited to a single downmix channel.

FIG. 9a illustrates an inventive level parameter calculator 900, while FIG. 9b shows an inventive level corrector 902. FIG. 9a shows the situation on the encoder side, and FIG. 9b illustrates the corresponding situation on the decoder side. The level parameter or "additional" parameter $r_M$ is a correction factor that gives a certain energy ratio. To explain this, the following exemplary scenario is assumed. For a certain original multi-channel signal, there exists a "master downmix" on the one hand and a "parameter downmix" on the other hand. The master downmix has been generated by a sound engineer in a sound studio based on, for example, subjective quality impressions. Additionally, a certain audio storage medium also includes the parameter downmix, which has been produced by, for example, the surround encoder 203 of FIG. 2. The parameter downmix includes one base channel or two base channels, which form the basis for the multi-channel reconstruction using the set of balance parameters or any other parametric representation of the original multi-channel signal.

It can be the case, for example, that a broadcaster wishes to transmit not the parameter downmix but the master downmix from a transmitter to a receiver. Additionally, in order to upgrade the master downmix to a multi-channel representation, the broadcaster also transmits a parametric representation of the original multi-channel signal. Since the energy (in one band and in one block) can, and typically will, vary between the master downmix and the parameter downmix, a relative level parameter $r_M$ is generated in block 900 and transmitted to the receiver as an additional parameter.
The level parameter is derived from the master downmix and the parameter downmix and is preferably the ratio between the energies of the master downmix and the parameter downmix within one block and one band.

Generally, the level parameter is calculated as the ratio between the sum of the energies of the original channels ($E_{orig}$) and the energy of the downmix channel(s), where the downmix channel(s) can be the parameter downmix ($E_{PD}$), the master downmix ($E_{MD}$) or any other downmix signal. Typically, the energy of the specific downmix signal that is transmitted from an encoder to a decoder is used.

FIG. 9b illustrates a decoder-side implementation of the level parameter usage. The level parameter and the downmix signal are input into the level corrector block 902.
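
The level parameter calculation and correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of blocks 900 and 902: the function names are hypothetical, the downmix weights are omitted so that $E_{orig}$ is a plain sum of channel energies, the subband samples are real-valued, and all numbers are made up.

```python
import math

def band_energy(samples):
    """Energy of one block of real-valued samples in one frequency band."""
    return sum(x * x for x in samples)

def level_parameter(original_energies, downmix_block):
    """Encoder side (hypothetical helper): ratio of the summed original-channel
    energies to the energy of the downmix that is actually transmitted."""
    e_orig = sum(original_energies)
    e_dmx = band_energy(downmix_block)
    return e_orig / e_dmx if e_dmx > 0.0 else 1.0

def correct_level(downmix_block, r_m):
    """Decoder side (hypothetical helper): scaling amplitudes by sqrt(r_M)
    scales the energy of the block by r_M."""
    gain = math.sqrt(r_m)
    return [gain * x for x in downmix_block]

# One block in one band; assumed energies of the original channels A, B, C, D, E, F.
original_energies = [4.0, 9.0, 1.0, 9.0, 4.0, 0.5]
# Transmitted downmix block whose energy differs from the sum above.
downmix_block = [1.0, -2.0, 0.5, 1.5]

r_m = level_parameter(original_energies, downmix_block)
corrected = correct_level(downmix_block, r_m)
```

After the correction, the energy of `corrected` equals the sum of the original channel energies, so the subsequent upmix starts from the intended overall level in that band and block.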

The level corrector corrects the single base channel or the several base channels depending on the level parameter. Since the additional parameter $r_M$ is a relative value, this relative value is applied to the energy of the corresponding base channel.

Although FIGS. 9a and 9b show a situation in which the level correction is applied to the downmix channel or the downmix channels, the level parameter can also be integrated into the upmix matrix. To this end, each occurrence of $M$ in the equations of FIG. 8 is replaced by the term $r_M M$.

Studying the case of re-creating 5.1 channels from 2 channels, the following observation is made.

If the present invention is used with an underlying audio codec as outlined in FIG. 2 and FIG. 3 (205 and 302), some further consideration is needed. Consider the IID parameters defined earlier, where $r_1$ was defined according to

$$r_1 = \frac{L}{R} = \frac{\alpha^2 B + \beta^2 A + \gamma^2 C + \delta^2 F}{\alpha^2 D + \beta^2 E + \gamma^2 C + \delta^2 F}.$$

This parameter is implicitly available on the decoder side, because the system re-creates 5.1 channels from 2 channels, provided that the two transmitted channels are the stereo downmix of the surround channels.

However, an audio codec operating under a bit rate constraint may modify the spectral distribution, so that the L and R energies measured at the decoder differ from their values on the encoder side. According to the present invention, this influence on the energy distribution of the re-created channels vanishes if the parameter

$$r_1 = \frac{B}{D}$$

is also transmitted for the case of reconstructing 5.1 channels from two channels.

If signaling means are provided, the encoder can code the current signal segment using different parameter sets and choose the set of IID parameters that gives the lowest overhead for the particular signal segment being processed. It is possible that the energy levels of the right front and right back channels are similar, and that the energy levels of the left front and left back channels are similar but significantly different from the levels in the right front and back channels. Given delta coding of the parameters and subsequent entropy coding, it can then be more efficient to use the parameters $q_3$ and $q_4$ instead of $r_3$ and $r_4$. For another signal segment with different characteristics, a different parameter set may give a lower bit rate overhead. The present invention allows free switching between different parameter representations in order to minimize the bit rate overhead for the currently encoded signal segment, given the characteristics of that segment. The ability to switch between different parameterizations of the IID parameters in order to obtain the lowest possible bit rate overhead, together with signaling means that indicate which parameterization is currently used, is an essential feature of the present invention.
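
The selection between parameterizations described above can be sketched as follows. This is only an illustration under assumptions: the cost function is a crude stand-in for the entropy coder (it simply treats small delta values as cheap), the function names are hypothetical, and the quantized parameter indices are made-up values for one signal segment with five frequency bands.

```python
def delta_code(values):
    """Delta coding along one direction: first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def estimated_cost(indices):
    """Hypothetical cost proxy standing in for entropy coding of the deltas."""
    return sum(1 + 2 * abs(d) for d in delta_code(indices))

def choose_parameterization(candidates):
    """Return (name, cost) of the candidate parameter set with the lowest cost.
    The chosen name is what would have to be signaled to the decoder."""
    costs = {name: sum(estimated_cost(p) for p in params)
             for name, params in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Quantized indices per frequency band for two alternative parameter sets.
candidates = {
    "r3_r4": [[4, -3, 5, -4, 3], [2, -2, 3, -3, 2]],
    "q3_q4": [[1, 1, 2, 2, 2], [5, 5, 5, 6, 6]],
}
best, cost = choose_parameterization(candidates)
```

For these example values the single-sided set `q3_q4` varies slowly across bands and is therefore cheaper under the assumed cost proxy; a real encoder would compare the actual number of bits produced by its entropy coder.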

Furthermore, delta coding of the parameters can be done either in the frequency direction or in the time direction, and delta coding can also be done between different parameters. According to the present invention, a parameter can be delta coded with respect to any other parameter, provided that signaling means indicate the particular delta coding used.

An important feature of any coding scheme is the ability to do scalable coding. This means that the coded bitstream can be divided into several different layers. The core layer is decodable by itself, and the higher layers can be decoded to enhance the decoded core layer signal. The number of available layers may vary with the circumstances, but as long as the core layer is available, the decoder can produce output samples. The parameterization for multi-channel coding outlined above, using the parameters $r_1$ to $r_5$, lends itself very well to scalable coding. Hence, it is possible to store the data for, e.g., the two surround channels (A and E) in an enhancement layer, i.e., the parameters $r_3$ and $r_4$, and to store the parameters corresponding to the front channels, represented by $r_1$ and $r_2$, in a core layer.

FIG. 10 outlines a scalable bitstream implementation according to the present invention. The bitstream layers are illustrated by 1001 and 1002, where 1001 is the core layer, which holds the waveform-coded downmix signals and the parameters $r_1$ and $r_2$ required to re-create the front channels (102, 103 and 104). The enhancement layer illustrated by 1002 holds the parameters for re-creating the back channels (101 and 105).

Another important aspect of the present invention is the use of decorrelators in a multi-channel configuration. The concept of using a decorrelator was elaborated for the one-to-two channel case in the document PCT/SE02/01372.
However, when this theory is extended to more than two channels, several problems arise, which are solved by the present invention.

Elementary mathematics shows that in order to obtain M mutually decorrelated signals from N signals, M - N decorrelators are required, where all the different decorrelators are functions that create mutually orthogonal output signals from a common input signal. A decorrelator is typically an all-pass or near all-pass filter that, given an input $x(t)$, produces an output $y(t)$ with $E[|y|^2] = E[|x|^2]$ and an almost vanishing cross-correlation $E[y x^*]$. Further perceptual criteria enter into the design of a good decorrelator. Examples are minimizing the comb-filter character that appears when the original signal is added to the decorrelated signal, and minimizing the effect of an impulse response that is sometimes too long for transient signals. Some prior art decorrelators use an artificial reverberator to decorrelate. The prior art also includes fractional delays, obtained for example by modifying the phase of the complex subband samples, in order to achieve a higher echo density and hence more diffusion in time.

The present invention proposes methods of modifying a reverberation-based decorrelator in order to obtain multiple decorrelators that create mutually decorrelated output signals from a common input signal. Two decorrelators are mutually decorrelated if their outputs $y_1(t)$ and $y_2(t)$ have vanishing or almost vanishing cross-correlation given the same input. Assuming that the input is stationary white noise, it follows that the impulse responses $h_1$ and $h_2$ must be orthogonal in the sense that $E[h_1 h_2^*]$ vanishes or almost vanishes. Sets of pairwise mutually decorrelated decorrelators can be constructed in several ways. An efficient way of making such a modification is to alter the phase rotation factor $q$ that is part of the fractional delay.

The present invention stipulates that the phase rotation factors can be part of the delay lines in the all-pass filters or can simply be an overall fractional delay. In the latter case, the method is not limited to all-pass or reverberation-like filters but can also be applied to, e.g., simple delays that include a fractional delay part. An all-pass filter link in the decorrelator can be described in the Z-domain by a transfer function in which $q$ is the complex-valued phase rotation factor ($|q| = 1$), $m$ is the delay line length in samples, and $a$ is the filter coefficient. For stability reasons, the magnitude of the filter coefficient has to be limited to $|a| < 1$. However, by using the alternative filter coefficient $a' = -a$, a new reverberator is defined that has the same reverberation decay properties but an output that is significantly uncorrelated with the output of the unmodified reverberator. Furthermore, the phase rotation factor $q$ can be modified by, e.g., adding a constant phase offset, $q' = q e^{jC}$. The constant $C$ can be used as a constant phase offset, or it can be scaled such that it corresponds to a constant time offset for all frequency bands to which it is applied. The phase offset constant $C$ can also be a random value that is different for each frequency band.

According to the present invention, generating $n$ channels from $m$ channels is performed by applying an upmix matrix $H$ of size $n \times (m + p)$ to a column vector of signals of size $(m + p) \times 1$,

$$y = \begin{bmatrix} m \\ s \end{bmatrix},$$

where $m$ contains the $m$ downmixed and coded signals, and the $p$ signals in $s$ are both mutually decorrelated and decorrelated from all signals in $m$. These decorrelated signals are produced from the signals in $m$ by decorrelators. The $n$ reconstructed signals $a', b', \ldots$ are then contained in the column vector

$$x' = H y.$$

The above is illustrated by FIG. 11, where the decorrelated signals are created by the decorrelators 1102, 1103 and 1104.
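
The all-pass filter link with phase rotation factor described earlier in this passage can be sketched as follows. The transfer function is not legible in the text, so the sketch assumes the common first-order form $H(z) = (q z^{-m} - a) / (1 - a q z^{-m})$ with real $a$; this form, the function names and the parameter values are assumptions made for illustration only.

```python
import cmath

def allpass_link(x, a, q, m):
    """Filter x with the assumed link H(z) = (q z^-m - a) / (1 - a q z^-m),
    i.e. y[n] = -a x[n] + q x[n-m] + a q y[n-m]."""
    y = []
    for n, xn in enumerate(x):
        x_del = x[n - m] if n >= m else 0.0
        y_del = y[n - m] if n >= m else 0.0
        y.append(-a * xn + q * x_del + a * q * y_del)
    return y

def energy(sig):
    """Sum of squared magnitudes of a (possibly complex) signal."""
    return sum(abs(v) ** 2 for v in sig)

def frequency_response(a, q, m, omega):
    """Value of the assumed H(z) on the unit circle, z = exp(j omega)."""
    w = q * cmath.exp(-1j * omega * m)
    return (w - a) / (1 - a * w)

q = cmath.exp(1j * 0.3)            # phase rotation factor, |q| = 1
impulse = [1.0] + [0.0] * 399

# Reference link and a modified link with a' = -a and q' = q e^{jC}, C = 1.0.
h1 = allpass_link(impulse, a=0.5, q=q, m=3)
h2 = allpass_link(impulse, a=-0.5, q=q * cmath.exp(1j * 1.0), m=3)
```

Both impulse responses carry unit energy and the frequency response has unit magnitude, which is the all-pass property the text relies on. How strongly `h1` and `h2` are decorrelated depends on the chosen coefficients and on how many links are cascaded, so the sketch makes no claim about that.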

In FIG. 11, the upmix matrix $H$ is given by 1101, which operates on the vector $y$ to give the output signal $x'$.

Let $R = E[x x^*]$ be the correlation matrix of the original signal vector, and let $R' = E[x' x'^*]$ be the correlation matrix of the reconstructed signal. Here and in the following, for a matrix or a vector $X$ with complex entries, $X^*$ denotes the adjoint, i.e., the complex conjugate transpose of $X$.

The diagonal of $R$ contains the energy values A, B, C, ... and can be decoded, up to a total energy level, from the energy quotas defined above. Since $R^* = R$, there are only $n(n-1)/2$ different off-diagonal cross-correlation values; they contain the information that is to be reconstructed fully or partly by adjusting the upmix matrix $H$. A reconstruction of the full correlation structure corresponds to the case $R' = R$. A reconstruction of the correct energy levels only corresponds to the case where $R'$ and $R$ are equal on their diagonals.

In the case of $n$ channels from $m = 1$ channel, a reconstruction of the full correlation structure is achieved by using $p = n - 1$ mutually decorrelated decorrelators and an upmix matrix $H$ that satisfies the condition

$$H H^* = \frac{1}{M} R,$$

where $M$ is the energy of the single transmitted signal. Since $R$ is positive semidefinite, it is well known that such a solution exists. Moreover, $n(n-1)/2$ degrees of freedom are left over for the design of $H$; they are used in the present invention to obtain further desirable properties of the upmix matrix. A central design criterion is that the dependence of $H$ on the transmitted correlation data shall be smooth.

One conventional way of parameterizing the upmix matrix is $H = UDV$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix. The squares of the absolute values of the entries of $D$ can be chosen equal to the eigenvalues of $R/M$. Omitting $V$ and sorting the eigenvalues so that the largest value is applied to the first coordinate minimizes the overall energy of the decorrelated signals in the output. In the real case, the orthogonal matrix $U$ is parameterized by $n(n-1)/2$ rotation angles. Transmitting the correlation data in the form of those angles and the $n$ diagonal values of $D$ would immediately give the desired smooth dependence of $H$. However, since the energy data has to be transformed into eigenvalues, scalability is sacrificed by this approach.

A second method taught by the present invention separates the energy part from the correlation part in $R$ by defining a normalized correlation matrix $R_0$ through $R = G R_0 G$, where $G$ is a diagonal matrix whose diagonal values are the square roots of the diagonal entries of $R$, that is, $\sqrt{A}, \sqrt{B}, \ldots$, and $R_0$ has ones on the diagonal. Let $H_0$ be an orthogonal upmix matrix that defines the preferred normalized upmix in the case of totally uncorrelated signals of equal energy. Examples of such preferred upmix matrices are

$$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},\qquad
\frac{1}{2}\begin{bmatrix} 1 & 1 & \sqrt{2} \\ 1 & 1 & -\sqrt{2} \\ \sqrt{2} & -\sqrt{2} & 0 \end{bmatrix},\qquad
\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}.$$

The upmix is then defined by $H = G S H_0 / \sqrt{M}$, where the matrix $S$ solves $S S^* = R_0$. The dependence of this solution on the normalized cross-correlation values in $R_0$ is chosen to be continuous and such that $S$ is equal to the identity matrix in the case $R_0 = I$.

Dividing the $n$ channels into groups of fewer channels is a convenient way to reconstruct a partial cross-correlation structure. According to the present invention, a particularly advantageous grouping for the case of 5.1 channels from 1 channel is {a, e}, {c}, {b, d}, {f}, where no decorrelation is applied to the groups {c} and {f}, and the groups {a, e} and {b, d} are produced by an upmix of the same downmixed/decorrelated pair. For these two subsystems, the preferred normalized upmixes in the totally uncorrelated case are chosen as

$$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},\qquad \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},$$

respectively. Thus, only two of the total of 15 cross-correlations are transmitted and reconstructed, namely those within the channel pairs {a, e} and {b, d}. In the terminology used above, this is an example of a design for the case $n = 6$, $m = 1$ and $p = 1$. The upmix matrix $H$ is of size $6 \times 2$, with zeros at the two entries in the second column at rows 3 and 6, corresponding to the outputs $c'$ and $f'$.

A third approach taught by the present invention for incorporating decorrelated signals takes the simpler point of view that each output channel has a different decorrelator, giving rise to the decorrelated signals $s_a, s_b, \ldots$. The reconstructed signals are then formed as

$$a' = \sqrt{A/M}\,(m \cos\varphi_a + s_a \sin\varphi_a),$$
$$b' = \sqrt{B/M}\,(m \cos\varphi_b + s_b \sin\varphi_b),$$

and so on. The parameters $\varphi_a, \varphi_b, \ldots$ control the amount of decorrelated signal present in the output channels $a', b', \ldots$. The correlation data is transmitted in the form of these angles. It is easy to compute that the resulting normalized cross-correlation between, for instance, the channels $a'$ and $b'$ is equal to the product $\cos\varphi_a \cos\varphi_b$. Since the number of pairwise cross-correlations is $n(n-1)/2$ and there are $n$ decorrelators, it is in general not possible to match a given correlation structure with this approach if $n > 3$. The advantages, however, are a very simple and stable decoding method and direct control over the amount of decorrelated signal produced in each output channel. This allows the mixing of decorrelated signals to be based on perceptual criteria that incorporate, for instance, the energy level differences of channel pairs.

For the case of $n$ channels from $m > 1$ channels, the correlation matrix $R_y = E[y y^*]$ can no longer be assumed to be diagonal, and this has to be taken into account when matching $R' = H R_y H^*$ to the target $R$. A simplification arises because $R_y$ has the block matrix structure

$$R_y = \begin{bmatrix} R_m & 0 \\ 0 & R_s \end{bmatrix},$$

where $R_m = E[m m^*]$ and $R_s = E[s s^*]$. Furthermore, assuming mutually decorrelated decorrelators, the matrix $R_s$ is diagonal. Note that this also affects the upmix design with respect to the reconstruction of correct energies. The solution is to compute in the decoder, or to transmit from the encoder, information about the correlation structure $R_m$ of the downmixed signals.
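
The third approach described above (one decorrelator per output channel) can be checked numerically with a small sketch. It assumes idealized signals: $m$, $s_a$ and $s_b$ are short real-valued test vectors that are exactly orthogonal and have equal energy, and the plain inner product stands in for the expectation. The function names and values are hypothetical.

```python
import math

def inner(u, v):
    """Real-valued stand-in for the expectation E[u v*]."""
    return sum(a * b for a, b in zip(u, v))

def mix(m, s, target_energy, phi):
    """Form sqrt(target_energy / M) * (m cos(phi) + s sin(phi)), with M = energy of m."""
    g = math.sqrt(target_energy / inner(m, m))
    return [g * (mi * math.cos(phi) + si * math.sin(phi)) for mi, si in zip(m, s)]

def normalized_cross_correlation(u, v):
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))

# Idealized downmix and decorrelator outputs: mutually orthogonal, equal energy M = 4.
m   = [1.0,  1.0,  1.0,  1.0]
s_a = [1.0, -1.0,  1.0, -1.0]
s_b = [1.0,  1.0, -1.0, -1.0]

phi_a, phi_b = 0.4, 0.9
a_rec = mix(m, s_a, target_energy=2.0, phi=phi_a)   # target energy A = 2
b_rec = mix(m, s_b, target_energy=3.0, phi=phi_b)   # target energy B = 3
rho = normalized_cross_correlation(a_rec, b_rec)
```

Under these assumptions the reconstructed channels have exactly the target energies, and `rho` equals $\cos\varphi_a \cos\varphi_b$, so the angles directly trade correlation against the amount of decorrelated signal. With real decorrelators the orthogonality holds only approximately.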

Sl s2 其中Si可從mi = ld之解相關來獲得及s2可從m2 = rd之解 相關來獲得。 在此,將該等群{a,b}及{d,e}視爲已考量成對交互相關 之分離1->2聲道系統。對於聲道c及f而言,調整加權,以 便 五丨+Vw2|2j=C, 本發明可針對各種用於類比或數位信號之儲存或傳輸 的使用任意編解碼器之系統實施在硬體晶片及DSP中。第2 圖及第3圖顯示本發明之可能實施。在此範例中,顯示一用 以操作6個輸入信號之系統(一5.1聲道組態)。在顯示該編' φ 碼器側之第2圖中,將該等分離聲道之類比輸入信號轉換成 爲數位信號20 1及使用每一聲道之濾波器組來分析202。將 該濾波器組之輸出饋入該環繞編碼器203,該環繞編碼器203 包括一參數產生器’其實施一下行混音以產生由該音頻編碼 器2 05所編碼之一個或二個聲道。再者,依據本發明擷取像 IID及ICC參數之環繞參數,以及依據本發明擷取用以槪述 資料之時間頻率格(time frequency grid)及哪一個參數化被 使用的控制資料204。如本發明所教示,編碼該等擷取參數 -40- 201116078 2 06,以切換於不同參數化之間或以可調方式配置該等參 數。將該等環繞參數2 07、控制信號及編號下行混音信號208 多工處理209成爲一串列位元流。Sl s2 where Si can be obtained from the decorrelation of mi = ld and s2 can be obtained from the solution of m2 = rd. Here, the groups {a, b} and {d, e} are regarded as separated 1-> 2-channel systems that have been considered to be pairwise interactively related. For channels c and f, the weighting is adjusted so that 丨+Vw2|2j=C, the present invention can be implemented on a hardware chip for various systems using arbitrary codecs for storage or transmission of analog or digital signals. And DSP. Figures 2 and 3 show possible implementations of the invention. In this example, a system (a 5.1 channel configuration) for operating six input signals is shown. In the second diagram showing the side of the code φ coder, the analog input signals such as the separated channels are converted into a digital signal 20 1 and the filter bank using each channel is analyzed 202. The output of the filter bank is fed to the surround encoder 203, which includes a parameter generator 'which performs a line mix to produce one or two channels encoded by the audio encoder 205 . Furthermore, in accordance with the present invention, surround parameters such as IID and ICC parameters are captured, and a time frequency grid for deduplicating the data and which parameterized control data 204 are used in accordance with the present invention are retrieved. 
As taught by the present invention, the parameters -40-201116078 2 06 are encoded to switch between different parameterizations or to configure the parameters in an adjustable manner. The surround parameter 2 07, the control signal, and the numbered downmix signal 208 multiplex processing 209 are a series of bit streams.
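As an illustration of the 2-to-5.1 upmix described above, the following Python sketch applies a 6x4 matrix with the stated zero pattern to one sample and picks the two weights for channel c so that the energy condition holds. It is a minimal sketch under stated assumptions, not the patent's implementation: the function names `upmix` and `mixed_weights`, the numeric gains, the real-valued toy signals and the fixed ratio between the two weights are all hypothetical. The text only fixes the zero pattern and the energy condition.

```python
import math


def upmix(h, m1, m2, s1, s2):
    """Apply a 6x4 upmix matrix h to one sample.

    Rows of h correspond to the outputs (a, b, c, d, e, f),
    columns to the inputs (m1, m2, s1, s2).
    """
    x = (m1, m2, s1, s2)
    return [sum(w * v for w, v in zip(row, x)) for row in h]


def mixed_weights(r11, r22, r12, target, ratio=1.0):
    """Return real weights (h1, h2) with h2 = ratio * h1 such that
    E|h1*m1 + h2*m2|^2 == target, given the downmix statistics
    r11 = E[m1^2], r22 = E[m2^2], r12 = E[m1*m2].
    Fixing the ratio is an assumption made only for this sketch.
    """
    denom = r11 + 2.0 * ratio * r12 + ratio * ratio * r22
    if denom <= 0.0:
        raise ValueError("downmix combination has no energy")
    h1 = math.sqrt(target / denom)
    return h1, ratio * h1


def mean(values):
    return sum(values) / len(values)


# Toy downmix frames (hypothetical numbers).
m1 = [1.0, -1.0, 2.0, 0.0]
m2 = [0.5, 1.0, -1.0, 2.0]
r11 = mean([x * x for x in m1])
r22 = mean([y * y for y in m2])
r12 = mean([x * y for x, y in zip(m1, m2)])

# Weights for channel c with a hypothetical target energy C = 3.0.
h31, h32 = mixed_weights(r11, r22, r12, target=3.0)
energy_c = mean([(h31 * x + h32 * y) ** 2 for x, y in zip(m1, m2)])

# Example matrix with the zero pattern from the text; gains are made up.
h = [
    (0.8, 0.0, 0.6, 0.0),   # a: m1 and its decorrelated version s1
    (0.6, 0.0, -0.8, 0.0),  # b
    (h31, h32, 0.0, 0.0),   # c: both downmix channels, no decorrelator
    (0.0, 0.8, 0.0, 0.6),   # d: m2 and its decorrelated version s2
    (0.0, 0.6, 0.0, -0.8),  # e
    (0.1, 0.1, 0.0, 0.0),   # f
]
out = upmix(h, 1.0, 0.0, 0.5, 0.0)
```

With only `m1` and `s1` active, the outputs d and e stay at zero, which reflects the treatment of {a, b} and {d, e} as separate 1-to-2 systems.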

Figure 3 shows a typical decoder implementation, i.e. an apparatus for generating a multi-channel reconstruction. Here it is assumed that the audio decoder outputs a signal in a frequency-domain representation, e.g. the output of the MPEG-4 High Efficiency AAC decoder prior to the QMF synthesis filter bank. The serial bitstream is demultiplexed (301); the encoded surround data is fed to the surround data decoder (303) and the encoded downmix channels are fed to the audio decoder (302), in this example an MPEG-4 High Efficiency AAC decoder. The surround data decoder decodes the surround data and feeds it to the surround decoder (305), which includes an upmixer that recreates six channels based on the decoded downmix channels, the surround data and the control signals. The frequency-domain output of the surround decoder is synthesized (306) into time-domain signals, which are subsequently converted to analogue signals by the DAC (307).

Although the present invention has mainly been described with reference to the generation and use of balance parameters, it is emphasized here that the same grouping of channel pairs used for deriving balance parameters is preferably also used for calculating inter-channel coherence parameters, or "width" parameters, between these channel pairs. Additionally, inter-channel time differences, or a kind of "phase cue", can also be derived using the same channel pairs as used for the balance parameter calculation. On the receiver side, these parameters can be used in addition to, or as an alternative to, the balance parameters to generate a multi-channel reconstruction. Alternatively, the inter-channel coherence parameters or even the inter-channel time differences can be used in addition to other inter-channel level differences determined with respect to other reference channels. In view of the scalability feature of the present invention as discussed in connection with Figures 10a and 10b, however, it is preferred to use the same channel pairs for all parameters, so that in a scalable bitstream each scaling layer includes all the parameters for reconstructing the subgroup of output channels that can be generated by the respective scaling layer, as outlined in the penultimate column of the table in Figure 10b. The present invention is also useful when only the coherence parameters or time difference parameters between the respective channel pairs are calculated and transmitted to a decoder. In this case, the level parameters already exist at the decoder for use when a multi-channel reconstruction is performed.

Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is therefore a computer program product with program code stored on a machine-readable carrier, the program code being operative to perform the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are therefore a computer program having program code for performing at least one of the inventive methods when the computer program runs on a computer.

[Brief description of the drawings]
Fig. 1 illustrates the nomenclature used for a 5.1 channel configuration in the present invention;
Fig. 2 illustrates a suitable encoder implementation of a preferred embodiment of the present invention;
Fig. 3 illustrates a suitable decoder implementation of a preferred embodiment of the present invention;
Fig. 4 illustrates a preferred parameterization of the multi-channel signal according to the present invention;
Fig. 5 illustrates a preferred parameterization of the multi-channel signal according to the present invention;
Fig. 6 illustrates a preferred parameterization of the multi-channel signal according to the present invention;
Fig. 7 illustrates a schematic set-up of a downmix scheme generating a single base channel or two base channels;
Fig. 8 illustrates a schematic representation of an upmix scheme, which is based on the inventive balance parameters and information on the downmix scheme;
Fig. 9a schematically illustrates the determination of the level parameter on the encoder side in accordance with the present invention;
Fig. 9b schematically illustrates the use of the level parameter on the decoder side in accordance with the present invention;
Fig. 10a illustrates a scalable bitstream having different parts of the multi-channel parameterization in different layers of the bitstream;
Fig. 10b illustrates a scalability table indicating which channels can be constructed using which balance parameters, and which balance parameters and channels are not used or calculated; and
Fig. 11 illustrates the application of the upmix matrix according to the present invention.

[Description of main reference numerals]
101 left surround channel
102 left front channel
103 center channel
104 right front channel
105 right surround channel
106 LFE channel

201 ADC
202 analysis filter bank
203 surround encoder
204 control signal
205 audio encoder
206 surround data encoder
207 surround parameters
208 encoded downmix signal
209 multiplexer
301 demultiplexer
302 audio decoder
303 surround data decoder
304 control signal
305 surround decoder
306 synthesis filter bank
307 DAC
700 downmixer
900 level parameter calculator
902 level corrector
1001 bitstream layer
1002 bitstream layer
1101 decorrelator
1103 decorrelator
1104 decorrelator

l_d, r_d optimal stereo channels
m mono channel
r_1 parameter
r_2 parameter
r_3 parameter
r_4 parameter
r_5 parameter
r_M additional parameter
α weighting factor
β weighting factor
γ weighting factor
δ weighting factor
A channel
B channel
C channel
D channel
E channel
E_MD master downmix
E_orig energy of the original channels
E_PD parameter downmix
F energy of the reconstructed channels

Claims (1)

VII. Claims:

1. An apparatus for generating a level parameter within a parameter representation of a multi-channel signal having several original channels, the parameter representation including a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the apparatus comprising:
a level parameter calculator (900) for calculating a level parameter (r_M), the level parameter being a level difference between a master downmix and a parameter downmix, wherein the parameter representation is based on the parameter downmix; and
an output interface for generating output data, the output data including the level parameter and the parameter set, or the level parameter and the at least one downmix channel.

2. The apparatus of claim 1, in which the parameter representation includes a parameter set for each of a plurality of frequency bands of the at least one downmix channel, and
in which the level parameter calculator (900) is operative to calculate a level parameter for each of the frequency bands.

3. The apparatus of claim 1, in which the parameter representation includes a parameter set for a time period in a sequence of time periods of the at least one downmix channel, and in which the level parameter calculator (900) is operative to calculate a level parameter for each time period in the sequence of time periods of the at least one downmix channel.

4. The apparatus of claim 1, in which the output interface is operative to generate a scalable data stream, the scalable data stream including, in a lower scaling layer, a first subgroup of parameters of the parameter set, the first subgroup of parameters allowing a reconstruction of a first subgroup of output channels,
the scalable data stream including, in a higher scaling layer, a second subgroup of parameters of the parameter set, the second subgroup together with the first subgroup allowing a reconstruction of a second subgroup of output channels, and
in which the output interface is further operative to enter the level parameter into the lower scaling layer.

5. The apparatus of claim 1, further comprising a parameter generator configured to generate a left/right balance parameter as a first balance parameter, a center balance parameter as a second balance parameter, a front/back balance parameter as a third balance parameter, a rear-left/right balance parameter as a fourth balance parameter, and a low-frequency enhancement balance parameter as a fifth balance parameter.

6. An apparatus for generating a reconstructed multi-channel representation of an original multi-channel signal having at least three original channels, using a parameter representation having a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the parameter representation including a level parameter, the level parameter being a level difference between a master downmix and a parameter downmix, wherein the parameter representation is based on the parameter downmix, the apparatus comprising:
a level corrector (902) for applying a level correction of the at least one downmix channel using the level parameter, the at least one downmix channel being weighted using the level parameter, so that a corrected multi-channel reconstruction is obtainable by upmixing using the parameters in the parameter set.

7. A method of generating a level parameter within a parameter representation of a multi-channel signal having several original channels, the parameter representation including a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the method comprising:
calculating (900) a level parameter (r_M), the level parameter being a level difference between a master downmix and a parameter downmix, wherein the parameter representation is based on the parameter downmix; and
generating output data, the output data including the level parameter and the parameter set, or the level parameter and the at least one downmix channel.

8. A method of generating a reconstructed multi-channel representation of an original multi-channel signal having at least three original channels, using a parameter representation having a parameter set which, when used together with at least one downmix channel, allows a multi-channel reconstruction, the parameter representation including a level parameter, the level parameter being a level difference between a master downmix and a parameter downmix, wherein the parameter representation is based on the parameter downmix, the method comprising:
applying (902) a level correction of the at least one downmix channel using the level parameter, the at least one downmix channel being weighted using the level parameter, so that a corrected multi-channel reconstruction is obtainable by upmixing using the parameters in the parameter set.

9. A recording medium storing a computer program having machine-readable instructions which, when executed on a computer, perform the method of claim 7 or claim 8.
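The level parameter and its use by the level corrector can be illustrated with a short Python sketch. It follows the formulation in the abstract, where the energy of the downmix weighted by the level parameter equals the sum of the energies of the original channels. The function names, the toy signals and the choice to apply the square root of r_M as an amplitude gain are assumptions for illustration only. In a real system the calculation would presumably run per frequency band and per time period, as in claims 2 and 3.

```python
import math


def energy(signal):
    """Energy of one channel over the current band/time tile."""
    return sum(x * x for x in signal)


def level_parameter(original_channels, downmix_channels):
    """Hypothetical encoder-side step: return r_M such that
    r_M * E(downmix) equals the summed energy of the originals."""
    e_orig = sum(energy(ch) for ch in original_channels)
    e_dmx = sum(energy(ch) for ch in downmix_channels)
    if e_dmx == 0.0:
        raise ValueError("downmix has no energy in this tile")
    return e_orig / e_dmx


def correct_level(downmix_channels, r_m):
    """Hypothetical decoder-side step: weight the downmix before upmixing.
    r_M is treated as an energy ratio, so the amplitude gain is sqrt(r_M)."""
    gain = math.sqrt(r_m)
    return [[gain * x for x in ch] for ch in downmix_channels]


# Toy example: three original channels and a mono downmix (made-up values).
originals = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
downmix = [[0.5, 1.0]]

r_m = level_parameter(originals, downmix)
corrected = correct_level(downmix, r_m)
corrected_energy = sum(energy(ch) for ch in corrected)
original_energy = sum(energy(ch) for ch in originals)
```

After correction, the energy of the downmix matches the summed energy of the original channels, so a subsequent upmix that only redistributes energy according to the balance parameters would start from the right overall level.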
TW099130574A 2005-04-12 2005-08-09 Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation and a storage media stored parameter representation TWI458365B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2005/003848 WO2005101370A1 (en) 2004-04-16 2005-04-12 Apparatus and method for generating a level parameter and apparatus and method for generating a multi-channel representation

Publications (2)

Publication Number Publication Date
TW201116078A TW201116078A (en) 2011-05-01
TWI458365B TWI458365B (en) 2014-10-21

Family

ID=44936611

Family Applications (1)

Application Number Title Priority Date Filing Date
TW099130574A TWI458365B (en) 2005-04-12 2005-08-09 Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation and a storage media stored parameter representation

Country Status (1)

Country Link
TW (1) TWI458365B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI677248B (en) * 2011-05-09 2019-11-11 美商Dts股份有限公司 Room characterization and correction for multi-channel audio

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW569551B (en) * 2001-09-25 2004-01-01 Roger Wallace Dressler Method and apparatus for multichannel logic matrix decoding
KR20040063155A (en) * 2001-11-23 2004-07-12 코닌클리케 필립스 일렉트로닉스 엔.브이. Perceptual noise substitution

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI677248B (en) * 2011-05-09 2019-11-11 美商Dts股份有限公司 Room characterization and correction for multi-channel audio
TWI700937B (en) * 2011-05-09 2020-08-01 美商Dts股份有限公司 Room characterization and correction for multi-channel audio

Also Published As

Publication number Publication date
TWI458365B (en) 2014-10-21

Similar Documents

Publication Publication Date Title
TWI334736B (en) Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation and a storage media stored parameter representation
TWI313857B (en) Apparatus for generating a parameter representation of a multi-channel signal and method for representing multi-channel audio signals
TWI458365B (en) Apparatus and method for generating a level parameter, apparatus and method for generating a multi-channel representation and a storage media stored parameter representation
Laitinen Techniques for versatile spatial-audio reproduction in time-frequency domain

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent