TW200921643A - A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream - Google Patents

A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream

Info

Publication number
TW200921643A
TW200921643A TW97123574A
Authority
TW
Taiwan
Prior art keywords
oriented
stream
parameter
audio parameter
audio
Prior art date
Application number
TW97123574A
Other languages
Chinese (zh)
Inventor
Dirk Jeroen Breebaart
Erik Gosuinus Petrus Schuijers
Arnoldus Werner Johannes Oomen
Original Assignee
Koninkl Philips Electronics Nv
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninkl Philips Electronics Nv
Publication of TW200921643A

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream is disclosed. Each object-oriented audio parameter stream comprises object-oriented audio parameter values. Said object-oriented audio parameter values represent statistical properties of audio objects as a function of time. Said method comprises the following steps. First, calculating a synchronized stream for each said input object-oriented audio parameter stream takes place. Said synchronized stream has object-oriented parameter values at predetermined temporal positions. Said predetermined temporal positions are the same for all synchronized streams. For each synchronized stream, said object-oriented parameter values at the predetermined temporal positions are calculated by means of interpolation of the object-oriented parameter values of the corresponding input object-oriented audio parameter stream. Second, creating of the output object-oriented audio parameter stream is performed. Said output object-oriented audio parameter stream has object-oriented parameter values at the predetermined temporal positions, obtained by combining the object-oriented audio parameter values of the synchronized streams at said same predetermined temporal positions.

Description

IX. Description of the Invention

[Technical Field of the Invention]

[Prior Art]

Within the MPEG standardization framework, a working group has started work on object-based spatial audio coding. The goal of this working group is to "explore new technology and reuse of current MPEG Surround components and technologies for the bit-rate-efficient coding of multiple sound sources or objects into a number of down-mix channels and corresponding spatial parameters". In other words, the aim is to encode multiple audio objects into a limited set of down-mix channels with corresponding parameters. At the decoder side, users can interact with the content, for example by repositioning the individual audio objects.

Such interaction with the content is easily realized in an object-oriented decoder by including a rendering step after the decoding process. The rendering step can be combined with the decoding process into a single processing step, avoiding the need to reconstruct the individual objects. For loudspeaker playback, such a combination is described in Faller, C., "Parametric joint-coding of audio sources", Proceedings of the 120th AES Convention, Paris, France, May 2006. For headphone playback, an efficient combination of decoding and head-related transfer function processing is described in Breebaart, J., Herre, J., Villemoes, L., Jin, C., Kjörling, K., Plogsties, J., Koppens, J. (2006), "Multi-channel goes mobile: MPEG Surround binaural rendering", Proceedings of the 29th AES Conference, Seoul, Korea.

When the object-oriented approach outlined above is used in teleconferencing, several object-oriented audio streams have to be merged, each comprising its own down-mix and object-oriented audio parameter stream originating from a different far end. The result is a single object-oriented audio stream that can be further decoded by an object-oriented decoder. Typically, the down-mix audio stream employs a frame-based structure. The object-oriented parameters provided by an object-oriented encoder form a stream that reflects the statistical properties of the audio objects as a function of time. Consequently, the object-oriented audio parameters are valid for a particular frame, or even for a part of a frame. In teleconferencing applications, the framing employed by the object-oriented audio encoders at the various far ends is unlikely to be perfectly aligned in time. Moreover, different audio objects may result in different framing and different object-oriented audio parameter positions, since the parameter positions are preferably content-dependent. Merging such object-oriented audio streams, each consisting of a down-mix and a corresponding object-oriented audio parameter stream, would require delaying at least one of them in order to align the frame boundaries of the streams. A drawback of this merging approach is that it introduces a highly undesirable additional delay in real-time telecommunication/teleconferencing systems.

[Summary of the Invention]

It is an object of the invention to provide an enhanced method of merging at least two object-oriented audio parameter streams that does not require delaying the streams in order to align frame boundaries.

This object is achieved by a method, as defined in claim 1, of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream. Each object-oriented audio parameter stream comprises object-oriented audio parameter values, which represent statistical properties of audio objects as a function of time. The method comprises the following steps. First, a synchronized stream is calculated for each input object-oriented audio parameter stream. The synchronized stream has object-oriented parameter values at predetermined temporal positions, and these predetermined temporal positions are the same for all synchronized streams.

For each synchronized stream, the object-oriented parameter values at the predetermined temporal positions are calculated by interpolating the object-oriented parameter values of the corresponding input object-oriented audio parameter stream. Second, the output object-oriented audio parameter stream is created. The output object-oriented audio parameter stream has object-oriented parameter values at the predetermined temporal positions, obtained by combining the object-oriented audio parameter values of the synchronized streams at the same predetermined temporal positions.

An advantage of the method of merging at least two input object-oriented audio parameter streams according to the invention is that the object-oriented audio parameter streams need not be delayed in order to merge them. Instead, a very simple processing step is performed to obtain the object-oriented audio parameter values of the synchronized streams. An additional benefit of the proposed method is that it overcomes the problem of differing object-oriented audio parameter positions across the frames of the object-oriented audio parameter streams.

In an embodiment, filtering is applied to the object-oriented audio parameter values of the synchronized streams. Applying, for example, a simple piecewise linear interpolation has a low-pass filtering effect on the resulting interpolated object-oriented audio parameter values. Hence, in the case of linear interpolation, high-pass filtering, for example, can be applied to reduce the low-pass effect caused by the interpolation. The advantage of applying filtering to the object-oriented audio parameter values of the synchronized stream is that it ensures a dynamic behavior similar to that of the corresponding input object-oriented audio parameter stream. In other words, it improves the synchronization quality, since it helps the object-oriented audio parameter values of the synchronized stream to mimic the original behavior of these parameters. The synchronization process provides a synchronized stream for each corresponding input object-oriented audio parameter stream.
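As an illustrative sketch (not part of the patent text), such a compensating high-pass step on the interpolated parameter values could look as follows in Python; the first-difference form and the coefficient `alpha` are assumptions chosen for the example, not values specified by the invention.

```python
def highpass_emphasis(values, alpha=0.5):
    """Apply a simple first-order high-frequency emphasis to a sequence
    of interpolated parameter values, partially restoring the dynamics
    attenuated by the low-pass effect of linear interpolation.
    `alpha` is an assumed tuning constant."""
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        # Boost the change between successive parameter values.
        out.append(cur + alpha * (cur - prev))
    return out
```

In the adaptive variant described above, `alpha` would be derived from the statistics of the input parameter stream rather than being a fixed constant.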
In an embodiment, the applied filtering is adaptive in order to match the statistical properties of the corresponding input object-oriented audio parameter values. Using such adaptive filtering brings a further improvement to the synchronization process: since the object-oriented audio parameters reflect the statistical properties of the audio objects as a function of time, the synchronization process has to take these fluctuations over time into account.

In an embodiment, a linear predictive coding (LPC) analysis is used to determine an envelope of the object-oriented audio parameter values of the input object-oriented audio parameter stream, and subsequently, during filtering, this envelope is imposed on the object-oriented audio parameter values of the corresponding synchronized stream. This way of post-processing/filtering the object-oriented audio parameter values of the synchronized stream ensures that the resulting synchronized parameters have characteristics similar to those of the original input parameters.

In an embodiment, mutually different frequency resolutions are matched by up-sampling the object-oriented audio parameter values to a higher frequency resolution. It may happen that the input object-oriented audio parameter streams have mutually different frequency resolutions; in that case, the frequency resolutions have to be matched. Up-sampling the object-oriented audio parameter values to a higher frequency resolution has the advantage of being simple and requiring no extensive computation. Since in the vast majority of systems the frequency resolutions have common edge frequencies for the parameter bands, the up-sampling can be accomplished by simply copying the appropriate parameter values.

The invention further provides a device as claimed, a computer program product enabling a programmable device to execute the method according to the invention, and a teleconferencing system comprising a device according to the invention.
[Embodiments]

The above and other aspects of the invention will be elucidated and explained with reference to the embodiments shown in the drawings.

Figure 1 shows a flow chart of the method according to the invention of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream. Each object-oriented audio parameter stream comprises object-oriented audio parameter values, which represent the statistical properties of the audio objects as a function of time. The method comprises the following steps. Step 110 comprises calculating a synchronized stream for each input object-oriented audio parameter stream. The synchronized stream has object-oriented parameter values at predetermined temporal positions, which are the same for all synchronized streams. For each synchronized stream, the object-oriented parameter values at the predetermined temporal positions are calculated by interpolating the object-oriented parameter values of the corresponding input object-oriented audio parameter stream. Step 120 comprises creating the output object-oriented audio parameter stream. The output object-oriented audio parameter stream has object-oriented parameter values at the predetermined temporal positions, obtained by combining the object-oriented audio parameter values of the synchronized streams at the same predetermined temporal positions. The combining is realized by concatenating the object-oriented audio parameter values corresponding to the input object-oriented audio parameter streams. The object-oriented audio parameters comprised in the streams are arranged at each temporal position according to time/frequency tiles. Each object-oriented audio parameter is associated with an audio object, and each audio object in turn is assigned to one of the time/frequency tiles. Therefore, the concatenation of the object-oriented audio parameter values corresponding to the input streams, as employed in step 120, is performed separately for each of the time/frequency tiles. The order of the concatenated parameter values within a particular time/frequency tile is arbitrary.

Figure 2 shows an example architecture in which two input object-oriented audio parameter streams are merged into one output object-oriented audio parameter stream.
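The per-tile concatenation used in the combining step can be sketched as follows (an illustration only, not code from the patent; the dictionary layout keyed by time/frequency tile is an assumption made for the example):

```python
def combine_streams(synchronized_streams):
    """Create the output parameter stream by concatenating, for each
    time/frequency tile, the object parameter values of all synchronized
    input streams.  The order of values within a tile is arbitrary."""
    output = {}
    for stream in synchronized_streams:
        for tile, params in stream.items():
            output.setdefault(tile, []).extend(params)
    return output
```

For example, two single-object streams sharing the same tiles yield an output stream that carries both objects' parameter values in each tile.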

When the object-oriented approach described above is applied in teleconferencing, the object-oriented audio streams from the different far ends must be combined into a single stream. Each stream comprises its own down-mix stream and a corresponding object-oriented audio parameter stream. In this description, the terms object-oriented audio parameters and object parameters are used interchangeably. In Figure 2, the object-oriented encoders 210 and 220 at two participating sides encode the respective sets of audio objects 201 and 202. Each of the encoders produces a down-mix stream 211 and 212, respectively, and a corresponding object parameter stream 221 and 222. The two object-oriented audio streams are fed to a unit 230, which performs the merging of the streams. An example of such a unit 230 in the teleconferencing field is a multi-party control unit (MCU). The unit 230 produces a merged down-mix 213 and a merged object parameter stream. Subsequently, the merged object-oriented audio stream can be decoded at the receiving party in an object-oriented decoder 240 on the basis of user data 224. The user data 224 relate, for example, to the positioning of the objects in three-dimensional space. The result of the decoding is a rendered output 214 that is presented to the user.

The object parameters reflect the statistical properties of the audio objects as a function of time. Therefore, these parameters vary over time and are valid only during certain time intervals, generally referred to as frames. The object parameters are valid during a particular frame, or even during a part of a frame. In teleconferencing applications, the frames of the various object-oriented audio encoders are unlikely to be perfectly aligned in time. Moreover, different audio objects may lead to different framing and different object parameter positions, since the object parameter positions preferably depend on the content. Without the synchronization proposed by the invention, merging the object-oriented audio streams would require delaying at least one of the down-mixes and the corresponding object parameters in order to align the frame boundaries of the streams to be merged.

An advantage of the method of merging at least two object-oriented audio parameter streams according to the invention is that the object-oriented audio parameter streams need not be delayed in order to merge them. Instead, a very simple processing step is performed to obtain the object-oriented audio parameter values of the synchronized streams. An additional benefit of the proposed method is that it overcomes the problem of differing object-oriented audio parameter positions across the frames of the object-oriented audio parameter streams.

Figure 3 shows object-oriented audio parameter values 331, 332, and 333 of a synchronized stream, obtained by interpolating, at predetermined temporal positions 351, 352, and 353, between the object-oriented audio parameter values 321, 322, 323, and 324 of the corresponding input object-oriented audio parameter stream.

Figure 3 depicts the object parameter values for three adjacent frames 0, 1, and 2 of the input object parameter streams 310 and 320. The input object parameter stream 310 serves as a reference stream that is kept intact, while the second input object parameter stream 320 is synchronized so that the new positions of its object parameters are aligned with the object parameters of the input object parameter stream 310. The framing and the placement of the object parameters of the two input streams 310 and 320 are not aligned. The framing and parameter positions of the object parameter stream 310 are copied to the synchronized stream 330. The object parameter values of the stream 320 are interpolated between the object parameter values of the input object parameter stream 320, as indicated by the dashed lines. The interpolated object parameter values at the temporal positions 311, 312, and 313 are copied to the synchronized stream 330.

在一實施例中,該等預定暫時位置在該等輸入物件導向 音訊參數流之一個中對應於物件導向|訊參數值之暫時位 置。但是,其它用於決定預定暫時位置的方案也是可能 的。例如,兩個流可同時在缺失於另一物件參數流之暫時 位置被同步。另一選擇為依賴該等同步位置之密集度和/ 或計算複雜性選取該等暫時位置。 進一步地,雖然所討論之例子中僅有兩個輸入流被合 併,但大數目的輸入流亦可被合併。這在报大程度上係取 決於應用的。對電信會議應用,可被合併流的數目取與— 個單獨參與方進行通訊的參與方之數目。 在一實施例中’内插值係分段線性的。斟 及.(n,b,p+l)表示的物件參數值,其中n為物件數字,b 參數帶數字’ p為參數索引,則内插值計算如下. ·、 σ(η, b, p) = Μ>ρ_λσ{η, b,p-l) + wp+:a(n, b,p + \) WP· 1,Wp+,為總和為1的兩個内插值權重:In one embodiment, the predetermined temporary positions correspond to a temporary position of the object steering parameter value in one of the input object oriented audio parameter streams. However, other schemes for determining a predetermined temporary location are also possible. For example, two streams can be synchronized simultaneously at a temporary location that is missing from another object parameter stream. Another option is to select such temporary locations depending on the intensity and/or computational complexity of the synchronized locations. Further, although only two input streams are combined in the example discussed, a large number of input streams can also be combined. This is largely determined by the application. For teleconferencing applications, the number of merged streams can be taken as the number of participants communicating with each individual party. In one embodiment, the interpolated values are piecewise linear.斟 and (n, b, p + l) represent the object parameter values, where n is the object number, b parameter with the number 'p is the parameter index, then the interpolation value is calculated as follows. ·, σ(η, b, p) = Μ>ρ_λσ{η, b,pl) + wp+:a(n, b,p + \) WP· 1, Wp+, are the two interpolated weights whose sum is 1:

且wp-i,wp + 丨分別與具有位置(p_l) 位置(p)之距離(示例中)成反比。 單’且不需大量計算工作。 132374.doc -13· 200921643 或者’該等物件參數之内插值可在一不同領域内執行, 例如: σ {n,b,p) = wp_}a2(n,b,p-\) + wp+ia2(n,b,p + l) 或 g( ^n,^>P))~^p^log(a(n,b,p-\))+wp^l〇g(a(n,b,p + \}) 圓4顯示一架構,其中 …."似’队:"以,,,"、,_, 7 "IL〜奶旰导向And wp-i, wp + 丨 are inversely proportional to the distance from the position (p_l) position (p), respectively (in the example). Single ' does not require a lot of calculation work. 132374.doc -13· 200921643 or 'Interpolation of these object parameters can be performed in a different field, for example: σ {n,b,p) = wp_}a2(n,b,p-\) + wp+ Ia2(n,b,p + l) or g( ^n,^>P))~^p^log(a(n,b,p-\))+wp^l〇g(a(n, b,p + \}) Circle 4 shows an architecture where ...."like' team:"to,,,",,_, 7 "IL~milk-oriented
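A minimal Python sketch of the piecewise linear interpolation above (the function names and the stream layout as parallel position/value lists are assumptions made for illustration; the patent itself defines only the weighting rule):

```python
def interpolate(p_prev, v_prev, p_next, v_next, p):
    """Interpolate sigma(n, b, p) between two neighbouring parameter
    positions.  The weights sum to one and are inversely proportional
    to the distance from each neighbour to position p."""
    w_prev = (p_next - p) / (p_next - p_prev)
    w_next = (p - p_prev) / (p_next - p_prev)
    return w_prev * v_prev + w_next * v_next


def synchronize(positions, values, target_positions):
    """Resample one input parameter stream at the predetermined temporal
    positions shared by all synchronized streams.  Assumes the targets
    lie within the range of the input positions."""
    out = []
    for t in target_positions:
        i = max(k for k, pos in enumerate(positions) if pos <= t)
        if positions[i] == t or i == len(positions) - 1:
            out.append(values[i])  # exact hit, or past the last value
        else:
            out.append(interpolate(positions[i], values[i],
                                   positions[i + 1], values[i + 1], t))
    return out
```

Interpolating in a different domain, as in the alternatives above, amounts to transforming `values` (e.g. squaring, or taking logarithms) before interpolation and applying the inverse transform afterwards.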

少相對)層級資訊。該層級資訊係右 關於參數值所袁老★1 貝成係有 曰a ^之9訊物件的時間/頻率塊中的能量< :之用於時間/頻率塊的該頻率可具有不同分辨率:: 二,可具有不同分裂至多個頻率帶。 、 帶寬可隨不同分辨率而變化。 歡目及其 兑月藉由上採樣該等物件導向音訊參數值至一更高 ,:刀辨率之方式匹配該等互不相同之頻率分辨率。由於 大多數系統中頻率分辨率對參數帶具有共同邊緣頻率,所 2可藉由*製適當物件導向音訊參數值而達成上採樣。該 考輪入物件參數流具有9個頻率帶,如圖6中610描繪。 ^ f輸入物件參數流具有一較低頻帛分辨率630及6個頻 倘若該兩個物件參數流之該等共同邊緣頻率係對準 二則可藉由複製該等參數值(如虛線箭頭所示)至交疊頻 率帶而獲得該第二流之經上採樣之頻率分辨率,如圖中 γ20所示。630中之頻率帶51為61〇中兩個頻率帶^及…之 等饧物。因此630之bl之左邊物件參數值被複製到62〇之頻 ;1同時63〇之b 1之右邊物件參數值被複製到620之頻 率帶b2。 所提出之將至少兩個輸入物件導向音訊參數流合併成一 出物件導向音訊參數流的方法可藉由一裝置實現,其包 132374.doc -16· 200921643 含.同步構件及組合構件。該處理構件計算各個該輸入物 件導向音訊參數流的—同步流。該同步流具有在預定暫時 位置的物件導向參數值。該等預定暫時位置對所有同步流 相同的。對每個同步流’該等預定暫時位置之該等物件 導向參數值係藉由相應輸入物件參數流之物件導向參數值 =内插值進仃計算。組合構件產生輸出物件參數流。該輸 勿件參數具有在預定暫時位置的物件導向參數值,該等 /數值藉由組合該相同預定暫時位置之該等同步流之該等 =導向音訊參數值而獲得。該組合係藉由串連對應於該 物件導向音財㈣之物料向音.訊參數值而實 現。 實也例巾 種電腦程式產品執行根據本發明之方 法。 置 實a例巾 電仏會議系統包含-根據本發明之裝 雖然本說明書集中於合併輸人物件導向音訊參數流,作 為電信會議制之目的,對應於料物件導向音訊參數流 :降混音音訊流亦需要合併。一種類似於所提出之合併該 專物件導向參數流的方法可被應詩合併該等降混音。 應注意上述該等實施例係為説明而非限制本發明日,且熟 :此項技術者可在不背離附加請求項之範圍的前提下設計 多種替代實施例。 在附隨請求項巾,圓㈣⑽何參考標記^應 為限制該請求項。㈣”包含"並不排除請求項所列出元件 132374.doc 200921643 或步驟之外的元件或步驟之存在。一 A 1干石則面的接飼” 或”-個”並不排除複數個該等元件之存在。本發明可 =含數個離散單元的硬體實現’及藉由 電腦實現。 【圖式簡單說明] 圖1顯示根據本發明之脾5 rTWm h 月之將至少兩個輸入物件導向音m參 數流合併成-輸出物件導向音訊參數流之方法的流程圖,Audio parameter value. In an embodiment, the applied data system is adaptive to the statistical properties of the corresponding input component-oriented audio parameter values. Elements 401 and 402 represent input object parameter streams, wherein stream 4〇1 provides a reference stream of temporary locations for the second object parameter stream 4G2. These temporary positions for synchronization are included in the control parameters indicated by 411. The object parameters having the respective temporary positions 412 of the object parameter stream 402 are fed to the same step 41G and the data processing list to 2Q. This unit example uses interpolation to synchronize the parameters of the object. 
Applications such as - simple piecewise linear interpolation = the resulting object-oriented audio parameter values of the interpolated values are low pass / effect. The resulting sync stream 431 is fed to the data processing unit Γν: the wave filter 44°. The chopper 44° is preferably a low-order high-pass according to the object-oriented audio parameter value of the wave object: the following:: - If the corresponding input object is directed to the audio parameter flow, the similar dynamic line guide is improved. The quality of the synchronization 'simulates the original behavior of the parameters of the input stream because of the value of the parameter values that help to synchronize the flow. The base 420 determines the statistics of the 2 sets, and the filter coefficients 432 are from the unit 4 and are filtered. The data filter example produces a 5-step object parameter value 441 at its output, which is similar to the similar dynamic behavior of the original input object 132374.doc 14 200921643 piece-oriented audio parameter stream. Figure 5 shows the use of a linear predictive coding analysis to determine the spectral envelope of one of the object-oriented audio parameter values of the input object-oriented audio parameter stream and then assign the envelope to the object-oriented audio parameter value of the corresponding synchronous stream during filtering. . The temporary positions 5 are derived from the reference input object parameter stream and fed to the synchronization unit 51A. The object parameters and corresponding temporary positions 512 are derived from the input object parameter stream 5〇2 and fed back to the synchronization unit 5 10. The 5th synchronization unit 5 synchronizes the object parameters of the streams at the locations. 
In order for the synchronized object parameters to obtain characteristics similar to those of the original input object parameters, an LPC analysis is performed simultaneously in units 520 and 540, on the original input object parameters and on the synchronized object parameters, respectively. The LPC analysis performed on the synchronized object parameters 531 in unit 540 yields coefficients with which the filter unit 550 performs a spectral whitening of the synchronized parameters. As the inverse of the applied spectral whitening, a second data filter unit 570 imposes on the whitened parameters 562 the spectral envelope obtained from the LPC analysis of the original object parameters of stream 502. The two filter stages 550 and 570 can also be combined into a single filter operation. The LPC analysis is preferably performed on a local basis, using an autocorrelation estimate; preferably, a sliding window is used in the autocorrelation estimation, so that the implementation adapts to changes.

The input object-oriented audio parameter streams have the same frequency resolution when the object-oriented audio streams are encoded by the same encoder; different encoders, however, may use mutually different frequency resolutions. In an embodiment, the mutually different frequency resolutions of the object-oriented audio parameter values are matched by averaging. An extremely simple method that does not require much computational effort can be obtained when the object-oriented audio parameters comprise level information defined for separate time/frequency tiles. The level information relates to the energy in the time/frequency tiles of the audio objects represented by the parameter values. The frequency resolutions used for the time/frequency tiles may differ; that is, the spectrum may be divided into frequency bands in different ways, and the number of bands and their bandwidths vary with the resolution.
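The LPC whitening and envelope re-shaping performed by units 540/550 and 520/570 above can be sketched roughly as follows, treating a parameter track as a short real-valued sequence. The analysis order, the window handling, and the data layout are assumptions made only for illustration, not the patent's actual processing.

```python
import numpy as np

def autocorr(x, order):
    # autocorrelation estimate r[0..order]
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def levinson(r, order):
    # Levinson-Durbin recursion: A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def whiten(a, x):
    # analysis (whitening) filter A(z): e[n] = sum_j a[j] * x[n-j]
    e = np.zeros_like(x)
    for n in range(len(x)):
        for j in range(min(n, len(a) - 1) + 1):
            e[n] += a[j] * x[n - j]
    return e

def shape(a, e):
    # synthesis filter 1/A(z): x[n] = e[n] - sum_{j>=1} a[j] * x[n-j]
    x = np.zeros_like(e)
    for n in range(len(e)):
        x[n] = e[n]
        for j in range(1, min(n, len(a) - 1) + 1):
            x[n] -= a[j] * x[n - j]
    return x

sync_track = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.5, 0.9, 0.6, 0.2])
orig_track = np.array([0.0, 0.5, 1.0, 0.6, 0.2, 0.1, 0.6, 1.0, 0.5, 0.1])
a_sync = levinson(autocorr(sync_track, 2), 2)   # cf. unit 540: LPC of synchronized track
a_orig = levinson(autocorr(orig_track, 2), 2)   # cf. unit 520: LPC of original track
whitened = whiten(a_sync, sync_track)           # cf. unit 550: spectral whitening
reshaped = shape(a_orig, whitened)              # cf. unit 570: impose original envelope
```

The two filter stages could equally be merged into a single operation, as the text notes, and a sliding-window autocorrelation would let the coefficients adapt over time.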
In an embodiment, the mutually different frequency resolutions are matched by upsampling the object-oriented audio parameter values to a higher frequency resolution. Since in most systems the frequency resolutions of the parameter bands share common edge frequencies, the upsampling can be achieved by copying the appropriate object-oriented audio parameter values. A reference input object parameter stream has nine frequency bands, as depicted by 610 in Figure 6. A second input object parameter stream has a lower frequency resolution 630 with six frequency bands. If the common edge frequencies of the two object parameter streams are aligned, the upsampled frequency resolution of the second stream is obtained by copying the parameter values (as indicated by the dashed arrows) into the overlapping frequency bands, as indicated by 620 in the figure. Frequency band b1 in 630 is the equivalent of the two frequency bands b1 and b2 in 610; accordingly, the object parameter value of band b1 of 630 is copied to frequency band b1 of 620 and, at the same time, to frequency band b2 of 620.

The proposed method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream can be implemented by a device comprising a synchronization member and a combination member. The synchronization member computes a synchronized stream for each input object-oriented audio parameter stream. The synchronized stream has object-oriented parameter values at predetermined temporal positions, and these predetermined temporal positions are the same for all synchronized streams. For each synchronized stream, the object-oriented parameter values at the predetermined temporal positions are computed by interpolation of the object-oriented parameter values of the corresponding input object parameter stream.
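Returning to the Figure 6 band mapping above, a toy sketch: each coarse-band parameter value is copied into every fine band it overlaps, presuming, as the text requires, that the two resolutions share aligned common edge frequencies. The band edges and values below are invented for illustration only.

```python
def upsample_bands(values, coarse_edges, fine_edges):
    """Copy each coarse-band parameter value into the overlapping fine bands."""
    out = []
    for lo, hi in zip(fine_edges[:-1], fine_edges[1:]):
        # find the coarse band that contains this fine band
        for i, (clo, chi) in enumerate(zip(coarse_edges[:-1], coarse_edges[1:])):
            if clo <= lo and hi <= chi:   # aligned common edges guarantee a match
                out.append(values[i])
                break
    return out

coarse = [0, 2, 4, 8]          # 3 coarse bands (edge frequencies, arbitrary units)
fine   = [0, 1, 2, 4, 6, 8]    # 5 fine bands sharing the coarse edge frequencies
print(upsample_bands([10, 20, 30], coarse, fine))   # [10, 10, 20, 30, 30]
```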
The combination member generates the output object parameter stream. The output object parameter stream has object-oriented parameter values at the predetermined temporal positions, obtained by combining the object-oriented audio parameter values of the synchronized streams at the same predetermined temporal positions. The combination is achieved by concatenating the object-oriented audio parameter values that correspond to the same predetermined temporal position.

In an embodiment, a computer program product executes the method according to the invention.

In an embodiment, a teleconferencing system comprises a device according to the invention.

Although this description concentrates on merging input object-oriented audio parameter streams for the purpose of teleconferencing, the downmix audio streams corresponding to the object-oriented audio parameter streams also need to be merged. A method similar to the proposed method of merging the object-oriented parameter streams can be applied to merge the downmixes.

It is to be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct units, and by means of a computer.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a flow chart of a method, according to the invention, of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream;

Figure 2 shows an example architecture in which two input object-oriented audio parameter streams are merged into one output object-oriented audio parameter stream;

Figure 3 shows object-oriented audio parameter values of a synchronized stream, obtained by interpolation between the object-oriented audio parameter values of the corresponding input object-oriented audio parameter stream at predetermined temporal positions;

Figure 4 shows an architecture in which filtering is applied to the object-oriented audio parameter values of the synchronized stream;

Figure 5 shows the use of a linear predictive coding analysis to determine a spectral envelope of the object-oriented audio parameter values of an input object-oriented audio parameter stream and, subsequently, to impose that envelope during filtering on the object-oriented audio parameter values of the corresponding synchronized stream;

Figure 6 shows the matching of mutually different frequency resolutions by upsampling the object-oriented audio parameter values to a higher frequency resolution.
In the drawings, the same reference numerals denote similar or corresponding parts. Some of the parts indicated in the figures are typically implemented in software, and as such they represent software entities such as software modules or objects.

[Description of main element symbols]

203 audio object set
204 audio object set
215 object-oriented encoder
216 downmix stream
217 downmix stream
218 merged downmix
219 rendered output
220 object-oriented encoder
221 object parameter stream
222 object parameter stream
224 user data
230 unit
240 object-oriented decoder
310, 320 input object parameter streams
311, 312, 313 temporal positions
321, 322, 323 object-oriented audio parameter values
330 synchronized stream
331, 332, 333 object-oriented audio parameter values
351, 352, 353 predetermined temporal positions
401, 402 input object parameter streams
410 synchronization unit
411 control parameters
412 temporal positions
420 data processing unit
431 synchronized stream
432 filter coefficients
440 filter
441 synchronized object parameter values
501 input object parameter stream
510 synchronization unit
511, 512 temporal positions
520, 540 units
531 synchronized object parameters
550, 570 filter units
562 whitened object parameters


Claims (1)

X. Claims

1. A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream, wherein each object-oriented audio parameter stream comprises object-oriented audio parameter values,
the object-oriented audio parameter values expressing, as a function of time, the characteristics of the audio objects they represent, the method comprising the steps of:

computing, for each input object-oriented audio parameter stream, a synchronized stream, the synchronized stream having object-oriented parameter values at predetermined temporal positions, the predetermined temporal positions being the same for all synchronized streams, and, for each synchronized stream, the object-oriented parameter values at the predetermined temporal positions being computed by interpolation of the object-oriented parameter values of the corresponding input object-oriented audio parameter stream;

generating the output object-oriented audio parameter stream, the output object-oriented audio parameter stream having object-oriented parameter values at the predetermined temporal positions, the parameter values being obtained by combining the object-oriented audio parameter values of the synchronized streams at the same predetermined temporal positions.

2. The method of claim 1, wherein the predetermined temporal positions correspond to the temporal positions of the object-oriented audio parameter values in one of the input object-oriented audio parameter streams.

3. The method of claim 1, wherein the interpolation is piecewise linear.

4. The method of claim 1, wherein filtering is applied to the object-oriented audio parameter values of the synchronized stream.

5. The method of claim 4, wherein the applied filtering is adaptive, so as to match the statistical properties of the corresponding input object-oriented audio parameter values.

6. The method of claim 4, wherein a linear predictive coding analysis is used to determine a spectral envelope of the object-oriented audio parameter values of the input object-oriented audio parameter stream, and the envelope is subsequently imposed, during the filtering, on the object-oriented audio parameter values of the corresponding synchronized stream.
7. The method of claim 1, wherein the input object-oriented audio parameter streams have mutually different frequency resolutions.

8. The method of claim 7, wherein the mutually different frequency resolutions are matched by averaging the object-oriented audio parameter values, so as to obtain a lower frequency resolution.

9. The method of claim 7, wherein the mutually different frequency resolutions are matched by upsampling the object-oriented audio parameter values to a higher frequency resolution.

10. The method of claim 9, wherein the upsampling is achieved by copying appropriate object-oriented audio parameter values.

11. A device for merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream, wherein each object-oriented audio parameter stream comprises object-oriented audio parameter values, the object-oriented audio parameter values expressing, as a function of time, the characteristics of the audio objects they represent, the device comprising:

a synchronization member for computing, for each input object-oriented audio parameter stream, a synchronized stream, the synchronized stream having object-oriented parameter values at predetermined temporal positions, the predetermined temporal positions being the same for all synchronized streams, and, for each synchronized stream, the object-oriented parameter values at the predetermined temporal positions being computed by interpolation of the object-oriented parameter values of the corresponding input object-oriented audio parameter stream;
a combination member for generating the output object-oriented audio parameter stream, the output object-oriented audio parameter stream having object-oriented parameter values at the predetermined temporal positions, the parameter values being obtained by combining the object-oriented audio parameter values of the synchronized streams at the same predetermined temporal positions.

12. The device of claim 11, wherein the device further comprises a filtering member for filtering the object-oriented audio parameter values of the synchronized streams.

13. The device of claim 12, wherein the filtering member is configured to adapt to the statistical properties of the corresponding object-oriented audio parameter values.

14. The device of claim 12, wherein the device further comprises a matching member for matching the frequency resolutions of the input object-oriented audio parameter streams when those frequency resolutions are mutually different.

15. A computer program product for executing the method of any of claims 1 to 10.

16. A teleconferencing system comprising a device as claimed in any of claims 11 to 14.
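As one hypothetical reading of method claim 1 (and the device of claim 11), the sketch below synchronizes each input stream to shared predetermined temporal positions by piecewise linear interpolation (claim 3) and combines the synchronized values by carrying all object parameter tracks side by side on the common grid; the dict-based stream layout is an assumption for illustration only.

```python
def interp(t, positions, values):
    # piecewise linear interpolation of a parameter track at time t
    if t <= positions[0]:
        return values[0]
    if t >= positions[-1]:
        return values[-1]
    for i in range(1, len(positions)):
        if t <= positions[i]:
            w = (t - positions[i - 1]) / (positions[i] - positions[i - 1])
            return (1 - w) * values[i - 1] + w * values[i]

def merge_streams(streams, ref_positions):
    """streams: list of {"positions": [...], "objects": {name: [values...]}}.

    Each stream is synchronized to ref_positions (the predetermined temporal
    positions); the output carries all object parameter tracks, re-expressed
    on that common grid. Object names are assumed unique across streams.
    """
    out = {"positions": list(ref_positions), "objects": {}}
    for stream in streams:
        for name, values in stream["objects"].items():
            out["objects"][name] = [
                interp(t, stream["positions"], values) for t in ref_positions
            ]
    return out

a = {"positions": [0.0, 2.0], "objects": {"speech": [0.0, 2.0]}}
b = {"positions": [0.0, 1.0, 2.0], "objects": {"music": [1.0, 1.0, 1.0]}}
merged = merge_streams([a, b], [0.0, 1.0, 2.0])
print(merged["objects"]["speech"])  # [0.0, 1.0, 2.0]
```

A real teleconferencing bridge would also have to merge the corresponding downmix audio signals, as the description notes.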
TW97123574A 2007-06-27 2008-06-24 A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream TW200921643A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP07111149 2007-06-27

Publications (1)

Publication Number Publication Date
TW200921643A true TW200921643A (en) 2009-05-16

Family

ID=39768243

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97123574A TW200921643A (en) 2007-06-27 2008-06-24 A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream

Country Status (2)

Country Link
TW (1) TW200921643A (en)
WO (1) WO2009001292A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2793958T3 (en) 2009-08-14 2020-11-17 Dts Llc System to adaptively transmit audio objects
WO2012122397A1 (en) 2011-03-09 2012-09-13 Srs Labs, Inc. System for dynamically creating and rendering audio objects
JP5798247B2 (en) 2011-07-01 2015-10-21 ドルビー ラボラトリーズ ライセンシング コーポレイション Systems and tools for improved 3D audio creation and presentation
WO2014165806A1 (en) 2013-04-05 2014-10-09 Dts Llc Layered audio coding and transmission
US9706324B2 (en) 2013-05-17 2017-07-11 Nokia Technologies Oy Spatial object oriented audio apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6463414B1 (en) * 1999-04-12 2002-10-08 Conexant Systems, Inc. Conference bridge processing of speech in a packet network environment
US20030028386A1 (en) * 2001-04-02 2003-02-06 Zinser Richard L. Compressed domain universal transcoder
US7133521B2 (en) * 2002-10-25 2006-11-07 Dilithium Networks Pty Ltd. Method and apparatus for DTMF detection and voice mixing in the CELP parameter domain
EP1719117A1 (en) * 2004-02-16 2006-11-08 Koninklijke Philips Electronics N.V. A transcoder and method of transcoding therefore

Also Published As

Publication number Publication date
WO2009001292A1 (en) 2008-12-31

Similar Documents

Publication Publication Date Title
JP7342091B2 (en) Method and apparatus for encoding and decoding a series of frames of an ambisonics representation of a two-dimensional or three-dimensional sound field
TWI629681B (en) Apparatus and method for encoding or decoding a multi-channel signal using spectral-domain resampling, and related computer program
EP1851997B1 (en) Near-transparent or transparent multi-channel encoder/decoder scheme
Herre et al. MPEG spatial audio object coding—the ISO/MPEG standard for efficient coding of interactive audio scenes
EP2483887B1 (en) Mpeg-saoc audio signal decoder, method for providing an upmix signal representation using mpeg-saoc decoding and computer program using a time/frequency-dependent common inter-object-correlation parameter value
Herre et al. Interactive teleconferencing combining spatial audio object coding and DirAC technology
AU2007204332A1 (en) Decoding of binaural audio signals
TW200921643A (en) A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream
Purnhagen et al. Immersive audio delivery using joint object coding
JP2007104601A (en) Apparatus for supporting header transport function in multi-channel encoding
US20230335142A1 (en) Processing parametrically coded audio
Yu et al. Low-complexity binaural decoding using time/frequency domain HRTF equalization
Albert et al. Delayless Mixing-On the Benefits of MPEG-4 AAC-ELD in High Quality Communication Systems