TWI762949B - Method for loss concealment, method for decoding a DirAC encoded audio scene and corresponding computer program, loss concealment apparatus and decoder
- Publication number: TWI762949B
- Application number: TW109119714A
- Authority: TW (Taiwan)
Classifications
- G10L19/005 — Speech or audio signal analysis-synthesis techniques for redundancy reduction; correction of errors induced by the transmission channel, if related to the coding algorithm
- G10L19/008 — Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H04R1/32 — Arrangements for obtaining desired frequency or directional characteristics, for obtaining desired directional characteristic only
Description
Field of the Invention
Embodiments of the present invention relate to a method for loss concealment of spatial audio parameters, a method for decoding a DirAC encoded audio scene, and a corresponding computer program. Further embodiments relate to a loss concealment apparatus for loss concealment of spatial audio parameters and to a decoder comprising a packet loss concealment apparatus. Preferred embodiments describe a concept/method for compensating the quality degradation caused by lost and corrupted frames or packets occurring during the transmission of an audio scene for which the spatial image is coded parametrically by the Directional Audio Coding (DirAC) paradigm.
Background of the Invention
Speech and audio communications can suffer from different quality issues due to packet losses during transmission. Indeed, bad conditions in the network, such as bit errors and jitter, can lead to the loss of some packets. These losses result in severe artifacts such as clicks, plops or undesired silences, which greatly degrade the perceived quality of the speech or audio signal reconstructed at the receiver side. To combat the adverse effects of packet losses, packet loss concealment (PLC) algorithms have been proposed in conventional speech and audio coding schemes. Such algorithms usually operate at the receiver side by generating a synthetic audio signal for concealing the missing data in a received bitstream.
DirAC is a perceptually motivated spatial audio processing technique that represents the sound field compactly and efficiently by a set of spatial parameters and a downmix signal. The downmix signal can be a monophonic, stereo or multi-channel signal in an audio format such as A-format or B-format, the latter also known as first-order Ambisonics (FOA). The downmix signal is complemented by the spatial DirAC parameters, which describe the audio scene in terms of direction of arrival (DOA) and diffuseness per time/frequency unit. In storage, streaming or communication applications, the downmix signal is coded by a conventional core coder (e.g., EVS or a stereo/multi-channel extension of EVS, or any other mono/stereo/multi-channel codec), aiming to preserve the audio waveform of each channel. The core coder can be built around a transform-based coding scheme or a speech coding scheme such as CELP operating in the time domain. The core coder can then integrate already existing error resilience tools, such as packet loss concealment (PLC) algorithms.
On the other hand, there is no existing solution for protecting the DirAC spatial parameters. Hence, there is a need for an improved approach.
Summary of the Invention
It is an object of the present invention to provide a concept for loss concealment in the context of DirAC.
This object is solved by the subject matter of the independent claims.
Embodiments of the present invention provide a method for loss concealment of spatial audio parameters, the spatial audio parameters comprising at least a direction of arrival information. The method comprises the following steps:
- receiving a first set of spatial audio parameters comprising a first direction of arrival information and a first diffuseness information;
- receiving a second set of spatial audio parameters comprising a second direction of arrival information and a second diffuseness information; and
- replacing the second direction of arrival information of the second set by a replacement direction of arrival information derived from the first direction of arrival information, if at least the second direction of arrival information or a portion of the second direction of arrival information is lost.
Embodiments of the present invention are based on the finding that, in case a direction of arrival information is lost or corrupted, the lost/corrupted direction of arrival information can be replaced by a direction of arrival information derived from another available direction of arrival information. For example, if the second direction of arrival information is lost, it can be replaced by the first direction of arrival information. In other words, embodiments provide a packet loss concealment scheme for spatially parametric audio, for which, in case of a transmission loss, the directional information is recovered by using previously well-received directional information together with a dithering. Embodiments thus make it possible to combat packet losses in the transmission of spatial audio sound coded parametrically, e.g., with DirAC parameters.
Other embodiments provide a method wherein the first set of spatial audio parameters and the second set of spatial audio parameters comprise a first diffuseness information and a second diffuseness information, respectively. In this case, the strategy can be as follows: according to embodiments, the first diffuseness information or the second diffuseness information is derived from at least one energy ratio related to the at least one direction of arrival information. According to embodiments, the method further comprises the step of replacing the second diffuseness information of the second set by a replacement diffuseness information derived from the first diffuseness information. This is part of a so-called hold strategy, which is based on the assumption that the diffuseness does not change much from frame to frame. For this reason, a simple but effective approach is to hold, for a frame lost during transmission, the parameters of the last well-received frame. The other part of this overall strategy, namely replacing the second direction of arrival information by the first one, has already been discussed in the context of the basic embodiment. It is usually safe to assume that the spatial image is relatively stable over time, which can be translated to the DirAC parameters, i.e., the direction of arrival, which may likewise not change much between frames.
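As an illustration only (the patent does not prescribe an implementation; the frame layout and function name below are assumptions), the hold strategy sketched above keeps the parameters of the last well-received frame for frames lost during transmission:

```python
import copy

def conceal_hold(frames):
    """Hold strategy: replace each lost frame (None) by a copy of the last
    well-received set of spatial audio parameters (per-band DOA in degrees
    and diffuseness in [0, 1])."""
    concealed = []
    last_good = None
    for frame in frames:
        if frame is None:
            if last_good is not None:
                frame = copy.deepcopy(last_good)  # keep previous DOA and diffuseness
        else:
            last_good = frame
        concealed.append(frame)
    return concealed

good = {"azimuth": [30.0, -10.0], "elevation": [0.0, 5.0], "diffuseness": [0.2, 0.7]}
out = conceal_hold([good, None, None])    # two frames lost during transmission
print(out[1] == good and out[2] == good)  # True: both lost frames are concealed
```

The same skeleton can be combined with the dithering and extrapolation variants discussed below by post-processing the held parameters instead of copying them unchanged.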
According to further embodiments, the replacement direction of arrival information conforms to the first direction of arrival information. In this case, a strategy called dithering of the direction can be used. Here, according to embodiments, the step of replacing may comprise a step of dithering the replacement direction of arrival information. Alternatively or additionally, the step of replacing may comprise injecting noise into the first direction of arrival information in order to obtain the replacement direction of arrival information. The dithering can then help to make the rendered sound field more natural and more pleasant, by injecting random noise into the previous direction when the previous direction is reused for the frame. According to embodiments, the injecting step is preferably performed if the first diffuseness information or the second diffuseness information indicates a high diffuseness. Alternatively, the injecting step may be performed if the first diffuseness information or the second diffuseness information is above a predetermined threshold of the diffuseness information indicating a high diffuseness. According to further embodiments, the diffuseness information comprises information on a ratio between the directional components and the non-directional components of an audio scene described by the first set of spatial audio parameters and/or the second set of spatial audio parameters.
According to embodiments, the random noise to be injected depends on the first diffuseness information and/or the second diffuseness information. Alternatively, the random noise to be injected is scaled by a factor depending on the first diffuseness information and/or the second diffuseness information. Therefore, according to embodiments, the method may further comprise the steps of analyzing the tonality of the audio scene described by the first set of spatial audio parameters and/or the second set of spatial audio parameters, or of analyzing the tonality of a transmitted downmix belonging to the first and/or the second set of spatial audio parameters, in order to obtain a tonality value describing the tonality. The random noise to be injected then depends on the tonality value. According to embodiments, the noise is scaled down by a factor which decreases with the inverse of the tonality value, i.e., the factor is reduced if the tonality increases.
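A minimal sketch of the dithering described above (the exact scaling law is not specified in the text; scaling the jitter with the diffuseness and with the inverse of the tonality value is one plausible choice, and all names are assumptions):

```python
import random

def dither_doa(azimuth, elevation, diffuseness, tonality, max_jitter_deg=20.0):
    """Jitter a held DOA: the injected random noise grows with the
    diffuseness (0..1) and is scaled down as the tonality value (>= 1)
    increases."""
    scale = max_jitter_deg * diffuseness / tonality
    az = azimuth + random.uniform(-scale, scale)
    el = elevation + random.uniform(-scale, scale)
    az = (az + 180.0) % 360.0 - 180.0        # wrap azimuth to [-180, 180)
    el = max(-90.0, min(90.0, el))           # clamp elevation to [-90, 90]
    return az, el

random.seed(0)
print(dither_doa(30.0, 10.0, diffuseness=0.8, tonality=2.0))
```

For a fully diffuse scene the jitter is large, which mimics the perceptual instability of diffuse sound; for a highly tonal, point-like source the held direction is barely perturbed.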
According to another strategy, a method may be used comprising the step of extrapolating the first direction of arrival information in order to obtain the replacement direction of arrival information. Following this approach, it is conceivable to estimate the trajectory of a sound event in the audio scene and to extrapolate the estimated trajectory. This is especially relevant if the sound event is well localized in space as a point source (direct model with low diffuseness). According to embodiments, the extrapolation is based on one or more additional direction of arrival informations belonging to one or more further sets of spatial audio parameters. According to embodiments, the extrapolation is performed if the first diffuseness information and/or the second diffuseness information indicates a low diffuseness, or if the first diffuseness information and/or the second diffuseness information is below a predetermined threshold for the diffuseness information, indicating a low diffuseness.
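The extrapolation strategy can be sketched as follows (a simple linear fit over past azimuth values; the text mentions extrapolation without fixing a particular model, so the threshold and the regression over frame indices are assumptions):

```python
def extrapolate_doa(past_azimuths, diffuseness, threshold=0.3):
    """Extrapolate the azimuth trajectory of a point-like source.
    Uses a simple linear fit over the last received azimuths (degrees);
    falls back to holding the last value when the scene is diffuse."""
    last = past_azimuths[-1]
    if diffuseness >= threshold or len(past_azimuths) < 2:
        return last                       # hold strategy for diffuse scenes
    # least-squares slope over the frame index (plain linear regression)
    n = len(past_azimuths)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(past_azimuths) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, past_azimuths))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return last + slope                   # predict one frame ahead

print(extrapolate_doa([10.0, 20.0, 30.0], diffuseness=0.1))  # 40.0
print(extrapolate_doa([10.0, 20.0, 30.0], diffuseness=0.9))  # 30.0
```

A moving source drifting 10 degrees per frame is thus continued along its trajectory, while a diffuse scene simply reuses the last direction.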
According to embodiments, the first set of spatial audio parameters belongs to a first point in time and/or a first frame, while the second set of spatial audio parameters belongs to a second point in time or a second frame. Here, the second point in time may follow the first point in time, or the second frame may follow the first frame. Returning to the embodiments using a larger number of sets of spatial audio parameters for the extrapolation, it is evident that it is preferable to use more sets of spatial audio parameters belonging, for example, to a plurality of consecutive points in time/frames.
According to another embodiment, the first set of spatial audio parameters comprises a first subset of spatial audio parameters for a first frequency band and a second subset of spatial audio parameters for a second frequency band. The second set of spatial audio parameters comprises another first subset of spatial audio parameters for the first frequency band and another second subset of spatial audio parameters for the second frequency band.
Another embodiment provides a method for decoding a DirAC encoded audio scene, comprising the step of decoding the DirAC encoded audio scene comprising a downmix, a first set of spatial audio parameters and a second set of spatial audio parameters. This method further comprises the method steps for loss concealment as discussed above.
According to embodiments, the methods discussed above may be computer-implemented. Therefore, an embodiment refers to a computer-readable storage medium having stored thereon a computer program with a program code for performing, when running on a computer, the method according to one of the preceding aspects.
Another embodiment refers to a loss concealment apparatus for loss concealment of spatial audio parameters comprising at least a direction of arrival information. The apparatus comprises a receiver and a processor. The receiver is configured to receive the first set of spatial audio parameters and the second set of spatial audio parameters (see above). The processor is configured to replace, in case of a loss or corruption of the second direction of arrival information, the second direction of arrival information of the second set by a replacement direction of arrival information derived from the first direction of arrival information. Another embodiment refers to a decoder for a DirAC encoded audio scene, the decoder comprising the loss concealment apparatus.
10: analysis stage
10': DirAC analysis stage
12: filter bank analysis
12a-n: band-pass filters
14c: energy analysis
14e: analysis stage for energy
14i: analysis stage for intensity
15: beamforming/signal selection entity
16d: diffuseness / diffuseness estimator
16e: direction
16i: direction estimator
17: encoder
20: synthesis entity
20': DirAC synthesis stage
21: spatial metadata decoder
22a: first stream
22b: second stream
23: output synthesis
24: virtual microphone
25: EVS decoder
26: diffuseness parameter
27: direction parameter
28: decorrelation entity
29: loudspeaker
50: apparatus
52: interface
54: processor
72: DirAC decoder
100, 200: methods
110, 120, 130: basic steps
210: additional decoding step
Embodiments of the present invention will subsequently be discussed referring to the enclosed figures, wherein:
Fig. 1 shows a schematic block diagram illustrating DirAC analysis and synthesis;
Fig. 2 shows a schematic, detailed block diagram of DirAC analysis and synthesis in a low-bit-rate 3D audio coder;
Fig. 3a shows a schematic flowchart of a method for loss concealment according to a basic embodiment;
Fig. 3b shows a schematic loss concealment apparatus according to a basic embodiment;
Figs. 4a and 4b show schematic diagrams of the measured diffuseness as a function of the DDR (Fig. 4a: window size W=16, Fig. 4b: window size W=512) for illustrating embodiments;
Fig. 5 shows a schematic diagram of the measured directions (azimuth and elevation angle) as a function of the diffuseness for illustrating embodiments;
Fig. 6a shows a schematic flowchart of a method for decoding a DirAC encoded audio scene according to an embodiment; and
Fig. 6b shows a schematic block diagram of a decoder for a DirAC encoded audio scene according to an embodiment.
In the following, embodiments of the present invention are discussed with reference to the enclosed figures, wherein identical reference numerals are provided to objects/elements having identical or similar functions, so that the description thereof is mutually applicable and interchangeable. Before embodiments of the present invention are discussed in detail, an introduction to DirAC is given.
Detailed Description of the Preferred Embodiments
Introduction to DirAC: DirAC is a perceptually motivated spatial sound reproduction technique. It is assumed that at one time instant and for one critical band, the spatial resolution of the auditory system is limited to decoding one cue for direction and another cue for interaural coherence.
Based on these assumptions, DirAC represents the spatial sound in one frequency band by cross-fading two streams: a non-directional diffuse stream and a directional non-diffuse stream. The DirAC processing is performed in two phases: the first phase is the analysis as illustrated in Fig. 1a, and the second phase is the synthesis as illustrated in Fig. 1b.
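As an illustration (not part of the original text), the cross-fading of the two streams is commonly realized with energy-preserving gains derived from the diffuseness Ψ; this sketch assumes the usual square-root gain choice:

```python
import math

def split_streams(signal_tile, diffuseness):
    """Split one time/frequency tile into a non-diffuse (direct) and a
    diffuse part with energy-preserving gains sqrt(1 - psi) and sqrt(psi)."""
    direct = math.sqrt(1.0 - diffuseness) * signal_tile
    diffuse = math.sqrt(diffuseness) * signal_tile
    return direct, diffuse

direct, diffuse = split_streams(1.0, 0.25)
print(direct, diffuse)                             # ~0.866 and 0.5
print(abs(direct**2 + diffuse**2 - 1.0) < 1e-12)   # energy preserved: True
```

The direct part would then be panned (e.g., with VBAP) toward the DOA, while the diffuse part is decorrelated and distributed over the loudspeakers.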
Fig. 1a shows the analysis stage 10, comprising one or more band-pass filters 12a-n receiving the microphone signals W, X, Y and Z, an analysis stage 14e for the energy and an analysis stage 14i for the intensity. Using temporal averaging, the diffuseness Ψ can be determined (cf. reference numeral 16d). The diffuseness Ψ is determined based on the energy analysis 14c and the intensity analysis. Based on the intensity analysis, the direction 16e can be determined. The result of the direction determination is an azimuth and an elevation angle. Ψ, azi and ele are output as metadata. This metadata is used by the synthesis entity 20 shown in Fig. 1b.
The synthesis entity 20 shown in Fig. 1b comprises a first stream 22a and a second stream 22b. The first stream comprises a plurality of band-pass filters 12a-n and a computing entity for the virtual microphones 24. The second stream 22b comprises means for processing the metadata, namely 26 for the diffuseness parameter and 27 for the direction parameter. Furthermore, a decorrelator 28 is used in the synthesis stage 20, wherein this decorrelation entity 28 receives the data of both streams 22a, 22b. The output of the decorrelator 28 can be fed to the loudspeakers 29.
In the DirAC analysis stage, a first-order coincident microphone in B-format is considered as input, and the diffuseness and the direction of arrival of the sound are analyzed in the frequency domain.
In the DirAC synthesis stage, the sound is divided into two streams, the non-diffuse stream and the diffuse stream. The non-diffuse stream is reproduced as point sources using amplitude panning, which can be done by using vector base amplitude panning (VBAP) [2]. The diffuse stream is responsible for the sensation of envelopment and is generated by conveying mutually decorrelated signals to the loudspeakers.
The DirAC parameters (in the following also called spatial metadata or DirAC metadata) consist of tuples of diffuseness and direction. The direction can be represented in spherical coordinates by two angles, the azimuth and the elevation, while the diffuseness is a scalar factor between 0 and 1.
In the following, the system for DirAC spatial audio coding will be discussed with respect to Fig. 2. Fig. 2 shows the two-stage DirAC analysis 10' and DirAC synthesis 20'. Here, the DirAC analysis comprises a filter bank analysis 12, a direction estimator 16i and a diffuseness estimator 16d. Both 16i and 16d output the diffuseness/direction data as spatial metadata. This data can be encoded using the encoder 17. The DirAC synthesis 20' comprises the spatial metadata decoder 21, the output synthesis 23 and the filter bank synthesis, enabling the output of the signal to loudspeakers, FOA/HOA.
In parallel to the discussed DirAC analysis stage 10' and DirAC synthesis stage 20' processing the spatial metadata, an EVS encoder/decoder is used. On the analysis side, beamforming/signal selection is performed on the B-format input signal (cf. beamforming/signal selection entity 15). The resulting signal is then EVS-encoded (cf. reference numeral 17). On the synthesis side (cf. reference numeral 20'), an EVS decoder 25 is used. This EVS decoder outputs its signal to the filter bank analysis 12, which in turn outputs its signal to the output synthesis 23.
Now that the structure of the DirAC analysis/synthesis 10'/20' has been discussed, the functionality will be discussed in detail.
The encoder analysis 10' usually analyzes a spatial audio scene in B-format. Alternatively, the DirAC analysis can be adjusted to analyze different audio formats, such as audio objects or multi-channel signals, or any combination of spatial audio formats. The DirAC analysis extracts a parametric representation from the input audio scene. The direction of arrival (DOA) and the diffuseness measured per time-frequency unit form the parameters. The DirAC analysis is followed by a spatial metadata encoder, which quantizes and encodes the DirAC parameters to obtain a low-bit-rate parametric representation.
A downmix signal derived from the different sources or audio input signals is coded, along with the parameters, for transmission by a conventional audio core coder. In the preferred embodiment, the EVS audio coder is preferred for coding the downmix signal, but the invention is not limited to this core coder and can be applied to any audio core coder. The downmix signal consists of different channels, called transport channels: the signal can be, e.g., the four coefficient signals composing a B-format signal, a stereo pair, or a monophonic downmix, depending on the target bit rate. The coded spatial parameters and the coded audio bitstream are multiplexed before being transmitted over the communication channel.
In the decoder, the transport channels are decoded by the core decoder, while the DirAC metadata is first decoded before being conveyed, together with the decoded transport channels, to the DirAC synthesis. The DirAC synthesis uses the decoded metadata for controlling the reproduction of the direct sound stream and its mixing with the diffuse sound stream. The reproduced sound field can be rendered on an arbitrary loudspeaker layout or can be generated in Ambisonics format (FOA/HOA) of arbitrary order.
DirAC parameter estimation: In each frequency band, the direction of arrival of the sound is estimated together with the diffuseness of the sound. From the time-frequency analysis of the input B-format components $w_i(n)$, $x_i(n)$, $y_i(n)$, $z_i(n)$, the pressure and velocity vectors can be determined as:

$$P_i(n,k)=W_i(n,k)$$
$$\mathbf{U}_i(n,k)=X_i(n,k)\,\mathbf{e}_x+Y_i(n,k)\,\mathbf{e}_y+Z_i(n,k)\,\mathbf{e}_z,$$

where $i$ is the index of the input, $k$ and $n$ are the time and frequency indices of the time-frequency tile, and $\mathbf{e}_x$, $\mathbf{e}_y$, $\mathbf{e}_z$ represent the Cartesian unit vectors. $P(n,k)$ and $\mathbf{U}(n,k)$ are used to compute the DirAC parameters, namely the DOA and the diffuseness, via the computation of the intensity vector:

$$\mathbf{I}(k,n)=\frac{1}{2}\,\Re\{P(k,n)\,\overline{\mathbf{U}}(k,n)\},\qquad \Psi(k,n)=1-\frac{\lVert \mathrm{E}\{\mathbf{I}(k,n)\}\rVert}{c\,\mathrm{E}\{E(k,n)\}},$$

where $\mathrm{E}\{\cdot\}$ denotes the temporal averaging operator, $c$ the speed of sound, and $E(k,n)$ the sound field energy, given by:

$$E(k,n)=\frac{\rho_0}{4}\lVert \mathbf{U}(k,n)\rVert^2+\frac{1}{4\rho_0 c^2}\lvert P(k,n)\rvert^2,$$

with $\rho_0$ the density of air.
The diffuseness of the sound field is defined as the ratio between the sound intensity and the energy density, and takes values between 0 and 1.
The direction of arrival (DOA) is expressed by means of the unit vector $\mathbf{e}_I(k,n)$, defined as

$$\mathbf{e}_I(k,n)=-\frac{\mathrm{E}\{\mathbf{I}(k,n)\}}{\lVert \mathrm{E}\{\mathbf{I}(k,n)\}\rVert}.$$

The direction of arrival is determined by an energy analysis of the B-format input and can be defined as the opposite direction of the intensity vector. The direction is defined in Cartesian coordinates but can easily be transformed into spherical coordinates defined by a unit radius, the azimuth and the elevation angle.
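The estimation equations above can be sketched for a single time/frequency tile (the constants and the reduction of the averaging operator E{·} to one tile are simplifying assumptions made for brevity):

```python
import math

RHO0 = 1.225   # density of air, kg/m^3 (assumed value)
C = 343.0      # speed of sound, m/s (assumed value)

def dirac_parameters(W, X, Y, Z):
    """Estimate DOA (azimuth, elevation in degrees) and diffuseness for one
    complex B-format time/frequency tile (W = pressure, X/Y/Z = velocity)."""
    P = W
    U = (X, Y, Z)
    # active intensity I = 0.5 * Re{P * conj(U)}
    I = tuple(0.5 * (P * u.conjugate()).real for u in U)
    # energy density E = rho0/4 * ||U||^2 + |P|^2 / (4 rho0 c^2)
    E = RHO0 / 4.0 * sum(abs(u) ** 2 for u in U) + abs(P) ** 2 / (4.0 * RHO0 * C ** 2)
    norm_I = math.sqrt(sum(i * i for i in I))
    diffuseness = 1.0 - norm_I / (C * E) if E > 0 else 1.0
    if norm_I > 0:
        d = tuple(-i / norm_I for i in I)   # DOA opposes the intensity vector
        azimuth = math.degrees(math.atan2(d[1], d[0]))
        elevation = math.degrees(math.asin(max(-1.0, min(1.0, d[2]))))
    else:
        azimuth = elevation = 0.0
    return azimuth, elevation, diffuseness

# a plane wave from azimuth 0 yields a diffuseness near 0; an omnidirectional
# tile (zero velocity) yields a diffuseness of 1:
print(dirac_parameters(1.0, -1.0 / (RHO0 * C), 0.0, 0.0))
print(dirac_parameters(1.0, 0.0, 0.0, 0.0))
```

In a real implementation the expectations would be moving averages over several tiles, which is exactly where the window size W of Figs. 4a/4b comes into play.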
In case of a transmission, the parameters need to be conveyed to the receiver side via a bitstream. For a robust transmission over a network with limited capacity, a low-bit-rate bitstream is preferable, which can be achieved by designing an efficient coding scheme for the DirAC parameters. It can employ, for example, techniques such as frequency-band grouping by averaging, prediction, quantization and entropy coding of the parameters over different frequency bands and/or time units. At the decoder, if no error occurred in the network, the transmitted parameters can be decoded for each time/frequency unit (k, n). However, if the network conditions are not good enough to guarantee a proper packet transport, a packet can be lost during the transmission. The present invention aims at providing a solution to the latter case.
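Band grouping as mentioned above can be sketched like this (the band edges are purely illustrative, not taken from the patent):

```python
def group_bands(values, band_edges):
    """Average per-bin parameter values (e.g. diffuseness) over coarse
    frequency bands; band_edges are bin indices delimiting each band,
    e.g. [0, 2, 5] -> bins 0..1 and 2..4."""
    grouped = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = values[lo:hi]
        grouped.append(sum(band) / len(band))
    return grouped

print(group_bands([0.25, 0.75, 0.5, 0.5, 0.5], [0, 2, 5]))  # [0.5, 0.5]
```

The grouped values are then quantized and entropy-coded, which lowers the parameter bit rate at the cost of spectral resolution.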
Originally, DirAC was intended for processing B-format recorded signals, also known as first-order Ambisonics signals. However, the analysis can easily be extended to any microphone array combining omnidirectional or directional microphones. In this case, the present invention is still relevant, since the nature of the DirAC parameters is unchanged.
In addition, the DirAC parameters, also called metadata, can be computed directly during the processing of the microphone signals before conveying them to the spatial audio coder. The spatial audio parameters, equivalent or similar to the DirAC parameters in the form of metadata, and the audio waveforms of the downmix signal are then fed directly to the DirAC-based spatial coding system. The DoA and the diffuseness can easily be derived per parameter band from the input metadata. Such an input format is sometimes called the MASA (Metadata-Assisted Spatial Audio) format. MASA allows the system to ignore the specificities of the microphone array and its form factor, which are needed for computing the spatial parameters. These are derived outside the spatial audio coding system using processing specific to the device incorporating the microphones.
Embodiments of the present invention may use the spatial coding system illustrated in Fig. 2, which depicts a DirAC-based spatial audio encoder and decoder. Embodiments will be discussed with respect to Figs. 3a and 3b; before that, an extension of the DirAC model will be discussed.
According to embodiments, the DirAC model can also be extended by allowing different directional components to share the same time/frequency tile. It can be extended in two main ways:
The first extension consists of sending two or more DoAs per T/F tile. Each DoA must then be associated with an energy or an energy ratio. For example, the l-th DoA can be associated with an energy ratio Γ_l between the energy of its directional component and the overall audio scene energy:
where I_l(k, n) is the intensity vector associated with the l-th direction. If L DoAs are transmitted together with their L energy ratios, the diffuseness can then be inferred from the L energy ratios as follows:
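Both expressions referenced above appear only as formula images in the original publication and are absent from this text. As a hedged reconstruction from the surrounding definitions (the patent's exact normalization may differ), the l-th ratio can be written in terms of the time-averaged intensity vector of the l-th directional component relative to the overall energy, and the diffuseness is what remains once all L directional ratios are accounted for:

```latex
\Gamma_l(k,n) \;=\; \frac{\lVert \langle \mathbf{I}_l(k,n) \rangle \rVert}{c\,\langle E(k,n) \rangle},
\qquad
\Psi(k,n) \;=\; 1 - \sum_{l=1}^{L} \Gamma_l(k,n)
```

The second relation follows directly from the ratios partitioning the total energy: the energy not attributed to any of the L directional components is treated as diffuse.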
The spatial parameters transmitted in the bitstream can be the L directions together with the L energy ratios, or these latter parameters can also be converted into L−1 energy ratios plus a diffuseness parameter.
The second extension consists of splitting the 2D or 3D space into non-overlapping sectors and transmitting, for each sector, a set of DirAC parameters (DoA + sector-wise diffuseness). This is then referred to as higher-order DirAC, as introduced in [5].
The two extensions can in fact be combined, and the invention is applicable to both.
Figs. 3a and 3b illustrate an embodiment of the invention, where Fig. 3a shows the method 100, focusing on the basic concept, and Fig. 3b shows the apparatus 50 that is used.
Fig. 3a illustrates a method 100 comprising the basic steps 110, 120, and 130.
The first steps 110 and 120 are comparable to each other: both involve receiving sets of spatial audio parameters. In the first step 110, a first set is received, while in the second step 120, a second set is received. In addition, there may be further receiving steps (not shown). Note that the first set may relate to a first point in time / a first frame, the second set to a second (subsequent) point in time / frame, and so on. As discussed above, the first and the second set may comprise diffuseness information (Ψ) and/or direction information (azimuth and elevation). This information can be encoded using a spatial metadata encoder. Now assume that the second set of information is lost or corrupted during transmission. In this case, the second set is replaced by the first set. This implements packet loss concealment for spatial audio parameters such as DirAC parameters.
In the case of packet loss, the erased DirAC parameters of the lost frames need to be replaced to limit the impact on quality. This can be achieved by synthetically generating the missing parameters taking into account the parameters received in the past. An unstable spatial image may be perceived as unpleasant and as an artifact, whereas a strictly constant spatial image may be perceived as unnatural.
The method 100 discussed with Fig. 3a can be performed by the entity 50 shown in Fig. 3b. The apparatus 50 for loss concealment comprises an interface 52 and a processor 54. Via the interface, the sets of spatial audio parameters Ψ1, azi1, ele1; Ψ2, azi2, ele2; ...; Ψn, azin, elen can be received. The processor 54 analyzes the received sets and, in case a set is lost or corrupted, replaces it, for example with a previously received set or a comparable set. Different strategies can be used for this; they are discussed below.
Hold strategy: It is generally safe to assume that the spatial image is relatively stable over time, which translates for the DirAC parameters (i.e., direction of arrival and diffuseness) into their not changing much between frames. For this reason, a simple but effective approach is to hold, for a frame lost during transmission, the parameters of the last well-received frame.
Extrapolation of the direction: Alternatively, one can envisage estimating the trajectory of a sound event in the audio scene and then attempting to extrapolate the estimated trajectory. This is especially appropriate if the sound event is well localized in space as a point source, which is reflected in the DirAC model by a low diffuseness. The estimated trajectory can be computed from observations of the past directions by fitting a curve between these points, which may involve interpolation or smoothing. Regression analysis may also be used. The extrapolation is then performed by evaluating the fitted curve beyond the range of the observed data.
In DirAC, the directions are often expressed, quantized, and coded in polar coordinates. However, it is usually more convenient to process the directions, and then the trajectories, in Cartesian coordinates, in order to avoid handling modulo-2π arithmetic.
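The extrapolation described above can be sketched as follows. This is an illustrative example rather than the patent's implementation: the window length, the linear (degree-1) trajectory model, and the function names are assumptions. Past DoAs are converted to Cartesian unit vectors, a line is fitted per axis, and the fit is evaluated one frame past the observed data.

```python
import numpy as np

def sph_to_cart(azi, ele):
    """Azimuth/elevation (radians) to a Cartesian unit vector."""
    return np.array([np.cos(ele) * np.cos(azi),
                     np.cos(ele) * np.sin(azi),
                     np.sin(ele)])

def cart_to_sph(v):
    """Cartesian vector back to (azimuth, elevation) in radians."""
    azi = np.arctan2(v[1], v[0])
    ele = np.arctan2(v[2], np.hypot(v[0], v[1]))
    return azi, ele

def extrapolate_direction(past_azi, past_ele):
    """Fit a linear trajectory to past DoAs in Cartesian coordinates
    (avoiding modulo-2*pi wrap-around) and evaluate it at the next frame."""
    pts = np.array([sph_to_cart(a, e) for a, e in zip(past_azi, past_ele)])
    t = np.arange(len(pts))
    # One degree-1 least-squares fit per Cartesian axis.
    pred = np.array([np.polyval(np.polyfit(t, pts[:, i], 1), len(pts))
                     for i in range(3)])
    pred /= np.linalg.norm(pred)  # re-normalize to a unit direction
    return cart_to_sph(pred)
```

For a source moving steadily in azimuth the fit continues the motion; for a static source it degenerates into the hold strategy.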
Dithering of the direction: When a sound event is more diffuse, the direction is less meaningful and can be regarded as the realization of a random process. Dithering can then help make the rendered sound field more natural and pleasant by injecting random noise into the previous direction when that previous direction is re-used for a lost frame. The injected noise and its variance can be a function of the diffuseness.
Using a standard DirAC audio scene analysis, one can study the effect of the diffuseness on the accuracy and meaningfulness of the model directions. Using artificial B-format signals for which the direct-to-diffuse energy ratio (DDR) between a plane-wave component and a diffuse-field component is given, one can analyze the resulting DirAC parameters and their accuracy.
The theoretical diffuseness Ψ is a function of the direct-to-diffuse energy ratio (DDR) Γ, and is expressed as:
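The expression itself is rendered as a formula image in the original publication and is missing from this text. With Γ defined as the ratio of plane-wave (direct) energy to diffuse-field energy, the relation consistent with the boundary cases discussed below (Ψ → 1 for a fully diffuse field, Ψ → 0 for a pure plane wave) is, as a reconstruction:

```latex
\Psi \;=\; \frac{E_{\mathrm{diff}}}{E_{\mathrm{pw}} + E_{\mathrm{diff}}}
\;=\; \frac{1}{1 + \Gamma},
\qquad \Gamma = \frac{E_{\mathrm{pw}}}{E_{\mathrm{diff}}}
```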
Of course, one of the three discussed strategies, or a combination of them, may be used. The strategy used is selected by the processor 54 depending on the received sets of spatial audio parameters. To this end, according to embodiments, the audio parameters may be analyzed so that different strategies can be applied depending on the characteristics of the audio scene and, more specifically, on the diffuseness.
This means that, according to an embodiment, the processor 54 is configured to provide packet loss concealment of the spatial audio parameters by using previously well-received directional information together with dithering. According to another embodiment, the dithering is a function of the estimated diffuseness or of the energy ratio between the directional and non-directional components of the audio scene. According to an embodiment, the dithering is a function of the measured tonality of the transmitted downmix signal. The analyzer therefore performs its analysis based on the estimated diffuseness, the energy ratio, and/or the tonality.
In Figs. 3a and 3b, the measured diffuseness is given as a function of the DDR (by simulating the diffuse field with N = 466 uncorrelated pink noises positioned uniformly on a sphere) and of a plane wave (by placing an independent pink noise at 0 degrees azimuth and 0 degrees elevation). This confirms that, if the observation window length W is large enough, the diffuseness measured in the DirAC analysis is a good estimate of the theoretical diffuseness. It means that the diffuseness has a long-term characteristic, which confirms that, in case of packet loss, the parameter can be well predicted by simply holding the previously well-received value.
On the other hand, the direction parameter estimates can also be evaluated as a function of the true diffuseness, which is reported in Fig. 4. It can be observed that the estimated elevation and azimuth of the plane-wave position deviate from their ground-truth position (0 degrees azimuth and 0 degrees elevation) with a standard deviation that increases with the diffuseness. For a diffuseness of 1, the standard deviation is about 90 degrees for the azimuth, which is defined between 0 and 360 degrees, corresponding to a completely random angle with a uniform distribution. In other words, the azimuth is then meaningless. The same observation can be made for the elevation. In general, the accuracy and meaningfulness of the estimated direction decrease with the diffuseness. It is therefore expected that the directions in DirAC fluctuate over time and deviate from their expected value as the diffuseness increases. This natural dispersion is part of the DirAC model and is crucial for reproducing the audio scene realistically. Indeed, rendering the directional component of DirAC with a constant direction, even when the diffuseness is high, would produce a point source where the source should actually be perceived as wider.
For the reasons given above, we propose to apply, in addition to the hold strategy, a dithering to the directions. The amplitude of the dithering is determined as a function of the diffuseness and may, for example, follow the model plotted in Fig. 4. Two models, for the azimuth and the elevation angles, can be derived, whose standard deviations are expressed as:

σazi = 65·Ψ^3.5 + σele

σele = 33.25·Ψ + 1.25
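The two standard-deviation models above can be turned into a small helper. This is an illustrative sketch: the Gaussian choice for the random process is one of the options named below, and the function names are not the patent's API.

```python
import numpy as np

def dither_stddevs(diffuseness):
    """Standard deviations (degrees) of the direction dither, following
    sigma_ele = 33.25*Psi + 1.25 and sigma_azi = 65*Psi**3.5 + sigma_ele."""
    sigma_ele = 33.25 * diffuseness + 1.25
    sigma_azi = 65.0 * diffuseness ** 3.5 + sigma_ele
    return sigma_azi, sigma_ele

def dither_direction(azi_deg, ele_deg, diffuseness, rng=None):
    """Apply zero-mean Gaussian dither to a held direction (degrees),
    wrapping the azimuth and clipping the elevation."""
    if rng is None:
        rng = np.random.default_rng()
    sigma_azi, sigma_ele = dither_stddevs(diffuseness)
    azi = (azi_deg + sigma_azi * rng.standard_normal()) % 360.0
    ele = float(np.clip(ele_deg + sigma_ele * rng.standard_normal(),
                        -90.0, 90.0))
    return azi, ele
```

At Ψ = 0 both deviations collapse to 1.25 degrees, so a well-localized source barely moves, while at Ψ = 1 the azimuth dither approaches the near-random spread observed in Fig. 4.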
The pseudo-code for the DirAC parameter concealment may thus read:
where bad_frame_indicator[k] is a flag indicating whether the frame at index k was well received. In case of a good frame, the DirAC parameters are read, decoded, and dequantized for each parameter band corresponding to a given frequency range. In case of a bad frame, the diffuseness of the last well-received frame is held directly for the same parameter band, while the azimuth and elevation are derived by dequantizing the last well-received index and injecting a random value scaled by a factor that depends on the diffuseness index. The function random() outputs a random value according to a given distribution. The random process may, for example, follow a standard normal distribution with zero mean and unit variance. Alternatively, it may follow a uniform distribution between −1 and 1, or a triangular probability density generated, for example, with the following pseudo-code:
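Since the pseudo-code itself appears only as a figure in the original publication, the following sketch restates the described logic in Python. The names (conceal_frame, dither_scale, the state dictionary) are placeholders rather than the patent's API, and the triangular density is produced here as the difference of two uniform variables, which is one standard construction.

```python
import random

def triangular_noise():
    """Symmetric triangular density on [-1, 1]: difference of two uniforms."""
    return random.random() - random.random()

def conceal_frame(good, decoded, state, dither_scale):
    """One frame of DirAC parameter concealment for one parameter band.

    good         -- True if the frame was well received
    decoded      -- dict with 'diff', 'azi', 'ele' (only used if good)
    state        -- last well-received parameters, updated in place
    dither_scale -- maps a diffuseness value to (azi_scale, ele_scale)
    """
    if good:
        state.update(decoded)  # use and remember the received values
    else:
        s_azi, s_ele = dither_scale(state['diff'])
        # hold the diffuseness, dither the held direction
        state['azi'] = (state['azi'] + s_azi * triangular_noise()) % 360.0
        state['ele'] = max(-90.0, min(90.0,
                           state['ele'] + s_ele * triangular_noise()))
    return dict(state)
```

With a zero dither scale this reduces to the plain hold strategy; with the scale taken from the diffuseness model it implements hold-plus-dither.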
The dithering scale depends on the diffuseness index inherited from the last well-received frame of the same parameter band, and can be derived from the model inferred from Fig. 4. For example, when the diffuseness is coded over 8 indices, it may correspond to a table mapping each diffuseness index to a dithering scale factor (the table values are given in the original figure). In addition, the dithering strength may also be steered depending on the nature of the downmix signal. Indeed, tonal signals tend to be perceived as more localized sources than noise-like signals. The dithering can therefore be adjusted according to the tonality of the transmitted downmix by reducing the dithering effect for tonal items. The tonality can be measured, for example, in the time domain by computing a long-term prediction gain, or in the frequency domain by measuring a spectral flatness.
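The spectral-flatness option mentioned above can be sketched as follows. This is an illustrative measure, not the codec's exact estimator: flatness close to 1 indicates a noise-like spectrum and values near 0 a tonal one, so the dither scale could, for instance, be multiplied by the flatness to reduce dithering for tonal items.

```python
import numpy as np

def spectral_flatness(frame):
    """Spectral flatness of one time-domain frame: geometric mean over
    arithmetic mean of the power spectrum (≈1 = noise-like, ≈0 = tonal)."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12  # floor avoids log(0)
    return np.exp(np.mean(np.log(power))) / np.mean(power)
```

A pure sine yields a flatness close to 0, while white noise yields a value closer to 1.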
With respect to Figs. 6a and 6b, further embodiments will be discussed, involving a method 200 for decoding a DirAC-encoded audio scene (see Fig. 6a) and a decoder 70 for a DirAC-encoded audio scene (see Fig. 6b).
Fig. 6a illustrates a new method 200 comprising the steps 110, 120, and 130 of the method 100 plus an additional decoding step 210. The decoding step enables decoding of a DirAC-encoded audio scene comprising a downmix (not shown) by using the first set of spatial audio parameters and the second set of spatial audio parameters, where here the replaced second set output by step 130 is used. This concept is employed by the apparatus shown in Fig. 6b. Fig. 6b shows the decoder 70, which comprises the processor for loss concealment of the spatial audio parameters 15 and a DirAC decoder 72. The DirAC decoder 72, or in more detail the processor of the DirAC decoder 72, receives the downmix signal and the sets of spatial audio parameters, for example directly from the interface 52 and/or as processed by the processor 54 according to the methods discussed above.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or to a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or of a feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is therefore a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is therefore a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium, or the recorded medium is typically tangible and/or non-transitory.
A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device, or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The above-described embodiments merely illustrate the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, that the invention be limited only by the scope of the appended claims and not by the specific details presented by way of description and explanation of the embodiments herein.
References
[1] V. Pulkki, M.-V. Laitinen, J. Vilkamo, J. Ahonen, T. Lokki, and T. Pihlajamäki, "Directional audio coding - perception-based reproduction of spatial sound," International Workshop on the Principles and Applications of Spatial Hearing, Nov. 2009, Zao, Miyagi, Japan.

[2] V. Pulkki, "Virtual source positioning using vector base amplitude panning," J. Audio Eng. Soc., 45(6):456-466, June 1997.

[3] J. Ahonen and V. Pulkki, "Diffuseness estimation using temporal variation of intensity vectors," Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Mohonk Mountain House, New Paltz, 2009.

[4] T. Hirvonen, J. Ahonen, and V. Pulkki, "Perceptual compression methods for metadata in Directional Audio Coding applied to audiovisual teleconference," AES 126th Convention, May 7-10, 2009, Munich, Germany.

[5] A. Politis, J. Vilkamo, and V. Pulkki, "Sector-based parametric sound field reproduction in the spherical harmonic domain," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 852-866, Aug. 2015.
Reference signs: 100: method; 110, 120, 130: basic steps
RU2782511C1 (en) | Apparatus, method, and computer program for encoding, decoding, processing a scene, and for other procedures associated with dirac-based spatial audio coding using direct component compensation |