TWI847276B - Encoding/decoding method, apparatus, device, storage medium, and computer program product

Info

Publication number: TWI847276B
Application number: TW111135552A
Authority: TW
Inventors: 劉帥; 高原; 王賓; 王喆
Original assignee: 大陸商華為技術有限公司
Priority date: 2021-09-29
Filing date: 2022-09-20
Publication date: 2024-07-01

Abstract

Embodiments of the present application disclose an encoding/decoding method, an apparatus, a device, a storage medium, and a computer program product, which belong to the field of audio processing technologies. In the method, the HOA signal of an audio frame is encoded/decoded by combining a coding/decoding scheme based on virtual speaker selection with a coding/decoding scheme based on directional audio coding; that is, an appropriate coding/decoding scheme is selected for each audio frame, so that the compression rate of the audio signal can be improved. In addition, to achieve a smooth transition of auditory quality when switching between the two schemes, some audio frames are encoded/decoded with a new scheme instead of directly using either of the foregoing two. Specifically, the signals of specified channels in the HOA signals of these audio frames are encoded into the bitstream. This compromise allows the auditory quality of the decoded HOA signals, after rendering and playback, to transition smoothly.

Description

Coding and decoding method, device, equipment, storage medium and computer program product

本申請實施例涉及音訊處理技術領域，特別涉及一種編解碼方法、裝置、設備、儲存媒體及電腦程式產品。The present application embodiment relates to the field of audio processing technology, and more particularly to a coding and decoding method, device, equipment, storage medium and computer program product.

高階立體混響（higher order ambisonics，HOA）技術作為一種三維音訊技術，因其在進行三維音訊重播時具有更高的靈活性，因而得到了廣泛的關注。為了實現更好的聽覺效果，HOA 技術需要大量的資料記錄詳細的聲音場景資訊。但隨著HOA階數的增加將會產生更多的資料，大量的資料造成傳輸和儲存的困難。因此如何對HOA信號進行編解碼成為目前重點關注的問題。As a three-dimensional audio technology, higher order ambisonics (HOA) technology has received widespread attention because of its higher flexibility in three-dimensional audio playback. In order to achieve better auditory effects, HOA technology requires a large amount of data to record detailed sound scene information. However, as the HOA order increases, more data will be generated, and a large amount of data will cause difficulties in transmission and storage. Therefore, how to encode and decode HOA signals has become a key issue of concern at present.

相關技術提出了兩種對HOA信號進行編解碼的方案。其中一種方案為基於方向音訊編碼（directional audio coding，DirAC）的編解碼方案。在該方案中，編碼端從當前幀的HOA信號中提取核心層信號和空間參數，將提取的核心層信號和空間參數編入碼流。解碼端採用與編碼對稱的解碼方法從碼流中重建出當前幀的HOA信號。另一種方案為基於虛擬揚聲器選擇的編解碼方案。在該方案中，編碼端基於匹配投影（match-projection，MP）演算法從虛擬揚聲器集合中選擇與當前幀的HOA信號匹配的目標虛擬揚聲器，基於當前幀的HOA信號和目標虛擬揚聲器，確定虛擬揚聲器信號，基於當前幀的HOA信號和虛擬揚聲器信號確定殘差信號，將虛擬揚聲器信號和殘差信號編入碼流。解碼端採用與編碼對稱的解碼方法從碼流中重建出當前幀的HOA信號。Related technologies propose two schemes for encoding and decoding HOA signals. One of the schemes is a coding and decoding scheme based on directional audio coding (DirAC). In this scheme, the encoder extracts the core layer signal and spatial parameters from the HOA signal of the current frame and encodes the extracted core layer signal and spatial parameters into the bitstream. The decoder uses a decoding method symmetrical to the encoding to reconstruct the HOA signal of the current frame from the bitstream. The other scheme is a coding and decoding scheme based on virtual speaker selection. In this scheme, the encoder selects a target virtual speaker that matches the HOA signal of the current frame from a set of virtual speakers based on the match-projection (MP) algorithm, determines the virtual speaker signal based on the HOA signal of the current frame and the target virtual speaker, determines the residual signal based on the HOA signal of the current frame and the virtual speaker signal, and encodes the virtual speaker signal and the residual signal into the bitstream. The decoder uses a decoding method symmetric to the encoding to reconstruct the HOA signal of the current frame from the bitstream.

However, when the sound field contains few heterogeneous sound sources, the coding and decoding scheme based on virtual speaker selection achieves the higher compression rate; when the sound field contains many heterogeneous sound sources, the DirAC-based coding and decoding scheme achieves the higher compression rate. Here, heterogeneous sound sources are point sound sources that differ in position and/or direction. The sound field type of an audio frame (which depends on the heterogeneous sound sources in the sound field) may differ from frame to frame. To obtain a high compression rate for audio frames of every sound field type, a suitable coding and decoding scheme must be selected for each audio frame according to its sound field type, which requires switching between the different schemes. However, HOA signals reconstructed with different coding and decoding schemes differ in auditory quality after rendering and playback. How to ensure a smooth transition of auditory quality when switching between schemes is therefore a problem that currently needs to be addressed.

本申請實施例提供了一種編解碼方法、裝置、設備、儲存媒體及電腦程式產品，能夠在不同的編解碼方案之間進行切換時，保證聽覺品質的平滑過渡。所述技術方案如下：The present application embodiment provides a coding method, device, apparatus, storage medium and computer program product, which can ensure smooth transition of auditory quality when switching between different coding schemes. The technical solution is as follows:

In a first aspect, an encoding method is provided, the method comprising: determining a coding scheme of a current frame according to an HOA signal of the current frame, the coding scheme of the current frame being one of a first coding scheme, a second coding scheme and a third coding scheme; wherein the first coding scheme is an HOA coding scheme based on directional audio coding (i.e., the DirAC coding scheme), the second coding scheme is an HOA coding scheme based on virtual speaker selection (which may be referred to simply as the MP-based HOA coding scheme), and the third coding scheme is a hybrid coding scheme; and if the coding scheme of the current frame is the third coding scheme, encoding a signal of a specified channel in the HOA signal into a bitstream, the specified channel being a subset of all channels of the HOA signal. The hybrid coding scheme is so named because its coding process uses both technical means related to the first coding scheme (the DirAC coding scheme) and technical means related to the second coding scheme (the MP-based HOA coding scheme).

在本申請實施例中，針對不同的音訊幀選擇合適的編解碼方案，這樣能提升音訊信號的壓縮率。同時，對於某些音訊幀來說，並非直接採用第一編碼方案和第二編碼方案中的任一個，而是採用一種新的編解碼方案來編解碼這些音訊幀，即將這些音訊幀的HOA信號中指定通道的信號編入碼流，即採用一種折衷的方案進行編解碼，從而使得對解碼恢復出的HOA信號進行渲染播放後的聽覺品質能夠平滑過渡。In the embodiment of the present application, a suitable coding scheme is selected for different audio frames, which can improve the compression rate of the audio signal. At the same time, for some audio frames, instead of directly adopting any of the first coding scheme and the second coding scheme, a new coding scheme is adopted to encode and decode these audio frames, that is, the signals of the specified channels in the HOA signals of these audio frames are encoded into the bitstream, that is, a compromise scheme is adopted for coding and decoding, so that the auditory quality of the HOA signal after rendering and playback after decoding and recovery can be smoothly transitioned.

可選地，指定通道的信號包括一階立體混響（first-order ambisonics，FOA）信號，FOA信號包括全向的W信號，以及定向的X信號、Y信號和Z信號。Optionally, the signal of the designated channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal, and directional X signal, Y signal, and Z signal.

可選地，將HOA信號中指定通道的信號編入碼流，包括：基於W信號、X信號、Y信號和Z信號，確定虛擬揚聲器信號和殘差信號；將虛擬揚聲器信號和殘差信號編入碼流。Optionally, encoding the signal of a designated channel in the HOA signal into a bitstream includes: determining a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and encoding the virtual speaker signal and the residual signal into the bitstream.

可選地，基於W信號、X信號、Y信號和Z信號，確定虛擬揚聲器信號和殘差信號，包括：將W信號確定為一路虛擬揚聲器信號；基於W信號、X信號、Y信號和Z信號確定三路殘差信號，或者，將X信號、Y信號和Z信號確定為三路殘差信號。可選地，將X信號、Y信號和Z信號分別與W信號之間的差信號確定為三路殘差信號。Optionally, determining the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal, and the Z signal includes: determining the W signal as one virtual speaker signal; determining three residual signals based on the W signal, the X signal, the Y signal, and the Z signal, or determining the X signal, the Y signal, and the Z signal as three residual signals. Optionally, determining the difference signals between the X signal, the Y signal, and the Z signal and the W signal as three residual signals.
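The two alternatives above can be illustrated with a minimal sketch. It assumes each FOA channel is a NumPy array of samples for one frame; the function name `split_foa` and the `use_difference` switch are hypothetical and only serve this illustration.

```python
import numpy as np

def split_foa(w, x, y, z, use_difference=False):
    """Return (virtual_speaker_signal, [three residual signals]).

    The W signal is taken as the single virtual speaker signal.
    The residuals are either X, Y, Z themselves, or (optionally)
    the differences X - W, Y - W, Z - W.
    """
    virtual_speaker = w
    if use_difference:
        residuals = [x - w, y - w, z - w]
    else:
        residuals = [x, y, z]
    return virtual_speaker, residuals
```

Which alternative is used must be known to the decoder, since it determines how X, Y and Z are recovered from the residual signals.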

可選地，將虛擬揚聲器信號和殘差信號編入碼流，包括：將這一路虛擬揚聲器信號與第一路預設單聲道信號組合，以得到一路身歷聲信號；將這三路殘差信號與第二路預設單聲道信號組合，以得到兩路身歷聲信號；通過身歷聲編碼器將得到的三路身歷聲信號分別編入碼流。Optionally, encoding the virtual speaker signal and the residual signal into the bitstream includes: combining the virtual speaker signal with the first preset mono signal to obtain a stereo signal; combining the three residual signals with the second preset mono signal to obtain two stereo signals; and encoding the obtained three stereo signals into the bitstream respectively through a stereo encoder.

Optionally, combining the three residual signals with the second preset mono signal to obtain two stereo signals includes: combining the two residual signals with the highest correlation among the three residual signals to obtain one of the two stereo signals; and combining the remaining residual signal, i.e., the one other than the two most correlated residual signals, with the second preset mono signal to obtain the other of the two stereo signals.
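The pairing step can be sketched as follows. The text does not state how correlation is measured, so the absolute normalized cross-correlation used here is an assumption, as are the function name `pair_residuals` and the representation of a stereo signal as a 2-row array.

```python
import itertools
import numpy as np

def pair_residuals(residuals, preset="zero"):
    """Group three residual signals into two stereo pairs.

    Assumption: 'correlation' is the absolute normalized inner product.
    Returns (stereo_a, stereo_b, (i, j, k)), where residuals i and j form
    stereo_a and residual k is paired with the preset mono signal.
    """
    best_pair, best_corr = None, -1.0
    for i, j in itertools.combinations(range(3), 2):
        a, b = residuals[i], residuals[j]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        corr = abs(float(np.dot(a, b))) / denom if denom > 0 else 0.0
        if corr > best_corr:
            best_pair, best_corr = (i, j), corr
    i, j = best_pair
    k = ({0, 1, 2} - {i, j}).pop()
    # Second preset mono signal: all-zero or all-one, as described in the text.
    if preset == "zero":
        filler = np.zeros_like(residuals[k])
    else:
        filler = np.ones_like(residuals[k])
    stereo_a = np.stack([residuals[i], residuals[j]])
    stereo_b = np.stack([residuals[k], filler])
    return stereo_a, stereo_b, (i, j, k)
```

Pairing the most correlated residuals presumably lets the stereo encoder exploit inter-channel redundancy, while the preset mono signal simply fills the unused half of the second stereo pair.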

可選地，第一路預設單聲道信號為全零信號或全一信號，全零信號包括採樣點的值均為零的信號或者頻點的值均為零的信號，全一信號包括採樣點的值均為一的信號或者頻點的值均為一的信號；第二路預設單聲道信號為全零信號或全一信號；第一路預設單聲道信號與第二路預設單聲道信號相同或不同。Optionally, the first preset mono signal is an all-zero signal or an all-one signal, the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency values are all zero, and the all-one signal includes a signal whose sampling point values are all one or a signal whose frequency values are all one; the second preset mono signal is an all-zero signal or an all-one signal; the first preset mono signal is the same as or different from the second preset mono signal.

可選地，將虛擬揚聲器信號和殘差信號編入碼流，包括：通過單聲道編碼器將這一路虛擬揚聲器信號、以及這三路殘差信號中的各路殘差信號分別編入碼流。Optionally, encoding the virtual speaker signal and the residual signal into the bitstream includes: encoding the virtual speaker signal and each of the three residual signals into the bitstream respectively through a mono encoder.

可選地，根據當前幀的HOA信號確定當前幀的編碼方案之後，更包括：若當前幀的編碼方案為第一編碼方案，則按照第一編碼方案將該HOA信號編入碼流；若當前幀的編碼方案為第二編碼方案，則按照第二編碼方案將該HOA信號編入碼流。Optionally, after determining the coding scheme of the current frame according to the HOA signal of the current frame, it further includes: if the coding scheme of the current frame is the first coding scheme, encoding the HOA signal into the bitstream according to the first coding scheme; if the coding scheme of the current frame is the second coding scheme, encoding the HOA signal into the bitstream according to the second coding scheme.

Optionally, determining the coding scheme of the current frame according to the higher order ambisonics (HOA) signal of the current frame includes: determining an initial coding scheme of the current frame according to the HOA signal, the initial coding scheme being the first coding scheme or the second coding scheme; if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame, determining that the coding scheme of the current frame is the initial coding scheme of the current frame; and if the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame is the first coding scheme, determining that the coding scheme of the current frame is the third coding scheme.
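The decision rule above can be sketched in a few lines. The names `Scheme` and `final_scheme` are hypothetical, and treating a missing previous frame (the first frame of a stream) as "no switch" is an assumption not stated in the text.

```python
from enum import Enum

class Scheme(Enum):
    DIRAC = 1    # first coding scheme
    MP = 2       # second coding scheme (virtual speaker selection)
    HYBRID = 3   # third coding scheme (used for switching frames)

def final_scheme(initial_current, initial_previous):
    """Map the initial schemes of two consecutive frames to the scheme used.

    Assumption: if there is no previous frame, the initial scheme is used.
    """
    if initial_previous is None or initial_current == initial_previous:
        return initial_current
    # The initial scheme changed between frames, so this is a switching frame.
    return Scheme.HYBRID
```

For example, for initial schemes DIRAC, DIRAC, MP, MP the frames would be coded as DIRAC, DIRAC, HYBRID, MP: only the frame at which the initial scheme changes is coded with the hybrid scheme.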

可選地，根據該HOA信號確定當前幀的初始編碼方案之後，更包括：將當前幀的初始編碼方案的指示資訊編入碼流。Optionally, after determining the initial coding scheme of the current frame according to the HOA signal, it further includes: encoding indication information of the initial coding scheme of the current frame into the bitstream.

Optionally, after determining the coding scheme of the current frame according to the higher order ambisonics (HOA) signal of the current frame, the method further includes: determining a value of a switching flag of the current frame, where the value of the switching flag is a first value when the coding scheme of the current frame is the first coding scheme or the second coding scheme, and a second value when the coding scheme of the current frame is the third coding scheme; and encoding the value of the switching flag into the bitstream. That is, the switching flag indicates whether the current frame is a switching frame.
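A minimal sketch of the flag assignment follows. The concrete values 0 and 1 for the first and second values, and the string labels for the schemes, are assumptions chosen for illustration; the text only requires that the two values be distinguishable.

```python
FIRST_VALUE = 0   # assumed: frame is not a switching frame
SECOND_VALUE = 1  # assumed: frame is a switching frame

def switching_flag(scheme):
    """Return the switching-flag value for a frame's coding scheme."""
    if scheme in ("dirac", "mp"):      # first or second coding scheme
        return FIRST_VALUE
    if scheme == "hybrid":             # third coding scheme
        return SECOND_VALUE
    raise ValueError(f"unknown coding scheme: {scheme!r}")
```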

可選地，根據當前幀的HOA信號確定當前幀的編碼方案之後，更包括：將當前幀的編碼方案的指示資訊編入碼流。Optionally, after determining the coding scheme of the current frame according to the HOA signal of the current frame, it further includes: encoding indication information of the coding scheme of the current frame into the bitstream.

可選地，指定通道與第一編碼方案中預設的傳輸通道一致。這樣能夠保證切換幀的聽覺品質與採用第一編碼方案所編碼的音訊幀的聽覺品質相近。Optionally, the designated channel is consistent with a transmission channel preset in the first coding scheme, so as to ensure that the auditory quality of the switching frame is similar to the auditory quality of the audio frame encoded by the first coding scheme.

In a second aspect, a decoding method is provided, the method comprising: obtaining a decoding scheme of a current frame based on a bitstream, the decoding scheme of the current frame being one of a first decoding scheme, a second decoding scheme and a third decoding scheme; wherein the first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme; if the decoding scheme of the current frame is the third decoding scheme, determining a signal of a specified channel in the HOA signal of the current frame based on the bitstream, the specified channel being a subset of all channels of the HOA signal; determining, based on the signal of the specified channel, a gain of one or more remaining channels of the HOA signal other than the specified channel; determining, based on the signal of the specified channel and the gain of the one or more remaining channels, the signal of each of the one or more remaining channels; and obtaining a reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels. The hybrid decoding scheme is so named because its decoding process uses both technical means related to the first decoding scheme (the DirAC decoding scheme) and technical means related to the second decoding scheme (the MP-based HOA decoding scheme).

在本申請實施例中，由於編碼端採用第三編碼方案編碼當前幀的HOA信號時，將指定通道的信號編入了碼流，那麼解碼端從碼流中解析出指定通道的信號，之後基於指定通道的信號重建出剩餘通道的信號，進而重建出HOA信號。也即是，採用一種折衷的方案，從而使得對解碼恢復出的HOA信號進行渲染播放後的聽覺品質能夠平滑過渡。In the embodiment of the present application, since the encoder encodes the HOA signal of the current frame using the third encoding scheme, the signal of the designated channel is encoded into the bitstream, then the decoder parses the signal of the designated channel from the bitstream, and then reconstructs the signals of the remaining channels based on the signal of the designated channel, and then reconstructs the HOA signal. That is, a compromise solution is adopted, so that the auditory quality of the HOA signal after rendering and playback after decoding and recovery can be smoothly transitioned.

可選地，基於碼流確定當前幀的HOA信號中指定通道的信號，包括：基於碼流確定虛擬揚聲器信號和殘差信號；基於該虛擬揚聲器信號和殘差信號，確定指定通道的信號。Optionally, determining the signal of the designated channel in the HOA signal of the current frame based on the bitstream includes: determining a virtual speaker signal and a residual signal based on the bitstream; and determining the signal of the designated channel based on the virtual speaker signal and the residual signal.

Optionally, determining the virtual speaker signal and the residual signal based on the bitstream includes: decoding the bitstream with a stereo decoder to obtain three stereo signals; and determining one virtual speaker signal and three residual signals based on the three stereo signals.

可選地，基於這三路身歷聲信號，確定一路虛擬揚聲器信號和三路殘差信號，包括：基於這三路身歷聲信號中的一路身歷聲信號，確定一路虛擬揚聲器信號；基於這三路身歷聲信號中的另兩路身歷聲信號，確定三路殘差信號。Optionally, determining one virtual speaker signal and three residual signals based on the three stereo sound signals includes: determining one virtual speaker signal based on one stereo sound signal among the three stereo sound signals; and determining three residual signals based on the other two stereo sound signals among the three stereo sound signals.

可選地，基於碼流確定虛擬揚聲器信號和殘差信號，包括：通過單聲道解碼器對碼流進行解碼，以得到一路虛擬揚聲器信號和三路殘差信號。Optionally, determining the virtual speaker signal and the residual signal based on the bit stream includes: decoding the bit stream by a mono decoder to obtain one virtual speaker signal and three residual signals.

Optionally, the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals; determining the signal of the specified channel based on the virtual speaker signal and the residual signal includes: determining the W signal based on the virtual speaker signal; and determining the X, Y, and Z signals based on the residual signal and the W signal, or determining the X, Y, and Z signals based on the residual signal alone.
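As a sketch of this decoder-side step, the following mirrors an encoder that either passed X, Y, Z through as residuals or coded their differences from W. The function name `merge_foa` and the `used_difference` switch are hypothetical, and the sketch ignores coding loss in the decoded signals.

```python
import numpy as np

def merge_foa(virtual_speaker, residuals, used_difference=False):
    """Recover (W, X, Y, Z) from one virtual speaker signal and three residuals.

    If the encoder coded the differences X - W, Y - W, Z - W, W is added
    back; otherwise the residuals are taken directly as X, Y, Z.
    """
    w = virtual_speaker
    if used_difference:
        x, y, z = (r + w for r in residuals)
    else:
        x, y, z = residuals
    return w, x, y, z
```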

可選地，該方法更包括：若當前幀的解碼方案為第一解碼方案，則按照第一解碼方案，根據碼流獲得當前幀的重建HOA信號；若當前幀的解碼方案為第二解碼方案，則按照第二解碼方案，根據碼流獲得當前幀的重建HOA信號。Optionally, the method further includes: if the decoding scheme of the current frame is the first decoding scheme, then according to the first decoding scheme, the reconstructed HOA signal of the current frame is obtained according to the bit stream; if the decoding scheme of the current frame is the second decoding scheme, then according to the second decoding scheme, the reconstructed HOA signal of the current frame is obtained according to the bit stream.

可選地，按照第二解碼方案，根據碼流獲得當前幀的重建HOA信號，包括：按照第二解碼方案，根據碼流獲得初始HOA信號；若當前幀的前一幀的解碼方案為第三解碼方案，則根據當前幀的前一幀的高階增益，對初始HOA信號的高階部分進行增益調整；基於初始HOA信號的低階部分和經增益調整的高階部分，獲得重建HOA信號。也即是，通過高階增益調整，使得聽覺品質進一步地平滑過渡。Optionally, according to the second decoding scheme, the reconstructed HOA signal of the current frame is obtained according to the bit stream, including: according to the second decoding scheme, the initial HOA signal is obtained according to the bit stream; if the decoding scheme of the previous frame of the current frame is the third decoding scheme, the high-order part of the initial HOA signal is gain-adjusted according to the high-order gain of the previous frame of the current frame; based on the low-order part of the initial HOA signal and the gain-adjusted high-order part, the reconstructed HOA signal is obtained. That is, through the high-order gain adjustment, the auditory quality is further smoothed.
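The gain adjustment can be illustrated with a rough sketch. The text does not specify how the high-order gain is applied, so the following assumes one multiplicative gain per high-order channel, a `(channels, samples)` array layout, and that the low-order part is the first four (FOA) channels; the name `adjust_high_order` is hypothetical.

```python
import numpy as np

def adjust_high_order(initial_hoa, prev_high_order_gain, low_order_channels=4):
    """Scale the high-order channels of an initial HOA frame.

    initial_hoa: array of shape (channels, samples), decoded with the
        second decoding scheme.
    prev_high_order_gain: one gain per high-order channel, taken from the
        previous (switching) frame. Per-channel scaling is an assumption.
    """
    out = np.array(initial_hoa, dtype=float)        # copy; low-order part unchanged
    gains = np.asarray(prev_high_order_gain, dtype=float)
    out[low_order_channels:] *= gains[:, None]      # scale each high-order channel
    return out
```

Under these assumptions, a frame that follows a switching frame keeps its low-order channels as decoded and has only its high-order channels scaled toward the level used in the previous frame.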

可選地，基於碼流獲得當前幀的解碼方案，包括：從碼流中解析出當前幀的切換標誌的值；若切換標誌的值為第一值，則從碼流中解析當前幀的解碼方案的指示資訊，指示資訊用於指示當前幀的解碼方案為第一解碼方案或第二解碼方案；若切換標誌的值為第二值，確定當前幀的解碼方案為第三解碼方案。Optionally, obtaining a decoding scheme for the current frame based on the bitstream includes: parsing a value of a switching flag of the current frame from the bitstream; if the value of the switching flag is a first value, parsing indication information of the decoding scheme of the current frame from the bitstream, the indication information being used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; if the value of the switching flag is the second value, determining that the decoding scheme of the current frame is a third decoding scheme.
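The parsing order described above can be sketched as follows. The one-bit field widths, the values 0/1, and the mapping of the indication bit to the first and second decoding schemes are all assumptions for illustration; the actual bitstream syntax is not given in this passage.

```python
FIRST_VALUE = 0   # assumed: not a switching frame
SECOND_VALUE = 1  # assumed: switching frame

def parse_decoding_scheme(bits):
    """Read the switching flag, then (if needed) the scheme indication.

    bits: iterable of 0/1 values for the current frame's header.
    Returns "dirac", "mp", or "hybrid".
    """
    it = iter(bits)
    flag = next(it)
    if flag == SECOND_VALUE:
        # Switching frame: no further indication is parsed.
        return "hybrid"
    indication = next(it)
    return "dirac" if indication == 0 else "mp"
```

With this layout a switching frame needs no scheme indication at all, because the flag alone identifies the third decoding scheme.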

可選地，基於碼流獲得當前幀的解碼方案，包括：從碼流中解析出當前幀的解碼方案的指示資訊，指示資訊用於指示當前幀的解碼方案為第一解碼方案、第二解碼方案或第三解碼方案。Optionally, obtaining a decoding scheme for a current frame based on a bitstream includes: parsing indication information of a decoding scheme for the current frame from the bitstream, wherein the indication information is used to indicate that the decoding scheme for the current frame is a first decoding scheme, a second decoding scheme, or a third decoding scheme.

可選地，基於碼流獲得當前幀的解碼方案，包括：從碼流中解析出當前幀的初始解碼方案，初始解碼方案為第一解碼方案或第二解碼方案；若當前幀的初始解碼方案與當前幀的前一幀的初始解碼方案相同，則確定當前幀的解碼方案為當前幀的初始解碼方案；若當前幀的初始解碼方案為第一解碼方案且當前幀的前一幀的初始解碼方案為第二解碼方案，或當前幀的初始解碼方案為第二解碼方案且當前幀的前一幀的初始解碼方案為第一解碼方案，則確定當前幀的解碼方案為第三解碼方案。Optionally, obtaining a decoding scheme for a current frame based on a bitstream includes: parsing an initial decoding scheme for the current frame from the bitstream, the initial decoding scheme being a first decoding scheme or a second decoding scheme; if the initial decoding scheme for the current frame is the same as an initial decoding scheme for a frame previous to the current frame, determining the decoding scheme for the current frame to be the initial decoding scheme for the current frame; if the initial decoding scheme for the current frame is the first decoding scheme and the initial decoding scheme for the frame previous to the current frame is the second decoding scheme, or the initial decoding scheme for the current frame is the second decoding scheme and the initial decoding scheme for the frame previous to the current frame is the first decoding scheme, determining the decoding scheme for the current frame to be a third decoding scheme.

In a third aspect, an encoding device is provided, the encoding device having the function of implementing the behavior of the encoding method in the first aspect. The encoding device includes one or more modules, and the one or more modules are used to implement the encoding method provided in the first aspect.

That is, an encoding device is provided, which includes: a first determination module, used to determine the coding scheme of the current frame according to the higher order ambisonics (HOA) signal of the current frame, the coding scheme of the current frame being one of the first coding scheme, the second coding scheme and the third coding scheme, wherein the first coding scheme is the HOA coding scheme based on directional audio coding, the second coding scheme is the HOA coding scheme based on virtual speaker selection, and the third coding scheme is a hybrid coding scheme; and a first encoding module, used to encode the signal of the specified channel in the HOA signal into the bitstream if the coding scheme of the current frame is the third coding scheme, the specified channel being a subset of all channels of the HOA signal.

Optionally, the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals.

可選地，第一編碼模組包括：第一確定子模組，用於基於W信號、X信號、Y信號和Z信號，確定虛擬揚聲器信號和殘差信號；編碼子模組，用於將該虛擬揚聲器信號和殘差信號編入碼流。Optionally, the first encoding module includes: a first determining submodule, used to determine the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal; and a coding submodule, used to encode the virtual speaker signal and the residual signal into a bitstream.

可選地，第一確定子模組用於：將W信號確定為一路虛擬揚聲器信號；基於W信號、X信號、Y信號和Z信號確定三路殘差信號，或者，將X信號、Y信號和Z信號確定為三路殘差信號。Optionally, the first determining submodule is used to: determine the W signal as one virtual speaker signal; determine three residual signals based on the W signal, the X signal, the Y signal and the Z signal, or determine the X signal, the Y signal and the Z signal as three residual signals.

可選地，編碼子模組用於：將這一路虛擬揚聲器信號與第一路預設單聲道信號組合，以得到一路身歷聲信號；將這三路殘差信號與第二路預設單聲道信號組合，以得到兩路身歷聲信號；通過身歷聲編碼器將得到的三路身歷聲信號分別編入碼流。Optionally, the encoding submodule is used to: combine the virtual speaker signal with the first preset mono signal to obtain a stereo signal; combine the three residual signals with the second preset mono signal to obtain two stereo signals; and encode the obtained three stereo signals into bit streams respectively through the stereo encoder.

可選地，編碼子模組用於：將這三路殘差信號中相關性最高的兩路殘差信號組合，以得到兩路身歷聲信號中的一路身歷聲信號；將這三路殘差信號中除相關性最高的兩路殘差信號之外的一路殘差信號與第二路預設單聲道信號組合，以得到兩路身歷聲信號中的另一路身歷聲信號。Optionally, the coding submodule is used to: combine the two most correlated residual signals among the three residual signals to obtain one stereo signal among the two stereo signals; and combine a residual signal other than the two most correlated residual signals among the three residual signals with the second preset mono signal to obtain the other stereo signal among the two stereo signals.

可選地，編碼子模組用於：通過單聲道編碼器將這一路虛擬揚聲器信號、以及這三路殘差信號中的各路殘差信號分別編入碼流。Optionally, the encoding submodule is used to encode the virtual speaker signal and each of the three residual signals into a bitstream respectively through a mono encoder.

可選地，該裝置更包括：第二編碼模組，用於若當前幀的編碼方案為第一編碼方案，則按照第一編碼方案將該HOA信號編入碼流；第三編碼模組，用於若當前幀的編碼方案為第二編碼方案，則按照第二編碼方案將該HOA信號編入碼流。Optionally, the device further includes: a second coding module, used to encode the HOA signal into the bitstream according to the first coding scheme if the coding scheme of the current frame is the first coding scheme; and a third coding module, used to encode the HOA signal into the bitstream according to the second coding scheme if the coding scheme of the current frame is the second coding scheme.

可選地，第一確定模組包括：第二確定子模組，用於根據該HOA信號確定當前幀的初始編碼方案，初始編碼方案為第一編碼方案或第二編碼方案；第三確定子模組，用於若當前幀的初始編碼方案與當前幀的前一幀的初始編碼方案相同，則確定當前幀的編碼方案為當前幀的初始編碼方案；第四確定子模組，用於若當前幀的初始編碼方案為第一編碼方案且當前幀的前一幀的初始編碼方案為第二編碼方案，或當前幀的初始編碼方案為第二編碼方案且當前幀的前一幀的初始編碼方案為第一編碼方案，則確定當前幀的編碼方案為第三編碼方案。Optionally, the first determination module includes: a second determination submodule, used to determine an initial coding scheme of the current frame according to the HOA signal, the initial coding scheme being the first coding scheme or the second coding scheme; a third determination submodule, used to determine that the coding scheme of the current frame is the initial coding scheme of the current frame if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame; and a fourth determination submodule, used to determine that the coding scheme of the current frame is the third coding scheme if the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame of the current frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame of the current frame is the first coding scheme.

可選地，該裝置更包括：第四編碼模組，用於將當前幀的初始編碼方案的指示資訊編入碼流。Optionally, the device further comprises: a fourth encoding module, used for encoding indication information of the initial encoding scheme of the current frame into the bitstream.

可選地，該裝置更包括：第二確定模組，用於確定當前幀的切換標誌的值，當當前幀的編碼方案為第一編碼方案或第二編碼方案時，當前幀的切換標誌的值為第一值；當當前幀的編碼方案為第三編碼方案時，當前幀的切換標誌的值為第二值；第五編碼模組，用於將該切換標誌的值編入碼流。Optionally, the device further includes: a second determination module, used to determine the value of the switching flag of the current frame, when the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is the first value; when the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is the second value; a fifth coding module, used to encode the value of the switching flag into the bitstream.

可選地，該裝置更包括：第六編碼模組，用於將當前幀的編碼方案的指示資訊編入碼流。Optionally, the device further comprises: a sixth encoding module, used for encoding indication information of the encoding scheme of the current frame into the bitstream.

可選地，指定通道與第一編碼方案中預設的傳輸通道一致。Optionally, the designated channel is consistent with a transmission channel preset in the first coding scheme.

第四方面，提供了一種解碼裝置，所述解碼裝置具有實現上述第二方面中解碼方法行為的功能。所述解碼裝置包括一個或多個模組，該一個或多個模組用於實現上述第二方面所提供的解碼方法。In a fourth aspect, a decoding device is provided, wherein the decoding device has the function of implementing the decoding method in the second aspect. The decoding device includes one or more modules, wherein the one or more modules are used to implement the decoding method provided in the second aspect.

That is, a decoding device is provided, which includes: a first obtaining module, used to obtain a decoding scheme of a current frame based on a bitstream, the decoding scheme of the current frame being one of a first decoding scheme, a second decoding scheme and a third decoding scheme, wherein the first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme; a first determination module, used to determine a signal of a specified channel in the HOA signal of the current frame based on the bitstream if the decoding scheme of the current frame is the third decoding scheme, the specified channel being a subset of all channels of the HOA signal; a second determination module, used to determine, based on the signal of the specified channel, a gain of one or more remaining channels of the HOA signal other than the specified channel; a third determination module, used to determine the signal of each of the one or more remaining channels based on the signal of the specified channel and the gain of the one or more remaining channels; and a second obtaining module, used to obtain the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels.

可選地，第一確定模組包括：第一確定子模組，用於基於碼流確定虛擬揚聲器信號和殘差信號；第二確定子模組，用於基於該虛擬揚聲器信號和殘差信號，確定指定通道的信號。Optionally, the first determination module includes: a first determination submodule, used to determine a virtual speaker signal and a residual signal based on a bit stream; and a second determination submodule, used to determine a signal of a designated channel based on the virtual speaker signal and the residual signal.

Optionally, the first determination submodule is used to: decode the bit stream through a stereo decoder to obtain three stereo signals; and determine one virtual speaker signal and three residual signals based on the three stereo signals.

Optionally, the first determination submodule is used to: determine the one virtual speaker signal based on one of the three stereo signals; and determine the three residual signals based on the other two of the three stereo signals.

可選地，第一確定子模組用於：通過單聲道解碼器對碼流進行解碼，以得到一路虛擬揚聲器信號和三路殘差信號。Optionally, the first determination submodule is used to: decode the code stream through a mono decoder to obtain one virtual speaker signal and three residual signals.

Optionally, the signal of the designated channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals; the first determination submodule is used to: determine the W signal based on the virtual speaker signal; and either determine the X, Y, and Z signals based on the residual signal and the W signal, or determine the X, Y, and Z signals based on the residual signal alone.

Optionally, the device further includes: a first decoding module, used to obtain the reconstructed HOA signal of the current frame from the bit stream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme; and a second decoding module, used to obtain the reconstructed HOA signal of the current frame from the bit stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.

Optionally, the second decoding module includes: a first acquisition submodule, used to obtain an initial HOA signal from the bit stream according to the second decoding scheme; a gain adjustment submodule, used to adjust the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame if the decoding scheme of the previous frame of the current frame is the third decoding scheme; and a second acquisition submodule, used to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
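The gain adjustment performed by the second decoding module can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of the embodiments: the function name, the per-channel multiplication, and the choice of the first four channels as the low-order part are all assumptions made for the example.

```python
HYBRID_SCHEME = 3  # the third decoding scheme


def adjust_high_order(initial_hoa, prev_scheme, prev_high_order_gains, num_low=4):
    """Return the reconstructed frame from an initial HOA frame.

    initial_hoa: list of channels, each a list of samples.
    num_low:     number of low-order channels kept unchanged (assumed 4, i.e. FOA).
    If the previous frame used the hybrid scheme, each high-order channel is
    scaled by the corresponding high-order gain of the previous frame
    (assumption: one multiplicative gain per high-order channel).
    """
    low = [list(ch) for ch in initial_hoa[:num_low]]
    high = initial_hoa[num_low:]
    if prev_scheme == HYBRID_SCHEME:
        if len(prev_high_order_gains) != len(high):
            raise ValueError("need one gain per high-order channel")
        high = [[g * s for s in ch] for g, ch in zip(prev_high_order_gains, high)]
    else:
        high = [list(ch) for ch in high]
    return low + high
```

When the previous frame used the first or second scheme, the initial HOA signal passes through unchanged.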

可選地，第一獲得模組包括：第一解析子模組，用於從碼流中解析出當前幀的切換標誌的值；第二解析子模組，用於若該切換標誌的值為第一值，則從碼流中解析當前幀的解碼方案的指示資訊，指示資訊用於指示當前幀的解碼方案為第一解碼方案或第二解碼方案；第三確定子模組，用於若該切換標誌的值為第二值，確定當前幀的解碼方案為第三解碼方案。Optionally, the first acquisition module includes: a first parsing submodule, used to parse the value of the switching flag of the current frame from the bit stream; a second parsing submodule, used to parse indication information of the decoding scheme of the current frame from the bit stream if the value of the switching flag is the first value, the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; a third determination submodule, used to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value.
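The switching-flag variant above can be sketched in Python. This is a hedged illustration only: the concrete flag values (0 as the first value, 1 as the second value), the one-bit indication field, and the function name are assumptions, since the passage does not fix the bit-stream syntax.

```python
FIRST_VALUE = 0   # assumed encoding of the "first value" of the switching flag
SECOND_VALUE = 1  # assumed encoding of the "second value"


def parse_scheme_with_switch_flag(bits):
    """Return 1, 2, or 3 (the decoding scheme) from an iterable of 0/1 bits.

    Layout assumed for illustration: [switch_flag][indication (only if needed)].
    """
    it = iter(bits)
    switch_flag = next(it)
    if switch_flag == SECOND_VALUE:
        # No further indication is needed: the frame uses the hybrid scheme.
        return 3
    # Switching flag has the first value: one more field selects scheme 1 or 2.
    indication = next(it)
    return 1 if indication == 0 else 2
```

With this layout, a hybrid frame costs one flag bit, while a frame using the first or second scheme carries the flag plus the indication field.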

可選地，第一獲得模組包括：第三解析子模組，用於從碼流中解析出當前幀的解碼方案的指示資訊，指示資訊用於指示當前幀的解碼方案為第一解碼方案、第二解碼方案或第三解碼方案。Optionally, the first acquisition module includes: a third parsing submodule, used to parse out indication information of a decoding scheme of a current frame from the bit stream, the indication information being used to indicate whether the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.

可選地，第一獲得模組包括：第四解析子模組，用於從碼流中解析出當前幀的初始解碼方案，初始解碼方案為第一解碼方案或第二解碼方案；第四確定子模組，用於若當前幀的初始解碼方案與當前幀的前一幀的初始解碼方案相同，則確定當前幀的解碼方案為當前幀的初始解碼方案；第五確定子模組，用於若當前幀的初始解碼方案為第一解碼方案且當前幀的前一幀的初始解碼方案為第二解碼方案，或當前幀的初始解碼方案為第二解碼方案且當前幀的前一幀的初始解碼方案為第一解碼方案，則確定當前幀的解碼方案為第三解碼方案。Optionally, the first acquisition module includes: a fourth parsing submodule, used to parse out an initial decoding scheme of the current frame from the bit stream, the initial decoding scheme being the first decoding scheme or the second decoding scheme; a fourth determining submodule, used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame; and a fifth determining submodule, used to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme.
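The rule used by the fourth and fifth determination submodules reduces to a comparison between the initial schemes of two consecutive frames. A minimal Python sketch follows; the function name is hypothetical, and schemes are represented as the integers 1, 2, and 3.

```python
def resolve_decoding_scheme(initial_current, initial_previous):
    """Derive the decoding scheme of the current frame.

    Both arguments are initial decoding schemes parsed from the bit stream
    and must be 1 (first scheme) or 2 (second scheme).
    """
    for scheme in (initial_current, initial_previous):
        if scheme not in (1, 2):
            raise ValueError("initial decoding scheme must be 1 or 2")
    if initial_current == initial_previous:
        return initial_current  # no switch: keep the initial scheme
    return 3                    # switch between schemes 1 and 2: hybrid frame
```

In this variant no extra signalling is needed for the hybrid scheme, because the decoder infers it from the change in initial scheme.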

第五方面，提供了一種編碼端設備，所述編碼端設備包括處理器和記憶體，所述記憶體用於儲存執行上述第一方面所提供的編碼方法的程式，以及儲存用於實現上述第一方面所提供的編碼方法所涉及的資料。所述處理器被配置為用於執行所述記憶體中儲存的程式。所述存放裝置的操作裝置還可以包括通信匯流排，該通信匯流排用於該處理器與記憶體之間建立連接。In a fifth aspect, a coding end device is provided, the coding end device includes a processor and a memory, the memory is used to store a program for executing the coding method provided in the first aspect, and to store data involved in implementing the coding method provided in the first aspect. The processor is configured to execute the program stored in the memory. The operating device of the storage device may also include a communication bus, which is used to establish a connection between the processor and the memory.

第六方面，提供了一種解碼端設備，所述解碼端設備包括處理器和記憶體，所述記憶體用於儲存執行上述第二方面所提供的解碼方法的程式，以及儲存用於實現上述第二方面所提供的解碼方法所涉及的資料。所述處理器被配置為用於執行所述記憶體中儲存的程式。所述存放裝置的操作裝置還可以包括通信匯流排，該通信匯流排用於該處理器與記憶體之間建立連接。In a sixth aspect, a decoding end device is provided, the decoding end device includes a processor and a memory, the memory is used to store a program for executing the decoding method provided in the second aspect, and to store data involved in implementing the decoding method provided in the second aspect. The processor is configured to execute the program stored in the memory. The operating device of the storage device may also include a communication bus, which is used to establish a connection between the processor and the memory.

第七方面，提供了一種電腦可讀取儲存媒體，所述電腦可讀取儲存媒體中儲存有指令，當該指令在電腦上運行時，使得電腦執行上述第一方面所述的編碼方法或第二方面所述的解碼方法。In a seventh aspect, a computer-readable storage medium is provided, wherein instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer executes the encoding method described in the first aspect or the decoding method described in the second aspect.

第八方面，提供了一種包含指令的電腦程式產品，當其在電腦上運行時，使得電腦執行上述第一方面所述的編碼方法或第二方面所述的解碼方法。In an eighth aspect, a computer program product comprising instructions is provided, which, when executed on a computer, enables the computer to execute the encoding method described in the first aspect or the decoding method described in the second aspect.

The technical effects obtained by the third aspect, the fourth aspect, the fifth aspect, the sixth aspect, the seventh aspect, and the eighth aspect are similar to those obtained by the corresponding technical means in the first aspect or the second aspect, and are not repeated here.

本申請實施例提供的技術方案至少能夠帶來以下有益效果：在本申請實施例中，結合兩個方案（即基於虛擬揚聲器選擇的編解碼方案和基於方向音訊編碼的編解碼方案）對音訊幀的HOA信號進行編解碼，也即針對不同的音訊幀選擇合適的編解碼方案，這樣能夠提升音訊信號的壓縮率。同時，為了使得在不同編解碼方案之間切換時聽覺品質的平滑過渡，本方案中對於某些音訊幀來說，並非直接採用上述兩個方案中的任一個方案進行編解碼，而是採用一種新的編解碼方案來編解碼這些音訊幀，即將這些音訊幀的HOA信號中指定通道的信號編入碼流，即採用一種折衷的方案進行編解碼，從而使得對解碼恢復出的HOA信號進行渲染播放後的聽覺品質能夠平滑過渡。The technical solution provided by the embodiment of the present application can at least bring the following beneficial effects: In the embodiment of the present application, two schemes (i.e., a coding and decoding scheme based on virtual speaker selection and a coding and decoding scheme based on directional audio coding) are combined to encode and decode the HOA signal of the audio frame, that is, appropriate coding and decoding schemes are selected for different audio frames, which can improve the compression rate of the audio signal. At the same time, in order to ensure a smooth transition of auditory quality when switching between different coding and decoding schemes, this scheme does not directly adopt any of the above two schemes for encoding and decoding for some audio frames, but adopts a new coding and decoding scheme to encode and decode these audio frames, that is, encodes the signals of the specified channels in the HOA signals of these audio frames into the bit stream, that is, adopts a compromise scheme for encoding and decoding, so that the auditory quality after rendering and playing the decoded and restored HOA signals can be smoothly transitioned.

為使本申請實施例的目的、技術方案和優點更加清楚，下面將結合附圖對本申請實施方式作進一步地詳細描述。In order to make the purpose, technical solutions and advantages of the embodiment of this application clearer, the embodiment of this application will be further described in detail below in conjunction with the accompanying drawings.

在對本申請實施例提供的編解碼方法進行詳細地解釋說明之前，先對本申請實施例涉及的實施環境進行介紹。Before explaining the encoding and decoding method provided by the embodiment of the present application in detail, the implementation environment involved in the embodiment of the present application is first introduced.

Please refer to FIG. 1, which is a schematic diagram of an implementation environment provided by an embodiment of the present application. The implementation environment includes a source device 10, a destination device 20, a link 30, and a storage device 40. The source device 10 can generate encoded media data, so the source device 10 can also be called a media data encoding device. The destination device 20 can decode the encoded media data generated by the source device 10, so the destination device 20 can also be called a media data decoding device. The link 30 can receive the encoded media data generated by the source device 10 and transmit it to the destination device 20. The storage device 40 can receive the encoded media data generated by the source device 10 and store it; in this case, the destination device 20 can obtain the encoded media data directly from the storage device 40. Alternatively, the storage device 40 can correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10; in this case, the destination device 20 can stream or download the encoded media data stored in the storage device 40.

源裝置10和目的地裝置20均可以包括一個或多個處理器以及耦合到該一個或多個處理器的記憶體，該記憶體可以包括隨機存取記憶體（random access memory，RAM）、唯讀記憶體（read-only memory，ROM）、電性可擦除可程式設計唯讀記憶體（electrically erasable programmable read-only memory，EEPROM）、快閃記憶體、可用於以可由電腦存取的指令或資料結構的形式儲存所要的程式碼的任何其它媒體等。例如，源裝置10和目的地裝置20均可以包括桌上型電腦、移動計算裝置、筆記型（例如，膝上型）電腦、平板電腦、機上盒、例如所謂的「智慧」電話等手持電話、電視機、相機、顯示裝置、數位媒體播放機、視頻遊戲控制台、車載電腦或其類似者。Both the source device 10 and the destination device 20 may include one or more processors and a memory coupled to the one or more processors, and the memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer, etc. For example, source device 10 and destination device 20 may each include a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a handheld phone such as a so-called "smart" phone, a television, a camera, a display device, a digital media player, a video game console, a car computer, or the like.

鏈路30可以包括能夠將經編碼的媒體資料從源裝置10傳輸到目的地裝置20的一個或多個媒體或裝置。在一種可能的實現方式中，鏈路30可以包括能夠使源裝置10即時地將經編碼的媒體資料直接發送到目的地裝置20的一個或多個通信媒體。在本申請實施例中，源裝置10可以基於通信標準來調製經編碼的媒體資料，該通信標準可以為無線通訊協定等，並且可以將經調製的媒體資料發送給目的地裝置20。該一個或多個通信媒體可以包括無線和/或有線通信媒體，例如該一個或多個通信媒體可以包括射頻（radio frequency，RF）頻譜或一個或多個物理傳輸線。該一個或多個通信媒體可以形成基於分組的網路的一部分，基於分組的網路可以為區域網路、廣域網路或全球網路（例如，網際網路）等。該一個或多個通信媒體可以包括路由器、交換器、基站或促進從源裝置10到目的地裝置20的通信的其它設備等，本申請實施例對此不做具體限定。The link 30 may include one or more media or devices capable of transmitting the encoded media data from the source device 10 to the destination device 20. In one possible implementation, the link 30 may include one or more communication media that enable the source device 10 to directly send the encoded media data to the destination device 20 in real time. In the embodiment of the present application, the source device 10 may modulate the encoded media data based on a communication standard, which may be a wireless communication protocol, etc., and may send the modulated media data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, for example, the one or more communication media may include a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, which may be a local area network, a wide area network, or a global network (e.g., the Internet), etc. The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, and the embodiment of the present application is not specifically limited to this.

在一種可能的實現方式中，儲存裝置40可以將接收到的由源裝置10發送的經編碼的媒體資料進行儲存，目的地裝置20可以直接從儲存裝置40中獲取經編碼的媒體資料。這樣的條件下，儲存裝置40可以包括多種分散式或本地存取的資料儲存媒體中的任一者，例如，該多種分散式或本地存取的資料儲存媒體中的任一者可以為硬碟驅動器、藍光光碟、數位多功能光碟（digital versatile disc，DVD）、唯讀光碟（compact disc read-only memory，CD-ROM）、快閃記憶體、揮發性或非揮發性記憶體，或用於儲存經編碼媒體資料的任何其它合適的數位儲存媒體等。In a possible implementation, the storage device 40 may store the received encoded media data sent by the source device 10, and the destination device 20 may directly obtain the encoded media data from the storage device 40. Under such conditions, the storage device 40 may include any one of a variety of distributed or locally accessed data storage media, for example, any one of the multiple distributed or locally accessed data storage media may be a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), a flash memory, a volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded media data.

在一種可能的實現方式中，儲存裝置40可以對應於檔案伺服器或可以保存由源裝置10產生的經編碼媒體資料的另一中間儲存裝置，目的地裝置20可經由資料流或下載儲存裝置40儲存的媒體資料。檔案伺服器可以為能夠儲存經編碼的媒體資料並且將經編碼的媒體資料發送給目的地裝置20的任意類型的伺服器。在一種可能的實現方式中，檔案伺服器可以包括網路服務器、檔案傳輸通訊協定（file transfer protocol，FTP）伺服器、網路附屬儲存（network attached storage，NAS）裝置或本地磁碟機等。目的地裝置20可以通過任意標準資料連接（包括網際網路連接）來獲取經編碼媒體資料。任意標準資料連接可以包括無線通道（例如，Wi-Fi連接）、有線連接（例如，數位用戶線路（digital subscriber line，DSL）、纜線數據機等），或適合於獲取儲存在檔案伺服器上的經編碼的媒體資料的兩者的組合。經編碼的媒體資料從儲存裝置40的傳輸可為資料流、下載傳輸或兩者的組合。In one possible implementation, the storage device 40 may correspond to a file server or another intermediate storage device that can store the encoded media data generated by the source device 10, and the destination device 20 may stream or download the media data stored in the storage device 40. The file server may be any type of server that can store the encoded media data and send the encoded media data to the destination device 20. In one possible implementation, the file server may include a network server, a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive, etc. The destination device 20 may obtain the encoded media data through any standard data connection (including an Internet connection). Any standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of the two suitable for obtaining encoded media data stored on a file server. The transmission of the encoded media data from the storage device 40 may be a data stream, a download transmission, or a combination of the two.

The implementation environment shown in FIG. 1 is only one possible implementation. The technology of the embodiments of the present application is applicable not only to the source device 10 that can encode media data and the destination device 20 that can decode the encoded media data shown in FIG. 1, but also to other devices that can encode media data and decode encoded media data, which is not specifically limited in the embodiments of the present application.

In the implementation environment shown in FIG. 1, the source device 10 includes a data source 120, an encoder 100, and an output interface 140. In some embodiments, the output interface 140 may include a modulator/demodulator (modem) and/or a sender, where the sender may also be referred to as a transmitter. The data source 120 may include an image capture device (e.g., a camera), an archive containing previously captured media data, a feed interface for receiving media data from a media data content provider, and/or a computer graphics system for generating media data, or a combination of these sources of media data.

資料源120可以向編碼器100發送媒體資料，編碼器100可以對接收到由資料源120發送的媒體資料進行編碼，得到經編碼的媒體資料。編碼器可以將經編碼的媒體資料發送給輸出介面。在一些實施例中，源裝置10經由輸出介面140將經編碼的媒體資料直接發送到目的地裝置20。在其它實施例中，經編碼的媒體資料還可儲存到儲存裝置40上，供目的地裝置20以後獲取並用於解碼和/或顯示。The data source 120 may send media data to the encoder 100, and the encoder 100 may encode the media data received from the data source 120 to obtain the encoded media data. The encoder may send the encoded media data to the output interface. In some embodiments, the source device 10 directly sends the encoded media data to the destination device 20 via the output interface 140. In other embodiments, the encoded media data may also be stored in the storage device 40 for the destination device 20 to obtain and use for decoding and/or display.

在圖1所示的實施環境中，目的地裝置20包括輸入介面240、解碼器200和顯示裝置220。在一些實施例中，輸入介面240包括接收器和/或數據機。輸入介面240可經由鏈路30和/或從儲存裝置40接收經編碼的媒體資料，然後再發送給解碼器200，解碼器200可以對接收到的經編碼的媒體資料進行解碼，得到經解碼的媒體資料。解碼器可以將經解碼的媒體資料發送給顯示裝置220。顯示裝置220可與目的地裝置20整合或可在目的地裝置20外部。一般來說，顯示裝置220顯示經解碼的媒體資料。顯示裝置220可以為多種類型中的任一種類型的顯示裝置，例如，顯示裝置220可以為液晶顯示器（liquid crystal display，LCD）、等離子顯示器、有機發光二極體（organic light-emitting diode，OLED）顯示器或其它類型的顯示裝置。In the implementation environment shown in FIG. 1 , the destination device 20 includes an input interface 240, a decoder 200, and a display device 220. In some embodiments, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded media data via the link 30 and/or from the storage device 40, and then send it to the decoder 200, which may decode the received encoded media data to obtain decoded media data. The decoder may send the decoded media data to the display device 220. The display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally speaking, the display device 220 displays the decoded media data. The display device 220 may be any of a variety of types of display devices. For example, the display device 220 may be a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.

儘管圖1中未示出，但在一些方面，編碼器100和解碼器200可各自與編碼器和解碼器整合，且可以包括適當的多工器-多路分用器（multiplexer-demultiplexer，MUX-DEMUX）單元或其它硬體和軟體，用於共同資料流程或單獨資料流程中的音訊和視頻兩者的編碼。在一些實施例中，如果適用的話，那麼MUX-DEMUX單元可符合ITU H.223多工器協議，或例如使用者資料包通訊協定（user datagram protocol，UDP）等其它協議。Although not shown in FIG. 1 , in some aspects, encoder 100 and decoder 200 may be integrated with an encoder and decoder, respectively, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software for encoding both audio and video in a common data flow or in separate data flows. In some embodiments, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP), if applicable.

The encoder 100 and the decoder 200 may each be any of the following circuits: one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the technology of the embodiments of the present application is partially implemented in software, the device may store the instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware to implement the technology of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, and any of the encoders or decoders may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.

本申請實施例可大體上將編碼器100稱為將某些資訊「發信號通知」或「發送」到例如解碼器200的另一裝置。術語「發信號通知」或「發送」可大體上指代用於對經壓縮的媒體資料進行解碼的語法元素和/或其它資料的傳送。此傳送可即時或幾乎即時地發生。替代地，此通信可經過一段時間後發生，例如可在編碼時在經編碼位元流中將語法元素儲存到電腦可讀取儲存媒體時發生，解碼裝置接著可在所述語法元素儲存到此媒體之後的任何時間檢索所述語法元素。The present application embodiments may generally refer to encoder 100 as "signaling" or "sending" certain information to another device, such as decoder 200. The term "signaling" or "sending" may generally refer to the transmission of syntax elements and/or other data used to decode compressed media data. This transmission may occur instantly or nearly instantly. Alternatively, this communication may occur over a period of time, such as when the syntax elements are stored in the encoded bit stream to a computer-readable storage medium during encoding, and the decoding device may then retrieve the syntax elements at any time after the syntax elements are stored to the medium.

本申請實施例提供的編解碼方法可以應用於多種場景，接下來以待編碼的媒體資料為HOA信號為例，對其中的幾種場景分別進行介紹。The encoding and decoding method provided in the embodiment of the present application can be applied to a variety of scenarios. Next, taking the media data to be encoded as HOA signal as an example, several of the scenarios are introduced respectively.

請參考圖2，圖2是本申請實施例提供的一種編解碼方法應用於終端場景的實施環境的示意圖。該實施環境包括第一終端101和第二終端201，第一終端101與第二終端201進行通信連接。該通信連接可以為無線連接，也可以為有線連接，本申請實施例對此不做限定。Please refer to FIG. 2, which is a schematic diagram of an implementation environment of a coding and decoding method provided by an embodiment of the present application applied to a terminal scenario. The implementation environment includes a first terminal 101 and a second terminal 201, and the first terminal 101 is connected to the second terminal 201 for communication. The communication connection can be a wireless connection or a wired connection, and the embodiment of the present application does not limit this.

其中，第一終端101可以為發送端設備，也可以為接收端設備，同理，第二終端201可以為接收端設備，也可以為發送端設備。例如，在第一終端101為發送端設備的情況下，第二終端201為接收端設備，在第一終端101為接收端設備的情況下，第二終端201為發送端設備。The first terminal 101 can be a sending device or a receiving device, and similarly, the second terminal 201 can be a receiving device or a sending device. For example, when the first terminal 101 is a sending device, the second terminal 201 is a receiving device, and when the first terminal 101 is a receiving device, the second terminal 201 is a sending device.

接下來以第一終端101為發送端設備，第二終端201為接收端設備為例進行介紹。Next, the first terminal 101 is taken as a sending device and the second terminal 201 is taken as a receiving device as an example for introduction.

The first terminal 101 and the second terminal 201 both include an audio acquisition module, an audio playback module, an encoder, a decoder, a channel encoding module, and a channel decoding module. In the embodiments of the present application, the encoder is a three-dimensional audio encoder, and the decoder is a three-dimensional audio decoder.

The audio acquisition module in the first terminal 101 collects the HOA signal and transmits it to the encoder. The encoder encodes the HOA signal using the encoding method provided in the embodiments of the present application; this encoding can be called source encoding. Then, so that the HOA signal can be transmitted over a channel, the channel encoding module further performs channel encoding, and the resulting code stream is transmitted in a digital channel through a wireless or wired network communication device.

The second terminal 201 receives the code stream transmitted in the digital channel through a wireless or wired network communication device. The channel decoding module performs channel decoding on the code stream, the decoder then decodes it using the decoding method provided in the embodiments of the present application to obtain the HOA signal, and the audio playback module plays the signal.

其中，第一終端101和第二終端201可以是任何一種可與使用者通過鍵盤、觸控板、觸控式螢幕、遙控器、語音交互或手寫設備等一種或多種方式進行人機交互的電子產品，例如個人電腦（personal computer，PC）、手機、智慧手機、個人數位助手（personal digital assistant，PDA）、可穿戴設備、掌上型電腦PPC（pocket PC）、平板電腦、智慧車用裝置、智慧電視、智慧音箱等。Among them, the first terminal 101 and the second terminal 201 can be any electronic product that can perform human-computer interaction with the user through one or more methods such as keyboard, touch pad, touch screen, remote control, voice interaction or handwriting device, such as personal computer (PC), mobile phone, smart phone, personal digital assistant (PDA), wearable device, handheld computer PPC (pocket PC), tablet computer, smart car device, smart TV, smart speaker, etc.

本領域技術人員應能理解上述終端僅為舉例，其他現有的或今後可能出現的終端如可適用於本申請實施例，也應包含在本申請實施例保護範圍以內，並在此以引用方式包含於此。Those skilled in the art should understand that the above-mentioned terminals are merely examples, and other existing or future terminals that may appear, if applicable to the embodiments of this application, should also be included in the protection scope of the embodiments of this application and are incorporated herein by reference.

請參考圖3，圖3是本申請實施例提供的一種編解碼方法應用於無線或核心網設備的轉碼場景的實施環境的示意圖。該實施環境包括通道解碼模組、音訊解碼器、音訊編碼器和通道編碼模組。在本申請實施例中，該音訊編碼器為一種三維音訊編碼器，該音訊解碼器為一種三維音訊解碼器。Please refer to Figure 3, which is a schematic diagram of an implementation environment of a coding and decoding method provided by an embodiment of the present application applied to a transcoding scenario of a wireless or core network device. The implementation environment includes a channel decoding module, an audio decoder, an audio encoder, and a channel encoding module. In the embodiment of the present application, the audio encoder is a three-dimensional audio encoder, and the audio decoder is a three-dimensional audio decoder.

The audio decoder may be a decoder that uses the decoding method provided in the embodiments of the present application, or a decoder that uses another decoding method. The audio encoder may be an encoder that uses the encoding method provided in the embodiments of the present application, or an encoder that uses another encoding method. When the audio decoder uses the decoding method provided in the embodiments of the present application, the audio encoder uses another encoding method; when the audio decoder uses another decoding method, the audio encoder uses the encoding method provided in the embodiments of the present application.

第一種情況，音訊解碼器為利用本申請實施例提供的解碼方法的解碼器，音訊編碼器為利用其他編碼方法的編碼器。In the first case, the audio decoder is a decoder using the decoding method provided in the embodiment of the present application, and the audio encoder is an encoder using other encoding methods.

此時，通道解碼模組用於對接收的碼流進行通道解碼，然後音訊解碼器用於利用本申請實施例提供的解碼方法進行信源解碼，再通過音訊編碼器按照其他編碼方法進行編碼，實現一種格式到另一種格式的轉換，即轉碼。之後，再通過通道編碼後發送。At this time, the channel decoding module is used to perform channel decoding on the received code stream, and then the audio decoder is used to perform source decoding using the decoding method provided in the embodiment of the present application, and then the audio encoder is used to perform encoding according to other encoding methods to achieve conversion from one format to another, i.e. transcoding. After that, it is sent after channel encoding.

第二種情況，音訊解碼器為利用其他解碼方法的解碼器，音訊編碼器為利用本申請實施例提供的編碼方法的編碼器。In the second case, the audio decoder is a decoder using other decoding methods, and the audio encoder is an encoder using the encoding method provided in the embodiment of the present application.

此時，通道解碼模組用於對接收的碼流進行通道解碼，然後音訊解碼器用於利用其他解碼方法進行信源解碼，再通過音訊編碼器利用本申請實施例提供的編碼方法進行編碼，實現一種格式到另一種格式的轉換，即轉碼。之後，再通過通道編碼後發送。At this time, the channel decoding module is used to perform channel decoding on the received code stream, and then the audio decoder is used to use other decoding methods to perform source decoding, and then the audio encoder uses the encoding method provided by the embodiment of the present application to perform encoding, so as to realize the conversion from one format to another format, i.e. transcoding. After that, it is sent after channel encoding.
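Both transcoding cases share the same four-stage shape: channel decoding, source decoding in one format, source encoding in another format, and channel encoding. The Python sketch below shows only that shape. All stage functions are toy stand-ins invented for illustration (a one-byte checksum as "channel coding", raw bytes and comma-separated text as the two "formats"); none of them represents the codecs of the embodiments.

```python
def transcode(received, channel_decode, source_decode, source_encode, channel_encode):
    """Run the four transcoding stages in order and return the outgoing packet."""
    bitstream_in = channel_decode(received)   # undo channel coding
    audio = source_decode(bitstream_in)       # decode with the incoming format
    bitstream_out = source_encode(audio)      # encode with the outgoing format
    return channel_encode(bitstream_out)      # re-apply channel coding


# Toy stand-ins, for illustration only.
def add_checksum(payload: bytes) -> bytes:
    return payload + bytes([sum(payload) % 256])


def strip_checksum(packet: bytes) -> bytes:
    payload, check = packet[:-1], packet[-1]
    if sum(payload) % 256 != check:
        raise ValueError("checksum mismatch")
    return payload


def decode_format_a(payload: bytes) -> list:
    return list(payload)  # one sample per byte


def encode_format_b(samples: list) -> bytes:
    return ",".join(str(s) for s in samples).encode("ascii")
```

Swapping which side uses the method of the embodiments only changes which callables are passed as `source_decode` and `source_encode`; the pipeline itself is unchanged.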

其中，無線設備可以為無線接入點、無線路由器、無線連接器等等。核心網設備可以為移動性管理實體、閘道等等。The wireless device may be a wireless access point, a wireless router, a wireless connector, etc. The core network device may be a mobility management entity, a gateway, etc.

本領域技術人員應能理解上述無線設備或者核心網設備僅為舉例，其他現有的或今後可能出現的無線或核心網設備如可適用於本申請實施例，也應包含在本申請實施例保護範圍以內，並在此以引用方式包含於此。Those skilled in the art should understand that the above-mentioned wireless devices or core network devices are merely examples, and other existing or future wireless or core network devices that may be applicable to the embodiments of this application should also be included in the protection scope of the embodiments of this application and are incorporated herein by reference.

請參考圖4，圖4是本申請實施例提供的一種編解碼方法應用於廣播電視場景的實施環境的示意圖。廣播電視場景分為直播場景和後期製作場景。對於直播場景來說，該實施環境包括直播節目三維聲製作模組、三維聲編碼模組、機上盒和揚聲器組，機上盒包括三維聲解碼模組。對於後期製作場景來說，該實施環境包括後期節目三維聲製作模組、三維聲編碼模組、網路接收器、移動終端、耳機等。Please refer to FIG. 4, which is a schematic diagram of an implementation environment of a coding and decoding method provided by an embodiment of the present application applied to a broadcasting and television scene. The broadcasting and television scene is divided into a live broadcast scene and a post-production scene. For the live broadcast scene, the implementation environment includes a live program three-dimensional sound production module, a three-dimensional sound encoding module, a set-top box and a speaker group, and the set-top box includes a three-dimensional sound decoding module. For the post-production scene, the implementation environment includes a post-program three-dimensional sound production module, a three-dimensional sound encoding module, a network receiver, a mobile terminal, a headset, etc.

In the live broadcast scenario, the live program three-dimensional sound production module produces a three-dimensional sound signal (such as an HOA signal). The three-dimensional sound signal is encoded using the encoding method of the embodiment of the present application to obtain a code stream. The code stream is transmitted to the user side through the broadcasting network, and the three-dimensional sound decoding module in the set-top box decodes the code stream using the decoding method provided in the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the speaker group. Alternatively, the code stream is transmitted to the user side via the Internet, and the three-dimensional sound decoding module in the network receiver decodes the code stream using the decoding method provided in the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the speaker group. Alternatively, the code stream is transmitted to the user side via the Internet, and the three-dimensional sound decoding module in the mobile terminal decodes the code stream using the decoding method provided in the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the earphone.

In the post-production scenario, the post-production program three-dimensional sound production module produces a three-dimensional sound signal. The three-dimensional sound signal is encoded using the encoding method of the embodiment of the present application to obtain a code stream. The code stream is transmitted to the user side through the broadcasting network, and the three-dimensional sound decoding module in the set-top box decodes the code stream using the decoding method provided in the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the speaker group. Alternatively, the code stream is transmitted to the user side via the Internet, and the three-dimensional sound decoding module in the network receiver decodes the code stream using the decoding method provided in the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the speaker group. Alternatively, the code stream is transmitted to the user side via the Internet, and the three-dimensional sound decoding module in the mobile terminal decodes the code stream using the decoding method provided in the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the earphone.

Please refer to FIG. 5, which is a schematic diagram of an implementation environment in which a coding and decoding method provided by an embodiment of the present application is applied to a virtual reality streaming scenario. The implementation environment includes an encoding end and a decoding end. The encoding end includes an acquisition module, a preprocessing module, an encoding module, a packaging module and a sending module. The decoding end includes an unpacking module, a decoding module, a rendering module and an earphone.

採集模組採集HOA信號，然後通過預處理模組對HOA信號進行預處理操作，預處理操作包括濾除掉HOA信號中的低頻部分，通常是以20Hz或者50Hz為分界點，提取HOA信號中的方位資訊等。之後通過編碼模組，利用本申請實施例提供的編碼方法進行編碼處理，編碼之後通過打包模組進行打包，進而通過發送模組發送給解碼端。The acquisition module acquires the HOA signal, and then performs preprocessing on the HOA signal through the preprocessing module. The preprocessing operation includes filtering out the low-frequency part of the HOA signal, usually with 20Hz or 50Hz as the dividing point, extracting the azimuth information in the HOA signal, etc. Then, the encoding module performs encoding processing using the encoding method provided in the embodiment of the present application, and after encoding, the packaging module performs packaging, and then sends it to the decoding end through the sending module.

解碼端的解包模組首先進行解包，之後通過解碼模組，利用本申請實施例提供的解碼方法進行解碼，然後通過渲染模組對解碼信號進行雙耳渲染處理，渲染處理後的信號映射到收聽者耳機上。該耳機可以為獨立的耳機，也可以是基於虛擬實境的眼鏡設備上的耳機。The unpacking module at the decoding end first unpacks the signal, and then decodes the signal by using the decoding method provided by the embodiment of the present application, and then performs binaural rendering processing on the decoded signal by the rendering module, and the rendered signal is mapped to the listener's earphone. The earphone can be an independent earphone or an earphone on a virtual reality-based eyewear device.

需要說明的是，本申請實施例描述的系統架構以及業務場景是為了更加清楚的說明本申請實施例的技術方案，並不構成對於本申請實施例提供的技術方案的限定，本領域普通技術人員可知，隨著系統架構的演變和新業務場景的出現，本申請實施例提供的技術方案對於類似的技術問題，同樣適用。It should be noted that the system architecture and business scenarios described in the embodiments of this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. Ordinary technical personnel in this field can know that with the evolution of the system architecture and the emergence of new business scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

接下來對本申請實施例提供的編解碼方法進行詳細地解釋說明。需要說明的是，結合圖1所示的實施環境，下文中的任一種編碼方法可以是源裝置10中的編碼器100執行的。下文中的任一種解碼方法可以是目的地裝置20中的解碼器200執行的。Next, the encoding and decoding method provided by the embodiment of the present application is explained in detail. It should be noted that, in combination with the implementation environment shown in FIG. 1 , any encoding method described below can be executed by the encoder 100 in the source device 10. Any decoding method described below can be executed by the decoder 200 in the destination device 20.

FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application. The encoding method is applied to the encoding end. Referring to FIG. 6, the method includes the following steps.

步驟601：根據當前幀的HOA信號確定當前幀的編碼方案。Step 601: Determine the coding scheme of the current frame according to the HOA signal of the current frame.

For the HOA signals of multiple audio frames to be encoded, the encoding end encodes frame by frame. The HOA signal of an audio frame is an audio signal obtained through HOA acquisition technology. An HOA signal is a scene audio signal and also a three-dimensional audio signal: it is obtained by capturing the sound field at the position of a microphone in space, and the captured audio signal is called the original HOA signal. The HOA signal of an audio frame may also be obtained by converting a three-dimensional audio signal in another format, for example by converting a 5.1-channel signal, or a three-dimensional audio signal that mixes a 5.1-channel signal with object audio, into an HOA signal. Optionally, the HOA signal of the audio frame to be encoded is a time-domain signal or a frequency-domain signal, and may include all channels of the HOA signal or only some of them. For example, if the order of the HOA signal of the audio frame is 3, the number of channels of the HOA signal is 16, the frame length of the audio frame is 20 ms, and the sampling rate is 48 kHz, then the HOA signal of the audio frame to be encoded contains signals of 16 channels, and each channel contains 960 sampling points.
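The figures in the example above can be checked with a short calculation. The sketch below assumes a full three-dimensional HOA signal, for which an order-N signal has (N+1)² channels; the function names are illustrative and do not come from the application.

```python
def hoa_channel_count(order):
    """Number of channels of a full 3D HOA signal of the given order: (order + 1)^2."""
    return (order + 1) ** 2


def samples_per_frame(frame_ms, sample_rate_hz):
    """Sampling points per channel in one frame (integer arithmetic avoids rounding)."""
    return frame_ms * sample_rate_hz // 1000


print(hoa_channel_count(3))          # 16
print(samples_per_frame(20, 48000))  # 960
```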

為了降低計算複雜度，若編碼端獲取到的音訊幀的HOA信號為原始HOA信號，原始HOA信號的採樣點數或頻點數較多，那麼編碼端可以對原始HOA信號進行下採樣，以得到待編碼的音訊幀的HOA信號。例如，編碼端對原始HOA信號進行1/Q下採樣，以降低待編碼的HOA信號的採樣點數或頻點數，如本申請實施例中原始HOA信號的每個通道包含960個採樣點，採用1/120下採樣後，得到待編碼的HOA信號的每個通道包含8個採樣點。In order to reduce the computational complexity, if the HOA signal of the audio frame obtained by the encoder is the original HOA signal, and the number of sampling points or frequency points of the original HOA signal is large, then the encoder can downsample the original HOA signal to obtain the HOA signal of the audio frame to be encoded. For example, the encoder performs 1/Q downsampling on the original HOA signal to reduce the number of sampling points or frequency points of the HOA signal to be encoded. For example, in the embodiment of the present application, each channel of the original HOA signal contains 960 sampling points. After 1/120 downsampling, each channel of the HOA signal to be encoded contains 8 sampling points.
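As a rough illustration of 1/Q downsampling, the sketch below keeps every Q-th point of each channel. The application does not say how the downsampling is performed, so plain decimation without filtering is an assumption here, as is the function name.

```python
def downsample(channel, q):
    """Keep every q-th point of one channel (plain decimation; an assumed method)."""
    if q <= 0:
        raise ValueError("q must be positive")
    return channel[::q]


# A dummy original HOA frame: 16 channels, 960 points per channel.
original_frame = [[0.0] * 960 for _ in range(16)]
reduced_frame = [downsample(channel, 120) for channel in original_frame]
print(len(reduced_frame), len(reduced_frame[0]))  # 16 8
```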

在本申請實施例中以編碼端對當前幀進行編碼為例，對編碼端的編碼方法進行介紹。當前幀為待編碼的一個音訊幀。也即是，編碼端獲取當前幀的HOA信號，採用本申請實施例提供的編碼方法對當前幀的HOA信號進行編碼。In the embodiment of the present application, the encoding method of the encoding end is introduced by taking the encoding of the current frame by the encoding end as an example. The current frame is an audio frame to be encoded. That is, the encoding end obtains the HOA signal of the current frame and encodes the HOA signal of the current frame using the encoding method provided in the embodiment of the present application.

It should be noted that, to achieve a high compression rate for audio frames of different sound field types, a suitable coding and decoding scheme needs to be selected for each audio frame according to its sound field type. In the embodiment of the present application, the encoding end first determines the initial coding scheme of the current frame according to the HOA signal of the current frame. The initial coding scheme is the first coding scheme or the second coding scheme. The encoding end then compares the initial coding scheme of the current frame with the initial coding scheme of the previous frame to decide whether to encode the HOA signal of the current frame with the first coding scheme, the second coding scheme or the third coding scheme. If the initial coding scheme of the current frame is the same as that of the previous frame, the encoding end encodes the HOA signal of the current frame with the coding scheme that matches the initial coding scheme of the current frame. If the initial coding scheme of the current frame differs from that of the previous frame, the encoding end encodes the HOA signal of the current frame with the switching frame coding scheme.

In the embodiment of the present application, the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme. The first coding scheme is the HOA coding scheme based on DirAC, the second coding scheme is the HOA coding scheme based on virtual speaker selection, and the third coding scheme is a hybrid coding scheme. Optionally, the hybrid coding scheme is also called a switching frame coding scheme. The third coding scheme is a switching frame coding scheme provided in the embodiment of the present application, and its purpose is to give a smooth transition of auditory quality when switching between different coding and decoding schemes. These three coding schemes are described in detail below. In the embodiment of the present application, the HOA coding scheme based on virtual speaker selection is also called the MP-based HOA coding scheme.

在本申請實施例中，編碼端根據當前幀的HOA信號確定當前幀的初始編碼方案。然後，編碼端基於當前幀的初始編碼方案和當前幀的前一幀的初始編碼方案，確定當前幀的編碼方案。需要說明的是，本申請實施例不限定編碼端確定初始編碼方案的實現方式。In the embodiment of the present application, the coding end determines the initial coding scheme of the current frame according to the HOA signal of the current frame. Then, the coding end determines the coding scheme of the current frame based on the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame. It should be noted that the embodiment of the present application does not limit the implementation method of the coding end determining the initial coding scheme.

可選地，編碼端對當前幀的HOA信號進行聲場類型分析，以得到當前幀的聲場分類結果，基於當前幀的聲場分類結果，確定當前幀的初始編碼方案。需要說明的是，本申請實施例不限定聲場類型分析的方法，例如編碼端通過對當前幀的HOA信號進行奇異值分解以進行聲場類型分析，或者對該HOA信號進行其他的線性分解以進行聲場類型分析。Optionally, the coding end performs a sound field type analysis on the HOA signal of the current frame to obtain a sound field classification result of the current frame, and determines an initial coding scheme for the current frame based on the sound field classification result of the current frame. It should be noted that the embodiment of the present application does not limit the method of sound field type analysis. For example, the coding end performs a sound field type analysis by performing a singular value decomposition on the HOA signal of the current frame, or performs other linear decomposition on the HOA signal to perform a sound field type analysis.

Optionally, the sound field classification result includes the number of heterogeneous sound sources. Taking the case where the encoding end performs sound field type analysis directly on the HOA signal of the current frame as an example, one implementation of obtaining the sound field classification result of the current frame is as follows. The encoding end performs singular value decomposition on the HOA signal of the current frame to obtain M singular values. The encoding end calculates the ratio of the i-th singular value to the (i+1)-th singular value among the M singular values to obtain M-1 sound field classification parameters, where i=1,2,…,M-1. The encoding end determines the number of heterogeneous sound sources corresponding to the current frame based on the M-1 sound field classification parameters. Here M=min(L,K), where L is the number of channels of the HOA signal of the current frame, K is the number of signal points of each channel of the HOA signal of the current frame, and min is the minimum operation. If the HOA signal is a time-domain signal, the number of signal points is the number of sampling points; if the HOA signal is a frequency-domain signal, the number of signal points is the number of frequency points.

Optionally, assuming that the M-1 sound field classification parameters are temp[i], i=0,1,…,M-2, one implementation of determining the number of heterogeneous sound sources corresponding to the current frame based on the M-1 sound field classification parameters is to execute the following process starting from i=0: determine whether temp[i] is greater than or equal to a preset heterogeneous sound source determination threshold. If temp[i] is less than the threshold in this round, update i to i+1 and continue with the next round. If temp[i] is greater than or equal to the threshold in this round, determine that the number of heterogeneous sound sources corresponding to the current frame is equal to i+1, and end the process. Optionally, the heterogeneous sound source determination threshold is 30, 80, 100 or another value. The threshold is a preset value, which can be set based on experience or through statistics.
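The threshold search described above can be sketched as follows. This is a minimal illustration that takes the singular values as input (the decomposition itself is omitted); the function name is hypothetical, and two behaviours are assumptions because the text leaves them open: returning 0 when no ratio reaches the threshold, and treating a zero denominator as an unbounded ratio.

```python
import math


def count_heterogeneous_sources(singular_values, threshold=100.0):
    """Return i + 1 for the first ratio v[i] / v[i+1] that reaches the threshold."""
    for i in range(len(singular_values) - 1):
        denominator = singular_values[i + 1]
        # Assumption: a zero denominator counts as an unbounded ratio.
        ratio = math.inf if denominator == 0 else singular_values[i] / denominator
        if ratio >= threshold:
            return i + 1
    # Assumption: no ratio reached the threshold, so report 0 sources.
    return 0


v = [50.0, 40.0, 0.2, 0.1, 0.05, 0.04, 0.03, 0.02]
print(count_heterogeneous_sources(v))  # 2
```

In the usage line, temp[0] = 1.25 is below the threshold and temp[1] is about 200, so the search stops at i=1 and reports two sources.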

Correspondingly, in one implementation, after the number of heterogeneous sound sources corresponding to the current frame is determined, if that number is greater than the first threshold and less than the second threshold, the encoding end determines that the initial coding scheme of the current frame is the second coding scheme. If that number is not greater than the first threshold or not less than the second threshold, the encoding end determines that the initial coding scheme of the current frame is the first coding scheme. The first threshold is less than the second threshold. Optionally, the first threshold is 0 or another value, and the second threshold is 3 or another value. The first threshold and the second threshold are preset values, which can be set based on experience or through statistics.
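A minimal sketch of this threshold rule follows, using the optional values 0 and 3 as defaults. The string labels and the function name are illustrative only.

```python
FIRST_SCHEME = "DirAC"  # first coding scheme (DirAC-based HOA coding)
SECOND_SCHEME = "MP"    # second coding scheme (virtual speaker selection)


def select_initial_scheme(source_count, first_threshold=0, second_threshold=3):
    """Pick the initial coding scheme from the number of heterogeneous sound sources."""
    if first_threshold < source_count < second_threshold:
        return SECOND_SCHEME
    return FIRST_SCHEME


for n in range(5):
    print(n, select_initial_scheme(n))
```

With the default thresholds, counts of 1 and 2 select the second coding scheme and all other counts select the first.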

For example, assume that the number of channels of the HOA signal of the current frame is L=16 and the number of frequency points of each channel is K=8, so that min(L,K)=8. The encoding end performs singular value decomposition on the HOA signal of the current frame to obtain singular values v[i], i=0,1,…,min(L,K)-1. The encoding end calculates the ratio between adjacent singular values and uses the ratios as the sound field classification parameters temp[i] of the current frame, temp[i]=v[i]/v[i+1], i=0,1,…,min(L,K)-2. Assuming that the heterogeneous sound source determination threshold is 100, the number n of heterogeneous sound sources is determined as follows: starting from i=0, determine whether temp[i]≥100. If temp[i]≥100, stop the determination; otherwise set i=i+1 and continue. When the determination stops, n is equal to the index i at that point plus 1. For example, when i=0, if temp[0]≥100, the determination stops and n is equal to 1; otherwise i is set to 1 and the determination continues. When i=1, if temp[1]≥100, the determination stops and n is equal to i+1=2. Assuming that the first threshold is 0 and the second threshold is 3, if the number n of heterogeneous sound sources corresponding to the current frame satisfies 0<n<3, the encoding end determines that the initial coding scheme of the current frame is the second coding scheme. If n satisfies n=0 or n≥3, the encoding end determines that the initial coding scheme of the current frame is the first coding scheme.

可選地，聲場分類結果包括聲場類型，聲場類型分為彌散性聲場和相異性聲場。聲場類型可以根據前述方法得到的相異性聲源數量來確定，即，編碼端基於當前幀對應的相異性聲源數量確定當前幀的聲場類型。例如，若當前幀對應的相異性聲源數量大於第一閾值且小於第二閾值，則編碼端確定當前幀的聲場類型為相異性聲場。若當前幀對應的相異性聲源數量不大於第一閾值或不小於第二閾值，則編碼端確定當前幀的聲場類型為彌散性聲場。相應地，若當前幀的聲場類型為相異性聲場，則編碼端確定當前幀的初始編碼方案為第二編碼方案，即基於MP的HOA編碼方案。若當前幀的聲場類型為彌散性聲場類型，則編碼端確定當前幀的初始編碼方案為第一編碼方案，即基於DirAC的HOA編碼方案。Optionally, the sound field classification result includes a sound field type, and the sound field type is divided into a diffuse sound field and a heterogeneous sound field. The sound field type can be determined based on the number of heterogeneous sound sources obtained by the aforementioned method, that is, the encoder determines the sound field type of the current frame based on the number of heterogeneous sound sources corresponding to the current frame. For example, if the number of heterogeneous sound sources corresponding to the current frame is greater than a first threshold and less than a second threshold, the encoder determines that the sound field type of the current frame is a heterogeneous sound field. If the number of heterogeneous sound sources corresponding to the current frame is not greater than the first threshold or not less than the second threshold, the encoder determines that the sound field type of the current frame is a diffuse sound field. Correspondingly, if the sound field type of the current frame is a heterogeneous sound field, the coding end determines that the initial coding scheme of the current frame is the second coding scheme, that is, the HOA coding scheme based on MP. If the sound field type of the current frame is a diffuse sound field type, the coding end determines that the initial coding scheme of the current frame is the first coding scheme, that is, the HOA coding scheme based on DirAC.

In some embodiments, after the initial coding scheme of each audio frame (including the current frame) is determined in the above manner, the initial coding schemes of successive audio frames may switch back and forth, so that many switching frames eventually need to be encoded. Because every switch between coding schemes raises problems that must be solved, reducing the number of switching frames reduces those problems. To reduce the number of switching frames, the encoding end can first determine the predicted coding scheme of the current frame according to the sound field classification result of the current frame, that is, the encoding end uses the initial coding scheme determined by the above method as the predicted coding scheme. The encoding end then uses a sliding window to update the initial coding scheme of the current frame based on the predicted coding scheme, for example through hangover processing.

可選地，假設滑動窗的長度為N，滑動窗內包含當前幀的預計編碼方案以及當前幀的前N-1幀的已更新的初始編碼方案。若滑動窗內第二編碼方案的個數累計不小於第一指定閾值，則編碼端將當前幀的初始編碼方案更新為第二編碼方案。若滑動窗內第二編碼方案的個數累計小於第一指定閾值，則編碼端將當前幀的初始編碼方案更新為第一編碼方案。其中，滑動窗的長度N為8、10、15等，第一指定閾值為5、6、7等值，本申請實施例對滑動窗的長度和第一指定閾值的取值不作限定。舉例說明如下，假設滑動窗的長度為10，第一指定閾值為7，滑動窗內包含當前幀的預計編碼方案以及當前幀的前9幀的已更新的初始編碼方案，如果滑動窗內第二編碼方案的個數累計到不小於7，則編碼端將當前幀的初始編碼方案確定為第二編碼方案，如果滑動窗內第二編碼方案的個數累計小於7，則編碼端將當前幀的初始編碼方案更新為第一編碼方案。Optionally, assuming that the length of the sliding window is N, the sliding window contains the expected coding scheme of the current frame and the updated initial coding scheme of the N-1 frames before the current frame. If the cumulative number of second coding schemes in the sliding window is not less than the first specified threshold, the coding end updates the initial coding scheme of the current frame to the second coding scheme. If the cumulative number of second coding schemes in the sliding window is less than the first specified threshold, the coding end updates the initial coding scheme of the current frame to the first coding scheme. Among them, the length N of the sliding window is 8, 10, 15, etc., and the first specified threshold is 5, 6, 7, etc. The embodiment of the present application does not limit the values of the length of the sliding window and the first specified threshold. Take an example as follows, assuming that the length of the sliding window is 10, the first specified threshold is 7, the sliding window contains the expected coding scheme of the current frame and the updated initial coding schemes of the previous 9 frames of the current frame. If the number of second coding schemes in the sliding window accumulates to no less than 7, the encoder determines the initial coding scheme of the current frame as the second coding scheme. If the number of second coding schemes in the sliding window accumulates to less than 7, the encoder updates the initial coding scheme of the current frame to the first coding scheme.

或者，若滑動窗內第一編碼方案的個數累計不小於第二指定閾值，則編碼端將當前幀的初始編碼方案更新為第一編碼方案。若滑動窗內第一編碼方案的個數累計小於第二指定閾值，則編碼端將當前幀的初始編碼方案更新為第二編碼方案。其中，第二指定閾值為5、6、7等值，本申請實施例對第二指定閾值的取值不作限定。可選地，第二指定閾值與上述第一指定閾值不同或相同。Alternatively, if the cumulative number of the first coding schemes in the sliding window is not less than the second specified threshold, the coding end updates the initial coding scheme of the current frame to the first coding scheme. If the cumulative number of the first coding schemes in the sliding window is less than the second specified threshold, the coding end updates the initial coding scheme of the current frame to the second coding scheme. The second specified threshold is 5, 6, 7, etc., and the embodiment of the present application does not limit the value of the second specified threshold. Optionally, the second specified threshold is different from or the same as the first specified threshold.
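The first sliding-window variant (counting second coding schemes in the window) can be sketched as a single update step. This is an illustration only: the function name and string labels are hypothetical, and the caller is assumed to supply the already updated initial coding schemes of the previous N-1 frames.

```python
FIRST_SCHEME = "DirAC"  # first coding scheme
SECOND_SCHEME = "MP"    # second coding scheme


def update_initial_scheme(previous_updated, predicted, first_specified_threshold=7):
    """Update the current frame's initial scheme from a window of N entries.

    previous_updated: updated initial schemes of the previous N-1 frames.
    predicted: predicted coding scheme of the current frame.
    """
    window = list(previous_updated) + [predicted]
    if window.count(SECOND_SCHEME) >= first_specified_threshold:
        return SECOND_SCHEME
    return FIRST_SCHEME


# Window length N = 10: nine previous frames plus the current prediction.
previous = [SECOND_SCHEME] * 7 + [FIRST_SCHEME] * 2
print(update_initial_scheme(previous, FIRST_SCHEME))  # MP
previous = [SECOND_SCHEME] * 5 + [FIRST_SCHEME] * 4
print(update_initial_scheme(previous, SECOND_SCHEME))  # DirAC
```

In the first call the window holds seven second-scheme entries, which meets the threshold of 7 even though the current prediction is the first scheme. In the second call the window holds only six, so the current frame is updated to the first scheme.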

除了上述介紹的一些實現方式之外，編碼端也可以採用其他的方法來得到當前幀的聲場分類結果，基於聲場分類結果確定初始編碼方案的方法也可以採用其他的方法，本申請實施例對此不作限定。In addition to the implementations described above, the encoder may also use other methods to obtain the sound field classification result of the current frame, and the method for determining the initial coding scheme based on the sound field classification result may also use other methods, which are not limited in this embodiment of the present application.

In the embodiment of the present application, after the encoding end determines the initial coding scheme of the current frame, if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame, the encoding end determines that the coding scheme of the current frame is the initial coding scheme of the current frame. If the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame, the encoding end determines that the coding scheme of the current frame is the third coding scheme. That is, if the two initial coding schemes are the same and are the first coding scheme, the encoding end determines that the coding scheme of the current frame is the first coding scheme. If the two initial coding schemes are the same and are the second coding scheme, the encoding end determines that the coding scheme of the current frame is the second coding scheme. If one of the two initial coding schemes is the first coding scheme and the other is the second coding scheme, the encoding end determines that the coding scheme of the current frame is the third coding scheme. This last case covers both orders: the initial coding scheme of the current frame is the first coding scheme and that of the previous frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and that of the previous frame is the first coding scheme. In other words, for a switching frame, the encoding end uses neither the first coding scheme nor the second coding scheme, but the switching frame coding scheme, to encode the HOA signal of the switching frame. For a non-switching frame, the encoding end uses the coding scheme that matches the initial coding scheme of the non-switching frame to encode the HOA signal of the non-switching frame. An audio frame whose initial coding scheme is different from the initial coding scheme of the previous frame is a switching frame, and an audio frame whose initial coding scheme is the same as the initial coding scheme of the previous frame is a non-switching frame.
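The decision rule for the coding scheme of the current frame can be sketched as below. The labels and function name are illustrative, and the handling of the very first frame (which has no previous frame) is not covered because the text does not specify it.

```python
FIRST_SCHEME = "DirAC"      # first coding scheme
SECOND_SCHEME = "MP"        # second coding scheme
THIRD_SCHEME = "switching"  # third (switching frame) coding scheme


def select_coding_scheme(current_initial, previous_initial):
    """Return the coding scheme of the current frame from two initial schemes."""
    if current_initial == previous_initial:
        return current_initial  # non-switching frame
    return THIRD_SCHEME         # switching frame


initial = [FIRST_SCHEME, FIRST_SCHEME, SECOND_SCHEME,
           SECOND_SCHEME, SECOND_SCHEME, FIRST_SCHEME]
schemes = [select_coding_scheme(cur, prev) for prev, cur in zip(initial, initial[1:])]
print(schemes)  # ['DirAC', 'switching', 'MP', 'MP', 'switching']
```

Each change of initial coding scheme produces exactly one switching frame, after which encoding continues with the new scheme.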

需要說明的是，編碼端除了確定當前幀的編碼方案之外，還需將能夠指示當前幀的編碼方案的資訊編入碼流，以便於解碼端確定採用哪個解碼方案來解碼當前幀的碼流。在本申請實施例中，編碼端將能夠指示當前幀的編碼方案的資訊編入碼流的實現方式有多種，接下來介紹其中的三種實現方式。It should be noted that, in addition to determining the coding scheme of the current frame, the encoder also needs to encode information that can indicate the coding scheme of the current frame into the bitstream, so that the decoder can determine which decoding scheme to use to decode the bitstream of the current frame. In the embodiment of the present application, there are multiple ways to implement the encoder to encode the information that can indicate the coding scheme of the current frame into the bitstream, and three of them are introduced below.

First implementation: encoding a switching flag and indication information of two coding schemes

在該實現方式中，編碼端需要確定當前幀的切換標誌的值，將當前幀的切換標誌的值編入碼流。其中，當當前幀的編碼方案為第一編碼方案或第二編碼方案時，當前幀的切換標誌的值為第一值。當當前幀的編碼方案為第三編碼方案時，當前幀的切換標誌的值為第二值。可選地，第一值為「0」，第二值為「1」，第一值和第二值也可以為其他的值。In this implementation, the coding end needs to determine the value of the switching flag of the current frame and encode the value of the switching flag of the current frame into the bitstream. When the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is the first value. When the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is the second value. Optionally, the first value is "0" and the second value is "1", and the first value and the second value may also be other values.

另外，編碼端將當前幀的初始編碼方案的指示資訊編入碼流。或者，若當前幀的切換標誌的值為第一值，則編碼端將當前幀的初始編碼方案的指示資訊編入碼流，若當前幀的切換標誌的值為第二值，則編碼端將預設指示資訊編入碼流。In addition, the coding end encodes the indication information of the initial coding scheme of the current frame into the bitstream. Alternatively, if the value of the switching flag of the current frame is the first value, the coding end encodes the indication information of the initial coding scheme of the current frame into the bitstream, and if the value of the switching flag of the current frame is the second value, the coding end encodes the default indication information into the bitstream.

可選地，初始編碼方案的指示資訊以與初始編碼方案相對應的編碼模式（coding mode）來表示，即，以編碼模式作為指示資訊。例如，與初始編碼方案相對應的編碼模式為初始編碼模式，初始編碼模式為第一編碼模式（即DirAC模式）或第二編碼模式（即MP模式）。可選地，預設指示資訊為預設編碼模式，預設編碼模式為第一編碼模式或第二編碼模式。在其他一些實施例中，預設指示資訊為其他編碼模式，也即不限定編入碼流的切換幀的編碼方案的指示資訊具體是什麼。Optionally, the indication information of the initial coding scheme is represented by a coding mode corresponding to the initial coding scheme, that is, the coding mode is used as the indication information. For example, the coding mode corresponding to the initial coding scheme is the initial coding mode, and the initial coding mode is the first coding mode (i.e., DirAC mode) or the second coding mode (i.e., MP mode). Optionally, the default indication information is the default coding mode, and the default coding mode is the first coding mode or the second coding mode. In some other embodiments, the default indication information is other coding modes, that is, the indication information of the coding scheme for switching frames encoded into the bitstream is not limited.

也即是，在該第一種實現方式中，編碼端以切換標誌來指示切換幀，且可以不限定編入碼流的切換幀的編碼方案的指示資訊，切換幀的編碼方案的指示資訊可以為初始編碼模式，也可以為預設編碼模式，也可以從第一編碼模式和第二編碼模式中隨機選定，也可以是其他的指示資訊。需要說明的是，在這種實現方式中，用切換標誌來指示當前幀是否為切換幀，這樣，解碼端即能夠直接通過獲取碼流中的切換標誌來確定當前幀是否為切換幀。That is, in the first implementation, the encoder indicates the switching frame with the switching flag, and the indication information of the coding scheme of the switching frame encoded into the bitstream may not be limited. The indication information of the coding scheme of the switching frame may be the initial coding mode, the default coding mode, or randomly selected from the first coding mode and the second coding mode, or other indication information. It should be noted that in this implementation, the switching flag is used to indicate whether the current frame is a switching frame, so that the decoder can directly determine whether the current frame is a switching frame by obtaining the switching flag in the bitstream.

Optionally, in the first implementation, the switching flag of the current frame and the indication information of the initial coding scheme each occupy one bit of the bitstream. For example, the switching flag of the current frame is "0" or "1": "0" (the first value) indicates that the current frame is not a switching frame, and "1" (the second value) indicates that the current frame is a switching frame. Optionally, the indication information of the initial coding scheme is "0" or "1", where "0" denotes the DirAC mode (that is, the DirAC coding scheme) and "1" denotes the MP mode (that is, the MP-based coding scheme).

在其他一些實施例中，若當前幀的初始編碼方案與當前幀的前一幀的初始編碼方案不同，則編碼端確定當前幀的切換標誌的值為第二值，將當前幀的切換標誌的值編入碼流。也即是，對於切換幀來說，由於碼流中切換標誌即能夠指示切換幀，因此無需編碼切換幀的編碼方案的指示資訊。In some other embodiments, if the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame of the current frame, the coding end determines that the value of the switching flag of the current frame is the second value, and encodes the value of the switching flag of the current frame into the bitstream. That is, for the switching frame, since the switching flag in the bitstream can indicate the switching frame, there is no need to encode the indication information of the coding scheme of the switching frame.
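As a rough sketch of the first implementation, the two header bits could be produced and interpreted as below. The function names, the list-of-bits representation, and the `default_mode` parameter are assumptions made for illustration; the variant in which no mode bit is written for a switching frame is not shown.

```python
DIRAC, MP = 0, 1  # one-bit coding-mode values taken from the example in the text


def write_header_v1(initial_mode, prev_initial_mode, default_mode=None):
    """Return [switch_flag, mode_bit] for the current frame.

    The flag is 1 (second value) for a switching frame and 0 (first value)
    otherwise. For a switching frame the mode bit is either the initial
    mode or, if given, a default mode.
    """
    is_switch = (prev_initial_mode is not None
                 and initial_mode != prev_initial_mode)
    flag = 1 if is_switch else 0
    if is_switch and default_mode is not None:
        mode = default_mode
    else:
        mode = initial_mode
    return [flag, mode]


def read_header_v1(bits):
    """Decoder side: the flag alone identifies a switching frame."""
    flag, mode = bits
    if flag == 1:
        return "switch"
    return "DirAC" if mode == DIRAC else "MP"
```

Because the flag is checked first, the decoder's result does not depend on which mode bit accompanies a switching frame, which is why the text leaves that bit unconstrained.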

Second implementation: encoding indication information of two coding schemes

在該實現方式中，編碼端將當前幀的初始編碼方案的指示資訊編入碼流。以編碼模式作為指示資訊為例，編入碼流的指示資訊實質上是與初始編碼方案相一致的編碼模式，即初始編碼模式，初始編碼模式為第一編碼模式或第二編碼模式。另外，編碼端可以不編碼切換標誌。In this implementation, the coding end encodes the indication information of the initial coding scheme of the current frame into the bitstream. Taking the coding mode as the indication information as an example, the indication information encoded into the bitstream is essentially the coding mode consistent with the initial coding scheme, that is, the initial coding mode, and the initial coding mode is the first coding mode or the second coding mode. In addition, the coding end may not encode the switching flag.

Optionally, in the second implementation, the indication information of the initial coding scheme occupies one bit of the bitstream. For example, with the coding mode used as the indication information, the coding mode encoded into the bitstream is "0" or "1", where "0" denotes the DirAC mode and indicates that the initial coding scheme of the current frame is the first coding scheme, and "1" denotes the MP mode and indicates that the initial coding scheme of the current frame is the second coding scheme.
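With only one mode bit per frame and no switching flag, a switching frame has to be inferred by comparing consecutive mode bits. The sketch below illustrates that inference on the receiving side; `decode_modes_v2` is a hypothetical name, and treating the first frame as a non-switching frame is an assumption consistent with the first-frame remark later in the description.

```python
def decode_modes_v2(mode_bits):
    """Map a sequence of one-bit initial coding modes to per-frame schemes.

    0 denotes the DirAC mode and 1 the MP mode. A frame whose bit differs
    from the previous frame's bit is treated as a switching frame.
    """
    names = {0: "DirAC", 1: "MP"}
    schemes = []
    prev = None
    for bit in mode_bits:
        if prev is not None and bit != prev:
            schemes.append("switch")
        else:
            schemes.append(names[bit])
        prev = bit  # always remember the initial mode, even for a switching frame
    return schemes
```

For example, the bit sequence `[0, 0, 1, 1, 0]` maps to DirAC, DirAC, switch, MP, switch.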

Third implementation: encoding indication information of three coding schemes

在該實現方式中，編碼端將當前幀的編碼方案的指示資訊編入碼流。以編碼模式作為指示資訊為例，編入碼流的指示資訊實質上是與當前幀的編碼方案相一致的編碼模式，與當前幀的編碼方案相一致的編碼模式為實際編碼模式，實際編碼模式即第一編碼模式、第二編碼模式或第三編碼模式。可選地，第三編碼模式為MP-W模式。In this implementation, the coding end encodes the indication information of the coding scheme of the current frame into the bitstream. Taking the coding mode as the indication information as an example, the indication information encoded into the bitstream is essentially the coding mode consistent with the coding scheme of the current frame, and the coding mode consistent with the coding scheme of the current frame is the actual coding mode, which is the first coding mode, the second coding mode or the third coding mode. Optionally, the third coding mode is the MP-W mode.

可選地，在該第三種實現方式中，當前幀的編碼方案的指示資訊佔碼流的兩個比特位元。示例性地，當前幀的編碼方案的指示資訊為「00」、「01」或「10」。其中，「00」指示當前幀的編碼方案為第一編碼方案，「01」指示當前幀的編碼方案為第二編碼方案，「10」指示當前幀的編碼方案為第三編碼方案。Optionally, in the third implementation, the indication information of the coding scheme of the current frame occupies two bits of the bitstream. Exemplarily, the indication information of the coding scheme of the current frame is "00", "01" or "10". Among them, "00" indicates that the coding scheme of the current frame is the first coding scheme, "01" indicates that the coding scheme of the current frame is the second coding scheme, and "10" indicates that the coding scheme of the current frame is the third coding scheme.
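The two-bit signalling of the third implementation can be sketched as a simple lookup. The names below are hypothetical, and rejecting the unused value "11" is an assumption; the text only lists the three values "00", "01" and "10".

```python
MODE_TO_BITS = {"DirAC": "00", "MP": "01", "MP-W": "10"}
BITS_TO_MODE = {bits: mode for mode, bits in MODE_TO_BITS.items()}


def encode_mode_v3(initial, prev_initial=None):
    """Return the two-bit indication of the actual coding mode."""
    is_switch = prev_initial is not None and initial != prev_initial
    actual = "MP-W" if is_switch else initial
    return MODE_TO_BITS[actual]


def decode_mode_v3(bits):
    """Return the actual coding mode for a two-bit indication."""
    if bits not in BITS_TO_MODE:
        # Assumption: values not listed in the text are treated as invalid.
        raise ValueError(f"unexpected mode bits: {bits!r}")
    return BITS_TO_MODE[bits]
```

Compared with the second implementation, the decoder here needs no memory of the previous frame, at the cost of one extra bit per frame.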

由上述可知，在上述第一種實現方式中，編碼端確定當前幀的初始編碼方案之後，確定切換標誌的值，將切換標誌的值編入碼流。另外，將當前幀的初始編碼方案的指示資訊編入碼流，或者，若當前幀為切換幀，則編碼端將預設指示資訊編入碼流，若當前幀為非切換幀，則編碼端將當前幀的初始編碼方案的指示資訊編入碼流。在上述第二種實現方式中，編碼端確定當前幀的初始編碼方案之後，直接將當前幀的初始編碼方案的指示資訊編入碼流。在上述第三種實現方式中，編碼端確定當前幀的初始編碼方案之後，基於當前幀的初始編碼方案和當前幀的前一幀的初始編碼方案，確定當前幀的編碼方案，將當前幀的編碼方案的指示資訊編入碼流。As can be seen from the above, in the first implementation, after the encoder determines the initial coding scheme of the current frame, it determines the value of the switching flag and encodes the value of the switching flag into the bitstream. In addition, the indication information of the initial coding scheme of the current frame is encoded into the bitstream, or, if the current frame is a switching frame, the encoder encodes the default indication information into the bitstream, and if the current frame is a non-switching frame, the encoder encodes the indication information of the initial coding scheme of the current frame into the bitstream. In the second implementation, after the encoder determines the initial coding scheme of the current frame, it directly encodes the indication information of the initial coding scheme of the current frame into the bitstream. In the third implementation method described above, after the coding end determines the initial coding scheme of the current frame, it determines the coding scheme of the current frame based on the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame, and encodes the indication information of the coding scheme of the current frame into the bitstream.

步驟602：若當前幀的編碼方案為第三編碼方案，則將該HOA信號中指定通道的信號編入碼流，指定通道為該HOA信號的所有通道中的部分通道。Step 602: If the coding scheme of the current frame is the third coding scheme, the signal of the designated channel in the HOA signal is encoded into the bitstream, and the designated channel is a part of all channels of the HOA signal.

In this embodiment of the present application, if the coding scheme of the current frame is the third coding scheme, the current frame is a switching frame, and the encoder encodes the HOA signal of the current frame according to the third coding scheme (that is, the hybrid coding scheme). In the first implementation of step 601, a switching flag equal to the second value indicates that the current frame is a switching frame. In the second implementation of step 601, an initial coding scheme that differs from the initial coding scheme of the previous frame indicates that the current frame is a switching frame. In the third implementation of step 601, a coding scheme equal to the third coding scheme indicates that the current frame is a switching frame. For a switching frame, the encoder uses the third coding scheme to encode the HOA signal of the current frame. The third coding scheme specifies that the signals of designated channels in the HOA signal of the current frame are encoded into the bitstream, where the designated channels are a subset of all channels of the HOA signal. That is, for a switching frame, the encoder encodes the signals of the designated channels into the bitstream instead of encoding the frame with the first or the second coding scheme; the switching frame is encoded with a compromise so that the auditory quality transitions smoothly when the coding scheme changes.

Optionally, the designated channels are the same as the transmission channels preset in the first coding scheme, that is, the designated channels are preset channels. In other words, given that the third coding scheme differs from the first coding scheme, and in order to bring the coding effect of the third coding scheme close to that of the first coding scheme, the encoder encodes into the bitstream the signals of those channels of the switching frame's HOA signal that are the same as the transmission channels preset in the first coding scheme, so that the auditory quality transitions as smoothly as possible. Note that different transmission channels may be preset for different coding bandwidths, bit rates, or even application scenarios. Optionally, the preset transmission channels may also be the same across different coding bandwidths, bit rates, or application scenarios.

可選地，指定通道的信號包括FOA信號，FOA信號包括全向的W信號，以及定向的X信號、Y信號和Z信號。也即是，指定通道包括FOA通道，FOA通道的信號為低階信號，即，若當前幀為切換幀，則編碼端將當前幀的HOA信號的低階部分編入碼流，低階部分即包括FOA通道的W信號、X信號、Y信號和Z信號。Optionally, the signal of the designated channel includes a FOA signal, and the FOA signal includes an omnidirectional W signal, and directional X signal, Y signal, and Z signal. That is, the designated channel includes a FOA channel, and the signal of the FOA channel is a low-order signal, that is, if the current frame is a switching frame, the encoder encodes the low-order part of the HOA signal of the current frame into the bitstream, and the low-order part includes the W signal, X signal, Y signal, and Z signal of the FOA channel.

需要說明的是，在本申請實施例中，編碼端將HOA信號中指定通道的信號編入碼流的實現方式有很多，能將指定通道的信號編入碼流即可。接下來介紹其中的一些實現方式。It should be noted that in the present application embodiment, there are many implementation methods for the encoding end to encode the signal of the designated channel in the HOA signal into the bitstream, and it is sufficient to encode the signal of the designated channel into the bitstream. Next, some of the implementation methods are introduced.

在本申請實施例中，若指定通道包括FOA通道，則編碼端基於W信號、X信號、Y信號和Z信號，確定虛擬揚聲器信號和殘差信號，將虛擬揚聲器信號和殘差信號編入碼流。In the embodiment of the present application, if the designated channel includes the FOA channel, the encoding end determines the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal, and encodes the virtual speaker signal and the residual signal into the bitstream.

可選地，編碼端將W信號確定為一路虛擬揚聲器信號，基於W信號、X信號、Y信號和Z信號確定三路殘差信號，或者，將X信號、Y信號和Z信號確定為三路殘差信號。可選地，編碼端將W信號、X信號、Y信號和Z信號中任意三路信號與剩餘一路信號之間的差信號確定為三路殘差信號。例如，編碼端將X信號、Y信號和Z信號分別與W信號之間的差信號確定為三路殘差信號。示例性地，編碼端將X-W、Y-W、Z-W分別得到的差信號X’、Y’、Z’作為三路殘差信號。Optionally, the coding end determines the W signal as a virtual speaker signal, determines three residual signals based on the W signal, the X signal, the Y signal and the Z signal, or determines the X signal, the Y signal and the Z signal as three residual signals. Optionally, the coding end determines the difference signals between any three signals of the W signal, the X signal, the Y signal and the Z signal and the remaining signal as three residual signals. For example, the coding end determines the difference signals between the X signal, the Y signal and the Z signal and the W signal as three residual signals. Exemplarily, the coding end uses the difference signals X', Y', and Z' obtained by X-W, Y-W, and Z-W as three residual signals.
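The two options described above (X, Y, Z used directly as residuals, or the differences X-W, Y-W, Z-W) can be sketched as follows. The function name and the `use_difference` switch are hypothetical, and each channel is assumed to be a plain list of samples of equal length.

```python
def foa_to_switch_frame_signals(w, x, y, z, use_difference=True):
    """Return (virtual_speaker_signal, [three residual signals]).

    W is taken as the single virtual speaker signal. The residuals are
    either X-W, Y-W, Z-W (sample by sample) or X, Y, Z unchanged.
    """
    if use_difference:
        residuals = [[c - wi for c, wi in zip(channel, w)]
                     for channel in (x, y, z)]
    else:
        residuals = [list(x), list(y), list(z)]
    return list(w), residuals
```

A decoder that receives W and the difference residuals can recover X, Y and Z by adding W back to each residual.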

If the encoder encodes the current frame with a core encoder and the core encoder is a stereo encoder, then, because the one virtual speaker signal and the three residual signals are all mono signals, the encoder first needs to combine these mono signals into stereo signals and then encode them with the stereo encoder. Optionally, the encoder combines the one virtual speaker signal with a first preset mono signal to obtain one stereo signal, and combines the three residual signals with a second preset mono signal to obtain two stereo signals. The encoder then encodes each of the three resulting stereo signals into the bitstream through the stereo encoder.

其中，本申請實施例不限定編碼端將該三路殘差信號與一路預設單聲道信號組合，以得到兩路身歷聲信號的具體組合方式。可選地，編碼端將該三路殘差信號中相關性最高的兩路殘差信號組合，以得到該兩路身歷聲信號中的一路身歷聲信號，將該三路殘差信號中除相關性最高的兩路殘差信號之外的一路殘差信號與第二路預設單聲道信號組合，以得到該兩路身歷聲信號中的另一路身歷聲信號。也即是，編碼端根據信號的相關性來組合得到身歷聲信號。在其他一些實施例中，編碼端也可以將該三路殘差信號中的任意兩路殘差信號組合，以得到這兩路身歷聲信號中的一路身歷聲信號，將剩餘一路殘差信號與第二路預設單聲道信號組合，以得到該兩路身歷聲信號中的另一路身歷聲信號。Among them, the embodiment of the present application does not limit the specific combination method of the encoder to combine the three-way residual signal with a preset mono signal to obtain two-way stereo signals. Optionally, the encoder combines the two most correlated residual signals among the three-way residual signals to obtain one stereo signal among the two-way stereo signals, and combines one residual signal other than the two most correlated residual signals among the three-way residual signals with a second preset mono signal to obtain another stereo signal among the two-way stereo signals. That is, the encoder combines the signals according to their correlation to obtain the stereo signal. In some other embodiments, the encoding end may also combine any two of the three residual signals to obtain one of the two stereo signals, and combine the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals.

可選地，本申請實施例中的第一路預設單聲道信號為全零信號或全一信號，第二路預設單聲道信號為全零信號或全一信號。可選地，第一路預設單聲道信號與第二路預設單聲道信號相同或不同，即，第一路預設單聲道信號與第二路預設單聲道信號均為全零信號或全一信號，或者，第一路預設單聲道信號為全零信號且第二路預設單聲道信號為全一信號，或者，第一路預設單聲道信號為全一信號且第二路預設單聲道信號為全零信號。其中，全零信號包括採樣點的值均為零的信號或者頻點的值均為零的信號，全一信號包括採樣點的值均為一的信號或者頻點的值均為一的信號。其中，若HOA信號為時域信號，則全零信號包括採樣點的值均為零的信號，全一信號包括採樣點的值均為一的信號。若HOA信號為頻域信號，則全零信號包括頻點的值均為零的信號，全一信號包括頻點的值均為一的信號。在其他一些實施例中，第一路預設單聲道信號和/或第二路預設單聲道信號也可以是預設的其他形式的信號。Optionally, the first preset mono signal in the embodiment of the present application is an all-zero signal or an all-one signal, and the second preset mono signal is an all-zero signal or an all-one signal. Optionally, the first preset mono signal is the same as or different from the second preset mono signal, that is, the first preset mono signal and the second preset mono signal are all-zero signals or all-one signals, or the first preset mono signal is an all-zero signal and the second preset mono signal is an all-one signal, or the first preset mono signal is an all-one signal and the second preset mono signal is an all-zero signal. Among them, the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency values are all zero, and the all-one signal includes a signal whose sampling point values are all one or a signal whose frequency values are all one. Wherein, if the HOA signal is a time domain signal, the all-zero signal includes a signal whose sampling point values are all zero, and the all-one signal includes a signal whose sampling point values are all one. If the HOA signal is a frequency domain signal, the all-zero signal includes a signal whose frequency point values are all zero, and the all-one signal includes a signal whose frequency point values are all one. In some other embodiments, the first preset mono signal and/or the second preset mono signal may also be other preset signals.
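The correlation-based pairing into three stereo signals can be sketched as below. The application does not specify how correlation is measured; the absolute normalized cross-correlation at lag zero used here is an assumption, as are the function names and the default all-zero preset signals.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Absolute normalized cross-correlation at lag zero (assumed measure)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(num) / den if den > 0 else 0.0


def build_stereo_pairs(virtual, residuals, preset1=None, preset2=None):
    """Pair one virtual speaker signal and three residuals into three stereo pairs.

    The two most correlated residuals share one stereo pair; the remaining
    residual and the virtual speaker signal are each paired with a preset
    mono signal (all-zero by default).
    """
    n = len(virtual)
    preset1 = [0.0] * n if preset1 is None else preset1
    preset2 = [0.0] * n if preset2 is None else preset2
    i, j = max(combinations(range(3), 2),
               key=lambda p: correlation(residuals[p[0]], residuals[p[1]]))
    k = ({0, 1, 2} - {i, j}).pop()
    return [(virtual, preset1),
            (residuals[i], residuals[j]),
            (residuals[k], preset2)]
```

Passing `[1.0] * n` for either preset gives the all-one variant mentioned in the text.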

If the core encoder used by the encoder is a mono encoder, the encoder encodes the one virtual speaker signal and each of the three residual signals into the bitstream separately through the mono encoder.

圖7是本申請實施例提供的一種切換幀編碼方案的示意圖。請參考圖7，待編碼的當前幀為切換幀，編碼端獲取當前幀的HOA信號，將該HOA信號中的W信號作為虛擬揚聲器信號，根據該HOA信號中的FOA信號確定殘差信號，如根據該HOA信號中的X、Y、Z信號確定殘差信號，或根據W信號和X、Y、Z信號確定殘差信號。編碼端通過核心編碼器將所確定的虛擬揚聲器信號和殘差信號編入碼流，以得到切換幀的碼流。FIG7 is a schematic diagram of a switching frame coding scheme provided by an embodiment of the present application. Referring to FIG7, the current frame to be coded is a switching frame, the coding end obtains the HOA signal of the current frame, uses the W signal in the HOA signal as a virtual speaker signal, and determines the residual signal according to the FOA signal in the HOA signal, such as determining the residual signal according to the X, Y, and Z signals in the HOA signal, or determining the residual signal according to the W signal and the X, Y, and Z signals. The coding end encodes the determined virtual speaker signal and the residual signal into the bitstream through the core encoder to obtain the bitstream of the switching frame.

可選地，在其他實施例中，編碼端將W信號、X信號、Y信號和Z信號中的兩路信號確定為兩路虛擬揚聲器信號，將剩餘兩路信號確定為兩路殘差信號。編碼端將該兩路虛擬揚聲器信號組合，以得到一路身歷聲信號，將該兩路殘差信號組合，以得到另一路身歷聲信號。編碼端通過身歷聲編碼器將得到的兩路身歷聲信號分別編入碼流。Optionally, in other embodiments, the encoding end determines two signals among the W signal, the X signal, the Y signal, and the Z signal as two virtual speaker signals, and determines the remaining two signals as two residual signals. The encoding end combines the two virtual speaker signals to obtain one stereo signal, and combines the two residual signals to obtain another stereo signal. The encoding end encodes the obtained two stereo signals into bitstreams respectively through a stereo encoder.

其中，本申請實施例不限定編碼端將W信號、X信號、Y信號和Z信號進行兩兩組合以得到兩路身歷聲信號的具體組合方式。可選地，編碼端將W信號確定為一路虛擬揚聲器信號，將X信號、Y信號和Z信號中與W信號相關性最高的一路信號確定為另一路虛擬揚聲器信號，也即將FOA通道包括的四路信號中的W信號以及與W信號相關性最高的一個信號進行組合，將剩餘兩路信號進行組合。或者，編碼端將W信號、X信號、Y信號和Z信號中的任意兩路信號進行組合，以得到一路身歷聲信號，將剩餘兩路信號進行組合，以得到另一路身歷聲信號。Among them, the embodiment of the present application does not limit the specific combination method of the encoding end combining the W signal, the X signal, the Y signal and the Z signal in pairs to obtain two-way stereo signals. Optionally, the encoding end determines the W signal as a virtual speaker signal, and determines the signal with the highest correlation with the W signal among the X signal, the Y signal and the Z signal as another virtual speaker signal, that is, the W signal and the signal with the highest correlation with the W signal among the four signals included in the FOA channel are combined, and the remaining two signals are combined. Alternatively, the encoding end combines any two signals among the W signal, the X signal, the Y signal and the Z signal to obtain a stereo signal, and combines the remaining two signals to obtain another stereo signal.
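For the two-stereo-signal variant, the optional pairing rule (W together with whichever of X, Y, Z correlates best with it) might look like the sketch below. As before, the correlation measure and the function names are assumptions; the function returns channel labels rather than sample data to keep the example short.

```python
import math


def correlation(a, b):
    """Absolute normalized cross-correlation at lag zero (assumed measure)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(num) / den if den > 0 else 0.0


def pair_foa_for_two_stereo(w, x, y, z):
    """Return (virtual_pair, residual_pair) as tuples of channel labels.

    W and the channel most correlated with W form the virtual speaker
    pair; the two remaining channels form the residual pair.
    """
    others = {"X": x, "Y": y, "Z": z}
    best = max(others, key=lambda name: correlation(w, others[name]))
    rest = [name for name in others if name != best]
    return ("W", best), (rest[0], rest[1])
```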

需要說明的是，本申請實施例不限定編碼端採用核心編碼器編碼虛擬揚聲器信號和殘差信號的具體實現方式，例如不限定虛擬揚聲器信號和殘差信號分別對應的編碼比特數等。It should be noted that the embodiment of the present application does not limit the specific implementation method of the encoding end using the core encoder to encode the virtual speaker signal and the residual signal, for example, it does not limit the number of encoding bits corresponding to the virtual speaker signal and the residual signal respectively.

以上介紹了當前幀為切換幀的情況下，編碼端對當前幀編碼的過程，也即編碼端按照第三編碼方案將切換幀的HOA信號中指定通道的信號編入碼流，第三編碼方案即切換幀編碼方案。由上述可知，在本申請實施例中，指定通道的信號可以包括W信號，W信號是HOA信號的一個核心信號，這樣，切換幀編碼方案也可稱為基於MP-W的編碼方案。接下來介紹在當前幀為非切換幀的情況下，編碼端對當前幀編碼的過程。The above describes the process of encoding the current frame by the encoder when the current frame is a switching frame, that is, the encoder encodes the signal of the specified channel in the HOA signal of the switching frame into the bitstream according to the third coding scheme, and the third coding scheme is the switching frame coding scheme. As can be seen from the above, in the embodiment of the present application, the signal of the specified channel may include a W signal, which is a core signal of the HOA signal. In this way, the switching frame coding scheme can also be called a coding scheme based on MP-W. Next, the process of encoding the current frame by the encoder when the current frame is a non-switching frame is described.

在本申請實施例中，若當前幀的編碼方案為第一編碼方案，則編碼端按照第一編碼方案將當前幀的HOA信號編入碼流。若當前幀的編碼方案為第二編碼方案，則編碼端按照第二編碼方案將當前幀的HOA信號編入碼流。也即是，若當前幀不是切換幀，則編碼端採用當前幀的初始編碼方案來編碼當前幀。In the embodiment of the present application, if the coding scheme of the current frame is the first coding scheme, the coding end encodes the HOA signal of the current frame into the bitstream according to the first coding scheme. If the coding scheme of the current frame is the second coding scheme, the coding end encodes the HOA signal of the current frame into the bitstream according to the second coding scheme. That is, if the current frame is not a switching frame, the coding end uses the initial coding scheme of the current frame to encode the current frame.

For example, referring to FIG. 8, the encoder encodes the HOA signal of the current frame into the bitstream according to the second coding scheme as follows: based on the MP algorithm, the encoder selects from a virtual speaker set a target virtual speaker that matches the HOA signal of the current frame; an MP-based spatial encoder determines the virtual speaker signal from the HOA signal of the current frame and the target virtual speaker, and determines the residual signal from the HOA signal of the current frame and the virtual speaker signal; a core encoder then encodes the virtual speaker signal and the residual signal into the bitstream. Note that the MP-based HOA coding scheme and the switching frame coding scheme differ in the principle and the specific manner of determining the virtual speaker signal and the residual signal, and the signals they determine also differ. For the same frame, the MP-based HOA coding scheme encodes more effective information into the bitstream than the switching frame coding scheme does. Given that the switching frame coding scheme differs from the second coding scheme, and in order to bring its coding effect close to that of the second coding scheme, the switching frame coding scheme likewise encodes a virtual speaker signal and a residual signal into the bitstream, so that the auditory quality transitions as smoothly as possible.

The encoder encodes the HOA signal of the current frame into the bitstream according to the first coding scheme as follows: the encoder extracts a core layer signal and spatial parameters from the HOA signal of the current frame and encodes both into the bitstream. For example, referring to FIG. 9, the encoder extracts the core layer signal through a core coding signal acquisition module, extracts the spatial parameters through a DirAC-based spatial parameter extraction module, encodes the core layer signal into the bitstream through a core encoder, and encodes the spatial parameters into the bitstream through a spatial parameter encoder. The channels corresponding to the core layer signal are the same as the designated channels in this solution. In addition to the core layer signal, the first coding scheme also encodes the extracted spatial parameters, which carry rich scene information such as direction information. Thus, for the same frame, the DirAC-based HOA coding scheme also encodes more effective information into the bitstream than the switching frame coding scheme does. Given that the switching frame coding scheme differs from the first coding scheme, and in order to bring its coding effect close to that of the first coding scheme, the switching frame coding scheme likewise encodes into the bitstream the signals of the HOA channels that are the same as the transmission channels preset by the first coding scheme. It does not encode any information of the HOA signal beyond the signals of the designated channels, that is, it neither extracts spatial parameters nor encodes them into the bitstream, so that the auditory quality transitions as smoothly as possible.

FIG. 10 is a flowchart of another encoding method provided by an embodiment of the present application. Referring to FIG. 10, the encoding method is explained again, taking as an example the case in which the indication information of the initial coding scheme of the current frame is encoded into the bitstream. The encoder first obtains the HOA signal of the current frame to be encoded. It then performs sound field type analysis on the HOA signal to determine the initial coding scheme of the current frame, and encodes the indication information of that initial coding scheme into the bitstream. The encoder then checks whether the initial coding scheme of the current frame is the same as that of the previous frame. If it is the same, the encoder encodes the HOA signal of the current frame with the initial coding scheme of the current frame to obtain the bitstream of the current frame. If it is different, the encoder encodes the HOA signal of the current frame with the switching frame coding scheme to obtain the bitstream of the current frame.

需要說明的是，若當前幀為待編碼的第一個音訊幀，則當前幀的初始編碼方案為第一編碼方案或第二編碼方案，編碼端採用當前幀的初始編碼方案將當前幀的HOA信號編入碼流。It should be noted that if the current frame is the first audio frame to be encoded, the initial coding scheme of the current frame is the first coding scheme or the second coding scheme, and the encoding end uses the initial coding scheme of the current frame to encode the HOA signal of the current frame into the bitstream.
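The per-frame flow of FIG. 10, including the first-frame rule, can be sketched as a loop that dispatches each frame to one of three encoders. Everything here is a placeholder: `analyze` stands in for the sound field type analysis, the `encoders` callables stand in for the DirAC-based, MP-based and switching frame encoders, and the output is a list of tuples rather than a real bitstream.

```python
def encode_frames(frames, analyze, encoders):
    """Encode a sequence of HOA frames following the flow of FIG. 10.

    frames   -- iterable of HOA frames (any objects)
    analyze  -- callable, frame -> "DirAC" or "MP" (initial coding scheme)
    encoders -- dict mapping "DirAC", "MP" and "switch" to callables
                frame -> payload
    Returns a list of (mode_bit, payload), one entry per frame.
    """
    out = []
    prev = None
    for frame in frames:
        initial = analyze(frame)
        mode_bit = 0 if initial == "DirAC" else 1
        if prev is None or initial == prev:
            used = initial       # first frame or non-switching frame
        else:
            used = "switch"      # switching frame
        out.append((mode_bit, encoders[used](frame)))
        prev = initial
    return out
```

The mode bit always carries the initial coding scheme, so the loop never writes a value for the switching frame scheme itself.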

綜上所述，在本申請實施例中，結合兩個方案（即基於虛擬揚聲器選擇的編解碼方案和基於方向音訊編碼的編解碼方案）對音訊幀的HOA信號進行編解碼，也即針對不同的音訊幀選擇合適的編解碼方案，這樣能夠提升音訊信號的壓縮率。同時，為了使得在不同編解碼方案之間切換時聽覺品質的平滑過渡，本方案中對於某些音訊幀來說，並非直接採用上述兩個方案中的任一個方案進行編碼，而是採用一種新的編解碼方案來編解碼這些音訊幀，即將這些音訊幀的HOA信號中指定通道的信號編入碼流，即採用一種折衷的方案進行編解碼，從而使得對解碼恢復出的HOA信號進行渲染播放後的聽覺品質能夠平滑過渡。In summary, in the embodiment of the present application, two schemes (i.e., a coding and decoding scheme based on virtual speaker selection and a coding and decoding scheme based on directional audio coding) are combined to encode and decode the HOA signal of the audio frame, that is, appropriate coding and decoding schemes are selected for different audio frames, which can improve the compression rate of the audio signal. At the same time, in order to ensure a smooth transition of auditory quality when switching between different coding and decoding schemes, this scheme does not directly adopt any of the above two schemes for encoding certain audio frames, but adopts a new coding and decoding scheme to encode and decode these audio frames, that is, encodes the signals of the specified channels in the HOA signals of these audio frames into the bit stream, that is, adopts a compromise scheme for encoding and decoding, so that the auditory quality after rendering and playing the decoded and restored HOA signals can be smoothly transitioned.

FIG. 11 is a flowchart of a decoding method provided by an embodiment of the present application. The method is applied to a decoder. It should be noted that this decoding method corresponds to the encoding method shown in FIG. 6. Referring to FIG. 11, the method includes the following steps.

Step 1101: Obtain the decoding scheme of the current frame based on the bitstream.

The decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme. The first decoding scheme is a DirAC-based HOA decoding scheme, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme. Optionally, the hybrid decoding scheme is also called the switching-frame decoding scheme.

It should be noted that because the encoder encodes different audio frames with different coding schemes, the decoder must decode each audio frame with the corresponding decoding scheme.

How the decoder determines the coding scheme of the current frame is described first. As described above, step 601 of the encoding method shown in FIG. 6 introduces three implementations in which the encoder encodes into the bitstream information that can indicate the coding scheme of the current frame. Correspondingly, the decoder has three implementations for determining the coding scheme of the current frame, which are described below.

First implementation: a switching flag and indication information of two coding schemes are encoded

The decoder first parses the value of the switching flag of the current frame from the bitstream. If the value of the switching flag is a first value, the decoder then parses indication information of the decoding scheme of the current frame from the bitstream. This indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme. If the value of the switching flag is a second value, the decoder determines that the decoding scheme of the current frame is the third decoding scheme. It should be noted that the indication information of the coding scheme that the encoder writes into the bitstream is the same indication information of the decoding scheme that the decoder parses from the bitstream.

In other words, if the decoder parses the switching flag of the current frame as the first value, the current frame is a non-switching frame. The decoder then parses the indication information of the decoding scheme from the bitstream and determines the decoding scheme of the current frame from it. If the decoder parses the switching flag of the current frame as the second value, the current frame is a switching frame, and the decoder does not need to decode the indication information even if the bitstream contains it.

It should be noted that if the value of the switching flag is the second value, the decoder determines that the current frame is a switching frame and that its decoding scheme is the switching-frame decoding scheme. The switching-frame decoding scheme differs from both the first decoding scheme and the second decoding scheme, and exists to provide a smooth transition of auditory quality.

Optionally, in the first implementation, the indication information of the decoding scheme and the switching flag each occupy one bit of the bitstream. For example, the decoder first parses the value of the switching flag of the current frame from the bitstream. If the parsed value is "0" (the first value), the decoder then parses the indication information of the decoding scheme of the current frame from the bitstream. If the parsed indication information is "0", the decoder determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1", the decoder determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed value of the switching flag is "1", the decoder determines that the decoding scheme of the current frame is the switching-frame decoding scheme (the third decoding scheme).
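A minimal sketch of this one-bit-flag parsing follows. The function name and the bit layout (switching flag at index 0, indication information at index 1) are assumptions for illustration; the text only fixes the parsing order and the example bit values.

```python
from typing import Sequence

FIRST, SECOND, THIRD = 1, 2, 3  # DirAC-based, virtual-speaker-based, switching-frame


def parse_scheme_with_switch_flag(bits: Sequence[int]) -> int:
    """Return the decoding scheme of a frame from its leading signalling bits."""
    switching_flag = bits[0]
    if switching_flag == 1:
        # Switching frame: the indication bit, if present, is not decoded.
        return THIRD
    indicator = bits[1]
    return FIRST if indicator == 0 else SECOND
```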

Second implementation: indication information of two coding schemes is encoded

The decoder parses the initial decoding scheme of the current frame from the bitstream. The initial decoding scheme is the first decoding scheme or the second decoding scheme. If the initial decoding scheme of the current frame is the same as that of the previous frame, the decoding scheme of the current frame is its initial decoding scheme. If the initial decoding scheme of the current frame differs from that of the previous frame, the decoding scheme of the current frame is the third decoding scheme, that is, the hybrid decoding scheme. The two initial decoding schemes differ when one of them is the first decoding scheme and the other is the second decoding scheme, regardless of which frame has which.

Optionally, in the second implementation, the indication information of the initial coding scheme occupies one bit of the bitstream. For example, if the coding mode serves as the indication information, the coding mode occupies one bit of the bitstream. The decoder parses the indication information of the initial coding scheme of the current frame from the bitstream. If the parsed indication information is "0" and the indication information of the previous frame is also "0", the decoder determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1" and the indication information of the previous frame is also "1", the decoder determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed indication information is "0" and that of the previous frame is "1", or the parsed indication information is "1" and that of the previous frame is "0", the decoder determines that the decoding scheme of the current frame is the switching-frame decoding scheme.

Optionally, the indication information of the initial decoding scheme of the previous frame is cached. When decoding the current frame, the decoder can read the indication information of the initial decoding scheme of the previous frame from the cache.
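The comparison against the cached indication information can be sketched as below. The class name `SchemeTracker` is hypothetical, and treating the first frame (empty cache) as a non-switching frame is an assumption carried over from the encoder-side description of the first audio frame.

```python
from typing import Optional

FIRST, SECOND, THIRD = 1, 2, 3  # DirAC-based, virtual-speaker-based, switching-frame


class SchemeTracker:
    """Derives each frame's decoding scheme from one indication bit per frame."""

    def __init__(self) -> None:
        # Cached indication bit of the previous frame; None before the first frame.
        self.prev_indicator: Optional[int] = None

    def decode_scheme(self, indicator: int) -> int:
        prev = self.prev_indicator
        self.prev_indicator = indicator  # update the cache for the next frame
        if prev is None or prev == indicator:
            return FIRST if indicator == 0 else SECOND
        return THIRD  # the initial scheme changed: switching frame
```

Feeding the indication bits 0, 0, 1, 1, 0 through one tracker yields the schemes first, first, third, second, third.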

Third implementation: indication information of three coding schemes is encoded

The decoder parses indication information of the decoding scheme of the current frame from the bitstream. The indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.

Optionally, in the third implementation, the indication information of the decoding scheme occupies two bits of the bitstream. For example, if the coding mode serves as the indication information, the coding mode of the current frame occupies two bits of the bitstream.

For example, the decoder parses the indication information of the decoding scheme of the current frame from the bitstream. If the parsed indication information is "00", the decoder determines that the decoding scheme of the current frame is the first decoding scheme. If it is "01", the decoder determines that the decoding scheme is the second decoding scheme. If it is "10", the decoder determines that the decoding scheme is the switching-frame decoding scheme.
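The two-bit example mapping can be written as a lookup table. This sketch uses hypothetical names; the code "11" is not assigned in the text, so rejecting it here is an assumption rather than specified behaviour.

```python
# Example mapping from the two-bit indication information to the decoding scheme.
SCHEME_BY_CODE = {
    "00": 1,  # first decoding scheme (DirAC-based)
    "01": 2,  # second decoding scheme (virtual speaker selection)
    "10": 3,  # third decoding scheme (switching frame)
}


def parse_two_bit_scheme(code: str) -> int:
    """Map a two-bit code string to a decoding scheme number."""
    try:
        return SCHEME_BY_CODE[code]
    except KeyError:
        raise ValueError(f"unassigned scheme code: {code!r}") from None
```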

Step 1102: If the decoding scheme of the current frame is the third decoding scheme, determine, based on the bitstream, the signals of designated channels in the HOA signal of the current frame, where the designated channels are a subset of all channels of the HOA signal.

In the embodiments of the present application, after the decoder obtains the decoding scheme of the current frame, if that scheme is the third decoding scheme, the current frame is a switching frame, and the decoder determines the signals of the designated channels in the HOA signal of the current frame based on the bitstream. That is, for a switching frame the encoder encodes the signals of the designated channels into the bitstream, so the decoder, which decodes the switching frame with the switching-frame decoding scheme, must first parse the signals of the designated channels from the bitstream.

The following describes in detail how the decoder decodes a switching frame with the switching-frame decoding scheme, that is, how the decoder determines the signals of the designated channels in the HOA signal of the current frame based on the bitstream when the current frame is a switching frame.

It should be noted that the process by which the decoder determines the signals of the designated channels from the bitstream mirrors the process by which the encoder encodes those signals into the bitstream. The foregoing encoding method embodiments describe several ways of encoding the signals of the designated channels into the bitstream. The decoding processes that mirror them are described here.

In the embodiments of the present application, if the encoder first determines a virtual speaker signal and residual signals based on the signals of the designated channels and then encodes the virtual speaker signal and the residual signals into the bitstream, the decoder correspondingly first determines the virtual speaker signal and the residual signals based on the bitstream, and then determines the signals of the designated channels based on the virtual speaker signal and the residual signals.

Optionally, if the encoder uses a stereo encoder to encode three stereo signals, obtained by combining the virtual speaker signal and the residual signals, into the bitstream, the decoder decodes the bitstream with a stereo decoder to obtain the three stereo signals, and then determines one virtual speaker signal and three residual signals from them. Optionally, the decoder determines the one virtual speaker signal from one of the three stereo signals and determines the three residual signals from the other two. That is, the decoder first parses the three stereo signals from the bitstream and then splits them to obtain one virtual speaker signal and three residual signals.

For example, the decoder parses three stereo signals S1, S2, and S3 from the bitstream, where S1 is a combination of one virtual speaker signal and one preset mono signal, S2 is a combination of two residual signals, and S3 is a combination of the remaining residual signal and one preset mono signal. The decoder splits S1 to obtain the virtual speaker signal, splits S2 to obtain two residual signals, and splits S3 to obtain the remaining residual signal.
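The splitting of S1, S2, and S3 can be sketched as follows. The function name and the channel order inside each stereo pair (useful signal first, preset mono signal second) are assumptions; the text states only which signals each pair combines.

```python
from typing import List, Sequence, Tuple

Channel = List[float]
StereoPair = Sequence[Channel]  # two channels, each a list of samples


def split_stereo_triplet(s1: StereoPair, s2: StereoPair,
                         s3: StereoPair) -> Tuple[Channel, List[Channel]]:
    """Recover one virtual speaker signal and three residual signals."""
    virtual_speaker, _preset_a = s1      # S1: virtual speaker + preset mono (discarded)
    residual_1, residual_2 = s2          # S2: two residual signals
    residual_3, _preset_b = s3           # S3: last residual + preset mono (discarded)
    return virtual_speaker, [residual_1, residual_2, residual_3]
```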

Optionally, if the encoder uses a mono encoder to encode four mono signals, determined from the virtual speaker signal and the residual signals, into the bitstream, the decoder decodes the bitstream with a mono decoder to obtain one virtual speaker signal and three residual signals. The four mono signals consist of the one virtual speaker signal and the three residual signals.

Optionally, the signals of the designated channels include an FOA signal, which consists of an omnidirectional W signal and directional X, Y, and Z signals. In this case, after determining the virtual speaker signal and the residual signals from the bitstream, the decoder determines the W signal from the virtual speaker signal. The decoder then determines the X, Y, and Z signals either from the residual signals together with the W signal, or from the residual signals alone. For example, if the encoder took the differences between the X, Y, and Z signals and the W signal as the three residual signals, the decoder takes the sums of the three residual signals and the W signal as the X, Y, and Z signals. If the encoder took the X, Y, and Z signals themselves as the three residual signals, the decoder takes the three residual signals directly as the X, Y, and Z signals. That is, the decoding process of the decoder matches the encoding process of the encoder.
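Both residual conventions can be sketched in one function. This is an illustrative sketch with hypothetical names; it takes the W signal as already determined from the virtual speaker signal and models each signal as a list of time-domain samples.

```python
from typing import List, Sequence, Tuple

Channel = List[float]


def reconstruct_foa(w: Sequence[float],
                    residuals: Sequence[Sequence[float]],
                    residuals_are_differences: bool
                    ) -> Tuple[Channel, Channel, Channel, Channel]:
    """Rebuild (W, X, Y, Z) from W and three residual signals.

    If the encoder stored X - W, Y - W, Z - W, add W back sample by sample;
    if it stored X, Y, Z directly, pass the residuals through unchanged.
    """
    if residuals_are_differences:
        x, y, z = ([r + wv for r, wv in zip(res, w)] for res in residuals)
    else:
        x, y, z = (list(res) for res in residuals)
    return list(w), x, y, z
```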

If the encoder uses a stereo encoder to encode two stereo signals, determined from the virtual speaker signals and the residual signals, into the bitstream, the decoder decodes the bitstream with a stereo decoder to obtain the two stereo signals. The decoder determines two virtual speaker signals from one of the two stereo signals and two residual signals from the other. Together, the two virtual speaker signals and the two residual signals make up the W, X, Y, and Z signals. Optionally, if the encoder took as the two virtual speaker signals the W signal and whichever of the X, Y, and Z signals is most highly correlated with the W signal, the two virtual speaker signals determined by the decoder are the W signal and that most highly correlated signal. For example, if the X signal is the one most highly correlated with the W signal, the two virtual speaker signals determined by the decoder are the W signal and the X signal, and the two residual signals are the Y signal and the Z signal.

Step 1103: Based on the signals of the designated channels, determine the gains of one or more remaining channels, other than the designated channels, in the HOA signal of the current frame.

In the embodiments of the present application, after determining the signals of the designated channels in the HOA signal of the current frame from the bitstream, the decoder determines, based on those signals, the gains of the one or more remaining channels of the HOA signal other than the designated channels.

For example, assume that the designated channels are the FOA channels. The FOA channels may be called the low-order channels, and their signals the low-order part of the HOA signal. The one or more remaining channels of the HOA signal may be called the high-order channels, and their signals the high-order part of the HOA signal. The decoder then determines the high-order gains of the HOA signal, that is, the gains of the high-order channels, based on the low-order part of the HOA signal.

Optionally, the decoder first applies analysis filtering to the signals of the designated channels in the HOA signal, and then determines the gains of the one or more remaining channels based on the analysis-filtered signals. For example, if the signals of the designated channels are the low-order part of the HOA signal, the decoder first applies analysis filtering to the low-order part and then estimates the high-order gains from the analysis-filtered low-order part. Optionally, for a switching frame, the analysis filter used by the decoder is the same as the analysis filter used in the DirAC-based HOA decoding scheme, so that the decoding delay of the switching frame is the same as that of the DirAC-based HOA decoding scheme, that is, the delays are aligned. It should be noted that the decoding delay referred to herein is the end-to-end encoding/decoding delay, and may also be called the encoding delay.

It should be noted that, in the embodiments of the present application, the process by which the decoder determines the gains of the one or more remaining channels from the signals of the designated channels is implemented in the same way as the remaining-channel gain estimation in the DirAC-based coding/decoding scheme, and is not described in detail here. For example, for a switching frame, the decoder estimates the high-order gains from the low-order part of the HOA signal with the same method as the high-order gain estimation in the DirAC-based coding/decoding scheme.

Step 1104: Based on the signals of the designated channels and the gains of the one or more remaining channels, determine the signal of each of the one or more remaining channels.

In the embodiments of the present application, the decoder determines the signal of each of the one or more remaining channels based on the signals of the designated channels and the gains of the one or more remaining channels. For example, if the signals of the designated channels are the low-order part of the HOA signal and the gains of the remaining channels are the high-order gains, the decoder can determine the high-order part of the HOA signal from the W signal in the low-order part and the high-order gains. Alternatively, if the decoder has applied analysis filtering to the low-order part of the HOA signal, it can determine the analysis-filtered high-order part from the W signal in the analysis-filtered low-order part and the high-order gains.
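The text does not give the formula that combines the W signal with the high-order gains. The sketch below assumes the simplest form, one scalar gain per remaining channel multiplied into W. In a DirAC-style decoder the gains would more plausibly be applied per time-frequency tile after analysis filtering, so this is an illustration of the data flow only, with hypothetical names.

```python
from typing import List, Sequence


def synthesize_remaining_channels(w: Sequence[float],
                                  gains: Sequence[float]) -> List[List[float]]:
    """Assumed model: remaining channel k = gains[k] * W, sample by sample."""
    return [[g * sample for sample in w] for g in gains]
```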

Step 1105: Obtain the reconstructed HOA signal of the current frame based on the signals of the designated channels and the signals of the one or more remaining channels.

In the embodiments of the present application, after obtaining the signals of the designated channels and the signals of the one or more remaining channels, the decoder obtains the reconstructed HOA signal of the current frame from them, that is, it reconstructs the HOA signal of the current frame. For example, the decoder applies synthesis filtering to the signals of the designated channels and the signals of the one or more remaining channels to obtain the reconstructed HOA signal of the current frame. If the signals of the designated channels are the low-order part of the HOA signal and the signals of the remaining channels are the high-order part, the decoder applies synthesis filtering to the low-order part and the high-order part to obtain the reconstructed HOA signal. Alternatively, if the decoder has applied analysis filtering to the low-order part, it applies synthesis filtering to the analysis-filtered low-order part and the analysis-filtered high-order part to obtain the reconstructed HOA signal. Optionally, for a switching frame, the synthesis filter used by the decoder is the same as the synthesis filter used in the DirAC-based HOA coding/decoding scheme, so that the decoding delay of the switching frame is the same as that of the DirAC-based HOA decoding scheme, that is, the delays are aligned.

FIG. 12 is a schematic diagram of a switching-frame decoding scheme provided by an embodiment of the present application. Referring to FIG. 12, the current frame to be decoded is a switching frame, and the signals of the designated channels are assumed to be the low-order part of the HOA signal. The decoder obtains the bitstream of the current frame and performs core decoding on it with a core decoder to reconstruct the low-order part of the HOA signal of the current frame. Using a method similar to the one that determines the high-order part in the DirAC-based HOA decoding scheme, the decoder estimates the high-order part from the low-order part, that is, it reconstructs the high-order part of the HOA signal. The decoder then reconstructs the HOA signal from the decoded low-order part and the estimated high-order part.
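The order of operations in steps 1102 to 1105 can be sketched as a pipeline. Every stage is passed in as a callable because the text does not specify the core decoder, the filter bank, or the gain estimator; all names here are hypothetical placeholders that show only the data flow.

```python
from typing import Callable, List, Sequence

Channels = List[List[float]]


def decode_switching_frame(
    frame_bits: bytes,
    core_decode: Callable[[bytes], Channels],
    analysis_filter: Callable[[Channels], Channels],
    estimate_gains: Callable[[Channels], Sequence[float]],
    apply_gains: Callable[[Channels, Sequence[float]], Channels],
    synthesis_filter: Callable[[Channels, Channels], Channels],
) -> Channels:
    """Decode one switching frame into a reconstructed HOA signal."""
    low = core_decode(frame_bits)                  # step 1102: designated channels
    low_filtered = analysis_filter(low)            # optional analysis filtering
    gains = estimate_gains(low_filtered)           # step 1103: remaining-channel gains
    high_filtered = apply_gains(low_filtered, gains)     # step 1104: remaining channels
    return synthesis_filter(low_filtered, high_filtered)  # step 1105: reconstruction
```

Passing the stages as arguments keeps the sketch independent of any particular codec. A real decoder would bind them to the same analysis and synthesis filters used by its DirAC-based path, which is what keeps the delays aligned.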

The above describes how the decoder decodes the current frame when it is a switching frame: using the switching-frame decoding scheme, the decoder first decodes the signals of the designated channels in the HOA signal (for example, the low-order part) and then reconstructs the signal of each remaining channel (for example, the high-order part). The following describes how the decoder decodes the current frame when it is a non-switching frame.

In the embodiments of the present application, after the decoder determines the decoding scheme of the current frame, if that scheme is the first decoding scheme, the decoder obtains the reconstructed HOA signal of the current frame from the bitstream using the first decoding scheme. If that scheme is the second decoding scheme, the decoder obtains the reconstructed HOA signal of the current frame from the bitstream using the second decoding scheme.

In the embodiments of the present application, referring to FIG. 13, the decoder obtains the reconstructed HOA signal of the current frame using the second decoding scheme as follows: the decoder parses the virtual speaker signals and the residual signals from the bitstream with a core decoder, and feeds them into an MP-based spatial decoder to obtain the reconstructed HOA signal of the current frame. It should be noted that the decoding scheme shown in FIG. 13 corresponds to the coding scheme shown in FIG. 8.

The decoder obtains the reconstructed HOA signal of the current frame using the first decoding scheme as follows: the decoder parses the core layer signal and the spatial parameters from the bitstream, and reconstructs the HOA signal of the current frame from them. For example, referring to FIG. 14, the decoder parses the core layer signal from the bitstream with a core decoder, parses the spatial parameters from the bitstream with a spatial parameter decoder, and performs DirAC-based HOA signal synthesis on the parsed core layer signal and spatial parameters to obtain the reconstructed HOA signal of the current frame. It should be noted that the decoding scheme shown in FIG. 14 corresponds to the coding scheme shown in FIG. 9.

Optionally, because the high-order part of the HOA signal has a large effect on auditory quality, the decoder may also adjust the gain of the high-order part of the current frame while obtaining the reconstructed HOA signal with the second decoding scheme, to further smooth the transition of auditory quality when switching between coding/decoding schemes. For example, the decoder obtains an initial HOA signal from the bitstream using the second decoding scheme. If the decoding scheme of the previous frame is the third decoding scheme, that is, the previous frame is a switching frame, the decoder adjusts the gain of the high-order part of the initial HOA signal according to the high-order gains of the previous frame. The decoder then obtains the reconstructed HOA signal of the current frame from the low-order part of the initial HOA signal and the gain-adjusted high-order part.

It should be noted that if the frame preceding the current frame is a switching frame, the high-order gain of the preceding frame is used to adjust the gain of the high-order part of the current frame's initial HOA signal, so that the gain-adjusted high-order part of the current frame is similar to the high-order part of the preceding frame; for example, the adjustment brings the energy of the high-order parts of the two adjacent frames close together. In this way, when the decoder subsequently renders and plays the audio frames, auditory quality transitions smoothly across both the switching frame and the frame that follows it.
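The gain adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the high-order gain of the preceding frame is treated as a single scalar multiplier applied to every high-order channel, channels are ordered with the low-order part first, and the function and parameter names (`adjust_high_order_gain`, `num_low_order_channels`, and so on) are hypothetical rather than taken from the embodiment.

```python
from typing import List


def adjust_high_order_gain(
    initial_hoa: List[List[float]],
    num_low_order_channels: int,
    prev_high_order_gain: float,
    prev_was_switching_frame: bool,
) -> List[List[float]]:
    """Return the reconstructed HOA frame, one list of samples per channel.

    Assumption: channels [0, num_low_order_channels) form the low-order part
    and the remaining channels form the high-order part.
    """
    # The low-order part is passed through unchanged.
    low = [list(ch) for ch in initial_hoa[:num_low_order_channels]]
    high = initial_hoa[num_low_order_channels:]
    if prev_was_switching_frame:
        # Assumption: the previous frame's high-order gain is a scalar
        # multiplier applied to every high-order channel.
        high = [[prev_high_order_gain * s for s in ch] for ch in high]
    else:
        high = [list(ch) for ch in high]
    return low + high
```

When the preceding frame is not a switching frame, the sketch returns the initial HOA signal unchanged; the embodiment also allows gain adjustment in that case but does not fix how it is done.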

Optionally, in addition to adjusting the high-order gain of an audio frame that follows a switching frame and is decoded with the second decoding scheme, the decoder may also adjust the gain of the high-order part of the HOA signal for other audio frames decoded with the second decoding scheme; the embodiments of the present application do not limit how that gain adjustment is implemented. Optionally, besides the high-order part, the decoder may also adjust the gain of other parts of the HOA signals of these audio frames. That is, the embodiments of the present application do not limit which channels of the HOA signal are gain-adjusted. In other words, the decoder may adjust the gain of any one or more channels of the HOA signal, and the one or more channels may include some or all of the high-order channels, some or all of the remaining channels other than the specified channels, or other channels.

FIG. 15 is a flowchart of another decoding method provided by an embodiment of the present application. Referring to FIG. 15, take as an example the case where the encoder writes indication information of the initial encoding scheme into the bitstream, and assume that no switching flag is written into the bitstream. During decoding, the decoder first parses the indication information of the initial decoding scheme of the current frame from the bitstream. The decoder then determines whether the initial decoding scheme of the current frame is the same as that of the preceding frame. If they are the same, the current frame is a non-switching frame, and the decoder decodes the bitstream using the initial decoding scheme of the current frame to obtain the reconstructed HOA signal of the current frame. If they differ, the current frame is a switching frame, and the decoder decodes the bitstream using the switching-frame decoding scheme to obtain the reconstructed HOA signal of the current frame.
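The decision in FIG. 15 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the `Scheme` enumeration and the function name are hypothetical, and treating the very first frame (which has no preceding frame) as a non-switching frame is an assumption that the text does not address.

```python
from enum import Enum
from typing import Optional


class Scheme(Enum):
    FIRST = 1   # DirAC-based HOA decoding scheme
    SECOND = 2  # HOA decoding scheme based on virtual speaker selection
    THIRD = 3   # switching-frame (hybrid) decoding scheme


def resolve_decoding_scheme(
    current_initial: Scheme, previous_initial: Optional[Scheme]
) -> Scheme:
    """Derive the actual decoding scheme from the parsed initial schemes."""
    # Assumption: with no preceding frame, the frame is not a switching frame.
    if previous_initial is None or current_initial == previous_initial:
        return current_initial
    # The initial schemes differ, so the current frame is a switching frame.
    return Scheme.THIRD
```

Because the decoder only needs to remember the initial scheme of the preceding frame, this variant signals a switching frame without spending any extra bits on a switching flag.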

In summary, in the embodiments of the present application, the HOA signal of an audio frame is encoded and decoded by combining two schemes (an encoding/decoding scheme based on virtual speaker selection and an encoding/decoding scheme based on directional audio coding); that is, an appropriate scheme is selected for each audio frame, which improves the compression rate of the audio signal. In addition, to make auditory quality transition smoothly when switching between schemes, some audio frames are not encoded and decoded directly with either of the two schemes. Instead, a new scheme is used for these frames: during encoding, only the signals of the specified channels in the HOA signals of these frames are written into the bitstream. This compromise allows auditory quality to transition smoothly when the decoded HOA signals are rendered and played.

FIG. 16 is a schematic structural diagram of an encoding device 1600 provided by an embodiment of the present application. The encoding device 1600 may be implemented by software, hardware, or a combination of the two as part or all of an encoder-side device, which may be any encoder-side device in the foregoing embodiments. Referring to FIG. 16, the device 1600 includes a first determination module 1601 and a first encoding module 1602.

The first determination module 1601 is configured to determine the encoding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, where the encoding scheme of the current frame is one of a first encoding scheme, a second encoding scheme, and a third encoding scheme; the first encoding scheme is an HOA encoding scheme based on directional audio coding, the second encoding scheme is an HOA encoding scheme based on virtual speaker selection, and the third encoding scheme is a hybrid encoding scheme.

The first encoding module 1602 is configured to, if the encoding scheme of the current frame is the third encoding scheme, encode the signals of the specified channels of the HOA signal into the bitstream, where the specified channels are a subset of all channels of the HOA signal.

Optionally, the first encoding module 1602 includes: a first determination submodule, configured to determine a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and an encoding submodule, configured to encode the virtual speaker signal and the residual signal into the bitstream.

Optionally, the device 1600 further includes: a second encoding module, configured to encode the HOA signal into the bitstream according to the first encoding scheme if the encoding scheme of the current frame is the first encoding scheme; and a third encoding module, configured to encode the HOA signal into the bitstream according to the second encoding scheme if the encoding scheme of the current frame is the second encoding scheme.

Optionally, the first determination module 1601 includes: a second determination submodule, configured to determine an initial encoding scheme of the current frame according to the HOA signal, the initial encoding scheme being the first encoding scheme or the second encoding scheme; a third determination submodule, configured to determine that the encoding scheme of the current frame is its initial encoding scheme if the initial encoding scheme of the current frame is the same as that of the preceding frame; and a fourth determination submodule, configured to determine that the encoding scheme of the current frame is the third encoding scheme if the initial encoding scheme of the current frame is the first encoding scheme and that of the preceding frame is the second encoding scheme, or if the initial encoding scheme of the current frame is the second encoding scheme and that of the preceding frame is the first encoding scheme.

Optionally, the device 1600 further includes a fourth encoding module, configured to encode indication information of the initial encoding scheme of the current frame into the bitstream.

Optionally, the device 1600 further includes: a second determination module, configured to determine the value of the switching flag of the current frame, where the value is a first value when the encoding scheme of the current frame is the first or second encoding scheme, and a second value when the encoding scheme of the current frame is the third encoding scheme; and a fifth encoding module, configured to encode the value of the switching flag into the bitstream.

Optionally, the device 1600 further includes a sixth encoding module, configured to encode indication information of the encoding scheme of the current frame into the bitstream.

In the embodiments of the present application, the HOA signal of an audio frame is encoded and decoded by combining two schemes (an encoding/decoding scheme based on virtual speaker selection and an encoding/decoding scheme based on directional audio coding); that is, an appropriate scheme is selected for each audio frame, which improves the compression rate of the audio signal. In addition, to make auditory quality transition smoothly when switching between schemes, some audio frames are not encoded and decoded directly with either of the two schemes. Instead, a new scheme is used for these frames: only the signals of the specified channels in the HOA signals of these frames are written into the bitstream. This compromise allows auditory quality to transition smoothly when the decoded HOA signals are rendered and played.

It should be noted that the division into functional modules described above is only an example of how the encoding device of the foregoing embodiment encodes audio frames. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. In addition, the encoding device of the foregoing embodiment and the encoding method embodiments share the same concept; for the specific implementation, see the method embodiments, which are not repeated here.

FIG. 17 is a schematic structural diagram of a decoding device 1700 provided by an embodiment of the present application. The decoding device 1700 may be implemented by software, hardware, or a combination of the two as part or all of a decoder-side device, which may be any decoder-side device in the foregoing embodiments. Referring to FIG. 17, the decoding device 1700 includes a first acquisition module 1701, a first determination module 1702, a second determination module 1703, a third determination module 1704, and a second acquisition module 1705.

The first acquisition module 1701 is configured to obtain the decoding scheme of the current frame based on the bitstream, where the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme; the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme. The first determination module 1702 is configured to, if the decoding scheme of the current frame is the third decoding scheme, determine the signals of the specified channels in the HOA signal of the current frame based on the bitstream, where the specified channels are a subset of all channels of the HOA signal. The second determination module 1703 is configured to determine, based on the signals of the specified channels, the gains of one or more remaining channels of the HOA signal other than the specified channels. The third determination module 1704 is configured to determine the signal of each of the one or more remaining channels based on the signals of the specified channels and the gains of the one or more remaining channels. The second acquisition module 1705 is configured to obtain the reconstructed HOA signal of the current frame based on the signals of the specified channels and the signals of the one or more remaining channels.
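The data flow through modules 1702 to 1705 can be sketched as below. This passage does not state how the gains are computed or how a remaining channel is synthesized from the specified channels, so both are assumptions here: the gain comes from a caller-supplied hypothetical function `estimate_gain`, and each remaining channel is modeled as a scaled copy of the lowest-indexed specified channel. All names are illustrative.

```python
from typing import Callable, Dict, List

Signals = Dict[int, List[float]]  # channel index -> samples


def reconstruct_switching_frame(
    specified: Signals,
    num_channels: int,
    estimate_gain: Callable[[Signals, int], float],
) -> List[List[float]]:
    """Rebuild all HOA channels of a switching frame from the specified ones."""
    # Assumption: the lowest-indexed specified channel serves as the basis
    # signal for synthesizing every remaining channel.
    reference = specified[min(specified)]
    frame: List[List[float]] = []
    for ch in range(num_channels):
        if ch in specified:
            # Specified channels are taken directly from the decoded signals.
            frame.append(list(specified[ch]))
        else:
            # Remaining channels: determine a gain from the specified
            # channels, then derive the channel signal from that gain.
            gain = estimate_gain(specified, ch)
            frame.append([gain * s for s in reference])
    return frame
```

The point of the sketch is the ordering of the steps: gains for the remaining channels are derived from the specified channels first, and only then are the remaining channel signals produced and merged into the reconstructed frame.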

Optionally, the first determination module 1702 includes: a first determination submodule, configured to determine a virtual speaker signal and a residual signal based on the bitstream; and a second determination submodule, configured to determine the signals of the specified channels based on the virtual speaker signal and the residual signal.

Optionally, the device 1700 further includes: a first decoding module, configured to obtain the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme; and a second decoding module, configured to obtain the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.

Optionally, the first acquisition module 1701 includes: a first parsing submodule, configured to parse the value of the switching flag of the current frame from the bitstream; a second parsing submodule, configured to, if the value of the switching flag is the first value, parse indication information of the decoding scheme of the current frame from the bitstream, the indication information indicating that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and a third determination submodule, configured to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value.
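The flag-based parsing described above can be sketched as follows. The field widths and values are assumptions made for illustration only: a 1-bit switching flag whose first value is 0 and second value is 1, followed (when present) by a 1-bit indicator where 0 selects the first decoding scheme and 1 selects the second. The function name is hypothetical.

```python
from typing import Iterator

FIRST_VALUE = 0   # assumption: flag value for a non-switching frame
SECOND_VALUE = 1  # assumption: flag value for a switching frame


def parse_decoding_scheme(bits: Iterator[int]) -> int:
    """Return 1, 2, or 3 for the first, second, or third decoding scheme."""
    flag = next(bits)
    if flag == SECOND_VALUE:
        # Switching frame: no scheme indicator is read in this sketch.
        return 3
    # Non-switching frame: one more bit selects between schemes 1 and 2.
    indicator = next(bits)
    return 1 if indicator == 0 else 2
```

Under these assumptions a switching frame consumes a single header bit, while a non-switching frame consumes two.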

Optionally, the first acquisition module 1701 includes a third parsing submodule, configured to parse indication information of the decoding scheme of the current frame from the bitstream, the indication information indicating that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.

Optionally, the first acquisition module 1701 includes: a fourth parsing submodule, configured to parse the initial decoding scheme of the current frame from the bitstream, the initial decoding scheme being the first decoding scheme or the second decoding scheme; a fourth determination submodule, configured to determine that the decoding scheme of the current frame is its initial decoding scheme if the initial decoding scheme of the current frame is the same as that of the preceding frame; and a fifth determination submodule, configured to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and that of the preceding frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and that of the preceding frame is the first decoding scheme.

It should be noted that the division into functional modules described above is only an example of how the decoding device of the foregoing embodiment decodes audio frames. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. In addition, the decoding device of the foregoing embodiment and the decoding method embodiments share the same concept; for the specific implementation, see the method embodiments, which are not repeated here.

FIG. 18 is a schematic block diagram of a codec device 1800 used in an embodiment of the present application. The codec device 1800 may include a processor 1801, a memory 1802, and a bus system 1803. The processor 1801 and the memory 1802 are connected through the bus system 1803; the memory 1802 is configured to store instructions, and the processor 1801 is configured to execute the instructions stored in the memory 1802 to perform the various encoding or decoding methods described in the embodiments of the present application. To avoid repetition, details are not described again here.

In the embodiments of the present application, the processor 1801 may be a central processing unit (CPU), or may be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 1802 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as the memory 1802. The memory 1802 may include code and data 18021 accessed by the processor 1801 through the bus system 1803. The memory 1802 may further include an operating system 18023 and application programs 18022, where the application programs 18022 include at least one program that allows the processor 1801 to perform the encoding or decoding methods described in the embodiments of the present application. For example, the application programs 18022 may include applications 1 to N, which further include an encoding or decoding application (codec application for short) that performs the encoding or decoding methods described in the embodiments of the present application.

In addition to a data bus, the bus system 1803 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all of these buses are labeled as the bus system 1803 in the figure.

Optionally, the codec device 1800 may further include one or more output devices, such as a display 1804. In one example, the display 1804 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input. The display 1804 may be connected to the processor 1801 through the bus system 1803.

It should be pointed out that the codec device 1800 can perform both the encoding method and the decoding method in the embodiments of the present application.

Those skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.

By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, DVD, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements. In one example, the various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.

The techniques of the embodiments of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in the embodiments of the present application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be realized by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).

That is, the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)). It is worth noting that the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.

It should be understood that "at least one" as used herein means one or more, and "a plurality of" means two or more. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, to describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used to distinguish identical or similar items whose functions and effects are basically the same. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or order of execution, nor do they require the items to be different.

The above are embodiments provided by the present application and are not intended to limit the present application. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

10: Source device
101: First terminal
120: Data source
100: Encoder
140: Output interface
20: Destination device
200: Decoder
201: Second terminal
220: Display device
240: Input interface
30: Link
40: Storage device
601, 602, 1101, 1102, 1103, 1104, 1105: Steps
1600: Encoding device
1601: First determination module
1602: First encoding module
1700: Decoding device
1701: First acquisition module
1702: First determination module
1703: Second determination module
1704: Third determination module
1705: Second acquisition module
1800: Codec device
1801: Processor
1802: Memory
18021: Data
18022: Application
18023: Operating system
1803: Bus system
1804: Display

FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of an implementation environment of a terminal scenario provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of an implementation environment of a transcoding scenario of a wireless or core network device provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of an implementation environment of a broadcast television scenario provided by an embodiment of the present application.
FIG. 5 is a schematic diagram of an implementation environment of a virtual reality streaming scenario provided by an embodiment of the present application.
FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application.
FIG. 7 is a schematic diagram of a switching-frame encoding scheme provided by an embodiment of the present application.
FIG. 8 is a schematic diagram of an HOA encoding scheme based on virtual speaker selection provided by an embodiment of the present application.
FIG. 9 is a schematic diagram of an HOA encoding scheme based on DirAC provided by an embodiment of the present application.
FIG. 10 is a flow chart of another encoding method provided by an embodiment of the present application.
FIG. 11 is a flow chart of a decoding method provided by an embodiment of the present application.
FIG. 12 is a schematic diagram of a switching-frame decoding scheme provided by an embodiment of the present application.
FIG. 13 is a schematic diagram of an HOA decoding scheme based on virtual speaker selection provided by an embodiment of the present application.
FIG. 14 is a schematic diagram of an HOA decoding scheme based on DirAC provided by an embodiment of the present application.
FIG. 15 is a flow chart of another decoding method provided by an embodiment of the present application.
FIG. 16 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
FIG. 17 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
FIG. 18 is a schematic block diagram of a codec device provided by an embodiment of the present application.

601, 602: Steps

Claims

An encoding method, comprising: determining a coding scheme of a current frame according to a higher order ambisonics (HOA) signal of the current frame, the coding scheme of the current frame being one of a first coding scheme, a second coding scheme, and a third coding scheme, wherein the first coding scheme is an HOA coding scheme based on directional audio coding, the second coding scheme is an HOA coding scheme based on virtual speaker selection, and the third coding scheme is a hybrid coding scheme; and if the coding scheme of the current frame is the third coding scheme, encoding a signal of a specified channel in the HOA signal into a bitstream, the specified channel being a part of all channels of the HOA signal; wherein determining the coding scheme of the current frame according to the HOA signal of the current frame includes: determining an initial coding scheme of the current frame according to the HOA signal, the initial coding scheme being the first coding scheme or the second coding scheme; if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame, determining the coding scheme of the current frame to be the initial coding scheme of the current frame; and if the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame of the current frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame of the current frame is the first coding scheme, determining the coding scheme of the current frame to be the third coding scheme.
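The scheme decision recited in this claim can be illustrated with a minimal Python sketch. The names `Scheme`, `select_scheme`, and `schemes_for_sequence` are hypothetical. How the initial scheme is derived from the HOA signal is outside the sketch, so it is taken as an input. The handling of the very first frame, which has no previous frame, is an assumption and is not stated in the claim.

```python
from enum import Enum


class Scheme(Enum):
    DIRAC = 1            # first coding scheme (directional audio coding)
    VIRTUAL_SPEAKER = 2  # second coding scheme (virtual speaker selection)
    HYBRID = 3           # third coding scheme (used for switching frames)


def select_scheme(initial_current, initial_previous):
    """Return the final coding scheme of the current frame.

    initial_previous is None for the first frame; in that case the sketch
    simply keeps the initial scheme (an assumption, not from the claim).
    """
    if initial_current not in (Scheme.DIRAC, Scheme.VIRTUAL_SPEAKER):
        raise ValueError("initial scheme must be the first or second scheme")
    if initial_previous is None or initial_current == initial_previous:
        return initial_current
    # The initial schemes of adjacent frames differ: treat as a switching frame.
    return Scheme.HYBRID


def schemes_for_sequence(initial_schemes):
    """Apply select_scheme frame by frame, comparing initial schemes only."""
    previous = None
    result = []
    for current in initial_schemes:
        result.append(select_scheme(current, previous))
        previous = current
    return result
```

Note that the comparison uses the initial scheme of the previous frame rather than its final scheme, so a switching frame is followed by a frame coded with the new scheme as long as the initial decision stays the same.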

The encoding method as described in claim 1, wherein the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals.

The encoding method as described in claim 2, wherein encoding the signal of the specified channel in the HOA signal into the bitstream includes: determining a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and encoding the virtual speaker signal and the residual signal into the bitstream.

The encoding method as described in claim 3, wherein determining the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal, and the Z signal comprises: determining the W signal as one virtual speaker signal; and determining three residual signals based on the W signal, the X signal, the Y signal, and the Z signal, or determining the X signal, the Y signal, and the Z signal as three residual signals.

The encoding method as described in claim 4, wherein encoding the virtual speaker signal and the residual signal into the bitstream comprises: combining the one virtual speaker signal with a first preset mono signal to obtain one stereo signal; combining the three residual signals with a second preset mono signal to obtain two stereo signals; and encoding the three obtained stereo signals into the bitstream separately through a stereo encoder.

The encoding method as described in claim 5, wherein combining the three residual signals with the second preset mono signal to obtain the two stereo signals comprises: combining the two residual signals with the highest correlation among the three residual signals to obtain one of the two stereo signals; and combining the remaining residual signal, other than the two residual signals with the highest correlation, with the second preset mono signal to obtain the other of the two stereo signals.

The encoding method as described in claim 5, wherein the first preset mono signal is an all-zero signal or an all-one signal, the all-zero signal includes a signal whose sample values are all zero or a signal whose frequency bin values are all zero, and the all-one signal includes a signal whose sample values are all one or a signal whose frequency bin values are all one; the second preset mono signal is an all-zero signal or an all-one signal; and the first preset mono signal is the same as or different from the second preset mono signal.
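The stereo packing recited in the preceding claims can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the residual signals are taken to be the X, Y, and Z signals themselves (one of the two recited alternatives), both preset mono signals are all-zero time-domain signals, and the correlation measure (absolute normalized cross-correlation) is an assumption because the claims do not name one.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Absolute normalized cross-correlation (assumed metric)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return abs(num) / den if den > 0 else 0.0


def pack_switching_frame(w, x, y, z):
    """Return three (left_name, right_name, left, right) stereo pairs."""
    n = len(w)
    preset_1 = [0.0] * n  # first preset mono signal, all-zero (assumed choice)
    preset_2 = [0.0] * n  # second preset mono signal, all-zero (assumed choice)
    residuals = {"X": x, "Y": y, "Z": z}

    # Find the two residual signals with the highest correlation.
    best = max(
        combinations(residuals, 2),
        key=lambda pair: correlation(residuals[pair[0]], residuals[pair[1]]),
    )
    leftover = next(name for name in residuals if name not in best)

    return [
        ("W", "preset_1", w, preset_1),                      # virtual speaker + preset
        (best[0], best[1], residuals[best[0]], residuals[best[1]]),
        (leftover, "preset_2", residuals[leftover], preset_2),
    ]
```

Each returned pair would then be passed to a stereo encoder; the encoder itself is not modeled here.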

The encoding method as described in claim 4, wherein encoding the virtual speaker signal and the residual signal into the bitstream comprises: encoding the one-channel virtual speaker signal and each of the three residual signals into the bitstream respectively through a mono encoder.

The encoding method as described in any one of claims 1-8, wherein after determining the coding scheme of the current frame according to the HOA signal of the current frame, the encoding method further comprises: if the coding scheme of the current frame is the first coding scheme, encoding the HOA signal into the bitstream according to the first coding scheme; and if the coding scheme of the current frame is the second coding scheme, encoding the HOA signal into the bitstream according to the second coding scheme.

The encoding method as described in claim 1, wherein after determining the initial coding scheme of the current frame according to the HOA signal, the encoding method further comprises: encoding indication information of the initial coding scheme of the current frame into the bitstream.

The encoding method as described in any one of claims 1-8, wherein after determining the coding scheme of the current frame according to the HOA signal of the current frame, the encoding method further comprises: determining a value of a switching flag of the current frame, wherein when the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is a first value, and when the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is a second value; and encoding the value of the switching flag into the bitstream.
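A minimal Python sketch of the switching-flag signalling is given here for illustration. The concrete values (0 for the first value, 1 for the second value), the one-bit field widths, and the function names are assumptions. Writing a scheme indication bit after the flag when the flag holds the first value is also an assumption, chosen to mirror the decoder-side parsing recited in the decoding method claims.

```python
FIRST_VALUE = 0   # assumed value: frame uses the first or second coding scheme
SECOND_VALUE = 1  # assumed value: frame uses the third (hybrid) coding scheme


def switching_flag(scheme):
    """Map a coding scheme number (1, 2, or 3) to the switching flag value."""
    if scheme in (1, 2):
        return FIRST_VALUE
    if scheme == 3:
        return SECOND_VALUE
    raise ValueError("scheme must be 1, 2, or 3")


def write_frame_header(scheme):
    """Return the header bits for one frame as a list of 0/1 integers."""
    bits = [switching_flag(scheme)]
    if scheme in (1, 2):
        # Assumed 1-bit indication: 0 = first scheme, 1 = second scheme.
        bits.append(scheme - 1)
    return bits
```

With this layout a switching frame costs a single header bit, and a regular frame costs two.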

The encoding method as described in any one of claims 1-8, wherein after determining the coding scheme of the current frame according to the HOA signal of the current frame, the encoding method further comprises: encoding indication information of the coding scheme of the current frame into the bitstream.

The encoding method as described in any one of claims 1-8, wherein the specified channel is consistent with the transmission channel preset in the first coding scheme.

A decoding method, comprising: obtaining a decoding scheme of a current frame based on a bitstream, wherein the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme, wherein the first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio coding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme; if the decoding scheme of the current frame is the third decoding scheme, determining a signal of a specified channel in the HOA signal of the current frame based on the bitstream, the specified channel being a part of all channels of the HOA signal; determining, based on the signal of the specified channel, the gain of one or more residual channels in the HOA signal other than the specified channel; determining, based on the signal of the specified channel and the gain of the one or more residual channels, the signal of each of the one or more residual channels; and obtaining the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more residual channels; wherein obtaining the decoding scheme of the current frame based on the bitstream includes: parsing the value of the switching flag of the current frame from the bitstream; if the value of the switching flag is the first value, parsing indication information of the decoding scheme of the current frame from the bitstream, the indication information being used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and if the value of the switching flag is the second value, determining that the decoding scheme of the current frame is the third decoding scheme; wherein obtaining the decoding scheme of the current frame based on the bitstream includes: parsing the initial decoding scheme of the current frame from the bitstream, the initial decoding scheme being the first decoding scheme or the second decoding scheme; if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, determining the decoding scheme of the current frame to be the initial decoding scheme of the current frame; and if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, determining the decoding scheme of the current frame to be the third decoding scheme.
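The two ways of obtaining the decoding scheme recited in this claim can be sketched in Python as follows. The flag values, the one-bit indication field, the integer scheme numbers, and the function names are assumptions made for illustration; the claim does not fix a bit layout. Treating a frame without a previous frame as non-switching is likewise an assumption.

```python
FIRST_VALUE = 0   # assumed: the scheme is signalled explicitly after the flag
SECOND_VALUE = 1  # assumed: the frame is a switching frame (third scheme)


def scheme_from_switching_flag(bits):
    """Parse the switching flag, then the scheme indication if present.

    bits is an iterable of 0/1 integers; returns 1, 2, or 3.
    """
    reader = iter(bits)
    flag = next(reader)
    if flag == SECOND_VALUE:
        return 3
    indication = next(reader)  # assumed: 0 = first scheme, 1 = second scheme
    return 1 if indication == 0 else 2


def scheme_from_initial(initial_current, initial_previous):
    """Derive the scheme from the initial schemes of adjacent frames.

    initial_previous is None when there is no previous frame (assumption).
    """
    if initial_previous is None or initial_current == initial_previous:
        return initial_current
    return 3
```

The second function needs no extra flag in the bitstream, at the cost of the decoder keeping the initial scheme of the previous frame as state.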

The decoding method as described in claim 14, wherein determining the signal of the specified channel in the HOA signal of the current frame based on the bitstream comprises: determining a virtual speaker signal and a residual signal based on the bitstream; and determining the signal of the specified channel based on the virtual speaker signal and the residual signal.

The decoding method as described in claim 15, wherein determining the virtual speaker signal and the residual signal based on the bitstream comprises: decoding the bitstream through a stereo decoder to obtain three stereo signals; and determining one virtual speaker signal and three residual signals based on the three stereo signals.

The decoding method as described in claim 16, wherein determining the one virtual speaker signal and the three residual signals based on the three stereo signals comprises: determining the one virtual speaker signal based on one of the three stereo signals; and determining the three residual signals based on the other two of the three stereo signals.

The decoding method as described in claim 15, wherein determining the virtual speaker signal and the residual signal based on the bitstream comprises: decoding the bitstream through a mono decoder to obtain one virtual speaker signal and three residual signals.

The decoding method as described in claim 15, wherein the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals; and determining the signal of the specified channel based on the virtual speaker signal and the residual signal includes: determining the W signal based on the virtual speaker signal; and determining the X signal, the Y signal, and the Z signal based on the residual signal and the W signal, or determining the X signal, the Y signal, and the Z signal based on the residual signal.
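The recovery of the FOA signal from the virtual speaker signal and the residual signals can be sketched in Python as below. The function name and the boolean parameter are hypothetical. When the residuals depend on the W signal, the sketch assumes each residual is the directional signal minus W, following the difference signals mentioned in the encoding device claims; the sign convention is an assumption.

```python
def reconstruct_foa(virtual_speaker, residuals, residuals_are_differences):
    """Return (w, x, y, z) as lists of samples.

    virtual_speaker: one decoded virtual speaker signal.
    residuals: three decoded residual signals, in X, Y, Z order (assumed order).
    residuals_are_differences: True if each residual was formed as the
    directional signal minus W (assumed convention).
    """
    w = list(virtual_speaker)
    if residuals_are_differences:
        # Add W back to each residual to recover the directional signals.
        x, y, z = ([r + wi for r, wi in zip(res, w)] for res in residuals)
    else:
        # The residuals are the directional signals themselves.
        x, y, z = (list(res) for res in residuals)
    return w, x, y, z
```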

The decoding method as described in any one of claims 14-19, further comprising: if the decoding scheme of the current frame is the first decoding scheme, obtaining the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme; and if the decoding scheme of the current frame is the second decoding scheme, obtaining the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme.

The decoding method as described in claim 20, wherein obtaining the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme comprises: obtaining an initial HOA signal from the bitstream according to the second decoding scheme; if the decoding scheme of the previous frame of the current frame is the third decoding scheme, adjusting the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame; and obtaining the reconstructed HOA signal based on the low-order part and the gain-adjusted high-order part of the initial HOA signal.
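One possible reading of this gain adjustment is sketched below in Python. The claim does not say how the high-order gain of the previous frame is applied, so the linear ramp from the previous gain to unity across the frame is purely an assumption, as are the function name, the per-channel gain list, and treating the first four channels (the FOA channels) as the low-order part.

```python
def smooth_high_order(initial_hoa, prev_high_order_gains,
                      prev_scheme_was_hybrid, num_low_order=4):
    """Return the reconstructed HOA channels as a list of sample lists.

    initial_hoa: channels decoded with the second decoding scheme.
    prev_high_order_gains: one gain per high-order channel from the previous
    frame (hypothetical representation).
    """
    low = [list(ch) for ch in initial_hoa[:num_low_order]]
    high = [list(ch) for ch in initial_hoa[num_low_order:]]
    if prev_scheme_was_hybrid:
        for ch, g_prev in zip(high, prev_high_order_gains):
            n = len(ch)
            for i in range(n):
                # Assumed ramp: start near the previous gain, end at 1.0.
                t = (i + 1) / n
                ch[i] *= (1.0 - t) * g_prev + t
    # Low-order part unchanged, high-order part gain-adjusted.
    return low + high
```

The intent of such a ramp would be to avoid an abrupt jump in high-order energy right after a switching frame; other interpolation shapes would serve the same purpose.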

The decoding method as described in any one of claims 14-19, wherein obtaining the decoding scheme of the current frame based on the bitstream includes: parsing indication information of the decoding scheme of the current frame from the bitstream, wherein the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.

An encoding device, comprising: a first determination module, used to determine a coding scheme of a current frame according to a higher order ambisonics (HOA) signal of the current frame, wherein the coding scheme of the current frame is one of a first coding scheme, a second coding scheme, and a third coding scheme, wherein the first coding scheme is an HOA coding scheme based on directional audio coding, the second coding scheme is an HOA coding scheme based on virtual speaker selection, and the third coding scheme is a hybrid coding scheme; and a first coding module, used to encode a signal of a specified channel in the HOA signal into a bitstream if the coding scheme of the current frame is the third coding scheme, wherein the specified channel is a part of all channels of the HOA signal; wherein the first determination module comprises: a second determination submodule, used to determine an initial coding scheme of the current frame according to the HOA signal, the initial coding scheme being the first coding scheme or the second coding scheme; a third determination submodule, used to determine the coding scheme of the current frame to be the initial coding scheme of the current frame if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame; and a fourth determination submodule, used to determine the coding scheme of the current frame to be the third coding scheme if the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame of the current frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame of the current frame is the first coding scheme.

The encoding device as described in claim 23, wherein the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals.

The encoding device as described in claim 24, wherein the first coding module includes: a first determination submodule, used to determine a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and a coding submodule, used to encode the virtual speaker signal and the residual signal into the bitstream.

The encoding device as described in claim 25, wherein the first determination submodule is used to: determine the W signal as one virtual speaker signal; and determine the difference signals between each of the X signal, the Y signal, and the Z signal and the W signal as three residual signals, or determine the X signal, the Y signal, and the Z signal as three residual signals.

The encoding device as described in claim 26, wherein the coding submodule is used to: combine the one virtual speaker signal with a first preset mono signal to obtain one stereo signal; combine the three residual signals with a second preset mono signal to obtain two stereo signals; and encode the three obtained stereo signals into the bitstream separately through a stereo encoder.

The encoding device as described in claim 27, wherein the coding submodule is used to: combine the two residual signals with the highest correlation among the three residual signals to obtain one of the two stereo signals; and combine the remaining residual signal, other than the two residual signals with the highest correlation, with the second preset mono signal to obtain the other of the two stereo signals.

The encoding device as described in claim 27, wherein the first preset mono signal is an all-zero signal or an all-one signal, the all-zero signal includes a signal whose sample values are all zero or a signal whose frequency bin values are all zero, and the all-one signal includes a signal whose sample values are all one or a signal whose frequency bin values are all one; the second preset mono signal is an all-zero signal or an all-one signal; and the first preset mono signal is the same as or different from the second preset mono signal.

The encoding device as described in claim 26, wherein the coding submodule is used to: encode the one virtual speaker signal and each of the three residual signals into the bitstream separately through a mono encoder.

The encoding device as described in any one of claims 23-30, further comprising: a second coding module, used to encode the HOA signal into the bitstream according to the first coding scheme if the coding scheme of the current frame is the first coding scheme; and a third coding module, used to encode the HOA signal into the bitstream according to the second coding scheme if the coding scheme of the current frame is the second coding scheme.

The encoding device as described in claim 31, further comprising: a fourth coding module, used to encode indication information of the initial coding scheme of the current frame into the bitstream.

The encoding device as described in any one of claims 23-30, further comprising: a second determination module, used to determine a value of a switching flag of the current frame, wherein when the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is a first value, and when the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is a second value; and a fifth coding module, used to encode the value of the switching flag into the bitstream.

The encoding device as described in any one of claims 23-30, further comprising: a sixth coding module, used to encode indication information of the coding scheme of the current frame into the bitstream.

The encoding device as described in any one of claims 23-30, wherein the specified channel is consistent with the transmission channel preset in the first coding scheme.

A decoding device, comprising: a first acquisition module, used to obtain a decoding scheme of a current frame based on a bitstream, wherein the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme, wherein the first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio coding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme; a first determination module, used to determine a signal of a specified channel in the HOA signal of the current frame based on the bitstream if the decoding scheme of the current frame is the third decoding scheme, wherein the specified channel is a part of all channels of the HOA signal; a second determination module, used to determine the gain of one or more residual channels in the HOA signal other than the specified channel based on the signal of the specified channel; a third determination module, used to determine the signal of each of the one or more residual channels based on the signal of the specified channel and the gain of the one or more residual channels; and a second acquisition module, used to obtain the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more residual channels; wherein the first acquisition module includes: a first parsing submodule, used to parse the value of the switching flag of the current frame from the bitstream; a second parsing submodule, used to parse indication information of the decoding scheme of the current frame from the bitstream if the value of the switching flag is the first value, the indication information being used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and a third determination submodule, used to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value; wherein the first acquisition module includes: a fourth parsing submodule, used to parse the initial decoding scheme of the current frame from the bitstream, the initial decoding scheme being the first decoding scheme or the second decoding scheme; a fourth determination submodule, used to determine the decoding scheme of the current frame to be the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame; and a fifth determination submodule, used to determine the decoding scheme of the current frame to be the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme.

The decoding device as described in claim 36, wherein the first determination module includes: a first determination submodule, used to determine a virtual speaker signal and a residual signal based on the bitstream; and a second determination submodule, used to determine the signal of the specified channel based on the virtual speaker signal and the residual signal.

The decoding device as described in claim 37, wherein the first determination submodule is used to: decode the bitstream through a stereo decoder to obtain three stereo signals; and determine one virtual speaker signal and three residual signals based on the three stereo signals.

The decoding device as described in claim 38, wherein the first determination submodule is used to: determine the one virtual speaker signal based on one of the three stereo signals; and determine the three residual signals based on the other two of the three stereo signals.

The decoding device as described in claim 37, wherein the first determination submodule is used to: decode the bitstream through a mono decoder to obtain one virtual speaker signal and three residual signals.

The decoding device as described in any one of claims 37-40, wherein the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals; and the first determination submodule is used to: determine the W signal based on the virtual speaker signal; and determine the X signal, the Y signal, and the Z signal based on the residual signal and the W signal, or determine the X signal, the Y signal, and the Z signal based on the residual signal.

The decoding device as described in any one of claims 36-40, further comprising: a first decoding module, used to obtain the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme; and a second decoding module, used to obtain the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.

The decoding device as described in claim 42, wherein the second decoding module comprises: a first acquisition submodule, used to obtain an initial HOA signal from the bitstream according to the second decoding scheme; a gain adjustment submodule, used to adjust the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame if the decoding scheme of the previous frame of the current frame is the third decoding scheme; and a second acquisition submodule, used to obtain the reconstructed HOA signal based on the low-order part and the gain-adjusted high-order part of the initial HOA signal.

The decoding device as described in any one of claims 36-40, wherein the first acquisition module includes: a third parsing submodule, used to parse indication information of the decoding scheme of the current frame from the bitstream, wherein the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.

An encoding end device, wherein the encoding end device includes a memory and a processor; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to implement the encoding method described in any one of claims 1-13.

A decoding end device, wherein the decoding end device includes a memory and a processor; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to implement the decoding method described in any one of claims 14-22.

A computer-readable storage medium, wherein the storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to execute the steps of the method described in any one of claims 1-22.

A computer program product, wherein the computer program product comprises instructions which, when executed by a processor, implement the method described in any one of claims 1-22.