WO2023051367A1 - Decoding method, device, equipment, storage medium and computer program product - Google Patents

Decoding method, device, equipment, storage medium and computer program product

Info

Publication number
WO2023051367A1
Authority
WO
WIPO (PCT)
Prior art keywords
decoding scheme
current frame
decoding
signal
scheme
Prior art date
Application number
PCT/CN2022/120461
Other languages
English (en)
French (fr)
Inventor
刘帅
高原
王宾
王喆
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023051367A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 — Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/04 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 — Vocoder architecture
    • G10L19/18 — Vocoders using multiple modes
    • G10L19/22 — Mode decision, i.e. based on audio signal content versus external parameters

Definitions

  • The embodiments of the present application relate to the field of audio processing technology, and in particular to a decoding method, device, equipment, storage medium, and computer program product.
  • Higher order ambisonics (HOA) signals can be coded with several different codec schemes. One of these schemes is a codec scheme based on directional audio coding (DirAC).
  • In this scheme, the encoder extracts a core layer signal and spatial parameters from the HOA signal of the current frame, and encodes the extracted core layer signal and spatial parameters into the code stream.
  • Correspondingly, the decoding end decodes the core layer signal and the spatial parameters from the code stream, and performs analysis and synthesis filtering on them to reconstruct the HOA signal of the current frame.
  • Another solution is a codec solution based on virtual speaker selection.
  • In this scheme, the encoder selects, from a virtual speaker set, a target virtual speaker that matches the HOA signal of the current frame based on the match-projection (MP) algorithm, determines a virtual speaker signal based on the HOA signal of the current frame and the target virtual speaker, determines a residual signal based on the HOA signal of the current frame and the virtual speaker signal, and encodes the virtual speaker signal and the residual signal into the code stream.
  • the decoding end uses a decoding method symmetrical to the encoding to reconstruct the HOA signal of the current frame from the code stream.
  • A dissimilar sound source (heterogeneous sound source) refers to a point sound source whose position and/or direction differs from those of other sound sources.
  • The sound field types of different audio frames may differ. To obtain a high compression rate for audio frames of different sound field types at the same time, an appropriate codec scheme needs to be selected for each audio frame according to its sound field type, which requires switching between different codec schemes.
  • Embodiments of the present application provide a decoding method, device, equipment, storage medium, and computer program product, capable of solving the problem of inconsistent decoding delays when switching between different codec schemes. The technical scheme is as follows:
  • In a first aspect, a decoding method is provided, which includes:
  • determining the decoding scheme of the current frame according to the code stream, where the decoding scheme of the current frame is either a first decoding scheme or a non-first decoding scheme, and the first decoding scheme is a DirAC-based HOA decoding scheme; if the decoding scheme of the current frame is the first decoding scheme, the decoding end reconstructs a first audio signal from the code stream according to the first decoding scheme, and the reconstructed first audio signal is the reconstructed HOA signal of the current frame; if the decoding scheme of the current frame is not the first decoding scheme, the decoding end reconstructs a second audio signal from the code stream according to the non-first decoding scheme and performs alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame, where the alignment processing makes the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme.
  • Since the decoding delay of the DirAC-based HOA decoding scheme is relatively large, for a current frame encoded by the first encoding scheme it is sufficient to decode the current frame according to the first decoding scheme. For a current frame not encoded by the first encoding scheme, alignment processing is required to make the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme. Since the decoding delay of the DirAC decoding scheme is fixed, the decoding delay of the current frame can be made consistent with the decoding delay of the first decoding scheme (i.e., the DirAC decoding scheme) through alignment processing.
  • For example, a delay can be added during the alignment processing so that the decoding delay of the current frame is consistent with the decoding delay of the first decoding scheme (i.e., the DirAC decoding scheme).
  • The first encoding scheme corresponds to the first decoding scheme, that is, if the first decoding scheme is the DirAC decoding scheme, then the first encoding scheme is the DirAC encoding scheme; correspondingly, the second encoding scheme corresponds to the second decoding scheme, and the third encoding scheme corresponds to the third decoding scheme.
  • In a possible implementation, the decoding end determining the decoding scheme of the current frame according to the code stream includes: parsing the value of a switching flag of the current frame from the code stream; if the value of the switching flag is a first value, parsing indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme, and the second decoding scheme is an HOA decoding scheme based on virtual speaker selection (which may be referred to as the MP-based HOA decoding scheme for short); if the value of the switching flag is a second value, determining that the decoding scheme of the current frame is the third decoding scheme, where the third decoding scheme is a hybrid decoding scheme.
  • The hybrid decoding scheme is a scheme designed in the embodiments of the present application for switching frames; the codec schemes of the frame before and the frame after a switching frame are different.
  • That is, the code stream includes a switching flag: if the value of the switching flag is the first value, the current frame is a non-switching frame, and if the value of the switching flag is the second value, the current frame is a switching frame.
  • The decoder first parses the value of the switching flag from the code stream; when it determines, based on the value of the switching flag, that the current frame is not a switching frame, it then parses the indication information of the decoding scheme of the current frame from the code stream to determine whether the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme.
  • In this way, the decoding end can directly determine whether the current frame is a switching frame based on the switching flag, so the decoding efficiency is high; a sketch of this determination logic is given below.
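The following is a minimal Python sketch of this determination logic. The bit-reading helper, the concrete flag values, and the scheme identifiers are illustrative assumptions; they are not the actual syntax of the code stream described in this application.

```python
from enum import Enum

class Scheme(Enum):
    DIRAC = 1    # first decoding scheme (DirAC-based HOA decoding)
    MP = 2       # second decoding scheme (virtual-speaker-selection / MP-based)
    HYBRID = 3   # third decoding scheme (hybrid, used for switching frames)

FIRST_VALUE = 0    # assumed switching-flag value meaning "non-switching frame"
SECOND_VALUE = 1   # assumed switching-flag value meaning "switching frame"

def determine_decoding_scheme(read_bit):
    """Determine the decoding scheme of the current frame.

    `read_bit` is an assumed callable returning the next bit of the code stream
    as an int (0 or 1).
    """
    switching_flag = read_bit()
    if switching_flag == SECOND_VALUE:
        # Switching frame: decoded with the hybrid (third) scheme.
        return Scheme.HYBRID
    # Non-switching frame: a further indication bit selects DirAC or MP.
    indication = read_bit()
    return Scheme.DIRAC if indication == 0 else Scheme.MP

# Example: a tiny fake bitstream [switching_flag, indication] = [0, 1] -> MP scheme.
bits = iter([0, 1])
print(determine_decoding_scheme(lambda: next(bits)))  # Scheme.MP
```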
  • The hybrid decoding scheme uses, in the decoding process, both technical means related to the first decoding scheme (i.e., the DirAC decoding scheme) and technical means related to the second decoding scheme (the MP-based HOA decoding scheme), and is therefore called a hybrid decoding scheme.
  • In another possible implementation, the decoding end determining the decoding scheme of the current frame according to the code stream includes: parsing indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme; the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme. That is, the code stream directly includes the indication information of the decoding scheme, so that the decoding end directly determines the decoding scheme of the current frame based on the indication information; the decoding efficiency is also high.
  • In another possible implementation, the decoding end determining the decoding scheme of the current frame according to the code stream includes: parsing the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme, and the second decoding scheme is an HOA decoding scheme based on virtual speaker selection; if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, determining that the decoding scheme of the current frame is the initial decoding scheme of the current frame; if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame is the first decoding scheme, determining that the decoding scheme of the current frame is the third decoding scheme, where the third decoding scheme is a hybrid decoding scheme.
  • That is, the code stream contains indication information of the initial decoding scheme; the decoding end judges whether the current frame is a switching frame by comparing the initial decoding scheme of the current frame with that of the previous frame. The decoding scheme of a switching frame is the third decoding scheme, and the decoding scheme of a non-switching frame is its own initial decoding scheme, as sketched below.
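A minimal sketch of this comparison follows; the `Scheme` enum is the same hypothetical one defined in the earlier sketch, and how the initial scheme is parsed from the code stream is an assumption.

```python
from enum import Enum

class Scheme(Enum):
    DIRAC = 1    # first decoding scheme
    MP = 2       # second decoding scheme
    HYBRID = 3   # third (hybrid) decoding scheme

def determine_scheme_from_initial(initial_curr, initial_prev):
    """initial_curr / initial_prev: initial decoding schemes (DIRAC or MP)
    parsed from the code stream for the current and previous frame."""
    if initial_curr == initial_prev:
        # Non-switching frame: keep the frame's own initial decoding scheme.
        return initial_curr
    # The initial scheme changed between frames: switching frame, use the hybrid scheme.
    return Scheme.HYBRID

print(determine_scheme_from_initial(Scheme.MP, Scheme.DIRAC))  # Scheme.HYBRID
print(determine_scheme_from_initial(Scheme.MP, Scheme.MP))     # Scheme.MP
```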
  • In a possible implementation, the non-first decoding scheme is the second decoding scheme or the third decoding scheme, where the second decoding scheme is an HOA decoding scheme based on virtual speaker selection and the third decoding scheme is a hybrid decoding scheme. If the decoding scheme of the current frame is the third decoding scheme, reconstructing the second audio signal according to the code stream includes: reconstructing the signal of a specified channel according to the code stream, where the reconstructed signal of the specified channel is the reconstructed second audio signal, and the specified channel is a subset of all channels of the HOA signal of the current frame. That is, for a switching frame decoded by the third decoding scheme, what the decoding end reconstructs from the code stream is the signal of the specified channel rather than the complete HOA signal.
  • In a possible implementation, the decoding end performing alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame includes: performing analysis filtering on the reconstructed signal of the specified channel; determining, based on the analysis-filtered signal of the specified channel, the gains of one or more remaining channels of the HOA signal of the current frame other than the specified channel; determining the signals of the one or more remaining channels based on the gains of the one or more remaining channels and the analysis-filtered signal of the specified channel; and performing synthesis filtering on the analysis-filtered signal of the specified channel and the signals of the one or more remaining channels to obtain the reconstructed HOA signal of the current frame. That is, for a switching frame, the decoder needs to reconstruct the signals of the remaining channels other than the specified channel, and through analysis and synthesis filtering the decoding delay of the current frame is increased so as to be consistent with the decoding delay of the first decoding scheme; a sketch of this flow follows.
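The numerical sketch below only illustrates the shape of this flow. The analysis/synthesis filters are pass-through placeholders and the gain rule is a stand-in constant; in the real scheme the remaining-channel gains are derived from the analysis-filtered specified-channel signal in a scheme-specific way that is not reproduced here.

```python
import numpy as np

def analysis_filter(x):
    # Placeholder for the analysis filtering step (a real filter bank would go here).
    return x.copy()

def synthesis_filter(x):
    # Placeholder for the synthesis filtering step (inverse of the analysis step).
    return x.copy()

def reconstruct_switching_frame(spec_signal, num_total_channels):
    """spec_signal: (num_spec_channels, num_samples) signal of the specified channels
    reconstructed from the code stream for a switching frame.
    Returns the reconstructed HOA signal with num_total_channels channels."""
    spec_filtered = analysis_filter(spec_signal)
    num_rest = num_total_channels - spec_filtered.shape[0]

    # Placeholder gain rule: each remaining channel reuses a scaled copy of the
    # first specified channel.
    gains = np.full(num_rest, 0.5)
    rest = gains[:, None] * spec_filtered[0][None, :]

    hoa = np.vstack([spec_filtered, rest])
    return synthesis_filter(hoa), gains

spec = np.random.randn(4, 960)  # e.g. the FOA channels of a 20 ms frame at 48 kHz
hoa, gains = reconstruct_switching_frame(spec, num_total_channels=16)
print(hoa.shape)  # (16, 960)
```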
  • In a possible implementation, the non-first decoding scheme is the second decoding scheme or the third decoding scheme, where the second decoding scheme is an HOA decoding scheme based on virtual speaker selection and the third decoding scheme is a hybrid decoding scheme. The decoding end reconstructing the second audio signal according to the code stream includes: reconstructing a first HOA signal from the code stream according to the second decoding scheme, where the reconstructed first HOA signal is the reconstructed second audio signal. That is, for an audio frame encoded by the second encoding scheme, the decoder first reconstructs the first HOA signal according to the second decoding scheme.
  • In a possible implementation, the decoding end performing alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame includes: performing analysis-synthesis filtering on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame. That is, after the decoding end reconstructs the first HOA signal according to the second decoding scheme, delay alignment is performed through analysis-synthesis filtering.
  • In a possible implementation, the decoding end performing analysis-synthesis filtering on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame includes: performing analysis filtering on the reconstructed first HOA signal to obtain a second HOA signal; performing gain adjustment on the signals of one or more remaining channels in the second HOA signal to obtain gain-adjusted signals of the one or more remaining channels, where the one or more remaining channels are channels of the HOA signal other than the specified channel; and performing synthesis filtering on the signal of the specified channel in the second HOA signal and the gain-adjusted signals of the one or more remaining channels to obtain the reconstructed HOA signal of the current frame. That is, for an audio frame encoded by the second encoding scheme, in the process of delay alignment through analysis-synthesis filtering, the auditory quality can be made to transition smoothly through gain adjustment.
  • In a possible implementation, the decoding end performing gain adjustment on the signals of one or more remaining channels in the second HOA signal to obtain the gain-adjusted signals of the one or more remaining channels includes: if the decoding scheme of the previous frame of the current frame is the third decoding scheme, performing gain adjustment on the signals of the one or more remaining channels in the second HOA signal according to the gains of the one or more remaining channels of the previous frame of the current frame, to obtain the gain-adjusted signals of the one or more remaining channels. That is, if the previous frame of the current frame is a switching frame, the decoder adjusts the signals of the remaining channels of the current frame according to the remaining-channel gains of that switching frame, so that the auditory quality of the current frame is similar to that of the previous frame and transitions smoothly; see the sketch after this item.
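A minimal sketch of this gain adjustment, assuming the remaining-channel gains of the previous (switching) frame are available as an array; the channel layout and the example gain values are illustrative.

```python
import numpy as np

def gain_adjust_remaining(second_hoa, num_spec_channels, prev_frame_gains):
    """second_hoa: (num_channels, num_samples) analysis-filtered HOA signal.
    The first num_spec_channels rows are the specified channels; the remaining
    rows are scaled by the gains carried over from the previous (switching)
    frame so that the auditory quality transitions smoothly."""
    adjusted = second_hoa.copy()
    gains = np.asarray(prev_frame_gains)[:, None]
    adjusted[num_spec_channels:] *= gains
    return adjusted

second_hoa = np.random.randn(16, 960)
prev_gains = np.linspace(1.0, 0.8, 12)  # illustrative gains for 12 remaining channels
out = gain_adjust_remaining(second_hoa, num_spec_channels=4, prev_frame_gains=prev_gains)
print(out.shape)  # (16, 960)
```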
  • In a possible implementation, the specified channel includes the first-order ambisonics (FOA) channels.
  • In a possible implementation, the specified channel is consistent with the channels preset in the first decoding scheme.
  • In a possible implementation, the decoding scheme of the previous frame of the current frame is the second decoding scheme; the decoding end performing alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame includes: performing circular buffering on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame. That is, if the decoding scheme of the current frame is the second decoding scheme and the previous frame of the current frame is a non-switching frame, the decoding end may also implement delay alignment through circular buffering.
  • In a possible implementation, the decoding end performing circular buffering on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame includes: acquiring first data, where the first data is the data in the HOA signal of the previous frame of the current frame that lies between a first moment and the end moment of the previous frame's HOA signal, the duration between the first moment and the end moment is a first duration, and the first duration is equal to the coding delay difference between the first decoding scheme and the second decoding scheme; and merging the first data and second data to obtain the reconstructed HOA signal of the current frame, where the second data is the data in the reconstructed first HOA signal that lies between the start moment of the reconstructed first HOA signal and a second moment, the duration between the start moment and the second moment is a second duration, and the sum of the first duration and the second duration is equal to the frame length of the current frame. In other words, the circular buffering process achieves delay alignment by buffering data.
  • In a possible implementation, the method further includes: buffering third data, where the third data is the data in the reconstructed first HOA signal other than the second data. That is, the third data is buffered for decoding of the next frame of the current frame; a sketch of this buffering is given below.
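A minimal sketch of the circular buffering described above: the tail kept from the previous frame (first data) is concatenated with the head of the newly reconstructed HOA signal (second data) to form the output frame, and the unused tail (third data) is kept for the next frame. The buffer length, which corresponds to the delay difference between the two schemes, is an illustrative value.

```python
import numpy as np

class DelayAlignBuffer:
    """Delay alignment by data buffering (circular buffering)."""

    def __init__(self, num_channels, delay_samples):
        # delay_samples: coding delay difference between the first and second
        # decoding schemes, in samples (illustrative value used below).
        self.buffer = np.zeros((num_channels, delay_samples))  # "first data" for frame 0

    def process(self, reconstructed_hoa):
        frame_len = reconstructed_hoa.shape[1]
        delay = self.buffer.shape[1]
        first_data = self.buffer                              # tail of the previous frame
        second_data = reconstructed_hoa[:, :frame_len - delay]
        third_data = reconstructed_hoa[:, frame_len - delay:]
        self.buffer = third_data                              # kept for the next frame
        return np.concatenate([first_data, second_data], axis=1)

aligner = DelayAlignBuffer(num_channels=16, delay_samples=192)  # e.g. 4 ms at 48 kHz
frame = np.random.randn(16, 960)
out = aligner.process(frame)
print(out.shape)  # (16, 960): first data (192 samples) + second data (768 samples)
```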
  • In a second aspect, a decoding device is provided, which has the function of implementing the behavior of the decoding method in the first aspect above.
  • the decoding device includes one or more modules, and the one or more modules are used to implement the decoding method provided in the first aspect above.
  • a first determination module, used to determine the decoding scheme of the current frame according to the code stream, where the decoding scheme of the current frame is the first decoding scheme or a non-first decoding scheme, and the first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio coding (DirAC);
  • a first decoding module, used to reconstruct, if the decoding scheme of the current frame is the first decoding scheme, a first audio signal from the code stream according to the first decoding scheme, where the reconstructed first audio signal is the reconstructed HOA signal of the current frame;
  • a second decoding module, configured to reconstruct, if the decoding scheme of the current frame is not the first decoding scheme, a second audio signal from the code stream according to the non-first decoding scheme, and to perform alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame, where the alignment processing makes the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme.
  • the non-first decoding scheme is a second decoding scheme or a third decoding scheme
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme
  • the second decoding module includes:
  • a first reconstruction submodule, used to reconstruct, if the decoding scheme of the current frame is the third decoding scheme, the signal of a specified channel according to the code stream, where the reconstructed signal of the specified channel is the reconstructed second audio signal, and the specified channel is a subset of all channels of the HOA signal of the current frame.
  • the second decoding module includes:
  • an analysis filtering submodule, used to perform analysis filtering on the reconstructed signal of the specified channel;
  • the first determination submodule is used to determine the gain of one or more remaining channels in the HOA signal of the current frame except for the specified channel based on the analyzed and filtered signal of the specified channel;
  • the second determining submodule is used to determine the signal of one or more remaining channels based on the gain of the one or more remaining channels and the signal of the specified channel after analysis and filtering;
  • the synthesis filter sub-module is configured to perform synthesis filter processing on the signal of the designated channel after analysis and filtering and the signals of the one or more remaining channels, so as to obtain the reconstructed HOA signal of the current frame.
  • the non-first decoding scheme is a second decoding scheme or a third decoding scheme
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme
  • the second decoding module includes:
  • the second reconstruction sub-module is configured to reconstruct the first HOA signal according to the code stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme, and the reconstructed first HOA signal is the reconstructed second audio signal.
  • the second decoding module includes:
  • the analysis-synthesis filter sub-module is configured to perform analysis-synthesis filter processing on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame.
  • the analysis-synthesis filtering submodule is used to:
  • perform analysis filtering on the reconstructed first HOA signal to obtain a second HOA signal;
  • perform gain adjustment on the signals of one or more remaining channels in the second HOA signal to obtain gain-adjusted signals of the one or more remaining channels, where the one or more remaining channels are channels of the HOA signal other than the specified channel;
  • perform synthesis filtering on the signal of the specified channel in the second HOA signal and the gain-adjusted signals of the one or more remaining channels, so as to obtain the reconstructed HOA signal of the current frame.
  • the analysis-synthesis filtering submodule is used to:
  • if the decoding scheme of the previous frame of the current frame is the third decoding scheme, perform gain adjustment on the signals of the one or more remaining channels in the second HOA signal according to the gains of the one or more remaining channels of the previous frame of the current frame, to obtain the gain-adjusted signals of the one or more remaining channels.
  • the specified channel includes a first-order ambisonic FOA channel.
  • the decoding scheme of the previous frame of the current frame is the second decoding scheme
  • the second decoding module includes:
  • the circular buffer submodule is configured to perform circular buffer processing on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame.
  • the circular buffer submodule is used to:
  • acquire first data, where the first data is the data in the HOA signal of the previous frame of the current frame that lies between a first moment and the end moment of the previous frame's HOA signal, the duration between the first moment and the end moment is a first duration, and the first duration is equal to the coding delay difference between the first decoding scheme and the second decoding scheme;
  • merge the first data and second data to obtain the reconstructed HOA signal of the current frame, where the second data is the data in the reconstructed first HOA signal that lies between the start moment of the reconstructed first HOA signal and a second moment, the duration between the start moment and the second moment is a second duration, and the sum of the first duration and the second duration is equal to the frame length of the current frame.
  • The circular buffer submodule is further used to:
  • buffer third data, where the third data is the data in the reconstructed first HOA signal other than the second data.
  • the first determination module includes:
  • the first parsing submodule is used to parse out the value of the switching flag of the current frame from the code stream;
  • a second parsing submodule, used to parse, if the value of the switching flag is the first value, the indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme;
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • a third determining submodule, configured to determine, if the value of the switching flag is the second value, that the decoding scheme of the current frame is the third decoding scheme, where the third decoding scheme is the hybrid decoding scheme.
  • the first determination module includes:
  • the third parsing submodule is used to parse out the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme.
  • the first determination module includes:
  • a fourth parsing submodule, used to parse the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme, and the second decoding scheme is an HOA decoding scheme based on virtual speaker selection;
  • the fourth determining submodule is used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame;
  • a fifth determining submodule, used to determine, if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, that the decoding scheme of the current frame is the third decoding scheme, where the third decoding scheme is the hybrid decoding scheme.
  • In a third aspect, a decoding end device is provided, which includes a processor and a memory, where the memory is used to store a program for executing the decoding method provided in the first aspect above and to store data involved in implementing the decoding method provided in the first aspect above.
  • the processor is configured to execute programs stored in the memory.
  • Optionally, the device may further include a communication bus for establishing a connection between the processor and the memory.
  • In a fourth aspect, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium; when the instructions are run on a computer, the computer is caused to execute the decoding method described in the first aspect above.
  • In a fifth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the decoding method described in the first aspect above.
  • Since the decoding delay of the HOA decoding scheme based on directional audio coding is relatively large, for a current frame encoded by the first encoding scheme it is sufficient to decode the code stream of the current frame according to the first decoding scheme. For a current frame not encoded by the first encoding scheme, the second audio signal is first reconstructed from the code stream and alignment processing is then performed on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame; that is, the decoding delay of the current frame is made consistent with the decoding delay of the first decoding scheme through alignment processing.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an implementation environment of a terminal scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an implementation environment of a transcoding scenario of a wireless or core network device provided in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of an implementation environment of a broadcast television scene provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an implementation environment of a virtual reality streaming scene provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application.
  • FIG. 7 is a flow chart of another encoding method provided by an embodiment of the present application.
  • FIG. 8 is a flow chart of a decoding method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of coding scheme switching provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of decoding of a coding scheme switching provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of decoding of another encoding scheme switching provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of a codec device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes source device 10 , destination device 20 , link 30 and storage device 40 .
  • the source device 10 may generate encoded media data. Therefore, the source device 10 may also be called a media data encoding device.
  • Destination device 20 may decode the encoded media data generated by source device 10 . Accordingly, destination device 20 may also be referred to as a media data decoding device.
  • Link 30 may receive encoded media data generated by source device 10 and may transmit the encoded media data to destination device 20 .
  • the storage device 40 can receive the encoded media data generated by the source device 10, and can store the encoded media data.
  • The destination device 20 can directly obtain the encoded media data from the storage device 40.
  • Alternatively, the storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10, in which case the destination device 20 may access, via streaming or downloading, the encoded media data stored on the storage device 40.
  • Both the source device 10 and the destination device 20 may include one or more processors and a memory coupled to the one or more processors, where the memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer.
  • Both the source device 10 and the destination device 20 may include desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
  • Link 30 may include one or more media or devices capable of transmitting encoded media data from source device 10 to destination device 20 .
  • link 30 may include one or more communication media that enable source device 10 to transmit encoded media data directly to destination device 20 in real-time.
  • the source device 10 may modulate the encoded media data based on a communication standard, such as a wireless communication protocol, etc., and may send the modulated media data to the destination device 20 .
  • the one or more communication media may include wireless and/or wired communication media, for example, the one or more communication media may include radio frequency (radio frequency, RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet), among others.
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, etc., which are not specifically limited in this embodiment of the present application.
  • the storage device 40 may store the received encoded media data sent by the source device 10 , and the destination device 20 may directly obtain the encoded media data from the storage device 40 .
  • the storage device 40 may include any one of a variety of distributed or locally accessed data storage media, for example, any one of the various distributed or locally accessed data storage media may be Hard disk drive, Blu-ray Disc, digital versatile disc (DVD), compact disc read-only memory (CD-ROM), flash memory, volatile or nonvolatile memory, or Any other suitable digital storage medium for storing encoded media data, etc.
  • In a possible implementation, the storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10, and the destination device 20 may access, via streaming or downloading, the media data stored on the storage device 40.
  • the file server may be any type of server capable of storing encoded media data and sending the encoded media data to destination device 20 .
  • the file server may include a network server, a file transfer protocol (file transfer protocol, FTP) server, a network attached storage (network attached storage, NAS) device, or a local disk drive.
  • Destination device 20 may obtain encoded media data over any standard data connection, including an Internet connection.
  • The standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL) or a cable modem), or a combination of both that is suitable for accessing the encoded media data stored on a file server.
  • the transmission of encoded media data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
  • It should be noted that the implementation environment shown in FIG. 1 is only a possible implementation; the technology of the embodiments of the present application is applicable not only to the source device 10 that encodes media data and the destination device 20 that decodes the encoded media data shown in FIG. 1, but also to other devices capable of encoding media data and decoding encoded media data, which is not specifically limited in the embodiments of the present application.
  • the source device 10 includes a data source 120 , an encoder 100 and an output interface 140 .
  • The output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
  • The data source 120 may include an image capture device (e.g., a video camera), an archive containing previously captured media data, a feed interface for receiving media data from a media data content provider, and/or a computer graphics system for generating media data, or a combination of these media data sources.
  • the data source 120 may send media data to the encoder 100, and the encoder 100 may encode the received media data sent by the data source 120 to obtain encoded media data.
  • The encoder 100 may send the encoded media data to the output interface 140.
  • source device 10 sends the encoded media data directly to destination device 20 via output interface 140 .
  • encoded media data may also be stored on storage device 40 for later retrieval by destination device 20 for decoding and/or display.
  • the destination device 20 includes an input interface 240 , a decoder 200 and a display device 220 .
  • input interface 240 includes a receiver and/or a modem.
  • the input interface 240 can receive the encoded media data via the link 30 and/or from the storage device 40, and then send it to the decoder 200, and the decoder 200 can decode the received encoded media data to obtain the decoded media data. media data.
  • the decoder may transmit the decoded media data to the display device 220 .
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20 . In general, the display device 220 displays the decoded media data.
  • the display device 220 can be any type of display device in various types, for example, the display device 220 can be a liquid crystal display (liquid crystal display, LCD), a plasma display, an organic light-emitting diode (organic light-emitting diode, OLED) monitor or other type of display device.
  • The encoder 100 and the decoder 200 may each be integrated with a corresponding audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software for encoding both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as user datagram protocol (UDP), if applicable.
  • Each of the encoder 100 and the decoder 200 may be any one of the following circuits: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of the present application are implemented partially in software, the device may store instructions for the software in a suitable non-transitory computer-readable storage medium, and may use one or more processors to execute the instructions in hardware so as to implement the techniques of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered to be one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • Embodiments of the present application may generally refer to the encoder 100 as “signaling” or “sending” certain information to another device such as the decoder 200 .
  • In some embodiments, the term "signaling" or "sending" may generally refer to the transmission of syntax elements and/or other data used to decode the compressed media data. This transfer can occur in real time or near real time. Alternatively, this communication may occur after a period of time, for example, when, at encoding time, syntax elements are stored in an encoded bitstream to a computer-readable storage medium; the decoding device may then retrieve the syntax elements at any time after they are stored on this medium.
  • the encoding and decoding methods provided in the embodiments of the present application can be applied to various scenarios. Next, several scenarios will be introduced by taking the media data to be encoded as an HOA signal as an example.
  • FIG. 2 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a terminal scenario.
  • the implementation environment includes a first terminal 101 and a second terminal 201 , and the first terminal 101 and the second terminal 201 are connected in communication.
  • the communication connection may be a wireless connection or a wired connection, which is not limited in this embodiment of the present application.
  • the first terminal 101 may be a sending end device or a receiving end device.
  • the second terminal 201 may be a receiving end device or a sending end device.
  • the first terminal 101 is a sending end device
  • the second terminal 201 is a receiving end device
  • the first terminal 101 is a receiving end device
  • the second terminal 201 is a sending end device.
  • Both the first terminal 101 and the second terminal 201 include an audio collection module, an audio playback module, an encoder, a decoder, a channel encoding module and a channel decoding module.
  • the encoder is a three-dimensional audio encoder
  • the decoder is a three-dimensional audio decoder.
  • the audio collection module in the first terminal 101 collects the HOA signal and transmits it to the encoder.
  • the encoder encodes the HOA signal using the encoding method provided in the embodiment of the present application.
  • This encoding may be called source encoding. Then, in order to realize transmission of the HOA signal over the channel, the channel coding module performs channel coding, and the resulting code stream is transmitted in a digital channel through wireless or wired network communication equipment.
  • the second terminal 201 receives the code stream transmitted in the digital channel through a wireless or wired network communication device, the channel decoding module performs channel decoding on the code stream, and then the decoder decodes the HOA signal by using the decoding method provided in the embodiment of this application, and then passes the audio Playback module to play.
  • The first terminal 101 and the second terminal 201 can be any electronic product that can interact with the user in one or more ways such as a keyboard, a touch pad, a touch screen, a remote control, voice interaction, or a handwriting device, for example, a personal computer (PC), a mobile phone, a smart phone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart car device, a smart TV, a smart speaker, or the like.
  • FIG. 3 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a transcoding scenario of a wireless or core network device.
  • the implementation environment includes a channel decoding module, an audio decoder, an audio encoder and a channel encoding module.
  • the audio encoder is a three-dimensional audio encoder
  • the audio decoder is a three-dimensional audio decoder.
  • the audio decoder may be a decoder using the decoding method provided in the embodiment of the present application, or may be a decoder using other decoding methods.
  • the audio encoder may be an encoder using the encoding method provided by the embodiment of the present application, or may be an encoder using other encoding methods.
  • In one case, the audio decoder is a decoder using the decoding method provided by the embodiments of the present application, and the audio encoder is an encoder using another encoding method.
  • In this case, the channel decoding module performs channel decoding on the received code stream, the audio decoder then performs source decoding using the decoding method provided by the embodiments of the present application, and the audio encoder then performs encoding according to the other encoding method, thereby realizing a conversion from one format to another, i.e., transcoding. The result is then sent after channel coding.
  • In the other case, the audio decoder is a decoder using another decoding method, and the audio encoder is an encoder using the encoding method provided by the embodiments of the present application.
  • In this case, the channel decoding module performs channel decoding on the received code stream, the audio decoder then performs source decoding using the other decoding method, and the audio encoder then performs encoding using the encoding method provided by the embodiments of the present application, thereby realizing a conversion from one format to another, i.e., transcoding. The result is then sent after channel coding.
  • the wireless device may be a wireless access point, a wireless router, a wireless connector, and the like.
  • a core network device may be a mobility management entity, a gateway, and the like.
  • FIG. 4 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a broadcast television scene.
  • the broadcast TV scene is divided into a live scene and a post-production scene.
  • the implementation environment includes a live program 3D sound production module, a 3D sound encoding module, a set-top box and a speaker group, and the set-top box includes a 3D sound decoding module.
  • the implementation environment includes post-program 3D sound production modules, 3D sound coding modules, network receivers, mobile terminals, earphones, and the like.
  • In the live scene, the 3D sound production module of the live program produces a 3D sound signal (such as an HOA signal); a code stream is obtained by applying the encoding method of the embodiments of the present application to the 3D sound signal; the code stream is transmitted to the user side through the radio and television network; and the 3D sound decoder in the set-top box decodes the code stream using the decoding method provided by the embodiments of the present application, thereby reconstructing the 3D sound signal, which is played back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the network receiver decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the mobile terminal decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back through the earphone.
  • In the post-production scene, the post-program 3D sound production module produces a 3D sound signal, and a code stream is obtained by applying the encoding method of the embodiments of the present application to the 3D sound signal. The code stream is transmitted to the user side, and the 3D sound decoder decodes the code stream using the decoding method provided by the embodiments of the present application, so as to reconstruct the 3D sound signal, which is played back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the network receiver decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the mobile terminal decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back through the earphone.
  • FIG. 5 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a virtual reality streaming scene.
  • the implementation environment includes an encoding end and a decoding end.
  • the encoding end includes an acquisition module, a preprocessing module, an encoding module, a packaging module and a sending module
  • the decoding end includes an unpacking module, a decoding module, a rendering module and earphones.
  • the acquisition module collects the HOA signal, and then preprocesses the HOA signal through the preprocessing module.
  • The preprocessing operation includes filtering out the low-frequency part of the HOA signal, usually using 20 Hz or 50 Hz as the cut-off frequency, extracting the orientation information in the HOA signal, and the like.
  • The encoding module then performs encoding using the encoding method provided by the embodiments of the present application. After encoding, the packing module packs the result, which is sent to the decoding end through the sending module.
  • The unpacking module at the decoding end first unpacks the data, the decoding module then decodes it using the decoding method provided by the embodiments of the present application, the rendering module then performs binaural rendering on the decoded signal, and the rendered signal is mapped onto the listener's earphones.
  • the earphone can be an independent earphone, or an earphone on a virtual reality glasses device.
  • any of the following encoding methods may be executed by the encoder 100 in the source device 10 .
  • Any of the following decoding methods may be performed by the decoder 200 in the destination device 20 .
  • FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application, and the encoding method is applied to an encoding end. Please refer to FIG. 6 , the method includes the following steps.
  • Step 601 Determine the coding scheme of the current frame according to the HOA signal of the current frame.
  • the encoder performs encoding frame by frame.
  • the HOA signal of the audio frame is an audio signal obtained through the HOA acquisition technology.
  • the HOA signal is a scene audio signal and also a three-dimensional audio signal.
  • In some embodiments, the HOA signal refers to an audio signal obtained by capturing the sound field at the position of a microphone in space.
  • The captured audio signal is called the original HOA signal.
  • the HOA signal of the audio frame may also be an HOA signal obtained by converting a 3D audio signal in another format. For example, convert a 5.1-channel signal into an HOA signal, or convert a 3D audio signal mixed with a 5.1-channel signal and object audio into an HOA signal.
  • the HOA signal of the audio frame to be encoded is a time-domain signal or a frequency-domain signal, and may include all channels of the HOA signal, or may include some channels of the HOA signal.
  • For example, if the order of the HOA signal of the audio frame is 3, the number of channels of the HOA signal is 16, the frame length of the audio frame is 20 ms, and the sampling rate is 48 kHz, then the HOA signal of the audio frame to be encoded contains signals of 16 channels, and each channel contains 960 sampling points.
  • In some embodiments, the encoder can down-sample the original HOA signal to obtain the HOA signal of the audio frame. For example, the encoder performs 1/Q down-sampling on the original HOA signal to reduce the number of sampling points or frequency points of the HOA signal to be encoded. For example, in an embodiment of the present application, each channel of the original HOA signal contains 960 sampling points; after 1/120 down-sampling, each channel of the HOA signal to be encoded contains 8 sampling points, as in the sketch below.
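A small sketch of the numbers in this example: an HOA signal of order N has (N+1)^2 channels, a 20 ms frame at 48 kHz has 960 samples per channel, and 1/120 down-sampling leaves 8 samples per channel. The decimation-by-stride below is only an illustration of 1/Q down-sampling; a real encoder would typically apply anti-alias filtering first.

```python
import numpy as np

order = 3
num_channels = (order + 1) ** 2                        # 16 channels for 3rd-order HOA
frame_len_s, sample_rate = 0.020, 48000
samples_per_channel = int(frame_len_s * sample_rate)   # 960 samples per channel

original_hoa = np.random.randn(num_channels, samples_per_channel)

Q = 120
downsampled = original_hoa[:, ::Q]                     # naive 1/Q down-sampling
print(num_channels, samples_per_channel, downsampled.shape[1])   # 16 960 8
```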
  • the encoding method of the encoding end is introduced by taking the encoding end encoding the current frame as an example.
  • the current frame is an audio frame to be encoded. That is, the encoding end acquires the HOA signal of the current frame, and encodes the HOA signal of the current frame by using the encoding method provided in the embodiment of the present application.
  • In some embodiments, the encoding end first determines the initial encoding scheme of the current frame according to the sound field type of the current frame, where the initial encoding scheme is the first encoding scheme or the second encoding scheme. The encoding end then judges whether the first encoding scheme, the second encoding scheme, or the third encoding scheme is used to encode the HOA signal of the current frame by comparing the initial encoding scheme of the current frame with the initial encoding scheme of the previous frame of the current frame.
  • If the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame, the encoding end encodes the HOA signal of the current frame with the encoding scheme consistent with the initial encoding scheme of the current frame. If the initial encoding scheme of the current frame is different from the initial encoding scheme of the previous frame of the current frame, the encoding end uses the third encoding scheme to encode the HOA signal of the current frame.
  • the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme.
  • the first coding scheme is the HOA coding scheme based on DirAC
  • the second coding scheme is the HOA coding scheme based on virtual speaker selection
  • the third coding scheme is a hybrid coding scheme.
  • the hybrid coding scheme is also referred to as a switched frame coding scheme.
  • the third coding scheme is a switching frame coding scheme provided by the embodiment of the present application.
  • the switching frame coding scheme is for smooth transition of auditory quality when switching between different codec schemes.
  • the embodiment of the present application will introduce these three encoding schemes in detail below.
  • the HOA coding scheme based on virtual speaker selection is also referred to as the MP-based HOA coding scheme.
  • the coding end determines the initial coding scheme of the current frame according to the HOA signal of the current frame. Then, the encoding end determines the encoding scheme of the current frame based on the initial encoding scheme of the current frame and the initial encoding scheme of the previous frame of the current frame. It should be noted that this embodiment of the present application does not limit the implementation manner in which the encoding end determines the initial encoding scheme.
  • the coding end analyzes the sound field type of the HOA signal of the current frame to obtain the sound field classification result of the current frame, and determines the initial coding scheme of the current frame based on the sound field classification result of the current frame. It should be noted that the embodiment of the present application does not limit the method of sound field type analysis, for example, the encoding end performs singular value decomposition on the HOA signal of the current frame to perform sound field type analysis.
  • the sound field classification result includes the number of dissimilar sound sources, and this embodiment of the present application does not limit the method for determining the number of dissimilar sound sources.
• After determining the number of dissimilar sound sources corresponding to the current frame, if the number of dissimilar sound sources corresponding to the current frame is greater than the first threshold and smaller than the second threshold, the encoder determines that the initial encoding scheme of the current frame is the second encoding scheme. If the number of dissimilar sound sources corresponding to the current frame is not greater than the first threshold or not less than the second threshold, the encoder determines that the initial encoding scheme of the current frame is the first encoding scheme.
  • the first threshold is smaller than the second threshold.
  • the first threshold is 0 or other values
  • the second threshold is 3 or other values.
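• As an illustration only, this decision could be sketched as follows (a minimal Python sketch; the threshold values 0 and 3 and the scheme labels are example values taken from the description above, not a normative implementation):

    # Sketch of the initial-coding-scheme decision described above.
    # FIRST_THRESHOLD and SECOND_THRESHOLD are example values; the actual
    # thresholds are not limited by this document.
    FIRST_THRESHOLD = 0
    SECOND_THRESHOLD = 3

    FIRST_SCHEME = "DirAC"   # first coding scheme (DirAC-based HOA coding)
    SECOND_SCHEME = "MP"     # second coding scheme (virtual-speaker / MP-based)

    def initial_coding_scheme(num_dissimilar_sources: int) -> str:
        # Second scheme only when the count lies strictly between the thresholds.
        if FIRST_THRESHOLD < num_dissimilar_sources < SECOND_THRESHOLD:
            return SECOND_SCHEME
        return FIRST_SCHEME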
• If the initial encoding scheme were determined directly in this way, the initial encoding scheme of each audio frame might switch back and forth, so that there would be more switching frames that need to be encoded in the end, which is undesirable due to the difference between the encoding schemes.
  • the switching frame refers to an audio frame whose initial encoding scheme is different from that of the previous frame.
  • the encoding end may first determine the expected encoding scheme of the current frame according to the sound field classification result of the current frame, that is, the encoding end uses the initial encoding scheme determined according to the aforementioned method as the expected encoding scheme. Then, the encoder uses a sliding window method to update the initial encoding scheme of the current frame based on the expected encoding scheme of the current frame, for example, the encoder updates the initial encoding scheme of the current frame through hangover processing.
  • the sliding window includes the predicted coding scheme of the current frame and the updated initial coding scheme of the previous N ⁇ 1 frames of the current frame. If the cumulative number of second coding schemes in the sliding window is not less than the first specified threshold, the encoder updates the initial coding scheme of the current frame to the second coding scheme. If the cumulative number of second coding schemes in the sliding window is less than the first specified threshold, the encoder updates the initial coding scheme of the current frame to the first coding scheme.
  • the length N of the sliding window is 8, 10, 15, etc.
  • the first specified threshold is 5, 6, 7, etc. The embodiment of the present application does not limit the length of the sliding window and the value of the first specified threshold.
• An example is as follows: assuming that the length of the sliding window is 10 and the first specified threshold is 7, the sliding window contains the expected coding scheme of the current frame and the updated initial coding schemes of the previous 9 frames of the current frame. If the cumulative number of second coding schemes in the sliding window is not less than 7, the encoding end updates the initial encoding scheme of the current frame to the second encoding scheme. If the cumulative number of second coding schemes in the sliding window is less than 7, the encoding end updates the initial coding scheme of the current frame to the first coding scheme.
• Alternatively, if the cumulative number of first coding schemes in the sliding window is not less than a second specified threshold, the encoder updates the initial coding scheme of the current frame to the first coding scheme. If the cumulative number of first coding schemes in the sliding window is less than the second specified threshold, the encoder updates the initial coding scheme of the current frame to the second coding scheme.
  • the second designated threshold value is 5, 6, 7 and other values, and the embodiment of the present application does not limit the value of the second designated threshold value.
  • the second specified threshold is different from or the same as the above-mentioned first specified threshold.
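• A minimal sketch of this sliding-window (hangover) update might look as follows (illustrative only; the window length 10, the threshold 7 and the scheme labels are example values, and only the rule based on the first specified threshold is shown):

    from collections import deque

    # Sketch of the sliding-window (hangover) update described above.
    # N and FIRST_SPECIFIED_THRESHOLD are example values (e.g. 10 and 7).
    N = 10
    FIRST_SPECIFIED_THRESHOLD = 7

    # Holds the updated initial schemes of the previous N-1 frames plus the
    # expected scheme of the current frame.
    window = deque(maxlen=N)

    def update_initial_scheme(expected_scheme: str) -> str:
        window.append(expected_scheme)
        if sum(1 for s in window if s == "MP") >= FIRST_SPECIFIED_THRESHOLD:
            updated = "MP"       # second coding scheme
        else:
            updated = "DirAC"    # first coding scheme
        # Keep the updated scheme of the current frame in the window for later frames.
        window[-1] = updated
        return updated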
• The encoder can also use other methods to obtain the sound field classification result of the current frame, and the method of determining the initial coding scheme based on the sound field classification result can also be another method; this is not limited in the embodiment of the present application.
• After the encoding end determines the initial encoding scheme of the current frame, if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame, the encoding end determines that the encoding scheme of the current frame is the initial encoding scheme of the current frame. If the initial encoding scheme of the current frame is different from the initial encoding scheme of the frame preceding the current frame, the encoder determines that the encoding scheme of the current frame is the third encoding scheme.
• If the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame and both are the first coding scheme, the encoder determines that the coding scheme of the current frame is the first coding scheme. If the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame and both are the second coding scheme, the encoder determines that the coding scheme of the current frame is the second coding scheme. If one of the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame is the first coding scheme, and the other is the second coding scheme, the encoder determines that the coding scheme of the current frame is the third coding scheme.
• That one of the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame is the first coding scheme and the other is the second coding scheme means that either the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame of the current frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame of the current frame is the first coding scheme. That is, for a switching frame, the encoding end adopts neither the first coding scheme nor the second coding scheme to encode the HOA signal of the switching frame, but uses the switching frame coding scheme to encode the HOA signal of the switching frame.
• For a non-switching frame, the coding end uses a coding scheme consistent with the initial coding scheme of the non-switching frame to encode the HOA signal of the non-switching frame.
  • an audio frame whose initial coding scheme is different from that of the previous frame is a switching frame
  • an audio frame whose initial coding scheme is the same as that of the previous frame is a non-switching frame.
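• A minimal sketch of how the coding scheme of the current frame follows from the two initial coding schemes (illustrative only; the scheme labels are placeholders):

    def coding_scheme(initial_curr: str, initial_prev: str) -> str:
        # Non-switching frame: keep the initial scheme of the current frame.
        if initial_curr == initial_prev:
            return initial_curr      # first or second coding scheme
        # Switching frame: use the third (switching-frame / MP-W) coding scheme.
        return "MP-W"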
• In addition to determining the encoding scheme of the current frame, the encoding end also needs to encode information that can indicate the encoding scheme of the current frame into the code stream, so that the decoding end can determine which decoding scheme to use to decode the code stream of the current frame.
• There are many ways for the encoding end to encode information capable of indicating the encoding scheme of the current frame into the code stream, and three implementation ways will be introduced next.
• The first implementation: encode the switching flag and the indication information of the two coding schemes.
  • the encoder needs to determine the value of the switching flag of the current frame, and encode the value of the switching flag of the current frame into the code stream.
• If the current frame is a non-switching frame, the value of the switching flag of the current frame is the first value.
• If the current frame is a switching frame, the value of the switching flag of the current frame is the second value.
• For example, the first value is "0" and the second value is "1"; the first value and the second value may also be other values.
• If the value of the switching flag of the current frame is the first value, the encoding end encodes the indication information of the initial encoding scheme of the current frame into the code stream; if the value of the switching flag of the current frame is the second value, the encoding end encodes the preset indication information into the code stream.
  • the indication information of the initial coding scheme is represented by a coding mode (coding mode) corresponding to the initial coding scheme, that is, the coding mode is used as the indication information.
  • the coding mode corresponding to the initial coding scheme is the initial coding mode
• The initial coding mode is the first coding mode (that is, the DirAC coding mode, that is, the DirAC coding scheme) or the second coding mode (that is, the MP coding mode, that is, the MP coding scheme).
  • the preset indication information is a preset encoding mode
  • the preset encoding mode is a first encoding mode or a second encoding mode.
• The preset indication information may also be another coding mode; that is, the embodiment of the present application does not limit the specific indication information of the coding scheme of the switching frame encoded into the code stream.
• Since the encoding end uses the switching flag to indicate the switching frame, the indication information of the coding scheme of the switching frame encoded into the code stream is not limited: it may be the initial encoding mode of the switching frame, it may be a preset encoding mode, or it may be randomly selected from the first encoding mode and the second encoding mode.
  • the switching flag is used to indicate whether the current frame is a switching frame, so that the decoder can directly determine whether the current frame is a switching frame by obtaining the switching flag in the code stream.
  • the switching flag of the current frame and the indication information of the initial coding scheme each occupy one bit of the code stream.
  • the value of the switching flag of the current frame is "0" or “1”, wherein the value of the switching flag is "0" indicating that the current frame is not a switching frame, that is, the value of the switching flag of the current frame is the first value.
  • the value of the switching flag is "1" indicating that the current frame is a switching frame, that is, the value of the switching flag of the current frame is the second value.
  • the indication information of the initial coding scheme is "0” or "1", wherein "0" indicates the DirAC mode, and "1" indicates the MP mode.
• If the current frame is a switching frame, the encoding end determines that the value of the switching flag of the current frame is the second value, and encodes the value of the switching flag of the current frame into the code stream. That is, for the switching frame, since the switching flag in the code stream can indicate the switching frame, there is no need to encode the indication information of the coding scheme of the switching frame.
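• The first implementation could be sketched on the encoder side as follows (illustrative only; the bit values follow the "0"/"1" example above, the scheme labels are placeholders, and the variant in which no scheme indication is written for a switching frame is shown):

    def write_scheme_info_first_impl(bits: list, is_switching_frame: bool,
                                     initial_scheme: str) -> None:
        # Switching flag: 0 = non-switching frame (first value),
        #                 1 = switching frame (second value).
        if not is_switching_frame:
            bits.append(0)
            # One more bit for the initial coding scheme: 0 = DirAC, 1 = MP.
            bits.append(0 if initial_scheme == "DirAC" else 1)
        else:
            bits.append(1)
            # No scheme indication is needed for a switching frame; a preset
            # indication bit could optionally be written here instead.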
• The second implementation: encode the indication information of the two encoding schemes.
  • the encoding end encodes the indication information of the initial encoding scheme of the current frame into the code stream.
  • the indication information coded into the code stream is substantially the coding mode consistent with the initial coding scheme, that is, the DirAC mode or the MP mode.
  • the indication information of the initial encoding scheme occupies one bit of the code stream.
  • the indication information is "0" or "1", wherein "0" indicates the DirAC mode, indicating that the initial encoding scheme of the current frame is the first encoding scheme, and "1" indicates MP mode, indicating that the initial encoding scheme of the current frame is the second encoding scheme.
• The third implementation: encode the indication information of the three encoding schemes.
  • the encoding end encodes the indication information of the encoding scheme of the current frame into the code stream.
  • the indication information coded into the code stream is substantially the coding mode consistent with the coding scheme of the current frame, that is, DirAC mode, MP mode or MP-W mode.
  • the MP-W mode is a coding mode corresponding to the switching frame coding scheme.
  • the indication information is MP-W mode, it indicates that the current frame is a switching frame, and if the indication information is DirAC mode or MP mode, it indicates that the current frame is a non-switching frame.
  • the indication information of the coding scheme of the current frame occupies two bits of the code stream.
• For example, the indication information encoded into the code stream is "00", "01" or "10".
  • "00" indicates that the encoding scheme of the current frame is the first encoding scheme
  • "01” indicates that the encoding scheme of the current frame is the second encoding scheme
  • "10" indicates that the encoding scheme of the current frame is the third encoding scheme.
• Step 602: If the coding scheme of the current frame is the third coding scheme, encode the signal of the designated channel in the HOA signal into the code stream, where the designated channel is a part of all the channels of the HOA signal.
  • the third encoding scheme indicates that only the signal of the specified channel in the HOA signal of the current frame is encoded into the code stream.
• The specified channel is a part of all channels of the HOA signal. That is to say, for the switching frame, the encoder encodes the signal of the specified channel in the HOA signal of the switching frame into the code stream instead of using the first coding scheme or the second coding scheme to encode the switching frame; in other words, this scheme uses a compromise method to encode switching frames so that the auditory quality transitions smoothly when coding schemes are switched.
• The designated channel is consistent with a preset transmission channel in the first encoding scheme, that is, the designated channel is a preset channel. That is to say, under the premise that the third coding scheme is different from the second coding scheme, in order to make the coding effect of the third coding scheme close to that of the first coding scheme, the coding end encodes, into the code stream, the signal of the channel in the HOA signal of the switching frame that is the same as the preset transmission channel in the first coding scheme, so that the auditory quality transitions as smoothly as possible when the encoding scheme is switched.
  • different transmission channels can be preset according to different encoding bandwidths, bit rates, and even application scenarios.
  • the preset transmission channels may also be the same.
• There are many ways for the encoding end to encode the signal of the specified channel in the HOA signal into the code stream; it only needs to encode the signal of the specified channel into the code stream, and this is not limited in the embodiment of the present application.
  • the signals of the specified channel include FOA signals
  • the FOA signals include omnidirectional W signals, and directional X signals, Y signals, and Z signals.
  • the specified channel includes the FOA channel
• The signal of the FOA channel is a low-order signal; that is, if the current frame is a switching frame, the encoder only encodes the low-order part of the HOA signal of the current frame into the code stream, and the low-order part includes the W signal, X signal, Y signal and Z signal of the FOA channel.
  • the encoding end determines a virtual speaker signal and a residual signal based on the signal of the specified channel, and encodes the virtual speaker signal and residual signal into a code stream.
• For example, the encoder determines the W signal as the virtual speaker signal, and determines the difference signals between the X signal, Y signal and Z signal and the W signal as the three residual signals; alternatively, the X signal, the Y signal and the Z signal are directly determined as the three residual signals.
• The encoding end encodes the one virtual speaker signal and the three residual signals into the code stream through the core encoder.
  • the core encoder is a stereo encoder or a mono encoder.
  • the switching frame coding scheme may also be referred to as an MP-W-based coding scheme.
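• A minimal sketch of how the switching-frame (MP-W) coding scheme could form the virtual speaker signal and the residual signals from the FOA channels (illustrative only; it assumes the W, X, Y and Z signals have already been taken from the HOA signal of the switching frame, and shows both variants mentioned above):

    from typing import List, Tuple

    def mpw_virtual_speaker_and_residuals(
            w: List[float], x: List[float], y: List[float], z: List[float],
            use_difference: bool = True) -> Tuple[List[float], List[List[float]]]:
        # The omnidirectional W signal is used as the single virtual speaker
        # signal; the three residual signals are either the differences
        # X-W, Y-W, Z-W or the X, Y, Z signals themselves.
        virtual_speaker = list(w)
        if use_difference:
            residuals = [[a - b for a, b in zip(ch, w)] for ch in (x, y, z)]
        else:
            residuals = [list(x), list(y), list(z)]
        return virtual_speaker, residuals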
  • the process of encoding the current frame at the encoding end will be introduced.
• If the encoding scheme of the current frame is the first encoding scheme, the encoding end encodes the HOA signal of the current frame into the code stream according to the first encoding scheme. If the encoding scheme of the current frame is the second encoding scheme, the encoding end encodes the HOA signal of the current frame into the code stream according to the second encoding scheme. That is, if the current frame is not a switching frame, the encoding end uses the initial encoding scheme of the current frame to encode the current frame.
• The implementation process of encoding the HOA signal of the current frame into the code stream by the encoding end according to the first encoding scheme is as follows: the encoding end extracts the core layer signal and spatial parameters from the HOA signal of the current frame, and encodes the extracted core layer signal and spatial parameters into the code stream.
• For example, the encoder extracts the core layer signal from the HOA signal of the current frame through the core coded signal acquisition module, extracts the spatial parameters from the HOA signal of the current frame through the DirAC-based spatial parameter extraction module, encodes the core layer signal into the code stream through the core encoder, and encodes the spatial parameters into the code stream through the spatial parameter encoder.
  • the channel corresponding to the core layer signal is consistent with the specified channel in this solution.
  • the first encoding scheme not only encodes the core layer signal into the code stream, but also encodes the extracted spatial parameters into the code stream.
  • the spatial parameters contain rich scene information, such as direction information.
  • the switching frame encoding scheme provided by the embodiment of the present application only encodes the signal of the specified channel into the code stream. It can be seen that, for the same frame, the effective information encoded into the code stream by using the DirAC-based HOA encoding scheme is also more than the effective information encoded into the code stream by using the switching frame encoding scheme.
• However, the switching frame coding scheme also encodes into the code stream the signal of the specified channel that is the same as the preset transmission channel in the first coding scheme, while more information in the HOA signal is not encoded into the code stream; that is, the spatial parameters are not extracted and are not encoded into the code stream, so that the auditory quality transitions as smoothly as possible.
• The implementation process of encoding the HOA signal of the current frame into the code stream by the encoding end according to the second encoding scheme is as follows: the encoding end selects, based on the MP algorithm, the target virtual speaker that matches the HOA signal of the current frame from the virtual speaker set; determines the virtual speaker signal through the MP-based spatial encoder based on the HOA signal of the current frame and the target virtual speaker; determines the residual signal through the MP-based spatial encoder based on the HOA signal of the current frame and the virtual speaker signal; and encodes the virtual speaker signal and the residual signal into the code stream through the core encoder.
  • the principles and specific methods of determining the virtual loudspeaker signal and residual signal in the MP-based HOA coding scheme are different from those in the switching frame coding scheme, and the virtual loudspeaker signal and residual signal determined by the two schemes are also different.
  • the effective information encoded into the code stream by using the MP-based HOA coding scheme will be more than that by switching the frame coding scheme.
• In addition, under the premise that the switching frame coding scheme is different from the second coding scheme, in order to make the coding effect of the switching frame coding scheme close to that of the first coding scheme, the switching frame coding scheme also adopts the method of encoding a virtual speaker signal and residual signals, so that the auditory quality transitions as smoothly as possible.
  • Fig. 7 is a flow chart of another encoding method provided by the embodiment of the present application. Please refer to FIG. 7 , taking the example of encoding the indication information of the initial encoding scheme of the current frame into the code stream, the encoding method provided by the embodiment of the present application is explained again.
  • the encoder first acquires the HOA signal of the current frame to be encoded. Then, the encoding end analyzes the sound field type of the HOA signal to determine the initial encoding scheme of the current frame. The encoding end judges whether the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame.
• If the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame, the encoding end uses the initial encoding scheme of the current frame to encode the HOA signal of the current frame to obtain the code stream of the current frame. If the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame of the current frame, the encoding end uses the switching frame coding scheme to encode the HOA signal of the current frame to obtain the code stream of the current frame.
• The initial encoding scheme of the current frame is the first encoding scheme or the second encoding scheme; that is, the encoder adopts the initial encoding scheme of the current frame to encode the HOA signal of the current frame into the bitstream.
• In the embodiment of the present application, the HOA signal of the audio frame is encoded and decoded by combining two schemes (namely, a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding); that is, an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • Fig. 8 is a flowchart of a decoding method provided by an embodiment of the present application, and the method is applied to a decoding end. It should be noted that this decoding method corresponds to the encoding method shown in FIG. 6 . Please refer to FIG. 8 , the method includes the following steps.
• Step 801: Determine the decoding scheme of the current frame according to the code stream, where the decoding scheme of the current frame is either the first decoding scheme or not the first decoding scheme, and the first decoding scheme is the DirAC-based HOA decoding scheme.
• Since the encoding end uses different encoding schemes for encoding different audio frames, the decoding end also needs to use a corresponding decoding scheme to decode each audio frame. Next, how the decoding end determines the decoding scheme of the current frame is introduced.
• In step 601 of the encoding method shown in FIG. 6, three implementations are introduced in which the encoding end encodes information that can be used to indicate the encoding scheme of the current frame into the code stream. Correspondingly, how the decoding end determines the decoding scheme of the current frame in each of these implementations will be introduced next.
• The first implementation: encode the switching flag and the indication information of the two encoding schemes.
• The decoder first parses the value of the switching flag of the current frame from the code stream. If the value of the switching flag is the first value, the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme. If the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme. It should be noted that the indication information of the encoding scheme encoded into the code stream by the encoding end is the indication information of the decoding scheme parsed from the code stream by the decoding end.
• If the decoding end parses out that the value of the switching flag of the current frame is the first value, it means that the current frame is a non-switching frame.
• The decoding end then parses the indication information of the decoding scheme from the code stream, and determines the decoding scheme of the current frame based on the indication information. If the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme, and the current frame is a switching frame; in this case, even if the code stream contains indication information, the decoding end does not need to parse the indication information.
  • the third decoding scheme is a hybrid decoding scheme, that is, a switching frame decoding scheme.
• If the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme, and the current frame is a switching frame. The switching frame decoding scheme is different from the first decoding scheme and the second decoding scheme; it is a hybrid decoding scheme of the two decoding schemes, and the switching frame decoding scheme is intended for smooth transition of auditory quality and delay alignment.
  • the indication information of the decoding scheme and the switching flag each occupy one bit of the code stream.
• For example, the decoder first parses the value of the switching flag of the current frame from the code stream. If the parsed value of the switching flag is "0", that is, the value of the switching flag is the first value, the decoding end then parses the indication information of the decoding scheme of the current frame from the code stream; if the parsed indication information is "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme, and if the parsed indication information is "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed value of the switching flag is "1", that is, the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme, that is, the third decoding scheme.
• Optionally, the decoder can determine the switching state of the current frame based on the switching flag of the current frame and the decoding scheme of the previous frame of the current frame. For example, if the value of the switching flag of the current frame is the second value and the decoding scheme of the previous frame of the current frame is the first decoding scheme, the decoding end determines that the switching state of the current frame is the first switching state, where the first switching state refers to the state of switching from the DirAC-based HOA decoding scheme to the MP-based HOA decoding scheme. If the value of the switching flag of the current frame is the second value and the decoding scheme of the previous frame of the current frame is the second decoding scheme, the decoding end determines that the switching state of the current frame is the second switching state, where the second switching state refers to the state of switching from the MP-based HOA decoding scheme to the DirAC-based HOA decoding scheme.
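• On the decoder side, the first implementation could be sketched as follows (illustrative only; read_bit is an assumed helper that returns the next bit parsed from the code stream, and the scheme labels are placeholders):

    def parse_decoding_scheme_first_impl(read_bit) -> str:
        # read_bit is a hypothetical callable returning the next bit (0 or 1).
        switching_flag = read_bit()
        if switching_flag == 1:          # second value: the current frame is a switching frame
            return "MP-W"                # third (switching-frame) decoding scheme
        # First value: read one more bit indicating the decoding scheme.
        return "DirAC" if read_bit() == 0 else "MP"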
• The second implementation: encode the indication information of the two encoding schemes.
  • the decoding end parses out the initial decoding scheme of the current frame from the code stream, and the initial decoding scheme is the first decoding scheme or the second decoding scheme. If the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, it is determined that the decoding scheme of the current frame is the initial decoding scheme of the current frame. If the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame, it is determined that the decoding scheme of the current frame is a third decoding scheme, that is, a hybrid decoding scheme.
  • the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame means that the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme , or, the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme. That is, one of the initial decoding scheme of the current frame and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, and the other is the second decoding scheme.
• In this implementation, the indication information parsed from the code stream is called a coding mode.
• If the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame, it means that the current frame is a switching frame. If the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, it means that the current frame is a non-switching frame.
  • the indication information used to indicate the initial decoding scheme occupies one bit of the code stream.
  • the coding mode in the code stream occupies one bit.
• For example, the decoding end parses the indication information of the current frame from the code stream. If the parsed indication information is "0" and the indication information of the previous frame of the current frame is also "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1" and the indication information of the previous frame of the current frame is also "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme.
• If the parsed indication information of the current frame is different from the indication information of the previous frame of the current frame, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme.
  • the indication information of the initial decoding scheme of the previous frame of the current frame is cached data.
  • the decoding end may acquire the indication information of the initial decoding scheme of the previous frame of the current frame from the cache.
• Optionally, the decoding end can determine the switching state of the current frame based on the initial decoding scheme of the previous frame of the current frame. For example, if the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, the decoding end determines that the switching state of the current frame is the first switching state, where the first switching state refers to the state of switching from the DirAC-based HOA decoding scheme to the MP-based HOA decoding scheme. If the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, the decoding end determines that the switching state of the current frame is the second switching state, where the second switching state refers to the state of switching from the MP-based HOA decoding scheme to the DirAC-based HOA decoding scheme.
• The third implementation: encode the indication information of the three encoding schemes.
  • the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the decoding end parses out the encoding mode of the current frame from the code stream. If the encoding mode of the current frame is the DirAC mode, the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the coding mode of the current frame is the MP mode, the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the coding mode of the current frame is the MP-W mode, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme.
  • the indication information of the decoding scheme occupies two bits of the code stream.
  • the coding mode of the current frame occupies two bits of the code stream.
  • the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, and if the parsed indication information is "00", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "01”, the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed indication information is "10", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.
• Optionally, the decoding end can determine the switching state of the current frame based on the decoding scheme of the previous frame of the current frame. For example, if the decoding scheme of the previous frame of the current frame is the first decoding scheme, the decoding end determines that the switching state of the current frame is the first switching state, where the first switching state refers to the state of switching from the DirAC-based HOA decoding scheme to the MP-based HOA decoding scheme. If the decoding scheme of the previous frame of the current frame is the second decoding scheme, the decoding end determines that the switching state of the current frame is the second switching state, where the second switching state refers to the state of switching from the MP-based HOA decoding scheme to the DirAC-based HOA decoding scheme.
• Step 802: If the decoding scheme of the current frame is the first decoding scheme, reconstruct the first audio signal from the code stream according to the first decoding scheme, where the reconstructed first audio signal is the reconstructed HOA signal of the current frame.
• For a current frame encoded according to the first encoding scheme, the decoding end uses the first decoding scheme to decode the code stream to obtain the reconstructed HOA signal of the current frame. That is, if the decoding scheme of the current frame is the first decoding scheme, the decoder reconstructs the first audio signal from the code stream according to the first decoding scheme, and the reconstructed first audio signal is the reconstructed HOA signal of the current frame.
• The implementation process in which the decoding end reconstructs the first audio signal from the code stream according to the first decoding scheme is, exemplarily, as follows: the decoding end parses the core layer signal from the code stream through the core decoder, parses the spatial parameters from the code stream through the spatial parameter decoder, and performs DirAC-based HOA signal synthesis processing based on the parsed core layer signal and spatial parameters to reconstruct the first audio signal; the reconstructed first audio signal is the reconstructed HOA signal of the current frame.
• Step 803: If the decoding scheme of the current frame is not the first decoding scheme, reconstruct the second audio signal from the code stream according to this non-first decoding scheme, and perform alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame, where the alignment processing makes the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme.
• It can be seen from the above that, if the decoding scheme of the current frame is the first decoding scheme, the decoding end decodes the code stream according to the first decoding scheme to obtain the reconstructed HOA signal of the current frame without other processing. If the decoding scheme of the current frame is not the first decoding scheme, the decoding end first reconstructs the second audio signal from the code stream, and then needs to perform alignment processing on the second audio signal, or alignment processing based on the second audio signal, to obtain the reconstructed HOA signal of the current frame.
  • the alignment process makes the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme.
  • the decoding delay mentioned in this article is the end-to-end encoding and decoding delay, and the decoding delay can also be regarded as the encoding delay.
• In the embodiment of the present application, the delays of the encoding processes of the three encoding schemes are consistent, and the delays of the decoding processes need to be aligned according to the decoding method provided by the embodiment of the present application.
• The case in which the decoding scheme of the current frame is not the first decoding scheme is divided into two cases, that is, the decoding scheme of the current frame is the second decoding scheme, or the decoding scheme of the current frame is the third decoding scheme; in other words, the non-first decoding scheme is the second decoding scheme or the third decoding scheme.
• If the decoding scheme of the current frame is the third decoding scheme, the decoding end reconstructs the signal of the specified channel according to the code stream, and uses the reconstructed signal of the specified channel as the reconstructed second audio signal.
  • the specified channel is a part of all channels of the HOA signal of the current frame.
  • the decoder performs alignment processing on the reconstructed signals of the specified channel to obtain the reconstructed HOA signal of the current frame.
  • the process of reconstructing the signal of the specified channel according to the code stream at the decoding end is symmetrical to the process of encoding the signal of the specified channel into the code stream at the encoding end, that is, it matches.
  • the encoding end determines the virtual speaker signal and the residual signal based on the signal of the specified channel in the HOA signal of the current frame, and encodes the virtual speaker signal and the residual signal into the code stream.
  • the decoder determines the virtual speaker signal and the residual signal according to the code stream, and then reconstructs the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the decoding end parses the virtual speaker signal and the residual signal from the code stream through a core decoder, and the core decoder may be a stereo decoder or a mono decoder.
• For switching frames, after the decoder reconstructs the signal of the designated channel, it performs analysis filtering on the reconstructed signal of the designated channel, and determines, based on the analysis-filtered signal of the designated channel, the gain of one or more remaining channels of the HOA signal of the current frame other than the designated channel.
  • the decoding end determines the signals of the one or more remaining channels based on the gains of the one or more remaining channels and the analyzed and filtered signals of the specified channel.
  • the decoding end performs synthesis filtering processing on the analyzed and filtered signal of the specified channel and the signals of the one or more remaining channels to obtain the reconstructed HOA signal of the current frame.
  • the alignment process includes reconstruction of signals of each remaining channel and time delay alignment process based on analysis and synthesis filtering.
  • the decoding end increases the decoding delay of the switching frame by analyzing and synthesizing filtering processing, so that the decoding delay of the switching frame is consistent with the decoding delay of the first decoding scheme, and the analyzing and synthesizing filtering process includes analyzing filtering processing and synthesizing filtering processing.
• For example, if the signal of the specified channel is the low-order part of the HOA signal, the decoding end first reconstructs the low-order part of the HOA signal, then performs analysis filtering on the low-order part, and determines the high-order gain of the current frame based on the analysis-filtered low-order part of the HOA signal.
  • the high-order gain includes the gain of each channel included in the high-order part of the HOA signal.
  • the decoding end determines the high-order part of the HOA signal of the current frame based on the analyzed and filtered low-order part and high-order gain of the HOA signal.
  • the decoding end performs synthesis filtering on the low-order part of the analyzed and filtered HOA signal and the high-order part to obtain the reconstructed HOA signal of the current frame. That is, when the signal of the specified channel is the low-order part of the HOA signal of the current frame, the alignment processing corresponding to the switching frame includes reconstruction of the high-order part and delay alignment processing based on analysis and synthesis filtering.
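• A minimal sketch of one possible instantiation of the remaining-channel reconstruction (illustrative only; it assumes each remaining channel is obtained by scaling the analysis-filtered omnidirectional W signal with that channel's gain, which is one simple reading of the description above rather than a prescribed mapping):

    def reconstruct_remaining_channels(filtered_w: list, gains: list) -> list:
        # Hypothetical instantiation of "determine the signals of the remaining
        # channels based on their gains and the analysis-filtered specified-channel
        # signals": scale the filtered W signal by each remaining channel's gain.
        return [[g * s for s in filtered_w] for g in gains]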
  • the designated channel is consistent with a preset transmission channel in the first decoding scheme (or the first codec scheme or the first encoding scheme).
  • the designated channel includes a first-order ambisonics (first-order ambisonics, FOA) channel, and the signal of the designated channel includes an FOA signal, and the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals and Z signals .
  • the FOA signal is the low-order part of the HOA signal.
• For example, the decoding end inputs the low-order part of the reconstructed HOA signal into the analysis filter, so that the low-order part of the reconstructed HOA signal can be analysis-filtered through the analysis filter to obtain the analysis-filtered low-order part of the HOA signal. Based on the analysis-filtered low-order part of the HOA signal, the decoding end determines the high-order gain of the current frame, and determines the analysis-filtered high-order part based on the analysis-filtered low-order part of the HOA signal and the high-order gain.
• The low-order part and the high-order part of the analysis-filtered HOA signal are then subjected to synthesis filtering through the synthesis filter, so as to obtain the reconstructed HOA signal of the current frame output by the synthesis filter. That is, a delay is added to the current frame through the analysis and synthesis filtering.
• The analysis and synthesis filters are the same as the analysis and synthesis filters used in the DirAC-based HOA codec scheme, so that the delay added after the signal of the current frame is processed by the same analysis and synthesis filters is consistent with the processing delay of the analysis and synthesis filters in the DirAC-based HOA codec scheme, and the decoding delay of the current frame is thus consistent with that of the DirAC-based HOA decoding scheme. For example, if the delay added by the analysis and synthesis filtering is 5 ms, the HOA signal of the current frame is output 5 ms later than it would be without the analysis and synthesis filtering, so as to achieve the purpose of delay alignment.
• The analysis and synthesis filters may be a complex domain low delay filter bank (CLDFB) or other filters with delay characteristics.
• If the decoding scheme of the current frame is the second decoding scheme, the decoder first reconstructs the first HOA signal from the code stream according to the second decoding scheme, and the reconstructed first HOA signal is the reconstructed second audio signal. Then, the decoder performs alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame.
• Exemplarily, the decoding end reconstructs the first HOA signal from the code stream as follows: the decoding end parses the virtual speaker signal and the residual signal from the code stream through the core decoder, and the virtual speaker signal and the residual signal are fed into an MP-based spatial decoder to reconstruct the first HOA signal.
• The process in which the decoding end reconstructs the first HOA signal from the code stream according to the second decoding scheme corresponds to the process in which the encoding end encodes the HOA signal of the current frame into the code stream according to the second coding scheme, and the virtual speaker signal and residual signal in the second codec scheme are different from the virtual speaker signal and residual signal in the switching frame coding scheme.
• The decoding end then performs alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame. The alignment processing may be delay alignment processing based on analysis and synthesis filtering, or circular buffer processing for delay alignment.
  • the delay alignment processing based on the analysis synthesis filter and the delay alignment processing based on the circular buffer will be introduced respectively.
• After reconstructing the first HOA signal, the decoding end performs analysis and synthesis filtering on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame. That is to say, for a current frame encoded by the MP-based HOA coding scheme, the decoding end first uses the second decoding scheme to reconstruct the HOA signal of the current frame based on the code stream, that is, reconstructs the first HOA signal, and then performs analysis and synthesis filtering to achieve delay alignment.
• For example, the decoding end inputs the reconstructed first HOA signal into the analysis and synthesis filters, so as to obtain the reconstructed HOA signal of the current frame output by the synthesis filter. That is, a delay is added to the current frame through the analysis and synthesis filtering.
• The analysis and synthesis filters are the same as the analysis and synthesis filters used in the DirAC-based HOA decoding scheme, so that the delay added after the first HOA signal of the current frame is processed by the same analysis and synthesis filters is consistent with the processing delay of the analysis and synthesis filters in the DirAC-based HOA decoding scheme, and the decoding delay of the current frame is thus consistent with that of the DirAC-based HOA decoding scheme.
  • the analysis and synthesis filter may be a complex-domain low-delay filter bank (CLDFB) or other filters with delay characteristics.
• Optionally, for an audio frame decoded by the MP-based HOA decoding scheme, the decoding end can also perform gain adjustment on the high-order part of the HOA signal, so that the energy of the gain-adjusted high-order part is increased.
  • the decoding end performs analysis and filtering processing on the reconstructed first HOA signal to obtain the second HOA signal.
  • the decoding end performs gain adjustment on the high-order part of the second HOA signal to obtain a gain-adjusted high-order part.
  • the decoding end performs synthesis filtering processing on the low-order part and the gain-adjusted high-order part of the second HOA signal to obtain the reconstructed HOA signal of the current frame.
  • the alignment processing may be considered to include high-order gain adjustment and time delay alignment processing based on analysis and synthesis filtering.
• Optionally, the decoding end performs gain adjustment on the high-order part of the second HOA signal according to the decoding scheme of the previous frame of the current frame; for example, if the previous frame of the current frame is a switching frame, the high-order gain of the previous frame is used to perform gain adjustment on the high-order part of the second HOA signal to obtain the gain-adjusted high-order part.
• For an audio frame whose decoding scheme is the second decoding scheme and whose previous frame is a switching frame, the decoder can use the high-order gain of the switching frame located before the audio frame to adjust the high-order part of the HOA signal of the audio frame, so that the energy of the high-order part of the finally obtained reconstructed HOA signal of the audio frame is similar to the energy of the high-order part of the reconstructed HOA signal of the switching frame, so as to realize a smooth transition of auditory quality.
  • the audio frame located after the switching frame and adjacent to the switching frame in the decoding process can be called an MP decoding high-order gain adjustment frame, and the decoder needs to perform high-order gain adjustment on the MP decoding high-order gain adjustment frame and based on Delay-aligned processing of analytical synthesis filtering.
• It should be noted that the high-order gain used for high-order gain adjustment may be the high-order gain of the previous frame, or may be a high-order gain obtained by another method; the embodiment of the present application does not limit this.
• Optionally, the decoding end can also perform gain adjustment on the high-order part of the second HOA signal of the current frame through a high-order gain to obtain the gain-adjusted high-order part, where the high-order gain can be the high-order gain of the previous frame of the current frame, can be determined based on the high-order gain of the previous frame and a preset gain adjustment function, or can be determined by another method.
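• The high-order gain adjustment itself can be sketched as a per-channel scaling (illustrative only; channels and gains are assumed to be index-aligned, and where the gains come from, e.g. the previous switching frame, follows the description above):

    def adjust_high_order_part(high_order_channels: list, high_order_gains: list) -> list:
        # Apply a per-channel gain (e.g. carried over from the previous switching
        # frame) to each high-order channel of the second HOA signal.
        return [[g * s for s in ch]
                for ch, g in zip(high_order_channels, high_order_gains)]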
  • the decoder may also perform gain adjustment on other parts of the HOA signal of the audio frame whose decoding scheme is the second decoding scheme. That is, the embodiment of the present application does not limit which channel signals of the HOA signal are to be adjusted for gain.
  • the decoder can adjust the gain of any one or more channels in the HOA signal.
• For example, the channels for gain adjustment can include all or some of the high-order channels, all or some of the remaining channels other than the specified channel, or other channels.
• Exemplarily, the decoding end performs analysis filtering on the reconstructed first HOA signal to obtain the second HOA signal, and performs gain adjustment on the signals of one or more remaining channels of the second HOA signal to obtain the gain-adjusted signals of the one or more remaining channels.
  • one or more remaining channels are channels other than the specified channel in the HOA signal.
  • the decoding end performs synthesis filtering processing on the signals of the specified channel in the second HOA signal and the signals of one or more remaining channels after gain adjustment, so as to obtain the reconstructed HOA signal of the current frame.
• That is, the decoder performs gain adjustment on the signals of one or more remaining channels to obtain the gain-adjusted signals of the one or more remaining channels. In other words, for the HOA signal of an audio frame encoded and decoded by the second decoding scheme, the decoding end performs gain adjustment on the signals of the remaining channels other than the designated channel. If the previous frame of the current frame is a switching frame, the decoding end adjusts the gain of the signals of the remaining channels of the current frame based on the gain of the remaining channels of the switching frame, so that the signal strength of the remaining channels of the current frame can be close to the signal strength of the remaining channels of the switching frame, which makes the transition of auditory quality smoother.
• The decoding end may perform delay alignment through delay alignment processing based on analysis and synthesis filtering.
  • an implementation process of implementing circular buffer-based delay alignment processing for the current frame whose decoding scheme is the second decoding scheme is introduced.
• After the decoder reconstructs the first HOA signal, if the decoding scheme of the current frame is the second decoding scheme and the decoding scheme of the previous frame of the current frame is also the second decoding scheme, that is, the previous frame of the current frame is a non-switching frame, the decoder performs circular buffer processing on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame.
• That is, for a current frame whose decoding scheme is the second decoding scheme and whose previous frame is a non-switching frame, the decoding end may perform delay alignment based on circular buffering; however, for a current frame whose decoding scheme is the second decoding scheme and whose previous frame is a switching frame, the decoding end still performs delay alignment based on the delay alignment processing of the analysis and synthesis filtering.
• Exemplarily, the decoder performs circular buffer processing on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame as follows: the decoder combines the first data and the second data to obtain the reconstructed HOA signal of the current frame.
• The first data is the data, in the HOA signal of the previous frame of the current frame, between the first moment and the end moment of the previous frame's HOA signal; the duration between the first moment and the end moment is the first duration, that is, the first moment is a moment that is before the end moment and separated from the end moment by the first duration, and the first duration is equal to the codec delay difference between the first decoding scheme and the second decoding scheme.
• The second data is the data, in the reconstructed first HOA signal, between the start moment of the reconstructed first HOA signal and the second moment; the duration between the second moment and the start moment is the second duration, that is, the second moment is a moment that is after the start moment and separated from the start moment by the second duration, and the sum of the first duration and the second duration is equal to the frame length of the current frame.
  • the previous frame of the current frame is also an audio frame encoded by the MP-based HOA coding scheme, that is, the decoding scheme of the previous frame of the current frame is also the second decoding scheme.
• That is, for the previous frame of the current frame, a first HOA signal also needs to be reconstructed first, and the HOA signal of the previous frame of the current frame mentioned in the circular buffering process refers to the reconstructed first HOA signal of the previous frame.
• In addition, the third data is cached, and the third data is the data in the reconstructed first HOA signal other than the second data.
  • the third data is used for decoding of a frame after the current frame.
• For example, assuming that the frame length is 20 ms and the first duration (the codec delay difference) is 5 ms, the decoder acquires the buffered 5 ms of data, and combines this 5 ms of data with the first 15 ms of data of the reconstructed first HOA signal of the current frame to obtain the reconstructed HOA signal of the current frame.
  • the decoding end also buffers the tail 5 ms data of the reconstructed first HOA signal of the current frame for decoding of a frame after the current frame.
• When decoding the (i+1)th frame, after the decoder reconstructs the first HOA signal of the (i+1)th frame, it obtains the buffered 5 ms of data, and merges the obtained 5 ms of data with the first 15 ms of data of the reconstructed first HOA signal of the (i+1)th frame to obtain the reconstructed HOA signal of the (i+1)th frame.
• Alternatively, if the (i+1)th frame is a switching frame, the decoder obtains the buffered 5 ms of data, and, in the process of decoding the switching frame based on the analysis and synthesis filtering, combines this 5 ms of data with the first 15 ms of data corresponding to the (i+1)th frame after the analysis and synthesis filtering, as the reconstructed HOA signal of that frame.
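• The circular-buffer delay alignment for this example can be sketched as follows (illustrative only; it operates on one channel at a time and assumes a 48 kHz sampling rate, so the 5 ms delay difference corresponds to 240 samples):

    class CircularBufferAligner:
        # Sketch of the circular-buffer delay alignment described above for the
        # 20 ms frame / 5 ms delay-difference example: the output for the current
        # frame is the buffered last 5 ms of the previous frame's reconstructed
        # first HOA signal followed by the first 15 ms of the current frame's
        # reconstructed first HOA signal; the last 5 ms of the current frame is
        # buffered for the next frame.
        def __init__(self, delay_samples: int = 240):   # 5 ms at 48 kHz
            self.cached = [0.0] * delay_samples          # "third data" of the previous frame

        def process(self, first_hoa_channel: list) -> list:
            n = len(self.cached)
            out = self.cached + list(first_hoa_channel[:len(first_hoa_channel) - n])
            self.cached = list(first_hoa_channel[len(first_hoa_channel) - n:])
            return out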
• In summary, the decoding end decodes the switching frame according to the switching frame decoding scheme, that is, performs remaining channel signal reconstruction (such as high-order part reconstruction) and delay alignment processing based on analysis and synthesis filtering.
  • for an audio frame whose decoding scheme is the second decoding scheme and whose previous frame is a switching frame, the decoding end performs delay alignment processing based on analysis-synthesis filtering, and optionally may also perform high-order gain adjustment.
  • for an audio frame whose decoding scheme is the second decoding scheme and whose previous frame is not a switching frame, the decoding end performs delay alignment processing based on a circular buffer.
  • for the first audio frame to be decoded, if its decoding scheme is the second decoding scheme, the decoder performs delay alignment processing based on either analysis-synthesis filtering or circular buffering (see the selection sketch below).
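The selection between the two delay-alignment methods can be summarised by a small dispatch function; the following C sketch uses illustrative enum and function names and assumes it is only called for frames whose decoding scheme is the second (MP-based) scheme. For the very first decoded frame, either method may be used, as noted above.

```c
typedef enum { SCHEME_DIRAC, SCHEME_MP, SCHEME_HYBRID } scheme_t;
typedef enum { ALIGN_ANALYSIS_SYNTHESIS, ALIGN_CIRCULAR_BUFFER } align_t;

/* Pick the delay-alignment method for an MP-decoded frame based on the
 * decoding scheme of the previous frame. An MP frame is preceded either by
 * a switching frame or by another MP frame (a DirAC frame is never directly
 * followed by an MP frame).                                                 */
align_t select_mp_alignment(scheme_t prev_scheme)
{
    if (prev_scheme == SCHEME_HYBRID)      /* previous frame was a switching frame        */
        return ALIGN_ANALYSIS_SYNTHESIS;   /* optionally with high-order gain adjustment  */
    return ALIGN_CIRCULAR_BUFFER;          /* previous frame was a non-switching MP frame */
}
```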
  • FIG. 9 is a schematic diagram of coding scheme switching provided by an embodiment of the present application.
  • the current frame is a switching frame, and the current frame is coded based on the MP-W HOA coding scheme (that is, the switching frame coding scheme).
  • the previous frame of the current frame is a DirAC coded frame, and the previous frame is coded based on the DirAC HOA coding scheme.
  • the next frame of the current frame is an MP coded frame, and the next frame is coded based on the HOA coding scheme of MP.
  • the switching state of the switching frame shown in FIG. 9 is the first switching state, and the first switching state refers to the state of switching from the DirAC-based HOA coding scheme to the MP-based HOA coding scheme.
  • the DirAC coded frame refers to an audio frame whose coding scheme is the first coding scheme
  • the MP coded frame refers to an audio frame whose coding scheme is the second coding scheme.
  • FIG. 10 is a schematic diagram of decoding of a coding scheme switching provided by an embodiment of the present application.
  • FIG. 10 shows the decoding process when the switching state of the switching frame is the first switching state as shown in FIG. 9 .
  • the current frame is a switching frame, and the current frame is decoded based on the HOA decoding scheme of MP-W.
  • the previous frame of the current frame is a DirAC decoded frame, and the previous frame is decoded based on the HOA decoding scheme of DirAC.
  • the next frame of the current frame is an MP-decoded high-order gain adjustment frame, which is decoded based on the MP-based HOA decoding scheme and undergoes delay alignment processing based on analysis-synthesis filtering as well as high-order gain adjustment.
  • the MP-decoded frames located after this frame and before the next switching frame are also decoded based on the MP-based HOA decoding scheme, and delay alignment processing based on analysis-synthesis filtering is performed.
  • the DirAC decoded frame refers to an audio frame whose decoding scheme is the first decoding scheme
  • the MP decoded frame refers to an audio frame whose decoding scheme is the second decoding scheme.
  • FIG. 11 is a schematic diagram of decoding another coding scheme switching provided by the embodiment of the present application.
  • FIG. 11 shows the decoding process when the switching state of the switching frame is the first switching state as shown in FIG. 9 .
  • the decoding process shown in FIG. 11 differs from that shown in FIG. 10 in that the MP-decoded frames located after the MP-decoded high-order gain adjustment frame and before the next switching frame, that is, the subsequent MP-decoded frames, are decoded based on the MP-based HOA decoding scheme and undergo delay alignment processing based on a circular buffer.
  • when the switching state of the switching frame is the first switching state, it is necessary to switch from the DirAC-based HOA coding scheme to the MP-based HOA coding scheme, that is, from a large delay to a small delay
  • since the MP-based HOA decoding scheme itself has a small decoding delay and does not include delay alignment processing, the MP-decoded frames after the switching frame need delay alignment processing
  • the switching frame coding scheme provided by this scheme itself includes delay alignment processing.
  • when the switching state of the switching frame is the second switching state, that is, when switching from a small delay to a large delay, the DirAC-based HOA decoding scheme itself already has a large delay, so no additional processing is needed for the DirAC-decoded frames after the switching frame.
  • for a current frame encoded by the first coding scheme, it is sufficient to decode the code stream of the current frame according to the first decoding scheme.
  • for a current frame not encoded by the first coding scheme, the second audio signal is first reconstructed according to the code stream and alignment processing is then performed on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame; the alignment processing makes the decoding delay of the current frame consistent with that of the first decoding scheme.
  • Fig. 12 is a schematic structural diagram of a decoding device 1200 provided by an embodiment of the present application.
  • the decoding device 1200 can be implemented by software, hardware or a combination of the two to become part or all of the decoding end device.
  • the decoding end device may be any decoder-side device in the above-mentioned embodiments.
  • the decoding device 1200 includes: a first determination module 1201 , a first decoding module 1202 and a second decoding module 1203 .
  • the first determining module 1201 is configured to determine the decoding scheme of the current frame according to the code stream, where the decoding scheme of the current frame is the first decoding scheme or a non-first decoding scheme, and the first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio coding (DirAC);
  • the first decoding module 1202 is configured to, if the decoding scheme of the current frame is the first decoding scheme, reconstruct the first audio signal from the code stream according to the first decoding scheme, where the reconstructed first audio signal is the reconstructed HOA signal of the current frame;
  • the second decoding module 1203 is configured to, if the decoding scheme of the current frame is a non-first decoding scheme, reconstruct the second audio signal from the code stream according to the non-first decoding scheme and perform alignment processing on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame, where the alignment processing makes the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme.
  • the non-first decoding scheme is a second decoding scheme or a third decoding scheme
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme
  • the second decoding module 1203 includes:
  • the first reconstruction sub-module is used to reconstruct the signal of the specified channel according to the code stream if the decoding scheme of the current frame is the third decoding scheme, where the reconstructed signal of the specified channel is the reconstructed second audio signal, and the specified channel is some of all the channels of the HOA signal of the current frame.
  • the second decoding module 1203 includes:
  • the analysis filtering sub-module is used to perform analysis filtering on the reconstructed signal of the specified channel;
  • the first determination submodule is used to determine the gain of one or more remaining channels in the HOA signal of the current frame except for the specified channel based on the analyzed and filtered signal of the specified channel;
  • the second determining submodule is used to determine the signal of one or more remaining channels based on the gain of the one or more remaining channels and the signal of the specified channel after analysis and filtering;
  • the synthesis filter sub-module is configured to perform synthesis filter processing on the signal of the designated channel after analysis and filtering and the signals of the one or more remaining channels, so as to obtain the reconstructed HOA signal of the current frame.
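The following C sketch outlines what these sub-modules do for a switching frame. How each remaining-channel gain is derived from the analysis-filtered specified-channel signal is not spelled out here, so the sketch treats the gains as already computed and, purely as an illustrative assumption, models each remaining channel as that gain applied to the omnidirectional W channel; the constants assume a 3rd-order, 16-channel HOA signal with the FOA channels as the specified channels.

```c
#include <stddef.h>

#define FRAME_LEN 960                      /* samples per channel per frame (assumed) */
#define NUM_FOA   4                        /* specified channels: W, X, Y, Z          */
#define NUM_HOA   16                       /* 3rd-order HOA signal                    */
#define NUM_REST  (NUM_HOA - NUM_FOA)      /* remaining (higher-order) channels       */

/* Reconstruct the remaining channels of a switching frame from the
 * analysis-filtered specified-channel (FOA) signals and per-channel gains.
 * The mapping "remaining channel = gain * W" is an assumption made only for
 * this sketch; the actual derivation is codec-specific.                      */
void reconstruct_remaining_channels(const float foa[NUM_FOA][FRAME_LEN],
                                    const float gains[NUM_REST],
                                    float rest[NUM_REST][FRAME_LEN])
{
    for (size_t c = 0; c < NUM_REST; ++c)
        for (size_t n = 0; n < FRAME_LEN; ++n)
            rest[c][n] = gains[c] * foa[0][n];   /* foa[0] is the W channel */
}
```

The analysis-filtered FOA channels and the channels produced this way are then passed through the synthesis filter together; it is this analysis-synthesis filtering that adds the delay aligning the switching frame with the DirAC-based scheme.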
  • the non-first decoding scheme is a second decoding scheme or a third decoding scheme
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme
  • the second decoding module 1203 includes:
  • the second reconstruction sub-module is configured to reconstruct the first HOA signal according to the code stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme, and the reconstructed first HOA signal is the reconstructed second audio signal.
  • the second decoding module 1203 includes:
  • the analysis-synthesis filter sub-module is configured to perform analysis-synthesis filter processing on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame.
  • analysis-synthesis filtering submodule is used to:
  • analysis filtering is performed on the reconstructed first HOA signal to obtain a second HOA signal;
  • gain adjustment is performed on the signals of one or more remaining channels in the second HOA signal to obtain gain-adjusted signals of the one or more remaining channels, where the one or more remaining channels are the channels of the HOA signal other than the specified channel;
  • synthesis filtering is performed on the signal of the specified channel in the second HOA signal and the gain-adjusted signals of the one or more remaining channels, to obtain the reconstructed HOA signal of the current frame.
  • analysis-synthesis filtering submodule is used to:
  • if the decoding scheme of the previous frame of the current frame is the third decoding scheme, the signals of one or more remaining channels in the second HOA signal are gain-adjusted according to the gains of the one or more remaining channels of the previous frame of the current frame, to obtain gain-adjusted signals of the one or more remaining channels.
  • the specified channel includes a first-order ambisonic FOA channel.
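A minimal sketch of this gain adjustment step, assuming the same 3rd-order layout as above: when the previous frame is a switching frame, its remaining-channel gains are reused so that the energy of the higher-order part changes smoothly across the switch. The array sizes and the in-place adjustment are illustrative assumptions.

```c
#include <stddef.h>

#define FRAME_LEN 960
#define NUM_REST  12    /* remaining channels of a 3rd-order HOA signal (assumed) */

/* In-place gain adjustment of the remaining channels of the second HOA
 * signal, reusing the gains of the previous (switching) frame.              */
void gain_adjust_remaining(float rest[NUM_REST][FRAME_LEN],
                           const float prev_frame_gains[NUM_REST])
{
    for (size_t c = 0; c < NUM_REST; ++c)
        for (size_t n = 0; n < FRAME_LEN; ++n)
            rest[c][n] *= prev_frame_gains[c];
}
```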
  • the decoding scheme of the previous frame of the current frame is the second decoding scheme
  • the second decoding module 1203 includes:
  • the circular buffer submodule is configured to perform circular buffer processing on the reconstructed first HOA signal to obtain the reconstructed HOA signal of the current frame.
  • the circular cache submodule is used to:
  • the first data is obtained, where the first data is the data of the HOA signal of the previous frame of the current frame located between the first moment and the end moment of that previous frame's HOA signal, the duration between the first moment and the end moment is the first duration, and the first duration is equal to the coding delay difference between the first decoding scheme and the second decoding scheme;
  • the first data and the second data are combined to obtain the reconstructed HOA signal of the current frame
  • the second data is the data of the reconstructed first HOA signal located between the start moment of the reconstructed first HOA signal and the second moment
  • the duration between the second moment and the start moment is the second duration
  • the sum of the first duration and the second duration is equal to the frame length of the current frame.
  • the circular cache submodule is used to:
  • the third data is cached, where the third data is data other than the second data in the reconstructed first HOA signal.
  • the first determining module 1201 includes:
  • the first parsing submodule is used to parse out the value of the switching flag of the current frame from the code stream;
  • the second parsing submodule is used to parse the indication information of the decoding scheme of the current frame from the code stream if the value of the switching flag is the first value, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third determining submodule is configured to determine, if the value of the switching flag is the second value, that the decoding scheme of the current frame is the third decoding scheme, where the third decoding scheme is the hybrid decoding scheme.
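A compact sketch of this first signalling variant, using the example bit values mentioned in the description (switching flag 0 = non-switching frame, 1 = switching frame; a 1-bit indication 0 = DirAC-based scheme, 1 = MP-based scheme); the enum and function names are illustrative.

```c
typedef enum { SCHEME_DIRAC, SCHEME_MP, SCHEME_HYBRID } scheme_t;

/* First signalling variant: a 1-bit switching flag and, only when the flag
 * takes the first value, a 1-bit decoding-scheme indication.                */
scheme_t scheme_from_switch_flag(int switch_flag, int indication_bit)
{
    if (switch_flag == 1)                  /* second value: switching frame     */
        return SCHEME_HYBRID;              /* hybrid (third) decoding scheme    */
    return indication_bit ? SCHEME_MP      /* 1: MP-based decoding scheme       */
                          : SCHEME_DIRAC;  /* 0: DirAC-based decoding scheme    */
}
```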
  • the first determining module 1201 includes:
  • the third parsing submodule is used to parse out the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme.
  • the first determining module 1201 includes:
  • the fourth parsing sub-module is used to parse the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme, and the second decoding scheme is the HOA decoding scheme based on virtual speaker selection;
  • the fourth determining submodule is used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame;
  • the fifth determining submodule is used to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, where the third decoding scheme is the hybrid decoding scheme.
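The third signalling variant can be sketched in the same style: only the initial scheme of each frame is carried in the code stream, and a change relative to the previous frame identifies a switching frame. The enum and function names are again illustrative.

```c
typedef enum { SCHEME_DIRAC, SCHEME_MP, SCHEME_HYBRID } scheme_t;

/* Third signalling variant: derive the decoding scheme of the current frame
 * from its parsed initial scheme and that of the previous frame.            */
scheme_t scheme_from_initial(scheme_t cur_initial, scheme_t prev_initial)
{
    if (cur_initial == prev_initial)
        return cur_initial;    /* non-switching frame keeps its initial scheme    */
    return SCHEME_HYBRID;      /* switching frame: hybrid (third) decoding scheme */
}
```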
  • since the decoding delay of the DirAC-based HOA decoding scheme is relatively large, for a current frame encoded by the first coding scheme it is sufficient to decode the code stream of the current frame according to the first decoding scheme.
  • for a current frame not encoded by the first coding scheme, the second audio signal is first reconstructed according to the code stream and alignment processing is then performed on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame, that is, the alignment processing makes the decoding delay of the current frame consistent with that of the first decoding scheme.
  • when the decoding device provided in the above embodiment decodes audio frames, the division into the above-mentioned functional modules is only used as an example for illustration; in practical applications, the above-mentioned functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the decoding device and the decoding method embodiments provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • FIG. 13 is a schematic block diagram of a codec device 1300 used in an embodiment of the present application.
  • the codec apparatus 1300 may include a processor 1301 , a memory 1302 and a bus system 1303 .
  • the processor 1301 and the memory 1302 are connected through the bus system 1303, the memory 1302 is used to store instructions, and the processor 1301 is used to execute the instructions stored in the memory 1302 to perform the various encoding or decoding methods described in the embodiments of this application. To avoid repetition, no detailed description is given here.
  • the processor 1301 may be a central processing unit (CPU), and the processor 1301 may also be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 1302 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as memory 1302 .
  • Memory 1302 may include code and data 13021 accessed by processor 1301 using bus 1303 .
  • the memory 1302 may further include an operating system 13023 and an application program 13022, where the application program 13022 includes at least one program that allows the processor 1301 to execute the encoding or decoding method described in the embodiment of this application.
  • the application program 13022 may include applications 1 to N, which further include an encoding or decoding application (codec application for short) that executes the encoding or decoding method described in the embodiment of this application.
  • the bus system 1303 may include not only a data bus, but also a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 1303 in the figure.
  • the codec apparatus 1300 may also include one or more output devices, such as a display 1304 .
  • the display 1304 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input.
  • the display 1304 may be connected to the processor 1301 via the bus 1303 .
  • codec apparatus 1300 may implement the encoding method in the embodiment of the present application, and may also implement the decoding method in the embodiment of the present application.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (eg, based on a communication protocol) .
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • any connection is properly termed a computer-readable medium.
  • for example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, DVD, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers; combinations of the above should also be included within the scope of computer-readable media.
  • the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • the term "processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • the functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.
  • the techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • various components, modules or units are described in the embodiments of the present application to emphasize the functional aspects of the apparatus for performing the disclosed techniques, but they do not necessarily need to be realized by different hardware units; indeed, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.) and signals involved in the embodiments of the present application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • for example, the audio signals involved in the embodiments of the present application are all obtained with full authorization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)

Abstract

A decoding method, apparatus, device, storage medium and computer program product, belonging to the technical field of audio processing. Since the decoding delay of the DirAC-based HOA decoding scheme is relatively large, for a current frame encoded by the first coding scheme it is sufficient to decode the code stream of the current frame according to the first decoding scheme (802). For a current frame not encoded by the first coding scheme, a second audio signal is first reconstructed according to the code stream, and alignment processing is then performed on the reconstructed second audio signal to obtain the reconstructed HOA signal of the current frame, that is, the alignment processing makes the decoding delay of the current frame consistent with the decoding delay of the first decoding scheme (803). In this way, the method makes the decoding delay of every audio frame consistent, that is, it guarantees delay alignment so that different coding and decoding schemes can be switched between smoothly.

Description

解码方法、装置、设备、存储介质及计算机程序产品
本申请要求于2021年9月29日提交的申请号为202111155351.6、发明名称为“解码方法、装置、设备、存储介质及计算机程序产品”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及音频处理技术领域,特别涉及一种解码方法、装置、设备、存储介质及计算机程序产品。
背景技术
高阶立体混响(higher order ambisonics,HOA)技术作为一种三维音频技术,因其在进行三维音频回放时具有更高的灵活性,因而得到了广泛的关注。为了实现更好的听觉效果,HOA技术需要大量的数据记录详细的声音场景信息。但随着HOA阶数的增加将会产生更多的数据,大量的数据造成传输和存储的困难。因此如何对HOA信号进行编解码成为目前重点关注的问题。
相关技术提出了两种对HOA信号进行编解码的方案。其中一种方案为基于方向音频编码(directional audio coding,DirAC)的编解码方案。在该方案中,编码端从当前帧的HOA信号中提取核心层信号和空间参数,将提取的核心层信号和空间参数编入码流。解码端从码流中解码出核心层信号和空间参数,对核心层信号和空间参数进行分析合成滤波处理,以重建出当前帧的HOA信号。另一种方案为基于虚拟扬声器选择的编解码方案。在该方案中,编码端基于匹配投影(match-projection,MP)算法从虚拟扬声器集合中选择与当前帧的HOA信号匹配的目标虚拟扬声器,基于当前帧的HOA信号和目标虚拟扬声器,确定虚拟扬声器信号,基于当前帧的HOA信号和虚拟扬声器信号确定残差信号,将虚拟扬声器信号和残差信号编入码流。解码端采用与编码对称的解码方法从码流中重建出当前帧的HOA信号。
然而,对于声场中相异性声源较少的情况,基于虚拟扬声器选择的编解码方案的压缩率较高,对于声场中相异性声源较多的情况,基于DirAC的编解码方案的压缩率较高。其中,相异性声源指声源的位置和/或方向不同的点声源。而不同音频帧的声场类型(与声场中相异性声源相关)可能不同,如果想要同时满足对不同声场类型下的音频帧均有较高的压缩率,需要根据各音频帧的声场类型为相应音频帧选择合适的编解码方案,这样就需要在不同的编解码方案之间进行切换。但不同编解码方案的解码时延不同,例如由于在基于DirAC的编解码方案中需要进行分析合成滤波处理,导致基于DirAC的编解码方案的解码时延高于基于虚拟扬声器选择的编解码方案。在不同的编解码方案之间进行切换的情况下,如何解决时延不同的问题是当前需要研究的重点。
发明内容
本申请实施例提供了一种解码方法、装置、设备、存储介质及计算机程序产品,能够解 码在不同的编解码方案之间进行切换情况下时延不同的问题。所述技术方案如下:
第一方面,提供了一种解码方法,该方法包括:
根据码流确定当前帧的解码方案,当前帧的解码方案为第一解码方案或非第一解码方案,第一解码方案为基于DirAC的HOA解码方案;若当前帧的解码方案为第一解码方案,则解码端按照第一解码方案,根据码流重建第一音频信号,重建的第一音频信号为当前帧的重建HOA信号;若当前帧的解码方案为非第一解码方案,则解码端按照非第一解码方案,根据码流重建第二音频信号,对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。
也即是,由于基于DirAC的HOA解码方案的解码时延较大,对于通过第一编码方案编码的当前帧来说,按照第一解码方案解码当前帧即可。对于不是通过第一编码方案编码的当前帧来说,需要通过对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。其中,由于DirAC解码方案的解码时延是固定的,因此可以通过对齐处理使得当前帧的解码时延与第一解码方案(即DirAC解码方案)的解码时延一致。一般来说,可以在对齐处理中增加时延来实现当前帧的解码时延与第一解码方案(即DirAC解码方案)的解码时延一致。其中,第一编码方案与第一解码方案对应,即若第一解码方案为DirAC解码方案,则第一编码方案为DirAC编码方案;相应地,第二编码方案与第二解码方案对应,第三编码方案也与第三解码方案对应。
可选地,解码端根据码流确定当前帧的解码方案,包括:从码流中解析出当前帧的切换标志的值;若切换标志的值为第一值,则从码流中解析当前帧的解码方案的指示信息,指示信息用于指示当前帧的解码方案为第一解码方案或第二解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案(可以简称为基于MP的HOA解码方案);若切换标志的值为第二值,则确定当前帧的解码方案为第三解码方案,第三解码方案为混合解码方案。需要说明的是,混合解码方案是本申请实施例针对切换帧所设计的方案,切换帧的前一帧和后一帧的编解码方案不同。码流中包含切换标志,切换标志的值为第一值,则指示当前帧为非切换帧,切换帧的值为第二值,则指示当前帧为切换。解码端先从码流中解析出切换标志的值,在基于切换标志的值确定当前帧不是切换帧的情况下,再从码流中解析出当前帧的解码方案的指示信息,以确定到底是第一解码方案还是第二解码方案。可见,解码端能够基于切换标志直接判断当前帧是否为切换帧,解码效率较高。其中,混合解码方案是指在解码过程中既会使用第一解码方案(即DirAC解码方案)相关的技术手段,也会使用第二解码方案(基于MP的HOA解码方案)相关的技术手段,所以叫混合解码方案。
可选地,解码端根据码流确定当前帧的解码方案,包括:从码流中解析出当前帧的解码方案的指示信息,指示信息用于指示当前帧的解码方案为第一解码方案、第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案。也即是,码流中直接包含解码方案的指示信息,这样,解码端基于该指示信息直接确定当前帧的解码方案,解码效率也较高。
可选地,解码端根据码流确定当前帧的解码方案,包括:从码流中解析出当前帧的初始解码方案,初始解码方案为第一解码方案或第二解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案;若当前帧的初始解码方案与当前帧的前一帧的初始解码方案相同,则确定当前帧的解码方案为当前帧的初始解码方案;若当前帧的初始解码方案为第一解码方 案且当前帧的前一帧的初始解码方案为第二解码方案,或当前帧的初始解码方案为第二解码方案且当前帧的前一帧的初始解码方案为第一解码方案,则确定当前帧的解码方案为第三解码方案,第三解码方案为混合解码方案。也即是,码流中包含初始解码方案的指示信息,解码端通过对比当前帧的初始解码方案与前一帧的初始解码方案,来判断当前帧是否为切换帧,切换的解码方案为第三解码方案,非切换帧的解码方案为非切换帧的初始解码方案。
可选地,非第一解码方案为第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案;若当前帧的解码方案为第三解码方案,根据码流重建第二音频信号,包括:根据码流重建指定通道的信号,重建的指定通道的信号为重建的第二音频信号,指定通道为当前帧的HOA信号的所有通道中的部分通道。也即是,对于采用第三解码方案进行解码的切换帧来说,解码端根据码流重建的是指定通道的信号,并非完整的HOA信号。
可选地,若当前帧的解码方案为第三解码方案,则解码端对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,包括:对重建的指定通道的信号进行分析滤波处理;基于经分析滤波的指定通道的信号,确定当前帧的HOA信号中除指定通道之外的一个或多个剩余通道的增益;基于该一个或多个剩余通道的增益和经分析滤波的指定通道的信号,确定一个或多个剩余通道的信号;对经分析滤波的指定通道的信号和该一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。也即是,对于切换帧来说,解码端需要重构除指定通道之外的剩余通道的信号,并通过分析合成滤波处理,使得当前帧的解码时延增加到与第一解码方案的解码时延一致。
可选地,非第一解码方案为第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案;若当前帧的解码方案为第二解码方案,则解码端根据码流重建第二音频信号,包括:按照第二解码方案,根据码流重建第一HOA信号,重建的第一HOA信号为重建的第二音频信号。也即是,对于通过第二编码方案编码的音频帧来说,解码端先按照第二解码方案重建第一HOA信号。
可选地,解码端对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,包括:对重建的第一HOA信号进行分析合成滤波处理,以得到当前帧的重建HOA信号。也即是,解码端按照第二解码方案重建第一HOA信号之后,通过分析合成滤波处理进行时延对齐。
可选地,解码端对重建的第一HOA信号进行分析合成滤波处理,以得到当前帧的重建HOA信号,包括:对重建的第一HOA信号进行分析滤波处理,以得到第二HOA信号;对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号,该一个或多个剩余通道为HOA信号中除指定通道之外的通道;对第二HOA信号中指定通道的信号,以及经增益调整的一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。也即是,对于通过第二编码方案编码的音频帧来说,在通过分析合成滤波处理进行时延对齐的过程中,还能够通过增益调整使得听觉质量平滑过渡。
可选地,解码端对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号,包括:若当前帧的前一帧的解码方案为第三解码方案,则根据当前帧的前一帧的一个或多个剩余通道的增益,对第二HOA信号中一个或多 个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号。也即是,若当前帧的前一帧为切换帧,则解码端根据该切换帧的剩余通道增益来调整当前帧的剩余通道的信号,使得当前帧的听觉质量与前一帧的听觉质量相近,以实现平滑过渡。
可选地,指定通道包括一阶立体混响(first-order ambisonics,FOA)通道。可选地,指定通道与第一解码方案中预设的通道一致。
可选地,当前帧的前一帧的解码方案为第二解码方案;解码端对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,包括:对重建的第一HOA信号进行循环缓存处理,以得到当前帧的重建HOA信号。也即是,对若当前帧的解码方案为第二解码方案但当前帧的前一帧为非切换帧,则解码端也可以通过循环缓存处理实现时延对齐。
可选地,解码端对重建的第一HOA信号进行循环缓存处理,以得到当前帧的重建HOA信号,包括:获取第一数据,第一数据为当前帧的前一帧HOA信号中位于第一时刻与前一帧HOA信号的结束时刻之间的数据,第一时刻与结束时刻之间的时长为第一时长,第一时长等于第一解码方案与第二解码方案之间的编码时延差;将第一数据和第二数据进行合并,以得到当前帧的重建HOA信号,第二数据为重建的第一HOA信号中位于重建的第一HOA信号的起始时刻与第二时刻之间的数据,第二时刻与起始时刻之间的时长为第二时长,第一时长与第二时长之和等于当前帧的帧长。也即是,循环缓存处理实质上是通过数据缓存的方式以实现时延对齐。
可选地,该方法还包括:缓存第三数据,第三数据为重建的第一HOA信号中除第二数据之外的数据。也即是,缓存第三数据用于当前帧的下一帧的解码。
第二方面,提供了一种解码装置,所述解码装置具有实现上述第一方面中解码方法行为的功能。所述解码装置包括一个或多个模块,该一个或多个模块用于实现上述第一方面所提供的解码方法。
第一确定模块,用于根据码流确定当前帧的解码方案,当前帧的解码方案为第一解码方案或非第一解码方案,第一解码方案为基于方向音频编码DirAC的高阶立体混响HOA解码方案;
第一解码模块,用于若当前帧的解码方案为第一解码方案,则按照第一解码方案,根据码流重建第一音频信号,重建的第一音频信号为当前帧的重建HOA信号;
第二解码模块,用于若当前帧的解码方案为非第一解码方案,则按照非第一解码方案,根据码流重建第二音频信号,对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。
可选地,非第一解码方案为第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案;
第二解码模块,包括:
第一重建子模块,用于若当前帧的解码方案为第三解码方案,则根据码流重建指定通道的信号,重建的指定通道的信号为重建的第二音频信号,指定通道为当前帧的HOA信号的所有通道中的部分通道。
可选地,第二解码模块,包括:
分析滤波子模块,用于对重建的指定通道的信号进行分析滤波处理;
第一确定子模块,用于基于经分析滤波的指定通道的信号,确定当前帧的HOA信号中除指定通道之外的一个或多个剩余通道的增益;
第二确定子模块,用于基于该一个或多个剩余通道的增益和经分析滤波的指定通道的信号,确定一个或多个剩余通道的信号;
合成滤波子模块,用于对经分析滤波的指定通道的信号和该一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。
可选地,非第一解码方案为第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案;
第二解码模块,包括:
第二重建子模块,用于若当前帧的解码方案为第二解码方案,则按照第二解码方案,根据码流重建第一HOA信号,重建的第一HOA信号为重建的第二音频信号。
可选地,第二解码模块,包括:
分析合成滤波子模块,用于对重建的第一HOA信号进行分析合成滤波处理,以得到当前帧的重建HOA信号。
可选地,分析合成滤波子模块用于:
对重建的第一HOA信号进行分析滤波处理,以得到第二HOA信号;
对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号,该一个或多个剩余通道为HOA信号中除指定通道之外的通道;
对第二HOA信号中指定通道的信号,以及经增益调整的一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。
可选地,分析合成滤波子模块用于:
若当前帧的前一帧的解码方案为第三解码方案,则根据当前帧的前一帧的一个或多个剩余通道的增益,对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号。
可选地,指定通道包括一阶立体混响FOA通道。
可选地,当前帧的前一帧的解码方案为第二解码方案;
第二解码模块,包括:
循环缓存子模块,用于对重建的第一HOA信号进行循环缓存处理,以得到当前帧的重建HOA信号。
可选地,循环缓存子模块用于:
获取第一数据,第一数据为当前帧的前一帧HOA信号中位于第一时刻与前一帧HOA信号的结束时刻之间的数据,第一时刻与结束时刻之间的时长为第一时长,第一时长等于第一解码方案与第二解码方案之间的编码时延差;
将第一数据和第二数据进行合并,以得到当前帧的重建HOA信号,第二数据为重建的第一HOA信号中位于重建的第一HOA信号的起始时刻与第二时刻之间的数据,第二时刻与起始时刻之间的时长为第二时长,第一时长与第二时长之和等于当前帧的帧长。
可选地,循环缓存子模块用于:
缓存第三数据,第三数据为重建的第一HOA信号中除第二数据之外的数据。
可选地,第一确定模块包括:
第一解析子模块,用于从码流中解析出当前帧的切换标志的值;
第二解析子模块,用于若切换标志的值为第一值,则从码流中解析当前帧的解码方案的指示信息,指示信息用于指示当前帧的解码方案为第一解码方案或第二解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案;
第三确定子模块,用于若切换标志的值为第二值,则确定当前帧的解码方案为第三解码方案,第三解码方案为混合解码方案。
可选地,第一确定模块包括:
第三解析子模块,用于从码流中解析出当前帧的解码方案的指示信息,指示信息用于指示当前帧的解码方案为第一解码方案、第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案。
可选地,第一确定模块包括:
第四解析子模块,用于从码流中解析出当前帧的初始解码方案,初始解码方案为第一解码方案或第二解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案;
第四确定子模块,用于若当前帧的初始解码方案与当前帧的前一帧的初始解码方案相同,则确定当前帧的解码方案为当前帧的初始解码方案;
第五确定子模块,用于若当前帧的初始解码方案为第一解码方案且当前帧的前一帧的初始解码方案为第二解码方案,或当前帧的初始解码方案为第二解码方案且当前帧的前一帧的初始解码方案为第一解码方案,则确定当前帧的解码方案为第三解码方案,第三解码方案为混合解码方案。
第三方面,提供了一种解码端设备,所述解码端设备包括处理器和存储器,所述存储器用于存储执行上述第一方面所提供的解码方法的程序,以及存储用于实现上述第一方面所提供的解码方法所涉及的数据。所述处理器被配置为用于执行所述存储器中存储的程序。所述存储设备的操作装置还可以包括通信总线,该通信总线用于该处理器与存储器之间建立连接。
第四方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面所述的解码方法。
第五方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面所述的解码方法。
上述第二方面、第三方面、第四方面和第五方面所获得的技术效果与第一方面中对应的技术手段获得的技术效果近似,在这里不再赘述。
本申请实施例提供的技术方案至少能够带来以下有益效果:
在本申请实施例中,由于基于方向音频编码的HOA解码方案的解码时延较大,对于通过第一编码方案编码的当前帧来说,按照第一解码方案解码当前帧的码流即可。对于不是通过第一编码方案编码的当前帧来说,先根据码流重建第二音频信号,再对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,也即通过对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。这样,采用本方案能够使各个音频帧的解码时延均一致, 也即保证时延对齐以使不同的编解码方案之间能够很好地切换。
附图说明
图1是本申请实施例提供的一种实施环境的示意图;
图2是本申请实施例提供的一种终端场景的实施环境的示意图;
图3是本申请实施例提供的一种无线或核心网设备的转码场景的实施环境的示意图;
图4是本申请实施例提供的一种广播电视场景的实施环境的示意图;
图5是本申请实施例提供的一种虚拟现实流场景的实施环境的示意图;
图6是本申请实施例提供的一种编码方法的流程图;
图7是本申请实施例提供的另一种编码方法的流程图;
图8是本申请实施例提供的一种解码方法的流程图;
图9是本申请实施例提供的一种编码方案切换的编码示意图;
图10是本申请实施例提供的一种编码方案切换的解码示意图;
图11是本申请实施例提供的另一种编码方案切换的解码示意图;
图12是本申请实施例提供的一种解码装置的结构示意图;
图13是本申请实施例提供的一种编解码装置的示意性框图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在对本申请实施例提供的编解码方法进行详细地解释说明之前,先对本申请实施例涉及的实施环境进行介绍。
请参考图1,图1是本申请实施例提供的一种实施环境的示意图。该实施环境包括源装置10、目的地装置20、链路30和存储装置40。其中,源装置10可以产生经编码的媒体数据。因此,源装置10也可以被称为媒体数据编码装置。目的地装置20可以对由源装置10所产生的经编码的媒体数据进行解码。因此,目的地装置20也可以被称为媒体数据解码装置。链路30可以接收源装置10所产生的经编码的媒体数据,并可以将该经编码的媒体数据传输给目的地装置20。存储装置40可以接收源装置10所产生的经编码的媒体数据,并可以将该经编码的媒体数据进行存储,这样的条件下,目的地装置20可以直接从存储装置40中获取经编码的媒体数据。或者,存储装置40可以对应于文件服务器或可以保存由源装置10产生的经编码的媒体数据的另一中间存储装置,这样的条件下,目的地装置20可以经由流式传输或下载存储装置40存储的经编码的媒体数据。
源装置10和目的地装置20均可以包括一个或多个处理器以及耦合到该一个或多个处理器的存储器,该存储器可以包括随机存取存储器(random access memory,RAM)、只读存储器(read-only memory,ROM)、带电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、快闪存储器、可用于以可由计算机存取的指令或数据结构的形式存储所要的程序代码的任何其它媒体等。例如,源装置10和目的地装置20均可以包括桌上型计算机、移动计算装置、笔记型(例如,膝上型)计算机、平板计算机、机顶盒、例如所谓的“智能”电话等电话手持机、电视机、相机、显示装置、数字媒体播放器、视频游 戏控制台、车载计算机或其类似者。
链路30可以包括能够将经编码的媒体数据从源装置10传输到目的地装置20的一个或多个媒体或装置。在一种可能的实现方式中,链路30可以包括能够使源装置10实时地将经编码的媒体数据直接发送到目的地装置20的一个或多个通信媒体。在本申请实施例中,源装置10可以基于通信标准来调制经编码的媒体数据,该通信标准可以为无线通信协议等,并且可以将经调制的媒体数据发送给目的地装置20。该一个或多个通信媒体可以包括无线和/或有线通信媒体,例如该一个或多个通信媒体可以包括射频(radio frequency,RF)频谱或一个或多个物理传输线。该一个或多个通信媒体可以形成基于分组的网络的一部分,基于分组的网络可以为局域网、广域网或全球网络(例如,因特网)等。该一个或多个通信媒体可以包括路由器、交换器、基站或促进从源装置10到目的地装置20的通信的其它设备等,本申请实施例对此不做具体限定。
在一种可能的实现方式中,存储装置40可以将接收到的由源装置10发送的经编码的媒体数据进行存储,目的地装置20可以直接从存储装置40中获取经编码的媒体数据。这样的条件下,存储装置40可以包括多种分布式或本地存取的数据存储媒体中的任一者,例如,该多种分布式或本地存取的数据存储媒体中的任一者可以为硬盘驱动器、蓝光光盘、数字多功能光盘(digital versatile disc,DVD)、只读光盘(compact disc read-only memory,CD-ROM)、快闪存储器、易失性或非易失性存储器,或用于存储经编码媒体数据的任何其它合适的数字存储媒体等。
在一种可能的实现方式中,存储装置40可以对应于文件服务器或可以保存由源装置10产生的经编码媒体数据的另一中间存储装置,目的地装置20可经由流式传输或下载存储装置40存储的媒体数据。文件服务器可以为能够存储经编码的媒体数据并且将经编码的媒体数据发送给目的地装置20的任意类型的服务器。在一种可能的实现方式中,文件服务器可以包括网络服务器、文件传输协议(file transfer protocol,FTP)服务器、网络附属存储(network attached storage,NAS)装置或本地磁盘驱动器等。目的地装置20可以通过任意标准数据连接(包括因特网连接)来获取经编码媒体数据。任意标准数据连接可以包括无线信道(例如,Wi-Fi连接)、有线连接(例如,数字用户线路(digital subscriber line,DSL)、电缆调制解调器等),或适合于获取存储在文件服务器上的经编码的媒体数据的两者的组合。经编码的媒体数据从存储装置40的传输可为流式传输、下载传输或两者的组合。
图1所示的实施环境仅为一种可能的实现方式,并且本申请实施例的技术不仅可以适用于图1所示的可以对媒体数据进行编码的源装置10,以及可以对经编码的媒体数据进行解码的目的地装置20,还可以适用于其他可以对媒体数据进行编码和对经编码的媒体数据进行解码的装置,本申请实施例对此不做具体限定。
在图1所示的实施环境中,源装置10包括数据源120、编码器100和输出接口140。在一些实施例中,输出接口140可以包括调节器/解调器(调制解调器)和/或发送器,其中发送器也可以称为发射器。数据源120可以包括图像捕获装置(例如,摄像机等)、含有先前捕获的媒体数据的存档、用于从媒体数据内容提供者接收媒体数据的馈入接口,和/或用于产生媒体数据的计算机图形系统,或媒体数据的这些来源的组合。
数据源120可以向编码器100发送媒体数据,编码器100可以对接收到由数据源120发送的媒体数据进行编码,得到经编码的媒体数据。编码器可以将经编码的媒体数据发送给输 出接口。在一些实施例中,源装置10经由输出接口140将经编码的媒体数据直接发送到目的地装置20。在其它实施例中,经编码的媒体数据还可存储到存储装置40上,供目的地装置20以后获取并用于解码和/或显示。
在图1所示的实施环境中,目的地装置20包括输入接口240、解码器200和显示装置220。在一些实施例中,输入接口240包括接收器和/或调制解调器。输入接口240可经由链路30和/或从存储装置40接收经编码的媒体数据,然后再发送给解码器200,解码器200可以对接收到的经编码的媒体数据进行解码,得到经解码的媒体数据。解码器可以将经解码的媒体数据发送给显示装置220。显示装置220可与目的地装置20集成或可在目的地装置20外部。一般来说,显示装置220显示经解码的媒体数据。显示装置220可以为多种类型中的任一种类型的显示装置,例如,显示装置220可以为液晶显示器(liquid crystal display,LCD)、等离子显示器、有机发光二极管(organic light-emitting diode,OLED)显示器或其它类型的显示装置。
尽管图1中未示出,但在一些方面,编码器100和解码器200可各自与编码器和解码器集成,且可以包括适当的多路复用器-多路分用器(multiplexer-demultiplexer,MUX-DEMUX)单元或其它硬件和软件,用于共同数据流或单独数据流中的音频和视频两者的编码。在一些实施例中,如果适用的话,那么MUX-DEMUX单元可符合ITU H.223多路复用器协议,或例如用户数据报协议(user datagram protocol,UDP)等其它协议。
编码器100和解码器200各自可为以下各项电路中的任一者:一个或多个微处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)、离散逻辑、硬件或其任何组合。如果部分地以软件来实施本申请实施例的技术,那么装置可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中,且可使用一个或多个处理器在硬件中执行所述指令从而实施本申请实施例的技术。前述内容(包括硬件、软件、硬件与软件的组合等)中的任一者可被视为一个或多个处理器。编码器100和解码器200中的每一者都可以包括在一个或多个编码器或解码器中,所述编码器或所述解码器中的任一者可以集成为相应装置中的组合编码器/解码器(编码解码器)的一部分。
本申请实施例可大体上将编码器100称为将某些信息“发信号通知”或“发送”到例如解码器200的另一装置。术语“发信号通知”或“发送”可大体上指代用于对经压缩的媒体数据进行解码的语法元素和/或其它数据的传送。此传送可实时或几乎实时地发生。替代地,此通信可经过一段时间后发生,例如可在编码时在经编码位流中将语法元素存储到计算机可读存储媒体时发生,解码装置接着可在所述语法元素存储到此媒体之后的任何时间检索所述语法元素。
本申请实施例提供的编解码方法可以应用于多种场景,接下来以待编码的媒体数据为HOA信号为例,对其中的几种场景分别进行介绍。
请参考图2,图2是本申请实施例提供的一种编解码方法应用于终端场景的实施环境的示意图。该实施环境包括第一终端101和第二终端201,第一终端101与第二终端201进行通信连接。该通信连接可以为无线连接,也可以为有线连接,本申请实施例对此不做限定。
其中,第一终端101可以为发送端设备,也可以为接收端设备,同理,第二终端201可 以为接收端设备,也可以为发送端设备。例如,在第一终端101为发送端设备的情况下,第二终端201为接收端设备,在第一终端101为接收端设备的情况下,第二终端201为发送端设备。
接下来以第一终端101为发送端设备,第二终端201为接收端设备为例进行介绍。
第一终端101和第二终端201均包括音频采集模块、音频回放模块、编码器、解码器、信道编码模块和信道解码模块。在本申请实施例中,该编码器为一种三维音频编码器,该解码器为一种三维音频解码器。
第一终端101中的音频采集模块采集HOA信号并传输给编码器,编码器利用本申请实施例提供的编码方法对HOA信号进行编码,该编码可以称为信源编码。之后,为了实现HOA信号在信道中的传输,信道编码模块还需要再进行信道编码,然后将编码得到的码流通过无线或者有线网络通信设备在数字信道中传输。
第二终端201通过无线或者有线网络通信设备接收数字信道中传输的码流,信道解码模块对码流进行信道解码,然后解码器利用本申请实施例提供的解码方法解码得到HOA信号,再通过音频回放模块进行播放。
其中,第一终端101和第二终端201可以是任何一种可与用户通过键盘、触摸板、触摸屏、遥控器、语音交互或手写设备等一种或多种方式进行人机交互的电子产品,例如个人计算机(personal computer,PC)、手机、智能手机、个人数字助手(personal digital assistant,PDA)、可穿戴设备、掌上电脑PPC(pocket PC)、平板电脑、智能车机、智能电视、智能音箱等。
本领域技术人员应能理解上述终端仅为举例,其他现有的或今后可能出现的终端如可适用于本申请实施例,也应包含在本申请实施例保护范围以内,并在此以引用方式包含于此。
请参考图3,图3是本申请实施例提供的一种编解码方法应用于无线或核心网设备的转码场景的实施环境的示意图。该实施环境包括信道解码模块、音频解码器、音频编码器和信道编码模块。在本申请实施例中,该音频编码器为一种三维音频编码器,该音频解码器为一种三维音频解码器。
其中,音频解码器可以为利用本申请实施例提供的解码方法的解码器,也可以为利用其他解码方法的解码器。音频编码器可以为利用本申请实施例提供的编码方法的编码器,也可以为利用其他编码方法的编码器。在音频解码器为利用本申请实施例提供的解码方法的解码器的情况下,音频编码器为利用其他编码方法的编码器,在音频解码器为利用其他解码方法的解码器的情况下,音频编码器为利用本申请实施例提供的编码方法的编码器。
第一种情况,音频解码器为利用本申请实施例提供的解码方法的解码器,音频编码器为利用其他编码方法的编码器。
此时,信道解码模块用于对接收的码流进行信道解码,然后音频解码器用于利用本申请实施例提供的解码方法进行信源解码,再通过音频编码器按照其他编码方法进行编码,实现一种格式到另一种格式的转换,即转码。之后,再通过信道编码后发送。
第二种情况,音频解码器为利用其他解码方法的解码器,音频编码器为利用本申请实施例提供的编码方法的编码器。
此时,信道解码模块用于对接收的码流进行信道解码,然后音频解码器用于利用其他解 码方法进行信源解码,再通过音频编码器利用本申请实施例提供的编码方法进行编码,实现一种格式到另一种格式的转换,即转码。之后,再通过信道编码后发送。
其中,无线设备可以为无线接入点、无线路由器、无线连接器等等。核心网设备可以为移动性管理实体、网关等等。
本领域技术人员应能理解上述无线设备或者核心网设备仅为举例,其他现有的或今后可能出现的无线或核心网设备如可适用于本申请实施例,也应包含在本申请实施例保护范围以内,并在此以引用方式包含于此。
请参考图4,图4是本申请实施例提供的一种编解码方法应用于广播电视场景的实施环境的示意图。广播电视场景分为直播场景和后期制作场景。对于直播场景来说,该实施环境包括直播节目三维声制作模块、三维声编码模块、机顶盒和扬声器组,机顶盒包括三维声解码模块。对于后期制作场景来说,该实施环境包括后期节目三维声制作模块、三维声编码模块、网络接收器、移动终端、耳机等。
直播场景下,直播节目三维声制作模块制作出三维声信号(如HOA信号),该三维声信号经过应用本申请实施例的编码方法得到码流,该码流经广电网络传输到用户侧,由机顶盒中的三维声解码器利用本申请实施例提供的解码方法对码流进行解码,从而重建三维声信号,由扬声器组进行回放。或者,该码流经互联网传输到用户侧,由网络接收器中的三维声解码器利用本申请实施例提供的解码方法对码流进行解码,从而重建三维声信号,由扬声器组进行回放。又或者,该码流经互联网传输到用户侧,由移动终端中的三维声解码器利用本申请实施例提供的解码方法对码流进行解码,从而重建三维声信号,由耳机进行回放。
后期制作场景下,后期节目三维声制作模块制作出三维声信号,该三维声信号经过应用本申请实施例的编码方法得到码流,该码流经广电网络传输到用户侧,由机顶盒中的三维声解码器利用本申请实施例提供的解码方法对码流进行解码,从而重建三维声信号,由扬声器组进行回放。或者,该码流经互联网传输到用户侧,由网络接收器中的三维声解码器利用本申请实施例提供的解码方法对码流进行解码,从而重建三维声信号,由扬声器组进行回放。又或者,该码流经互联网传输到用户侧,由移动终端中的三维声解码器利用本申请实施例提供的解码方法对码流进行解码,从而重建三维声信号,由耳机进行回放。
请参考图5,图5是本申请实施例提供的一种编解码方法应用于虚拟现实流场景的实施环境的示意图。该实施环境包括编码端和解码端,编码端包括采集模块、预处理模块、编码模块、打包模块和发送模块,解码端包括解包模块、解码模块、渲染模块和耳机。
采集模块采集HOA信号,然后通过预处理模块对HOA信号进行预处理操作,预处理操作包括滤除掉HOA信号中的低频部分,通常是以20Hz或者50Hz为分界点,提取HOA信号中的方位信息等。之后通过编码模块,利用本申请实施例提供的编码方法进行编码处理,编码之后通过打包模块进行打包,进而通过发送模块发送给解码端。
解码端的解包模块首先进行解包,之后通过解码模块,利用本申请实施例提供的解码方法进行解码,然后通过渲染模块对解码信号进行双耳渲染处理,渲染处理后的信号映射到收听者耳机上。该耳机可以为独立的耳机,也可以是基于虚拟现实的眼镜设备上的耳机。
需要说明的是,本申请实施例描述的系统架构以及业务场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着系统架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
接下来对本申请实施例提供的编解码方法进行详细地解释说明。需要说明的是,结合图1所示的实施环境,下文中的任一种编码方法可以是源装置10中的编码器100执行的。下文中的任一种解码方法可以是目的地装置20中的解码器200执行的。
图6是本申请实施例提供的一种编码方法的流程图,该编码方法应用于编码端。请参考图6,该方法包括如下步骤。
步骤601:根据当前帧的HOA信号确定当前帧的编码方案。
对于待编码的多个音频帧的HOA信号来说,编码端逐帧进行编码。其中,音频帧的HOA信号是通过HOA采集技术得到的音频信号。HOA信号是一种场景音频信号,也是一种三维音频信号,HOA信号是指对空间中麦克风所在位置的声场进行采集得到的音频信号,采集得到的音频信号称为原始HOA信号。音频帧的HOA信号也可以是将其他格式的三维音频信号转换后获得的HOA信号。例如将5.1声道信号转换成HOA信号,或者将5.1声道信号和对象音频混合的三维音频信号转换成HOA信号。可选地,待编码的音频帧的HOA信号为时域信号或频域信号,可以包含HOA信号的所有通道,也可以包含HOA信号的部分通道。示例性地,若音频帧的HOA信号的阶数为3,HOA信号的通道数为16,音频帧的帧长为20ms,采样率为48KHz,则待编码的音频帧的HOA信号包含16个通道的信号,每个通道包含960个采样点。
为了降低计算复杂度,若编码端获取到的音频帧的HOA信号为原始HOA信号,原始HOA信号的采样点数或频点数较多,那么编码端可以对原始HOA信号进行下采样,以得到待编码的音频帧的HOA信号。例如,编码端对原始HOA信号进行1/Q下采样,以降低待编码的HOA信号的采样点数或频点数,如本申请实施例中原始HOA信号的每个通道包含960个采样点,采用1/120下采样后,得到待编码的HOA信号的每个通道包含8个采样点。
在本申请实施例中以编码端对当前帧进行编码为例,对编码端的编码方法进行介绍。当前帧为待编码的一个音频帧。也即是,编码端获取当前帧的HOA信号,采用本申请实施例提供的编码方法对当前帧的HOA信号进行编码。
需要说明的是,为了满足对不同声场类型下的音频帧均有较高的压缩率,需要根据各音频帧的声场类型为相应音频帧选择合适的编解码方案。在本申请实施例中,编码端先根据当前帧的声场类型,确定当前帧的初始编码方案,初始编码方案为第一编码方案或第二编码方案。编码端再通过对比当前帧的初始编码方案和当前帧的前一帧的初始编码方案是否相同,来判定采用第一编码方案、第二编码方案还是第三编码方案对当前帧的HOA信号进行编码。其中,若当前帧的初始编码方案与当前帧的前一帧的初始编码方案相同,则编码端采用与当前帧的初始编码方案相一致的编码方案来编码当前帧的HOA信号。若当前帧的初始编码方案与当前帧的初始编码方案不相同,则编码端采用第三编码方案来编码当前帧的HOA信号。
在本申请实施例中,当前帧的编码方案为第一编码方案、第二编码方案和第三编码方案中的一种。其中,第一编码方案为基于DirAC的HOA编码方案,第二编码方案为基于虚拟 扬声器选择的HOA编码方案,第三编码方案为混合编码方案。可选地,混合编码方案也称为切换帧编码方案。第三编码方案为本申请实施例提供的一种切换帧编码方案,切换帧编码方案为了在不同的编解码方案之间切换时听觉质量的平滑过渡。本申请实施例将会在下文对这三种编码方案进行详细介绍。在本申请实施例中,基于虚拟扬声器选择的HOA编码方案也称为基于MP的HOA编码方案。
在本申请实施例中,编码端根据当前帧的HOA信号确定当前帧的初始编码方案。然后,编码端基于当前帧的初始编码方案和当前帧的前一帧的初始编码方案,确定当前帧的编码方案。需要说明的是,本申请实施例不限定编码端确定初始编码方案的实现方式。
可选地,编码端对当前帧的HOA信号进行声场类型分析,以得到当前帧的声场分类结果,基于当前帧的声场分类结果,确定当前帧的初始编码方案。需要说明的是,本申请实施例不限定声场类型分析的方法,例如编码端通过对当前帧的HOA信号进行奇异值分解以进行声场类型分析。
可选地,声场分类结果包括相异性声源数量,本申请实施例不限定相异性声源数量的确定方法。在确定当前帧对应的相异性声源数量之后,若当前帧对应的相异性声源数量大于第一阈值且小于第二阈值,则编码端确定当前帧的初始编码方案为第二编码方案。若当前帧对应的相异性声源数量不大于第一阈值或不小于第二阈值,则编码端确定当前帧的初始编码方案为第一编码方案。其中,第一阈值小于第二阈值。可选地,第一阈值为0或其他值,第二阈值为3或其他值。
在通过上述方法确定各个音频帧(包括当前帧)的初始编码方案的情况下,可能会出现各个音频帧的初始编码方案来回切换的情况,这样最终需要编码的切换帧较多,由于编码方案之间的切换带来的问题较多,即需要解决的问题较多,那么可以通过减少切换帧的数量来减少切换带来的问题。其中,切换帧指初始编码方案与前一帧的初始编码方案不同的音频帧。可选地,为了减少切换帧的数量,编码端可以先根据当前帧的声场分类结果,确定当前帧的预计编码方案,即编码端将按照前述方法确定的初始编码方案作为预计编码方案。然后,编码端采用滑动窗的方法基于当前帧的预计编码方案更新当前帧的初始编码方案,如编码端通过hangover处理来更新当前帧的初始编码方案。
可选地,假设滑动窗的长度为N,滑动窗内包含当前帧的预计编码方案以及当前帧的前N-1帧的已更新的初始编码方案。若滑动窗内第二编码方案的个数累计不小于第一指定阈值,则编码端将当前帧的初始编码方案更新为第二编码方案。若滑动窗内第二编码方案的个数累计小于第一指定阈值,则编码端将当前帧的初始编码方案更新为第一编码方案。其中,滑动窗的长度N为8、10、15等,第一指定阈值为5、6、7等值,本申请实施例对滑动窗的长度和第一指定阈值的取值不作限定。举例说明如下,假设滑动窗的长度为10,第一指定阈值为7,滑动窗内包含当前帧的预计编码方案以及当前帧的前9帧的已更新的初始编码方案,如果滑动窗内第二编码方案的个数累计到不小于7,则编码端将当前帧的初始编码方案更新为第二编码方案。如果滑动窗内第二编码方案的个数累计小于7,则编码端将当前帧的初始编码方案更新为第一编码方案。
或者,若滑动窗内第一编码方案的个数累计不小于第二指定阈值,则编码端将当前帧的初始编码方案更新为第一编码方案。若滑动窗内第一编码方案的个数累计小于第二指定阈值,则编码端将当前帧的初始编码方案更新为第二编码方案。其中,第二指定阈值为5、6、7等值, 本申请实施例对第二指定阈值的取值不作限定。可选地,第二指定阈值与上述第一指定阈值不同或相同。
除了上述介绍的一些实现方式之外,编码端也可以采用其他的方法来得到当前帧的声场分类结果,基于声场分类结果确定初始编码方案的方法也可以为其他的方法,本申请实施例对此不作限定。
在本申请实施例中,编码端确定当前帧的初始编码方案之后,若当前帧的初始编码方案与当前帧的前一帧的初始编码方案相同,则编码端确定当前帧的编码方案为当前帧的初始编码方案。若当前帧的初始编码方案与当前帧的前一帧的初始编码方案不同,则编码端确定当前帧的编码方案为第三编码方案。也即是,若当前帧的初始编码方案与当前帧的前一帧的初始编码方案相同且为第一编码方案,则编码端确定当前帧的编码方案为第一编码方案。若当前帧的初始编码方案与当前帧的前一帧的初始编码方案相同且为第二编码方案,则编码端确定当前帧的编码方案为第二编码方案。若当前帧的初始编码方案与当前帧的前一帧的初始编码方案中的一个为第一编码方案,另一个为第二编码方案,则编码端确定当前帧的编码方案为第三编码方案。其中,当前帧的初始编码方案与当前帧的前一帧的初始编码方案中的一个为第一编码方案,另一个为第二编码方案,即,当前帧的初始编码方案为第一编码方案且当前帧的前一帧的初始编码方案为第二编码方案,或者,当前帧的初始编码方案为第二编码方案且当前帧的前一帧的初始编码方案为第一编码方案。也即是,对于切换帧来说,编码端既不采用第一编码方案也不采用第二编码方案来编码切换帧的HOA信号,而是将采用切换帧编码方案来编码切换帧的HOA信号。对于非切换帧来说,编码端将采用与非切换帧的初始编码方案相一致的编码方案来编码切换帧的HOA信号。其中,初始编码方案与前一帧的初始编码方案不同的音频帧为切换帧,初始编码方案与前一帧的初始编码方案相同的音频帧为非切换帧。
需要说明的是,编码端除了确定当前帧的编码方案之外,还需将能够指示当前帧的编码方案的信息编入码流,以便于解码端确定采用哪个解码方案来解码当前帧的码流。在本申请实施例中,编码端将能够指示当前帧的编码方案的信息编入码流的实现方式有多种,接下来介绍其中的三种实现方式。
第一种实现方式、编码切换标志以及两种编码方案的指示信息
在该实现方式中,编码端需要确定当前帧的切换标志的值,将当前帧的切换标志的值编入码流。其中,当当前帧的编码方案为第一编码方案或第二编码方案时,当前帧的切换标志的值为第一值。当当前帧的编码方案为第三编码方案时,当前帧的切换标志的值为第二值。可选地,第一值为“0”,第二值为“1”,第一值和第二值也可以为其他的值。
另外,编码端将当前帧的初始编码方案的指示信息编入码流。或者,若当前帧的切换标志的值为第一值,则编码端将当前帧的初始编码方案的指示信息编入码流,若当前帧的切换标志的值为第二值,则编码端将预设指示信息编入码流。
可选地,初始编码方案的指示信息以与初始编码方案相对应的编码模式(coding mode)来表示,即,以编码模式作为指示信息。例如,与初始编码方案相对应的编码模式为初始编码模式,初始编码模式为第一编码模式(即DirAC编码模式,即DirAC编码方案)或第二编码模式(即MP编码模式,即MP编码方案)。可选地,预设指示信息为预设编码模式,预设编码模式为第一编码模式或第二编码模式。在其他一些实施例中,预设指示信息为其他编码 模式,也即不限定编入码流的切换帧的编码方案的指示信息具体是什么。
也即是,在该第一种实现方式中,编码端以切换标志来指示切换帧,且可以不限定编入码流的切换帧的编码方案的指示信息,切换帧的编码方案的指示信息可以为切换帧的初始编码模式,也可以为预设编码模式,也可以从第一编码模式和第二编码模式中随机选定。需要说明的是,在这种实现方式中,用切换标志来指示当前帧是否为切换帧,这样,解码端即能够直接通过获取码流中的切换标志来确定当前帧是否为切换帧。
可选地,在该第一种实现方式中,当前帧的切换标志和初始编码方案的指示信息各占码流的一个比特位。示例性地,当前帧的切换标志的值为“0”或“1”,其中,切换标志的值为“0”指示当前帧不是切换帧,即当前帧的切换标志的值为第一值。切换标志的值为“1”指示当前帧是切换帧,即当前帧的切换标志的值为第二值。可选地,初始编码方案的指示信息为“0”或“1”,其中,“0”表示DirAC模式,“1”表示MP模式。
在其他一些实施例中,若当前帧的初始编码方案与当前帧的前一帧的初始编码方案不同,则编码端确定当前帧的切换标志的值为第二值,将当前帧的切换标志的值编入码流。也即是,对于切换帧来说,由于码流中切换标志即能够指示切换帧,因此无需编码切换帧的编码方案的指示信息。
第二种实现方式、编码两种编码方案的指示信息
编码端将当前帧的初始编码方案的指示信息编入码流。以编码模式作为指示信息为例,编入码流的指示信息实质上是与初始编码方案相一致的编码模式,即DirAC模式或MP模式。
可选地,在该第一种实现方式中,初始编码方案的指示信息占码流的一个比特位。示例性地,以编码模式作为指示信息为例,指示信息为“0”或“1”,其中,“0”表示DirAC模式,指示当前帧的初始编码方案为第一编码方案,“1”表示MP模式,指示当前帧的初始编码方案为第二编码方案。
第三种实现方式、编码三种编码方案的指示信息
在该实现方式中,编码端将当前帧的编码方案的指示信息编入码流。以编码模式作为指示信息为例,编入码流的指示信息实质上是与当前帧的编码方案相一致的编码模式,即DirAC模式、MP模式或MP-W模式。其中,MP-W模式为与切换帧编码方案相对应的编码模式。其中,若指示信息为MP-W模式,则指示当前帧为切换帧,若指示信息为DirAC模式或MP模式,则指示当前帧为非切换帧。
可选地,在该第三种实现方式中,当前帧的编码方案的指示信息占码流的两个比特位。示例性地,编入码流的指示信息为“00”、“01”或“10”。其中,“00”指示当前帧的编码方案为第一编码方案,“01”指示当前帧的编码方案为第二编码方案,“10”指示当前帧的编码方案为第三编码方案。
步骤602:若当前帧的编码方案为第三编码方案,则将该HOA信号中指定通道的信号编入码流,指定通道为该HOA信号的所有通道中的部分通道。
在本申请实施例中,第三编码方案指示仅将当前帧的HOA信号中指定通道的信号编入码流。其中,指定通道为该HOA信号的所有通道中的部分通道。也即是,对于切换帧来说,编码端将切换帧的HOA信号中指定通道的信号编入码流,而非采用第一编码方案或第二编码方案对切换帧进行编码,即本方案为了编码方案切换时听觉质量的平滑过渡,采用一种折中的方式来编码切换帧。
可选地,指定通道与第一编码方案中预设的传输通道一致,即指定通道为预设通道。也即是,在第三编码方案与第二编码方案不同的前提下,为了使得第三编码方案与第一编码方案的编码效果相接近,编码端将切换帧的HOA信号中与第一编码方案中预设的传输通道相同的通道的信号编入码流,从而使得编码方案切换时听觉质量尽可能地平滑过渡。需要说明的是,根据编码带宽、码率的不同,甚至是应用场景的不同,可以分别预设不同的传输通道。可选地,不同的编码带宽、码率或应用场景下,预设的传输通道也可以相同。
需要说明的是,在本申请实施例中,编码端将HOA信号中指定通道的信号编入码流的实现方式有很多,能将指定通道的信号编入码流即可,本申请实施例对此不作限定。可选地,指定通道的信号包括FOA信号,FOA信号包括全向的W信号,以及定向的X信号、Y信号和Z信号。也即是,指定通道包括FOA通道,FOA通道的信号为低阶信号,即,若当前帧为切换帧,则编码端仅将当前帧的HOA信号的低阶部分编入码流,低阶部分即包括FOA通道的W信号、X信号、Y信号和Z信号。示例性地,编码端基于该指定通道的信号确定虚拟扬声器信号和残差信号,将该虚拟扬声器信号和残差信号编入码流。例如,若指定通道包括FOA通道,则编码端将W信号确定为一路虚拟扬声器信号,将X信号、Y信号和Z信号分别与W信号之间的差信号确定为三路残差信号,或者,将X信号、Y信号和Z信号确定为三路残差信号。编码端通过核心编码器将该一路虚拟扬声器信号和三路残差信号编入码流。可选地,该核心编码器为立体声编码器或单声道编码器。
以上介绍了在当前帧为切换帧的情况下,编码端采用切换帧编码方案对当前帧编码的过程,也即编码端基于第三编码方案将当前帧的HOA信号中指定通道的信号编入码流。在本申请实施例中,切换帧编码方案也可称为基于MP-W的编码方案。接下来介绍在当前帧为非切换帧的情况下,编码端对当前帧编码的过程。
在本申请实施例中,若当前帧的编码方案为第一编码方案,则编码端按照第一编码方案将当前帧的HOA信号编入码流。若当前帧的编码方案为第二编码方案,则编码端按照第二编码方案将当前帧的HOA信号编入码流。也即是,若当前帧不是切换帧,则编码端采用当前帧的初始编码方案来编码当前帧。
在本申请实施例中,编码端按照第一编码方案将当前帧的HOA信号编入码流的实现过程为:编码端从当前帧的HOA信号中提取核心层信号和空间参数,将提取的核心层信号和空间参数编入码流。示例性地,编码端通过核心编码信号获取模块从当前帧的HOA信号中提取核心层信号,通过基于DirAC的空间参数提取模块从当前帧的HOA信号中提取出空间参数,通过核心编码器将核心层信号编入码流,通过空间参数编码器将空间参数编入码流。其中,核心层信号对应的通道与本方案中的指定通道一致。另外需要强调的是,采用第一编码方案除了将核心层信号编入码流之外,还将提取的空间参数编入码流,空间参数包含丰富的场景信息,例如方向信息等。而本申请实施例提供的切换帧编码方案仅将指定通道的信号编入码流。可见,对于同一帧来说,采用基于DirAC的HOA编码方案编入码流的有效信息也会多于采用切换帧编码方案编入码流的有效信息。而本方案在切换帧编码方案与第一编码方案不同的前提下,为了使得切换帧编码方案与第一编码方案的编码效果相接近,切换帧编码方案也是将HOA信号中与第一编码方案中预设的传输通道相同的指定通道的信号编入码流,但不会将HOA信号中更多的信息编入码流,也即不会提取空间参数,更不会将空间参数编入码流,从而使得听觉质量尽可能地平滑过渡。
编码端按照第二编码方案将当前帧的HOA信号编入码流的实现过程为:编码端基于MP算法从虚拟扬声器集合中选择与当前帧的HOA信号匹配的目标虚拟扬声器,基于当前帧的HOA信号和目标虚拟扬声器,通过基于MP的空间编码器确定虚拟扬声器信号,基于当前帧的HOA信号和虚拟扬声器信号通过基于MP的空间编码器确定残差信号,通过核心编码器将虚拟扬声器信号和残差信号编入码流。需要强调的是,基于MP的HOA编码方案与切换帧编码方案中确定虚拟扬声器信号和残差信号的原理和具体方式不同,且这两个方案所确定的虚拟扬声器信号和残差信号也不同。对于同一帧来说,采用基于MP的HOA编码方案编入码流的有效信息会多于采用切换帧编码方案。而本方案在切换帧编码方案与第二编码方案不同的前提下,为了使得切换帧编码方案与第一编码方案的编码效果相接近,切换帧编码方案也是采用编码虚拟扬声器信号和残差信号的方式,从而使得听觉质量尽可能地平滑过渡。
图7是本申请实施例提供的另一种编码方法的流程图。请参考图7,以将当前帧的初始编码方案的指示信息编入码流为例,对本申请实施例提供的编码方法再次进行解释说明。编码端首先获取待编码的当前帧的HOA信号。然后,编码端对该HOA信号进行声场类型分析,以确定当前帧的初始编码方案。编码端判断当前帧的初始编码方案与当前帧的前一帧的初始编码方案是否相同。若当前帧的初始编码方案与当前帧的前一帧的初始编码方案相同,则编码端采用当前帧的初始编码方案对当前帧的HOA信号进行编码,以得到当前帧的码流。若当前帧的初始编码方案与当前帧的前一帧的初始编码方案不同,则编码端采用切换帧编码方案对当前帧的HOA信号进行编码,以得到当前帧的码流。
需要说明的是,若当前帧为待编码的第一个音频帧,则当前帧的初始编码方案为第一编码方案或第二编码方案,编码端采用当前帧的初始编码方案将当前帧的HOA信号编入码流。
综上所述,在本申请实施例中,结合两个方案(即基于虚拟扬声器选择的编解码方案和基于方向音频编码的编解码方案)对音频帧的HOA信号进行编解码,也即针对不同的音频帧选择合适的编解码方案,这样能够提升音频信号的压缩率。同时,为了使得在不同编解码方案之间的切换时听觉质量的平滑过渡,本方案中对于切换帧来说,并非直接采用上述两个方案中的任一个方案对切换帧进行编码,而是将切换帧的HOA信号中指定通道的信号编入码流,即采用一种折中的方案对切换帧进行编解码,从而使得对解码恢复出的切换帧的HOA信号进行渲染播放后的听觉质量能够平滑过渡。
图8是本申请实施例提供的一种解码方法的流程图,该方法应用于解码端。需要说明的是,该解码方法对应于图6所示的编码方法。请参考图8,该方法包括如下步骤。
步骤801:根据码流确定当前帧的解码方案,当前帧的解码方案为第一解码方案或非第一解码方案,第一解码方案为基于DirAC的HOA解码方案。
需要说明的是,由于编码端对不同的音频帧采用不同的编码方案进行编码,那么解码端也需要用对应的解码方案来解码各个音频帧。接下来介绍解码端如何确定当前帧的解码方案。
由前述可知,在图6所示编码方法的步骤601中介绍了编码端将能够用于指示当前帧的编码方案的信息编入码流的三种实现方式,相应地,解码端确定当前帧的编码方案也对应有三种实现方式,接下来将对此进行介绍。
第一种实现方式、编码了切换标志以及两种编码方案的指示信息
解码端先从码流中解析出当前帧的切换标志的值。若该切换标志的值为第一值,则解码 端再从该码流中解析出当前帧的解码方案的指示信息,该指示信息用于指示当前帧的解码方案为第一解码方案或第二解码方案。若该切换标志为的值为第二值,则解码端确定当前帧的编码方案为第三编码方案。需要说明的是,编码端编入码流的编码方案的指示信息即为解码端从码流中解析出的解码方案的指示信息。
换句话说,若解码端解析出当前帧的切换标志的值为第一值,说明当前帧为非切换帧。解码端再从码流中解析出解码方案的指示信息,基于指示信息确定当前帧的解码方案。若该切换标志的值为第二值,则解码端确定当前帧的解码方案为第三解码方案,当前帧为切换帧,这种情况下,即使码流中包含指示信息,解码端也无需解码指示信息。其中,第三解码方案为混合解码方案,即切换帧解码方案。
需要说明的是,若切换标志的值为第二值,则解码端确定当前帧的解码方案为切换帧解码方案,且当前帧为切换帧,切换帧解码方案是不同于第一解码方案和第二解码方案的混合解码方案,切换帧解码方案是为了听觉质量的平滑过渡以及时延对齐。
可选地,在该第一种实现方式中,解码方案的指示信息和切换标志各占码流的一个比特位。示例性地,解码端先从码流中解析当前帧的切换标志的值,若解析出的切换标志的值为“0”,即切换标志的值为第一值,则解码端再从码流中解析当前帧的解码方案的指示信息,若解析出的指示信息为“0”,则解码端确定当前帧的解码方案为第一解码方案。若解析出的指示信息为“1”,则解码端确定当前帧的解码方案为第二解码方案。若解析出的切换标志的值为“1”,即切换标志的值为第二值,则解码端确定当前帧的解码方案为切换帧解码方案,即第三解码方案。
可选地,在当前帧为切换帧的情况下,解码端能够基于当前帧的切换标志以及当前帧的前一帧的解码方案,确定当前帧的切换状态。例如,若当前帧的切换标志的值为第一值,且当前帧的前一帧的解码方案为第一解码方案,则解码端确定当前帧的切换状态为第一切换状态,第一切换状态是指从基于DirAC的HOA解码方案切换到基于MP的HOA解码方案的状态。若当前帧的切换标志的值为第二值,且当前帧的前一帧的解码方案为第二解码方案,则解码端确定当前帧的切换状态为第二切换状态,第二切换状态是指从基于MP的HOA解码方案切换到基于DirAC的HOA解码方案的状态。
第二种实现方式、编码了两种编码方案的指示信息
解码端从码流中解析出当前帧的初始解码方案,初始解码方案为第一解码方案或第二解码方案。若当前帧的初始解码方案与当前帧的前一帧的初始解码方案相同,则确定当前帧的解码方案为当前帧的初始解码方案。若当前帧的初始解码方案与当前帧的前一帧的初始解码方案不同,则确定当前帧的解码方案为第三解码方案,即混合解码方案。其中,当前帧的初始解码方案与当前帧的前一帧的初始解码方案不同是指,当前帧的初始解码方案为第一解码方案且当前帧的前一帧的初始解码方案为第二解码方案,或者,当前帧的初始解码方案为第二解码方案且当前帧的前一帧的初始解码方案为第一解码方案。也即是,当前帧的初始解码方案与当前帧的前一帧的初始解码方案中的一个为第一解码方案,另一个为第二解码方案。
可选地,以编码模式作为编入码流的初始编码方案的指示信息为例,从码流中解析出的指示信息称为编码模式(coding mode)。需要说明的是,若当前帧的初始解码方案与当前帧的前一帧的初始解码方案不同,则表示当前帧为切换帧。若当前帧的初始解码方案与当前帧的前一帧的初始解码方案相同,则表示当前帧为非切换帧。
可选地,在该第二种实现方式中,用于指示初始解码方案的指示信息占码流的一个比特位。以编码模式作为指示信息为例,码流中的编码模式占一个比特位。示例性地,解码端从码流中解析当前帧的指示信息,若解析出的指示信息为“0”,且当前帧的前一帧的指示信息也为“0”,则解码端确定当前帧的解码方案为第一解码方案。若解析出的指示信息为“1”,且当前帧的前一帧的指示信息也为“1”,则解码端确定当前帧的解码方案为第二解码方案。若解析出的指示信息为“0”且当前帧的前一帧的指示信息为“1”,或者解析出的指示信息为“1”且当前帧的前一帧的指示信息为“0”,则解码端确定当前帧的解码方案为第三解码方案。
可选地,当前帧的前一帧的初始解码方案的指示信息为缓存的数据。在解码到当前帧时,解码端可以从缓存中获取当前帧的前一帧的初始解码方案的指示信息。
可选地,在当前帧为切换帧的情况下,解码端能够基于当前帧的前一帧的初始解码方案,确定当前帧的切换状态。例如,若当前帧的前一帧的初始解码方案为第一解码方案,则解码端确定当前帧的切换状态为第一切换状态,第一切换状态是指从基于DirAC的HOA解码方案切换到基于MP的HOA解码方案的状态。若当前帧的前一帧的初始解码方案为第二解码方案,则解码端确定当前帧的切换状态为第二切换状态,第二切换状态是指从基于MP的HOA解码方案切换到基于DirAC的HOA解码方案的状态。
第三种实现方式、编码了三种编码方案的指示信息
解码端从码流中解析出当前帧的解码方案的指示信息,该指示信息用于指示当前帧的解码方案为第一解码方案、第二解码方案或第三解码方案。
示例性地,假设以编码模式作为指示信息,解码端从码流中解析出当前帧的编码模式。若当前帧的编码模式为DirAC模式,则解码端确定当前帧的解码方案为第一解码方案。若当前帧的编码模式为MP模式,则解码端确定当前帧的解码方案为第二解码方案。若当前帧的编码模式为MP-W模式,则解码端确定当前帧的解码方案为第三解码方案。
可选地,在该第三种实现方式中,解码方案的指示信息占码流的两个比特位。例如,假设以编码模式作为指示信息,当前帧的编码模式占码流的两个比特位。示例性地,解码端从码流中解析当前帧的解码方案的指示信息,若解析出的指示信息为“00”,则解码端确定当前帧的解码方案为第一解码方案。若解析出的指示信息为“01”,则解码端确定当前帧的解码方案为第二解码方案。若解析出的指示信息为“10”,则解码端确定当前帧的解码方案为切换帧解码方案。
可选地,在当前帧为切换帧的情况下,解码端能够基于当前帧的前一帧的解码方案,确定当前帧的切换状态。例如,若当前帧的前一帧的解码方案为第一解码方案,则解码端确定当前帧的切换状态为第一切换状态,第一切换状态是指从基于DirAC的HOA解码方案切换到基于MP的HOA解码方案的状态。若当前帧的前一帧的解码方案为第二解码方案,则解码端确定当前帧的切换状态为第二切换状态,第二切换状态是指从基于MP的HOA解码方案切换到基于DirAC的HOA解码方案的状态。
步骤802:若当前帧的编码方案为第一编码方案,则按照第一编码方案,根据码流重建第一音频信号,重建的第一音频信号为当前帧的重建HOA信号。
在本申请实施例中,由于基于DirAC的HOA解码方案的解码时延较大,若当前帧的解码方案为第一解码方案,则解码端采用第一解码方案解码码流,即可得到当前帧的重建HOA信号。也即是,若当前帧的解码方案为第一解码方案,则解码端按照第一解码方案,根据码 流重建第一音频信号,重建的第一音频信号为当前帧的重建HOA信号。
其中,解码端按照第一解码方案,根据码流重建第一音频信号的实现过程为:解码端从码流中解析出核心层信号和空间参数,基于核心层信号和空间参数重建出当前帧的HOA信号。示例性地,解码端通过核心解码器从码流中解析出核心层信号,通过空间参数解码器从码流中解析出空间参数,基于解析出的核心层信号和空间参数进行基于DirAC的HOA信号合成处理,以重建出第一音频信号,重建的第一音频信号即为当前帧的重建HOA信号。
步骤803:若当前帧的编码方案为非第一编码方案,则按照非第一解码方案,根据该码流重建第二音频信号,对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。
由前述可知,若当前帧的编码方案为第一编码方案,则解码端按照第一编码方案对码流进行解码即可得到当前帧的重建HOA信号,无需进行其他处理。为了解决不同编解码方案的解码时延不同的问题,若当前帧的解码方案为非第一解码方案,则解码端先根据码流重建第二音频信号,然后还需对第二音频信号进行对齐处理,或者说基于第二音频信号进行对齐处理,以得到当前帧的重建HOA信号。其中,对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。需要说明的是,本文中所讲的解码时延为端到端的编解码时延,解码时延也可认为是编码时延,三种编码方案的编码过程的时延是一致的,解码过程的时延需要根据本申请实施例提供的解码方法进行对齐。
在本申请实施例中,当前帧的解码方案为非第一编码方案分为两种情况,这两种情况即当前帧的解码方案为第二解码方案,以及当前帧的解码方案为第三解码方案,也即非第一解码方案为第二解码方案或第三解码方案。接下来将分别对这两种情况的解码过程进行介绍。
在本申请实施例中,若当前帧的解码方案为第三解码方案,即当前帧为切换帧,则解码端根据码流重建指定通道的信号,将重建的指定通道的信号作为重建的第二音频信号。其中,指定通道为当前帧的HOA信号的所有通道中的部分通道。相应地,解码端对重建的指定通道的信号进行对齐处理,以得到当前帧的重建HOA信号。
在本申请实施例中,解码端根据码流重建指定通道的信号的过程,与编码端将指定通道的信号编入码流的过程是对称的,即相匹配的。假设编码端基于当前帧的HOA信号中指定通道的信号确定虚拟扬声器信号和残差信号,将虚拟扬声器信号和残差信号编入码流。那么,解码端根据码流确定虚拟扬声器信号和残差信号,再基于虚拟扬声器信号和残差信号,重建该指定通道的信号。示例性地,解码端通过核心解码器从码流中解析出虚拟扬声器信号和残差信号,该核心解码器可以为立体声解码器或单声道解码器。
对于切换帧来说,解码端重建出指定通道的信号之后,对重建的指定通道的信号进行分析滤波处理,基于经分析滤波的指定通道的信号,确定当前帧的HOA信号中除指定通道之外的一个或多个剩余通道的增益。解码端基于该一个或多个剩余通道的增益和经分析滤波的指定通道的信号,确定该一个或多个剩余通道的信号。解码端对经分析滤波的指定通道的信号和该一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。也即是,对于切换帧来说,对齐处理包括重建各个剩余通道的信号以及基于分析合成滤波的时延对齐处理。其中,解码端通过分析合成滤波处理来增加切换帧的解码时延,使得切换帧的解码时延与第一解码方案的解码时延一致,分析合成滤波处理包括分析滤波处理和合成滤波处理。
示例性地,假设指定通道的信号为当前帧的HOA信号的低阶部分,该一个或多个剩余通道的信号为当前帧的HOA信号的高阶部分,则解码端对重建的HOA信号的低阶部分进行分析滤波处理,基于经分析滤波的HOA信号的低阶部分,确定当前帧的高阶增益。其中,该高阶增益包括该HOA信号的高阶部分所包括的各个通道的增益。解码端基于经分析滤波的HOA信号的低阶部分和高阶增益,确定当前帧的HOA信号的高阶部分。解码端对经分析滤波的HOA信号的低阶部分和该高阶部分进行合成滤波处理,以得到当前帧的重建HOA信号。也即是,在指定通道的信号为当前帧的HOA信号的低阶部分的情况下,切换帧对应的对齐处理包括重建高阶部分以及基于分析合成滤波的时延对齐处理。
可选地,指定通道与第一解码方案(或第一编解码方案或第一编码方案)中预设的传输通道一致。可选地,指定通道包括一阶立体混响(first-order ambisonics,FOA)通道,指定通道的信号包括FOA信号,FOA信号包括全向的W信号,以及定向的X信号、Y信号和Z信号。FOA信号为HOA信号中的低阶部分。
示例性地,解码端将重建的HOA信号的低阶部分输入分析滤波器,以通过该分析滤波器对重建的HOA信号的低阶部分进行分析滤波处理,从而得到经分析滤波的HOA信号的低阶部分。基于经分析滤波的HOA信号的低阶部分,确定当前帧的高阶增益,以及基于经分析滤波的HOA信号的低阶部分和高阶增益确定经分析滤波的高阶部分。通过合成滤波器对经分析滤波的HOA信号的低阶部分和高阶部分进行合成滤波处理,以得到合成滤波器输出的当前帧的重建HOA信号。也即通过分析合成滤波器为当前帧增加一个时延。其中,该分析合成滤波器与基于DirAC的HOA编解码方案中使用的分析合成滤波器相同,以使通过相同的分析合成滤波器对当前帧的第一HOA信号处理后所增加的时延,与基于DirAC的HOA编码方案中分析合成滤波器的处理时延一致,进而使得当前帧的解码时延与基于DirAC的HOA解码方案的解码时延一致。例如,分析合成滤波处理所增加的时延为5ms,那么当前帧的HOA信号将比未经分析合成滤波处理的情况下晚5ms完成输出,从而达到时延对齐的目的。
其中,该分析合成滤波器可以为经过复数域低延时滤波器组(complex domain low delay filter bank,CLDFB)或者其他的具有时延特性的滤波器。
上述介绍了关于切换帧的对齐处理以使时延对齐的过程。接下来介绍关于解码方案为第二解码方案的音频帧的对齐处理以使时延对齐的过程。
在本申请实施例中,若当前帧的解码方案为第二解码方案,则解码端先按照第二解码方案,根据该码流重建第一HOA信号,重建的第一HOA信号即为重建的第二音频信号。然后,解码端再对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号。
其中,解码端按照第二解码方案,根据该码流重建第一HOA信号的实现过程为:解码端通过核心解码器从码流中解析出虚拟扬声器信号和残差信号,将解析出的虚拟扬声器信号和残差信号送入基于MP的空间解码器,以重建出第一HOA信号。需要说明的是,解码端按照第二解码方案,根据该码流重建第一HOA信号的过程,与编码端按照第二编码方案将当前帧的HOA信号编入码流的过程相对应,且第二编解码方案中的所讲的虚拟扬声器信号和残差信号是不同于切换帧编码方案中所讲的虚拟扬声器信号和残差信号。
可选地,解码端对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号的方式有多种,例如通过分析合成滤波处理来进行模式对齐,以使得时延对齐,或者通过循 环缓存处理来进行模式对齐,以使得时延对齐。接下来将对基于分析合成滤波的时延对齐处理和基于循环缓存的时延对齐处理分别进行介绍。
首先介绍对解码方案为第二解码方案的当前帧进行基于分析合成滤波的时延对齐处理的实现过程。在本申请实施例中,解码端重建出第一HOA信号之后,对重建的第一HOA信号进行分析合成滤波处理,以得到当前帧的重建HOA信号。也即是,对于基于MP的HOA编码方案编码的当前帧来说,解码端先采用第二解码方案基于码流重建当前帧的HOA信号,即重建第一HOA信号,然后通过分析合成滤波处理以进行时延对齐。
示例性地,解码端将重建的第一HOA信号输入分析合成滤波器,以得到该分析合成滤波器输出的当前帧的重建HOA信号。也即通过分析合成滤波器为当前帧增加一个时延。其中,该分析合成滤波器与基于DirAC的HOA解码方案中使用的分析合成滤波器相同,以使通过相同的分析合成滤波器对当前帧的第一HOA信号处理后所增加的时延,与基于DirAC的HOA解码方案中分析合成滤波器的处理时延一致,进而使得当前帧的解码时延与基于DirAC的HOA解码方案的解码时延一致。其中,该分析合成滤波器可以为经过复数域低延时滤波器组(CLDFB)或者其他的具有时延特性的滤波器。
由于基于DirAC的HOA解码方案解码得到的HOA信号的高阶部分的能量较大,而基于MP的HOA解码方案解码得到的HOA信号的高阶部分的能量较小。基于此,在本申请实施例中,为了使相邻音频帧的重建HOA信号的高阶部分的能量相差较小,使得听觉质量平滑过渡,解码端还可以对基于MP的HOA解码方案解码得到的HOA信号的高阶部分进行增益调整,以使经增益调整的高阶部分的能量得到提高。
可选地,解码端对重建的第一HOA信号进行分析滤波处理,以得到第二HOA信号。解码端对第二HOA信号的高阶部分进行增益调整,以得到经增益调整的高阶部分。解码端对第二HOA信号的低阶部分和经增益调整的高阶部分进行合成滤波处理,以得到当前帧的重建HOA信号。需要说明的是,这种情况下,对齐处理可认为是包括高阶增益调整以及基于分析合成滤波的时延对齐处理。
可选地,若当前帧的解码方案为第二解码方案,且当前帧的前一帧的解码方案为第三解码方案,即当前帧的前一帧为切换帧,则解码端根据当前帧的前一帧的高阶增益,对第二HOA信号的高阶部分进行增益调整,以得到经增益调整的高阶部分。也即是,对于切换帧之后相邻的一个基于MP的HOA解码方案进行解码的音频帧来说,解码端可以用该音频帧之前的切换帧的高阶增益来调整该音频帧的HOA信号的高阶部分,以使最终得到的该音频帧的重建HOA信号的高阶部分的能量与该切换帧的重建HOA信号的高阶部分的能量相近,实现听觉质量的平滑过渡。可选地,在解码过程中位于切换帧之后且与切换帧相邻的音频帧可称为MP解码高阶增益调整帧,解码端需要对MP解码高阶增益调整帧进行高阶增益调整以及基于分析合成滤波的时延对齐处理。可选地,对于MP解码高阶增益调整帧来说,进行高阶增益调整所使用的高阶增益可以为前一帧的高阶增益,也可以为根据其他方式获取的高阶增益,本申请实施例对此不作限定。
可选地,若当前帧的编码方案为第二解码方案,且当前帧的前一帧的解码方案是第二解码方案,即当前帧的前一帧不是切换帧,则解码端也可以通过高阶增益对当前帧的第二HOA信号的高阶部分进行增益调整,以得到经增益调整的高阶部分。需要说明的是,本申请实施例不限定该高阶增益的获取方法,该高阶增益可以为当前帧的前一帧的高阶增益,也可以根 据前一帧的高阶增益和预设增益调整函数确定,也可以通过其他方法确定。
可选地,除了对高阶部分进行增益调整之外,解码端还可以对解码方案为第二解码方案的音频帧的HOA信号的其他部分进行增益调整。也即是,本申请实施例不限定对HOA信号的哪些通道的信号进行增益调整。换句话说,解码端可以对HOA信号中任意一个或多个通道的信号进行增益调整,例如,进行增益调整的通道可以包括高阶通道中的全部或部分通道,或除指定通道之外的剩余通道中的全部或部分通道,或其他通道。
以对除指定通道之外的一个或多个剩余通道的信号进行增益调整为例,解码端对重建的第一HOA信号进行分析滤波处理,以得到第二HOA信号之后,对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号。其中,一个或多个剩余通道为HOA信号中除指定通道之外的通道。解码端对第二HOA信号中指定通道的信号,以及经增益调整的一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。可选地,若当前帧的前一帧的解码方案为第三解码方案,则解码端根据当前帧的前一帧的一个或多个剩余通道的增益,对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号。也即是,对于采用第二解码方案进行编解码的音频帧的HOA信号来说,解码端对除指定通道之外的剩余通道的信号进行增益调整。若当前帧的前一帧为切换帧,则解码端基于切换帧的剩余通道的增益对当前帧的剩余通道的信号进行增益调整,这样能够使得当前帧的剩余通道的信号强度接近于该切换帧的剩余通道的信号强度,使得听觉质量过渡更加平滑。
需要说明的是,在本申请实施例中,对于解码方案为第二解码方案的当前帧来说,解码端均可以通过基于分析合成滤波的时延对齐处理以进行时延对齐。
然后介绍对解码方案为第二解码方案的当前帧进行基于循环缓存的时延对齐处理的实现过程。在本申请实施例中,解码端重建出第一HOA信号之后,若当前帧的解码方案为第二解码方案且当前帧的前一帧的解码方案为第二解码方案,也即当前帧的前一帧为非切换帧,则解码端对重建的第一HOA信号进行循环缓存处理,以得到当前帧的重建HOA信号。也即是,对于解码方案为第二解码方案且前一帧为非切换帧的当前帧来说,解码端也可以基于循环缓存的时延对齐处理以进行时延对齐。而对于解码方案为第二解码方案且前一帧为切换帧的当前帧来说,解码端仍基于分析合成滤波的时延对齐处理以进行时延对齐。
可选地,解码端对重建的第一HOA信号进行循环缓存处理,以得到当前帧的重建HOA信号的实现过程为:解码端获取第一数据,将第一数据和第二数据进行合并,以得到当前帧的重建HOA信号。其中,第一数据为当前帧的前一帧HOA信号中位于第一时刻与前一帧HOA信号的结束时刻之间的数据,第一时刻与该结束时刻之间的时长为第一时长,即第一时刻为位于该结束时刻之前且距离该结束时刻第一时长的时刻,第一时长等于第一解码方案与第二解码方案之间的编码时延差。第二数据为该重建的第一HOA信号中位于该重建的第一HOA信号的起始时刻与第二时刻之间的数据,第二时刻与该起始时刻之间的时长为第二时长,即第二时刻为位于该起始时刻之后且距离该起始时刻第二时长的时刻,第一时长与第二时长之和等于当前帧的帧长。需要说明的是,在这种情况下,当前帧的前一帧也是基于MP的HOA编码方案编码的音频帧,也即当前帧的前一帧的解码方案也是第二解码方案,在当前帧的前一帧的解码过程中,也需先重建一个第一HOA信号,这里循环缓存处理中所讲的当前帧的前一帧HOA信号指的是重建的前一帧的第一HOA信号。
可选地,解码端将第一数据和第二数据进行合并,以得到当前帧的重建HOA信号之后,缓存第三数据,第三数据为该重建的第一HOA信号中除第二数据之外的数据。其中,第三数据用于当前帧的后一帧的解码。
示例性地,假设第一编码方案与第二编码方案之间的编码时延差为5ms(毫秒),当前帧的帧长为20ms,第一数据为缓存的5ms数据,这5ms数据为当前帧的前一帧HOA信号的末尾5ms数据,则解码端获取缓存的5ms数据,将这5ms数据与当前帧的重建的第一HOA信号的前15ms数据进行合并,以得到当前帧的重建HOA信号。另外,解码端还将当前帧的重建的第一HOA信号的尾部5ms数据进行缓存,以用于当前帧的后一帧的解码。例如,假设当前缓存的是第i帧对应的尾部5ms数据,i为正整数,若第i+1帧的解码方案为第二解码方案,则在解码到第i+1帧时,解码端重建第i+1帧的第一HOA信号,获取缓存的5ms数据,将获取的5ms数据与重建的第i+1帧的第一HOA信号的前15ms数据进行合并,以得到第i+1帧的重建HOA信号。若第i+1帧的解码方案为切换帧解码方案,则在解码到第i+1帧时,解码端获取缓存的5ms数据,在基于分析合成滤波处理对切换帧进行解码的过程中,这5ms数据将在分析合成滤波器中经过处理与第i+1帧对应的前15ms数据合并作为当前帧的重建HOA信号。
由以上的介绍可知,在本申请实施例中,对于切换帧来说,解码端均按照切换帧解码方案来解码切换帧,即需要对切换帧进行剩余通道信号重建(例如高阶部分重建)以及基于分析合成滤波的时延对齐处理。对于解码方案为第二解码方案的音频帧来说,解码端均进行基于分析合成滤波的时延对齐处理,可选地,还可以进行高阶增益调整。
或者,对于切换帧来说,解码端均按照切换帧解码方案来解码切换帧。对于解码方案为第二解码方案且前一帧是切换帧的音频帧来说,解码端进行基于分析合成滤波的时延对齐处理,可选地,还可以进行高阶增益调整。对于解码方案为第二解码方案且前一帧不是切换帧的音频帧来说,解码端进行基于循环缓存的时延对齐处理。对于待解码的第一个音频帧来说,若第一个音频帧的解码方案为第二解码方案,则解码端进行基于分析合成滤波的时延对齐处理或者基于循环缓存的时延对齐处理。
图9是本申请实施例提供的一种编码方案切换的编码示意图。参见图9,当前帧为切换帧,当前帧基于MP-W的HOA编码方案(即切换帧编码方案)进行编码。当前帧的前一帧为DirAC编码帧,该前一帧基于DirAC的HOA编码方案进行编码。当前帧的后一帧为MP编码帧,该后一帧基于MP的HOA编码方案进行编码。图9所示切换帧的切换状态为第一切换状态,第一切换状态是指从基于DirAC的HOA编码方案切换到基于MP的HOA编码方案的状态。其中,DirAC编码帧是指编码方案为第一编码方案的音频帧,MP编码帧是指编码方案为第二编码方案的音频帧。
图10是本申请实施例提供的一种编码方案切换的解码示意图。图10示出了切换帧的切换状态为如图9所示的第一切换状态的情况下的解码过程。参见图10,当前帧为切换帧,当前帧基于MP-W的HOA解码方案进行解码。当前帧的前一帧为DirAC解码帧,该前一帧基于DirAC的HOA解码方案进行解码。当前帧的后一帧为MP解码高阶增益调整帧,该后一帧基于MP的HOA解码方案进行解码,以及进行基于分析合成滤波的时延对齐处理和高阶增益调整。对于位于该MP解码高阶增益调整帧之后下一个切换帧之前的MP解码帧,也即后续的MP解码帧,基于MP的HOA解码方案进行解码,以及进行基于分析合成滤波的时延对齐处理。其中,DirAC解码帧是指解码方案为第一解码方案的音频帧,MP解码帧是指解码方案为第二解码方案的音频帧。
图11是本申请实施例提供的另一种编码方案切换的解码示意图。图11示出了切换帧的切换状态为如图9所示的第一切换状态的情况下的解码过程。参见图11,图11所示的解码过程与图10所示的解码过程不同的是,位于该MP解码高阶增益调整帧之后下一个切换帧之前的MP解码帧,也即后续的MP解码帧,基于MP的HOA解码方案进行解码,以及进行基于循环缓存的时延对齐处理。
由上述可知,在本申请实施例中,在切换帧的切换状态为第一切换状态的情况下,也即需要从基于DirAC的HOA编码方案切换到基于MP的HOA编码方案,即,需要从大时延往小时延切换,由于基于MP的HOA解码方案本身解码时延小,且不包含时延对齐处理,那么需要对切换帧之后的MP解码帧进行时延对齐处理,对于切换帧来说,可认为本方案提供的切换帧解码方案本身包含时延对齐处理。在切换帧的切换状态为第二切换状态的情况下,也即需要从小时延往大时延切换,由于基于DirAC的HOA解码方案本身解码时延大,则不需要对切换帧之后的DirAC解码帧进行额外的处理。
综上所述,在本申请实施例中,由于基于方向音频编码的HOA解码方案的解码时延较大,对于通过第一编码方案编码的当前帧来说,按照第一解码方案解码当前帧的码流即可。对于不是通过第一编码方案编码的当前帧来说,先根据码流重建第二音频信号,再对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,也即通过对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。这样,采用本方案能够使各个音频帧的解码时延均一致,也即保证时延对齐以使不同的编解码方案之间能够很好地切换。
图12是本申请实施例提供的一种解码装置1200的结构示意图,该解码装置1200可以由软件、硬件或者两者的结合实现成为解码端设备的部分或者全部,该解码端设备可以为上述实施例中的任一解码端设备。参见图12,该解码装置1200包括:第一确定模块1201、第一解码模块1202和第二解码模块1203。
第一确定模块1201,用于根据码流确定当前帧的解码方案,当前帧的解码方案为第一解码方案或非第一解码方案,第一解码方案为基于方向音频编码DirAC的高阶立体混响HOA解码方案;
第一解码模块1202,用于若当前帧的解码方案为第一解码方案,则按照第一解码方案,根据码流重建第一音频信号,重建的第一音频信号为当前帧的重建HOA信号;
第二解码模块1203,用于若当前帧的解码方案为非第一解码方案,则按照非第一解码方案,根据码流重建第二音频信号,对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。
可选地,非第一解码方案为第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案;
第二解码模块1203,包括:
第一重建子模块,用于若当前帧的解码方案为第三解码方案,则根据码流重建指定通道的信号,重建的指定通道的信号为重建的第二音频信号,指定通道为当前帧的HOA信号的所有通道中的部分通道。
可选地,第二解码模块1203,包括:
分析滤波子模块,用于对重建的指定通道的信号进行分析滤波处理;
第一确定子模块,用于基于经分析滤波的指定通道的信号,确定当前帧的HOA信号中除指定通道之外的一个或多个剩余通道的增益;
第二确定子模块,用于基于该一个或多个剩余通道的增益和经分析滤波的指定通道的信号,确定一个或多个剩余通道的信号;
合成滤波子模块,用于对经分析滤波的指定通道的信号和该一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。
可选地,非第一解码方案为第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案;
第二解码模块1203,包括:
第二重建子模块,用于若当前帧的解码方案为第二解码方案,则按照第二解码方案,根据码流重建第一HOA信号,重建的第一HOA信号为重建的第二音频信号。
可选地,第二解码模块1203,包括:
分析合成滤波子模块,用于对重建的第一HOA信号进行分析合成滤波处理,以得到当前帧的重建HOA信号。
可选地,分析合成滤波子模块用于:
对重建的第一HOA信号进行分析滤波处理,以得到第二HOA信号;
对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号,该一个或多个剩余通道为HOA信号中除指定通道之外的通道;
对第二HOA信号中指定通道的信号,以及经增益调整的一个或多个剩余通道的信号进行合成滤波处理,以得到当前帧的重建HOA信号。
可选地,分析合成滤波子模块用于:
若当前帧的前一帧的解码方案为第三解码方案,则根据当前帧的前一帧的一个或多个剩余通道的增益,对第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的一个或多个剩余通道的信号。
可选地,指定通道包括一阶立体混响FOA通道。
可选地,当前帧的前一帧的解码方案为第二解码方案;
第二解码模块1203,包括:
循环缓存子模块,用于对重建的第一HOA信号进行循环缓存处理,以得到当前帧的重建HOA信号。
可选地,循环缓存子模块用于:
获取第一数据,第一数据为当前帧的前一帧HOA信号中位于第一时刻与前一帧HOA信号的结束时刻之间的数据,第一时刻与结束时刻之间的时长为第一时长,第一时长等于第一解码方案与第二解码方案之间的编码时延差;
将第一数据和第二数据进行合并,以得到当前帧的重建HOA信号,第二数据为重建的第一HOA信号中位于重建的第一HOA信号的起始时刻与第二时刻之间的数据,第二时刻与起始时刻之间的时长为第二时长,第一时长与第二时长之和等于当前帧的帧长。
可选地,循环缓存子模块用于:
缓存第三数据,第三数据为重建的第一HOA信号中除第二数据之外的数据。
可选地,第一确定模块1201包括:
第一解析子模块,用于从码流中解析出当前帧的切换标志的值;
第二解析子模块,用于若切换标志的值为第一值,则从码流中解析当前帧的解码方案的指示信息,指示信息用于指示当前帧的解码方案为第一解码方案或第二解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案;
第三确定子模块,用于若切换标志的值为第二值,则确定当前帧的解码方案为第三解码方案,第三解码方案为混合解码方案。
可选地,第一确定模块1201包括:
第三解析子模块,用于从码流中解析出当前帧的解码方案的指示信息,指示信息用于指示当前帧的解码方案为第一解码方案、第二解码方案或第三解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案,第三解码方案为混合解码方案。
可选地,第一确定模块1201包括:
第四解析子模块,用于从码流中解析出当前帧的初始解码方案,初始解码方案为第一解码方案或第二解码方案,第二解码方案为基于虚拟扬声器选择的HOA解码方案;
第四确定子模块,用于若当前帧的初始解码方案与当前帧的前一帧的初始解码方案相同,则确定当前帧的解码方案为当前帧的初始解码方案;
第五确定子模块,用于若当前帧的初始解码方案为第一解码方案且当前帧的前一帧的初始解码方案为第二解码方案,或当前帧的初始解码方案为第二解码方案且当前帧的前一帧的初始解码方案为第一解码方案,则确定当前帧的解码方案为第三解码方案,第三解码方案为混合解码方案。
在本申请实施例中,由于基于DirAC的HOA解码方案的解码时延较大,对于通过第一编码方案编码的当前帧来说,按照第一解码方案解码当前帧的码流即可。对于不是通过第一编码方案编码的当前帧来说,先根据码流重建第二音频信号,再对重建的第二音频信号进行对齐处理,以得到当前帧的重建HOA信号,也即通过对齐处理使得当前帧的解码时延与第一解码方案的解码时延一致。这样,采用本方案能够使各个音频帧的解码时延均一致,也即保证时延对齐以使不同的编解码方案之间能够很好地切换。
需要说明的是:上述实施例提供的解码装置在解码音频帧时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的解码装置与解码方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图13为用于本申请实施例的一种编解码装置1300的示意性框图。其中,编解码装置1300可以包括处理器1301、存储器1302和总线系统1303。其中,处理器1301和存储器1302通过总线系统1303相连,该存储器1302用于存储指令,该处理器1301用于执行该存储器1302存储的指令,以执行本申请实施例描述的各种编码或解码方法。为避免重复,这里不再详细描述。
在本申请实施例中,该处理器1301可以是中央处理单元(central processing unit,CPU),该处理器1301还可以是其他通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
该存储器1302可以包括ROM设备或者RAM设备。任何其他适宜类型的存储设备也可以用作存储器1302。存储器1302可以包括由处理器1301使用总线1303访问的代码和数据13021。存储器1302可以进一步包括操作系统13023和应用程序13022,该应用程序13022包括允许处理器1301执行本申请实施例描述的编码或解码方法的至少一个程序。例如,应用程序13022可以包括应用1至N,其进一步包括执行在本申请实施例描述的编码或解码方法的编码或解码应用(简称编解码应用)。
该总线系统1303除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线系统1303。
可选地,编解码装置1300还可以包括一个或多个输出设备,诸如显示器1304。在一个示例中,显示器1304可以是触感显示器,其将显示器与可操作地感测触摸输入的触感单元合并。显示器1304可以经由总线1303连接到处理器1301。
需要指出的是,编解码装置1300可以执行本申请实施例中的编码方法,也可执行本申请实施例中的解码方法。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,基于通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、DVD和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。在一种示例下,编码器100及解码器200中的各种说明性逻辑框、单元、模块可以理解为对应的电路器件或逻辑元件。
本申请实施例的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请实施例中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
也就是说,在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意结合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络或其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如:同轴电缆、光纤、数据用户线(digital subscriber line,DSL))或无线(例如:红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质,或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如:软盘、硬盘、磁带)、光介质(例如:数字通用光盘(digital versatile disc,DVD))或半导体介质(例如:固态硬盘(solid state disk,SSD))等。值得注意的是,本申请实施例提到的计算机可读存储介质可以为非易失性存储介质,换句话说,可以是非瞬时性存储介质。
应当理解的是,本文提及的“至少一个”是指一个或多个,“多个”是指两个或两个以上。在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。
需要说明的是,本申请实施例所涉及的信息(包括但不限于用户设备信息、用户个人信息等)、数据(包括但不限于用于分析的数据、存储的数据、展示的数据等)以及信号,均为经用户授权或者经过各方充分授权的,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。例如,本申请实施例中涉及到的音频信号都是在充分授权的情况下获取的。
以上所述为本申请提供的实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (31)

  1. 一种解码方法,其特征在于,所述方法包括:
    根据码流确定当前帧的解码方案,所述当前帧的解码方案为第一解码方案或非第一解码方案,所述第一解码方案为基于方向音频编码DirAC的高阶立体混响HOA解码方案;
    若当前帧的解码方案为所述第一解码方案,则按照所述第一解码方案,根据码流重建第一音频信号,重建的所述第一音频信号为所述当前帧的重建HOA信号;
    若所述当前帧的解码方案为所述非第一解码方案,则按照所述非第一解码方案,根据所述码流重建第二音频信号,对重建的第二音频信号进行对齐处理,以得到所述当前帧的重建HOA信号,所述对齐处理使得所述当前帧的解码时延与所述第一解码方案的解码时延一致。
  2. 如权利要求1所述的方法,其特征在于,所述非第一解码方案为第二解码方案或第三解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案,所述第三解码方案为混合解码方案;若所述当前帧的解码方案为所述第三解码方案,所述根据所述码流重建第二音频信号,包括:
    根据所述码流重建指定通道的信号,重建的指定通道的信号为所述重建的第二音频信号,所述指定通道为所述当前帧的HOA信号的所有通道中的部分通道。
  3. 如权利要求2所述的方法,其特征在于,所述对所述重建的第二音频信号进行对齐处理,以得到所述当前帧的重建HOA信号,包括:
    对所述重建的指定通道的信号进行分析滤波处理;
    基于经分析滤波的指定通道的信号,确定所述当前帧的HOA信号中除所述指定通道之外的一个或多个剩余通道的增益;
    基于所述一个或多个剩余通道的增益和所述经分析滤波的指定通道的信号,确定所述一个或多个剩余通道的信号;
    对所述经分析滤波的指定通道的信号和所述一个或多个剩余通道的信号进行合成滤波处理,以得到所述当前帧的重建HOA信号。
  4. 如权利要求1所述的方法,其特征在于,所述非第一解码方案为第二解码方案或第三解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案,所述第三解码方案为混合解码方案;若所述当前帧的解码方案为所述第二解码方案,所述根据所述码流重建第二音频信号,包括:
    按照所述第二解码方案,根据所述码流重建第一HOA信号,重建的所述第一HOA信号为所述重建的第二音频信号。
  5. 如权利要求4所述的方法,其特征在于,所述对重建的第二音频信号进行对齐处理,以得到所述当前帧的重建HOA信号,包括:
    对重建的所述第一HOA信号进行分析合成滤波处理,以得到所述当前帧的重建HOA信号。
  6. 如权利要求5所述的方法,其特征在于,所述对所述重建的所述第一HOA信号进行分析合成滤波处理,以得到所述当前帧的重建HOA信号,包括:
    对重建的所述第一HOA信号进行分析滤波处理,以得到第二HOA信号;
    对所述第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的所述一个或多个剩余通道的信号,所述一个或多个剩余通道为所述HOA信号中除指定通道之外的通道;
    对所述第二HOA信号中所述指定通道的信号,以及所述经增益调整的所述一个或多个剩余通道的信号进行合成滤波处理,以得到所述当前帧的重建HOA信号。
  7. 如权利要求6所述的方法,其特征在于,所述对所述第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的所述一个或多个剩余通道的信号,包括:
    若所述当前帧的前一帧的解码方案为所述第三解码方案,则根据所述当前帧的前一帧的所述一个或多个剩余通道的增益,对所述第二HOA信号中所述一个或多个剩余通道的信号进行增益调整,以得到所述经增益调整的所述一个或多个剩余通道的信号。
  8. 如权利要求2-3、6-7中任一所述的方法,其特征在于,所述指定通道包括一阶立体混响FOA通道。
  9. 如权利要求4所述的方法,其特征在于,所述当前帧的前一帧的解码方案为所述第二解码方案;
    所述对重建的所述第二音频信号进行对齐处理,以得到所述当前帧的重建HOA信号,包括:
    对所述重建的所述第一HOA信号进行循环缓存处理,以得到所述当前帧的重建HOA信号。
  10. 如权利要求9所述的方法,其特征在于,所述对所述重建的所述第一HOA信号进行循环缓存处理,以得到所述当前帧的重建HOA信号,包括:
    获取第一数据,所述第一数据为所述当前帧的前一帧HOA信号中位于第一时刻与所述前一帧HOA信号的结束时刻之间的数据,所述第一时刻与所述结束时刻之间的时长为第一时长,所述第一时长等于所述第一解码方案与所述第二解码方案之间的编码时延差;
    将所述第一数据和第二数据进行合并,以得到所述当前帧的重建HOA信号,所述第二数据为所述重建的所述第一HOA信号中位于所述重建的所述第一HOA信号的起始时刻与第二时刻之间的数据,所述第二时刻与所述起始时刻之间的时长为第二时长,所述第一时长与所述第二时长之和等于所述当前帧的帧长。
  11. 如权利要求10所述的方法,其特征在于,所述方法还包括:
    缓存第三数据,所述第三数据为所述重建的所述第一HOA信号中除所述第二数据之外的数据。
  12. 如权利要求1-11任一所述的方法,其特征在于,所述根据码流确定当前帧的解码方案,包括:
    从所述码流中解析出所述当前帧的切换标志的值;
    若所述切换标志的值为第一值,则从所述码流中解析所述当前帧的解码方案的指示信息,所述指示信息用于指示所述当前帧的解码方案为所述第一解码方案或第二解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案;
    若所述切换标志的值为第二值,则确定所述当前帧的解码方案为第三解码方案,所述第三解码方案为混合解码方案。
  13. 如权利要求1-11任一所述的方法,其特征在于,所述根据码流确定当前帧的解码方案,包括:
    从所述码流中解析出所述当前帧的解码方案的指示信息,所述指示信息用于指示所述当前帧的解码方案为所述第一解码方案、第二解码方案或第三解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案,所述第三解码方案为混合解码方案。
  14. 如权利要求1-11任一所述的方法,其特征在于,所述根据码流确定当前帧的解码方案,包括:
    从所述码流中解析出所述当前帧的初始解码方案,所述初始解码方案为所述第一解码方案或第二解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案;
    若所述当前帧的初始解码方案与所述当前帧的前一帧的初始解码方案相同,则确定所述当前帧的解码方案为所述当前帧的初始解码方案;
    若所述当前帧的初始解码方案为所述第一解码方案且所述当前帧的前一帧的初始解码方案为所述第二解码方案,或所述当前帧的初始解码方案为所述第二解码方案且所述当前帧的前一帧的初始解码方案为所述第一解码方案,则确定所述当前帧的解码方案为第三解码方案,所述第三解码方案为混合解码方案。
  15. 一种解码装置,其特征在于,所述装置包括:
    第一确定模块,用于根据码流确定当前帧的解码方案,所述当前帧的解码方案为第一解码方案或非第一解码方案,所述第一解码方案为基于方向音频编码DirAC的高阶立体混响HOA解码方案;
    第一解码模块,用于若当前帧的解码方案为所述第一解码方案,则按照所述第一解码方案,根据码流重建第一音频信号,重建的所述第一音频信号为所述当前帧的重建HOA信号;
    第二解码模块,用于若所述当前帧的解码方案为所述非第一解码方案,则按照所述非第一解码方案,根据所述码流重建第二音频信号,对重建的第二音频信号进行对齐处理,以得到所述当前帧的重建HOA信号,所述对齐处理使得所述当前帧的解码时延与所述第一解码方案的解码时延一致。
  16. 如权利要求15所述的装置,其特征在于,所述非第一解码方案为第二解码方案或第三解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案,所述第三解码方案为混合解码方案;
    所述第二解码模块,包括:
    第一重建子模块,用于若所述当前帧的解码方案为所述第三解码方案,则根据所述码流重建指定通道的信号,重建的指定通道的信号为所述重建的第二音频信号,所述指定通道为所述当前帧的HOA信号的所有通道中的部分通道。
  17. 如权利要求16所述的装置,其特征在于,所述第二解码模块,包括:
    分析滤波子模块,用于对所述重建的指定通道的信号进行分析滤波处理;
    第一确定子模块,用于基于经分析滤波的指定通道的信号,确定所述当前帧的HOA信号中除所述指定通道之外的一个或多个剩余通道的增益;
    第二确定子模块,用于基于所述一个或多个剩余通道的增益和所述经分析滤波的指定通道的信号,确定所述一个或多个剩余通道的信号;
    合成滤波子模块,用于对所述经分析滤波的指定通道的信号和所述一个或多个剩余通道的信号进行合成滤波处理,以得到所述当前帧的重建HOA信号。
  18. 如权利要求15所述的装置,其特征在于,所述非第一解码方案为第二解码方案或第三解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案,所述第三解码方案为混合解码方案;
    所述第二解码模块,包括:
    第二重建子模块,用于若所述当前帧的解码方案为所述第二解码方案,则按照所述第二解码方案,根据所述码流重建第一HOA信号,重建的所述第一HOA信号为所述重建的第二音频信号。
  19. 如权利要求18所述的装置,其特征在于,所述第二解码模块,包括:
    分析合成滤波子模块,用于对重建的所述第一HOA信号进行分析合成滤波处理,以得到所述当前帧的重建HOA信号。
  20. 如权利要求19所述的装置,其特征在于,所述分析合成滤波子模块用于:
    对重建的所述第一HOA信号进行分析滤波处理,以得到第二HOA信号;
    对所述第二HOA信号中一个或多个剩余通道的信号进行增益调整,以得到经增益调整的所述一个或多个剩余通道的信号,所述一个或多个剩余通道为所述HOA信号中除指定通道之外的通道;
    对所述第二HOA信号中所述指定通道的信号,以及所述经增益调整的所述一个或多个剩余通道的信号进行合成滤波处理,以得到所述当前帧的重建HOA信号。
  21. 如权利要求20所述的装置,其特征在于,所述分析合成滤波子模块用于:
    若所述当前帧的前一帧的解码方案为所述第三解码方案,则根据所述当前帧的前一帧的所述一个或多个剩余通道的增益,对所述第二HOA信号中所述一个或多个剩余通道的信号进行增益调整,以得到所述经增益调整的所述一个或多个剩余通道的信号。
  22. 如权利要求16-17、20-21中任一所述的装置,其特征在于,所述指定通道包括一阶立体混响FOA通道。
  23. 如权利要求18所述的装置,其特征在于,所述当前帧的前一帧的解码方案为所述第二解码方案;
    所述第二解码模块,包括:
    循环缓存子模块,用于对所述重建的所述第一HOA信号进行循环缓存处理,以得到所述当前帧的重建HOA信号。
  24. 如权利要求23所述的装置,其特征在于,所述循环缓存子模块用于:
    获取第一数据,所述第一数据为所述当前帧的前一帧HOA信号中位于第一时刻与所述前一帧HOA信号的结束时刻之间的数据,所述第一时刻与所述结束时刻之间的时长为第一时长,所述第一时长等于所述第一解码方案与所述第二解码方案之间的编码时延差;
    将所述第一数据和第二数据进行合并,以得到所述当前帧的重建HOA信号,所述第二数据为所述重建的所述第一HOA信号中位于所述重建的所述第一HOA信号的起始时刻与第二时刻之间的数据,所述第二时刻与所述起始时刻之间的时长为第二时长,所述第一时长与所述第二时长之和等于所述当前帧的帧长。
  25. 如权利要求24所述的装置,其特征在于,所述循环缓存子模块用于:
    缓存第三数据,所述第三数据为所述重建的所述第一HOA信号中除所述第二数据之外的数据。
  26. 如权利要求15-25任一所述的装置,其特征在于,所述第一确定模块包括:
    第一解析子模块,用于从所述码流中解析出所述当前帧的切换标志的值;
    第二解析子模块,用于若所述切换标志的值为第一值,则从所述码流中解析所述当前帧的解码方案的指示信息,所述指示信息用于指示所述当前帧的解码方案为所述第一解码方案或第二解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案;
    第三确定子模块,用于若所述切换标志的值为第二值,则确定所述当前帧的解码方案为第三解码方案,所述第三解码方案为混合解码方案。
  27. 如权利要求15-25任一所述的装置,其特征在于,所述第一确定模块包括:
    第三解析子模块,用于从所述码流中解析出所述当前帧的解码方案的指示信息,所述指示信息用于指示所述当前帧的解码方案为所述第一解码方案、第二解码方案或第三解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案,所述第三解码方案为混合解码方案。
  28. 如权利要求15-25任一所述的装置,其特征在于,所述第一确定模块包括:
    第四解析子模块,用于从所述码流中解析出所述当前帧的初始解码方案,所述初始解码方案为所述第一解码方案或第二解码方案,所述第二解码方案为基于虚拟扬声器选择的HOA解码方案;
    第四确定子模块,用于若所述当前帧的初始解码方案与所述当前帧的前一帧的初始解码方案相同,则确定所述当前帧的解码方案为所述当前帧的初始解码方案;
    第五确定子模块,用于若所述当前帧的初始解码方案为所述第一解码方案且所述当前帧的前一帧的初始解码方案为所述第二解码方案,或所述当前帧的初始解码方案为所述第二解码方案且所述当前帧的前一帧的初始解码方案为所述第一解码方案,则确定所述当前帧的解码方案为第三解码方案,所述第三解码方案为混合解码方案。
  29. 一种解码端设备,其特征在于,所述解码端设备包括存储器和处理器;
    所述存储器用于存储计算机程序,所述处理器用于执行所述存储器中存储的所述计算机程序,以实现权利要求1-14任一项所述的解码方法。
  30. 一种计算机可读存储介质,其特征在于,所述存储介质内存储有指令,当所述指令在计算机上运行时,使得所述计算机执行权利要求1-14任一项所述的方法的步骤。
  31. 一种计算机程序产品,其特征在于,所述计算机程序产品包含指令,所述指令被处理器执行时实现如权利要求1-14中任一项所述的方法。
PCT/CN2022/120461 2021-09-29 2022-09-22 解码方法、装置、设备、存储介质及计算机程序产品 WO2023051367A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111155351.6A CN115881138A (zh) 2021-09-29 2021-09-29 解码方法、装置、设备、存储介质及计算机程序产品
CN202111155351.6 2021-09-29

Publications (1)

Publication Number Publication Date
WO2023051367A1 true WO2023051367A1 (zh) 2023-04-06

Family

ID=85756472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120461 WO2023051367A1 (zh) 2021-09-29 2022-09-22 解码方法、装置、设备、存储介质及计算机程序产品

Country Status (2)

Country Link
CN (1) CN115881138A (zh)
WO (1) WO2023051367A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104781879A (zh) * 2012-09-26 2015-07-15 摩托罗拉移动有限责任公司 用于对音频信号进行编码的方法和装置
CN106256001A (zh) * 2014-02-24 2016-12-21 三星电子株式会社 信号分类方法和装置以及使用其的音频编码方法和装置
CN109273017A (zh) * 2018-08-14 2019-01-25 Oppo广东移动通信有限公司 编码控制方法、装置以及电子设备
CN109300480A (zh) * 2017-07-25 2019-02-01 华为技术有限公司 立体声信号的编解码方法和编解码装置
CN109427337A (zh) * 2017-08-23 2019-03-05 华为技术有限公司 立体声信号编码时重建信号的方法和装置
WO2019228423A1 (zh) * 2018-05-31 2019-12-05 华为技术有限公司 立体声信号的编码方法和装置
CN113168838A (zh) * 2018-11-02 2021-07-23 杜比国际公司 音频编码器及音频解码器

Also Published As

Publication number Publication date
CN115881138A (zh) 2023-03-31

Similar Documents

Publication Publication Date Title
US10224894B2 (en) Metadata for ducking control
CN107277691B (zh) 基于云的多声道音频播放方法、系统及音频网关装置
US20230137053A1 (en) Audio Coding Method and Apparatus
CA3200632A1 (en) Audio encoding and decoding method and apparatus
CN114067810A (zh) 音频信号渲染方法和装置
US20200020342A1 (en) Error concealment for audio data using reference pools
EP2610867B1 (en) Audio reproducing device and audio reproducing method
WO2021213128A1 (zh) 音频信号编码方法和装置
US20230105508A1 (en) Audio Coding Method and Apparatus
WO2023051367A1 (zh) 解码方法、装置、设备、存储介质及计算机程序产品
US20230145725A1 (en) Multi-channel audio signal encoding and decoding method and apparatus
WO2023051368A1 (zh) 编解码方法、装置、设备、存储介质及计算机程序产品
AU2021388397A1 (en) Audio encoding/decoding method and device
WO2023051370A1 (zh) 编解码方法、装置、设备、存储介质及计算机程序
JP2023523081A (ja) 音声信号に対するビット割り当て方法及び装置
WO2022242534A1 (zh) 编解码方法、装置、设备、存储介质及计算机程序
WO2022258036A1 (zh) 编解码方法、装置、设备、存储介质及计算机程序
WO2022012554A1 (zh) 多声道音频信号编码方法和装置
WO2023023504A1 (en) Wireless surround sound system with common bitstream
JP2024518846A (ja) 3次元オーディオ信号符号化方法および装置、ならびにエンコーダ
CN116582697A (zh) 音频传输方法、装置、终端、存储介质及程序产品
WO2023049628A1 (en) Efficient packet-loss protected data encoding and/or decoding
WO2021255327A1 (en) Managing network jitter for multiple audio streams
CN104735512A (zh) 一种同步音频数据的方法、设备及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE