WO2023051368A1 - Encoding and decoding method and apparatus, device, storage medium, and computer program product - Google Patents

Encoding and decoding method and apparatus, device, storage medium, and computer program product

Info

Publication number
WO2023051368A1
WO2023051368A1 (PCT/CN2022/120495)
Authority
WO
WIPO (PCT)
Prior art keywords
signal
scheme
current frame
encoding
decoding
Prior art date
Application number
PCT/CN2022/120495
Other languages
English (en)
Chinese (zh)
Inventor
刘帅 (Liu Shuai)
高原 (Gao Yuan)
王宾 (Wang Bin)
王喆 (Wang Zhe)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023051368A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters

Definitions

  • the embodiments of the present application relate to the technical field of audio processing, and in particular to an encoding and decoding method, apparatus, device, storage medium, and computer program product.
  • HOA: higher-order ambisonics.
  • One of the schemes is a codec scheme based on directional audio coding (directional audio coding, DirAC).
  • the encoder extracts the core layer signal and spatial parameters from the HOA signal of the current frame, and encodes the extracted core layer signal and spatial parameters into the code stream.
  • the decoding end uses a decoding method symmetrical to the encoding to reconstruct the HOA signal of the current frame from the code stream.
  • Another solution is a codec solution based on virtual speaker selection.
  • the encoder selects, based on the match-projection (MP) algorithm, the target virtual speaker that matches the HOA signal of the current frame from a virtual speaker set; determines the virtual speaker signal based on the HOA signal of the current frame and the target virtual speaker; determines the residual signal based on the HOA signal of the current frame and the virtual speaker signal; and encodes the virtual speaker signal and the residual signal into the code stream.
  • the decoding end uses a decoding method symmetrical to the encoding to reconstruct the HOA signal of the current frame from the code stream.
  • a heterogeneous sound source refers to point sound sources whose positions and/or directions differ.
  • the sound field types of different audio frames may differ. To achieve a high compression rate for audio frames of different sound field types at the same time, an appropriate codec scheme needs to be selected for each audio frame according to its sound field type, which requires switching between different codec schemes.
  • HOA signals reconstructed by different codec schemes differ in auditory quality after rendering and playback. When switching between different codec schemes, how to ensure a smooth transition in auditory quality is a problem that currently needs to be considered.
  • embodiments of the present application provide an encoding and decoding method, apparatus, device, storage medium, and computer program product, capable of ensuring a smooth transition in auditory quality when switching between different codec schemes. The described technical scheme is as follows:
  • in a first aspect, an encoding method is provided, which includes:
  • determining the coding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, where the coding scheme of the current frame is one of a first coding scheme, a second coding scheme, and a third coding scheme; the first coding scheme is an HOA coding scheme based on directional audio coding (i.e., the DirAC-based HOA coding scheme), the second coding scheme is an HOA coding scheme based on virtual speaker selection (which may be referred to as the MP-based HOA coding scheme), and the third coding scheme is a hybrid coding scheme; and, if the coding scheme of the current frame is the third coding scheme, encoding the signal of a specified channel in the HOA signal into the code stream, where the specified channel is a part of all channels of the HOA signal.
  • the hybrid coding scheme uses, in the coding process, both technical means related to the first coding scheme (the DirAC-based coding scheme) and technical means related to the second coding scheme (the MP-based HOA coding scheme), and is therefore called a hybrid coding scheme.
  • an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • for the audio frames involved in switching, a new codec scheme is used, that is, the signal of the specified channel in the HOA signal of these audio frames is encoded into the code stream. In other words, a compromise scheme is used for encoding and decoding, so that the auditory quality of the decoded and reconstructed HOA signal after rendering and playback can transition smoothly.
  • the signal of the specified channel includes a first-order ambisonics (FOA) signal.
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
  • encoding the signal of the specified channel in the HOA signal into the code stream includes: determining the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and encoding the virtual speaker signal and the residual signal into the code stream.
  • determining the virtual speaker signal and the residual signal includes: determining the W signal as the virtual speaker signal; and determining three residual signals based on the W signal, the X signal, the Y signal, and the Z signal, or determining the X signal, the Y signal, and the Z signal as the three residual signals.
  • the difference signals between each of the X signal, the Y signal, and the Z signal and the W signal are determined as the three residual signals.
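As a sketch, the down-mix just described (W as the single virtual speaker signal, and either the directional signals or their differences from W as the three residual signals) might look as follows; the function name and the `use_difference` flag are illustrative, not from the patent:

```python
import numpy as np

def derive_transport_signals(w, x, y, z, use_difference=True):
    """Down-mix a switching frame's FOA channels into one virtual
    speaker signal plus three residual signals.

    W is taken as the virtual speaker signal; the residuals are either
    X-W, Y-W, Z-W (difference signals) or X, Y, Z directly.
    """
    virtual_speaker = w
    if use_difference:
        residuals = [x - w, y - w, z - w]
    else:
        residuals = [x, y, z]
    return virtual_speaker, residuals
```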
  • encoding the virtual speaker signal and the residual signal into the code stream includes: combining the virtual speaker signal with a first preset mono signal to obtain one stereo signal; combining the three residual signals with a second preset mono signal to obtain two stereo signals; and encoding the resulting three stereo signals into the code stream through a stereo encoder.
  • combining the three residual signals with the second preset mono signal to obtain two stereo signals includes: combining the two residual signals with the highest correlation among the three residual signals to obtain one of the two stereo signals; and combining the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals.
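A possible realization of the pairing rule above, sketched in Python with NumPy; the use of absolute Pearson correlation as the correlation measure is an assumption, since the patent does not specify how correlation is computed:

```python
import numpy as np

def build_stereo_pairs(residuals, preset_mono):
    """Group three residual signals into two stereo signals: the two
    most-correlated residuals form one pair, and the remaining residual
    is paired with the preset mono signal (e.g. an all-zero signal)."""
    def correlation(i, j):
        # Absolute Pearson correlation between two residuals (assumed measure)
        return abs(np.corrcoef(residuals[i], residuals[j])[0, 1])

    i, j = max([(0, 1), (0, 2), (1, 2)], key=lambda p: correlation(*p))
    k = ({0, 1, 2} - {i, j}).pop()
    return (residuals[i], residuals[j]), (residuals[k], preset_mono)
```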
  • the first preset mono signal is an all-zero signal or an all-one signal, where an all-zero signal is a signal whose sample values are all zero or whose frequency-bin values are all zero, and an all-one signal is a signal whose sample values are all one or whose frequency-bin values are all one. The second preset mono signal is likewise an all-zero signal or an all-one signal; the first preset mono signal and the second preset mono signal may be the same or different.
  • encoding the virtual speaker signal and the residual signal into the code stream includes: respectively encoding the virtual speaker signal and each of the three residual signals into the code stream through a mono encoder.
  • after determining the encoding scheme of the current frame according to the HOA signal of the current frame, the method further includes: if the encoding scheme of the current frame is the first encoding scheme, encoding the HOA signal into the code stream according to the first encoding scheme; and if the encoding scheme of the current frame is the second encoding scheme, encoding the HOA signal into the code stream according to the second encoding scheme.
  • determining the coding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame includes: determining the initial coding scheme of the current frame according to the HOA signal, where the initial coding scheme is the first coding scheme or the second coding scheme; if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame, determining that the coding scheme of the current frame is the initial coding scheme of the current frame; and if the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame is the first coding scheme, determining that the coding scheme of the current frame is the third coding scheme.
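The frame-level decision rule above can be sketched as a small function; the scheme identifier strings are illustrative:

```python
FIRST = "dirac"    # DirAC-based HOA scheme
SECOND = "mp"      # virtual-speaker-selection (MP) based HOA scheme
HYBRID = "hybrid"  # third scheme, used for switching frames

def decide_coding_scheme(initial_current, initial_previous):
    """If the initial scheme chosen for the current frame matches the
    previous frame's initial scheme, keep it; otherwise the current
    frame is a switching frame and gets the hybrid scheme."""
    if initial_current == initial_previous:
        return initial_current
    return HYBRID
```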
  • the method further includes: encoding the indication information of the initial encoding scheme of the current frame into a code stream.
  • the method further includes: determining the value of the switching flag of the current frame, where the value of the switching flag is a first value when the coding scheme of the current frame is the first coding scheme or the second coding scheme, and a second value when the coding scheme of the current frame is the third coding scheme; and encoding the value of the switching flag into the code stream. That is, the switching flag indicates whether the current frame is a switching frame.
  • the method further includes: encoding the indication information of the coding scheme of the current frame into the code stream.
  • the specified channel is consistent with the preset transmission channel in the first encoding scheme. In this way, it can be ensured that the auditory quality of the switching frame is similar to that of audio frames encoded with the first encoding scheme.
  • in a second aspect, a decoding method is provided, which includes:
  • obtaining the decoding scheme of the current frame based on the code stream, where the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme; the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio coding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme; if the decoding scheme of the current frame is the third decoding scheme, determining, based on the code stream, the signal of a specified channel in the HOA signal of the current frame, where the specified channel is a part of all channels of the HOA signal; determining, based on the signal of the specified channel, the gain of one or more remaining channels of the HOA signal other than the specified channel; determining the signal of each of the one or more remaining channels based on the signal of the specified channel and the gains of the one or more remaining channels; and obtaining the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels.
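The remaining-channel reconstruction step can be sketched as follows. The patent states only that each remaining channel is determined from the specified-channel signals and a per-channel gain; using W as the common reference signal here is an assumption for illustration:

```python
import numpy as np

def reconstruct_remaining_channels(w_signal, gains):
    """Rebuild each remaining (non-specified) HOA channel by scaling a
    reference signal, assumed here to be the omnidirectional W signal,
    by that channel's gain."""
    return [gain * w_signal for gain in gains]
```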
  • determining the signal of the specified channel in the HOA signal of the current frame based on the code stream includes: determining a virtual speaker signal and a residual signal based on the code stream; and determining a signal of the specified channel based on the virtual speaker signal and the residual signal.
  • determining the virtual speaker signal and the residual signal based on the code stream includes: decoding the code stream through a stereo decoder to obtain three stereo signals; and determining one virtual speaker signal and three residual signals based on the three stereo signals.
  • determining one virtual speaker signal and three residual signals based on the three stereo signals includes: determining the virtual speaker signal based on one of the three stereo signals; and determining the three residual signals based on the other two stereo signals.
  • determining the virtual speaker signal and the residual signal based on the code stream includes: decoding the code stream through a mono decoder to obtain one virtual speaker signal and three residual signals.
  • the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals; determining the signal of the specified channel based on the virtual speaker signal and the residual signal includes: determining the W signal based on the virtual speaker signal; and determining the X, Y, and Z signals based on the residual signals and the W signal, or determining the X, Y, and Z signals based on the residual signals.
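The decoder-side mirror of the FOA down-mix can be sketched as below; the `residuals_are_differences` flag is illustrative, signalling which of the two encoder-side residual definitions was used:

```python
def reconstruct_foa(virtual_speaker, residuals, residuals_are_differences=True):
    """Recover the FOA channels from the decoded transport signals:
    W comes from the virtual speaker signal; X, Y, Z are the residuals
    plus W when the encoder sent difference signals, or the residuals
    themselves otherwise."""
    w = virtual_speaker
    if residuals_are_differences:
        x, y, z = (r + w for r in residuals)
    else:
        x, y, z = residuals
    return w, x, y, z
```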
  • if the decoding scheme of the current frame is the second decoding scheme, obtaining the reconstructed HOA signal of the current frame according to the code stream includes: obtaining an initial HOA signal from the code stream according to the second decoding scheme; if the decoding scheme of the previous frame of the current frame is the third decoding scheme, performing gain adjustment on the high-order part of the initial HOA signal according to the high-order gain of the previous frame; and obtaining the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part. That is, the high-order gain adjustment further smooths the auditory quality.
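The post-switch smoothing step above can be sketched as follows; the channel split (4 low-order FOA channels, the rest high-order) and the channels-by-samples array layout are assumptions, not stated in the patent:

```python
import numpy as np

def smooth_after_switch(initial_hoa, prev_high_order_gain, low_order_count=4):
    """When the previous frame was a switching (hybrid) frame, scale
    the high-order channels of the current frame's initial HOA signal
    by the previous frame's high-order gain, then recombine them with
    the untouched low-order channels."""
    low = initial_hoa[:low_order_count]
    high = prev_high_order_gain * initial_hoa[low_order_count:]
    return np.concatenate([low, high], axis=0)
```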
  • obtaining the decoding scheme of the current frame based on the code stream includes: parsing the value of the switching flag of the current frame from the code stream; if the value of the switching flag is the first value, parsing from the code stream the indication information of the decoding scheme of the current frame, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and if the value of the switching flag is the second value, determining that the decoding scheme of the current frame is the third decoding scheme.
  • obtaining the decoding scheme of the current frame based on the code stream includes: parsing the indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.
  • obtaining the decoding scheme of the current frame based on the code stream includes: parsing the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme; if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, determining that the decoding scheme of the current frame is the initial decoding scheme of the current frame; and if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame is the first decoding scheme, determining that the decoding scheme of the current frame is the third decoding scheme.
  • in a third aspect, an encoding device is provided, and the encoding device has the function of implementing the behavior of the encoding method in the first aspect above.
  • the encoding device includes one or more modules, and the one or more modules are used to implement the encoding method provided in the first aspect above.
  • an encoding device comprising:
  • the first determination module is used to determine the coding scheme of the current frame according to the high-order ambisonics HOA signal of the current frame, and the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme;
  • the first coding scheme is an HOA coding scheme based on directional audio coding
  • the second coding scheme is an HOA coding scheme based on virtual speaker selection
  • the third coding scheme is a hybrid coding scheme
  • the first encoding module is configured to encode the signal of the specified channel in the HOA signal into the code stream if the encoding scheme of the current frame is the third encoding scheme, and the specified channel is a part of all channels of the HOA signal.
  • the signal of the designated channel includes a first-order ambisonics (FOA) signal.
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
  • the first determination submodule is used to determine the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal;
  • the encoding sub-module is used to encode the virtual loudspeaker signal and the residual signal into a code stream.
  • the first determination submodule is used for:
  • the three residual signals are determined based on the W signal, the X signal, the Y signal and the Z signal, or the X signal, the Y signal and the Z signal are determined as the three residual signals.
  • the encoding submodule is used to:
  • the obtained three stereo signals are respectively encoded into the code stream through a stereo encoder.
  • the encoding submodule is used to: combine the two residual signals with the highest correlation among the three residual signals to obtain one of the two stereo signals, and combine the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals.
  • the first preset mono signal is an all-zero signal or an all-one signal, where an all-zero signal is a signal whose sample values are all zero or whose frequency-bin values are all zero, and an all-one signal is a signal whose sample values are all one or whose frequency-bin values are all one. The second preset mono signal is likewise an all-zero signal or an all-one signal; the first preset mono signal and the second preset mono signal may be the same or different.
  • the encoding submodule is used to:
  • the virtual speaker signal and each of the three residual signals are respectively encoded into the code stream through a mono encoder.
  • the device also includes:
  • the second encoding module is used to encode the HOA signal into the code stream according to the first encoding scheme if the encoding scheme of the current frame is the first encoding scheme;
  • the third encoding module is configured to encode the HOA signal into the code stream according to the second encoding scheme if the encoding scheme of the current frame is the second encoding scheme.
  • the first determination module includes:
  • the second determining submodule is used to determine the initial encoding scheme of the current frame according to the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
  • the third determining submodule is used to determine that the encoding scheme of the current frame is the initial encoding scheme of the current frame if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame;
  • the fourth determining submodule is used to determine that the encoding scheme of the current frame is the third encoding scheme if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the previous frame of the current frame is the second encoding scheme, or if the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the previous frame of the current frame is the first encoding scheme.
  • the device also includes:
  • the fourth encoding module is configured to encode the indication information of the initial encoding scheme of the current frame into the code stream.
  • the device also includes:
  • the second determination module is used to determine the value of the switching flag of the current frame: when the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag is the first value; when the coding scheme of the current frame is the third coding scheme, the value of the switching flag is the second value;
  • the fifth encoding module is used to encode the value of the switching flag into the code stream.
  • the device also includes:
  • the sixth encoding module is configured to encode the indication information of the encoding scheme of the current frame into the code stream.
  • the specified channel is consistent with the preset transmission channel in the first encoding scheme.
  • in a fourth aspect, a decoding device is provided, which has the function of implementing the behavior of the decoding method in the second aspect above.
  • the decoding device includes one or more modules, and the one or more modules are used to implement the decoding method provided by the second aspect above.
  • a decoding device which includes:
  • the first obtaining module is used to obtain the decoding scheme of the current frame based on the code stream, where the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme, and the third decoding scheme; the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio coding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
  • the first determination module is used to determine the signal of the specified channel in the HOA signal of the current frame based on the code stream if the decoding scheme of the current frame is the third decoding scheme, and the specified channel is a part of all channels of the HOA signal;
  • the second determination module is used to determine the gain of one or more remaining channels in the HOA signal except the specified channel based on the signal of the specified channel;
  • a third determination module configured to determine the signal of each of the one or more remaining channels based on the signal of the specified channel and the gain of the one or more remaining channels;
  • the second obtaining module is configured to obtain the reconstructed HOA signal of the current frame based on the signal of the designated channel and the signals of the one or more remaining channels.
  • the first determination module includes:
  • a first determining submodule configured to determine a virtual speaker signal and a residual signal based on a code stream
  • the second determining submodule is configured to determine the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the first determining submodule is used to: decode the code stream through a stereo decoder to obtain three stereo signals, and determine one virtual speaker signal and three residual signals based on the three stereo signals.
  • the first determining submodule is used to: determine the virtual speaker signal based on one of the three stereo signals, and determine the three residual signals based on the other two stereo signals.
  • the first determination submodule is used for:
  • the code stream is decoded by a monophonic decoder to obtain one virtual speaker signal and three residual signals.
  • the signal of the designated channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals;
  • the second determining submodule is used to: determine the W signal based on the virtual speaker signal; and determine the X, Y, and Z signals based on the residual signals and the W signal, or determine the X, Y, and Z signals based on the residual signals.
  • the device also includes:
  • the first decoding module is used to obtain the reconstructed HOA signal of the current frame according to the code stream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme;
  • the second decoding module is configured to obtain the reconstructed HOA signal of the current frame according to the code stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.
  • the second decoding module includes:
  • the first obtaining submodule is used to obtain the initial HOA signal according to the code stream according to the second decoding scheme
  • the gain adjustment submodule is used to adjust the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame if the decoding scheme of the previous frame of the current frame is the third decoding scheme;
  • the second obtaining submodule is used to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
  • the first obtaining module includes:
  • the first parsing submodule is used to parse out the value of the switching flag of the current frame from the code stream;
  • the second parsing submodule is used to parse the indication information of the decoding scheme of the current frame from the code stream if the value of the switching flag is the first value, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or second decoding scheme;
  • the third determining submodule is configured to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value.
  • the first obtaining module includes:
  • the third parsing sub-module is used to parse out the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the first obtaining module includes:
  • the fourth parsing submodule is used to parse out the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme;
  • the fourth determining submodule is used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame;
  • the fifth determining submodule is used to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme.
  • in a fifth aspect, an encoding end device is provided, which includes a processor and a memory, where the memory is used to store a program for executing the encoding method provided in the first aspect above and to store the data involved in implementing the encoding method provided in the first aspect above.
  • the processor is configured to execute programs stored in the memory.
  • the encoding end device may further include a communication bus used to establish a connection between the processor and the memory.
  • in a sixth aspect, a decoding end device is provided, which includes a processor and a memory, where the memory is used to store a program for executing the decoding method provided in the second aspect above and to store the data involved in implementing the decoding method provided in the second aspect above.
  • the processor is configured to execute programs stored in the memory.
  • the decoding end device may further include a communication bus used to establish a connection between the processor and the memory.
  • in a seventh aspect, a computer-readable storage medium is provided, in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the encoding method described in the first aspect above or the decoding method described in the second aspect above.
  • in an eighth aspect, a computer program product containing instructions is provided; when the computer program product is run on a computer, the computer is caused to execute the encoding method described in the first aspect above or the decoding method described in the second aspect above.
  • two schemes, i.e., the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an implementation environment of a terminal scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an implementation environment of a transcoding scenario of a wireless or core network device provided in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of an implementation environment of a broadcast television scene provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an implementation environment of a virtual reality streaming scene provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a switching frame coding scheme provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an HOA coding scheme based on virtual speaker selection provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a DirAC-based HOA coding scheme provided by an embodiment of the present application.
  • FIG. 10 is a flow chart of another encoding method provided by an embodiment of the present application;
  • FIG. 11 is a flow chart of a decoding method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a switching frame decoding scheme provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an HOA decoding scheme based on virtual speaker selection provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a DirAC-based HOA decoding scheme provided by an embodiment of the present application.
  • FIG. 15 is a flow chart of another decoding method provided by an embodiment of the present application;
  • FIG. 16 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
  • FIG. 18 is a schematic block diagram of a codec device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes source device 10 , destination device 20 , link 30 and storage device 40 .
  • the source device 10 may generate encoded media data. Therefore, the source device 10 may also be called a media data encoding device.
  • Destination device 20 may decode the encoded media data generated by source device 10 . Accordingly, destination device 20 may also be referred to as a media data decoding device.
  • Link 30 may receive encoded media data generated by source device 10 and may transmit the encoded media data to destination device 20 .
  • the storage device 40 can receive the encoded media data generated by the source device 10, and can store the encoded media data.
  • the destination device 20 can directly obtain the encoded media data from the storage device 40.
  • the storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10, in which case the destination device 20 may access the encoded media data stored in the storage device 40 via streaming or downloading.
  • both the source device 10 and the destination device 20 may include one or more processors and a memory coupled to the one or more processors, and the memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer.
  • both the source device 10 and the destination device 20 may include desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
  • Link 30 may include one or more media or devices capable of transmitting encoded media data from source device 10 to destination device 20 .
  • link 30 may include one or more communication media that enable source device 10 to transmit encoded media data directly to destination device 20 in real-time.
  • the source device 10 may modulate the encoded media data based on a communication standard, such as a wireless communication protocol, etc., and may send the modulated media data to the destination device 20 .
  • the one or more communication media may include wireless and/or wired communication media, for example, the one or more communication media may include radio frequency (radio frequency, RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet), among others.
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, etc., which are not specifically limited in this embodiment of the present application.
  • the storage device 40 may store the received encoded media data sent by the source device 10 , and the destination device 20 may directly acquire the encoded media data from the storage device 40 .
  • the storage device 40 may include any one of a variety of distributed or locally accessed data storage media, for example, a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded media data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10, and the destination device 20 may access the encoded media data stored in the storage device 40 via streaming or downloading.
  • the file server may be any type of server capable of storing encoded media data and sending the encoded media data to destination device 20 .
  • the file server may include a network server, a file transfer protocol (file transfer protocol, FTP) server, a network attached storage (network attached storage, NAS) device, or a local disk drive.
  • Destination device 20 may obtain encoded media data over any standard data connection, including an Internet connection.
  • the standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of both that is suitable for accessing encoded media data stored on a file server.
  • the transmission of encoded media data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
  • the implementation environment shown in FIG. 1 is only a possible implementation, and the technology of the embodiments of the present application is applicable not only to the source device 10 that encodes media data and the destination device 20 that decodes the encoded media data shown in FIG. 1, but also to other devices capable of encoding media data and decoding encoded media data, which is not specifically limited in the embodiments of the present application.
  • the source device 10 includes a data source 120 , an encoder 100 and an output interface 140 .
  • the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter, where the transmitter may also be referred to as a sender.
  • the data source 120 may include an image capture device (e.g., a video camera), an archive containing previously captured media data, a feed interface for receiving media data from a media data content provider, and/or a computer graphics system for generating media data, or a combination of these sources of media data.
  • the data source 120 may send media data to the encoder 100, and the encoder 100 may encode the received media data sent by the data source 120 to obtain encoded media data.
  • the encoder 100 may send the encoded media data to the output interface 140.
  • source device 10 sends the encoded media data directly to destination device 20 via output interface 140 .
  • encoded media data may also be stored on storage device 40 for later retrieval by destination device 20 for decoding and/or display.
  • the destination device 20 includes an input interface 240 , a decoder 200 and a display device 220 .
  • input interface 240 includes a receiver and/or a modem.
  • the input interface 240 can receive the encoded media data via the link 30 and/or from the storage device 40, and then send it to the decoder 200, and the decoder 200 can decode the received encoded media data to obtain the decoded media data.
  • the decoder may transmit the decoded media data to the display device 220 .
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20 . In general, the display device 220 displays the decoded media data.
  • the display device 220 can be any type of display device in various types, for example, the display device 220 can be a liquid crystal display (liquid crystal display, LCD), a plasma display, an organic light-emitting diode (organic light-emitting diode, OLED) monitor or other type of display device.
  • the encoder 100 and the decoder 200 may each be integrated with an audio encoder and an audio decoder, respectively, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software for encoding both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as user datagram protocol (UDP), if applicable.
  • each of the encoder 100 and the decoder 200 may be any one of the following circuits: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of the present application are implemented partially in software, the device may store instructions for the software in a suitable non-transitory computer-readable storage medium, and may use one or more processors to execute the instructions in hardware to implement the technology of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered to be one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • Embodiments of the present application may generally refer to the encoder 100 as “signaling” or “sending” certain information to another device such as the decoder 200 .
  • the term "signaling" or "sending" may generally refer to the transmission of syntax elements and/or other data used for decoding compressed media data. This transfer can occur in real time or near real time. Alternatively, this communication may occur after a period of time, for example when, at encoding time, syntax elements are stored in an encoded bitstream on a computer-readable storage medium; the decoding device may then retrieve the syntax elements at any time after they are stored on this medium.
  • the encoding and decoding methods provided in the embodiments of the present application can be applied to various scenarios. Next, several scenarios will be introduced by taking the media data to be encoded as an HOA signal as an example.
  • FIG. 2 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a terminal scenario.
  • the implementation environment includes a first terminal 101 and a second terminal 201 , and the first terminal 101 and the second terminal 201 are connected in communication.
  • the communication connection may be a wireless connection or a wired connection, which is not limited in this embodiment of the present application.
  • the first terminal 101 may be a sending end device or a receiving end device.
  • the second terminal 201 may be a receiving end device or a sending end device.
  • if the first terminal 101 is a sending end device, the second terminal 201 is a receiving end device; if the first terminal 101 is a receiving end device, the second terminal 201 is a sending end device.
  • Both the first terminal 101 and the second terminal 201 include an audio collection module, an audio playback module, an encoder, a decoder, a channel encoding module and a channel decoding module.
  • the encoder is a three-dimensional audio encoder, and the decoder is a three-dimensional audio decoder.
  • the audio collection module in the first terminal 101 collects the HOA signal and transmits it to the encoder.
  • the encoder encodes the HOA signal using the encoding method provided in the embodiment of the present application.
  • this encoding may be called source encoding. Afterwards, in order to transmit the HOA signal over a channel, the channel encoding module performs channel encoding, and the encoded code stream is then transmitted over a digital channel through wireless or wired network communication equipment.
  • the second terminal 201 receives the code stream transmitted over the digital channel through a wireless or wired network communication device, the channel decoding module performs channel decoding on the code stream, the decoder then decodes the HOA signal using the decoding method provided in the embodiment of the present application, and the HOA signal is finally played through the audio playback module.
  • the first terminal 101 and the second terminal 201 can be any electronic product that can interact with a user through one or more means such as a keyboard, a touchpad, a touch screen, a remote control, voice interaction, or a handwriting device, for example, a personal computer (PC), a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart in-vehicle device, a smart TV, a smart speaker, or the like.
  • FIG. 3 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a transcoding scenario of a wireless or core network device.
  • the implementation environment includes a channel decoding module, an audio decoder, an audio encoder and a channel encoding module.
  • the audio encoder is a three-dimensional audio encoder, and the audio decoder is a three-dimensional audio decoder.
  • the audio decoder may be a decoder using the decoding method provided in the embodiment of the present application, or may be a decoder using other decoding methods.
  • the audio encoder may be an encoder using the encoding method provided by the embodiment of the present application, or may be an encoder using other encoding methods.
  • the audio decoder is a decoder using the decoding method provided by the embodiment of the present application, and the audio encoder is an encoder using other encoding methods.
  • in this case, the channel decoding module performs channel decoding on the received code stream, the audio decoder then performs source decoding using the decoding method provided by the embodiment of the present application, and the audio encoder then encodes according to another encoding method, thereby achieving conversion from one format to another, which is known as transcoding. The resulting code stream is then sent after channel encoding.
  • alternatively, the audio decoder is a decoder using another decoding method, and the audio encoder is an encoder using the encoding method provided by the embodiment of the present application.
  • in this case, the channel decoding module performs channel decoding on the received code stream, the audio decoder then performs source decoding using another decoding method, and the audio encoder then encodes using the encoding method provided by the embodiment of the present application, thereby achieving conversion from one format to another, which is known as transcoding. The resulting code stream is then sent after channel encoding.
  • the wireless device may be a wireless access point, a wireless router, a wireless connector, and the like.
  • a core network device may be a mobility management entity, a gateway, and the like.
  • FIG. 4 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a broadcast television scene.
  • the broadcast TV scene is divided into a live scene and a post-production scene.
  • the implementation environment includes a live program 3D sound production module, a 3D sound encoding module, a set-top box and a speaker group, and the set-top box includes a 3D sound decoding module.
  • the implementation environment includes post-program 3D sound production modules, 3D sound coding modules, network receivers, mobile terminals, earphones, and the like.
  • the live program three-dimensional sound production module produces a three-dimensional sound signal (such as an HOA signal), a code stream is obtained by applying the encoding method of the embodiment of the present application to the three-dimensional sound signal, the code stream is transmitted to the user side through the radio and television network, and the three-dimensional sound decoder in the set-top box decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the network receiver decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the mobile terminal decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back through the earphone.
  • the post-program three-dimensional sound production module produces a three-dimensional sound signal, and a code stream is obtained by applying the encoding method of the embodiment of the present application to the three-dimensional sound signal. The code stream is transmitted to the user side, where the three-dimensional sound decoder decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the three-dimensional sound signal, which is played back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the network receiver decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the mobile terminal decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back through the earphone.
  • FIG. 5 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a virtual reality streaming scene.
  • the implementation environment includes an encoding end and a decoding end.
  • the encoding end includes an acquisition module, a preprocessing module, an encoding module, a packaging module and a sending module
  • the decoding end includes an unpacking module, a decoding module, a rendering module and earphones.
  • the acquisition module collects the HOA signal, and then preprocesses the HOA signal through the preprocessing module.
  • the preprocessing operation includes filtering out the low-frequency part of the HOA signal, usually using 20 Hz or 50 Hz as the cut-off frequency, extracting the orientation information in the HOA signal, and the like.
  • the encoding module then performs encoding processing using the encoding method provided by the embodiment of the present application. After encoding, the packing module packs the code stream, which is sent to the decoding end through the sending module.
  • the unpacking module at the decoding end first unpacks the code stream, the decoding module then decodes it using the decoding method provided by the embodiment of the present application, the rendering module then performs binaural rendering processing on the decoded signal, and the rendered signal is mapped to the listener's earphones.
  • the earphone can be an independent earphone, or an earphone on a virtual reality glasses device.
  • FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application, and the encoding method is applied to an encoding end. Please refer to FIG. 6 , the method includes the following steps.
  • Step 601: Determine the coding scheme of the current frame according to the HOA signal of the current frame.
  • the encoder performs encoding frame by frame.
  • the HOA signal of the audio frame is an audio signal obtained through the HOA acquisition technology.
  • the HOA signal is a scene audio signal and also a three-dimensional audio signal.
  • the HOA signal refers to the audio signal obtained by collecting the sound field where the microphone is located in the space.
  • the collected audio signal is called the original HOA signal.
  • the HOA signal of the audio frame may also be an HOA signal obtained by converting a 3D audio signal in another format. For example, convert a 5.1-channel signal into an HOA signal, or convert a 3D audio signal mixed with a 5.1-channel signal and object audio into an HOA signal.
  • the HOA signal of the audio frame to be encoded is a time-domain signal or a frequency-domain signal, and may include all channels of the HOA signal, or may include some channels of the HOA signal.
  • if the order of the HOA signal of the audio frame is 3, the number of channels of the HOA signal is 16; if the frame length of the audio frame is 20 ms and the sampling rate is 48 kHz, the HOA signal of the audio frame to be encoded contains signals of 16 channels, and each channel contains 960 sampling points.
  • the encoder can down-sample the original HOA signal to obtain the HOA signal of the audio frame. For example, the encoder performs 1/Q down-sampling on the original HOA signal to reduce the number of sampling points or frequency points of the HOA signal to be encoded. For example, in the embodiment of the present application, each channel of the original HOA signal contains 960 sampling points; after 1/120 down-sampling, each channel of the HOA signal to be encoded contains 8 sampling points.
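As an illustration of the figures in the two preceding paragraphs, the channel count of an order-3 HOA signal and the effect of 1/Q down-sampling can be sketched as follows. This is a minimal sketch: the function names and the plain stride decimation are assumptions for illustration, not the encoder's actual implementation (a real encoder would normally low-pass filter before decimating).

```python
def hoa_channel_count(order):
    # An order-N HOA signal has (N + 1)^2 channels, so order 3 gives 16.
    return (order + 1) ** 2

def downsample(channel, q):
    # 1/Q down-sampling: keep every Q-th sample (illustrative only).
    return channel[::q]

# One frame: 16 channels of 960 samples (20 ms at 48 kHz).
frame = [[0.0] * 960 for _ in range(hoa_channel_count(3))]
assert len(frame) == 16                       # 16 channels for order 3
assert len(downsample(frame[0], 120)) == 8    # 960 / 120 = 8 samples
```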
  • the encoding method of the encoding end is introduced by taking the encoding end encoding the current frame as an example.
  • the current frame is an audio frame to be encoded. That is, the encoding end acquires the HOA signal of the current frame, and encodes the HOA signal of the current frame by using the encoding method provided in the embodiment of the present application.
  • the encoding end first determines the initial encoding scheme of the current frame according to the HOA signal of the current frame, where the initial encoding scheme is the first encoding scheme or the second encoding scheme. The encoding end determines whether the first encoding scheme, the second encoding scheme, or the third encoding scheme is used to encode the HOA signal of the current frame by comparing the initial encoding scheme of the current frame with the initial encoding scheme of the previous frame of the current frame.
  • if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame, the encoding end encodes the HOA signal of the current frame using that initial coding scheme. If the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame of the current frame, the encoding end uses the switching frame coding scheme to encode the HOA signal of the current frame.
  • the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme.
  • the first coding scheme is a DirAC-based HOA coding scheme
  • the second coding scheme is an HOA coding scheme based on virtual speaker selection
  • the third coding scheme is a hybrid coding scheme.
  • the hybrid coding scheme is also referred to as a switched frame coding scheme.
  • the third coding scheme is a switching frame coding scheme provided by the embodiment of the present application, and is used to achieve a smooth transition in auditory quality when switching between different codec schemes.
  • the HOA coding scheme based on virtual speaker selection is also referred to as the MP-based HOA coding scheme.
  • the coding end determines the initial coding scheme of the current frame according to the HOA signal of the current frame. Then, the encoding end determines the encoding scheme of the current frame based on the initial encoding scheme of the current frame and the initial encoding scheme of the previous frame of the current frame. It should be noted that this embodiment of the present application does not limit the implementation manner in which the encoding end determines the initial encoding scheme.
  • the coding end analyzes the sound field type of the HOA signal of the current frame to obtain the sound field classification result of the current frame, and determines the initial coding scheme of the current frame based on the sound field classification result of the current frame.
  • the embodiment of the present application does not limit the method of sound field type analysis, for example, the encoding end performs singular value decomposition on the HOA signal of the current frame to perform sound field type analysis, or performs other linear decomposition on the HOA signal to perform sound field analysis. type analysis.
  • the sound field classification result includes the number of distinct sound sources.
  • in some embodiments, the encoding end analyzes the sound field type of the HOA signal of the current frame to obtain the sound field classification result of the current frame as follows: the encoding end performs singular value decomposition on the HOA signal of the current frame to obtain M singular values, and determines M-1 sound field classification parameters based on the M singular values.
  • the encoding end determines the number of different sound sources corresponding to the current frame based on the M-1 sound field classification parameters.
  • if the number of dissimilar sound sources corresponding to the current frame is greater than the first threshold and less than the second threshold, the encoding end determines that the initial encoding scheme of the current frame is the second encoding scheme. If the number of dissimilar sound sources corresponding to the current frame is not greater than the first threshold or not less than the second threshold, the encoding end determines that the initial encoding scheme of the current frame is the first encoding scheme.
  • the first threshold is smaller than the second threshold.
  • the first threshold is 0 or other values
  • the second threshold is 3 or other values.
  • the aforementioned first threshold and second threshold are preset values, which can be preset based on experience or through statistics.
  • in other embodiments, the sound field classification result includes the sound field type, and sound field types are divided into diffuse sound fields and distinct sound fields.
  • the sound field type may be determined according to the number of distinct sound sources obtained by the foregoing method, that is, the encoder determines the sound field type of the current frame based on the number of distinct sound sources corresponding to the current frame. For example, if the number of distinct sound sources corresponding to the current frame is greater than the first threshold and smaller than the second threshold, the encoder determines that the sound field type of the current frame is a distinct sound field. If the number of dissimilar sound sources corresponding to the current frame is not greater than the first threshold or not less than the second threshold, the encoder determines that the sound field type of the current frame is a diffuse sound field.
  • the encoder determines that the initial encoding scheme of the current frame is the second encoding scheme, that is, the MP-based HOA encoding scheme. If the sound field type of the current frame is a diffuse sound field type, the encoding end determines that the initial encoding scheme of the current frame is the first encoding scheme, that is, the HOA encoding scheme based on DirAC.
  • without further processing, the initial encoding scheme may switch back and forth between consecutive audio frames, which ultimately results in more switching frames that need to be encoded. Since switching between encoding schemes causes a number of problems that need to be solved, these problems can be reduced by reducing the number of switching frames.
  • the encoding end can first determine the expected encoding scheme of the current frame according to the sound field classification result of the current frame, that is, the encoding end uses the initial encoding scheme determined according to the aforementioned method as the expected encoding scheme. Then, the encoding end uses a sliding window method to update the initial encoding scheme of the current frame based on the expected encoding scheme, for example, the encoding end updates the initial encoding scheme of the current frame through hangover processing.
  • the sliding window includes the expected encoding scheme of the current frame and the updated initial encoding schemes of the previous N-1 frames of the current frame. If the cumulative number of second encoding schemes in the sliding window is not less than the first specified threshold, the encoding end updates the initial encoding scheme of the current frame to the second encoding scheme. If the cumulative number of second encoding schemes in the sliding window is less than the first specified threshold, the encoding end updates the initial encoding scheme of the current frame to the first encoding scheme.
  • the length N of the sliding window is 8, 10, 15, etc.
  • the first specified threshold is 5, 6, 7, etc. The embodiment of the present application does not limit the length of the sliding window and the value of the first specified threshold.
  • an example is as follows: assume that the length of the sliding window is 10 and the first specified threshold is 7, and that the sliding window contains the expected encoding scheme of the current frame and the updated initial encoding schemes of the previous 9 frames of the current frame. If the number of second encoding schemes in the sliding window accumulates to no less than 7, the encoding end updates the initial encoding scheme of the current frame to the second encoding scheme; if the number of second encoding schemes in the sliding window accumulates to less than 7, the encoding end updates the initial encoding scheme of the current frame to the first encoding scheme.
  • if the cumulative number of first coding schemes in the sliding window is not less than the second specified threshold, the encoder updates the initial coding scheme of the current frame to the first coding scheme. If the cumulative number of first coding schemes in the sliding window is less than the second specified threshold, the encoder updates the initial coding scheme of the current frame to the second coding scheme.
  • the second specified threshold is 5, 6, 7, or another value; the embodiment of the present application does not limit the value of the second specified threshold.
  • the second specified threshold is different from or the same as the above-mentioned first specified threshold.
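The sliding-window (hangover) update described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the constants `SCHEME_1`/`SCHEME_2`, the function name, and the default window length (10) and threshold (7) are placeholders taken from the example values in the text.

```python
# Hypothetical sketch of the hangover update: the window holds the updated
# initial schemes of previous frames plus the expected scheme of the current
# frame; SCHEME_2 wins only if it has accumulated to at least the threshold.
from collections import deque

SCHEME_1 = 1  # first coding scheme (DirAC-based HOA coding)
SCHEME_2 = 2  # second coding scheme (based on virtual speaker selection)

def update_initial_scheme(window, expected_scheme, window_len=10, threshold=7):
    """Append the expected scheme of the current frame, trim the window to
    window_len entries, and return the updated initial scheme."""
    window.append(expected_scheme)
    while len(window) > window_len:
        window.popleft()  # keep only the most recent window_len decisions
    updated = SCHEME_2 if window.count(SCHEME_2) >= threshold else SCHEME_1
    window[-1] = updated  # the window stores updated schemes for later frames
    return updated
```

With a window already holding six SCHEME_2 entries, one more expected SCHEME_2 reaches the threshold of 7; with only three, the frame falls back to SCHEME_1.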
  • the encoder can also obtain the sound field classification result of the current frame in other ways, and can also determine the initial coding scheme based on the sound field classification result in other ways; the embodiment of the present application does not limit this.
  • after the encoding end determines the initial encoding scheme of the current frame: if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the frame preceding the current frame, the encoding end determines that the encoding scheme of the current frame is the initial encoding scheme of the current frame. If the initial encoding scheme of the current frame is different from the initial encoding scheme of the frame preceding the current frame, the encoder determines that the encoding scheme of the current frame is the third encoding scheme.
  • if the initial coding scheme of the current frame is the same as that of the previous frame and is the first coding scheme, the encoder determines that the coding scheme of the current frame is the first coding scheme. If the initial coding scheme of the current frame is the same as that of the previous frame and is the second coding scheme, the encoder determines that the coding scheme of the current frame is the second coding scheme. If one of the initial coding scheme of the current frame and the initial coding scheme of the previous frame is the first coding scheme, and the other is the second coding scheme, the encoder determines that the coding scheme of the current frame is the third coding scheme.
  • one of the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame is the first coding scheme
  • the other is the second coding scheme
  • the initial coding scheme of the current frame is the first coding scheme
  • the initial encoding scheme of the frame preceding the current frame is the second encoding scheme
  • the initial encoding scheme of the current frame is the second encoding scheme
  • the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the frame preceding the current frame is the first encoding scheme. That is, for a switching frame, the encoding end adopts neither the first encoding scheme nor the second encoding scheme to encode the HOA signal of the switching frame, but uses the switching frame encoding scheme instead.
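The scheme-selection rule above can be condensed into a short sketch. The constants and the function name are illustrative placeholders, not identifiers from the patent.

```python
# Minimal sketch of selecting the coding scheme of the current frame from
# the initial schemes of the current frame and the preceding frame.
SCHEME_1, SCHEME_2, SCHEME_3 = 1, 2, 3  # DirAC, MP-based, hybrid (switching)

def select_coding_scheme(initial_cur, initial_prev):
    """Same initial scheme as the previous frame: non-switching frame,
    keep it; different: switching frame, use the third (hybrid) scheme."""
    if initial_cur == initial_prev:
        return initial_cur
    return SCHEME_3
```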
  • for a non-switching frame, the coding end uses a coding scheme consistent with the initial coding scheme of the non-switching frame to code the HOA signal of the non-switching frame.
  • an audio frame whose initial coding scheme is different from that of the previous frame is a switching frame
  • an audio frame whose initial coding scheme is the same as that of the previous frame is a non-switching frame.
  • in addition to determining the encoding scheme of the current frame, the encoding end also needs to encode, into the code stream, information that can indicate the encoding scheme of the current frame, so that the decoding end can determine which decoding scheme to use to decode the code stream of the current frame.
  • there are many ways for the encoding end to encode information capable of indicating the encoding scheme of the current frame into the code stream; three implementations are introduced next.
  • The first implementation: encode the switching flag and the indication information of the two coding schemes.
  • the indication information of the initial coding scheme is represented by a coding mode (coding mode) corresponding to the initial coding scheme, that is, the coding mode is used as the indication information.
  • the encoding mode corresponding to the initial encoding scheme is the initial encoding mode
  • the initial encoding mode is the first encoding mode (ie, the DirAC mode) or the second encoding mode (ie, the MP mode).
  • the preset indication information is a preset encoding mode
  • the preset encoding mode is a first encoding mode or a second encoding mode.
  • the preset indication information is other coding modes, that is, the specific indication information of the coding scheme of the switching frame encoded into the code stream is not limited.
  • the encoding end uses the switching flag to indicate the switching frame
  • the indication information of the coding scheme of the switching frame encoded into the code stream is not limited: it may be the initial encoding mode, may be a preset encoding mode, may be randomly selected from the first encoding mode and the second encoding mode, or may be other indication information.
  • the switching flag is used to indicate whether the current frame is a switching frame, so that the decoder can directly determine whether the current frame is a switching frame by obtaining the switching flag in the code stream.
  • the switching flag of the current frame and the indication information of the initial coding scheme each occupy one bit of the code stream.
  • the value of the switching flag of the current frame is "0" or "1", wherein the value of the switching flag is "0" indicating that the current frame is not a switching frame, that is, the value of the switching flag of the current frame is the first value.
  • the switching flag being "1" indicates that the current frame is a switching frame, that is, the value of the switching flag of the current frame is the second value.
  • the indication information of the initial encoding scheme is “0” or “1", wherein “0” indicates the DirAC mode (ie, the DirAC encoding scheme), and “1” indicates the MP mode (ie, the MP-based encoding scheme).
  • if the current frame is a switching frame, the encoding end determines that the value of the switching flag of the current frame is the second value, and encodes that value into the code stream. That is, for a switching frame, since the switching flag in the code stream already indicates the switching frame, there is no need to encode the indication information of the coding scheme of the switching frame.
  • the encoding end encodes the indication information of the initial encoding scheme of the current frame into the code stream.
  • the indication information encoded into the code stream is substantially the coding mode consistent with the initial coding scheme, that is, the initial coding mode, and the initial coding mode is the first coding mode or the second coding mode.
  • in the second implementation, which encodes only the indication information of the two encoding schemes, the encoding end may not encode the switching flag.
  • the indication information of the initial encoding scheme occupies one bit of the code stream.
  • the coding mode coded into the code stream is "0" or "1", where "0" indicates the DirAC mode, meaning that the initial coding scheme of the current frame is the first coding scheme, and "1" indicates the MP mode, meaning that the initial encoding scheme of the current frame is the second encoding scheme.
  • The third implementation: encode the indication information of the three encoding schemes.
  • the indication information of the coding scheme of the current frame occupies two bits of the code stream.
  • the indication information of the coding scheme of the current frame is "00", “01” or “10".
  • "00" indicates that the encoding scheme of the current frame is the first encoding scheme
  • "01” indicates that the encoding scheme of the current frame is the second encoding scheme
  • "10" indicates that the encoding scheme of the current frame is the third encoding scheme.
  • the encoding end determines the value of the switching flag, and encodes the value of the switching flag into the code stream.
  • the indication information of the initial encoding scheme of the current frame is encoded into the code stream; or, if the current frame is a switching frame, the encoder encodes the preset indication information into the code stream, and if the current frame is a non-switching frame, the encoding end encodes the indication information of the initial encoding scheme of the current frame into the code stream.
  • after determining the initial encoding scheme of the current frame, the encoder directly encodes the indication information of the initial encoding scheme of the current frame into the code stream.
  • after the encoding end determines the initial encoding scheme of the current frame, it determines the encoding scheme of the current frame based on the initial encoding scheme of the current frame and that of the frame preceding the current frame, and encodes the indication information of the encoding scheme of the current frame into the code stream.
  • Step 602: If the coding scheme of the current frame is the third coding scheme, encode the signal of the designated channel in the HOA signal into the code stream, where the designated channel is a part of all the channels of the HOA signal.
  • the encoding end encodes the HOA signal of the current frame according to the third encoding scheme (ie, the hybrid encoding scheme).
  • if the value of the switching flag of the current frame is the second value, it indicates that the current frame is a switching frame.
  • the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame of the current frame, it means that the current frame is a switching frame.
  • the coding scheme of the current frame indicates that the current frame is a switching frame.
  • the encoding end adopts the third encoding scheme to encode the HOA signal of the current frame.
  • the third coding scheme indicates to code the signal of the specified channel in the HOA signal of the current frame into the code stream, wherein the specified channel is a part of all channels of the HOA signal.
  • the encoder encodes the signal of the specified channel in the HOA signal of the switching frame into the code stream instead of using the first coding scheme or the second coding scheme to encode the switching frame. That is, to achieve a smooth transition of auditory quality when the coding scheme switches, this scheme uses a compromise method to encode switching frames.
  • the designated channel is consistent with a preset transmission channel in the first encoding scheme, that is, the designated channel is a preset channel. In other words, under the premise that the third coding scheme differs from the second coding scheme, in order to make the coding effects of the third coding scheme and the second coding scheme close, the coding end encodes into the code stream the signals of those channels of the switching frame's HOA signal that coincide with the preset transmission channels of the first coding scheme, so that the auditory quality transitions as smoothly as possible.
  • different transmission channels can be preset according to different encoding bandwidths, bit rates, and even application scenarios.
  • the preset transmission channels may also be the same.
  • the signals of the specified channel include FOA signals, and the FOA signals include the omnidirectional W signal and the directional X, Y, and Z signals. That is to say, the specified channel includes the FOA channels, and the signals of the FOA channels are low-order signals; in other words, if the current frame is a switching frame, the encoding end encodes the low-order part of the HOA signal of the current frame into the code stream, where the low-order part includes the W, X, Y, and Z signals of the FOA channels.
  • the encoding end determines the virtual speaker signal and the residual signals based on the W, X, Y, and Z signals, and encodes the virtual speaker signal and the residual signals into the code stream.
  • the encoder determines the W signal as the one virtual speaker signal, and determines three residual signals based on the W, X, Y, and Z signals, or determines the X, Y, and Z signals as the three residual signals.
  • the encoding end determines the difference signal between any three signals of the W signal, the X signal, the Y signal, and the Z signal and the remaining signal as the three residual signals.
  • the encoding end determines the difference signals between the X signal, the Y signal, and the Z signal and the W signal as three residual signals.
  • the encoding end uses the difference signals X', Y', and Z', obtained respectively by X-W, Y-W, and Z-W, as the three residual signals.
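The X-W, Y-W, Z-W residual variant above can be sketched in a few lines. The function name is a placeholder, and plain Python lists stand in for the per-frame channel signals.

```python
# Sketch of the residual variant: W is the single virtual speaker signal,
# and the residuals are per-sample differences of X, Y, Z against W.
def foa_residuals(W, X, Y, Z):
    """Return (virtual_speaker, X', Y', Z') with S' = S - W per sample."""
    diff = lambda S: [s - w for s, w in zip(S, W)]
    return list(W), diff(X), diff(Y), diff(Z)
```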
  • if the encoder uses the core encoder to encode the current frame and the core encoder is a stereo encoder, then since the determined one virtual speaker signal and three residual signals are all mono signals, the encoder first needs to combine these mono signals into stereo signals, which are then encoded using the stereo encoder.
  • the encoding end combines the one virtual speaker signal with the first preset mono signal to obtain one stereo signal, and combines the three residual signals with the second preset mono signal to obtain two stereo signals.
  • the encoding end encodes the obtained three-way stereo signals into code streams respectively through a stereo encoder.
  • the embodiment of the present application does not limit the specific manner in which the encoding end combines the three residual signals and one preset mono signal to obtain two stereo signals.
  • the encoding end combines the two most correlated residual signals among the three residual signals to obtain one of the two stereo signals, and combines the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals. That is to say, the encoding end combines signals according to correlation to obtain stereo signals.
  • the encoding end may also combine any two of the three residual signals to obtain one of the two stereo signals, and combine the remaining residual signal with the second preset mono signal to obtain the other stereo signal.
  • the first preset monophonic signal in the embodiment of the present application is an all-zero signal or an all-ones signal
  • the second preset monophonic signal is an all-zero signal or an all-ones signal
  • the first preset mono signal may be the same as or different from the second preset mono signal. That is, the first and second preset mono signals are both all-zero signals or both all-one signals; or the first preset mono signal is an all-zero signal and the second is an all-one signal; or the first is an all-one signal and the second is an all-zero signal.
  • the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point value is all zero, and the all-one signal includes a signal whose sampling point value is all one or a signal whose frequency point value is all one.
  • the all-zero signal includes a signal whose sampling point values are all zero, and the all-ones signal includes a signal whose sampling point value is all one.
  • the HOA signal is a frequency-domain signal
  • the all-zero signal includes a signal whose frequency point values are all zero
  • the all-ones signal includes a signal whose frequency point value is all one.
  • the first preset mono signal and/or the second preset mono signal may also be preset signals in other forms.
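The correlation-based pairing described above can be sketched as follows. This is an illustrative sketch: the function names are placeholders, a zero-mean dot product stands in for the (unspecified) correlation measure, and plain lists represent the mono signals.

```python
# Sketch of pairing the four mono signals into three stereo signals:
# (virtual speaker, preset mono), (two most correlated residuals),
# (remaining residual, preset mono).
def _corr(a, b):
    """Illustrative correlation measure: |zero-mean dot product|."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return abs(sum((x - ma) * (y - mb) for x, y in zip(a, b)))

def build_stereo_pairs(vs, residuals, preset1, preset2):
    """vs: virtual speaker mono signal; residuals: list of three mono
    residual signals; preset1/preset2: preset mono signals (e.g. all-zero).
    Returns three (left, right) stereo pairs."""
    candidates = [(0, 1), (0, 2), (1, 2)]
    i, j = max(candidates, key=lambda p: _corr(residuals[p[0]], residuals[p[1]]))
    k = ({0, 1, 2} - {i, j}).pop()  # the residual left over after pairing
    return [(vs, preset1), (residuals[i], residuals[j]), (residuals[k], preset2)]
```

The three resulting stereo pairs would then each be fed to the stereo encoder.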
  • the encoding end uses the mono encoder to encode the one virtual speaker signal and each of the three residual signals into the code stream respectively.
  • Fig. 7 is a schematic diagram of a switching frame coding scheme provided by an embodiment of the present application.
  • the current frame to be encoded is a switching frame
  • the encoding end obtains the HOA signal of the current frame, uses the W signal in the HOA signal as the virtual speaker signal, and determines the residual signals according to the FOA signal in the HOA signal, as shown in FIG. 7.
  • the residual signal is determined according to the X, Y, and Z signals in the HOA signal, or the residual signal is determined according to the W signal and the X, Y, and Z signals.
  • the encoding end encodes the determined virtual speaker signal and residual signal into the code stream through the core encoder, so as to obtain the code stream of the switching frame.
  • the encoding end determines two of the W, X, Y, and Z signals as two virtual speaker signals, and determines the remaining two signals as two residual signals.
  • the encoding end combines the two channels of virtual speaker signals to obtain one channel of stereo signals, and combines the two channels of residual signals to obtain another channel of stereo signals.
  • the encoding end encodes the obtained two-way stereo signals into code streams respectively through a stereo encoder.
  • the embodiment of the present application does not limit the specific manner in which the encoding end pairs the W, X, Y, and Z signals to obtain two stereo signals.
  • the encoding end determines the W signal as one virtual speaker signal, and determines the signal with the highest correlation with the W signal among the X, Y, and Z signals as the other virtual speaker signal. That is, among the four signals of the FOA channels, the W signal is combined with the signal most correlated with it, and the remaining two signals are combined.
  • the encoding end combines any two signals of W signal, X signal, Y signal and Z signal to obtain one stereo signal, and combines the remaining two signals to obtain another stereo signal.
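The two-virtual-speaker pairing variant can be sketched as below. As before, the function name and the zero-mean dot-product correlation measure are illustrative assumptions, not identifiers from the patent.

```python
# Sketch: W is paired with the directional signal (X, Y, or Z) most
# correlated with it; the remaining two signals form the second pair.
def pair_foa_channels(W, X, Y, Z):
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        return abs(sum((x - ma) * (y - mb) for x, y in zip(a, b)))
    directional = [X, Y, Z]
    best = max(range(3), key=lambda i: corr(W, directional[i]))
    rest = [directional[i] for i in range(3) if i != best]
    return (W, directional[best]), (rest[0], rest[1])
```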
  • the embodiment of the present application does not limit the specific implementation manner in which the encoding end uses the core encoder to encode the virtual speaker signal and the residual signal, for example, does not limit the number of encoding bits corresponding to the virtual speaker signal and the residual signal.
  • the above describes the process of encoding the current frame at the encoding end when the current frame is a switching frame, that is, the encoding end encodes the signal of the specified channel in the HOA signal of the switching frame into the code stream according to the third encoding scheme, namely the switching frame encoding scheme.
  • the signal of the specified channel may include the W signal, which is a core signal of the HOA signal.
  • the switching frame coding scheme can also be called an MP-W-based coding scheme.
  • if the encoding scheme of the current frame is the first encoding scheme, the encoding end encodes the HOA signal of the current frame into the code stream according to the first encoding scheme. If the encoding scheme of the current frame is the second encoding scheme, the encoding end encodes the HOA signal of the current frame into the code stream according to the second encoding scheme. That is, if the current frame is not a switching frame, the encoding end uses the initial encoding scheme of the current frame to encode the current frame.
  • the encoding end encodes the HOA signal of the current frame into the code stream according to the second encoding scheme as follows: based on the MP algorithm, the encoding end selects from the virtual speaker set a target virtual speaker that matches the HOA signal of the current frame; based on the HOA signal of the current frame and the target virtual speaker, it determines the virtual speaker signal through the MP-based spatial encoder; based on the HOA signal of the current frame and the virtual speaker signal, it determines the residual signal through the MP-based spatial encoder; and it encodes the virtual speaker signal and the residual signal into the code stream through the core encoder.
  • the encoding end encodes the HOA signal of the current frame into the code stream according to the first encoding scheme as follows: the encoding end extracts the core layer signal and the spatial parameters from the HOA signal of the current frame, and encodes the extracted core layer signal and spatial parameters into the code stream.
  • the encoding end extracts the core layer signal from the HOA signal of the current frame through the core encoded signal acquisition module, extracts the spatial parameters from the HOA signal of the current frame through the DirAC-based spatial parameter extraction module, encodes the core layer signal into the code stream through the core encoder, and encodes the spatial parameters into the code stream through the spatial parameter encoder.
  • the channel corresponding to the core layer signal is consistent with the specified channel in this solution.
  • the extracted spatial parameters are also encoded into the code stream.
  • the spatial parameters include rich scene information, such as direction information. It can be seen that, for the same frame, the effective information encoded into the code stream by the DirAC-based HOA coding scheme is more than that encoded by the switching frame coding scheme.
  • although the switching frame coding scheme also encodes into the code stream the signals of the transmission channels preset by the first coding scheme in the HOA signal, it does not encode any further information of the HOA signal beyond the signal of the specified channel; that is, it neither extracts spatial parameters nor encodes them into the code stream, so that the auditory quality transitions as smoothly as possible.
  • FIG. 10 is a flow chart of another encoding method provided by the embodiment of the present application.
  • the encoder first acquires the HOA signal of the current frame to be encoded. Then, the encoding end analyzes the sound field type of the HOA signal to determine the initial encoding scheme of the current frame, and encodes the indication information of the initial encoding scheme of the current frame into the code stream. The encoder then determines whether the initial encoding scheme of the current frame is the same as that of the previous frame.
  • if they are the same, the encoding end uses the initial encoding scheme of the current frame to encode the HOA signal of the current frame to obtain the code stream of the current frame. If the initial encoding scheme of the current frame differs from that of the previous frame, the encoding end uses the switching frame encoding scheme to encode the HOA signal of the current frame to obtain the code stream of the current frame.
  • the initial encoding scheme of the current frame is the first encoding scheme or the second encoding scheme
  • the encoder adopts the initial encoding scheme of the current frame to encode the HOA signal of the current frame into the bitstream.
  • the HOA signal of the audio frame is encoded and decoded by combining two schemes (namely, a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding); that is, an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • for switching frames, neither of the above two schemes is directly used for encoding; instead, a new codec scheme is used to code and decode these audio frames, that is, the signal of the specified channel in the HOA signal of these audio frames is encoded into the code stream. In other words, a compromise scheme is used for coding and decoding, so that the auditory quality of the decoded HOA signal after rendered playback is smooth.
  • FIG. 11 is a flow chart of a decoding method provided by an embodiment of the present application, and the method is applied to a decoding end. It should be noted that this decoding method corresponds to the encoding method shown in FIG. 6 . Please refer to FIG. 11 , the method includes the following steps.
  • Step 1101 Obtain the decoding scheme of the current frame based on the code stream.
  • the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme and the third decoding scheme.
  • the first decoding scheme is an HOA decoding scheme based on DirAC
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme.
  • the hybrid decoding scheme is also referred to as a switching frame decoding scheme.
  • since the encoding end uses different encoding schemes for different audio frames, the decoding end also needs to use the corresponding decoding scheme to decode each audio frame.
  • in step 601 of the encoding method shown in FIG. 6, three implementations were introduced in which the encoding end encodes into the code stream information that can indicate the encoding scheme of the current frame.
  • correspondingly, there are three ways for the decoding end to determine the decoding scheme of the current frame, which are introduced next.
  • The first implementation: the switching flag and the indication information of the two encoding schemes are encoded.
  • the decoder first parses out the value of the switching flag of the current frame from the code stream. If the value of the switching flag is the first value, the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, where the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme. If the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme. It should be noted that the indication information of the encoding scheme encoded into the code stream by the encoding end is the indication information of the decoding scheme parsed from the code stream by the decoding end.
  • the decoding end parses out that the value of the switching flag of the current frame is the first value, it means that the current frame is a non-switching frame.
  • the decoding end then parses out the indication information of the decoding scheme from the code stream, and determines the decoding scheme of the current frame based on the indication information. If the decoding end parses out that the value of the switching flag of the current frame is the second value, it means that the current frame is a switching frame, and even if the code stream contains the indication information, the decoding end does not need to decode the indication information.
  • in short, the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme and that the current frame is a switching frame; the switching frame decoding scheme is a decoding scheme different from the first decoding scheme and the second decoding scheme, and it serves the smooth transition of auditory quality.
  • the indication information of the decoding scheme and the switching flag each occupy one bit of the code stream.
  • the decoder first parses the value of the switching flag of the current frame from the code stream. If the parsed value of the switching flag is "0", that is, the first value, the decoding end then parses the indication information of the decoding scheme of the current frame from the code stream: if the parsed indication information is "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme; if the parsed indication information is "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed value of the switching flag is "1", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme (the third decoding scheme).
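The decoder-side logic of the first implementation can be sketched as below. The function name is a placeholder and a Python iterable of ints stands in for the code-stream bits; the bit values follow the text (flag "1" = switching frame; mode 0 = first/DirAC scheme, 1 = second/MP scheme).

```python
# Sketch of parsing the decoding scheme under the first implementation.
def parse_scheme_first_impl(bits):
    """Read the switching flag; if it is 0, read one more bit to pick
    between the first and second decoding schemes."""
    it = iter(bits)
    if next(it) == 1:          # second value: switching frame
        return "third"
    return "second" if next(it) == 1 else "first"
```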
  • The second implementation: the indication information of the two encoding schemes is encoded.
  • the decoding end parses out the initial decoding scheme of the current frame from the code stream, and the initial decoding scheme is the first decoding scheme or the second decoding scheme. If the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, it is determined that the decoding scheme of the current frame is the initial decoding scheme of the current frame. If the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame, it is determined that the decoding scheme of the current frame is a third decoding scheme, that is, a hybrid decoding scheme.
  • the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame means that the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme , or, the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme. That is, one of the initial decoding scheme of the current frame and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, and the other is the second decoding scheme.
  • the indication information used to indicate the initial encoding scheme occupies one bit of the code stream, and taking the encoding mode as the indication information as an example, the encoding mode in the code stream occupies one bit.
  • the decoding end parses the indication information of the initial encoding scheme of the current frame from the code stream. If the parsed indication information is "0" and the indication information of the previous frame of the current frame is also "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1" and the indication information of the previous frame is also "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme.
  • if the parsed indication information differs from that of the previous frame, the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.
  • the indication information of the initial decoding scheme of the previous frame of the current frame is cached data.
  • the decoding end may acquire the indication information of the initial decoding scheme of the previous frame of the current frame from the cache.
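The decoder-side logic of the second implementation can be sketched as below, assuming the previous frame's indication is read from the cache; the function name and the 0/1 encoding (0 = first scheme, 1 = second scheme) are illustrative.

```python
# Sketch of parsing the decoding scheme under the second implementation:
# compare the current frame's indication with the cached previous one.
def parse_scheme_second_impl(cur_ind, prev_ind):
    if cur_ind == prev_ind:
        return "first" if cur_ind == 0 else "second"
    return "third"  # initial schemes differ: switching frame decoding scheme
```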
  • The third implementation: the indication information of the three encoding schemes is encoded.
  • the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the indication information of the decoding scheme occupies two bits of the code stream.
  • the coding mode of the current frame occupies two bits of the code stream.
  • the decoding end parses the indication information of the decoding scheme of the current frame from the code stream. If the parsed indication information is "00", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "01", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed indication information is "10", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.
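The two-bit mapping described above can be sketched as a small lookup. This is only an illustrative sketch: the scheme names and the handling of the unused value "11" are assumptions, not taken from the patent.

```python
# Illustrative placeholders for the three decoding schemes named in the text.
FIRST_SCHEME = "dirac_hoa"          # DirAC-based HOA decoding scheme
SECOND_SCHEME = "virtual_speaker"   # HOA decoding based on virtual speaker selection
THIRD_SCHEME = "switching_frame"    # hybrid (switching frame) decoding scheme

def scheme_from_indication(bits: str) -> str:
    """Map the two-bit indication information parsed from the code stream
    to a decoding scheme; "11" is treated here as reserved (an assumption)."""
    table = {"00": FIRST_SCHEME, "01": SECOND_SCHEME, "10": THIRD_SCHEME}
    if bits not in table:
        raise ValueError(f"reserved indication value: {bits}")
    return table[bits]
```

A decoder front end would call this once per frame after reading the two indication bits.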
  • Step 1102: If the decoding scheme of the current frame is the third decoding scheme, determine the signal of the specified channel in the HOA signal of the current frame based on the code stream, where the specified channel is a part of all channels of the HOA signal.
  • the decoding end determines the signal of the specified channel in the HOA signal of the current frame based on the code stream. That is to say, for the switching frame, the encoding end encodes the signal of the specified channel into the code stream, so when the decoding end decodes the switching frame with the switching frame decoding scheme, it first needs to parse the signal of the specified channel from the code stream.
  • the following describes the implementation process in which the decoding end determines the signal of the specified channel in the HOA signal of the current frame based on the code stream.
  • the process of the decoding end determining the signal of the specified channel in the HOA signal of the current frame based on the code stream is symmetrical to the process of encoding the signal of the specified channel in the HOA signal of the current frame into the code stream at the encoding end.
  • some implementation processes of encoding the signal of the specified channel into the code stream were introduced above; the decoding processes at the decoding end corresponding to these implementation processes are introduced next.
  • if the encoding end first determines the virtual speaker signal and the residual signal based on the signal of the specified channel and then encodes the virtual speaker signal and the residual signal into the code stream, then, correspondingly, the decoding end first determines the virtual speaker signal and the residual signal based on the code stream, and then determines the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the decoding end decodes the code stream through the stereo decoder to obtain three stereo signals, and then determines one virtual speaker signal and three residual signals based on the three stereo signals.
  • the decoder determines one virtual speaker signal based on one of the three stereo signals, and determines three residual signals based on the other two of the three stereo signals. That is, the decoder first parses the three stereo signals from the code stream, and then disassembles the three stereo signals to obtain a virtual speaker signal and three residual signals.
  • the three stereo signals parsed from the code stream by the decoding end are denoted S1, S2, and S3, where S1 is obtained by combining one virtual speaker signal and one preset mono signal, S2 is obtained by combining two residual signals, and S3 is obtained by combining the remaining residual signal and one preset mono signal.
  • the decoder disassembles S1 to obtain one virtual speaker signal, disassembles S2 to obtain two residual signals, and disassembles S3 to obtain the remaining one residual signal.
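The S1/S2/S3 pairing and its inverse can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the use of NumPy arrays, and the all-zero choice of preset mono signal are illustrative, not prescribed by the patent (which allows all-zero or all-one presets).

```python
import numpy as np

def pack_stereo(virtual_spk, res1, res2, res3, preset=None):
    """Combine one virtual speaker signal and three residual signals into the
    three stereo pairs S1..S3 described above. The odd channels are paired
    with a preset mono signal (assumed all-zero here)."""
    if preset is None:
        preset = np.zeros_like(virtual_spk)   # all-zero preset mono signal
    s1 = np.stack([virtual_spk, preset])      # S1 = (virtual speaker, preset)
    s2 = np.stack([res1, res2])               # S2 = (residual 1, residual 2)
    s3 = np.stack([res3, preset])             # S3 = (residual 3, preset)
    return s1, s2, s3

def unpack_stereo(s1, s2, s3):
    """Disassemble the decoded stereo pairs back into one virtual speaker
    signal and three residual signals; the preset channels are discarded."""
    virtual_spk = s1[0]
    res1, res2 = s2[0], s2[1]
    res3 = s3[0]
    return virtual_spk, res1, res2, res3
```

Packing mirrors the encoder side; unpacking is the decoder-side disassembly of S1, S2, and S3.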
  • the decoding end decodes the code stream through the mono decoder to obtain four mono signals, where the four mono signals include one virtual speaker signal and three residual signals.
  • after the decoding end determines the virtual speaker signal and the residual signal based on the code stream, it determines the W signal based on the virtual speaker signal.
  • the decoding end determines the X signal, the Y signal and the Z signal based on the residual signal and the W signal, or the decoding end determines the X signal, the Y signal and the Z signal based on the residual signal.
  • the decoding end parses three residual signals
  • the sums of the three residual signals and the W signal are determined as the X signal, the Y signal, and the Z signal respectively, or the three residual signals are respectively determined as the X signal, the Y signal, and the Z signal.
  • if the encoding end determined the difference signals between the X signal, the Y signal, and the Z signal and the W signal as the three residual signals, the decoding end determines the sums of the three residual signals and the W signal as the X signal, the Y signal, and the Z signal.
  • if the encoding end determined the X signal, the Y signal, and the Z signal as the three residual signals, the decoding end determines the three residual signals as the X signal, the Y signal, and the Z signal respectively. That is, the decoding process at the decoding end matches the encoding process at the encoding end.
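The two residual conventions above (difference against W, or the raw X/Y/Z signals) can be captured in one small helper. This is an illustrative sketch; the function name and the boolean switch are assumptions used only to show that the decoder's operation inverts the encoder's choice.

```python
import numpy as np

def reconstruct_xyz(w, residuals, residual_is_difference=True):
    """Recover the X, Y, Z signals from the W signal and three residuals.

    If the encoder stored the differences (X - W, Y - W, Z - W), the decoder
    adds W back; if it stored X, Y, Z directly, the residuals are used as-is.
    """
    res_x, res_y, res_z = residuals
    if residual_is_difference:
        return res_x + w, res_y + w, res_z + w
    return res_x, res_y, res_z
```

Either branch is the exact inverse of the corresponding encoder-side choice, which is what "the decoding process matches the encoding process" requires.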
  • the decoding end decodes the code stream through the stereo decoder to obtain the two stereo signals.
  • the decoder determines two channels of virtual speaker signals based on one of the two channels of stereo signals, and determines two channels of residual signals based on the other channel of the two channels of stereo signals.
  • the two virtual speaker signals and the two residual signals together include the W signal, the X signal, the Y signal, and the Z signal.
  • the two virtual speaker signals determined by the decoding end include the W signal and the signal, among the X signal, the Y signal, and the Z signal, that has the highest correlation with the W signal.
  • the signal with the highest correlation with the W signal among the X signal, Y signal, and Z signal is the X signal
  • the two virtual speaker signals determined by the decoder include the W signal and the X signal
  • the two residual signals determined by the decoder include Y signal and Z signal.
  • Step 1103: Based on the signal of the designated channel, determine the gain of one or more remaining channels in the HOA signal of the current frame except the designated channel.
  • after the decoding end determines the signal of the specified channel in the HOA signal of the current frame based on the code stream, it determines, based on the signal of the specified channel, the gains of one or more remaining channels in the HOA signal other than the specified channel.
  • the FOA channel may be called a low-order channel
  • the signal of the FOA channel may be called a low-order part of the HOA signal
  • one or more remaining channels in the HOA signal other than the specified channel are called high-order channels, and the signals of the high-order channels can be called the high-order part of the HOA signal.
  • the decoder determines the high-order gain of the HOA signal based on the low-order part of the HOA signal, that is, the gain of the high-order channel.
  • the decoding end first performs analysis filtering on the signal of the specified channel in the HOA signal to obtain the analysis-filtered signal of the specified channel, and determines the gains of the one or more remaining channels based on the analysis-filtered signal of the specified channel. For example, assuming that the signal of the specified channel is the low-order part of the HOA signal, the decoding end first performs analysis filtering on the low-order part of the HOA signal to obtain the analysis-filtered low-order part, and then estimates the high-order gain based on the analysis-filtered low-order part of the HOA signal.
  • the analysis filter used by the decoding end for analysis filtering is the same as the analysis filter used in the DirAC-based HOA decoding scheme, which can make the decoding delay of the switching frame consistent with the decoding delay of the DirAC-based HOA decoding scheme, that is, delay alignment.
  • the decoding delay mentioned in this article refers to the end-to-end codec delay, and the decoding delay may also be referred to as encoding delay.
  • the decoding end determines the gains of one or more remaining channels in the HOA signal other than the designated channel based on the signal of the designated channel, that is, estimates the gains of the remaining channels based on the signal of the designated channel.
  • the specific implementation of the remaining channel gain estimation process is the same as the remaining channel gain estimation method in the DirAC-based codec scheme, which is not described in detail in the embodiment of the present application.
  • the method for estimating the high-order gain based on the low-order part of the HOA signal at the decoding end is the same as the method for estimating the high-order gain in the codec solution based on DirAC.
  • Step 1104: Based on the signal of the specified channel and the gain of the one or more remaining channels, determine the signal of each remaining channel in the one or more remaining channels.
  • the decoding end determines the signal of each remaining channel in the one or more remaining channels based on the signal of the specified channel and the gain of the one or more remaining channels.
  • the decoding end can determine the high-order part of the HOA signal based on the W signal in the low-order part and the high-order gain.
  • the decoding end can determine the high-order part of the analysis-filtered HOA signal based on the W signal in the analysis-filtered low-order part of the HOA signal and the high-order gain.
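Steps 1103-1104 reduce to scaling the W signal by one estimated gain per remaining channel. The sketch below shows only that per-channel scaling; how the gains themselves are estimated is scheme-specific (the DirAC-based method referenced above) and is not reproduced here, so the gains are passed in as given.

```python
import numpy as np

def reconstruct_high_order(w, gains):
    """Obtain the high-order part of the HOA signal by scaling the
    (analysis-filtered) W signal with each remaining channel's gain.

    w     : 1-D array, the W signal of the low-order part
    gains : iterable of per-channel high-order gains (assumed precomputed)
    Returns an array of shape (num_remaining_channels, len(w)).
    """
    return np.stack([g * w for g in gains])
```

The resulting channels are then combined with the decoded low-order part and passed through synthesis filtering (step 1105) to obtain the reconstructed HOA signal.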
  • Step 1105: Obtain the reconstructed HOA signal of the current frame based on the signal of the designated channel and the signals of the one or more remaining channels.
  • after obtaining the signal of the specified channel and the signals of the one or more remaining channels, the decoding end obtains the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels, that is, reconstructs the HOA signal of the current frame.
  • the decoding end performs synthesis filtering processing on the signal of the designated channel and the signals of the one or more remaining channels, so as to obtain the reconstructed HOA signal of the current frame.
  • the decoding end can perform synthesis filtering on the low-order part and the high-order part of the HOA signal to obtain the reconstructed HOA signal of the current frame.
  • if the decoding end has performed analysis filtering on the low-order part of the HOA signal, the decoding end performs synthesis filtering on the analysis-filtered low-order part and the analysis-filtered high-order part of the HOA signal to obtain the reconstructed HOA signal of the current frame.
  • the synthesis filter used by the decoding end for synthesis filtering is the same as the synthesis filter used in the DirAC-based HOA codec scheme, which can make the decoding delay of the switching frame consistent with the decoding delay of the DirAC-based HOA decoding scheme, that is, delay alignment.
  • Fig. 12 is a schematic diagram of a switching frame decoding solution provided by an embodiment of the present application.
  • the current frame to be decoded is a switching frame. Assuming that the signal of the specified channel is the low-order part of the HOA signal, during the decoding process the decoding end obtains the code stream of the current frame to be decoded, performs core decoding on the code stream to reconstruct the low-order part of the HOA signal of the current frame, and uses a method similar to that of determining the high-order part in the DirAC-based HOA decoding scheme to estimate the high-order part based on the low-order part, that is, reconstructs the high-order part of the HOA signal. Afterwards, the decoding end reconstructs the HOA signal based on the low-order part obtained through decoding and the high-order part obtained through estimation.
  • the above describes the process of decoding the current frame at the decoding end when the current frame is a switching frame: the decoding end uses the switching frame decoding scheme to decode the switching frame, that is, the decoding end first decodes the signal of the specified channel in the HOA signal (such as the low-order part) and then reconstructs the signals of the remaining channels (such as reconstructing the high-order part).
  • the process of decoding the current frame at the decoding end will be introduced.
  • after the decoding end determines the decoding scheme of the current frame, if the decoding scheme of the current frame is the first decoding scheme, the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the first decoding scheme; if the decoding scheme of the current frame is the second decoding scheme, the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme.
  • the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme as follows: the decoding end parses the virtual speaker signal and the residual signal from the code stream through the core decoder, and the parsed virtual speaker signal and residual signal are sent to the MP-based spatial decoder to obtain the reconstructed HOA signal of the current frame.
  • the decoding scheme shown in FIG. 13 corresponds to the encoding scheme shown in FIG. 8 .
  • the implementation process of obtaining the reconstructed HOA signal of the current frame from the code stream is as follows: the decoding end parses the core layer signal and the spatial parameters from the code stream, and reconstructs the HOA signal of the current frame based on the core layer signal and the spatial parameters.
  • the decoding end parses the core layer signal from the code stream through the core decoder, parses the spatial parameters from the code stream through the spatial parameter decoder, and performs DirAC-based HOA signal synthesis based on the parsed core layer signal and spatial parameters to obtain the reconstructed HOA signal of the current frame.
  • the decoding scheme shown in FIG. 14 corresponds to the encoding scheme shown in FIG. 9 .
  • when the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme, gain adjustment may also be performed on the high-order part of the current frame. The decoding end obtains the initial HOA signal from the code stream according to the second decoding scheme; if the decoding scheme of the previous frame of the current frame is the third decoding scheme, the decoding end obtains the high-order gain of the previous frame of the current frame and performs gain adjustment on the high-order part of the initial HOA signal according to that gain. The decoding end then obtains the reconstructed HOA signal of the current frame based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
  • using the high-order gain of the previous frame to perform gain adjustment on the high-order part of the initial HOA signal of the current frame makes the gain-adjusted high-order part similar to the high-order part of the previous frame; for example, the gain adjustment makes the energies of the high-order parts of the HOA signals of two adjacent frames similar. In this way, when the decoding end subsequently renders and plays each audio frame, the auditory quality of the switching frame and the auditory quality of the frame adjacent to the switching frame can transition smoothly.
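One simple way to realize the energy-matching adjustment just described is to rescale the current frame's high-order part toward the previous frame's gain. The exact adjustment rule is an assumption: the patent only requires that adjacent frames' high-order energies become similar, and the RMS-based formulation below is one illustrative choice.

```python
import numpy as np

def adjust_high_order(high_order, prev_gain, eps=1e-12):
    """Rescale the current frame's high-order part so its RMS level matches
    the previous frame's high-order gain, smoothing the audible transition
    around a switching frame. `prev_gain` is assumed to be an RMS-like level."""
    cur_gain = np.sqrt(np.mean(high_order ** 2)) + eps  # current RMS level
    return high_order * (prev_gain / cur_gain)
```

In a real decoder this would typically be applied per frequency band rather than on the whole frame, but the principle, matching adjacent frames' high-order energies, is the same.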
  • the decoding end may also perform gain adjustment on the high-order part of the HOA signals of these audio frames in other manners; the embodiment of the present application does not limit the specific implementation manner of performing gain adjustment on the high-order part of the HOA signals of these audio frames.
  • the decoding end may also perform gain adjustment on other parts of the HOA signal of these audio frames. That is, the embodiment of the present application does not limit which channel signals of the HOA signal are to be adjusted for gain.
  • the decoder can adjust the gain of any one or more channels in the HOA signal, and the one or more channels can include part or all of the high-order channels, or the remaining channels except the specified channel Some or all, or other channels.
  • Fig. 15 is a flow chart of another decoding method provided by an embodiment of the present application. Referring to Fig. 15, take the case in which the encoding end encodes the indication information of the initial encoding scheme into the code stream as an example, and assume that no switching flag is encoded in the code stream. In the decoding process, the decoding end first parses the indication information of the initial decoding scheme of the current frame from the code stream, and then judges whether the initial decoding scheme of the current frame is the same as that of the previous frame.
  • if the initial decoding schemes are the same, the decoding end uses the initial decoding scheme of the current frame to decode the code stream to obtain the reconstructed HOA signal of the current frame. If the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame, the current frame is a switching frame, and the decoding end uses the switching frame decoding scheme to decode the code stream to obtain the reconstructed HOA signal of the current frame.
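The decision rule of Fig. 15 (no switching flag in the code stream) can be sketched in a few lines. The scheme names are illustrative placeholders; the logic itself, same initial schemes keep the initial scheme, different initial schemes trigger the switching frame scheme, follows the text above.

```python
def choose_decoding_scheme(cur_initial: str, prev_initial: str) -> str:
    """Decide the current frame's decoding scheme from the initial decoding
    schemes of the current and previous frames, when no switching flag is
    present in the code stream."""
    if cur_initial == prev_initial:
        return cur_initial          # same initial scheme: use it directly
    return "switching_frame"        # schemes differ: current frame is a switching frame
```

The decoder would buffer the previous frame's initial-scheme indication (as noted earlier in the text) and call this per frame.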
  • the HOA signal of the audio frame is encoded and decoded by combining two schemes (namely, a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding); that is, an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • Figure 16 is a schematic structural diagram of an encoding device 1600 provided by an embodiment of the present application.
  • the encoding device 1600 can be implemented by software, hardware, or a combination of the two to become part or all of the encoding end device.
  • the encoding end device can be any encoding end device in the foregoing embodiments.
  • the apparatus 1600 includes: a first determination module 1601 and a first encoding module 1602 .
  • the first determining module 1601 is configured to determine the coding scheme of the current frame according to the high-order ambisonics HOA signal of the current frame, where the coding scheme of the current frame is one of the first coding scheme, the second coding scheme, and the third coding scheme;
  • the first coding scheme is an HOA coding scheme based on directional audio coding
  • the second coding scheme is an HOA coding scheme based on virtual speaker selection
  • the third coding scheme is a hybrid coding scheme
  • the first encoding module 1602 is configured to encode the signal of the specified channel in the HOA signal into the code stream if the encoding scheme of the current frame is the third encoding scheme, and the specified channel is a part of all channels of the HOA signal.
  • the signal of the designated channel includes a first-order ambisonics FOA signal
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
  • the first encoding module 1602 includes:
  • the first determination submodule is used to determine the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal;
  • the encoding sub-module is used to encode the virtual loudspeaker signal and the residual signal into a code stream.
  • the first determination submodule is used for:
  • the three residual signals are determined based on the W signal, the X signal, the Y signal and the Z signal, or the X signal, the Y signal and the Z signal are determined as the three residual signals.
  • the encoding submodule is used to:
  • the three obtained stereo signals are respectively encoded into the code stream through a stereo encoder.
  • the encoding submodule is used to:
  • the first preset monophonic signal is an all-zero signal or an all-one signal.
  • the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point values are all zero.
  • the all-one signal includes a signal whose sampling point values are all one or a signal whose frequency point values are all one; the second preset mono signal is an all-zero signal or an all-one signal; the first preset mono signal and the second preset mono signal are the same or different.
  • the encoding submodule is used to:
  • the one virtual speaker signal and the three residual signals are respectively encoded into the code stream through a mono encoder.
  • the device 1600 also includes:
  • the second encoding module is used to encode the HOA signal into the code stream according to the first encoding scheme if the encoding scheme of the current frame is the first encoding scheme;
  • the third encoding module is configured to encode the HOA signal into the code stream according to the second encoding scheme if the encoding scheme of the current frame is the second encoding scheme.
  • the first determining module 1601 includes:
  • the second determining submodule is used to determine the initial encoding scheme of the current frame according to the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
  • the fourth determining submodule is used to determine that the encoding scheme of the current frame is the third encoding scheme if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the previous frame of the current frame is the second encoding scheme, or the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the previous frame of the current frame is the first encoding scheme.
  • the device 1600 also includes:
  • the fourth encoding module is configured to encode the indication information of the initial encoding scheme of the current frame into the code stream.
  • the device 1600 also includes:
  • the second determination module is used to determine the value of the switching flag of the current frame.
  • when the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is the first value; when the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is the second value;
  • the fifth encoding module is used to encode the value of the switching flag into the code stream.
  • the sixth encoding module is configured to encode the indication information of the encoding scheme of the current frame into the code stream.
  • the specified channel is consistent with the preset transmission channel in the first encoding scheme.
  • two schemes, i.e., the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • Fig. 17 is a schematic structural diagram of a decoding device 1700 provided by the embodiment of the present application.
  • the decoding device 1700 can be implemented by software, hardware or a combination of the two to become part or all of the decoding end device.
  • the decoding end device can be any decoding end device in the foregoing embodiments.
  • the apparatus 1700 includes: a first obtaining module 1701 , a first determining module 1702 , a second determining module 1703 , a third determining module 1704 and a second obtaining module 1705 .
  • the first obtaining module 1701 is used to obtain the decoding scheme of the current frame based on the code stream, where the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme, and the third decoding scheme; the first decoding scheme is a higher-order ambisonics HOA decoding scheme based on directional audio coding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
  • the first determination module 1702 is used to determine the signal of the specified channel in the HOA signal of the current frame based on the code stream if the decoding scheme of the current frame is the third decoding scheme, and the specified channel is a part of all channels of the HOA signal;
  • the second determination module 1703 is configured to determine the gain of one or more remaining channels in the HOA signal except for the specified channel based on the signal of the specified channel;
  • the second obtaining module 1705 is configured to obtain the reconstructed HOA signal of the current frame based on the signal of the designated channel and the signals of the one or more remaining channels.
  • a first determining submodule configured to determine a virtual speaker signal and a residual signal based on a code stream
  • the second determining submodule is configured to determine the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the first determination submodule is used for:
  • one virtual speaker signal and three residual signals are determined.
  • the first determination submodule is used for:
  • the code stream is decoded by a monophonic decoder to obtain one virtual speaker signal and three residual signals.
  • the X signal, the Y signal and the Z signal are determined based on the residual signal and the W signal, or the X signal, the Y signal and the Z signal are determined based on the residual signal.
  • the device 1700 also includes:
  • the first decoding module is used to obtain the reconstructed HOA signal of the current frame according to the code stream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme;
  • the second decoding module is configured to obtain the reconstructed HOA signal of the current frame according to the code stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.
  • the second decoding module includes:
  • the first obtaining submodule is used to obtain the initial HOA signal according to the code stream according to the second decoding scheme
  • the gain adjustment submodule is used to adjust the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame if the decoding scheme of the previous frame of the current frame is the third decoding scheme;
  • the second obtaining sub-module is used to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
  • the first obtaining module 1701 includes:
  • the second parsing submodule is used to parse the indication information of the decoding scheme of the current frame from the code stream if the value of the switching flag is the first value, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or second decoding scheme;
  • the third determining submodule is configured to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value.
  • the first obtaining module 1701 includes:
  • the third parsing sub-module is used to parse out the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the first obtaining module 1701 includes:
  • the fourth parsing submodule is used to parse out the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme;
  • the fourth determining submodule is used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame;
  • the fifth determining submodule is used to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme.
  • two schemes, i.e., the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • when the decoding device provided in the foregoing embodiments decodes audio frames, the division of the functional modules described above is merely used as an example for illustration. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the decoding device and the decoding method embodiments provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • Fig. 18 is a schematic block diagram of a codec device 1800 used in an embodiment of the present application.
  • the codec apparatus 1800 may include a processor 1801 , a memory 1802 and a bus system 1803 .
  • the processor 1801 and the memory 1802 are connected through the bus system 1803; the memory 1802 is used to store instructions, and the processor 1801 is used to execute the instructions stored in the memory 1802 to perform the various encoding or decoding methods described in the embodiments of this application. To avoid repetition, no detailed description is given here.
  • the processor 1801 can be a central processing unit (CPU), and the processor 1801 can also be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 1802 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as memory 1802 .
  • Memory 1802 may include code and data 18021 accessed by processor 1801 using bus 1803 .
  • the memory 1802 may further include an operating system 18023 and an application program 18022, where the application program 18022 includes at least one program that allows the processor 1801 to execute the encoding or decoding method described in the embodiment of this application.
  • the application program 18022 may include applications 1 to N, which further include an encoding or decoding application (codec application for short) that executes the encoding or decoding method described in the embodiment of this application.
  • the bus system 1803 may include not only a data bus, but also a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 1803 in the figure.
  • the codec apparatus 1800 may also include one or more output devices, such as a display 1804 .
  • The display 1804 may be a touch-sensitive display that combines a display with a haptic unit operable to sense touch input.
  • the display 1804 may be connected to the processor 1801 via the bus 1803 .
  • codec device 1800 may implement the encoding method in the embodiment of the present application, and may also implement the decoding method in the embodiment of the present application.
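The structure just described — a processor executing instructions held in memory to run either the encoding or the decoding method — can be illustrated with a minimal sketch. It is purely hypothetical: the class name, the instruction table, and the stub codec are illustrative inventions for this note, not the apparatus 1800 or the patented methods themselves.

```python
class CodecDevice:
    """Hypothetical stand-in for codec apparatus 1800: a processor
    executes instructions stored in memory to encode or decode."""

    def __init__(self):
        # Plays the role of memory 1802 holding the stored instructions
        # (here, plain Python callables stand in for machine code).
        self.instructions = {"encode": self._encode, "decode": self._decode}

    def _encode(self, samples):
        # Stub encoder: a real device would run an actual encoding method.
        return b"bitstream:" + bytes(samples)

    def _decode(self, bitstream):
        # Stub decoder: inverse of the stub encoder above.
        return list(bitstream[len(b"bitstream:"):])

    def run(self, operation, data):
        # Plays the role of processor 1801 executing a stored instruction.
        return self.instructions[operation](data)
```

With these stubs, `run("encode", ...)` followed by `run("decode", ...)` round-trips the input samples, mirroring how one device can implement both the encoding and the decoding method.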
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., based on a communication protocol).
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • digital signal processors (DSPs)
  • application specific integrated circuits (ASICs)
  • field programmable logic arrays (FPGAs)
  • The term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
  • The functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.
  • The techniques of the embodiments of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in the embodiments of the present application to emphasize the functional aspects of the apparatus for performing the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units (including one or more processors as described above).
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, integrating one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • The signals, information (including but not limited to user equipment information, user personal information, etc.), and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)

Abstract

Disclosed are an encoding and decoding method and apparatus, as well as a device, a storage medium, and a computer program product, belonging to the technical field of audio processing. In the encoding and decoding method, an encoding and decoding scheme selected on the basis of a virtual speaker is combined with an encoding and decoding scheme based on directional audio coding, so as to encode and decode the HOA signal of an audio frame; that is, an appropriate encoding and decoding scheme is selected for each audio frame, so that the compression ratio of the audio signal can be improved. Moreover, to achieve a smooth transition in auditory quality when switching between different encoding and decoding schemes, for some audio frames, instead of directly using either of the two encoding and decoding schemes, a further encoding and decoding scheme is used: signals of specified channels in the HOA signals of those audio frames are encoded into a bitstream. In other words, a compromise scheme is used for encoding and decoding, which makes it possible to achieve a smooth transition in auditory quality after the HOA signals recovered by decoding are rendered and played back.
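The per-frame scheme selection and transition handling summarized in the abstract can be sketched as follows. This is an illustrative simplification, not the patented method: the scheme names, the `choose_scheme` rule, and the per-frame preference flag are hypothetical stand-ins, and the patent's actual selection criteria are not reproduced here.

```python
# Hypothetical labels for the three schemes described in the abstract.
VIRTUAL_SPEAKER = "virtual_speaker"  # virtual-speaker-based coding scheme
DIRAC = "dirac"                      # directional-audio-coding-based scheme
TRANSITION = "transition"            # compromise: encode only specified channels

def choose_scheme(prev_scheme, frame_suits_virtual_speaker):
    """Return the coding scheme for the current frame.

    frame_suits_virtual_speaker stands in for a (hypothetical) per-frame
    analysis deciding which scheme suits the frame's HOA signal.
    """
    target = VIRTUAL_SPEAKER if frame_suits_virtual_speaker else DIRAC
    if prev_scheme in (None, target, TRANSITION):
        return target
    # The previous frame used the other scheme: code this frame with the
    # compromise scheme instead of jumping straight to the target scheme,
    # so the rendered HOA signal transitions smoothly.
    return TRANSITION

def schemes_for(frame_preferences):
    """Apply choose_scheme over a sequence of frames."""
    schemes, prev = [], None
    for pref in frame_preferences:
        prev = choose_scheme(prev, pref)
        schemes.append(prev)
    return schemes
```

For instance, a stream whose frames suit the virtual-speaker scheme twice and then the DirAC-based scheme twice would be coded as virtual_speaker, virtual_speaker, transition, dirac: the transition frame sits between the two schemes rather than switching abruptly.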
PCT/CN2022/120495 2021-09-29 2022-09-22 Encoding and decoding method and apparatus, device, storage medium, and computer program product WO2023051368A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111155384.0A CN115881140A (zh) 2021-09-29 2021-09-29 Encoding and decoding method, apparatus, device, storage medium, and computer program product
CN202111155384.0 2021-09-29

Publications (1)

Publication Number Publication Date
WO2023051368A1 true WO2023051368A1 (fr) 2023-04-06

Family

ID=85756476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120495 WO2023051368A1 (fr) Encoding and decoding method and apparatus, device, storage medium, and computer program product

Country Status (2)

Country Link
CN (1) CN115881140A (fr)
WO (1) WO2023051368A1 (fr)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060173675A1 (en) * 2003-03-11 2006-08-03 Juha Ojanpera Switching between coding schemes
WO2011034376A2 (fr) * 2009-09-17 2011-03-24 Lg Electronics Inc. Procédé et appareil destinés au traitement d'un signal audio
CN102341851A (zh) * 2009-03-06 2012-02-01 株式会社Ntt都科摩 声音信号编码方法、声音信号解码方法、编码装置、解码装置、声音信号处理系统、声音信号编码程序以及声音信号解码程序
US20120320978A1 (en) * 2003-05-12 2012-12-20 Google Inc. Coder optimization using independent bitstream partitions and mixed mode entropy coding
US20150098572A1 (en) * 2012-05-14 2015-04-09 Thomson Licensing Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US20170164131A1 (en) * 2014-07-02 2017-06-08 Dolby International Ab Method and apparatus for decoding a compressed hoa representation, and method and apparatus for encoding a compressed hoa representation
US20170365264A1 (en) * 2015-03-09 2017-12-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
CN109215668A (zh) * 2017-06-30 2019-01-15 华为技术有限公司 一种声道间相位差参数的编码方法及装置
CN112074902A (zh) * 2018-02-01 2020-12-11 弗劳恩霍夫应用研究促进协会 使用混合编码器/解码器空间分析的音频场景编码器、音频场景解码器及相关方法


Also Published As

Publication number Publication date
TW202333139A (zh) 2023-08-16
CN115881140A (zh) 2023-03-31

Similar Documents

Publication Publication Date Title
US20150280676A1 (en) Metadata for ducking control
US20140226842A1 (en) Spatial audio processing apparatus
JPWO2008016097A1 (ja) Stereo speech encoding device, stereo speech decoding device, and methods thereof
CN107277691B (zh) Cloud-based multi-channel audio playing method and system, and audio gateway device
US20230298600A1 (en) Audio encoding and decoding method and apparatus
US20230137053A1 (en) Audio Coding Method and Apparatus
CN114067810A (zh) Audio signal rendering method and apparatus
GB2592896A (en) Spatial audio parameter encoding and associated decoding
US11031021B2 (en) Inter-channel phase difference parameter encoding method and apparatus
CN111149157A (zh) 使用经扩展参数对高阶立体混响系数的空间关系译码
WO2021213128A1 (fr) Audio signal encoding method and apparatus
US20230145725A1 (en) Multi-channel audio signal encoding and decoding method and apparatus
US20230105508A1 (en) Audio Coding Method and Apparatus
WO2023051368A1 (fr) Encoding and decoding method and apparatus, device, storage medium, and computer program product
TWI847276B (zh) Encoding and decoding method, apparatus, device, storage medium, and computer program product
WO2023051367A1 (fr) Decoding method and apparatus, device, storage medium, and computer program product
AU2021388397A1 (en) Audio encoding/decoding method and device
WO2023051370A1 (fr) Encoding and decoding apparatus and methods, device, storage medium, and computer program
CN115497485A (zh) Three-dimensional audio signal encoding method and apparatus, encoder, and system
WO2024139865A1 (fr) Virtual speaker determination method and related apparatus
WO2022242534A1 (fr) Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program
WO2022258036A1 (fr) Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program
CN118283485A (zh) Virtual speaker determination method and related apparatus
JP2024518846A (ja) Three-dimensional audio signal encoding method and apparatus, and encoder
WO2021255327A1 (fr) Network jitter management for multiple audio streams

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874755

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE