WO2022012628A1 - Multi-channel audio signal encoding and decoding method and apparatus - Google Patents

Multi-channel audio signal encoding and decoding method and apparatus

Info

Publication number
WO2022012628A1
Authority
WO
WIPO (PCT)
Prior art keywords
energy
amplitude
channel
channels
equalization
Prior art date
Application number
PCT/CN2021/106514
Other languages
English (en)
French (fr)
Inventor
王智
丁建策
王宾
王喆
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to KR1020237005513A (KR20230038777A)
Priority to EP21843200.3A (EP4174854A4)
Publication of WO2022012628A1
Priority to US18/154,633 (US20230145725A1)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/167: Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes

Definitions

  • the present application relates to audio coding and decoding technologies, and in particular, to a method and apparatus for coding and decoding multi-channel audio signals.
  • Audio coding is one of the key technologies of multimedia technology. Audio coding compresses the amount of data by removing redundant information in the original audio signal to facilitate storage or transmission.
  • Multi-channel audio coding refers to the coding of signals with more than two channels; common configurations include 5.1, 7.1, 7.1.4, and 22.2 channels.
  • a serial bit stream is formed so that it can be transmitted over a channel or stored in digital media.
  • the present application provides a method and device for encoding and decoding a multi-channel audio signal, which are beneficial to improving the quality of encoding and decoding audio signals.
  • an embodiment of the present application provides a multi-channel audio signal encoding method, the method may include: acquiring audio signals of P channels of a current frame of the multi-channel audio signal, where P is a positive integer greater than 1,
  • the P channels include K channel pairs, each channel pair includes two channels, K is a positive integer, and P is greater than or equal to K*2.
  • the energy/amplitude equalized side information of the K channel pairs is generated.
  • the side information of the energy/amplitude equalization of the K channel pairs and the audio signals of the P channels are encoded to obtain an encoded code stream.
  • the encoded code stream carries the energy/amplitude equalized side information of the K channel pairs, but does not carry energy/amplitude equalized side information for the unpaired channels. This reduces the number of bits of energy/amplitude equalization side information in the encoded code stream and the number of bits of multi-channel side information overall, and the saved bits can be allocated to other functional modules of the encoder, improving the quality of the audio signal reconstructed at the decoding end and thus the encoding quality.
  • the saved bits can be used for the encoding of multi-channel audio signals, so as to reduce the compression rate of the data part and improve the quality of the reconstructed audio signals at the decoding end.
  • the encoded code stream includes a control information part and a data part
  • the control information part may include the above-mentioned energy/amplitude equalization side information
  • the data part may include the above-mentioned multi-channel audio signal; that is, the encoded code stream includes the multi-channel audio signal and the control information generated in the process of encoding the multi-channel audio signal.
  • the number of bits occupied by the control information part can be reduced to increase the number of bits occupied by the data part, thereby improving the quality of the reconstructed audio signal at the decoding end.
  • saved bits may also be used for other control information transmission, and the embodiments of the present application are not limited by the foregoing examples.
  • the K channel pairs include the current channel pair
  • the side information of the energy/amplitude equalization of the current channel pair includes: the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling identifier of the current channel pair
  • the fixed-point energy/amplitude scaling ratio is the fixed-point value of the energy/amplitude scaling factor
  • the energy/amplitude scaling factor is obtained from the respective energy/amplitude of the audio signals of the two channels of the current channel pair before equalization and their respective energy/amplitude after equalization
  • the energy/amplitude scaling identifier indicates whether, for each of the two channels of the current channel pair, the energy/amplitude after equalization is enlarged or reduced relative to the energy/amplitude before equalization.
  • the decoding end can perform energy de-equalization to obtain a decoded signal.
  • the bits occupied by the energy/amplitude equalization side information can be saved, thereby improving transmission efficiency.
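The relationship described above between a channel's energies before and after equalization, the scaling factor, its fixed-point value, and the enlarged/reduced identifier can be sketched as follows. The 8-bit fixed-point precision and the convention of folding the factor into [0, 1) before quantization are illustrative assumptions, not the format defined by the application:

```python
def scaling_side_info(energy_before: float, energy_after: float,
                      fixed_point_bits: int = 8):
    """Derive side information for one channel of a channel pair.

    The scaling factor relates the channel's energy before equalization
    to its energy after equalization; the identifier records whether the
    energy was enlarged (1) or reduced (0). The 8-bit precision is an
    assumption for illustration only.
    """
    scale = energy_after / energy_before          # energy scaling factor
    enlarged = 1 if scale >= 1.0 else 0           # energy scaling identifier
    # Fold the factor into [0, 1] so one fixed-point code covers both cases.
    ratio = 1.0 / scale if enlarged else scale
    fixed_point = int(round(ratio * ((1 << fixed_point_bits) - 1)))
    return fixed_point, enlarged
```

With this convention, a channel whose energy was halved and one whose energy was doubled produce the same fixed-point code and differ only in the identifier bit, which is one way such a flag can keep the fixed-point field small.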
  • the K channel pairs include the current channel pair, and generating the energy/amplitude equalization side information of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels may include: determining, according to the energy/amplitude of the audio signals of the two channels of the current channel pair before equalization, the respective energy/amplitude of those audio signals after equalization.
  • the side information of the energy/amplitude equalization of the current channel pair is then generated according to the energy/amplitude of the audio signals of the two channels of the current channel pair before equalization and their energy/amplitude after equalization.
  • the current channel pair includes a first channel and a second channel
  • the side information of the energy/amplitude equalization of the current channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel , the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling identifier of the first channel, and the energy/amplitude scaling identifier of the second channel.
  • the decoding end can be made to perform energy de-equalization so as to obtain the decoded signal, and the bits occupied by the side information of the energy/amplitude equalization of the current channel pair can be further reduced.
  • generating the side information of the energy/amplitude equalization of the current channel pair may include: determining the energy/amplitude scaling factor and the energy/amplitude scaling identifier of the q-th channel according to the energy/amplitude of the audio signal of the q-th channel of the current channel pair before equalization and its energy/amplitude after equalization.
  • the fixed-point energy/amplitude scaling ratio of the q-th channel is then determined according to the energy/amplitude scaling factor of the q-th channel, where q is 1 or 2.
  • determining the equalized energy/amplitude may include: determining the average energy/amplitude of the audio signals of the current channel pair according to the energy/amplitude of the audio signals of the two channels before equalization, and then determining the equalized energy/amplitude of the audio signals of the two channels according to that average.
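One possible reading of this averaging step is sketched below; the use of summed squared samples as the energy, the arithmetic mean as the target, and a square-root per-sample gain are all assumptions for illustration, and the application may define the average and the gain differently:

```python
def equalize_pair(ch1, ch2):
    """Equalize the energies of the two channels of a channel pair
    toward their common average energy (illustrative sketch)."""
    e1 = sum(x * x for x in ch1)          # energy of channel 1 before equalization
    e2 = sum(x * x for x in ch2)          # energy of channel 2 before equalization
    e_avg = (e1 + e2) / 2.0               # average energy of the pair
    g1 = (e_avg / e1) ** 0.5              # per-sample gain for channel 1
    g2 = (e_avg / e2) ** 0.5              # per-sample gain for channel 2
    out1 = [x * g1 for x in ch1]
    out2 = [x * g2 for x in ch2]
    return out1, out2, g1, g2
```

After this step both channels carry the same energy, which is what allows the subsequent stereo (channel-pair) coding to exploit inter-channel correlation more effectively.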
  • encoding the side information of the energy/amplitude equalization of the K channel pairs and the audio signals of the P channels to obtain an encoded code stream may include: encoding the side information of the energy/amplitude equalization of the K channel pairs, the number K, the channel pair indices corresponding to the K channel pairs, and the audio signals of the P channels to obtain the encoded code stream.
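As a rough sketch of what the multiplexed control-information part might contain, the following function flattens K, the channel pair indices, and the per-pair side information into an ordered list of (name, value, bit-width) fields. All field names and widths here are hypothetical and are not the bitstream format defined by the application:

```python
def pack_side_info(k, pair_indices, side_infos):
    """Flatten channel-pair side information into ordered bit fields.

    Stands in for the control-information part of the code stream.
    side_infos holds, per pair, the fixed-point scaling ratio and the
    scaling flag of each of its two channels. Widths are illustrative.
    """
    fields = [("num_pairs", k, 4)]                     # e.g. 4 bits for K
    for idx, (fp1, flag1, fp2, flag2) in zip(pair_indices, side_infos):
        fields.append(("pair_index", idx, 6))          # which channels are paired
        fields.append(("fp_scale_ch1", fp1, 8))        # fixed-point scaling ratio
        fields.append(("scale_flag_ch1", flag1, 1))    # enlarged/reduced identifier
        fields.append(("fp_scale_ch2", fp2, 8))
        fields.append(("scale_flag_ch2", flag2, 1))
    return fields
```

Because only the K paired channels contribute fields, unpaired channels add nothing here, which mirrors the bit saving the summary describes.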
  • an embodiment of the present application provides a method for decoding a multi-channel audio signal, and the method may include: acquiring a code stream to be decoded; demultiplexing the code stream to be decoded to obtain the current frame of the multi-channel audio signal to be decoded, the number K of channel pairs included in the current frame, the channel pair indices corresponding to the K channel pairs, and the energy/amplitude equalization side information of the K channel pairs; and decoding the current frame according to the channel pair indices corresponding to the K channel pairs and the side information of the energy/amplitude equalization of the K channel pairs to obtain the decoded signal of the current frame, where K is a positive integer and each channel pair includes two channels.
  • the K channel pairs include the current channel pair
  • the side information of the energy/amplitude equalization of the current channel pair includes: the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling identifier of the current channel pair, where the fixed-point energy/amplitude scaling ratio is the fixed-point value of the energy/amplitude scaling factor, the energy/amplitude scaling factor is obtained from the respective energy/amplitude of the audio signals of the two channels of the current channel pair before equalization and their respective energy/amplitude after equalization, and the energy/amplitude scaling identifier indicates whether the energy/amplitude of each of the two channels after equalization is enlarged or reduced relative to its energy/amplitude before equalization.
  • the K channel pairs include the current channel pair, and decoding the current frame of the multi-channel audio signal to be decoded according to the channel pair indices corresponding to the K channel pairs and the side information of the energy/amplitude equalization of the K channel pairs to obtain the decoded signal of the current frame may include: performing stereo decoding processing on the current frame according to the channel pair index corresponding to the current channel pair to obtain the audio signals of the two channels of the current channel pair.
  • performing, according to the side information of the energy/amplitude equalization of the current channel pair, energy/amplitude de-equalization processing on the audio signals of the two channels of the current channel pair to obtain the decoded signals of the two channels of the current channel pair.
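A minimal sketch of this de-equalization step, mirroring the encoder-side convention assumed earlier (an 8-bit fixed-point ratio plus an enlarged/reduced flag, both of which are illustrative assumptions rather than the application's defined format):

```python
def de_equalize(samples, fixed_point, enlarged, fixed_point_bits=8):
    """Undo energy equalization on one decoded channel using its
    transmitted side information (illustrative convention)."""
    # Reconstruct the ratio in [0, 1] from the fixed-point code.
    ratio = fixed_point / ((1 << fixed_point_bits) - 1)
    # Recover the energy scaling factor using the enlarged/reduced flag.
    scale = 1.0 / ratio if enlarged else ratio
    # Equalization multiplied the channel's energy by `scale`; undo it
    # with the reciprocal square root applied per sample.
    gain = (1.0 / scale) ** 0.5
    return [x * gain for x in samples]
```

Transmitting only the quantized ratio and the one-bit flag is what lets the decoder restore each channel's original energy without carrying the raw pre-equalization energies in the code stream.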
  • the current channel pair includes a first channel and a second channel
  • the side information of the energy/amplitude equalization of the current channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel , the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling identifier of the first channel, and the energy/amplitude scaling identifier of the second channel.
  • an embodiment of the present application provides an audio signal encoding apparatus.
  • the audio signal encoding apparatus may be an audio encoder, or a chip or system-on-a-chip of an audio encoding device, and may also be a functional module in an audio encoder for implementing the method of the first aspect or any possible design of the first aspect.
  • the audio signal encoding apparatus can implement the functions performed in the first aspect or each possible design of the first aspect, and the functions can be implemented by executing corresponding software in hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the audio signal encoding apparatus may include: an acquisition module, an equalization side information generation module, and an encoding module.
  • an embodiment of the present application provides an audio signal decoding apparatus
  • the audio signal decoding apparatus may be an audio decoder, or a chip or system-on-a-chip of an audio decoding device, and may also be a functional module in an audio decoder for implementing the method of the second aspect or any possible design of the second aspect.
  • the audio signal decoding apparatus can implement the functions performed in the second aspect or each possible design of the second aspect, and the functions can be implemented by executing corresponding software in hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the audio signal decoding apparatus may include: an acquisition module, a demultiplexing module, and a decoding module.
  • an embodiment of the present application provides an audio signal encoding apparatus, characterized by comprising: a non-volatile memory and a processor coupled to each other, wherein the processor invokes program code stored in the memory to execute the method of the above-mentioned first aspect or any possible design of the first aspect.
  • an embodiment of the present application provides an audio signal decoding apparatus, characterized by comprising: a non-volatile memory and a processor coupled to each other, wherein the processor invokes program code stored in the memory to execute the method of the above-mentioned second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides an audio signal encoding device, characterized by comprising: an encoder, where the encoder is configured to execute the above-mentioned first aspect or any possible design method of the above-mentioned first aspect.
  • an embodiment of the present application provides an audio signal decoding device, characterized by comprising: a decoder, where the decoder is configured to execute the second aspect or any possible design method of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, which is characterized by comprising an encoded code stream obtained according to the above-mentioned first aspect or any possible design method of the above-mentioned first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, including a computer program, which, when executed on a computer, causes the computer to execute the method of any one of the above first aspects, or the method of any one of the above second aspects.
  • the present application provides a computer program product, the computer program product comprising a computer program which, when executed by a computer, is used to execute the method of any one of the above first aspects, or the method of any one of the above second aspects.
  • the present application provides a chip, including a processor and a memory, where the memory is used to store a computer program and the processor is used to call and run the computer program stored in the memory, to execute the method of the above-mentioned first aspect or any possible design of the first aspect.
  • the present application provides an encoding and decoding device, the encoding and decoding device including an encoder and a decoder, where the encoder is configured to perform the method of the above-mentioned first aspect or any possible design of the first aspect, and the decoder is configured to perform the method of the above-mentioned second aspect or any possible design of the second aspect.
  • in the method and device for encoding and decoding a multi-channel audio signal, the audio signals of the P channels of the current frame of the multi-channel audio signal and their respective energy/amplitude are acquired, where the P channels include K channel pairs; the energy/amplitude equalization side information of the K channel pairs is generated according to the respective energy/amplitude of the audio signals of the P channels; and the audio signals of the P channels are encoded according to the energy/amplitude equalization side information of the K channel pairs to obtain an encoded code stream.
  • because equalization side information is generated only for channel pairs, the encoded code stream carries the energy/amplitude equalization side information of the K channel pairs but not that of unpaired channels, which reduces the number of bits of energy/amplitude equalization side information and of multi-channel side information in the encoded code stream; the saved bits can be allocated to other functional modules of the encoder to improve the reconstruction of the audio signal at the decoding end and thus the encoding quality.
  • FIG. 1 is a schematic diagram of an example of an audio encoding and decoding system in an embodiment of the application
  • FIG. 2 is a flowchart of a method for encoding a multi-channel audio signal according to an embodiment of the present application
  • FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of the application.
  • FIG. 4 is a schematic diagram of a processing process of an encoding end according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a processing process of a multi-channel coding processing unit according to an embodiment of the present application
  • FIG. 6 is a schematic diagram of a writing process of multi-channel side information according to an embodiment of the present application.
  • FIG. 7 is a flowchart of a method for decoding a multi-channel audio signal according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a processing process of a decoding end according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a processing process of a multi-channel decoding processing unit according to an embodiment of the present application.
  • FIG. 10 is a flowchart of a multi-channel side information analysis according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an audio signal encoding apparatus 1100 according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an audio signal encoding device 1200 according to an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of an audio signal decoding apparatus 1300 according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an audio signal decoding apparatus 1400 according to an embodiment of the present application.
  • At least one (item) refers to one or more, and "a plurality” refers to two or more.
  • "And/or" describes the relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • The character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following items" or similar expressions refers to any combination of these items, including any combination of single items or plural items.
  • For example, "at least one of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b and c can be singular or plural, or some can be singular and others plural.
  • FIG. 1 exemplarily shows a schematic block diagram of an audio encoding and decoding system 10 to which the embodiments of the present application are applied.
  • audio encoding and decoding system 10 may include source device 12 and destination device 14, source device 12 producing encoded audio data, and thus source device 12 may be referred to as an audio encoding device.
  • Destination device 14 may decode encoded audio data produced by source device 12, and thus destination device 14 may be referred to as an audio decoding device.
  • Various implementations of source device 12, destination device 14, or both may include one or more processors and a memory coupled to the one or more processors.
  • Source device 12 and destination device 14 may include a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, so-called "smart" phones and other telephone handsets, TVs, speakers, digital media players, video game consoles, in-vehicle computers, wearable devices, virtual reality (VR) devices, servers providing VR services, augmented reality (AR) devices, servers providing AR services, wireless communication devices, or the like.
  • Although FIG. 1 depicts source device 12 and destination device 14 as separate devices, device embodiments may also include the functionality of both, i.e., source device 12 or corresponding functionality and destination device 14 or corresponding functionality.
  • In such embodiments, source device 12 or corresponding functionality and destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
  • Source device 12 and destination device 14 may be communicatively connected via link 13 through which destination device 14 may receive encoded audio data from source device 12 .
  • Link 13 may include one or more media or devices capable of moving encoded audio data from source device 12 to destination device 14 .
  • link 13 may include one or more communication media that enable source device 12 to transmit encoded audio data directly to destination device 14 in real-time.
  • source device 12 may modulate the encoded audio data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated audio data to destination device 14 .
  • the one or more communication media may include wireless and/or wired communication media, such as radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14 .
  • Source device 12 includes encoder 20 , and optionally, source device 12 may also include audio source 16 , pre-processor 18 , and communication interface 22 .
  • the encoder 20 , the audio source 16 , the preprocessor 18 , and the communication interface 22 may be hardware components in the source device 12 or software programs in the source device 12 . They are described as follows:
  • Audio source 16, which may include or be any type of sound capture device, for example for capturing real-world sound, and/or any type of audio generation device. Audio source 16 may be a microphone for capturing sound or a memory for storing audio data, and may also include any type of (internal or external) interface for storing previously captured or generated audio data and/or for acquiring or receiving audio data. When audio source 16 is a microphone, it may be, for example, a local or integrated microphone integrated in the source device; when audio source 16 is a memory, it may be, for example, a local or integrated memory integrated in the source device.
  • the interface may be, for example, an external interface that receives audio data from an external audio source, such as an external sound capture device, such as a microphone, an external memory, or an external audio generation device.
  • the interface may be any class of interface according to any proprietary or standardized interface protocol, eg wired or wireless interfaces, optical interfaces.
  • the audio data transmitted from the audio source 16 to the preprocessor 18 may also be referred to as original audio data 17 .
  • the preprocessor 18 is used for receiving the original audio data 17 and performing preprocessing on the original audio data 17 to obtain the preprocessed audio 19 or the preprocessed audio data 19 .
  • the preprocessing performed by the preprocessor 18 may include filtering, or denoising, or the like.
  • the encoder 20 (or called the audio encoder 20) is used to receive the pre-processed audio data 19 and to execute the embodiments of the encoding methods described later, so as to realize the application of the audio signal encoding method described in this application on the encoding side.
  • a communication interface 22 that can be used to receive encoded audio data 21 and to transmit the encoded audio data 21 via link 13 to destination device 14 or any other device (eg, memory) for storage or direct reconstruction , the other device can be any device for decoding or storage.
  • the communication interface 22 may, for example, be used to encapsulate the encoded audio data 21 into a suitable format, eg, data packets, for transmission over the link 13 .
  • the destination device 14 includes a decoder 30 , and optionally, the destination device 14 may also include a communication interface 28 , an audio post-processor 32 and a speaker device 34 . They are described as follows:
  • a communication interface 28 may be used to receive encoded audio data 21 from source device 12 or any other source, such as a storage device, such as an encoded audio data storage device.
  • the communication interface 28 may be used to transmit or receive encoded audio data 21 via the link 13 between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or via any kind of network.
  • Classes of networks are, for example, wired or wireless networks or any combination thereof, or any classes of private and public networks, or any combination thereof.
  • the communication interface 28 may, for example, be used to decapsulate data packets transmitted by the communication interface 22 to obtain encoded audio data 21 .
  • Both the communication interface 28 and the communication interface 22 may be configured as a one-way or two-way communication interface, and may be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or the data transfer, such as the transfer of encoded audio data.
  • Decoder 30 (or referred to as audio decoder 30 ) for receiving encoded audio data 21 and providing decoded audio data 31 or decoded audio 31 .
  • the decoder 30 may be configured to execute the embodiments of the decoding methods described later, so as to realize the application of the audio signal decoding method described in this application on the decoding side.
  • the post-processing performed by the audio post-processor 32 may include, for example, rendering, or any other processing, and may also be used to transmit the post-processed audio data 33 to the speaker device 34 .
  • a loudspeaker device 34 for receiving post-processed audio data 33 to play audio to eg a user or viewer.
  • the speaker device 34 may be or include any type of speaker for presenting the reconstructed sound.
  • Source device 12 and destination device 14 may include any of a variety of devices, including any class of handheld or stationary devices, for example, notebook or laptop computers, mobile phones, smartphones, tablet computers, video cameras, desktop computers, set-top boxes, televisions, cameras, in-vehicle equipment, stereos, digital media players, audio game consoles, audio streaming devices (such as content serving servers or content distribution servers), broadcast receiver equipment, broadcast transmitter equipment, smart glasses, smart watches, etc., and may use no operating system or any kind of operating system.
  • Both encoder 20 and decoder 30 may be implemented as any of a variety of suitable circuits, eg, one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof.
  • an apparatus may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors.
  • the audio encoding and decoding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may be applicable to audio coding settings (eg, audio encoding or audio decoding) that do not necessarily involve data communication between the encoding and decoding devices.
  • data may be retrieved from local storage, streamed over a network, and the like.
  • An audio encoding device may encode and store data to memory, and/or an audio decoding device may retrieve and decode data from memory.
  • encoding and decoding is performed by devices that do not communicate with each other but only encode data to and/or retrieve data from memory and decode data.
  • the above-mentioned encoder may be a multi-channel encoder, for example, a stereo encoder, a 5.1 channel encoder, or a 7.1 channel encoder, or the like.
  • the above audio data may also be referred to as an audio signal.
  • the audio signal in the embodiment of the present application refers to an input signal in an audio coding device, and the audio signal may include multiple frames.
  • the current frame may specifically refer to a certain frame in the audio signal.
  • in the embodiments of the present application, the encoding and decoding of the audio signal of the current frame is used as an example; the previous frame or the next frame of the current frame in the audio signal can be encoded and decoded correspondingly according to the encoding and decoding mode of the audio signal of the current frame, and the encoding and decoding processes of the previous frame or the next frame of the current frame will not be described one by one.
  • the audio signal in this embodiment of the present application may be a multi-channel audio signal, that is, including P channels. The embodiments of the present application are used to implement encoding and decoding of multi-channel audio signals.
  • the above-mentioned encoder can execute the multi-channel audio signal encoding method of the embodiments of the present application, so as to reduce the number of bits of multi-channel side information; the saved bits can be allocated to other functional modules of the encoder to improve the quality of the audio signal reconstructed at the decoding end and thereby improve the encoding quality. For details, refer to the specific explanations of the following embodiments.
  • FIG. 2 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of the present application.
  • the execution body of the embodiment of the present application may be the above encoder.
  • the method in this embodiment may include:
  • Step 201 Acquire the audio signals of the P channels of the current frame of the multi-channel audio signal and the respective energy/amplitude of the audio signals of the P channels, where the P channels include K channel pairs.
  • each channel pair includes two channels.
  • P is a positive integer greater than 1
  • K is a positive integer
  • P is greater than or equal to K*2.
  • K channel pairs can be obtained by filtering and grouping multi-channel signals in the current frame of the multi-channel audio signal.
  • the above-mentioned P channels include K channel pairs.
  • the audio signals of the P channels also include unpaired Q mono audio signals.
  • the 5.1 channels include a left (L) channel, a right (R) channel, a center (C) channel, a low frequency effects (LFE) channel, a left surround (LS) channel, and a right surround (RS) channel.
  • the channels involved in multi-channel processing are screened from 5.1 channels.
  • the channels participating in multi-channel processing include the L channel, R channel, C channel, LS channel, and RS channel. Pairing is performed on the channels participating in multi-channel processing.
  • the L channel and the R channel are paired to form the first channel pair.
  • the LS channel and the RS channel are paired to form a second channel pair.
  • the above-mentioned P channels include a first channel pair, a second channel pair, and an unpaired LFE channel and a C channel.
  • the way of grouping the channels participating in multi-channel processing may be to determine the K channel pairs through multiple iterations, that is, one channel pair is determined per iteration. For example, in the first iteration, calculate the inter-channel correlation value between any two of the P channels participating in multi-channel processing, and select the two channels with the highest inter-channel correlation value to form a channel pair. In the second iteration, the two channels with the highest inter-channel correlation value among the remaining channels (excluding the already paired channels among the P channels) are selected to form a channel pair. By analogy, K channel pairs are obtained.
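The iterative pairing described above can be sketched as follows. This is a minimal illustration assuming the plain Pearson correlation coefficient as the inter-channel correlation value and pairing until fewer than two channels remain; a real encoder may also apply screening and a correlation threshold, which are omitted here:

```python
import numpy as np

def greedy_pairing(channels):
    """Greedily pair channels by inter-channel correlation.

    `channels` maps a channel name to a 1-D np.ndarray of samples.
    Returns a list of (name_a, name_b) pairs; any leftover channel
    (the odd one out) stays unpaired.
    """
    remaining = list(channels)
    pairs = []
    while len(remaining) >= 2:
        best = None
        # Find the two remaining channels with the highest correlation.
        for i in range(len(remaining)):
            for j in range(i + 1, len(remaining)):
                a, b = remaining[i], remaining[j]
                corr = abs(np.corrcoef(channels[a], channels[b])[0, 1])
                if best is None or corr > best[0]:
                    best = (corr, a, b)
        _, a, b = best
        pairs.append((a, b))
        remaining.remove(a)
        remaining.remove(b)
    return pairs
```

With a 5.1-style input where L/R and LS/RS are strongly correlated, the first two iterations form those two pairs and the remaining channel is left unpaired, matching the example above.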
  • the embodiments of the present application may also adopt other grouping methods to determine the K channel pairs, and the embodiments of the present application are not limited by the foregoing exemplary description of grouping.
  • Step 202 Generate energy/amplitude equalized side information of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels.
  • energy/amplitude in the embodiments of the present application means energy or amplitude; in the actual processing of a frame, if energy is processed initially, then energy is processed in all subsequent processing, or, if amplitude is processed initially, then amplitude is processed in all subsequent processing.
  • energy-balanced side information for the K channel pairs is generated from the energy of the audio signals of the P channels. That is, the energy of the P channels is used for energy equalization, and the side information of the energy equalization is obtained.
  • the amplitudes of the audio signals of the P channels generate energy-balanced side information for the K channel pairs. That is, energy equalization is performed using the amplitudes of the P channels to obtain side information of energy equalization.
  • the amplitudes of the audio signals of the P channels are used to generate amplitude-balanced side information of the K channel pairs. That is, the amplitudes of the P channels are used for amplitude equalization to obtain the side information of the amplitude equalization.
  • the embodiment of the present invention performs stereo encoding on the channel pair.
  • the energy/amplitude of the audio signals of the two channels of the current channel pair may first be energy/amplitude equalized to obtain the equalized energy/amplitude of the two channels, and then the subsequent stereo encoding processing is performed based on the equalized energy/amplitude.
  • the energy/amplitude equalization may be based only on the audio signals of the two channels of the current channel pair, and not on the audio signals of channel pairs other than the current channel pair and/or of the corresponding unpaired mono channels.
  • alternatively, the energy/amplitude equalization may further be based on the audio signals of other channel pairs and/or of the corresponding mono channels, in addition to the audio signals of the two channels of the current channel pair.
  • the side information of the energy/amplitude equalization is used for the decoding end to perform energy/amplitude de-equalization to obtain a decoded signal.
  • the side information of the energy/amplitude equalization may include a fixed-point energy/amplitude scaling ratio and an energy/amplitude scaling identifier.
  • the fixed-point energy/amplitude scaling ratio is the fixed-point value of the energy/amplitude scaling factor.
  • the energy/amplitude scaling factor is obtained from the energy/amplitude before energy/amplitude equalization and the energy/amplitude after energy/amplitude equalization.
  • the energy/amplitude scaling identifier is used to identify whether the energy/amplitude after energy/amplitude equalization is enlarged or reduced relative to the energy/amplitude before energy/amplitude equalization.
  • the energy/amplitude scaling factor may take a value in the interval (0, 1).
  • the side information of the energy/amplitude equalization of the channel pair may include the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling identifier of the channel pair.
  • the fixed-point energy/amplitude scaling ratio of the channel pair includes the fixed-point energy/amplitude scaling ratio of the first channel and the fixed-point energy/amplitude scaling ratio of the second channel, and the energy/amplitude scaling identifier of the channel pair includes the energy/amplitude scaling identifier of the first channel and the energy/amplitude scaling identifier of the second channel.
  • the fixed-point energy/amplitude scaling ratio of the first channel is the fixed-point value of the energy/amplitude scaling factor of the first channel
  • the energy/amplitude scaling factor of the first channel is obtained from the energy/amplitude of the audio signal of the first channel before energy/amplitude equalization and the energy/amplitude of the audio signal of the first channel after energy/amplitude equalization.
  • the energy/amplitude scaling identifier of the first channel is obtained according to the energy/amplitude of the audio signal of the first channel before energy/amplitude equalization and the energy/amplitude of the audio signal of the first channel after equalization.
  • the energy/amplitude scaling factor of the first channel is the smaller of the energy/amplitude of the audio signal of the first channel before equalization and the energy/amplitude of the audio signal of the first channel after equalization, divided by the larger of the two.
  • if the energy/amplitude of the audio signal of the first channel before equalization is greater than the energy/amplitude of the audio signal of the first channel after equalization, then the energy/amplitude scaling factor of the first channel is the energy/amplitude of the audio signal of the first channel after equalization divided by the energy/amplitude of the audio signal of the first channel before equalization.
  • in this case, the energy/amplitude scaling identifier of the first channel is 1.
  • otherwise, the energy/amplitude scaling identifier of the first channel is 0.
  • the case in which the energy/amplitude scaling flag of a channel is 0 follows a similar implementation principle, and the embodiments of the present application are not limited by the foregoing examples.
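The relationship between the scaling factor and the scaling flag described above can be sketched as follows. This is a minimal illustration; `energy_before` and `energy_after` stand for a channel's energy/amplitude before and after equalization:

```python
def scaling_side_info(energy_before, energy_after):
    """Per-channel scaling side info: the factor is the smaller of the two
    energies divided by the larger (so it lies in (0, 1]), and the flag
    records whether the pre-equalization energy was the larger one."""
    if energy_before > energy_after:
        return energy_after / energy_before, 1  # flag 1: equalization shrank the energy
    return energy_before / energy_after, 0      # flag 0: equalization kept or grew it
```

Because the factor is always in (0, 1], the flag is what lets the decoding end know in which direction to apply it when de-equalizing.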
  • the energy/amplitude scaling factor in this embodiment of the present application may also be referred to as a floating-point energy/amplitude scaling factor.
  • the energy/amplitude equalization side information may include the fixed-point energy/amplitude scaling ratio.
  • the fixed-point energy/amplitude scaling ratio is the fixed-point value of the energy/amplitude scaling factor
  • the energy/amplitude scaling factor is the ratio of the energy/amplitude before energy/amplitude equalization to the energy/amplitude after energy/amplitude equalization. That is, the energy/amplitude scaling factor is the energy/amplitude before energy/amplitude equalization divided by the energy/amplitude after energy/amplitude equalization.
  • the decoding end may determine that the energy/amplitude after energy/amplitude equalization is amplified relative to the energy/amplitude before energy/amplitude equalization.
  • the decoding end may determine that the energy after energy/amplitude equalization is reduced relative to the energy/amplitude before energy/amplitude equalization.
  • the energy/amplitude scaling factor can also be the energy/amplitude after energy/amplitude equalization divided by the energy/amplitude before energy/amplitude equalization; the implementation principle is similar, and the embodiments of the present application are not limited by the foregoing examples. In this implementation manner, the energy/amplitude equalization side information may not include the energy/amplitude scaling identifier.
  • Step 203 Encode the audio signals of the P channels according to the energy/amplitude equalized side information of the K channel pairs to obtain an encoded code stream.
  • the energy/amplitude equalization side information of the K channel pairs and the audio signals of the P channels are encoded to obtain an encoded code stream. That is, the energy/amplitude equalized side information of the K channel pairs is written into the encoded code stream. In other words, the encoded code stream carries the energy/amplitude equalization side information of the K channel pairs but does not carry energy/amplitude equalization side information for the unpaired channels, so that the number of bits of energy/amplitude equalization side information in the encoded code stream can be reduced.
  • the encoded code stream also carries the number of channel pairs and the K channel pair indices of the current frame, where the number of channel pairs and the K channel pair indices are used by the decoding end to perform stereo decoding, energy/amplitude de-equalization, and other processing.
  • a channel pair index is used to indicate two channels included in a channel pair.
  • a possible implementation of step 203 is to encode the energy/amplitude equalized side information of the K channel pairs, the number of channel pairs, the K channel pair indices, and the audio signals of the P channels to obtain the encoded code stream.
  • the number of channel pairs can be K.
  • the K channel pair indices include respective channel pair indices corresponding to the K channel pairs.
  • the sequence of writing the above-mentioned number of channel pairs, the K channel pair indices, and the energy/amplitude equalized side information of the K channel pairs into the encoded code stream may be that the number of channel pairs is written first, so that when the decoding end decodes the received code stream, it first obtains the number of channel pairs; afterwards, the K channel pair indices and the energy/amplitude equalized side information of the K channel pairs are written.
  • the number of channel pairs may be 0, that is, there is no channel paired, then the number of channel pairs and the audio signals of the P channels are encoded to obtain an encoded code stream.
  • the decoding end decodes the received code stream, and first obtains that the number of channel pairs is 0, then it can directly decode the current frame of the multi-channel audio signal to be decoded, without further parsing to obtain the side information of energy/amplitude equalization.
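The parsing order described above (number of channel pairs first, then per-pair data only when the count is nonzero) can be sketched as follows. This is an illustration over already-unpacked fields; the actual field widths and the exact layout of the side information in the bitstream are codec-specific and assumed here:

```python
def parse_multichannel_side_info(stream):
    """Parse side info in the order described above: pair count first,
    then (only when the count is nonzero) the channel pair indices and
    the energy/amplitude equalization side info per pair.

    `stream` is an iterable of already-unpacked integer fields; real
    bitstream field widths are codec-specific and not modeled here.
    """
    it = iter(stream)
    num_pairs = next(it)
    if num_pairs == 0:
        # No paired channels: decode the channels directly, no further
        # energy/amplitude equalization side info to parse.
        return {"pairs": [], "side_info": []}
    pair_indices = [next(it) for _ in range(num_pairs)]
    side_info = []
    for _ in range(num_pairs):
        # Per pair: (scaleInt, flag) for each of the two channels.
        side_info.append(((next(it), next(it)), (next(it), next(it))))
    return {"pairs": pair_indices, "side_info": side_info}
```

Reading the pair count first is what allows the zero-pair shortcut: the decoder can skip all side-info parsing for frames with no paired channels.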
  • energy/amplitude equalization may also be performed on the coefficients in the current frame of the channel according to the fixed-point energy/amplitude scaling ratio of the channel and the energy/amplitude scaling flag.
  • in this embodiment, the audio signals of the P channels of the current frame of the multi-channel audio signal are acquired, where the P channels include K channel pairs; the energy/amplitude equalized side information of the K channel pairs is generated according to the energy/amplitude of the audio signals of the P channels; and the audio signals of the P channels are encoded according to the energy/amplitude equalized side information of the K channel pairs to obtain an encoded code stream.
  • By generating energy/amplitude equalized side information per channel pair, the encoded code stream carries the energy/amplitude equalized side information of the K channel pairs but does not carry energy/amplitude equalized side information for the unpaired channels. This reduces the number of bits of energy/amplitude equalization side information in the encoded code stream and thus the number of bits of multi-channel side information, and the saved bits can be allocated to other functional modules of the encoder to improve the quality of the audio signal reconstructed at the decoding end, thereby improving the encoding quality.
  • FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of the present application.
  • the execution body of this embodiment may be the above encoder, and this embodiment is a specific implementation manner of the method described in the embodiment shown in FIG. 2 above.
  • as shown in FIG. 3, the method of this embodiment may include:
  • Step 301 Acquire the audio signals of P channels of the current frame of the multi-channel audio signal.
  • Step 302 Perform screening and pairing of multi-channel signals on the P channels of the current frame of the multi-channel audio signal, and determine K channel pairs and K channel pair indices.
  • for the specific implementation of screening and group pairing, reference may be made to the explanation of step 201 in the embodiment shown in FIG. 2.
  • a channel pair index is used to indicate the two channels included in the channel pair. Different values of the channel pair index correspond to two different channels. The corresponding relationship between the value of the channel pair index and the two channels may be preset.
  • the L channel and the R channel are grouped to form a first channel pair.
  • the LS channel and the RS channel are paired to form a second channel pair.
  • the first channel pair index is used to indicate the L channel and R channel group pair. For example, the value of the first channel pair index is 0.
  • the second channel pair index is used to indicate the LS channel and RS channel group pair. For example, the value of the second channel pair index is 9.
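One plausible convention for the preset correspondence between channel pair index values and channel pairs is to enumerate all two-channel combinations of the participating channels in a fixed order; with the order L, R, C, LS, RS this reproduces the example values above (L-R → 0, LS-RS → 9). The table below is an illustrative assumption, not the codec's actual mapping:

```python
from itertools import combinations

# Channels participating in multi-channel processing, in a preset order
# (assumed for illustration).
CHANNELS = ["L", "R", "C", "LS", "RS"]

# The position of a two-channel combination in this list is its
# channel pair index: (L, R) -> 0, ..., (LS, RS) -> 9.
PAIR_TABLE = list(combinations(CHANNELS, 2))

def pair_index(ch_a, ch_b):
    """Map two channels to their channel pair index."""
    pair = tuple(sorted((ch_a, ch_b), key=CHANNELS.index))
    return PAIR_TABLE.index(pair)

def index_to_pair(idx):
    """Map a channel pair index back to the two channels it indicates."""
    return PAIR_TABLE[idx]
```

With five participating channels there are C(5, 2) = 10 possible pairs, so a pair index fits in 4 bits under this assumed convention.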
  • Step 303 Perform energy/amplitude equalization processing on the respective audio signals of the K channel pairs to obtain the energy/amplitude equalized audio signals of the K channel pairs and the energy/amplitude equalization side information of the K channel pairs.
  • one implementation is to perform energy/amplitude equalization processing at channel pair granularity: according to the energy/amplitude of the audio signals of the two channels of the channel pair before equalization, determine the equalized energy/amplitude of the audio signals of the two channels of the channel pair; generate the energy/amplitude equalization side information of the current channel pair according to the energy/amplitude of the audio signals of the two channels before equalization and the energy/amplitude of the audio signals of the two channels after equalization; and obtain the energy/amplitude equalized audio signals of the two channels.
  • specifically, the following method may be used: according to the energy/amplitude of the audio signals of the two channels of the channel pair before equalization, determine the average energy/amplitude of the audio signals of the channel pair, and determine the equalized energy/amplitude of the audio signals of the two channels of the channel pair according to that average. For example, the equalized energies/amplitudes of the audio signals of the two channels of the channel pair are equal, both being the average energy/amplitude of the audio signals of the channel pair.
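The pair-granularity equalization toward the pair's average energy can be sketched as follows. The energy measure used here (root of the sum of squared frequency-domain coefficients) is an assumption standing in for the patent's formula (3):

```python
import numpy as np

def equalize_pair(coefs_a, coefs_b):
    """Equalize a channel pair toward the pair's average energy.

    Energy is taken as the root of the sum of squared frequency-domain
    coefficients (an assumption). Both channels are scaled so that their
    energies become equal to the pair average.
    """
    e_a = np.sqrt(np.sum(coefs_a ** 2))
    e_b = np.sqrt(np.sum(coefs_b ** 2))
    e_avg = (e_a + e_b) / 2.0  # equalized target energy for both channels
    out_a = coefs_a * (e_avg / e_a)
    out_b = coefs_b * (e_avg / e_b)
    return out_a, out_b, e_avg
```

After this step the two channels of the pair have equal energy, which is the precondition for the stereo processing in step 304; the before/after energies also feed the side-information computation described above.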
  • a channel pair may include a first channel and a second channel
  • the energy/amplitude equalization side information of the channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel, the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling flag of the first channel, and the energy/amplitude scaling flag of the second channel.
  • according to the energy/amplitude of the qth channel before equalization and the energy/amplitude of the qth channel after equalization, the energy/amplitude scaling factor of the qth channel is determined.
  • according to the energy/amplitude scaling factor of the qth channel, the fixed-point energy/amplitude scaling ratio of the qth channel is determined.
  • according to the energy/amplitude of the qth channel before and after equalization, the energy/amplitude scaling flag of the qth channel is determined.
  • the fixed-point energy/amplitude scaling ratio of the qth channel and the energy/amplitude scaling identifier of the qth channel of a channel pair may be determined according to the following formulas (1) to (3).
  • the fixed-point energy/amplitude scaling of the qth channel is calculated according to equations (1) and (2).
  • scaleInt_q is the fixed-point energy/amplitude scaling of the qth channel
  • scaleF_q is the floating-point energy/amplitude scaling factor of the qth channel
  • M is the number of fixed-point bits from the floating-point energy/amplitude scaling factor to the fixed-point energy/amplitude scaling ratio; max(a, min(b, x)) clamps x to the interval [a, b] with a ≤ b, and ceil(x) is a function that rounds x up.
  • M can take any integer, for example, M takes 4.
  • According to formula (2): when energy_q > energy_q_e, energyBigFlag_q is set to 1, and when energy_q ≤ energy_q_e, energyBigFlag_q is set to 0.
  • energy_q is the energy/amplitude of the qth channel before energy/amplitude equalization.
  • energy_q_e is the energy/amplitude of the qth channel after energy/amplitude equalization.
  • energyBigFlag_q is the energy/amplitude scaling flag of the qth channel.
  • energy_q_e may be the average of the energies/amplitudes of the two channels of the channel pair.
  • energy_q is determined by the following formula (3).
  • sampleCoef(q, i) represents the i-th coefficient of the current frame of the q-th channel before energy/amplitude equalization, and N is the number of frequency domain coefficients of the current frame.
  • energy/amplitude equalization may be performed on the current frame of the qth channel according to the fixed-point energy/amplitude scaling ratio of the qth channel and the energy/amplitude scaling identifier of the qth channel, to obtain the energy/amplitude equalized audio signal of the qth channel.
  • i is used to index the coefficients of the current frame.
  • q(i) is the i-th frequency-domain coefficient of the current frame before energy/amplitude equalization.
  • q_e(i) is the i-th frequency-domain coefficient of the current frame after energy/amplitude equalization.
  • M is the number of fixed-point bits from the floating-point energy/amplitude scaling factor to the fixed-point energy/amplitude scaling ratio.
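Putting the pieces together, a sketch of fixed-pointing the scaling factor and applying it per coefficient might look as follows. The clamp bounds in `quantize_scale` and the exact rounding are assumptions standing in for formulas (1) and (2); M = 4 follows the example value given above:

```python
import math

M = 4  # number of fixed-point bits (the text gives M = 4 as an example)

def quantize_scale(scale_f):
    """Fixed-point the floating-point scaling factor (sketch of formula (1)):
    scale up by 2**M, round up, and clamp to a valid range.
    The clamp bounds [1, 2**M] are an assumption."""
    return max(1, min(1 << M, math.ceil(scale_f * (1 << M))))

def apply_equalization(coefs, scale_int, big_flag):
    """Apply energy/amplitude equalization per coefficient.

    When big_flag is 1 the pre-equalization energy was larger, so the
    coefficients are scaled down by scale_int / 2**M; otherwise they are
    scaled up by 2**M / scale_int.
    """
    if big_flag:
        return [c * scale_int / (1 << M) for c in coefs]
    return [c * (1 << M) / scale_int for c in coefs]
```

Because only `scale_int` (an M-bit value) and the one-bit flag are transmitted, the decoding end can invert this scaling exactly when de-equalizing.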
  • Another possible implementation is to perform energy/amplitude equalization processing with all channels, all channel pairs, or some of the channels as the granularity. For example, according to the energies/amplitudes of the audio signals of the P channels before equalization, determine the average energy/amplitude of the audio signals of the P channels, and determine the equalized energy or amplitude of the audio signals of the two channels of a channel pair according to that average. For example, the average energy/amplitude of the audio signals of the P channels may be used as the equalized energy or amplitude of the audio signal of either channel of a channel pair.
  • in this implementation, the method for determining the energy or amplitude after energy/amplitude equalization differs from the implementation described above, while the other steps for determining the energy/amplitude equalization side information can be the same; for the specific implementation, refer to the above description, which is not repeated here.
  • the energy/amplitude equalization side information of the current channel pair includes the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling identifier of the first channel, and the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling identifier of the second channel. That is, for the current channel (the first channel or the second channel), the side information includes both the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling flag. This is because, when the energy/amplitude scaling ratio is obtained by dividing the larger of the energy/amplitude of the current channel before equalization and the energy/amplitude of the current channel after equalization by the smaller, or the smaller by the larger, the resulting scaling ratio is fixedly greater than or equal to 1, or fixedly less than or equal to 1; therefore, the energy/amplitude scaling ratio or the fixed-point energy/amplitude scaling ratio alone cannot indicate whether the energy/amplitude after equalization is greater than the energy/amplitude before equalization.
  • alternatively, the ratio of the energy/amplitude of the current channel before equalization to the energy/amplitude of the current channel after equalization may be used fixedly, or the ratio of the energy/amplitude of the current channel after equalization to the energy/amplitude of the current channel before equalization may be used fixedly; in this way the energy/amplitude scaling flag does not need to be indicated, and accordingly the side information of the current channel can include the fixed-point energy/amplitude scaling ratio but need not include the energy/amplitude scaling flag.
  • Step 304 Perform stereo processing on the respective energy/amplitude equalized audio signals of the K channel pairs, respectively, to obtain the respective stereo processed audio signals of the K channel pairs and the respective stereo side information of the K channel pairs.
  • taking one channel pair as an example, perform stereo processing on the energy/amplitude equalized audio signals of the two channels of the channel pair to obtain the stereo processed audio signals of the two channels, and generate the stereo side information of the channel pair.
  • Step 305 Encode the stereo processed audio signals of the K channel pairs, the energy/amplitude equalized side information of the K channel pairs, the stereo side information of the K channel pairs, the number of channel pairs (K), the K channel pair indices, and the audio signals of the unpaired channels to obtain the encoded code stream, for the decoding end to decode and reconstruct the audio signal.
  • in this embodiment, the audio signals of the P channels of the current frame of the multi-channel audio signal are acquired, and the P channels of the current frame are screened and paired to determine K channel pairs and K channel pair indices; energy/amplitude equalization processing is performed on the respective audio signals of the K channel pairs to obtain the energy/amplitude equalized audio signals and the energy/amplitude equalization side information of the K channel pairs; stereo processing is performed on the energy/amplitude equalized audio signals of the K channel pairs to obtain the stereo processed audio signals and the stereo side information of the K channel pairs; and the stereo processed audio signals of the K channel pairs, the energy/amplitude equalized side information of the K channel pairs, the stereo side information of the K channel pairs, K, the K channel pair indices, and the audio signals of the unpaired channels are encoded to obtain the encoded code stream.
  • By generating energy/amplitude equalized side information per channel pair, the encoded code stream carries the energy/amplitude equalized side information of the K channel pairs but does not carry energy/amplitude equalized side information for the unpaired channels. This reduces the number of bits of energy/amplitude equalization side information in the encoded code stream and thus the number of bits of multi-channel side information, and the saved bits can be allocated to other functional modules of the encoder to improve the quality of the audio signal reconstructed at the decoding end, thereby improving the encoding quality.
  • the following embodiments take a 5.1-channel signal as an example to schematically illustrate the multi-channel audio signal encoding method according to the embodiment of the present application.
  • FIG. 4 is a schematic diagram of a processing process of an encoding end according to an embodiment of the present application.
  • the encoding end may include a multi-channel encoding processing unit 401 , a channel encoding unit 402 and a code stream multiplexing interface 403 .
  • the encoding end may be an encoder as described above.
  • the multi-channel encoding processing unit 401 is used to perform multi-channel signal screening, group pairing, stereo processing, and generation of side information and stereo side information for energy/amplitude equalization on the input signal.
  • the input signal is a 5.1 (L channel, R channel, C channel, LFE channel, LS channel, RS channel) signal.
  • the multi-channel encoding processing unit 401 pairs the L channel signal and the R channel signal to form the first channel pair, and obtains the middle channel M1 channel signal and the side channel S1 channel signal through stereo processing.
  • the LS channel signal and the RS channel signal are paired to form a second channel pair, and the middle channel M2 channel signal and the side channel S2 channel signal are obtained through stereo processing.
  • the specific description of the multi-channel encoding processing unit 401 may refer to the following embodiment shown in FIG. 5 .
  • the multi-channel encoding processing unit 401 outputs the stereo processed M1 channel signal, S1 channel signal, M2 channel signal, and S2 channel signal, the LFE channel signal and C channel signal without stereo processing, and the energy/amplitude equalized side information, stereo side information, and channel pair indices.
  • the channel encoding unit 402 is used to encode the stereo-processed M1, S1, M2 and S2 channel signals, the LFE and C channel signals that do not undergo stereo processing, and the multi-channel side information, and outputs the encoded channels E1-E6.
  • the multi-channel side information may include energy/amplitude equalized side information, stereo side information, and channel pair indices.
  • the multi-channel side information may also include side information of bit allocation, side information of entropy coding, and the like, which are not specifically limited in this embodiment of the present application.
  • the channel encoding unit 402 sends the encoded channels E1-E6 to the code stream multiplexing interface 403.
  • the code stream multiplexing interface 403 multiplexes the six coded channels E1-E6 to form a serial bitstream (bitStream), that is, a coded code stream, so as to facilitate transmission of multi-channel audio signals in channels or storage in digital media.
  • FIG. 5 is a schematic diagram of a processing process of a multi-channel encoding processing unit according to an embodiment of the present application.
  • the above-mentioned multi-channel encoding processing unit 401 may include a multi-channel screening unit 4011 and an iterative processing unit 4012.
  • the iterative processing unit 4012 may include a group pair decision unit 40121, a channel pair energy/amplitude equalization unit 40122, a channel pair energy/amplitude equalization unit 40123, a stereo processing unit 40124 and a stereo processing unit 40125.
  • the multi-channel screening unit 4011 filters out the participating channels from the 5.1 input channels (L channel, R channel, C channel, LS channel, RS channel, LFE channel) according to the multi-channel processing indicator (MultiProcFlag).
  • the channels participating in multi-channel processing include the L channel, R channel, C channel, LS channel and RS channel.
  • the group pair decision unit 40121 in the iterative processing unit 4012 calculates the inter-channel correlation value between each pair of channels among the L channel, the R channel, the C channel, the LS channel and the RS channel.
  • the channel pair (L channel, R channel) with the highest inter-channel correlation value among the channels (L channel, R channel, C channel, LS channel, RS channel) is selected to form the first channel pair.
  • the side information of the first channel pair includes energy/amplitude equalized side information of the first channel pair, stereo side information and a channel index.
  • the channel pair (LS channel, RS channel) with the highest inter-channel correlation value among the channels (C channel, LS channel, RS channel) is selected to form a second channel pair.
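  • The group-pair decision above can be sketched as follows; a minimal illustration in which `correlation` uses normalized cross-correlation (the excerpt does not specify the exact inter-channel correlation formula, so that choice, and pairing until fewer than two channels remain, are assumptions — a real encoder would also apply a correlation threshold):

```python
import itertools
import math

def correlation(x, y):
    # Normalized cross-correlation between two frames (illustrative choice;
    # the excerpt only requires "an inter-channel correlation value").
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(num) / den if den else 0.0

def greedy_pairing(channels):
    """Iteratively pair the two channels with the highest correlation.

    channels: dict mapping channel name to a list of frame coefficients.
    Returns (pairs, unpaired), mirroring the group pair decision unit 40121.
    """
    remaining = dict(channels)
    pairs = []
    while len(remaining) >= 2:
        best = max(itertools.combinations(remaining, 2),
                   key=lambda p: correlation(remaining[p[0]], remaining[p[1]]))
        pairs.append(best)
        for name in best:
            del remaining[name]
    return pairs, list(remaining)
```

With a 5-channel input in which L/R and LS/RS are strongly correlated, the first channel pair is (L, R), the second is (LS, RS), and C is left unpaired, matching the flow described above.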
  • the LS channel and the RS channel are energy/amplitude equalized by the channel pair energy/amplitude equalization unit 40123 to obtain the LSe channel and the RSe channel.
  • the stereo processing unit 40125 performs stereo processing on the LSe channel and the RSe channel, and obtains the side information of the second channel pair and the stereo-processed center channel M2 and side channel S2.
  • the side information for the second channel pair includes energy/amplitude equalized side information for the second channel pair, stereo side information, and a channel index.
  • the side information of the first channel pair and the side information of the second channel pair constitute the multi-channel side information.
  • the channel pair energy/amplitude equalization unit 40122 and the channel pair energy/amplitude equalization unit 40123 average the energy/amplitude of the input channel pair to obtain the energy/amplitude after energy/amplitude equalization.
  • the channel pair energy/amplitude equalization unit 40122 can determine the energy/amplitude after energy/amplitude equalization by the following formula (4).
  • energy_avg_pair1 = avg(energy_L, energy_R)  (4)
  • the avg(a1, a2) function outputs the mean of the two parameters a1 and a2.
  • energy_L is the frame energy/amplitude of the L channel before energy/amplitude equalization
  • energy_R is the frame energy/amplitude of the R channel before energy/amplitude equalization
  • energy_avg_pair1 is the equalized energy/amplitude of the first channel pair.
  • energy_L and energy_R can be determined by the above formula (3).
  • the channel pair energy/amplitude equalization unit 40123 can determine the energy/amplitude after energy/amplitude equalization by the following formula (5).
  • energy_avg_pair2 = avg(energy_LS, energy_RS)  (5)
  • the avg(a1, a2) function outputs the mean of the two parameters a1 and a2.
  • energy_LS is the frame energy/amplitude of the LS channel before energy/amplitude equalization
  • energy_RS is the frame energy/amplitude of the RS channel before energy/amplitude equalization
  • energy_avg_pair2 is the equalized energy/amplitude of the second channel pair.
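  • Formulas (4) and (5) reduce to taking the mean of the two frame energies; a small sketch, where a sum-of-squares definition of frame energy stands in for formula (3), which is not reproduced in this excerpt:

```python
def frame_energy(coeffs):
    # Frame energy/amplitude of one channel, taken here as the sum of
    # squared frequency-domain coefficients (formula (3) is not shown in
    # this excerpt, so this exact definition is an assumption).
    return sum(c * c for c in coeffs)

def pair_average_energy(energy_a, energy_b):
    # Formulas (4)/(5): the equalized energy/amplitude of a channel pair
    # is the mean of the two channels' frame energies before equalization.
    return (energy_a + energy_b) / 2.0

# e.g. energy_avg_pair1 = pair_average_energy(energy_L, energy_R)
```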
  • the energy/amplitude equalization side information of the first channel pair and the energy/amplitude equalization side information of the second channel pair in the above-mentioned embodiment are generated.
  • the energy/amplitude equalized side information of the first channel pair and the energy/amplitude equalized side information of the second channel pair are transmitted in the encoded code stream to guide the energy/amplitude de-equalization at the decoding end.
  • S01 Calculate the energy/amplitude energy_avg_pair1 of the first channel pair after being equalized by the channel pair energy/amplitude equalization unit 40122.
  • the energy_avg_pair1 is determined by the above formula (4).
  • S02 Calculate the floating-point energy/amplitude scaling factor of the L channel of the first channel pair.
  • the equalized energy/amplitude energy_Le of the L channel is equal to energy_avg_pair1.
  • S03 Calculate the fixed-point energy/amplitude scaling of the L channel of the first channel pair.
  • the fixed-point energy/amplitude scaling of the L channel is scaleInt_L.
  • the number of bits used when converting the floating-point energy/amplitude scaling factor scaleF_L into the fixed-point energy/amplitude scaling scaleInt_L is a fixed value.
  • the ceil(x) function is a function that rounds up x.
  • the clip(x,a,b) function is a two-way clamp function that clamps x to between [a,b].
  • S04 Calculate the energy/amplitude scaling flag of the L channel of the first channel pair.
  • the energy/amplitude scaling flag of the L channel is energyBigFlag_L. If energy_L > energy_Le, energyBigFlag_L is set to 1; otherwise, if energy_L ≤ energy_Le, energyBigFlag_L is set to 0.
  • Le(i) = L(i) × scaleInt_L/(1&lt;&lt;4)
  • i is used to identify the coefficient index of the current frame
  • L(i) is the ith frequency domain coefficient of the current frame before energy/amplitude equalization, and Le(i) is the ith frequency domain coefficient after energy/amplitude equalization
  • Similar operations S01 to S04 may be performed on the R channel of the first channel pair to obtain the floating-point energy/amplitude scaling factor scaleF_R, the fixed-point energy/amplitude scaling scaleInt_R, the energy/amplitude scaling flag energyBigFlag_R of the R channel, and the current-frame Re after energy/amplitude equalization. That is, L is replaced with R in the above S01 to S04.
  • Similar operations S01 to S04 may be performed on the LS channel of the second channel pair to obtain the floating-point energy/amplitude scaling factor scaleF_LS, the fixed-point energy/amplitude scaling scaleInt_LS, the energy/amplitude scaling flag energyBigFlag_LS of the LS channel, and the current-frame LSe after energy/amplitude equalization. That is, L is replaced with LS in the above S01 to S04.
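  • Steps S01 to S04 for one channel of a pair can be sketched as follows. The floating-point factor formula (assumed here to be the square root of the energy ratio, appropriate for sum-of-squares energies) and the clip range of the fixed-point value are assumptions; the excerpt fixes only the ceil and clip operations, the flag rule, and the fixed-point width implied by the (1&lt;&lt;4) term:

```python
import math

FIXED_BITS = 4  # fixed fractional bit width, matching the (1 << 4) term

def clip(x, a, b):
    # Two-way clamp of x into [a, b], as described for the clip(x,a,b) function.
    return max(a, min(x, b))

def equalize_channel(coeffs, energy, energy_eq):
    """Steps S01-S04 for one channel of a pair (illustrative sketch).

    The floating-point factor formula and the clip range are assumptions:
    the excerpt only states that scaleF is derived from the energies
    before/after equalization and converted to fixed point with a fixed
    number of bits.
    """
    # S02: floating-point scaling factor (assumed sqrt of the energy ratio,
    # since 'energy' here is a sum of squared coefficients).
    scale_f = math.sqrt(energy_eq / energy) if energy else 1.0
    # S03: fixed-point scaling with FIXED_BITS fractional bits, rounded up
    # and clamped (the [1, 255] range is an assumed 8-bit clip range).
    scale_int = clip(math.ceil(scale_f * (1 << FIXED_BITS)), 1, 255)
    # S04: scaling flag: 1 if the pre-equalization energy is larger.
    big_flag = 1 if energy > energy_eq else 0
    # Equalized coefficients: Le(i) = L(i) * scaleInt / (1 << 4).
    eq_coeffs = [c * scale_int / (1 << FIXED_BITS) for c in coeffs]
    return scale_f, scale_int, big_flag, eq_coeffs
```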
  • the multi-channel side information includes the number of channel pairs, the energy/amplitude equalization side information of the first channel pair, the first channel pair index, the energy/amplitude equalization side information of the second channel pair, and the second channel pair index.
  • the number of channel pairs is currPairCnt
  • the energy/amplitude equalized side information of the first channel pair and the energy/amplitude equalized side information of the second channel pair are two-dimensional arrays
  • the first channel pair index and the second channel pair index form a one-dimensional array.
  • the fixed-point energy/amplitude scalings of the first channel pair are PairILDScale[0][0] and PairILDScale[0][1]
  • the energy/amplitude scaling flags of the first channel pair are energyBigFlag[0][0] and energyBigFlag[0][1]
  • the fixed-point energy/amplitude scalings of the second channel pair are PairILDScale[1][0] and PairILDScale[1][1]
  • the energy/amplitude scaling flags of the second channel pair are energyBigFlag[1][0] and energyBigFlag[1][1].
  • the first channel pair index is PairIndex[0]
  • the second channel pair index is PairIndex[1].
  • the number of channel pairs currPairCnt may be a fixed bit length, for example, may be composed of 4 bits, and may identify up to 16 stereo pairs.
  • the value definition of the channel pair index PairIndex[pair] is shown in Table 1. The channel pair index may be variable-length coded for transmission in the encoded code stream, which saves bits and is used for audio signal recovery at the decoding end.
  • PairIndex[0] = 0, which indicates that the channel pair includes the R channel and the L channel.
  • energyBigFlag[0][0] = energyBigFlag_L.
  • energyBigFlag[0][1] = energyBigFlag_R.
  • energyBigFlag[1][0] = energyBigFlag_LS.
  • energyBigFlag[1][1] = energyBigFlag_RS.
  • the number of channel pairs currPairCnt can be 4 bits.
  • Step 602: Determine whether pair is less than the number of channel pairs; if so, execute step 603; if not, end.
  • Step 604 Write the fixed-point energy/amplitude scaling ratio of the i-th channel pair into the code stream.
  • Step 605 Write the energy/amplitude scaling identifier of the i-th channel pair into the code stream. For example, write energyBigFlag[0][0] and energyBigFlag[0][1] to the codestream. energyBigFlag[0][0] and energyBigFlag[0][1] may each occupy 1 bit.
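  • The side information writing flow of FIG. 6 can be sketched as below. The 4-bit currPairCnt and the 1-bit scaling flags follow the text; the 4-bit width assumed for the pair index (the variable-length coding mentioned earlier is not modeled) and the 8-bit width assumed for PairILDScale are illustrative:

```python
class BitWriter:
    """Minimal MSB-first bit writer for sketching the side-info layout."""
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        # Append the nbits least-significant bits of value, MSB first.
        for i in reversed(range(nbits)):
            self.bits.append((value >> i) & 1)

def write_mc_side_info(bw, pair_indices, pair_ild_scale, energy_big_flag,
                       index_bits=4, scale_bits=8):
    """Write the multi-channel side information (sketch of FIG. 6).

    index_bits and scale_bits are assumed widths; currPairCnt (4 bits) and
    the 1-bit energyBigFlag entries follow the text.
    """
    bw.write(len(pair_indices), 4)                      # currPairCnt, 4 bits
    for pair in range(len(pair_indices)):
        bw.write(pair_indices[pair], index_bits)        # PairIndex[pair]
        bw.write(pair_ild_scale[pair][0], scale_bits)   # PairILDScale[pair][0]
        bw.write(pair_ild_scale[pair][1], scale_bits)   # PairILDScale[pair][1]
        bw.write(energy_big_flag[pair][0], 1)           # energyBigFlag[pair][0]
        bw.write(energy_big_flag[pair][1], 1)           # energyBigFlag[pair][1]
```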
  • FIG. 7 is a flowchart of a multi-channel audio signal decoding method according to an embodiment of the present application.
  • the execution body of this embodiment may be the above-mentioned decoder. As shown in FIG. 7, the method of this embodiment may include:
  • Step 701 Acquire a code stream to be decoded.
  • the to-be-decoded code stream may be the encoded code stream obtained by the above encoding method embodiment.
  • Step 702 Demultiplex the code stream to be decoded to obtain the current frame of the multi-channel audio signal to be decoded and the number of channel pairs included in the current frame.
  • the M1, S1, M2, S2, LFE and C channel signals are obtained, as well as the number of channel pairs.
  • Step 703 Determine whether the number of channel pairs is equal to 0, if yes, go to Step 704, if not, go to Step 705.
  • Step 704 Decode the current frame of the multi-channel audio signal to be decoded to obtain the decoded signal of the current frame.
  • the current frame of the multi-channel audio signal to be decoded can be decoded to obtain the decoded signal of the current frame.
  • Step 705 Parse the current frame, and obtain the K channel pair indices and the side information of the energy/amplitude equalization of the K channel pairs included in the current frame.
  • the current frame can be further parsed to obtain other control information, for example, the K channel pair indices and the energy/amplitude equalization side information of the K channel pairs of the current frame, so that energy/amplitude de-equalization is performed in the subsequent decoding process of the current frame of the multi-channel audio signal to be decoded, so as to obtain the decoded signal of the current frame.
  • Step 706 Decode the current frame of the multi-channel audio signal to be decoded according to the indices of the K channel pairs and the energy/amplitude equalized side information of the K channel pairs to obtain the decoded signal of the current frame.
  • energy/amplitude de-equalization is performed based on the energy/amplitude equalized side information of the K channel pairs.
  • the energy/amplitude equalization side information of a channel pair may include the fixed-point energy/amplitude scaling and the energy/amplitude scaling identifier of the channel pair; for the specific explanation, refer to the description of the foregoing encoding embodiments, which will not be repeated here.
  • the current frame of the multi-channel audio signal to be decoded and the number of channel pairs included in the current frame are obtained by demultiplexing the code stream to be decoded.
  • the number of channel pairs is greater than 0, further analysis is performed.
  • For the current frame, the K channel pair indices and the energy/amplitude equalization side information of the K channel pairs are obtained, and the current frame of the multi-channel audio signal to be decoded is decoded according to them to obtain the decoded signal of the current frame.
  • because the code stream sent by the encoding end does not carry the energy/amplitude equalization side information of the unpaired channels, the number of bits of the energy/amplitude equalization side information in the encoded code stream can be reduced, reducing the number of bits of the multi-channel side information.
  • the saved bits can be allocated to other functional modules of the encoder to improve the quality of the reconstructed audio signal at the decoding end.
  • the following embodiments take a 5.1-channel signal as an example to schematically illustrate the multi-channel audio signal decoding method according to the embodiment of the present application.
  • FIG. 8 is a schematic diagram of a processing process of a decoding end according to an embodiment of the present application.
  • the decoding end may include a code stream demultiplexing interface 801 , a channel decoding unit 802 and a multi-channel decoding processing unit 803 .
  • the decoding process in this embodiment is an inverse process of the encoding process in the embodiments shown in FIG. 4 and FIG. 5 above.
  • the code stream demultiplexing interface 801 is used for demultiplexing the code stream output by the encoding end to obtain the six encoded channels E1-E6.
  • the channel decoding unit 802 is used to perform inverse entropy coding and inverse quantization on the encoded channels E1-E6 to obtain the multi-channel signals, including the middle channel M1 and side channel S1 of the first channel pair, the middle channel M2 and side channel S2 of the second channel pair, and the unpaired C channel and LFE channel.
  • the channel decoding unit 802 also decodes to obtain multi-channel side information.
  • the multi-channel side information includes side information (eg, entropy-coded side information) generated during the channel encoding process of the embodiment shown in FIG. 4, and side information generated during the multi-channel encoding process (eg, side information for channel pair energy/amplitude equalization).
  • the multi-channel decoding processing unit 803 performs multi-channel decoding processing on the middle channel M1 and the side channel S1 of the first channel pair, and the middle channel M2 and the side channel S2 in the second channel pair. Using multi-channel side information, decode the center channel M1 and side channel S1 of the first channel pair into L channel and R channel, and decode the center channel M2 and side channel S2 of the second channel pair into LS channel and RS channel.
  • the L channel, R channel, LS channel, RS channel, unpaired C channel and LFE channel constitute the output of the decoding end.
  • FIG. 9 is a schematic diagram of a processing process of a multi-channel decoding processing unit according to an embodiment of the present application.
  • the above-mentioned multi-channel decoding processing unit 803 may include a multi-channel screening unit 8031 and a multi-channel decoding processing sub-module 8032.
  • the multi-channel decoding processing sub-module 8032 includes two stereo decoding boxes, an energy/amplitude de-equalization unit 8033 and an energy/amplitude de-equalization unit 8034.
  • the multi-channel screening unit 8031 selects, from the six input channels (M1 channel, S1 channel, C channel, M2 channel, S2 channel, LFE channel), the M1, S1, M2 and S2 channels that participate in multi-channel processing.
  • the stereo decoding boxes in the multi-channel decoding processing sub-module 8032 are used to perform the following steps: decode the first channel pair (M1, S1) into the Le channel and the Re channel according to the stereo side information of the first channel pair.
  • decode the second channel pair (M2, S2) into the LSe channel and the RSe channel according to the stereo side information of the second channel pair.
  • the energy/amplitude de-equalization unit 8033 is configured to perform the following steps: according to the energy/amplitude equalization side information of the first channel pair, perform energy/amplitude de-equalization on the Le channel and the Re channel to restore them to the L channel and the R channel.
  • the energy/amplitude de-equalization unit 8034 is configured to perform the following steps: according to the energy/amplitude equalization side information of the second channel pair, restore the LSe channel and the RSe channel to the LS channel and the RS channel.
  • FIG. 10 is a flowchart of a multi-channel side information analysis according to an embodiment of the present application. This embodiment is the inverse process of the embodiment shown in FIG. 6 above.
  • Step 701 is to parse the code stream to obtain the number of channel pairs of the current frame. For example, the number of channel pairs currPairCnt occupies 4 bits in the code stream.
  • Step 702 Determine whether the number of channel pairs in the current frame is zero, if yes, end, if not, go to Step 703.
  • if the number of channel pairs currPairCnt of the current frame is zero, the current frame contains no channel pairs, and no energy/amplitude equalization side information is parsed.
  • Step 703: Determine whether pair is less than the number of channel pairs; if so, go to Step 704; if not, end.
  • Step 705 Parse the fixed-point energy/amplitude scaling ratio of the i-th channel pair from the code stream. For example, PairILDScale[pair][0] and PairILDScale[pair][1].
  • Step 706 Parse the energy/amplitude scaling identifier of the i-th channel pair from the code stream. For example, energyBigFlag[pair][0] and energyBigFlag[pair][1].
  • the side information parsing process of the first channel pair and the second channel pair is described by taking the 5.1 (L, R, C, LFE, LS, RS) signal of the encoder as an example.
  • the side information parsing process of the first channel pair is as follows: parsing the 4-bit channel pair index PairIndex[0] from the code stream, and mapping it into the L channel and the R channel according to the definition rule of the channel pair index. Parse the fixed-point energy/amplitude scaling PairILDScale[0][0] of the L channel and the fixed-point energy/amplitude scaling PairILDScale[0][1] of the R channel from the code stream. The energy/amplitude scaling flag energyBigFlag[0][0] of the L channel and the energy/amplitude scaling flag energyBigFlag[0][1] of the R channel are parsed from the code stream. Parse the stereo side information of the first channel pair from the bitstream. The side information parsing of the first channel pair ends.
  • the side information parsing process of the second channel pair is as follows: Parse the 4-bit channel pair index PairIndex[1] from the code stream, and map it into LS channels and RS channels according to the definition rule of the channel pair index. Parse the fixed-point energy/amplitude scaling PairILDScale[1][0] of the LS channel and the fixed-point energy/amplitude scaling PairILDScale[1][1] of the RS channel from the code stream. Parse the energy/amplitude scaling flag energyBigFlag[1][0] of the LS channel and the energy/amplitude scaling flag energyBigFlag[1][1] of the RS channel from the code stream. Parse the stereo side information for the second channel pair from the bitstream. The side information parsing of the second channel pair ends.
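  • The parsing flow of FIG. 10 can be sketched as the inverse of the writing flow: read currPairCnt, then for each pair read the pair index, the two fixed-point scalings and the two flags. The 4-bit currPairCnt and pair index and the 1-bit flags follow the text; the 8-bit width assumed for PairILDScale is illustrative:

```python
class BitReader:
    """Minimal MSB-first bit reader over a list of 0/1 bits."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

def parse_mc_side_info(br, index_bits=4, scale_bits=8):
    """Parse the multi-channel side information (sketch of FIG. 10)."""
    curr_pair_cnt = br.read(4)
    if curr_pair_cnt == 0:
        # Current frame has no channel pairs; no equalization side info.
        return 0, [], [], []
    pair_index, pair_ild_scale, energy_big_flag = [], [], []
    for _ in range(curr_pair_cnt):
        pair_index.append(br.read(index_bits))                    # PairIndex[pair]
        pair_ild_scale.append([br.read(scale_bits),               # PairILDScale[pair][0]
                               br.read(scale_bits)])              # PairILDScale[pair][1]
        energy_big_flag.append([br.read(1), br.read(1)])          # energyBigFlag[pair]
    return curr_pair_cnt, pair_index, pair_ild_scale, energy_big_flag
```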
  • the process by which the energy/amplitude de-equalization unit 8033 performs energy/amplitude de-equalization on the Le channel and the Re channel of the first channel pair is as follows:
  • the frequency domain coefficient of the L channel after energy/amplitude de-equalization is obtained according to the floating-point energy/amplitude scaling factor scaleF_L of the L channel.
  • L(i) = Le(i) × scaleF_L; where i identifies the coefficient index of the current frame, L(i) is the ith frequency domain coefficient of the current frame before energy/amplitude equalization, and Le(i) is the ith frequency domain coefficient of the current frame after energy/amplitude equalization.
  • the frequency domain coefficient of the R channel after energy/amplitude de-equalization is obtained according to the floating-point energy/amplitude scaling factor scaleF_R of the R channel.
  • R(i) = Re(i) × scaleF_R; where i identifies the coefficient index of the current frame, R(i) is the ith frequency domain coefficient of the current frame before energy/amplitude equalization, and Re(i) is the ith frequency domain coefficient of the current frame after energy/amplitude equalization.
  • the energy/amplitude de-equalization unit 8034 performs energy/amplitude de-equalization on the LSe channel and the RSe channel of the second channel pair; the specific implementation is consistent with the energy/amplitude de-equalization of the Le channel and the Re channel of the first channel pair, and will not be repeated here.
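  • The de-equalization step can be sketched as follows. The excerpt gives L(i) = Le(i) × scaleF_L but does not spell out how scaleF_L is recovered from the transmitted side information; inverting the encoder relation Le(i) = L(i) × scaleInt/(1&lt;&lt;4) yields the assumed reconstruction scaleF = (1&lt;&lt;4)/scaleInt used here:

```python
FIXED_BITS = 4  # matches the (1 << 4) fixed-point width used at the encoder

def de_equalize(eq_coeffs, scale_int):
    """Energy/amplitude de-equalization for one channel (units 8033/8034).

    scaleF is reconstructed as (1 << 4) / scaleInt; this inversion of the
    encoder's fixed-point scaling is an assumption, since the excerpt does
    not state the exact recovery formula.
    """
    scale_f = (1 << FIXED_BITS) / scale_int
    # L(i) = Le(i) * scaleF_L for every frequency-domain coefficient.
    return [c * scale_f for c in eq_coeffs]
```

For example, coefficients equalized with scaleInt = 8 (i.e., scaled by 8/16 = 0.5 at the encoder) are restored by multiplying by 16/8 = 2.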
  • the output of the multi-channel decoding processing unit 803 is the decoded L channel signal, R channel signal, LS channel signal, RS channel signal, C channel signal and LFE channel signal.
  • the number of bits of the energy/amplitude equalization side information in the encoded code stream can be reduced, reducing the number of bits of the multi-channel side information; the saved bits can be allocated to other functional modules of the encoder to improve the quality of the reconstructed audio signal at the decoding end.
  • an embodiment of the present application further provides an audio signal encoding apparatus, which can be applied to an audio encoder.
  • FIG. 11 is a schematic structural diagram of an audio signal encoding apparatus according to an embodiment of the present application.
  • the audio signal encoding apparatus 1100 includes an acquisition module 1101 , an equalization side information generation module 1102 , and an encoding module 1103 .
  • the acquisition module 1101 is used to acquire the audio signals of the P channels of the current frame of the multi-channel audio signal and the respective energies/amplitudes of the audio signals of the P channels, where P is a positive integer greater than 1; the P channels include K channel pairs, each channel pair includes two channels, K is a positive integer, and P is greater than or equal to K*2.
  • the equalization side information generation module 1102 is configured to generate the energy/amplitude equalization side information of the K channel pairs according to the respective energies/amplitudes of the audio signals of the P channels;
  • the encoding module 1103 is configured to encode the energy/amplitude equalized side information of the K channel pairs and the audio signals of the P channels to obtain an encoded code stream.
  • the K channel pairs include a current channel pair
  • the energy/amplitude equalization side information of the current channel pair includes: the fixed-point energy/amplitude scaling and the energy/amplitude scaling identifier of the current channel pair. The fixed-point energy/amplitude scaling is the fixed-point value of the energy/amplitude scaling factor; the energy/amplitude scaling factor is obtained according to the respective energies/amplitudes of the audio signals of the two channels of the current channel pair before equalization and their respective energies/amplitudes after equalization. The energy/amplitude scaling identifier is used to identify whether the equalized energy/amplitude of the audio signal of each of the two channels of the current channel pair is enlarged or reduced relative to the energy/amplitude before equalization.
  • the K channel pairs include the current channel pair
  • the equalization side information generation module 1102 is configured to: determine, according to the respective energies/amplitudes of the audio signals of the two channels of the current channel pair before equalization, the equalized energies/amplitudes of the audio signals of the two channels of the current channel pair; and generate the energy/amplitude equalization side information of the current channel pair according to the energies/amplitudes of the audio signals of the two channels before equalization and after equalization.
  • the current channel pair includes a first channel and a second channel
  • the energy/amplitude equalization side information of the current channel pair includes: the fixed-point energy/amplitude scaling of the first channel, the fixed-point energy/amplitude scaling of the second channel, the energy/amplitude scaling flag of the first channel, and the energy/amplitude scaling flag of the second channel.
  • the equalization side information generating module 1102 is configured to: determine the energy/amplitude scaling factor of the audio signal of the qth channel of the current channel pair according to its energy/amplitude before equalization and its energy/amplitude after equalization; determine the fixed-point energy/amplitude scaling of the qth channel according to the energy/amplitude scaling factor of the qth channel; and determine the energy/amplitude scaling identifier of the qth channel according to its energy/amplitude before equalization and its energy/amplitude after equalization; where q is one or two.
  • the equalization side information generating module 1102 is configured to: determine the energy/amplitude average value of the audio signals of the current channel pair according to the energies/amplitudes of the audio signals of the two channels before equalization; and determine, according to the energy/amplitude average value, the equalized energies/amplitudes of the audio signals of the two channels of the current channel pair.
  • the encoding module 1103 is configured to: encode the energy/amplitude equalization side information of the K channel pairs, the number K and the channel pair indices corresponding to the K channel pairs, and the audio signals of the P channels to obtain the encoded code stream.
  • the acquisition module 1101 , the equalization side information generation module 1102 , and the encoding module 1103 can be applied to the audio signal encoding process at the encoding end.
  • an embodiment of the present application provides an audio signal encoder.
  • the audio signal encoder is configured to encode an audio signal, including performing the encoding method described in one or more of the above embodiments to generate the corresponding encoded code stream.
  • an embodiment of the present application provides a device for encoding an audio signal, for example, an audio signal encoding device, please refer to FIG. 12 , the audio signal encoding device 1200 includes:
  • a processor 1201, a memory 1202, and a communication interface 1203 (wherein the number of processors 1201 in the audio signal encoding device 1200 may be one or more, and one processor is taken as an example in FIG. 12).
  • the processor 1201, the memory 1202, and the communication interface 1203 may be connected by a bus or other means, wherein the connection by a bus is taken as an example in FIG. 12 .
  • Memory 1202 may include read-only memory and random access memory, and provides instructions and data to processor 1201 .
  • a portion of memory 1202 may also include non-volatile random access memory (NVRAM).
  • the memory 1202 stores an operating system and operation instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operation instructions may include various operation instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and handling hardware-based tasks.
  • the processor 1201 controls the operation of the audio encoding device, and the processor 1201 may also be referred to as a central processing unit (central processing unit, CPU).
  • various components of the audio coding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • for convenience, the various buses are referred to as the bus system in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1201 or implemented by the processor 1201 .
  • the processor 1201 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the processor 1201 or an instruction in the form of software.
  • the above-mentioned processor 1201 may be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or Other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1202, and the processor 1201 reads the information in the memory 1202, and completes the steps of the above method in combination with its hardware.
  • the communication interface 1203 can be used to receive or send digital or character information, for example, it can be an input/output interface, a pin or a circuit, and the like. For example, the above-mentioned encoded code stream is sent through the communication interface 1203 .
  • an embodiment of the present application provides an audio encoding device, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform some or all of the steps of the multi-channel audio signal encoding method described in one or more of the above embodiments.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores program code, and the program code includes instructions for performing some or all of the steps of the multi-channel audio signal encoding method described in one or more of the above embodiments.
  • an embodiment of the present application provides a computer program product; when the computer program product runs on a computer, the computer is caused to perform some or all of the steps of the multi-channel audio signal encoding method described in one or more of the above embodiments.
  • an embodiment of the present application further provides an audio signal decoding apparatus, which can be applied to an audio decoder.
  • FIG. 13 is a schematic structural diagram of an audio signal decoding apparatus according to an embodiment of the present application.
  • the audio signal decoding apparatus 1300 includes an acquisition module 1301 , a demultiplexing module 1302 , and a decoding module 1303 .
  • the obtaining module 1301 is used to obtain the code stream to be decoded.
  • the demultiplexing module 1302 is used for demultiplexing the code stream to be decoded to obtain the current frame of the multi-channel audio signal to be decoded, the number K of channel pairs included in the current frame, the channel pair indices corresponding to the K channel pairs, and the side information of the energy/amplitude equalization of the K channel pairs.
  • the decoding module 1303 is configured to decode the current frame of the multi-channel audio signal to be decoded according to the channel pair indices corresponding to the K channel pairs and the side information of the energy/amplitude equalization of the K channel pairs, to obtain the decoded signal of the current frame, where K is a positive integer and each channel pair includes two channels.
  • the K channel pairs include a current channel pair, and the energy/amplitude equalization side information of the current channel pair includes a fixed-point energy/amplitude scaling ratio and an energy/amplitude scaling flag of the current channel pair, where the fixed-point energy/amplitude scaling ratio is a fixed-point value of an energy/amplitude scaling factor; the energy/amplitude scaling factor is obtained from the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair and the respective equalized energies/amplitudes of the audio signals of the two channels; and the energy/amplitude scaling flag is used to indicate whether the respective equalized energies/amplitudes of the audio signals of the two channels of the current channel pair are enlarged or reduced relative to their respective pre-equalization energies/amplitudes.
  • the K channel pairs include a current channel pair, and the decoding module 1303 is configured to: perform stereo decoding processing on the current frame of the multi-channel audio signal to be decoded according to the channel pair index corresponding to the current channel pair, to obtain the audio signals of the two channels of the current channel pair of the current frame; and perform, according to the side information of the energy/amplitude equalization of the current channel pair, energy/amplitude de-equalization processing on the audio signals of the two channels of the current channel pair, to obtain the decoded signals of the two channels of the current channel pair.
  • the current channel pair includes a first channel and a second channel, and the side information of the energy/amplitude equalization of the current channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel, the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling flag of the first channel, and the energy/amplitude scaling flag of the second channel.
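The per-pair side information described above can be sketched as a small container type. This is a minimal illustration only; the field names are assumptions, not identifiers taken from the embodiments:

```python
from dataclasses import dataclass

@dataclass
class PairEqualizationSideInfo:
    """Energy/amplitude equalization side information for one channel pair.

    Each channel of the pair carries a fixed-point scaling ratio (the
    fixed-point value of the energy/amplitude scaling factor) and a scaling
    flag saying whether equalization enlarged or reduced that channel's
    energy/amplitude. Field names here are illustrative.
    """
    scale_int_ch1: int   # fixed-point energy/amplitude scaling ratio, first channel
    scale_flag_ch1: int  # 1 if equalization reduced the first channel's energy/amplitude
    scale_int_ch2: int   # fixed-point energy/amplitude scaling ratio, second channel
    scale_flag_ch2: int  # same flag for the second channel
```

The decoding module would read one such record per channel pair from the demultiplexed stream before de-equalizing.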
  • the acquisition module 1301, the demultiplexing module 1302, and the decoding module 1303 can be applied to the audio signal decoding process at the decoding end.
  • an embodiment of the present application provides an audio signal decoder, where the audio signal decoder is used for decoding an audio signal and includes the audio signal decoding apparatus described in one or more of the above embodiments, and the audio signal decoding apparatus is used to decode the corresponding code stream to obtain the decoded signal.
  • an embodiment of the present application provides a device for decoding an audio signal, for example, an audio signal decoding device, please refer to FIG. 14 , the audio signal decoding device 1400 includes:
  • a processor 1401, a memory 1402, and a communication interface 1403 (wherein the number of processors 1401 in the audio signal decoding device 1400 may be one or more, and one processor is taken as an example in FIG. 14).
  • the processor 1401 , the memory 1402 , and the communication interface 1403 may be connected by a bus or in other manners, wherein the connection by a bus is taken as an example in FIG. 14 .
  • Memory 1402 may include read-only memory and random access memory, and provides instructions and data to processor 1401 .
  • a portion of memory 1402 may also include non-volatile random access memory (NVRAM).
  • the memory 1402 stores an operating system and operation instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operation instructions may include various operation instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and handling hardware-based tasks.
  • the processor 1401 controls the operation of the audio decoding device, and the processor 1401 may also be referred to as a central processing unit (CPU).
  • various components of the audio decoding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1401 or implemented by the processor 1401 .
  • the processor 1401 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the processor 1401 or an instruction in the form of software.
  • the above-mentioned processor 1401 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1402, and the processor 1401 reads the information in the memory 1402, and completes the steps of the above method in combination with its hardware.
  • the communication interface 1403 can be used to receive or transmit digital or character information, for example, it can be an input/output interface, a pin or a circuit, and the like. For example, the above-mentioned encoded code stream is received through the communication interface 1403 .
  • an embodiment of the present application provides an audio decoding device, comprising a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform some or all of the steps of the multi-channel audio signal decoding method described in one or more of the above embodiments.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores program code, and the program code includes instructions for performing some or all of the steps of the multi-channel audio signal decoding method described in one or more of the above embodiments.
  • an embodiment of the present application provides a computer program product; when the computer program product runs on a computer, the computer is caused to perform some or all of the steps of the multi-channel audio signal decoding method described in one or more of the above embodiments.
  • the processor mentioned in the above embodiments may be an integrated circuit chip, which has signal processing capability.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present application may be directly embodied as executed by a hardware coding processor, or executed by a combination of hardware and software modules in the coding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • by way of example rather than limitation, many forms of RAM are available, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (direct rambus RAM).
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

A multi-channel audio signal encoding and decoding method and apparatus (1100, 1300). The method and apparatus can reduce the number of bits of the multi-channel side information, so that the saved bits can be allocated to other functional modules of the encoder, thereby improving the quality of the audio signal reconstructed at the decoding end and improving coding quality.

Description

Multi-channel audio signal encoding and decoding method and apparatus
This application claims priority to Chinese Patent Application No. 202010699711.8, filed with the China National Intellectual Property Administration on July 17, 2020 and entitled "Multi-channel audio signal encoding and decoding method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to audio coding and decoding technologies, and in particular, to a multi-channel audio signal encoding and decoding method and apparatus.
Background
With the continuous development of multimedia technology, audio is widely used in fields such as multimedia communication, consumer electronics, virtual reality, and human-computer interaction. Audio coding is one of the key technologies of multimedia. Audio coding compresses the amount of data by removing redundant information from the original audio signal, to facilitate storage or transmission.
Multi-channel audio coding is the coding of more than two channels; common configurations include 5.1 channels, 7.1 channels, 7.1.4 channels, and 22.2 channels. Multiple original audio signals undergo multi-channel signal screening, pairing, stereo processing, multi-channel side information generation, quantization, entropy coding, and bitstream multiplexing to form a serial bitstream, which can then be transmitted over a channel or stored on a digital medium.
How to reduce the number of coding bits of the multi-channel side information, so as to improve the quality of the signal reconstructed at the decoding end, has become a technical problem that urgently needs to be solved.
Summary
This application provides a multi-channel audio signal encoding and decoding method and apparatus, which are beneficial to improving the quality of encoded and decoded audio signals.
According to a first aspect, an embodiment of this application provides a multi-channel audio signal encoding method. The method may include: obtaining audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, the P channels include K channel pairs, each channel pair includes two channels, K is a positive integer, and P is greater than or equal to K*2; obtaining the respective energies/amplitudes of the audio signals of the P channels; generating side information of energy/amplitude equalization of the K channel pairs according to the respective energies/amplitudes of the audio signals of the P channels; and encoding the side information of the energy/amplitude equalization of the K channel pairs and the audio signals of the P channels, to obtain an encoded bitstream.
In this implementation, by generating side information of energy/amplitude equalization of the channel pairs, the encoded bitstream carries the energy/amplitude equalization side information of the K channel pairs but does not carry energy/amplitude equalization side information of unpaired channels. This reduces the number of bits occupied by the energy/amplitude equalization side information in the encoded bitstream and lowers the number of bits of the multi-channel side information, so that the saved bits can be allocated to other functional modules of the encoder, improving the quality of the audio signal reconstructed at the decoding end and improving coding quality.
For example, the saved bits may be used for encoding the multi-channel audio signal, to lower the compression rate of the data part and improve the quality of the audio signal reconstructed at the decoding end.
In other words, the encoded bitstream includes a control information part and a data part. The control information part may include the foregoing energy/amplitude equalization side information, and the data part may include the foregoing multi-channel audio signal; that is, the encoded bitstream includes the multi-channel audio signal and the control information generated in the process of encoding the multi-channel audio signal. The embodiments of this application reduce the number of bits occupied by the control information part so as to increase the number of bits available to the data part, thereby improving the quality of the audio signal reconstructed at the decoding end.
It should be noted that the saved bits may also be used for transmitting other control information; the embodiments of this application are not limited by the foregoing examples.
In a possible design, the K channel pairs include a current channel pair, and the energy/amplitude equalization side information of the current channel pair includes a fixed-point energy/amplitude scaling ratio and an energy/amplitude scaling flag of the current channel pair. The fixed-point energy/amplitude scaling ratio is a fixed-point value of an energy/amplitude scaling factor; the energy/amplitude scaling factor is obtained from the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair and the respective equalized energies/amplitudes of the audio signals of the two channels; and the energy/amplitude scaling flag is used to indicate whether the respective equalized energies/amplitudes of the audio signals of the two channels of the current channel pair are enlarged or reduced relative to their respective pre-equalization energies/amplitudes.
In this implementation, the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling flag of the current channel pair enable the decoding end to perform energy de-equalization to obtain the decoded signal.
Converting the floating-point energy/amplitude scaling factor into a fixed-point energy/amplitude scaling ratio saves the bits occupied by the energy/amplitude equalization side information, thereby improving transmission efficiency.
In a possible design, the K channel pairs include a current channel pair, and generating the side information of the energy/amplitude equalization of the K channel pairs according to the respective energies/amplitudes of the audio signals of the P channels may include: determining, according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair, the respective equalized energies/amplitudes of the audio signals of the two channels of the current channel pair; and generating the energy/amplitude equalization side information of the current channel pair according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair and the respective equalized energies/amplitudes of the audio signals of the two channels.
In this implementation, performing energy/amplitude equalization within each channel pair allows channel pairs with large energy differences between them to retain a large energy difference after equalization, so that the subsequent encoding process can meet the encoding requirements of channel pairs with larger energy/amplitude, improving coding efficiency and coding performance and thereby improving the quality of the audio signal reconstructed at the decoding end.
In a possible design, the current channel pair includes a first channel and a second channel, and the energy/amplitude equalization side information of the current channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel, the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling flag of the first channel, and the energy/amplitude scaling flag of the second channel.
In this implementation, the fixed-point energy/amplitude scaling ratios and energy/amplitude scaling flags of the two channels of the current channel pair enable the decoding end to perform energy de-equalization to obtain the decoded signal, while further reducing the bits occupied by the energy/amplitude equalization side information of the current channel pair.
In a possible design, generating the energy/amplitude equalization side information of the current channel pair according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair and the respective equalized energies/amplitudes of the audio signals of the two channels may include: determining, according to the pre-equalization energy/amplitude of the audio signal of the q-th channel of the current channel pair and the equalized energy/amplitude of the audio signal of the q-th channel, the energy/amplitude scaling factor of the q-th channel and the energy/amplitude scaling flag of the q-th channel; and determining the fixed-point energy/amplitude scaling ratio of the q-th channel according to the energy/amplitude scaling factor of the q-th channel, where q is one or two.
In a possible design, determining the respective equalized energies/amplitudes of the audio signals of the two channels of the current channel pair according to the respective energies/amplitudes of the audio signals of the two channels may include: determining an average energy/amplitude of the audio signals of the current channel pair according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair, and determining the respective equalized energies/amplitudes of the audio signals of the two channels of the current channel pair according to the average energy/amplitude of the audio signals of the current channel pair.
In this implementation, performing energy/amplitude equalization within each channel pair allows channel pairs with large energy differences between them to retain a large energy difference after equalization, so that the subsequent encoding process can meet the encoding requirements of channel pairs with larger energy/amplitude, thereby improving the quality of the audio signal reconstructed at the decoding end.
In a possible design, encoding the energy/amplitude equalization side information of the K channel pairs and the audio signals of the P channels to obtain the encoded bitstream may include: encoding the energy/amplitude equalization side information of the K channel pairs, K, the channel pair indices corresponding to the K channel pairs, and the audio signals of the P channels, to obtain the encoded bitstream.
According to a second aspect, an embodiment of this application provides a multi-channel audio signal decoding method. The method may include: obtaining a bitstream to be decoded; demultiplexing the bitstream to be decoded to obtain a current frame of a multi-channel audio signal to be decoded, the number K of channel pairs included in the current frame, the channel pair indices corresponding to the K channel pairs, and the energy/amplitude equalization side information of the K channel pairs; and decoding the current frame of the multi-channel audio signal to be decoded according to the channel pair indices corresponding to the K channel pairs and the energy/amplitude equalization side information of the K channel pairs, to obtain a decoded signal of the current frame, where K is a positive integer and each channel pair includes two channels.
In a possible design, the K channel pairs include a current channel pair, and the energy/amplitude equalization side information of the current channel pair includes a fixed-point energy/amplitude scaling ratio and an energy/amplitude scaling flag of the current channel pair, where the fixed-point energy/amplitude scaling ratio is a fixed-point value of an energy/amplitude scaling factor, the energy/amplitude scaling factor is obtained from the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the current channel pair and the respective equalized energies/amplitudes of the audio signals of the two channels, and the energy/amplitude scaling flag is used to indicate whether the respective equalized energies/amplitudes of the audio signals of the two channels of the current channel pair are enlarged or reduced relative to their respective pre-equalization energies/amplitudes.
In a possible design, the K channel pairs include a current channel pair, and decoding the current frame of the multi-channel audio signal to be decoded according to the channel pair indices corresponding to the K channel pairs and the energy/amplitude equalization side information of the K channel pairs to obtain the decoded signal of the current frame may include: performing stereo decoding processing on the current frame of the multi-channel audio signal to be decoded according to the channel pair index corresponding to the current channel pair, to obtain the audio signals of the two channels of the current channel pair of the current frame; and performing energy/amplitude de-equalization processing on the audio signals of the two channels of the current channel pair according to the energy/amplitude equalization side information of the current channel pair, to obtain the decoded signals of the two channels of the current channel pair.
In a possible design, the current channel pair includes a first channel and a second channel, and the energy/amplitude equalization side information of the current channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel, the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling flag of the first channel, and the energy/amplitude scaling flag of the second channel.
For the technical effects of the multi-channel audio signal decoding method, refer to the technical effects of the corresponding encoding method described above; details are not repeated here.
According to a third aspect, an embodiment of this application provides an audio signal encoding apparatus. The audio signal encoding apparatus may be an audio encoder, a chip or a system-on-chip of an audio encoding device, or a functional module in an audio encoder for implementing the method of the first aspect or any possible design of the first aspect. The audio signal encoding apparatus can implement the functions performed in the first aspect or in each possible design of the first aspect, and the functions may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions. For example, in a possible design, the audio signal encoding apparatus may include an acquisition module, an equalization side information generation module, and an encoding module.
According to a fourth aspect, an embodiment of this application provides an audio signal decoding apparatus. The audio signal decoding apparatus may be an audio decoder, a chip or a system-on-chip of an audio decoding device, or a functional module in an audio decoder for implementing the method of the second aspect or any possible design of the second aspect. The audio signal decoding apparatus can implement the functions performed in the second aspect or in each possible design of the second aspect, and the functions may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions. For example, in a possible design, the audio signal decoding apparatus may include an acquisition module, a demultiplexing module, and a decoding module.
According to a fifth aspect, an embodiment of this application provides an audio signal encoding apparatus, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform the method of the first aspect or any possible design of the first aspect.
According to a sixth aspect, an embodiment of this application provides an audio signal decoding apparatus, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform the method of the second aspect or any possible design of the second aspect.
According to a seventh aspect, an embodiment of this application provides an audio signal encoding device, including an encoder, where the encoder is configured to perform the method of the first aspect or any possible design of the first aspect.
According to an eighth aspect, an embodiment of this application provides an audio signal decoding device, including a decoder, where the decoder is configured to perform the method of the second aspect or any possible design of the second aspect.
According to a ninth aspect, an embodiment of this application provides a computer-readable storage medium, including an encoded bitstream obtained according to the method of the first aspect or any possible design of the first aspect.
According to a tenth aspect, an embodiment of this application provides a computer-readable storage medium, including a computer program, where when the computer program is executed on a computer, the computer is caused to perform the method of any one of the first aspect, or to perform the method of any one of the second aspect.
According to an eleventh aspect, this application provides a computer program product, including a computer program, where when the computer program is executed by a computer, it is used to perform the method of any one of the first aspect, or to perform the method of any one of the second aspect.
According to a twelfth aspect, this application provides a chip, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the method of any one of the first aspect, or to perform the method of any one of the second aspect.
According to a thirteenth aspect, this application provides a codec device, including an encoder and a decoder, where the encoder is configured to perform the method of the first aspect or any possible design of the first aspect, and the decoder is configured to perform the method of the second aspect or any possible design of the second aspect.
According to the multi-channel audio signal encoding and decoding method and apparatus of the embodiments of this application, the audio signals of the P channels of the current frame of a multi-channel audio signal and the respective energies/amplitudes of the audio signals of the P channels are obtained, where the P channels include K channel pairs; side information of energy/amplitude equalization of the K channel pairs is generated according to the respective energies/amplitudes of the audio signals of the P channels; and the audio signals of the P channels are encoded according to the energy/amplitude equalization side information of the K channel pairs, to obtain an encoded bitstream. By generating the energy/amplitude equalization side information of the channel pairs, the encoded bitstream carries the energy/amplitude equalization side information of the K channel pairs but does not carry energy/amplitude equalization side information of unpaired channels. This reduces the number of bits occupied by the energy/amplitude equalization side information in the encoded bitstream and lowers the number of bits of the multi-channel side information, so that the saved bits can be allocated to other functional modules of the encoder, improving the quality of the audio signal reconstructed at the decoding end and improving coding quality.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an example audio encoding and decoding system according to an embodiment of this application;
FIG. 2 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application;
FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a processing procedure at the encoding end according to an embodiment of this application;
FIG. 5 is a schematic diagram of a processing procedure of a multi-channel encoding processing unit according to an embodiment of this application;
FIG. 6 is a schematic diagram of a procedure for writing multi-channel side information according to an embodiment of this application;
FIG. 7 is a flowchart of a multi-channel audio signal decoding method according to an embodiment of this application;
FIG. 8 is a schematic diagram of a processing procedure at the decoding end according to an embodiment of this application;
FIG. 9 is a schematic diagram of a processing procedure of a multi-channel decoding processing unit according to an embodiment of this application;
FIG. 10 is a flowchart of multi-channel side information parsing according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of an audio signal encoding apparatus 1100 according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an audio signal encoding device 1200 according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an audio signal decoding apparatus 1300 according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of an audio signal decoding device 1400 according to an embodiment of this application.
Description of Embodiments
The terms "first", "second", and the like in the embodiments of this application are merely used for the purpose of distinguishing descriptions and cannot be understood as indicating or implying relative importance or as indicating or implying an order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, for example, inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate the following three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may indicate a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c each may be singular or plural, or some may be singular and some plural.
The following describes the system architecture to which the embodiments of this application are applied. Referring to FIG. 1, FIG. 1 is a schematic block diagram of an example audio encoding and decoding system 10 to which the embodiments of this application are applied. As shown in FIG. 1, the audio encoding and decoding system 10 may include a source device 12 and a destination device 14. The source device 12 generates encoded audio data and may therefore be referred to as an audio encoding apparatus. The destination device 14 may decode the encoded audio data generated by the source device 12 and may therefore be referred to as an audio decoding apparatus. Various implementations of the source device 12, the destination device 14, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer, as described herein. The source device 12 and the destination device 14 may include various apparatuses, including desktop computers, mobile computing apparatuses, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, speakers, digital media players, video game consoles, vehicle-mounted computers, any wearable device, virtual reality (VR) devices, servers providing VR services, augmented reality (AR) devices, servers providing AR services, wireless communication devices, or the like.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
A communication connection may be established between the source device 12 and the destination device 14 over a link 13, and the destination device 14 may receive the encoded audio data from the source device 12 via the link 13. The link 13 may include one or more media or apparatuses capable of moving the encoded audio data from the source device 12 to the destination device 14. In one example, the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded audio data directly to the destination device 14 in real time. In this example, the source device 12 may modulate the encoded audio data according to a communication standard (for example, a wireless communication protocol) and may transmit the modulated audio data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, for example a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 12 to the destination device 14.
The source device 12 includes an encoder 20, and optionally, the source device 12 may further include an audio source 16, a preprocessor 18, and a communication interface 22. In a specific implementation form, the encoder 20, the audio source 16, the preprocessor 18, and the communication interface 22 may be hardware components in the source device 12 or may be software programs in the source device 12. They are separately described as follows:
The audio source 16 may include or may be any type of sound capture device for, for example, capturing real-world sound, and/or any type of audio generation device. The audio source 16 may be a microphone for capturing sound or a memory for storing audio data, and the audio source 16 may further include any type of (internal or external) interface for storing previously captured or generated audio data and/or for obtaining or receiving audio data. When the audio source 16 is a microphone, the audio source 16 may be, for example, a local microphone or a microphone integrated in the source device; when the audio source 16 is a memory, the audio source 16 may be, for example, a local memory or a memory integrated in the source device. When the audio source 16 includes an interface, the interface may be, for example, an external interface for receiving audio data from an external audio source, where the external audio source is, for example, an external sound capture device such as a microphone, an external memory, or an external audio generation device. The interface may be any type of interface according to any proprietary or standardized interface protocol, for example a wired or wireless interface or an optical interface.
In this embodiment of this application, the audio data transmitted from the audio source 16 to the preprocessor 18 may also be referred to as raw audio data 17.
The preprocessor 18 is configured to receive the raw audio data 17 and perform preprocessing on the raw audio data 17 to obtain preprocessed audio 19 or preprocessed audio data 19. For example, the preprocessing performed by the preprocessor 18 may include filtering or denoising.
The encoder 20 (also referred to as audio encoder 20) is configured to receive the preprocessed audio data 19 and to perform the embodiments of the encoding methods described below, to implement the application of the audio signal encoding method described in this application at the encoding side.
The communication interface 22 may be configured to receive the encoded audio data 21 and may transmit the encoded audio data 21 via the link 13 to the destination device 14 or any other device (for example, a memory) for storage or direct reconstruction, where the other device may be any device used for decoding or storage. The communication interface 22 may, for example, be configured to encapsulate the encoded audio data 21 into a suitable format, such as data packets, for transmission over the link 13.
The destination device 14 includes a decoder 30, and optionally, the destination device 14 may further include a communication interface 28, an audio post-processor 32, and a speaker device 34. They are separately described as follows:
The communication interface 28 may be configured to receive the encoded audio data 21 from the source device 12 or any other source, where the any other source is, for example, a storage device, and the storage device is, for example, an encoded audio data storage device. The communication interface 28 may be configured to transmit or receive the encoded audio data 21 via the link 13 between the source device 12 and the destination device 14 or via any type of network, where the link 13 is, for example, a direct wired or wireless connection, and the any type of network is, for example, a wired or wireless network or any combination thereof, or any type of private or public network, or any combination thereof. The communication interface 28 may, for example, be configured to decapsulate the data packets transmitted by the communication interface 22 to obtain the encoded audio data 21.
Both the communication interface 28 and the communication interface 22 may be configured as unidirectional communication interfaces or bidirectional communication interfaces, and may be configured, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as encoded audio data transmission.
The decoder 30 is configured to receive the encoded audio data 21 and provide decoded audio data 31 or decoded audio 31. In some embodiments, the decoder 30 may be configured to perform the embodiments of the decoding methods described below, to implement the application of the audio signal decoding method described in this application at the decoding side.
The audio post-processor 32 is configured to perform post-processing on the decoded audio data 31 (also referred to as reconstructed audio data) to obtain post-processed audio data 33. The post-processing performed by the audio post-processor 32 may include, for example, rendering or any other processing, and the audio post-processor 32 may further be configured to transmit the post-processed audio data 33 to the speaker device 34.
The speaker device 34 is configured to receive the post-processed audio data 33 to play the audio to, for example, a user or listener. The speaker device 34 may be or may include any type of speaker for presenting the reconstructed sound.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
It is apparent to a person skilled in the art based on the description that the existence and (exact) division of the functionality of different units, or the functionality of the source device 12 and/or the destination device 14 shown in FIG. 1, may vary according to the actual device and application. The source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, for example a notebook or laptop computer, a mobile phone, a smartphone, a tablet or tablet computer, a video camera, a desktop computer, a set-top box, a television, a camera, a vehicle-mounted device, a sound system, a digital media player, an audio game console, an audio streaming device (such as a content service server or a content distribution server), a broadcast receiver device, a broadcast transmitter device, smart glasses, a smart watch, or the like, and may use no operating system or any type of operating system.
Both the encoder 20 and the decoder 30 may be implemented as any of various suitable circuits, for example one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, a device may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, and the like) may be considered one or more processors.
In some cases, the audio encoding and decoding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may be applied to audio coding settings (for example, audio encoding or audio decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, data may be retrieved from local memory, streamed over a network, and the like. An audio encoding device may encode data and store the data in a memory, and/or an audio decoding device may retrieve data from a memory and decode the data. In some examples, encoding and decoding are performed by devices that do not communicate with each other but merely encode data to a memory and/or retrieve data from a memory and decode the data.
The foregoing encoder may be a multi-channel encoder, for example a stereo encoder, a 5.1-channel encoder, or a 7.1-channel encoder.
The foregoing audio data may also be referred to as an audio signal. The audio signal in the embodiments of this application refers to an input signal of the audio encoding device, and the audio signal may include a plurality of frames; for example, the current frame may specifically refer to a certain frame in the audio signal. The embodiments of this application use the encoding and decoding of the current-frame audio signal as an example for description; the frame preceding or following the current frame in the audio signal may be encoded and decoded correspondingly according to the encoding and decoding manner of the current-frame audio signal, and the encoding and decoding processes of the frames preceding or following the current frame are not described one by one. In addition, the audio signal in the embodiments of this application may be a multi-channel audio signal, that is, it includes P channels. The embodiments of this application are used to implement multi-channel audio signal encoding and decoding.
The foregoing encoder may perform the multi-channel audio signal encoding method of the embodiments of this application to reduce the number of bits of the multi-channel side information, so that the saved bits can be allocated to other functional modules of the encoder to improve the quality of the audio signal reconstructed at the decoding end and improve coding quality. For specific implementations, refer to the detailed explanations of the following embodiments.
FIG. 2 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application. The execution body of this embodiment of this application may be the foregoing encoder. As shown in FIG. 2, the method of this embodiment may include:
Step 201: Obtain the audio signals of P channels of the current frame of a multi-channel audio signal and the respective energies/amplitudes of the audio signals of the P channels, where the P channels include K channel pairs. The multi-channel signal may be a 5.1-channel signal (corresponding to P = 5 + 1 = 6), a 7.1-channel signal (corresponding to P = 7 + 1 = 8), an 11.1-channel signal (corresponding to P = 11 + 1 = 12), or the like.
Each channel pair includes two channels. P is a positive integer greater than 1, K is a positive integer, and P is greater than or equal to K*2.
In some embodiments, P = 2K. The K channel pairs can be obtained by screening and pairing the multi-channel signal in the current frame of the multi-channel audio signal. The P channels include the K channel pairs.
In some embodiments, P = 2*K + Q, where Q is a positive integer. The audio signals of the P channels further include the audio signals of Q unpaired mono channels. Taking a 5.1-channel signal as an example, the 5.1 channels include a left (L) channel, a right (R) channel, a center (C) channel, a low frequency effects (LFE) channel, a left surround (LS) channel, and a right surround (RS) channel. The channels participating in multi-channel processing are screened out of the 5.1 channels according to a multi-channel processing indicator (MultiProcFlag); for example, the channels participating in multi-channel processing include the L channel, R channel, C channel, LS channel, and RS channel. Pairing is performed among the channels participating in multi-channel processing. For example, the L channel and the R channel are paired to form a first channel pair, and the LS channel and the RS channel are paired to form a second channel pair. The LFE channel and the C channel are unpaired channels. That is, P = 6, K = 2, Q = 2. The P channels include the first channel pair, the second channel pair, and the unpaired LFE and C channels.
For example, pairing among the channels participating in multi-channel processing may be performed by determining the K channel pairs through multiple iterations, that is, one channel pair is determined per iteration. For example, in the first iteration, the inter-channel correlation values between every two of the P channels participating in multi-channel processing are calculated, and the two channels with the highest inter-channel correlation value are selected to form a channel pair. In the second iteration, among the remaining channels (the P channels excluding the channels already paired), the two channels with the highest inter-channel correlation value are selected to form a channel pair. By analogy, the K channel pairs are obtained.
It should be noted that other pairing manners may also be used in the embodiments of this application to determine the K channel pairs; the embodiments of this application are not limited by the foregoing example of pairing.
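The iterative pairing described above can be sketched as follows. This is an assumption-laden illustration rather than the embodiments' actual selection logic; in particular, the inter-channel correlation is taken here as the magnitude of the normalized zero-lag cross-correlation, which the text does not specify:

```python
import numpy as np

def pair_channels(signals, num_pairs):
    """Greedily form channel pairs, one per iteration, by choosing the two
    remaining channels with the highest inter-channel correlation value.

    signals: list of 1-D numpy arrays, one per channel participating in
    multi-channel processing. Returns a list of (i, j) index pairs.
    """
    remaining = set(range(len(signals)))
    pairs = []
    for _ in range(num_pairs):
        best, best_corr = None, -1.0
        for i in sorted(remaining):
            for j in sorted(remaining):
                if j <= i:
                    continue
                # normalized cross-correlation at lag 0 (an assumed metric)
                num = float(np.dot(signals[i], signals[j]))
                den = float(np.linalg.norm(signals[i]) * np.linalg.norm(signals[j]))
                corr = abs(num / den) if den > 0 else 0.0
                if corr > best_corr:
                    best, best_corr = (i, j), corr
        if best is None:
            break
        pairs.append(best)
        remaining -= set(best)  # paired channels drop out of later iterations
    return pairs
```

Channels left in `remaining` after the last iteration correspond to the Q unpaired mono channels (for example LFE and C in the 5.1 example).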
Step 202: Generate side information of energy/amplitude equalization of the K channel pairs according to the respective energies/amplitudes of the audio signals of the P channels.
It should be noted that "energy/amplitude" in the embodiments of this application means energy or amplitude, and that in the actual processing of a frame, if energy is processed at the beginning, then energy is processed throughout the subsequent processing; alternatively, if amplitude is processed at the beginning, then amplitude is processed throughout the subsequent processing.
For example, side information of energy equalization of the K channel pairs is generated according to the energies of the audio signals of the P channels; that is, energy equalization is performed using the energies of the P channels to obtain the energy equalization side information. Alternatively, side information of energy equalization of the K channel pairs is generated according to the amplitudes of the audio signals of the P channels; that is, energy equalization is performed using the amplitudes of the P channels to obtain the energy equalization side information. Alternatively, side information of amplitude equalization of the K channel pairs is generated according to the amplitudes of the audio signals of the P channels; that is, amplitude equalization is performed using the amplitudes of the P channels to obtain the amplitude equalization side information.
Specifically, the embodiments of the present invention perform stereo encoding processing on the channel pairs. To improve coding efficiency and coding performance, for example, before performing stereo encoding processing on the current channel pair, energy/amplitude equalization may first be performed on the energies/amplitudes of the audio signals of the two channels of the current channel pair, to obtain the equalized energies/amplitudes of the two channels, and the subsequent stereo encoding processing is then performed based on the equalized energies/amplitudes. In one implementation, the energy/amplitude equalization may be based on the audio signals of the two channels of the current channel pair, but not on the audio signals corresponding to channel pairs other than the current channel pair and/or to mono channels; in another implementation, the energy/amplitude equalization may, in addition to being based on the audio signals of the two channels of the current channel pair, be further based on the audio signals corresponding to other channel pairs and/or mono channels.
The energy/amplitude equalization side information is used by the decoding end to perform energy/amplitude de-equalization to obtain the decoded signal.
In an implementable manner, the energy/amplitude equalization side information may include a fixed-point energy/amplitude scaling ratio and an energy/amplitude scaling flag. The fixed-point energy/amplitude scaling ratio is a fixed-point value of an energy/amplitude scaling factor; the energy/amplitude scaling factor is obtained from the energy/amplitude before energy/amplitude equalization and the energy/amplitude after energy/amplitude equalization; and the energy/amplitude scaling flag is used to indicate whether the equalized energy/amplitude is enlarged or reduced relative to the energy/amplitude before equalization. The energy/amplitude scaling factor is within (0, 1).
Taking one channel pair as an example, the energy/amplitude equalization side information of the channel pair may include the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling flag of the channel pair. Taking a channel pair that includes a first channel and a second channel as an example, the fixed-point energy/amplitude scaling ratio of the channel pair includes the fixed-point energy/amplitude scaling ratio of the first channel and the fixed-point energy/amplitude scaling ratio of the second channel, and the energy/amplitude scaling flag of the channel pair includes the energy/amplitude scaling flag of the first channel and the energy/amplitude scaling flag of the second channel. Taking the first channel as an example, the fixed-point energy/amplitude scaling ratio of the first channel is the fixed-point value of the energy/amplitude scaling factor of the first channel, and the energy/amplitude scaling factor of the first channel is obtained from the pre-equalization energy/amplitude of the audio signal of the first channel and the equalized energy/amplitude of the audio signal of the first channel. The energy/amplitude scaling flag of the first channel is likewise obtained from the pre-equalization energy/amplitude of the audio signal of the first channel and the equalized energy/amplitude of the audio signal of the first channel. For example, the energy/amplitude scaling factor of the first channel is the smaller of the pre-equalization energy/amplitude and the equalized energy/amplitude of the audio signal of the first channel divided by the larger of the two. For instance, if the pre-equalization energy/amplitude of the audio signal of the first channel is greater than its equalized energy/amplitude, the energy/amplitude scaling factor of the first channel is the equalized energy/amplitude of the audio signal of the first channel divided by its pre-equalization energy/amplitude. When the pre-equalization energy/amplitude of the audio signal of the first channel is greater than its equalized energy/amplitude, the energy/amplitude scaling flag of the first channel is 1; otherwise, the energy/amplitude scaling flag of the first channel is 0. It is of course understood that it may alternatively be set such that when the pre-equalization energy/amplitude of the audio signal of the first channel is greater than its equalized energy/amplitude, the energy/amplitude scaling flag of the first channel is 0; the implementation principle is similar, and the embodiments of this application are not limited by the foregoing examples.
The energy/amplitude scaling factor in the embodiments of this application may also be referred to as a floating-point energy/amplitude scaling factor.
In another implementable manner, the energy/amplitude equalization side information may include the fixed-point energy/amplitude scaling ratio, where the fixed-point energy/amplitude scaling ratio is the fixed-point value of the energy/amplitude scaling factor, and the energy/amplitude scaling factor is the ratio of the pre-equalization energy/amplitude to the equalized energy/amplitude, that is, the pre-equalization energy/amplitude divided by the equalized energy/amplitude. When the energy/amplitude scaling factor is less than 1, the decoding end can determine that the equalized energy/amplitude is enlarged relative to the pre-equalization energy/amplitude. When the energy/amplitude scaling factor is greater than 1, the decoding end can determine that the equalized energy is reduced relative to the pre-equalization energy/amplitude. It is of course understood that the energy/amplitude scaling factor may alternatively be the equalized energy/amplitude divided by the pre-equalization energy/amplitude; the implementation principle is similar, and the embodiments of this application are not limited by the foregoing examples. In this implementation, the energy/amplitude equalization side information may not include the energy/amplitude scaling flag.
Step 203: Encode the energy/amplitude equalization side information of the K channel pairs and the audio signals of the P channels, to obtain an encoded bitstream.
The energy/amplitude equalization side information of the K channel pairs and the audio signals of the P channels are encoded to obtain the encoded bitstream; that is, the energy/amplitude equalization side information of the K channel pairs is written into the encoded bitstream. In other words, the encoded bitstream carries the energy/amplitude equalization side information of the K channel pairs but does not carry energy/amplitude equalization side information of unpaired channels, which reduces the number of bits occupied by the energy/amplitude equalization side information in the encoded bitstream.
In some embodiments, the encoded bitstream further carries the number of channel pairs of the current frame and K channel pair indices; the number of channel pairs and the K channel pair indices are used by the decoding end for stereo decoding, energy/amplitude de-equalization, and other processing. One channel pair index is used to indicate the two channels included in one channel pair. In other words, one implementable manner of step 203 is: encoding the energy/amplitude equalization side information of the K channel pairs, the number of channel pairs, the K channel pair indices, and the audio signals of the P channels, to obtain the encoded bitstream. The number of channel pairs may be K. The K channel pair indices include the channel pair indices corresponding to each of the K channel pairs.
The order in which the number of channel pairs, the K channel pair indices, and the energy/amplitude equalization side information of the K channel pairs are written into the encoded bitstream may be: first writing the number of channel pairs, so that when decoding the received bitstream, the decoding end first obtains the number of channel pairs; and then writing the K channel pair indices and the energy/amplitude equalization side information of the K channel pairs.
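The writing order above can be sketched with a toy serializer. The field widths and the dictionary layout are assumptions made for illustration only; the actual bitstream syntax is defined by the embodiments:

```python
def write_side_info(bits, pairs):
    """Append multi-channel side information to a bit list: the channel-pair
    count is written first, then for each pair its channel pair index and the
    per-channel fixed-point scaling ratio and scaling flag. All field widths
    here are illustrative assumptions (M = 4 fixed-point bits).
    """
    def put(value, width):
        # most-significant bit first
        for b in range(width - 1, -1, -1):
            bits.append((value >> b) & 1)

    put(len(pairs), 4)                      # channel-pair count K, written first
    for p in pairs:
        put(p["pair_index"], 4)             # which two channels are paired
        for ch in p["channels"]:
            put(ch["scaleInt"], 4)          # fixed-point energy/amplitude scaling ratio
            put(ch["flag"], 1)              # energy/amplitude scaling flag
    return bits
```

Because the count comes first, a decoder that reads a count of 0 can skip the side-information parsing entirely, as noted below for the no-pair case.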
It should also be noted that the number of channel pairs may be 0, that is, there are no paired channels; in that case, the number of channel pairs and the audio signals of the P channels are encoded to obtain the encoded bitstream. When decoding the received bitstream, the decoding end first obtains that the number of channel pairs is 0 and can then directly decode the current frame of the multi-channel audio signal to be decoded without parsing for the energy/amplitude equalization side information.
Before the encoded bitstream is obtained, energy/amplitude equalization may also be performed on the coefficients within the current frame of a channel according to the fixed-point energy/amplitude scaling ratio and the energy/amplitude scaling flag of the channel.
In this embodiment, the P channels of the current frame of the multi-channel audio signal are obtained, where the P channels include K channel pairs; side information of energy/amplitude equalization of the K channel pairs is generated according to the energies/amplitudes of the audio signals of the P channels; and the audio signals of the P channels are encoded according to the energy/amplitude equalization side information of the K channel pairs, to obtain an encoded bitstream. By generating the energy/amplitude equalization side information of the channel pairs, the encoded bitstream carries the energy/amplitude equalization side information of the K channel pairs but does not carry energy/amplitude equalization side information of unpaired channels. This reduces the number of bits occupied by the energy/amplitude equalization side information in the encoded bitstream and lowers the number of bits of the multi-channel side information, so that the saved bits can be allocated to other functional modules of the encoder, improving the quality of the audio signal reconstructed at the decoding end and improving coding quality.
FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application. The execution body of this embodiment of this application may be the foregoing encoder, and this embodiment is a specific implementable manner of the method described in the embodiment shown in FIG. 2. As shown in FIG. 3, the method of this embodiment may include:
Step 301: Obtain the audio signals of the P channels of the current frame of the multi-channel audio signal.
Step 302: Perform multi-channel signal screening and pairing on the P channels of the current frame of the multi-channel audio signal, and determine K channel pairs and K channel pair indices.
For specific implementations of the screening and pairing, refer to the explanation of step 201 in the embodiment shown in FIG. 2.
One channel pair index is used to indicate the two channels included in the channel pair. Different values of the channel pair index correspond to different pairs of two channels. The correspondence between the values of the channel pair index and the two channels may be preset.
Taking a 5.1-channel signal as an example, for instance, through screening and pairing, the L channel and the R channel are paired to form a first channel pair, and the LS channel and the RS channel are paired to form a second channel pair; the LFE channel and the C channel are unpaired channels, that is, K = 2. The first channel pair index is used to indicate the pairing of the L channel and the R channel; for example, the value of the first channel pair index is 0. The second channel pair index is used to indicate the pairing of the LS channel and the RS channel; for example, the value of the second channel pair index is 9.
Step 303: Perform energy/amplitude equalization processing on the audio signals of each of the K channel pairs, to obtain the equalized audio signals of each of the K channel pairs and the energy/amplitude equalization side information of each of the K channel pairs.
Taking the energy/amplitude equalization processing of one channel pair as an example, in one implementable manner the energy/amplitude equalization processing is performed at the granularity of a channel pair: according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the channel pair, the respective equalized energies/amplitudes of the audio signals of the two channels of the channel pair are determined; then, according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the channel pair and the respective equalized energies/amplitudes of the audio signals of the two channels, the energy/amplitude equalization side information of the current channel pair is generated and the equalized audio signals of the two channels are obtained.
The respective equalized energies/amplitudes of the audio signals of the two channels of the channel pair may be determined as follows: according to the respective pre-equalization energies/amplitudes of the audio signals of the two channels of the channel pair, an average energy/amplitude of the audio signals of the channel pair is determined, and the respective equalized energies/amplitudes of the audio signals of the two channels of the channel pair are determined according to the average energy/amplitude of the audio signals of the channel pair. For example, the respective equalized energies/amplitudes of the audio signals of the two channels of the channel pair are equal, both being the average energy/amplitude of the audio signals of the channel pair.
As described above, a channel pair may include a first channel and a second channel, and the energy/amplitude equalization side information of the channel pair includes: the fixed-point energy/amplitude scaling ratio of the first channel, the fixed-point energy/amplitude scaling ratio of the second channel, the energy/amplitude scaling flag of the first channel, and the energy/amplitude scaling flag of the second channel.
In some embodiments, the energy/amplitude scaling factor of the q-th channel may be determined according to the pre-equalization energy/amplitude of the audio signal of the q-th channel of the channel pair and the equalized energy/amplitude of the audio signal of the q-th channel. The fixed-point energy/amplitude scaling ratio of the q-th channel is determined according to the energy/amplitude scaling factor of the q-th channel. The energy/amplitude scaling flag of the q-th channel is determined according to the pre-equalization energy/amplitude of the q-th channel and the equalized energy/amplitude of the q-th channel, where q is one or two.
For example, the fixed-point energy/amplitude scaling ratio of the q-th channel of a channel pair and the energy/amplitude scaling flag of the q-th channel may be determined according to the following formulas (1) to (3).
The fixed-point energy/amplitude scaling ratio of the q-th channel is calculated according to formulas (1) and (2):
scaleInt_q = ceil((1<<M) × scaleF_q)                      (1)
scaleInt_q = clip(scaleInt_q, 1, 2^M − 1)                        (2)
where scaleInt_q is the fixed-point energy/amplitude scaling ratio of the q-th channel, scaleF_q is the floating-point energy/amplitude scaling factor of the q-th channel, M is the number of fixed-point bits used when converting the floating-point energy/amplitude scaling factor into the fixed-point energy/amplitude scaling ratio, clip(x, a, b) is a two-sided clamping function that clamps x into [a, b], clip((x), (a), (b)) = max(a, min(b, (x))), a ≤ b, and ceil(x) is the function that rounds x up to the nearest integer. M may take any integer value, for example, M = 4.
When energy_q > energy_q_e, energyBigFlag_q is set to 1; when energy_q ≤ energy_q_e, energyBigFlag_q is set to 0.
Here, energy_q is the pre-equalization energy/amplitude of the q-th channel, energy_q_e is the equalized energy/amplitude of the q-th channel, and energyBigFlag_q is the energy/amplitude scaling flag of the q-th channel. energy_q_e may be the average of the energies/amplitudes of the two channels of the channel pair.
scaleF_q in formula (1) is determined as follows: when energy_q > energy_q_e, scaleF_q = energy_q_e / energy_q; when energy_q ≤ energy_q_e, scaleF_q = energy_q / energy_q_e, where energy_q is the pre-equalization energy/amplitude of the q-th channel, energy_q_e is the equalized energy/amplitude of the q-th channel, and scaleF_q is the floating-point energy/amplitude scaling factor of the q-th channel.
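Putting formulas (1) and (2) together with the rules for scaleF_q and energyBigFlag_q, the per-pair derivation can be sketched as follows. This is a minimal illustration assuming the equalized energy is the pair average, as noted above; the function name is not from the embodiments:

```python
import math

def clip(x, a, b):
    """Two-sided clamp of x into [a, b], i.e. max(a, min(b, x))."""
    return max(a, min(b, x))

def equalize_pair_side_info(energy1, energy2, M=4):
    """For each channel of a pair, derive (scaleInt_q, energyBigFlag_q):
    the floating-point factor scaleF_q is the smaller of the pre- and
    post-equalization energies divided by the larger, the flag is 1 when
    the pre-equalization energy exceeds the equalized energy, and the
    fixed-point ratio follows formulas (1) and (2) with M fixed-point bits.
    """
    energy_e = (energy1 + energy2) / 2.0  # equalized energy = pair average
    info = []
    for energy in (energy1, energy2):
        if energy > energy_e:
            scale_f, flag = energy_e / energy, 1  # equalization reduces this channel
        else:
            scale_f, flag = energy / energy_e, 0  # equalization keeps or enlarges it
        scale_int = clip(math.ceil((1 << M) * scale_f), 1, (1 << M) - 1)  # (1), (2)
        info.append((scale_int, flag))
    return info
```

Note that when the two energies are equal, scaleF_q is 1.0 and the clip in formula (2) caps scaleInt_q at 2^M − 1.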
其中,通过如下公式(3)确定energy_q。
Figure PCTCN2021106514-appb-000001
其中,sampleCoef(q,i)表示能量/幅度均衡前的第q声道的当前帧的第i个系数,N为当前帧的频域系数的个数。
在能量/幅度均衡处理过程中,可以根据根据所述第q声道的定点能量/幅度缩放比例和第q声道的能量/幅度缩放标识,对第q声道当前帧进行能量/幅度均衡,以获取第q声道的能量/幅度均衡后的音频信号。
例如,当energyBigFlag_q为1,q_e(i)=q(i)×scaleInt_q/(1<<M)。当energyBigFlag_q为0,q_e(i)=q(i)×(1<<M)/scaleInt_q。
其中,i用于标识当前帧的系数,q(i)为能量/幅度均衡前的当前帧的第i个频域系数,q_e(i)为能量/幅度均衡后的当前帧的第i个频域系数,M为从浮点能量/幅度缩放比例系数到定点能量/幅度缩放比例的定点化比特数。
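上述按缩放标识选择放大或缩小的逐系数均衡过程,可用如下Python示意(示例实现,非标准代码):

```python
def equalize_frame(coefs, scaleInt_q, energyBigFlag_q, M=4):
    """根据定点缩放比例与缩放标识对当前帧系数做能量/幅度均衡(示意)。"""
    if energyBigFlag_q:
        # 均衡前能量/幅度较大:缩小,q_e(i) = q(i) × scaleInt_q / (1 << M)
        return [c * scaleInt_q / (1 << M) for c in coefs]
    # 均衡前能量/幅度较小或相等:放大,q_e(i) = q(i) × (1 << M) / scaleInt_q
    return [c * (1 << M) / scaleInt_q for c in coefs]
```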
另一种可实现方式,以所有声道或所有声道对或所有声道中的部分声道为粒度进行能量/幅度均衡处理。例如,根据P个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定P个声道的音频信号的能量/幅度平均值,根据P个声道的音频信号的能量/幅度平均值确定一个声道对的两个声道的音频信号各自的能量/幅度均衡后的能量或幅度。例如,可以将P个声道的音频信号的能量/幅度平均值作为一个声道对的任意一个声道的音频信号的能量/幅度均衡后的能量或幅度。即能量/幅度均衡后的能量或幅度的确定方式与上述一种可实现方式不同,其他确定能量/幅度均衡的边信息的方式可以相同,其具体实施方式可以参见上述说明,此处不再赘述。
在上述实施例中,当前声道对的能量/幅度均衡的边信息包括了第一声道的定点能量/幅度缩放比例和能量/幅度缩放标识、以及第二声道的定点能量/幅度缩放比例和能量/幅度缩放标识。即针对当前声道(第一声道或第二声道)来说,边信息同时包括定点能量/幅度缩放比例和能量/幅度缩放标识。这是因为在获取能量/幅度缩放比例时,固定取当前声道能量/幅度均衡前的能量/幅度与均衡后的能量/幅度中的较大者比上较小者,或者较小者比上较大者,因此获得的能量/幅度缩放比例固定大于或等于1,或者固定小于或等于1。单纯通过能量/幅度缩放比例或者定点能量/幅度缩放比例,并不能确定能量/幅度均衡后的能量/幅度是否大于均衡前的能量/幅度,因此需要能量/幅度缩放标识来指示。
在本申请另一个实施例中,可以固定使用当前声道能量/幅度均衡前的能量/幅度比上当前声道能量/幅度均衡后的能量/幅度,或者固定使用当前声道能量/幅度均衡后的能量/幅度比上当前声道能量/幅度均衡前的能量/幅度,这样就不需要通过能量/幅度缩放标识来指示了。相应地,当前声道的边信息可以包括定点能量/幅度缩放比例,但不需要包括能量/幅度缩放标识。
步骤304、分别对K个声道对各自的能量/幅度均衡后的音频信号进行立体声处理,获取K个声道对各自的立体声处理后的音频信号和K个声道对各自的立体声边信息。
以一个声道对为例,对该声道对的两个声道的能量/幅度均衡后的音频信号进行立体声处理,以获取该两个声道的立体声处理后的音频信号,并生成该声道对的立体声边信息。
步骤305、对K个声道对的立体声处理后的音频信号、K个声道对的能量/幅度均衡的边信息、K个声道对的立体声边信息、K、K个声道对索引以及未组对的声道的音频信号进行编码,获取编码码流。
对K个声道对的立体声处理后的音频信号、K个声道对的能量/幅度均衡的边信息、K个声道对的立体声边信息、声道对个数(K)、K个声道对索引以及未组对的声道的音频信号进行编码,获取编码码流,以供解码端解码重建音频信号。
本实施例,获取多声道音频信号的当前帧的P个声道的音频信号,对多声道音频信号的当前帧的P个声道进行多声道信号的筛选和组对,确定K个声道对和K个声道对索引,分别对K个声道对各自的音频信号进行能量/幅度均衡处理,获取K个声道对各自的能量/幅度均衡后的音频信号和K个声道对各自的能量/幅度均衡的边信息,分别对K个声道对各自的能量/幅度均衡后的音频信号进行立体声处理,获取K个声道对各自的立体声处理后的音频信号和K个声道对各自的立体声边信息,对K个声道对的立体声处理后的音频信号、K个声道对的能量/幅度均衡的边信息、K个声道对的立体声边信息、K、K个声道对索引以及未组对的声道的音频信号进行编码,获取编码码流。通过生成声道对的能量/幅度均衡的边信息,编码码流中携带K个声道对的能量/幅度均衡的边信息,而未携带未组对的声道的能量/幅度均衡的边信息,从而可以减少编码码流中能量/幅度均衡的边信息的比特数,降低多声道边信息的比特数,可以将节省的比特分配到编码器的其他功能模块,以提升解码端重建音频信号的质量,提升编码质量。
下面实施例以5.1声道信号为例,对本申请实施例的多声道音频信号编码方法进行示意性举例说明。
图4为本申请实施例的编码端的处理过程的示意图,如图4所示,该编码端可以包括多声道编码处理单元401、声道编码单元402和码流复用接口403。该编码端可以是如上所述的编码器。
多声道编码处理单元401用于对输入信号进行多声道信号的筛选、组对、立体声处理及能量/幅度均衡的边信息和立体声边信息的生成。本实施例中该输入信号为5.1(L声道、R声道、C声道、LFE声道、LS声道、RS声道)信号。
一种举例,多声道编码处理单元401将L声道信号和R声道信号进行组对,形成第一声道对,并经过立体声处理得到中声道M1声道信号和侧声道S1声道信号,将LS声道信号和RS声道信号进行组对,形成第二声道对,并经过立体声处理得到中声道M2声道信号和侧声道S2声道信号。其中,多声道编码处理单元401的具体说明可以参见下述图5所示实施例。
多声道编码处理单元401输出经过立体声处理的M1声道信号、S1声道信号、M2声道信号、S2声道信号和未经过立体声处理的LFE声道信号和C声道信号,以及能量/幅度均衡的边信息、立体声边信息和声道对索引。
声道编码单元402用于对经过立体声处理的M1声道信号、S1声道信号、M2声道信号、S2声道信号和未经过立体声处理的LFE声道信号和C声道信号,以及多声道边信息进行编码,输出编码声道E1-E6。该多声道边信息可以包括能量/幅度均衡的边信息、立体声边信息和声道对索引。当然可以理解的,该多声道边信息还可以包括比特分配的边信息、熵编码的边信息等,本申请实施例对此不作具体限定。声道编码单元402将编码声道E1- E6发送给码流复用接口403。
码流复用接口403将六个编码声道E1-E6进行复用形成串行比特流(bitStream),即编码码流,以方便多声道音频信号在信道中传输或者在数字媒质中存储。
图5为本申请实施例的多声道编码处理单元的处理过程的示意图,如图5所示,上述多声道编码处理单元401可以包括多声道筛选单元4011和迭代处理单元4012,该迭代处理单元4012可以包括组对判决单元40121、声道对能量/幅度均衡单元40122、声道对能量/幅度均衡单元40123、立体声处理单元40124和立体声处理单元40125。
多声道筛选单元4011根据多声道处理指示符(MultiProcFlag)从5.1输入声道(L声道、R声道、C声道、LS声道、RS声道、LFE声道)中筛选出参与多声道处理的声道,包括L声道、R声道、C声道、LS声道、RS声道。
迭代处理单元4012中的组对判决单元40121在第一迭代步骤中,计算L声道、R声道、C声道、LS声道和RS声道中的每对声道之间的声道间相关值。在第一迭代步骤中,选择声道中(L声道、R声道、C声道、LS声道、RS声道)声道间相关值最高的声道对(L声道、R声道)形成第一声道对。将L声道和R声道经过声道对能量/幅度均衡单元40122进行能量/幅度均衡得到L_e声道和R_e声道。立体声处理单元40124对L_e声道和R_e声道进行立体声处理,获取第一声道对的边信息及立体声处理后的中声道M1和侧声道S1。该第一声道对的边信息包括第一声道对的能量/幅度均衡的边信息、立体声边信息和声道索引。在第二迭代步骤中选择声道中(C声道、LS声道、RS声道)声道间相关值最高的声道对(LS声道、RS声道)形成第二声道对。将LS声道和RS声道经过能量/幅度均衡单元40123进行能量/幅度均衡得到LS_e声道和RS_e声道。立体声处理单元40125对LS_e声道和RS_e声道进行立体声处理,获取第二声道对的边信息及立体声处理后的中声道M2和侧声道S2。第二声道对的边信息包括第二声道对的能量/幅度均衡的边信息、立体声边信息和声道索引。第一声道对的边信息和第二声道对的边信息组成了多声道边信息。
声道对能量/幅度均衡单元40122和声道对能量/幅度均衡单元40123对输入的声道对的能量/幅度取平均得到能量/幅度均衡后的能量/幅度。
例如,声道对能量/幅度均衡单元40122可以通过如下公式(4)确定能量/幅度均衡后的能量/幅度。
energy_avg_pair1=avg(energy_L,energy_R)                 (4)
其中,avg(a_1,a_2)函数输出2个参数a_1、a_2的均值。energy_L为能量/幅度均衡前的L声道的帧能量/幅度,energy_R为能量/幅度均衡前的R声道的帧能量/幅度,energy_avg_pair1为第一声道对的能量/幅度均衡后的能量/幅度。
其中,energy_L和energy_R可以通过上述公式(3)确定。
声道对能量/幅度均衡单元40123可以通过如下公式(5)确定能量/幅度均衡后的能量/幅度。
energy_avg_pair2=avg(energy_LS,energy_RS)              (5)
其中,avg(a_1,a_2)函数输出2个参数a_1、a_2的均值。energy_LS为能量/幅度均衡前的LS声道的帧能量/幅度,energy_RS为能量/幅度均衡前的RS声道的帧能量/幅度,energy_avg_pair2为第二声道对的能量/幅度均衡后的能量/幅度。
同时能量/幅度均衡过程中会生成如上述实施例的第一声道对的能量/幅度均衡的边信息和第二声道对的能量/幅度均衡的边信息。该第一声道对的能量/幅度均衡的边信息和第二声道对的能量/幅度均衡的边信息在编码码流中传输,以指导解码端的能量/幅度去均衡。
对第一声道对的能量/幅度均衡的边信息的确定方式进行解释说明。
S01:计算第一声道对经过声道对能量/幅度均衡单元40122均衡后的能量/幅度energy_avg_pair1。该energy_avg_pair1采用上述公式(4)确定。
S02:计算第一声道对的L声道的浮点能量/幅度缩放比例系数。
一种示例,L声道的浮点能量/幅度缩放比例系数为scaleF_L。浮点能量/幅度缩放比例系数在(0,1]之间。如果energy_L>energy_L_e,scaleF_L=energy_L_e/energy_L;反之如果energy_L≤energy_L_e,scaleF_L=energy_L/energy_L_e。
其中,energy_L_e等于energy_avg_pair1。
S03:计算第一声道对的L声道的定点能量/幅度缩放比例。
一种示例,L声道的定点能量/幅度缩放比例为scaleInt_L。从浮点能量/幅度缩放比例系数scaleF_L到定点能量/幅度缩放比例scaleInt_L的定点化比特数为固定值。定点化比特数决定浮点转定点的精度,同时也要兼顾传输效率(因边信息要占用比特)。这里假设定点化比特数是4(即M=4),L声道的定点能量/幅度缩放比例的计算公式如下:
scaleInt_L=ceil((1<<4)×scaleF_L)
scaleInt_L=clip(scaleInt_L,1,15)
clip((x),(a),(b))=max(a,min(b,(x))),a≤b。其中,ceil(x)函数是对x向上取整的函数。clip(x,a,b)函数是双向钳位函数,将x钳位到[a,b]之间。
S04:计算第一声道对的L声道的能量/幅度缩放标识。
一种示例,L声道的能量/幅度缩放标识为energyBigFlag_L。如果energy_L>energy_L_e,energyBigFlag_L置1;反之如果energy_L≤energy_L_e,energyBigFlag_L置0。
对L声道当前帧内每个系数进行能量/幅度均衡,具体如下:
若energyBigFlag_L为1,L_e(i)=L(i)×scaleInt_L/(1<<4)。其中,i用于标识当前帧的系数,L(i)为能量/幅度均衡前的当前帧的第i个频域系数,L_e(i)为能量/幅度均衡后的当前帧的第i个频域系数。若energyBigFlag_L为0,L_e(i)=L(i)×(1<<4)/scaleInt_L。
对第一声道对的R声道可以执行类似的S01至S04操作,得到R声道的浮点能量/幅度缩放比例系数scaleF_R、定点能量/幅度缩放比例scaleInt_R、能量/幅度缩放标识energyBigFlag_R,及能量/幅度均衡后的当前帧R_e。即,将上述S01至S04中的L替换为R。
对第二声道对的LS声道可以执行类似的S01至S04操作,得到LS声道的浮点能量/幅度缩放比例系数scaleF_LS、定点能量/幅度缩放比例scaleInt_LS、能量/幅度缩放标识energyBigFlag_LS,及能量/幅度均衡后的当前帧LS_e。即,将上述S01至S04中的L替换为LS。
对第二声道对的RS声道执行类似的S01至S04操作,得到RS声道的浮点能量/幅度缩放比例系数scaleF_RS、定点能量/幅度缩放比例scaleInt_RS、能量/幅度缩放标识energyBigFlag_RS,及能量/幅度均衡后的当前帧RS_e。
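S01至S04的声道对级流程可整体示意如下(Python示例,M取4,函数命名为本文假设,非标准代码):

```python
import math

def equalize_pair(frame_a, frame_b, M=4):
    """对一个声道对执行S01~S04(示意):以两声道能量均值为目标,
    分别计算每个声道的定点缩放比例与缩放标识,并对系数做均衡。"""
    def energy(coefs):
        # 按公式(3)的重构形式计算帧能量/幅度
        return math.sqrt(sum(c * c for c in coefs))

    e_a, e_b = energy(frame_a), energy(frame_b)
    e_avg = (e_a + e_b) / 2.0               # S01:均衡后的目标能量/幅度

    def side_info(e):
        flag = 1 if e > e_avg else 0        # S04:能量/幅度缩放标识
        scaleF = e_avg / e if flag else e / e_avg            # S02
        scaleInt = max(1, min((1 << M) - 1,
                              math.ceil((1 << M) * scaleF))) # S03
        return scaleInt, flag

    def apply(coefs, scaleInt, flag):
        factor = scaleInt / (1 << M) if flag else (1 << M) / scaleInt
        return [c * factor for c in coefs]

    info_a, info_b = side_info(e_a), side_info(e_b)
    return apply(frame_a, *info_a), apply(frame_b, *info_b), info_a, info_b
```

例如对单系数帧[4.0]与[2.0],目标能量为3:前者得(scaleInt=12,flag=1)并缩小到3.0,后者得(scaleInt=11,flag=0)并放大到约2.91(定点化带来少量误差)。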
将多声道边信息写到编码码流,该多声道边信息包括声道对个数、第一声道对的能量/幅度均衡的边信息、第一声道对索引、第二声道对的能量/幅度均衡的边信息和第二声道对索引。
示例性的,声道对个数为currPairCnt,第一声道对的能量/幅度均衡的边信息和第二声道对的能量/幅度均衡的边信息为二维数组,第一声道对索引和第二声道对索引为一维数组。例如,第一声道对的定点能量/幅度缩放比例为PairILDScale[0][0]和PairILDScale[0][1],第一声道对的能量/幅度缩放标识为energyBigFlag[0][0]和energyBigFlag[0][1],第二声道对的定点能量/幅度缩放比例为PairILDScale[1][0]和PairILDScale[1][1],第二声道对的能量/幅度缩放标识为energyBigFlag[1][0]和energyBigFlag[1][1]。第一声道对索引为PairIndex[0],第二声道对索引为PairIndex[1]。
其中,声道对个数currPairCnt可以为固定比特长度,例如,可以由4比特组成,可以标识最多16个立体声对。
其中,声道对索引PairIndex[pair]的取值定义如表1所示,声道对索引可以为变长编码,用于在编码码流中传输,以节省比特以及用于解码端的音频信号恢复。例如,PairIndex[0]=0,即指示声道对包括R声道和L声道。
表1 5声道的声道对索引映射表
  0(L) 1(R) 2(C) 3(LS) 4(RS)
0(L)   0 1 3 6
1(R)     2 4 7
2(C)       5 8
3(LS)         9
4(RS)          
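表1的排列存在规律:设组对的两个声道序号为i和j(取值0~4,i<j),声道对索引为j×(j-1)/2+i。以下Python示意是对该表规律的归纳,并非本申请规定的计算方式:

```python
def pair_index(i, j):
    """按表1的排列规律计算声道对索引(i、j为0~4的声道序号)。"""
    if i > j:
        i, j = j, i  # 保证 i < j
    return j * (j - 1) // 2 + i
```

例如pair_index(0,1)=0对应(L,R),pair_index(3,4)=9对应(LS,RS),与表1一致。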
本实施例中,PairILDScale[0][0]=scaleInt_L。PairILDScale[0][1]=scaleInt_R。
PairILDScale[1][0]=scaleInt_LS。PairILDScale[1][1]=scaleInt_RS。
energyBigFlag[0][0]=energyBigFlag_L。energyBigFlag[0][1]=energyBigFlag_R。
energyBigFlag[1][0]=energyBigFlag_LS。energyBigFlag[1][1]=energyBigFlag_RS。
PairIndex[0]=0(L和R)。PairIndex[1]=9(LS和RS)。
示例性的,多声道边信息写码流的流程如图6所示。步骤601、设置变量pair=0,将声道对个数写入码流。例如,声道对个数currPairCnt可以是4比特。步骤602、判断pair是否小于声道对个数,若是,则执行步骤603,若否,则结束。步骤603、将第i声道对索引写入码流。i=pair+1,例如,将PairIndex[0]写入码流。步骤604、将第i声道对的定点能量/幅度缩放比例写入码流。例如,将PairILDScale[0][0]和PairILDScale[0][1]写入到码流。PairILDScale[0][0]和PairILDScale[0][1]可以各占4比特。步骤605、将第i声道对的能量/幅度缩放标识写入码流。例如,将energyBigFlag[0][0]和energyBigFlag[0][1]写入到码流。energyBigFlag[0][0]和energyBigFlag[0][1]可以各占1比特。步骤606、将第i声道对的立体声边信息写入码流,且pair=pair+1,返回执行步骤602。返回步骤602之后,将PairIndex[1]、PairILDScale[1][0]、PairILDScale[1][1]、energyBigFlag[1][0]、energyBigFlag[1][1]写入码流,直至结束。
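图6的写码流流程可用如下Python示意(示例实现;此处假设声道对索引按4比特定长写入,与上文5声道示例一致,实际也可采用变长编码;立体声边信息的写入从略;BitWriter为说明用途的假设实现):

```python
class BitWriter:
    """最简比特写入器(示意):按从高位到低位逐比特写入。"""
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        for k in range(nbits - 1, -1, -1):
            self.bits.append((value >> k) & 1)

def write_mc_side_info(bw, pair_indices, ild_scales, big_flags):
    """按图6的流程写多声道边信息:4比特声道对个数,随后每个声道对
    依次写4比特索引、两个4比特定点缩放比例、两个1比特缩放标识。"""
    bw.write(len(pair_indices), 4)              # currPairCnt
    for pair in range(len(pair_indices)):
        bw.write(pair_indices[pair], 4)         # PairIndex[pair]
        bw.write(ild_scales[pair][0], 4)        # PairILDScale[pair][0]
        bw.write(ild_scales[pair][1], 4)        # PairILDScale[pair][1]
        bw.write(big_flags[pair][0], 1)         # energyBigFlag[pair][0]
        bw.write(big_flags[pair][1], 1)         # energyBigFlag[pair][1]
```

按此假设,两个声道对的多声道边信息(不含立体声边信息)共占4+2×14=32比特。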
图7为本申请实施例的一种多声道音频信号解码方法的流程图,本申请实施例的执行主体可以是上述解码器,如图7所示,本实施例的方法可以包括:
步骤701、获取待解码码流。
其中,该待解码码流可以是如上编码方法实施例得到的编码码流。
步骤702、对待解码码流进行解复用,以获取待解码多声道音频信号的当前帧以及该当前帧包括的声道对个数。
以5.1声道信号为例,对待解码码流进行解复用后,得到M1声道信号、S1声道信号、M2声道信号、S2声道信号、LFE声道信号和C声道信号,以及声道对个数。
步骤703、判断声道对个数是否等于0,若是,则执行步骤704,若否,则执行步骤705。
步骤704、对待解码多声道音频信号的当前帧进行解码,以获取当前帧的解码信号。
当声道对个数等于0时,即各个声道均未组对,则可以对待解码多声道音频信号的当前帧进行解码,以获取当前帧的解码信号。
步骤705、解析该当前帧,获取该当前帧包括的K个声道对索引和K个声道对的能量/幅度均衡的边信息。
当声道对个数等于K时,则可以对当前帧进行进一步解析,以获取其他控制信息,例如,K个声道对索引和当前帧的K个声道对的能量/幅度均衡的边信息,以便后续对待解码多声道音频信号的当前帧进行解码过程中进行能量/幅度去均衡,以获取当前帧的解码信号。
步骤706、根据K个声道对索引,以及K个声道对的能量/幅度均衡的边信息,对待解码多声道音频信号的当前帧进行解码,以获取当前帧的解码信号。
以5.1声道信号为例,对M1声道信号、S1声道信号、M2声道信号、S2声道信号、LFE声道信号和C声道信号进行解码,以获取L声道信号、R声道信号、LS声道信号、RS声道信号、LFE声道信号和C声道信号。解码过程中,基于K个声道对的能量/幅度均衡的边信息进行能量/幅度去均衡。
在一些实施例中,一个声道对的能量/幅度均衡的边信息可以包括该声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,其具体解释说明可以参见前述编码实施例的解释说明,此处不再赘述。
本实施例,通过对待解码码流进行解复用,以获取待解码多声道音频信号的当前帧和该当前帧包括的声道对个数,当声道对个数大于0时,进一步解析该当前帧,获取K个声道对索引和K个声道对的能量/幅度均衡的边信息,根据K个声道对索引,以及K个声道对的能量/幅度均衡的边信息,对待解码多声道音频信号的当前帧进行解码,以获取当前帧的解码信号。由于编码端发送的码流未携带未组对的声道的能量/幅度均衡的边信息,从而可以减少编码码流中能量/幅度均衡的边信息的比特数,降低多声道边信息的比特数,可以将节省的比特分配到编码器的其他功能模块,以提升解码端重建音频信号的质量。
下面实施例以5.1声道信号为例,对本申请实施例的多声道音频信号解码方法进行示意性举例说明。
图8为本申请实施例的解码端的处理过程的示意图,如图8所示,该解码端可以包括码流解复用接口801、声道解码单元802和多声道解码处理单元803。本实施例的解码过程为上述图4和图5所示实施例的编码过程的逆过程。
码流解复用接口801用于对编码端输出的码流进行解复用得到六路编码声道E1-E6。
声道解码单元802用于对编码声道E1-E6进行逆熵编码和逆量化得到多声道信号,包括第一声道对的中声道M1和侧声道S1,第二声道对的中声道M2和侧声道S2,以及未组对的C声道和LFE声道。声道解码单元802还解码得到多声道边信息。该多声道边信息包括上述图4所示实施例的声道编码处理过程中生成的边信息(例如,熵编码的边信息),以及多声道编码处理过程中生成的边信息(例如,声道对的能量/幅度均衡的边信息)。
多声道解码处理单元803对第一声道对的中声道M1和侧声道S1,第二声道对的中声道M2和侧声道S2进行多声道解码处理。利用多声道边信息,将第一声道对的中声道M1和侧声道S1解码成L声道和R声道,将第二声道对的中声道M2和侧声道S2解码成LS声道和RS声道。L声道、R声道、LS声道、RS声道、未组对的C声道和LFE声道构成了解码端的输出。
图9为本申请实施例的多声道解码处理单元的处理过程的示意图,如图9所示,上述多声道解码处理单元803可以包括多声道筛选单元8031和多声道解码处理子模块8032。该多声道解码处理子模块8032包括两个立体声解码盒、能量/幅度去均衡单元8033和能量/幅度去均衡单元8034。
多声道筛选单元8031根据多声道边信息中的声道对个数和声道对索引从5.1输入声道(M1声道、S1声道、C声道、M2声道、S2声道、LFE声道)中筛选出参与多声道处理的M1声道、S1声道、M2声道、S2声道。
多声道解码处理子模块8032中的立体声解码盒用于执行以下步骤:根据第一声道对的立体声边信息指导立体声解码盒将第一声道对(M1,S1)解码成L_e声道和R_e声道。根据第二声道对的立体声边信息指导立体声解码盒将第二声道对(M2,S2)解码成LS_e声道和RS_e声道。
能量/幅度去均衡单元8033用于执行以下步骤:根据第一声道对的能量/幅度均衡的边信息指导第一声道对去均衡单元,将L_e声道和R_e声道的能量/幅度去均衡恢复成L声道、R声道。能量/幅度去均衡单元8034用于执行以下步骤:根据第二声道对的能量/幅度均衡的边信息指导第二声道对去均衡单元,将LS_e声道、RS_e声道恢复成LS声道、RS声道。
对多声道边信息解码的过程进行解释说明。图10为本申请实施例的一种多声道边信息解析的流程图,本实施例为上述图6所示实施例的逆过程,如图10所示,步骤1001、解析码流,得到当前帧的声道对个数。例如,声道对个数currPairCnt,声道对个数currPairCnt在码流中占用4比特。步骤1002、判断当前帧的声道对个数是否为零,若是,则结束,若否,则执行步骤1003。当前帧的声道对个数currPairCnt为零,表示当前帧没有进行组对,则不用解析获取能量/幅度均衡的边信息。当前帧的声道对个数currPairCnt不为零,循环解析第一声道对,……,第currPairCnt声道对的能量/幅度均衡的边信息。例如,设置变量pair=0,并执行后续步骤1003至1007。步骤1003、判断pair是否小于声道对个数,若是,则执行步骤1004,若否,则结束。步骤1004、从码流解析第i声道对索引。i=pair+1。步骤1005、从码流解析第i声道对的定点能量/幅度缩放比例。例如,PairILDScale[pair][0]和PairILDScale[pair][1]。步骤1006、从码流解析第i声道对的能量/幅度缩放标识。例如,energyBigFlag[pair][0]和energyBigFlag[pair][1]。步骤1007、从码流解析第i声道对的立体声边信息,且pair=pair+1,返回执行步骤1003,直至解析出所有声道对索引、定点能量/幅度缩放比例以及能量/幅度缩放标识。
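图10的解析流程可用如下Python示意(示例实现,读取顺序与编码端写入顺序对应;此处同样假设声道对索引按4比特定长编码;立体声边信息的解析从略;BitReader为说明用途的假设实现):

```python
class BitReader:
    """最简比特读取器(示意):按从高位到低位逐比特读取。"""
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

def parse_mc_side_info(br):
    """按图10的流程解析多声道边信息:先读4比特声道对个数,
    再逐声道对读索引、两个定点缩放比例、两个缩放标识。"""
    curr_pair_cnt = br.read(4)                       # currPairCnt
    pair_index, ild_scale, big_flag = [], [], []
    for _ in range(curr_pair_cnt):
        pair_index.append(br.read(4))                # PairIndex[pair]
        ild_scale.append([br.read(4), br.read(4)])   # PairILDScale[pair][..]
        big_flag.append([br.read(1), br.read(1)])    # energyBigFlag[pair][..]
    return pair_index, ild_scale, big_flag
```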
以编码端5.1(L、R、C、LFE、LS、RS)信号为例描述第一声道对和第二声道对的边信息解析过程。
第一声道对的边信息解析过程如下:从码流解析4比特的声道对索引PairIndex[0],根据声道对索引的定义规则映射成L声道和R声道。从码流解析L声道的定点能量/幅度缩放比例PairILDScale[0][0]和R声道的定点能量/幅度缩放比例PairILDScale[0][1]。从码流解析L声道的能量/幅度缩放标识energyBigFlag[0][0]和R声道的能量/幅度缩放标识energyBigFlag[0][1]。从码流解析第一声道对的立体声边信息。第一声道对的边信息解析结束。
第二声道对的边信息解析过程如下:从码流解析4比特的声道对索引PairIndex[1],根据声道对索引的定义规则映射成LS声道和RS声道。从码流解析LS声道的定点能量/幅度缩放比例PairILDScale[1][0]和RS声道的定点能量/幅度缩放比例PairILDScale[1][1]。从码流解析LS声道的能量/幅度缩放标识energyBigFlag[1][0]和RS声道的能量/幅度缩放标识energyBigFlag[1][1]。从码流解析第二声道对的立体声边信息。第二声道对的边信息解析结束。
能量/幅度去均衡单元8033用于将第一声道对的L_e声道和R_e声道的能量/幅度去均衡的过程如下:
根据L声道的定点能量/幅度缩放比例PairILDScale[0][0]和L声道的能量/幅度缩放标识energyBigFlag[0][0]计算L声道的浮点能量/幅度缩放比例系数scaleF_L。若L声道的能量/幅度缩放标识energyBigFlag[0][0]为1,scaleF_L=(1<<4)/PairILDScale[0][0];若L声道的能量/幅度缩放标识energyBigFlag[0][0]为0,scaleF_L=PairILDScale[0][0]/(1<<4)。
根据L声道的浮点能量/幅度缩放比例系数scaleF_L得到能量/幅度去均衡后的L声道的频域系数。L(i)=L_e(i)×scaleF_L;其中,i用于标识当前帧的系数,L(i)为能量/幅度均衡前的当前帧的第i个频域系数,L_e(i)为能量/幅度均衡后的当前帧的第i个频域系数。
根据R声道的定点能量/幅度缩放比例PairILDScale[0][1]和R声道的能量/幅度缩放标识energyBigFlag[0][1]计算R声道的浮点能量/幅度缩放比例系数scaleF_R。若R声道的能量/幅度缩放标识energyBigFlag[0][1]为1,scaleF_R=(1<<4)/PairILDScale[0][1];若R声道的能量/幅度缩放标识energyBigFlag[0][1]为0,scaleF_R=PairILDScale[0][1]/(1<<4)。
根据R声道的浮点能量/幅度缩放比例系数scaleF_R得到能量/幅度去均衡后的R声道的频域系数。R(i)=R_e(i)×scaleF_R;其中,i用于标识当前帧的系数,R(i)为能量/幅度均衡前的当前帧的第i个频域系数,R_e(i)为能量/幅度均衡后的当前帧的第i个频域系数。
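解码端按缩放标识恢复浮点缩放系数并对系数去均衡的过程,可用如下Python示意(示例实现,M取4,非标准代码):

```python
def deequalize_frame(coefs_e, scale_int, big_flag, M=4):
    """根据定点缩放比例与缩放标识对系数做能量/幅度去均衡(示意)。"""
    if big_flag:
        scaleF = (1 << M) / scale_int   # 编码端曾缩小,解码端放大
    else:
        scaleF = scale_int / (1 << M)   # 编码端曾放大,解码端缩小
    return [c * scaleF for c in coefs_e]
```

该运算与编码端的均衡运算互为近似逆过程,定点化带来的误差由定点化比特数M控制。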
能量/幅度去均衡单元8034用于将第二声道对的LS_e声道和RS_e声道的能量/幅度去均衡,其具体实施方式与第一声道对的L_e声道和R_e声道的能量/幅度去均衡一致,此处不再赘述。
多声道解码处理单元803的输出是经过解码后的L声道信号、R声道信号、LS声道信号、RS声道信号、C声道信号和LFE声道信号。
本实施例,由于编码端发送的码流未携带未组对的声道的能量/幅度均衡的边信息,从而可以减少编码码流中能量/幅度均衡的边信息的比特数,降低多声道边信息的比特数,可以将节省的比特分配到编码器的其他功能模块,以提升解码端重建音频信号的质量。
基于与上述方法相同的发明构思,本申请实施例还提供了一种音频信号编码装置,该音频信号编码装置可以应用于音频编码器。
图11为本申请实施例的一种音频信号编码装置的结构示意图,如图11所示,该音频信号编码装置1100包括:获取模块1101、均衡边信息生成模块1102、以及编码模块1103。
获取模块1101,用于获取多声道音频信号的当前帧的P个声道的音频信号和P个声道的音频信号各自的能量/幅度,P为大于1的正整数,该P个声道包括K个声道对,每个声道对包括两个声道,K为正整数,P大于或等于K*2。
均衡边信息生成模块1102,用于根据P个声道的音频信号各自的能量/幅度,生成K个声道对的能量/幅度均衡的边信息;
编码模块1103,用于对K个声道对的能量/幅度均衡的边信息,和P个声道的音频信号进行编码,以获取编码码流。
在一些实施例中,该K个声道对包括当前声道对,该当前声道对的能量/幅度均衡的边信息包括:该当前声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,该定点能量/幅度缩放比例为能量/幅度缩放比例系数的定点化值,该能量/幅度缩放比例系数根据该当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度与该两个声道的音频信号各自的能量/幅度均衡后的能量/幅度获得,该能量/幅度缩放标识用于标识该当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度相对于各自的能量/幅度均衡前的能量/幅度是被放大或被缩小。
在一些实施例中,该K个声道对包括当前声道对,均衡边信息生成模块1102用于:根据当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度。根据当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,和该两个声道的音频信号各自的能量/幅度均衡后的能量/幅度,生成该当前声道对的能量/幅度均衡的边信息。
在一些实施例中,该当前声道对包括第一声道和第二声道,该当前声道对的能量/幅度均衡的边信息包括:第一声道的定点能量/幅度缩放比例、第二声道的定点能量/幅度缩放比例、第一声道的能量/幅度缩放标识和第二声道的能量/幅度缩放标识。
在一些实施例中,均衡边信息生成模块1102用于:根据当前声道对的第q声道的能量/幅度均衡前的能量/幅度,和该第q声道的音频信号的能量/幅度均衡后的能量/幅度,确定该第q声道的音频信号的能量/幅度缩放比例系数。根据该第q声道的能量/幅度缩放比例系数,确定该第q声道的定点能量/幅度缩放比例。根据该第q声道的能量/幅度均衡前的能量/幅度,和该第q声道的能量/幅度均衡后的能量/幅度,确定该第q声道的能量/幅度缩放标识。其中,q为一或二。
在一些实施例中,均衡边信息生成模块1102用于:根据当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定该当前声道对的音频信号的能量/幅度平均 值,根据该当前声道对的音频信号的能量/幅度平均值确定该当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度。
在一些实施例中,编码模块1103用于:对K个声道对的能量/幅度均衡的边信息、K、K个声道对各自对应的声道对索引以及P个声道的音频信号进行编码,以获取编码码流。
需要说明的是,上述获取模块1101、均衡边信息生成模块1102、以及编码模块1103可应用于编码端的音频信号编码过程。
还需要说明的是,获取模块1101、均衡边信息生成模块1102、以及编码模块1103的具体实现过程可参考上述方法实施例中关于编码方法的详细描述,为了说明书的简洁,这里不再赘述。
基于与上述方法相同的发明构思,本申请实施例提供一种音频信号编码器,音频信号编码器用于编码音频信号,包括:如上述一个或者多个实施例中所述的音频信号编码装置,其中,该音频信号编码装置用于编码生成对应的码流。
基于与上述方法相同的发明构思,本申请实施例提供一种用于编码音频信号的设备,例如,音频信号编码设备,请参阅图12所示,音频信号编码设备1200包括:
处理器1201、存储器1202以及通信接口1203(其中音频信号编码设备1200中的处理器1201的数量可以是一个或多个,图12中以一个处理器为例)。在本申请的一些实施例中,处理器1201、存储器1202以及通信接口1203可通过总线或其它方式连接,其中,图12中以通过总线连接为例。
存储器1202可以包括只读存储器和随机存取存储器,并向处理器1201提供指令和数据。存储器1202的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1202存储有操作系统和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。操作系统可包括各种系统程序,用于实现各种基础业务以及处理基于硬件的任务。
处理器1201控制音频编码设备的操作,处理器1201还可以称为中央处理单元(central processing unit,CPU)。具体的应用中,音频编码设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1201中,或者由处理器1201实现。处理器1201可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1201中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1201可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程 只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1202,处理器1201读取存储器1202中的信息,结合其硬件完成上述方法的步骤。
通信接口1203可用于接收或发送数字或字符信息,例如可以是输入/输出接口、管脚或电路等。举例而言,通过通信接口1203发送上述编码码流。
基于与上述方法相同的发明构思,本申请实施例提供一种音频编码设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行如上述一个或者多个实施例中所述的多声道音频信号编码方法的部分或全部步骤。
基于与上述方法相同的发明构思,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储了程序代码,其中,所述程序代码包括用于执行如上述一个或者多个实施例中所述的多声道音频信号编码方法的部分或全部步骤的指令。
基于与上述方法相同的发明构思,本申请实施例提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如上述一个或者多个实施例中所述的多声道音频信号编码方法的部分或全部步骤。
基于与上述方法相同的发明构思,本申请实施例还提供了一种音频信号解码装置,该音频信号解码装置可以应用于音频解码器。
图13为本申请实施例的一种音频信号解码装置的结构示意图,如图13所示,该音频信号解码装置1300包括:获取模块1301、解复用模块1302、以及解码模块1303。
获取模块1301,用于获取待解码码流。
解复用模块1302,用于对待解码码流进行解复用,以获取待解码多声道音频信号的当前帧,当前帧包括的声道对的数量K,K个声道对各自对应的声道对索引,以及K个声道对的能量/幅度均衡的边信息;
解码模块1303,用于根据K个声道对各自对应的声道对索引,以及K个声道对的能量/幅度均衡的边信息,对待解码多声道音频信号的当前帧进行解码,以获取当前帧的解码信号,K为正整数,每个声道对包括两个声道。
在一些实施例中,该K个声道对包括当前声道对,该当前声道对的能量/幅度均衡的边信息包括:该当前声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,该定点能量/幅度缩放比例为能量/幅度缩放比例系数的定点化值,该能量/幅度缩放比例系数根据该当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度与该两个声道的音频信号各自的能量/幅度均衡后的能量/幅度获得,该能量/幅度缩放标识用于标识该当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度相对于各自的能量/幅度均衡前的能量/幅度是被放大或被缩小。
在一些实施例中,该K个声道对包括当前声道对,解码模块1303用于:根据当前声道对对应的声道对索引,对待解码多声道音频信号的当前帧进行立体声解码处理,以获取当前帧的当前声道对的两个声道的音频信号。根据该当前声道对的能量/幅度均衡的边信息,对该当前声道对的两个声道的音频信号进行能量/幅度去均衡处理,以获取该当前声道对的两个声道的解码信号。
在一些实施例中,该当前声道对包括第一声道和第二声道,该当前声道对的能量/幅度 均衡的边信息包括:该第一声道的定点能量/幅度缩放比例、该第二声道的定点能量/幅度缩放比例、第一声道的能量/幅度缩放标识和第二声道的能量/幅度缩放标识。
需要说明的是,上述获取模块1301、解复用模块1302、以及解码模块1303可应用于解码端的音频信号解码过程。
还需要说明的是,获取模块1301、解复用模块1302、以及解码模块1303的具体实现过程可参考上述方法实施例中关于解码方法的详细描述,为了说明书的简洁,这里不再赘述。
基于与上述方法相同的发明构思,本申请实施例提供一种音频信号解码器,音频信号解码器用于解码音频信号,包括:如上述一个或者多个实施例中所述的音频信号解码装置,其中,该音频信号解码装置用于解码对应的码流。
基于与上述方法相同的发明构思,本申请实施例提供一种用于解码音频信号的设备,例如,音频信号解码设备,请参阅图14所示,音频信号解码设备1400包括:
处理器1401、存储器1402以及通信接口1403(其中音频信号解码设备1400中的处理器1401的数量可以是一个或多个,图14中以一个处理器为例)。在本申请的一些实施例中,处理器1401、存储器1402以及通信接口1403可通过总线或其它方式连接,其中,图14中以通过总线连接为例。
存储器1402可以包括只读存储器和随机存取存储器,并向处理器1401提供指令和数据。存储器1402的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1402存储有操作系统和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。操作系统可包括各种系统程序,用于实现各种基础业务以及处理基于硬件的任务。
处理器1401控制音频解码设备的操作,处理器1401还可以称为中央处理单元(central processing unit,CPU)。具体的应用中,音频解码设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1401中,或者由处理器1401实现。处理器1401可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1401中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1401可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1402,处理器1401读取存储器1402中的信息,结合其硬件完成上述方法的步 骤。
通信接口1403可用于接收或发送数字或字符信息,例如可以是输入/输出接口、管脚或电路等。举例而言,通过通信接口1403接收上述编码码流。
基于与上述方法相同的发明构思,本申请实施例提供一种音频解码设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行如上述一个或者多个实施例中所述的多声道音频信号解码方法的部分或全部步骤。
基于与上述方法相同的发明构思,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储了程序代码,其中,所述程序代码包括用于执行如上述一个或者多个实施例中所述的多声道音频信号解码方法的部分或全部步骤的指令。
基于与上述方法相同的发明构思,本申请实施例提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如上述一个或者多个实施例中所述的多声道音频信号解码方法的部分或全部步骤。
以上各实施例中提及的处理器可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、特定应用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请实施例公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
上述各实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本 申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (27)

  1. 一种多声道音频信号编码方法,其特征在于,包括:
    获取多声道音频信号的当前帧的P个声道的音频信号,P为大于1的正整数,所述P个声道包括K个声道对,每个声道对包括两个声道,K为正整数,P大于或等于K*2;
    获取所述P个声道的音频信号各自的能量/幅度;
    根据所述P个声道的音频信号各自的能量/幅度,生成所述K个声道对的能量/幅度均衡的边信息;
    对所述K个声道对的能量/幅度均衡的边信息,和所述P个声道的音频信号进行编码,以获取编码码流。
  2. 根据权利要求1所述的方法,其特征在于,所述K个声道对包括当前声道对,所述当前声道对的能量/幅度均衡的边信息包括:
    所述当前声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,其中,所述定点能量/幅度缩放比例为能量/幅度缩放比例系数的定点化值,所述能量/幅度缩放比例系数根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度与所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度获得,所述能量/幅度缩放标识用于标识所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度相对于各自的能量/幅度均衡前的能量/幅度是被放大或被缩小。
  3. 根据权利要求1或2所述的方法,其特征在于,所述K个声道对包括当前声道对,所述根据所述P个声道的音频信号各自的能量/幅度,生成所述K个声道对的能量/幅度均衡的边信息包括根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,生成所述当前声道对的能量/幅度均衡的边信息;
    所述根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,生成所述当前声道对的能量/幅度均衡的边信息包括:
    根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度;
    根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,和所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度,生成所述当前声道对的能量/幅度均衡的边信息。
  4. 根据权利要求3所述的方法,其特征在于,所述当前声道对包括第一声道和第二声道,所述当前声道对的能量/幅度均衡的边信息包括:
    所述第一声道的定点能量/幅度缩放比例和能量/幅度缩放标识、以及所述第二声道的定点能量/幅度缩放比例和能量/幅度缩放标识。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,和所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度,生成所述当前声道对的能量/幅度均衡的边信息,包括:
    根据所述当前声道对的第q声道的音频信号的能量/幅度均衡前的能量/幅度,和所述第q声道的音频信号的能量/幅度均衡后的能量/幅度,确定所述第q声道的能量/幅度缩放比例系数和所述第q声道的能量/幅度缩放标识;
    根据所述第q声道的能量/幅度缩放比例系数,确定所述第q声道的定点能量/幅度缩放比例;
    其中,q为一或二。
  6. 根据权利要求3至5任一项所述的方法,其特征在于,所述根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度,包括:
    根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定所述当前声道对的音频信号的能量/幅度平均值,根据所述当前声道对的音频信号的能量/幅度平均值确定所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述对所述K个声道对的能量/幅度均衡的边信息,和所述P个声道的音频信号进行编码,以获取编码码流,包括:
    对所述K个声道对的能量/幅度均衡的边信息、所述K、所述K个声道对各自对应的声道对索引以及所述P个声道的音频信号进行编码,以获取所述编码码流。
  8. 一种多声道音频信号解码方法,其特征在于,包括:
    获取待解码码流;
    对所述待解码码流进行解复用,以获取待解码多声道音频信号的当前帧,所述当前帧包括的声道对的数量K,所述K个声道对各自对应的声道对索引,以及所述K个声道对的能量/幅度均衡的边信息,K为正整数,每个声道对包括两个声道;
    根据所述K个声道对各自对应的声道对索引,以及所述K个声道对的能量/幅度均衡的边信息,对所述待解码多声道音频信号的当前帧进行解码,以获取所述当前帧的解码信号。
  9. 根据权利要求8所述的方法,其特征在于,所述K个声道对包括当前声道对,所述当前声道对的能量/幅度均衡的边信息包括:所述当前声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,其中,所述定点能量/幅度缩放比例为能量/幅度缩放比例系数的定点化值,所述能量/幅度缩放比例系数根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度与所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度获得,所述能量/幅度缩放标识用于标识所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度相对于各自的能量/幅度均衡前的能量/幅度是被放大或被缩小。
  10. 根据权利要求9所述的方法,其特征在于,所述当前声道对包括第一声道和第二声道,所述当前声道对的能量/幅度均衡的边信息包括:所述第一声道的定点能量/幅度缩放比例和能量/幅度缩放标识、以及所述第二声道的定点能量/幅度缩放比例和能量/幅度缩放标识。
  11. 根据权利要求8至10任一项所述的方法,其特征在于,所述K个声道对包括当前声道对,所述根据所述K个声道对各自对应的声道对索引,以及所述K个声道对的能量/幅度均衡的边信息,对所述待解码多声道音频信号的当前帧进行解码,以获取所述当前帧的解码信号,包括:
    根据所述当前声道对对应的声道对索引,对所述待解码多声道音频信号的当前帧进行立体声解码处理,以获取所述当前帧的当前声道对的两个声道的音频信号;
    根据所述当前声道对的能量/幅度均衡的边信息,对所述当前声道对的两个声道的音频信号进行能量/幅度去均衡处理,以获取所述当前声道对的两个声道的解码信号。
  12. 一种音频信号编码装置,其特征在于,包括:
    获取模块,用于获取多声道音频信号的当前帧的P个声道的音频信号和所述P个声道的音频信号各自的能量/幅度,P为大于1的正整数,所述P个声道包括K个声道对,每个声道对包括两个声道,K为正整数,P大于或等于K*2;
    均衡边信息生成模块,用于根据所述P个声道的音频信号各自的能量/幅度,生成所述K个声道对的能量/幅度均衡的边信息;
    编码模块,用于对所述K个声道对的能量/幅度均衡的边信息,和所述P个声道的音频信号进行编码,以获取编码码流。
  13. 根据权利要求12所述的装置,其特征在于,所述K个声道对包括当前声道对,所述当前声道对的能量/幅度均衡的边信息包括:所述当前声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,其中,所述定点能量/幅度缩放比例为能量/幅度缩放比例系数的定点化值,所述能量/幅度缩放比例系数根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度与所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度获得,所述能量/幅度缩放标识用于标识所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度相对于各自的能量/幅度均衡前的能量/幅度是被放大或被缩小。
  14. 根据权利要求12或13所述的装置,其特征在于,所述K个声道对包括当前声道对,所述均衡边信息生成模块用于:根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度;根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,和所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度,生成所述当前声道对的能量/幅度均衡的边信息。
  15. 根据权利要求14所述的装置,其特征在于,所述当前声道对包括第一声道和第二声道,所述当前声道对的能量/幅度均衡的边信息包括:所述第一声道的定点能量/幅度缩放比例和能量/幅度缩放标识、以及所述第二声道的定点能量/幅度缩放比例和能量/幅度缩放标识。
  16. 根据权利要求15所述的装置,其特征在于,所述均衡边信息生成模块用于:根据所述当前声道对的第q声道的音频信号的能量/幅度均衡前的能量/幅度,和所述第q声道的音频信号的能量/幅度均衡后的能量/幅度,确定所述第q声道的能量/幅度缩放比例系数和所述第q声道的能量/幅度缩放标识;根据所述第q声道的能量/幅度缩放比例系数,确定所述第q声道的定点能量/幅度缩放比例;
    其中,q为一或二。
  17. 根据权利要求14至16任一项所述的装置,其特征在于,所述均衡边信息生成模块用于:根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度,确定所述当前声道对的音频信号的能量/幅度平均值,根据所述当前声道对的音频信号的能量/幅度平均值确定所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度。
  18. 根据权利要求12至17任一项所述的装置,其特征在于,所述编码模块用于:对所述K个声道对的能量/幅度均衡的边信息、所述K、所述K个声道对各自对应的声道对索引以及所述P个声道的音频信号进行编码,以获取所述编码码流。
  19. 一种音频信号解码装置,其特征在于,包括:
    获取模块,用于获取待解码码流;
    解复用模块,用于对所述待解码码流进行解复用,以获取待解码多声道音频信号的当前帧,所述当前帧包括的声道对的数量K,所述K个声道对各自对应的声道对索引,以及所述K个声道对的能量/幅度均衡的边信息,K为正整数,每个声道对包括两个声道;
    解码模块,用于根据所述K个声道对各自的声道对索引,以及所述K个声道对的能量/幅度均衡的边信息,对所述待解码多声道音频信号的当前帧进行解码,以获取所述当前帧的解码信号。
  20. 根据权利要求19所述的装置,其特征在于,所述K个声道对包括当前声道对,所述当前声道对的能量/幅度均衡的边信息包括:所述当前声道对的定点能量/幅度缩放比例和能量/幅度缩放标识,其中,所述定点能量/幅度缩放比例为能量/幅度缩放比例系数的定点化值,所述能量/幅度缩放比例系数根据所述当前声道对的两个声道的音频信号各自的能量/幅度均衡前的能量/幅度与所述两个声道的音频信号各自的能量/幅度均衡后的能量/幅度获得,所述能量/幅度缩放标识用于标识所述当前声道对的两个声道的音频信号各自的能量/幅度均衡后的能量/幅度相对于各自的能量/幅度均衡前的能量/幅度是被放大或被缩小。
  21. 根据权利要求20所述的装置,其特征在于,所述当前声道对包括第一声道和第二声道,所述当前声道对的能量/幅度均衡的边信息包括:所述第一声道的定点能量/幅度缩放比例和能量/幅度缩放标识、以及所述第二声道的定点能量/幅度缩放比例和能量/幅度缩放标识。
  22. 根据权利要求19至21任一项所述的装置,其特征在于,所述K个声道对包括当前声道对,所述解码模块用于:
    根据所述当前声道对对应的声道对索引,对所述待解码多声道音频信号的当前帧进行立体声解码处理,以获取所述当前帧的当前声道对的两个声道的音频信号;
    根据所述当前声道对的能量/幅度均衡的边信息,对所述当前声道对的两个声道的音频信号进行能量/幅度去均衡处理,以获取所述当前声道对的两个声道的解码信号。
  23. 一种音频信号编码装置,其特征在于,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行如权利要求1至7任一项所述的方法。
  24. 一种音频信号解码装置,其特征在于,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行如权利要求8至11任一项所述的方法。
  25. 一种音频信号编码设备,其特征在于,包括:编码器,所述编码器用于执行如权利要求1至7任一项所述的方法。
  26. 一种音频信号解码设备,其特征在于,包括:解码器,所述解码器用于执行如权利要求8至11任一项所述的方法。
  27. 一种计算机可读存储介质,其特征在于,包括根据如权利要求1至7任一项所述的方法获得的编码码流。
PCT/CN2021/106514 2020-07-17 2021-07-15 多声道音频信号编解码方法和装置 WO2022012628A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020237005513A KR20230038777A (ko) 2020-07-17 2021-07-15 멀티-채널 오디오 신호 인코딩/디코딩 방법 및 장치
EP21843200.3A EP4174854A4 (en) 2020-07-17 2021-07-15 METHOD AND DEVICE FOR ENCODING/DECODING MULTI-CHANNEL AUDIO SIGNAL
US18/154,633 US20230145725A1 (en) 2020-07-17 2023-01-13 Multi-channel audio signal encoding and decoding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010699711.8 2020-07-17
CN202010699711.8A CN113948096A (zh) 2020-07-17 2020-07-17 多声道音频信号编解码方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/154,633 Continuation US20230145725A1 (en) 2020-07-17 2023-01-13 Multi-channel audio signal encoding and decoding method and apparatus

Publications (1)

Publication Number Publication Date
WO2022012628A1

Family

ID=79326911

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106514 WO2022012628A1 (zh) 2020-07-17 2021-07-15 多声道音频信号编解码方法和装置

Country Status (5)

Country Link
US (1) US20230145725A1 (zh)
EP (1) EP4174854A4 (zh)
KR (1) KR20230038777A (zh)
CN (1) CN113948096A (zh)
WO (1) WO2022012628A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023173941A1 (zh) * 2022-03-14 2023-09-21 华为技术有限公司 一种多声道信号的编解码方法和编解码设备以及终端设备

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101276587A (zh) * 2007-03-27 2008-10-01 北京天籁传音数字技术有限公司 声音编码装置及其方法和声音解码装置及其方法
US20150189457A1 (en) * 2013-12-30 2015-07-02 Aliphcom Interactive positioning of perceived audio sources in a transformed reproduced sound field including modified reproductions of multiple sound fields
CN105264595A (zh) * 2013-06-05 2016-01-20 汤姆逊许可公司 用于编码音频信号的方法、用于编码音频信号的装置、用于解码音频信号的方法和用于解码音频信号的装置
CN108206022A (zh) * 2016-12-16 2018-06-26 南京青衿信息科技有限公司 利用aes/ebu信道传输三维声信号的编解码器及其编解码方法
CN109074810A (zh) * 2016-02-17 2018-12-21 弗劳恩霍夫应用研究促进协会 用于多声道编码中的立体声填充的装置和方法

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
RU2495503C2 (ru) * 2008-07-29 2013-10-10 Панасоник Корпорэйшн Устройство кодирования звука, устройство декодирования звука, устройство кодирования и декодирования звука и система проведения телеконференций
GB2563635A (en) * 2017-06-21 2018-12-26 Nokia Technologies Oy Recording and rendering audio signals
CN112639967A (zh) * 2018-07-04 2021-04-09 弗劳恩霍夫应用研究促进协会 使用信号白化作为预处理的多信号音频编码


Non-Patent Citations (1)

Title
See also references of EP4174854A4

Also Published As

Publication number Publication date
CN113948096A (zh) 2022-01-18
EP4174854A1 (en) 2023-05-03
EP4174854A4 (en) 2024-01-03
US20230145725A1 (en) 2023-05-11
KR20230038777A (ko) 2023-03-21


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21843200

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021843200

Country of ref document: EP

Effective date: 20230127

ENP Entry into the national phase

Ref document number: 20237005513

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE